How to use Improse

Once you have installed Improse, you can type:

improse --help

to find the available commands and required parameters to run Improse.

Improse demo

To run a demo that trains a Random Forest model and validates it with 10-fold cross-validation, type:

improse --demo

This saves the results to a folder named Improse_results in the current working directory. To save the results to a specific folder instead, type:

improse --demo --output ~/path/to/your/folder

Select model

Improse comes with six machine learning models: Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbors (kNN), AdaBoost (AB), Decision Tree (DT) and Naive Bayes (NB). Random Forest is the default model.

To select a model, type:

improse --model MODEL_NAME

MODEL_NAME can be rf, svm, knn, ab, dt or nb. Use all to run every model one after another.

Define features and feature subsets

To tell the model to use specific features, type:

improse --model svm --feature H3K27ac,Brd4,p300,pGC

Make sure the feature names are comma-separated.

To compare the individual or combinatorial predictive power of different features, pass the --compare argument together with --feature:

improse --model svm --feature H3K27ac,Brd4,p300,pGC --compare

To check the combinatorial predictive power of features, join the features to be combined with the + symbol:

improse --model svm --feature H3K27ac+Brd4,p300,pGC+pAT --compare

Here the model tests the combinatorial predictive power of [H3K27ac,Brd4] and [pGC,pAT], along with the individual predictive power of p300.
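
To make the grouping rule concrete, here is a minimal Python sketch of how a --feature value could be split into feature sets. It only illustrates the comma and + convention described above; the function name parse_feature_spec is hypothetical and this is not Improse's actual parsing code.

```python
def parse_feature_spec(spec):
    """Split a --feature style string into feature groups.

    Commas separate groups; '+' joins the features inside one group.
    Illustrative only: not taken from the Improse source code.
    """
    groups = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # ignore empty entries such as a trailing comma
        groups.append([name.strip() for name in item.split("+")])
    return groups


groups = parse_feature_spec("H3K27ac+Brd4,p300,pGC+pAT")
print(groups)
# [['H3K27ac', 'Brd4'], ['p300'], ['pGC', 'pAT']]
```

Each inner list corresponds to one feature set that would be evaluated on its own under --compare.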

Run model with cross-validation

By default, all models use 10-fold cross-validation. To use a different number of folds, for example 5, set the --cv parameter:

improse --model rf --feature H3K27ac,Brd4,p300,pGC,pAT,phastCons --cv 5
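
For readers unfamiliar with k-fold cross-validation, the sketch below shows what a fold count such as 5 means: the samples are split into 5 parts, and each part is held out once for testing while the rest is used for training. This is a plain-Python illustration under simple assumptions (contiguous folds, no shuffling or stratification); the function name k_fold_indices is hypothetical, and how Improse itself builds its folds is not stated here.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, test_indices) pairs for k contiguous folds.

    Assumption for illustration: no shuffling and no stratification.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds get one additional sample.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        yield train, test
        start += size


# 12 samples split into 5 folds: every sample is tested exactly once.
for train, test in k_fold_indices(12, 5):
    print(len(train), "train /", len(test), "test:", test)
```

The model is trained and scored once per fold, and the per-fold scores are then typically averaged.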

Run model with test data

To run the model on test data, save the feature data as a CSV file. Then list the features to use for prediction with --feature, pass the CSV file to --input, and add --test to tell the model that the input is a test dataset:

improse --model rf --feature H3K27ac,Brd4,p300,pGC,pAT,phastCons --input ~/path/to/CSV/file.csv --test

This generates an ROC plot and saves the performance evaluations [precision, recall, f1-score, AUC, PRC] to Improse_results.txt.
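
As a reminder of what three of the reported metrics measure, the following Python sketch computes precision, recall and f1-score from true and predicted labels. It uses the standard definitions with class 1 as the positive class; the function name and the toy labels are made up for illustration and do not come from Improse or its output.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels, invented for this example.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]
print(precision_recall_f1(y_true, y_pred))
# (0.75, 0.75, 0.75)
```

Precision is the fraction of predicted positives that are correct, recall is the fraction of true positives that are found, and f1-score is their harmonic mean.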

Make predictions

To make predictions, you should have computed the available features and saved them as a CSV file. Then list the features to use for prediction with --feature, pass the CSV file to --input, and add --pred to make predictions:

improse --model rf --feature H3K27ac,Brd4,p300,pGC,pAT,phastCons --input ~/path/to/CSV/file.csv --pred

This saves the prediction results to the CSV file Improse_[MODEL_NAME]_predictions.csv. In the CSV file, the Class field is coded as 1=SE and 0=TE. A probability score is also reported for each prediction to indicate how confident it is, which helps in deciding which candidates to select for further analysis.
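
One possible way to shortlist candidates from a predictions file is sketched below in Python. Only the Class field and its 1/0 coding come from the description above; the column names Region and Probability, the 0.8 threshold and the toy rows are assumptions made for this example, so check the header of your own predictions file before adapting it.

```python
import csv
import io

# Toy stand-in for a predictions file. "Region" and "Probability" are
# hypothetical column names; the rows are invented for illustration.
TOY_CSV = """Region,Class,Probability
chr1:1000-2000,1,0.93
chr1:5000-6500,0,0.81
chr2:300-900,1,0.58
chr3:100-700,1,0.88
"""


def confident_positives(handle, threshold=0.8):
    """Return rows with Class == 1 and probability >= threshold, best first."""
    reader = csv.DictReader(handle)
    hits = [
        row
        for row in reader
        if row["Class"] == "1" and float(row["Probability"]) >= threshold
    ]
    return sorted(hits, key=lambda row: float(row["Probability"]), reverse=True)


for row in confident_positives(io.StringIO(TOY_CSV)):
    print(row["Region"], row["Probability"])
```

To read a real file, open it with open(path, newline="") and pass the file object in place of the io.StringIO wrapper.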