Postprocessing of Learning Curve Analysis (LCA)

Given the output directory from any of the IMPROVE LCA workflows (bruteforce, swarm), this script aggregates LCA scores and run times and plots LCA mean absolute error curves.

Requirements

Installation and Setup

Create the IMPROVE general environment:

conda create -n IMPROVE python=3.6
conda activate IMPROVE
pip install improvelib

If you wish to use the included plotting functionality, install seaborn and matplotlib:

conda install seaborn matplotlib

Parameter Configuration

This workflow is configured through command line parameters. The first (positional) parameter specifies the analysis to run and must be one of runtimes, lca_scores, plot_learning_curve, or whole_analysis. The remaining parameters are optional:

  • --input_dir: Path to the LCA results (default: './').

  • --output_dir: Path to the directory where the postprocessing results will be saved (default: './').

  • --y_col_name: Name of the target column in test_y_data_predicted.csv (default: 'auc').

  • --metric_type: Metric type to use (default: 'regression').

  • --model_name: Name of the model, if you would like it saved in the file name / data / title of the plot (default: None).

  • --dataset: Name of the dataset, if you would like it saved in the file name / data / title of the plot (default: None).
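The parameters listed above could be declared with argparse roughly as follows. This is a minimal sketch rather than the actual contents of lca_postprocess.py: the function name build_parser and the help strings are assumptions, and only the parameter names and defaults come from the list above.

```python
import argparse

# Analyses accepted as the first (positional) parameter, as documented above.
ANALYSES = ("runtimes", "lca_scores", "plot_learning_curve", "whole_analysis")


def build_parser():
    """Hypothetical parser mirroring the documented parameters and defaults."""
    parser = argparse.ArgumentParser(description="Postprocess LCA results (sketch).")
    parser.add_argument("analysis", choices=ANALYSES, help="Analysis to run.")
    parser.add_argument("--input_dir", default="./", help="Path to the LCA results.")
    parser.add_argument("--output_dir", default="./", help="Where to save the results.")
    parser.add_argument("--y_col_name", default="auc", help="Target column name.")
    parser.add_argument("--metric_type", default="regression", help="Metric type.")
    parser.add_argument("--model_name", default=None, help="Optional model name.")
    parser.add_argument("--dataset", default=None, help="Optional dataset name.")
    return parser


# Parse an example command line instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["lca_scores", "--input_dir", "./lca_out"])
print(args.analysis, args.input_dir, args.output_dir, args.model_name)
```

Restricting the positional parameter with choices means an unsupported analysis name is rejected at parse time with a usage message.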

Usage

To generate the run-time analysis:

python lca_postprocess.py runtimes <arguments>

This will output a table runtimes.csv in the specified output_dir.

To generate aggregate scores:

python lca_postprocess.py lca_scores <arguments>

This will output a table all_scores.csv in the specified output_dir.
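To illustrate what an aggregated mean absolute error score per training-set size means, the sketch below averages the per-split errors for each size. It does not reproduce the script's own logic: the function names, the tuple layout of runs, and the sample values are all assumptions made for illustration, and the actual columns of all_scores.csv are not described here.

```python
from collections import defaultdict
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def aggregate_mae(runs):
    """Average the MAE over all runs that share a training-set size.

    runs is a hypothetical iterable of (train_size, y_true, y_pred) tuples,
    one tuple per split.
    """
    by_size = defaultdict(list)
    for train_size, y_true, y_pred in runs:
        by_size[train_size].append(mean_absolute_error(y_true, y_pred))
    # Sort by size so the result reads like the x-axis of a learning curve.
    return {size: mean(scores) for size, scores in sorted(by_size.items())}


# Made-up example values: two splits at size 100, one split at size 200.
example_runs = [
    (100, [1.0, 2.0], [1.5, 2.5]),
    (100, [0.0], [1.0]),
    (200, [1.0], [1.0]),
]
print(aggregate_mae(example_runs))
```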

To generate the learning curve plot:

python lca_postprocess.py plot_learning_curve <arguments>

This will read all_scores.csv from the specified output_dir and write a plot, fig.png, to the same output_dir.

To run all analyses:

python lca_postprocess.py whole_analysis <arguments>

This will run the run-time analysis, aggregate scores, and plot the learning curve.

Output

The processed results will be in output_dir as follows:

output_dir/
├── <model>_<dataset>_all_scores.csv
├── <model>_<dataset>_fig.png
└── <model>_<dataset>_runtimes.csv
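The naming scheme above can be sketched as a small helper. The function name output_path is hypothetical, and the handling of omitted names is an assumption: here the prefix is simply dropped when --model_name or --dataset is not given, which the script itself may handle differently.

```python
from pathlib import Path


def output_path(output_dir, base_name, model_name=None, dataset=None):
    """Build <output_dir>/<model>_<dataset>_<base_name> (hypothetical helper)."""
    # Keep only the names that were provided so that omitted ones
    # do not leave stray underscores in the file name.
    parts = [p for p in (model_name, dataset) if p]
    parts.append(base_name)
    return Path(output_dir) / "_".join(parts)


# The model and dataset names here are placeholders.
print(output_path("./results", "all_scores.csv", "my_model", "my_dataset"))
print(output_path("./results", "runtimes.csv"))
```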