Postprocessing of Learning Curve Analysis (LCA)
This script aggregates LCA scores and run times and plots LCA mean absolute error curves, given the output directory from any of the IMPROVE LCA workflows (bruteforce, swarm).
Requirements
seaborn (for plotting only)
matplotlib (for plotting only)
a directory containing LCA output
Installation and Setup
Create the IMPROVE general environment:
conda create -n IMPROVE python=3.6
conda activate IMPROVE
pip install improvelib
If you wish to use the included plotting functionality, install seaborn and matplotlib:
conda install seaborn matplotlib
Parameter Configuration
This workflow uses command line parameters. The first (positional) parameter (runtimes, lca_scores, plot_learning_curve, or whole_analysis) specifies the analysis to run. Other optional parameters are as follows:
--input_dir: Path to the LCA results (default: './').
--output_dir: Path to the directory where the postprocessing results will be saved (default: './').
--y_col_name: The y_col_name in test_y_data_predicted.csv (default: 'auc').
--metric_type: Metric type to use (default: 'regression').
--model_name: Name of the model, if you would like it saved in the file name / data / title of the plot (default: None).
--dataset: Name of the dataset, if you would like it saved in the file name / data / title of the plot (default: None).
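The parameters above can be pictured as an argparse interface. The following is only a sketch reconstructed from the parameter descriptions in this section; the function name build_parser and the ANALYSES list are hypothetical, and the actual lca_postprocess.py may define its parser differently.

```python
import argparse

# Hypothetical reconstruction of the documented command line interface.
# The real lca_postprocess.py may be organized differently.
ANALYSES = ["runtimes", "lca_scores", "plot_learning_curve", "whole_analysis"]


def build_parser():
    parser = argparse.ArgumentParser(description="Postprocess LCA output (sketch).")
    # First positional parameter: which analysis to run.
    parser.add_argument("analysis", choices=ANALYSES)
    # Optional parameters with the defaults documented above.
    parser.add_argument("--input_dir", default="./")
    parser.add_argument("--output_dir", default="./")
    parser.add_argument("--y_col_name", default="auc")
    parser.add_argument("--metric_type", default="regression")
    parser.add_argument("--model_name", default=None)
    parser.add_argument("--dataset", default=None)
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
# "./lca_output" is a placeholder path.
args = build_parser().parse_args(["lca_scores", "--input_dir", "./lca_output"])
print(args.analysis, args.input_dir, args.output_dir, args.y_col_name)
```

Parameters that are not passed fall back to their documented defaults, so args.output_dir is './' and args.model_name is None in this call.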
Usage
To generate run-time analysis:
python lca_postprocess.py runtimes <arguments>
This will output a table runtimes.csv in the specified output_dir.
To generate aggregate scores:
python lca_postprocess.py lca_scores <arguments>
This will output a table all_scores.csv in the specified output_dir.
To generate the learning curve plot:
python lca_postprocess.py plot_learning_curve <arguments>
This will read all_scores.csv from the specified output_dir and write a plot, fig.png, to the same directory.
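For reference, the mean absolute error shown on the learning curve is the average of |y_true - y_pred| over the test predictions. The sketch below illustrates that calculation only. It assumes the prediction file has columns named <y_col_name>_true and <y_col_name>_pred, which this document does not state. The function is a hypothetical helper, not part of the script, and the CSV content is toy data made up for the illustration.

```python
import csv
import io


def mean_absolute_error(rows, y_col_name="auc"):
    """Average absolute difference between true and predicted values."""
    true_col = f"{y_col_name}_true"  # assumed column name
    pred_col = f"{y_col_name}_pred"  # assumed column name
    errors = [abs(float(row[true_col]) - float(row[pred_col])) for row in rows]
    return sum(errors) / len(errors)


# Toy stand-in for a prediction file; not real LCA output.
toy_csv = io.StringIO("auc_true,auc_pred\n0.5,0.4\n0.7,0.8\n0.9,0.9\n")
mae = mean_absolute_error(list(csv.DictReader(toy_csv)))
print(round(mae, 4))
```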
To run all analyses:
python lca_postprocess.py whole_analysis <arguments>
This will run the run-time analysis, aggregate scores, and plot the learning curve.
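As an illustration of a full invocation, the sketch below assembles a whole_analysis command from the documented parameters. The helper build_command is hypothetical, and the directory names are placeholders to replace with your own paths.

```python
import sys


def build_command(analysis, **options):
    """Build the argument list for lca_postprocess.py (hypothetical helper)."""
    cmd = [sys.executable, "lca_postprocess.py", analysis]
    for name, value in options.items():
        if value is not None:  # skip parameters left at their defaults
            cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command(
    "whole_analysis",
    input_dir="./lca_output",          # placeholder path
    output_dir="./lca_postprocessed",  # placeholder path
    y_col_name="auc",
    model_name=None,                   # omitted from the command
)
print(" ".join(cmd[1:]))
# To run it for real: import subprocess; subprocess.run(cmd, check=True)
```

Passing the list to subprocess.run avoids shell quoting issues if a path contains spaces.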
Output
The processed results will be in output_dir as follows:
output_dir/
├── <model>_<dataset>_all_scores.csv
├── <model>_<dataset>_fig.png
└── <model>_<dataset>_runtimes.csv
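The naming pattern in the tree above can be sketched as follows. This assumes the prefix is built from --model_name and --dataset and that a part is simply dropped when its parameter is left as None. That fallback is an assumption, and output_path is a hypothetical helper rather than a function from the script.

```python
import os


def output_path(output_dir, base_name, model_name=None, dataset=None):
    """Join optional model and dataset prefixes onto a base file name."""
    # Assumption: parts that are None (or empty) are omitted from the prefix.
    parts = [part for part in (model_name, dataset) if part]
    parts.append(base_name)
    return os.path.join(output_dir, "_".join(parts))


# "mymodel" and "mydata" are placeholder names.
print(output_path("out", "all_scores.csv", model_name="mymodel", dataset="mydata"))
print(output_path("out", "runtimes.csv"))
```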