Ensuring the Curated Model is IMPROVE-compliant
The following requirements must be met for a curated model to be IMPROVE-compliant. Adhering to them allows the model to function with IMPROVE workflows.
Required files and their naming conventions
The following files are required for compliance, shown with their standardized naming patterns:

- :code:`<model>_preprocess_improve.py`
- :code:`<model>_train_improve.py`
- :code:`<model>_infer_improve.py`
- :code:`<model>_params.ini`
- :code:`model_params_def.py`
- :code:`<model>_environment.yml`
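
As a quick self-check, the naming patterns above can be expanded for a given model name and compared against the contents of a repository. The following is a minimal sketch, not part of IMPROVE: the helper names :code:`required_files` and :code:`missing_files` are hypothetical, and it assumes all required files sit at the top level of the repository.

```python
from pathlib import Path

# Naming patterns taken from the list above; "{model}" is filled in per model.
REQUIRED_PATTERNS = [
    "{model}_preprocess_improve.py",
    "{model}_train_improve.py",
    "{model}_infer_improve.py",
    "{model}_params.ini",
    "model_params_def.py",
    "{model}_environment.yml",
]


def required_files(model: str) -> list[str]:
    """Return the file names expected for a model with the given name."""
    return [pattern.format(model=model) for pattern in REQUIRED_PATTERNS]


def missing_files(repo_dir: str, model: str) -> list[str]:
    """Return the expected file names that are absent from repo_dir.

    Assumption: required files live at the top level of the repository.
    """
    repo = Path(repo_dir)
    return [name for name in required_files(model) if not (repo / name).is_file()]
```

Calling :code:`missing_files(".", "mymodel")` from the repository root would list whichever required files are still absent, where :code:`mymodel` is a placeholder model name.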
All three stage scripts (preprocess, train, infer)
Each of the three stage scripts uses a :code:`main()` function that:

- Sets params with the appropriate stage/application config.
- Records the execution time of :code:`run()` using :code:`from improvelib.utils import Timer`.
- Calls :code:`run()`, which executes the stage-specific code and returns:

  - Preprocessing: :code:`output_dir`
  - Training: :code:`val_scores`
  - Inference: :code:`True`
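
The shape of a stage script described above can be sketched as follows for the training stage. This is an illustration only: :code:`StageTimer` is a hypothetical stand-in written with the standard library, not the actual :code:`improvelib.utils.Timer`, whose API is not shown here, and the returned score is a placeholder rather than a real result.

```python
import time


class StageTimer:
    """Hypothetical stand-in for improvelib.utils.Timer (real API not shown)."""

    def __init__(self) -> None:
        self.start = time.perf_counter()

    def elapsed(self) -> float:
        """Seconds elapsed since the timer was created."""
        return time.perf_counter() - self.start


def run(params: dict) -> dict:
    """Stage-specific code; a training stage returns its validation scores."""
    # Placeholder value: a real run() would train a model and score it.
    return {"placeholder_metric": 0.0}


def main(params: dict) -> dict:
    # In a real script, params would come from the stage/application config.
    timer = StageTimer()
    val_scores = run(params)
    print(f"run() finished in {timer.elapsed():.3f} s")
    return val_scores
```

A preprocessing script would follow the same pattern with :code:`run()` returning :code:`output_dir`, and an inference script with :code:`run()` returning :code:`True`.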
Preprocessing
- Use the appropriate data loader for your application.
- Save the X data (features).
- Save the Y data using :code:`save_stage_ydf()`.
Training
- Save the model using a path determined by :code:`build_model_path()`.
- Save validation predictions with :code:`store_predictions_df()`.
- Save validation scores using :code:`compute_performance_scores()`.
- Implement early stopping and use the :code:`patience` parameter.
- Use a GPU by default, if available.
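
The early-stopping requirement can be illustrated with a framework-independent sketch. It assumes that :code:`patience` counts consecutive epochs without improvement in validation loss; the function name and the use of a plain list of losses in place of a real training loop are hypothetical.

```python
def train_with_early_stopping(val_losses, params):
    """Return (best_epoch, best_loss), stopping once patience is exhausted.

    val_losses is an iterable of per-epoch validation losses; it stands in
    for a real training loop.
    """
    patience = params["patience"]  # read from params, not hard-coded
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0

    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
            # A real script would save the model checkpoint here.
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # no improvement for `patience` consecutive epochs
    return best_epoch, best_loss
```

With losses :code:`[1.0, 0.8, 0.9, 0.85, 0.7]` and a patience of 2, the loop stops after the fourth epoch and reports the second epoch (index 1) as the best; with a patience of 3 it continues and reaches the final, lower loss.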
Inference
- Load the model using a path determined by :code:`build_model_path()`.
- Save test predictions with :code:`store_predictions_df()`.
- Optionally save test scores using :code:`compute_performance_scores()` if :code:`calc_infer_scores` is set to :code:`True`.
- Use a GPU by default, if available.
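
The optional scoring step can be sketched as a simple branch on :code:`calc_infer_scores`. The code below is illustrative only: :code:`mean_squared_error` and :code:`finish_inference` are hypothetical helpers that stand in for the IMPROVE functions named above, whose signatures are not reproduced here.

```python
def mean_squared_error(y_true, y_pred):
    """Plain-Python MSE, standing in for compute_performance_scores()."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def finish_inference(y_true, y_pred, params):
    """Always keep predictions; compute scores only when requested."""
    results = {"predictions": list(y_pred)}
    if params["calc_infer_scores"]:
        results["scores"] = {"mse": mean_squared_error(y_true, y_pred)}
    return results
```

Predictions are produced in both cases; only the scoring step depends on the parameter.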
Parameters and configuration file
- Avoid duplicating IMPROVE-defined parameters. While you can define model-specific parameters, do not redefine those already established by IMPROVE. Refer to the IMPROVE-defined parameters for preprocess, train, and infer as needed.
- Use IMPROVE-defined parameters for consistent handling. If your model relies on an IMPROVE-defined parameter, access it via the IMPROVE :code:`params` rather than creating a separate variable or using a hard-coded value. For example, use :code:`params['patience']` instead of introducing a new model-specific parameter (e.g., :code:`early_stop`) or typing a value directly (e.g., :code:`100`). Additionally, any model-specific parameter that users may need to adjust should integrate with the IMPROVE parameter handling system.
- Set unused IMPROVE-defined parameters to :code:`None`. If your model does not require a particular IMPROVE-defined parameter, set it to :code:`None` in the config file. This ensures clarity about which parameters are actually used.
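
The two conventions above (reading values from :code:`params` and marking unused parameters as :code:`None`) can be illustrated with the standard-library :code:`configparser`. This is only a sketch: IMPROVE has its own parameter handling, and the section name :code:`[Train]`, the parameter :code:`example_unused_param`, the value :code:`20`, and the helper :code:`load_section` are all assumptions made for illustration.

```python
import configparser

# Hypothetical config text; the section name and every entry other than
# "patience" are illustrative only.
CONFIG_TEXT = """
[Train]
patience = 20
example_unused_param = None
"""


def load_section(text: str, section: str) -> dict:
    """Parse one INI section, mapping the literal string 'None' to None."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {
        key: (None if value == "None" else value)
        for key, value in parser[section].items()
    }


params = load_section(CONFIG_TEXT, "Train")
patience = int(params["patience"])  # taken from params, not hard-coded
```

Because the unused parameter is explicitly :code:`None`, a reader of the config file can tell at a glance that the model ignores it.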
Model repository
- README: Follow the template to ensure a unified structure.
- Include a :code:`setup_improve.sh` script:

  - Base it on this template.
  - Ensure :code:`download_csa.sh` is present in the repo.
  - If the model requires supplemental data (not included in the benchmark data), download it via a shell script present in the repo, and integrate this script into :code:`setup_improve.sh`.
  - In the preprocessing stage, use the :code:`input_supp_data_dir` parameter to specify the default location where :code:`setup_improve.sh` places any supplemental data.