Ensuring the Curated Model is IMPROVE-compliant
The following requirements must be met for a curated model to be IMPROVE-compliant. Meeting them allows the model to run within IMPROVE workflows.
Required files and their naming conventions
A list of files required for compliance, along with their standardized naming patterns:

- :code:`<model>_preprocess_improve.py`
- :code:`<model>_train_improve.py`
- :code:`<model>_infer_improve.py`
- :code:`<model>_params.ini`
- :code:`model_params_def.py`
- :code:`<model>_environment.yml`

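
As a quick self-check, the naming patterns above can be verified with a few lines of Python. This is only a sketch and not part of IMPROVE: the helper name :code:`missing_required_files` and the model name :code:`mymodel` are made up for illustration.

```python
from pathlib import Path
import tempfile

# File name patterns copied from the list above; "{model}" is the model's short name.
REQUIRED_PATTERNS = [
    "{model}_preprocess_improve.py",
    "{model}_train_improve.py",
    "{model}_infer_improve.py",
    "{model}_params.ini",
    "model_params_def.py",
    "{model}_environment.yml",
]


def missing_required_files(repo_dir, model):
    """Return the required file names that are absent from repo_dir (hypothetical helper)."""
    repo = Path(repo_dir)
    expected = [pattern.format(model=model) for pattern in REQUIRED_PATTERNS]
    return [name for name in expected if not (repo / name).is_file()]


# Demo on a throwaway directory that contains only two of the six files.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "mymodel_params.ini").touch()
    (Path(tmp) / "model_params_def.py").touch()
    demo_missing = missing_required_files(tmp, "mymodel")

print(demo_missing)
```

Running the helper against the root of a model repository and getting an empty list back would indicate that all six files are present under the expected names.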
All three stage scripts (preprocess, train, infer)
Each of the three stage scripts uses a main() function that:

- Sets :code:`params` with the appropriate stage/application config.
- Records the execution time of :code:`run()` using :code:`from improvelib.utils import Timer`.
- Calls :code:`run()`, which executes the stage-specific code and returns:

  - Preprocessing: :code:`output_dir`
  - Training: :code:`val_scores`
  - Inference: :code:`True`

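
The :code:`main()` / :code:`run()` pattern described above might look roughly like the following for a training script. This is a structural sketch only: :code:`StandInTimer` is a hypothetical stand-in so that the example runs without :code:`improvelib` installed (it does not reproduce the real :code:`Timer` interface), and the hard-coded :code:`params` dict replaces the stage config that a real script would load.

```python
import time


class StandInTimer:
    """Hypothetical stand-in for improvelib.utils.Timer (not the real class)."""

    def __init__(self):
        self.start = time.perf_counter()

    def elapsed(self):
        return time.perf_counter() - self.start


def run(params):
    """Stage-specific work; a training stage returns the validation scores."""
    # Placeholder result; a real run() would train the model here.
    val_scores = {"mse": 0.0}
    return val_scores


def main(args):
    # 1. Set params. A real script would build these from the stage config
    #    and <model>_params.ini instead of hard-coding a dict.
    params = {"output_dir": "./out", "patience": 20}
    # 2. Start the timer, 3. call run(), then record the elapsed time.
    timer = StandInTimer()
    val_scores = run(params)
    runtime_s = timer.elapsed()
    print(f"Finished training in {runtime_s:.3f} s")
    return val_scores, runtime_s


scores, runtime_s = main([])
```

Keeping all stage logic inside :code:`run()` means the timer wraps exactly the work that the stage performs, and the return value stays easy to inspect.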
Preprocessing

- Use the appropriate data loader for your application.
- Save the X data (features).
- Save the Y data using :code:`save_stage_ydf()`.

Training

- Save the model using a path determined by :code:`build_model_path()`.
- Save validation predictions with :code:`store_predictions_df()`.
- Save validation scores using :code:`compute_performance_scores()`.
- Implement early stopping and use the :code:`patience` parameter.
- Use a GPU by default, if available.

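
Early stopping driven by the :code:`patience` parameter can be sketched as below. This is a framework-agnostic illustration under stated assumptions: the function name :code:`train_with_early_stopping` is hypothetical, the validation losses are simulated values rather than real results, and the :code:`params` dict stands in for the IMPROVE parameters.

```python
def train_with_early_stopping(val_losses, patience):
    """Return (best_epoch, best_loss, stopped_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve for `patience`
    consecutive epochs.
    """
    best_loss = float("inf")
    best_epoch = -1
    stopped_epoch = -1
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses):
        stopped_epoch = epoch
        if loss < best_loss:
            # New best: remember it and reset the counter.
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss, stopped_epoch


# Stand-in for the IMPROVE params; the value 2 is chosen only for the demo.
demo_params = {"patience": 2}
simulated_losses = [0.9, 0.7, 0.8, 0.75, 0.6]
result = train_with_early_stopping(simulated_losses, demo_params["patience"])
print(result)  # (1, 0.7, 3): best at epoch 1, stopped after epoch 3
```

In a real training loop, the model checkpoint would be saved whenever a new best loss is found, so that the saved model corresponds to :code:`best_epoch` rather than to the last epoch run.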
Inference

- Load the model using a path determined by :code:`build_model_path()`.
- Save test predictions with :code:`store_predictions_df()`.
- Optionally save test scores using :code:`compute_performance_scores()` if :code:`calc_infer_scores` is set to :code:`True`.
- Use a GPU by default, if available.

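
The optional scoring step in inference amounts to a simple conditional on :code:`calc_infer_scores`. The sketch below shows only that control flow: :code:`mean_squared_error` and :code:`infer_outputs` are hypothetical helpers that stand in for the IMPROVE functions named above, whose actual signatures are not reproduced here.

```python
def mean_squared_error(y_true, y_pred):
    """Plain-Python MSE, used here in place of a real scoring function."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def infer_outputs(params, y_true, y_pred):
    """Collect what the inference stage would save (hypothetical helper)."""
    # Test predictions are always produced.
    outputs = {"predictions": list(y_pred)}
    # Test scores are produced only when the user asks for them.
    if params["calc_infer_scores"]:
        outputs["scores"] = {"mse": mean_squared_error(y_true, y_pred)}
    return outputs


with_scores = infer_outputs({"calc_infer_scores": True}, [1.0, 2.0], [1.0, 4.0])
without_scores = infer_outputs({"calc_infer_scores": False}, [1.0, 2.0], [1.0, 4.0])
```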
Parameters and configuration file

- Avoid duplicating IMPROVE-defined parameters

  While you can define model-specific parameters, do not redefine those already established by IMPROVE. Refer to the IMPROVE-defined parameters for preprocess, train, and infer as needed.

- Use IMPROVE-defined parameters for consistent handling

  If your model relies on an IMPROVE-defined parameter, access it via the IMPROVE :code:`params` rather than creating a separate variable or using a hard-coded value. For example, use :code:`params['patience']` instead of introducing a new model-specific parameter (e.g., :code:`early_stop`) or typing a value directly (e.g., :code:`100`). Additionally, any model-specific parameter that users may need to adjust should integrate with the IMPROVE parameter handling system.

- Set unused IMPROVE-defined parameters to :code:`None`

  If your model does not require a particular IMPROVE-defined parameter, set it to :code:`None` in the config file. This ensures clarity about which parameters are actually used.

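
To make the parameter guidance concrete, the sketch below reads a small INI fragment with Python's standard :code:`configparser` and takes :code:`patience` from the parsed parameters instead of hard-coding it. Everything in it is an assumption made for illustration: the section name, the example keys and values, and the :code:`load_section` helper. It does not show how IMPROVE itself parses the config file.

```python
import configparser

# Hypothetical config fragment; the section name and values are illustrative only.
EXAMPLE_INI = """
[Train]
patience = 20
learning_rate = 0.001
batch_size = None
"""


def load_section(ini_text, section):
    """Parse one section and map the literal string "None" to Python None."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {key: (None if value == "None" else value)
            for key, value in parser[section].items()}


demo_config = load_section(EXAMPLE_INI, "Train")

# Read the shared parameter from the parsed config rather than typing a value.
patience = int(demo_config["patience"])

# Parameters the model does not use are visibly marked as None.
unused = [key for key, value in demo_config.items() if value is None]
print(patience, unused)
```

Marking unused parameters explicitly makes it easy to list them, which is the clarity benefit described above.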
Model repository

- README

  - Follow the template to ensure a unified structure.

- Include a :code:`setup_improve.sh` script

  - Base it on this template.
  - Ensure :code:`download_csa.sh` is present in the repo.
  - If the model requires supplemental data (not included in the benchmark data), it should be downloaded via a shell script present in the repo, and this script should be integrated in :code:`setup_improve.sh`.
  - In the preprocessing stage, use the :code:`input_supp_data_dir` parameter to specify the default location where :code:`setup_improve.sh` places any supplemental data.