Ensuring the Curated Model is IMPROVE-compliant

The following requirements must be met for a curated model to be IMPROVE-compliant. Meeting them allows the model to run in IMPROVE workflows.

Required files and their naming conventions

The following files are required for compliance and must follow these standardized naming patterns, where <model> is the name of the model:

<model>_preprocess_improve.py
<model>_train_improve.py
<model>_infer_improve.py
<model>_params.ini
model_params_def.py
<model>_environment.yml
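
As a quick self-check, the naming patterns above can be verified with a short script. This is a minimal sketch and not part of IMPROVE: the helper name :code:`missing_required_files` is hypothetical, and it assumes all required files sit at the top level of the model repository.

```python
from pathlib import Path

# Naming patterns copied from the list above; "{model}" stands for the model name.
REQUIRED_PATTERNS = [
    "{model}_preprocess_improve.py",
    "{model}_train_improve.py",
    "{model}_infer_improve.py",
    "{model}_params.ini",
    "model_params_def.py",
    "{model}_environment.yml",
]


def missing_required_files(repo_dir, model):
    """Return the required file names that are absent from repo_dir.

    Hypothetical helper: assumes the files live at the top level of the repo.
    """
    repo = Path(repo_dir)
    expected = [pattern.format(model=model) for pattern in REQUIRED_PATTERNS]
    return [name for name in expected if not (repo / name).is_file()]
```

Calling :code:`missing_required_files(".", "mymodel")` from the repository root returns an empty list when every required file is present.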

All three stage scripts (preprocess, train, infer)

Each of the three stage scripts uses a main() function that:

Sets params with the appropriate stage/application config
Records the execution time of run() using Timer, imported with :code:`from improvelib.utils import Timer`.
Calls run(), which executes the stage-specific code and returns:

- Preprocessing: output_dir
- Training: val_scores
- Inference: True
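
The overall shape of main() for a training script might look like the sketch below. It is illustrative only: :code:`StageTimer` is a hypothetical stand-in for the Timer class from improvelib.utils (whose actual interface is not shown here), params is passed in as a plain dictionary rather than built from the stage/application config, and the body of run() is a placeholder.

```python
import time


class StageTimer:
    """Hypothetical stand-in for improvelib.utils.Timer; not the real API."""

    def __init__(self):
        self.start = time.perf_counter()

    def elapsed(self):
        """Seconds elapsed since the timer was created."""
        return time.perf_counter() - self.start


def run(params):
    """Stage-specific code. For the training stage this returns val_scores."""
    # Placeholder result: a real script would train the model here.
    return {"mse": 0.0}


def main(params):
    # In a real script, params would be set from the stage/application config.
    timer = StageTimer()       # start timing just before run()
    val_scores = run(params)   # execute the stage-specific code
    runtime = timer.elapsed()  # record how long run() took
    return val_scores, runtime
```

The preprocessing and inference scripts would follow the same pattern, with run() returning output_dir and True respectively.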

Preprocessing

Use the appropriate data loader for your application.
Save the X data (features).
Save the Y data using save_stage_ydf().

Training

Save the model using a path determined by build_model_path().
Save validation predictions with store_predictions_df().
Save validation scores using compute_performance_scores().
Implement early stopping and use the patience parameter.
Use a GPU by default, if available.
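
Early stopping driven by the patience parameter can be sketched as follows. This is a framework-agnostic illustration under stated assumptions: the function name :code:`train_with_early_stopping` is hypothetical, and the list of validation losses stands in for values that a real training loop would compute once per epoch.

```python
def train_with_early_stopping(val_losses, params):
    """Stop once validation loss has not improved for params['patience'] epochs.

    val_losses stands in for the per-epoch validation loss of a real training loop.
    Returns (best_epoch, best_loss, epochs_run).
    """
    patience = params["patience"]  # read from params, not hard-coded
    best_loss = float("inf")
    best_epoch = -1
    epochs_without_improvement = 0
    epochs_run = 0
    for epoch, loss in enumerate(val_losses):
        epochs_run += 1
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch
            epochs_without_improvement = 0
            # A real script would save the model here, at the path
            # determined by build_model_path().
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return best_epoch, best_loss, epochs_run
```

With :code:`params = {"patience": 2}` and losses of 1.0, 0.8, 0.9, 0.85, 0.7, training stops after the fourth epoch and reports the second epoch (index 1) as the best.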

Inference

Load the model using a path determined by build_model_path().
Save test predictions with store_predictions_df().
Optionally save test scores using compute_performance_scores() if calc_infer_scores is set to True.
Use a GPU by default, if available.

Parameters and configuration file

Avoid duplicating IMPROVE-defined parameters. While you can define model-specific parameters, do not redefine those already established by IMPROVE. Refer to the IMPROVE-defined parameters for preprocess, train, and infer as needed.
Use IMPROVE-defined parameters for consistent handling. If your model relies on an IMPROVE-defined parameter, access it via the IMPROVE params rather than creating a separate variable or using a hard-coded value. For example, use params['patience'] instead of introducing a new model-specific parameter (e.g., early_stop) or typing a value directly (e.g., 100). Additionally, any model-specific parameter that users may need to adjust should integrate with the IMPROVE parameter handling system.
Set unused IMPROVE-defined parameters to :code:`None`. If your model does not require a particular IMPROVE-defined parameter, set it to None in the config file. This ensures clarity about which parameters are actually used.
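
The idea of reading values from the config rather than hard-coding them can be illustrated with the sketch below. It uses the standard-library configparser as a stand-in for IMPROVE's own parameter handling, which a compliant script should use instead. The :code:`[Train]` section name, the values, the parameter name :code:`unused_param`, and the helper :code:`load_train_params` are all assumptions made for the example.

```python
import configparser

# Hypothetical config contents; section name, keys, and values are assumptions.
EXAMPLE_INI = """
[Train]
patience = 20
unused_param = None
"""


def parse_value(text):
    """Map the literal string 'None' to Python None; otherwise parse an integer."""
    return None if text.strip() == "None" else int(text)


def load_train_params(ini_text):
    """Read the [Train] section into a plain dict (stand-in for IMPROVE params)."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    return {key: parse_value(value) for key, value in parser["Train"].items()}


params = load_train_params(EXAMPLE_INI)

# Use the shared parameter rather than a new name (early_stop) or a literal (100).
patience = params["patience"]
```

Here the unused parameter is visibly set to None in the config, so a reader can tell at a glance which parameters the model actually consumes.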

Model repository

README - Follow the template to ensure a unified structure.
Include a :code:`setup_improve.sh` script

- Base it on this template.
- Ensure download_csa.sh is present in the repo.
- If the model requires supplemental data (not included in the benchmark data), it should be downloaded via a shell script present in the repo, and this script should be integrated in setup_improve.sh.
- In the preprocessing stage, use the input_supp_data_dir parameter to specify the default location where setup_improve.sh places any supplemental data.