Running a Curated Model

From the model repository

For a step-by-step example of how to run LGBM, see the quickstart.

1. Clone the model and set up the environment

  1. IMPROVE model repositories can be found at `https://github.com/JDACS4C-IMPROVE <https://github.com/JDACS4C-IMPROVE>`_. Clone the model repo with:

git clone <model_repo>
cd <model_repo>
git checkout v0.1.0-alpha
  2. Set up the environment according to the instructions in the model repo. The IMPROVE library is installed with:

pip install improvelib

2. Download the benchmark data

To download the Drug Response Prediction benchmark data, run the following command in the directory where you would like to store the data:

wget --cut-dirs=8 -P ./ -nH -np -m https://web.cels.anl.gov/projects/IMPROVE_FTP/candle/public/improve/benchmarks/single_drug_drp/benchmark-data-pilot1/csa_data/

3. Run the model scripts

By default, everything is saved in the current working directory. This and all other parameters can be changed by setting them on the command line. See the IMPROVE API documentation for more information about parameters.

python <model>_preprocess_improve.py --input_dir /path/to/dir/csa_data/raw_data
python <model>_train_improve.py
python <model>_infer_improve.py
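The three stages above can be chained in a simple wrapper script. A minimal sketch follows; the model name `lgbm` and the input path are assumptions for illustration, and the actual `python` invocations are shown only as comments:

```shell
#!/bin/bash
# Sketch: run the three IMPROVE stages in order, stopping on the first failure.
# "lgbm" is an assumed model name; substitute your cloned model's name.
set -euo pipefail

MODEL=lgbm
INPUT_DIR=/path/to/dir/csa_data/raw_data   # only the preprocess stage needs this

ran=""
for stage in preprocess train infer; do
    script="${MODEL}_${stage}_improve.py"
    echo "would run: ${script}"
    ran="${ran}${script} "
    # A real wrapper would invoke, for example:
    #   python "${script}" --input_dir "${INPUT_DIR}"   # (--input_dir for preprocess only)
done
```

Because each stage reads the previous stage's outputs from the working directory by default, running them in this fixed order is what matters.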

From a container

General Form:

singularity exec --nv --bind $HOST_DATA_DIR:/candle_data_dir $IIL/$IMAGE $CMD $ARGS

Use Case 1: A custom model file is provided as an argument to train.sh

Example:

singularity exec --nv --bind /home/brettin/improve_data_dir:/candle_data_dir \
     /software/improve/images/GraphDRP:0.0.1-20221109.sif                    \
     train.sh 2 /candle_data_dir GraphDRP/graphdrp_default_model.txt
Options

--nv
    Instructs Singularity to use the host's NVIDIA drivers and CUDA libraries.

--bind
    Instructs Singularity to mount a host file system inside the container. The mount point inside the container will always be /candle_data_dir.

GraphDRP:0.0.1-20221109.sif
    The Singularity image that contains the GraphDRP model code.

train.sh
    The command that runs inside the container. All containers provide this command; all arguments that follow are passed to train.sh.

2
    The host GPU device id to be used by code running inside the container. The value is used to set the CUDA_VISIBLE_DEVICES environment variable inside the container.

/candle_data_dir
    The value of the CANDLE_DATA_DIR environment variable set inside the container. This enables candle_lib code running inside the container to rewrite input and output paths so that inputs and outputs are saved on the host's file system.

GraphDRP/graphdrp_default_model.txt
    A custom model file created and saved on the host's filesystem. Its location is relative to CANDLE_DATA_DIR.
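The path rewriting described above can be sketched as follows. The exact candle_lib behavior is assumed here; this only illustrates how a relative path like the custom model file resolves against CANDLE_DATA_DIR:

```shell
# Sketch of the assumed CANDLE_DATA_DIR path rewriting: relative paths are
# resolved against the CANDLE_DATA_DIR mount point inside the container.
export CANDLE_DATA_DIR=/candle_data_dir
model_file="GraphDRP/graphdrp_default_model.txt"   # relative to CANDLE_DATA_DIR
resolved="${CANDLE_DATA_DIR}/${model_file}"
echo "${resolved}"
```

Because /candle_data_dir is bind-mounted from the host, anything written under the resolved path lands on the host's file system.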

Use Case 2: Train a model with the default model file.

Example:

singularity exec --nv --bind /home/brettin/candle_data_dir:/candle_data_dir \
    /home/brettin/Singularity/sandboxes/GraphDRP-GraphDRP:0.0.1-20221028    \
    train.sh 2 /candle_data_dir

Because only two positional arguments are passed to train.sh, the default model file in the model directory inside the container is used. The container still expects /candle_data_dir to be available, as /candle_data_dir is prepended to input and output paths. The default model file is the one used to closely reproduce the original authors' results.

Use Case 3: Train a model using supported command line arguments

In this case, the default model file in the model directory inside the container is read first, and then the command line arguments passed to train.sh are applied. If both the default model file and the command line arguments contain the same parameter, the command line value takes precedence. The first two positional arguments to train.sh must always be the GPU device id and the mount point inside the container, as specified by the --bind option.

In this use case, train.sh shifts away the two positional arguments $1 and $2 and then passes the remaining arguments in $@ to the model's Python script.
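A minimal sketch of that argument handling is shown below. This is hypothetical; the real train.sh shipped in each container may differ, and the model invocation is left as a comment:

```shell
#!/bin/bash
# Hypothetical sketch of train.sh argument handling; the real script may differ.
train_sh() {
    export CUDA_VISIBLE_DEVICES="$1"   # first positional arg: host GPU device id
    export CANDLE_DATA_DIR="$2"        # second positional arg: container mount point
    shift 2                            # drop the two positional args
    # python graphdrp_train_improve.py "$@"   # remaining args go to the model script
    echo "device=${CUDA_VISIBLE_DEVICES} data=${CANDLE_DATA_DIR} args=$*"
}

train_sh 2 /candle_data_dir --epochs 10
```

After the `shift 2`, only the model-specific flags remain in $@, so a parameter such as --epochs passes straight through to the Python script.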