Workflow Overview

IMPROVE workflows utilize a clear and consistent three-level script structure designed to integrate seamlessly with high-performance computing (HPC) environments. This structured approach allows IMPROVE to run models efficiently, maintain clear separation of dependencies, and ensure reproducibility.

Overview of the 3-Level Structure

Every workflow in IMPROVE is organized into three distinct script types:

Launcher (Python)
- Serves as the main entry point for initiating a workflow.
- Directly interacts with the HPC management framework (e.g., Parsl, DeepHyper).
- Coordinates high-level workflow tasks, distributing them across available resources.
Example:
python workflow_csa.py
Runner (Bash)
- Called by the Launcher script.
- Activates the isolated computational environment (e.g., conda, container) specific to the predictive model.
- Executes the Stage Scripts within this isolated environment.
Example:
bash run_model.sh
Stage Script (Python)
- Performs the actual computational work for the stage (e.g., preprocess, train, infer).
- Runs within a model-specific isolated environment, free from HPC dependencies.
Example:
python graphdrp_train_improve.py

Visual Representation

The following schematic illustrates the interactions between these scripts:

Launcher (workflow_csa.py)
│
└─→ Runner (run_model.sh)
│
└─→ Stage Script (graphdrp_train_improve.py)

Why This Structure?

The separation into these three levels ensures:

Clarity: Each script type has a clearly defined role.

Isolation: Dependencies for models and HPC tools are separated to avoid conflicts.

Flexibility: Easily adapts to various HPC environments and tools without modifying the underlying model scripts.

Next Steps

Explore individual workflow guides: