save_stage_ydf
improvelib.utils.save_stage_ydf(ydf, stage, output_dir)
Saves a subset of y data samples. The “subset” refers to one of the three stages involved in developing ML models, ‘train’, ‘val’, or ‘test’.
Saves the ydf
in the output_dir
with the file name <stage>_y_data.csv.
Used in preprocess.
Parameters:
- ydfpd.DataFrame
DataFrame with y data samples for the specified stage. This should only include the samples that have feature data. The minimum columns to be saved are the columns coorresponding to the IDs (e.g. for DRP, the cancer cell ID and drug ID) and the
y_col_name
(e.g. for DRP, usually ‘auc’).- stagestr
Stage of the data. Should be one of [‘train’, ‘val’, ‘test.’]
- output_dirstr
The output directory where the preprocessed data is saved. Should be
params['output_dir']
.
Returns:
None
Example
frm.save_stage_ydf(ydf, stage, params["output_dir"])
Add a bit about best practices to coordinate data.