store_predictions_df

improvelib.utils.store_predictions_df(y_pred, y_col_name, stage, output_dir, input_dir = None, y_true = None, round_decimals = 4)

Save predictions with accompanying dataframe.

This makes it possible to trace the original data that was evaluated (e.g. drug and cell pairs) when the corresponding dataframe is available (the output of save_stage_ydf in preprocess). In that case, the whole dataframe structure is stored along with the model predictions. If the dataframe is not available, only the ground truth and model predictions are stored.
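The two storage modes described above can be illustrated with a minimal sketch. This is not the improvelib implementation: the function name sketch_store_predictions, the meta_rows argument, and the "_pred" / "_true" column suffixes are hypothetical, and the standard-library csv module stands in for the dataframe handling so that the example is self-contained.

```python
import csv
from typing import Dict, List, Optional, Sequence


def sketch_store_predictions(
    y_pred: Sequence[float],
    y_col_name: str,
    out_path: str,
    y_true: Optional[Sequence[float]] = None,
    meta_rows: Optional[List[Dict[str, object]]] = None,
    round_decimals: int = 4,
) -> List[Dict[str, object]]:
    """Illustrative stand-in, not the improvelib implementation."""
    pred_col = f"{y_col_name}_pred"  # hypothetical column name
    true_col = f"{y_col_name}_true"  # hypothetical column name

    preds = [round(float(p), round_decimals) for p in y_pred]
    if not preds:
        raise ValueError("y_pred is empty")

    if meta_rows is not None:
        # Metadata available: keep every original column and append the prediction.
        if len(meta_rows) != len(preds):
            raise ValueError("metadata and predictions differ in length")
        rows = [{**row, pred_col: p} for row, p in zip(meta_rows, preds)]
    elif y_true is not None:
        # No metadata: store ground truth next to the predictions.
        if len(y_true) != len(preds):
            raise ValueError("y_true and y_pred differ in length")
        rows = [
            {true_col: round(float(t), round_decimals), pred_col: p}
            for t, p in zip(y_true, preds)
        ]
    else:
        # Neither metadata nor ground truth: predictions only.
        rows = [{pred_col: p} for p in preds]

    # Write one CSV row per prediction, with a header taken from the first row.
    with open(out_path, "w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return rows
```

In this sketch, passing meta_rows (for example, rows holding drug and cell identifiers plus the response column) yields an output file in which each prediction stays attached to the pair it was made for. Passing only y_true yields a two-column file of ground truth and predictions.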

Used in train and infer.

Parameters:

y_pred (np.array): Model predictions.
y_col_name (str): Name of the column in the y_data that is predicted (e.g. ‘auc’, ‘ic50’).
stage (str): Whether the predictions are for the validation or test set (‘val’ or ‘test’).
output_dir (str): The output directory where the results should be saved. Should be params['output_dir'].
y_true (np.array, optional): Ground truth, if available.
input_dir (str, optional): Directory containing the dataframe with ground truth and metadata (the output of save_stage_ydf).
round_decimals (int, optional): Number of decimals in output (default is 4).

Returns:

None

Example

To store validation predictions in train:

import improvelib.utils as frm

frm.store_predictions_df(
    y_true=val_true,
    y_pred=val_pred,
    stage="val",
    y_col_name=params["y_col_name"],
    output_dir=params["output_dir"],
    input_dir=params["input_dir"]
)

To store inference predictions in infer, when ground truth is available:

frm.store_predictions_df(
    y_true=test_true,
    y_pred=test_pred,
    stage="test",
    y_col_name=params["y_col_name"],
    output_dir=params["output_dir"],
    input_dir=params["input_data_dir"]
)