store_predictions_df

improvelib.utils.store_predictions_df(y_pred, y_col_name, stage, output_dir, input_dir = None, y_true = None, round_decimals = 4)

Save predictions with accompanying dataframe.

This makes it possible to trace predictions back to the original data evaluated (e.g. drug and cell pairs). If the corresponding dataframe is available (the output of save_stage_ydf in preprocess), its whole structure is stored together with the model predictions. If the dataframe is not available, only the ground truth and the model predictions are stored.

Used in train and infer.

Parameters:

y_pred : np.array

Model predictions.

y_col_name : str

Name of the y_data column being predicted (e.g. ‘auc’, ‘ic50’).

stage : str

Whether the predictions are for the validation or the test set (‘val’ or ‘test’).

output_dir : str

The output directory where the results should be saved. Should be params['output_dir'].

y_true : np.array, optional

Ground truth, if available.

input_dir : str, optional

Directory containing the dataframe with ground truth and metadata (the output of save_stage_ydf in preprocess).

round_decimals : int, optional

Number of decimals in output (default is 4).

Returns:

None

Example

To store validation predictions in train (frm here refers to improvelib.utils, e.g. imported with import improvelib.utils as frm):

frm.store_predictions_df(
    y_true=val_true,
    y_pred=val_pred,
    stage="val",
    y_col_name=params["y_col_name"],
    output_dir=params["output_dir"],
    input_dir=params["input_dir"]
)

To store inference predictions in infer, when ground truth is available:

frm.store_predictions_df(
    y_true=test_true,
    y_pred=test_pred,
    stage="test",
    y_col_name=params["y_col_name"],
    output_dir=params["output_dir"],
    input_dir=params["input_data_dir"]
)
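
The two cases described at the top of this page (metadata dataframe available vs. not available) can be pictured with a minimal sketch. This is not the improvelib implementation: the helper name sketch_store_predictions, the input and output file names, and the `_true` / `_pred` column suffixes are all assumptions made for illustration, and the sketch uses only the standard library, so it accepts any sequence of numbers rather than requiring np.array. Unlike the documented function, which returns None, the sketch returns the path it wrote so the result is easy to inspect.

```python
import csv
from pathlib import Path
from typing import Optional, Sequence


def sketch_store_predictions(
    y_pred: Sequence[float],
    y_col_name: str,
    stage: str,
    output_dir: str,
    input_dir: Optional[str] = None,
    y_true: Optional[Sequence[float]] = None,
    round_decimals: int = 4,
) -> Path:
    """Illustrative stand-in only; NOT the improvelib implementation."""
    n = len(y_pred)
    if n == 0:
        raise ValueError("y_pred is empty")
    rows = [{} for _ in range(n)]

    # Case 1: a metadata table for this stage exists, so keep all its columns.
    # The file name "<stage>_y_data.csv" is a hypothetical placeholder.
    meta_path = (Path(input_dir) / f"{stage}_y_data.csv") if input_dir else None
    if meta_path is not None and meta_path.exists():
        with meta_path.open(newline="") as fh:
            meta_rows = list(csv.DictReader(fh))
        if len(meta_rows) != n:
            raise ValueError("metadata rows and predictions differ in length")
        rows = [dict(r) for r in meta_rows]

    # Both cases: append ground truth (if given) and rounded predictions.
    # The "_true" / "_pred" suffixes are assumed column names.
    for i, row in enumerate(rows):
        if y_true is not None:
            row[f"{y_col_name}_true"] = round(float(y_true[i]), round_decimals)
        row[f"{y_col_name}_pred"] = round(float(y_pred[i]), round_decimals)

    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{stage}_predictions_sketch.csv"  # hypothetical name
    with out_path.open("w", newline="") as fh:
        writer = csv.DictWriter(fh, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
    return out_path
```

When no metadata file is found (or input_dir is omitted), the output of the sketch contains only the ground-truth column, if y_true was given, and the prediction column, which mirrors the fallback described above.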