transform_data

improvelib.applications.drug_response_prediction.drp_utils.transform_data(df, transform_file_name, preprocess_dir)

Transforms (imputes, scales, and/or subsets) features based on the transformations determined on the training set with determine_transform. Reads the saved dictionary containing the details needed to apply the specified transformations to all sets, then applies those transformations to the given data.

Used in preprocess.

Parameters:

df : pd.DataFrame

The input feature DataFrame. Column names must be feature IDs (e.g. gene names), and the index must contain sample IDs (e.g. cell line names).

transform_file_name : str

The file name that was passed to determine_transform() when the transformation dictionary was saved.

preprocess_dir : str

The directory where the transformation dictionary was saved. Should be set to params['output_dir'].

Returns:

df : pd.DataFrame

The transformed DataFrame.

Example

After using the features in the training data to determine the transformation values with determine_transform, the transformations can be applied as follows:

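import improvelib.applications.drug_response_prediction.drp_utils as drp

# Apply the transformations saved under each name to the current stage's features.
omics_stage = drp.transform_data(omics_stage, 'omics_transform', params['output_dir'])
drugs_stage = drp.transform_data(drugs_stage, 'drugs_transform', params['output_dir'])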
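
To illustrate the general pattern described above (determine the transformation on the training set, save it, then apply it unchanged to every set), here is a minimal, self-contained sketch using only pandas and the standard library. It is not improvelib's implementation: the functions sketch_determine_transform and sketch_transform_data, the JSON file format, and the specific choices of median imputation and standard scaling are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path

import pandas as pd


def sketch_determine_transform(train_df, file_name, out_dir):
    """Hypothetical stand-in: learn per-feature statistics from training data only."""
    stats = {
        "columns": list(train_df.columns),  # features to keep (subset)
        "median": {k: float(v) for k, v in train_df.median().items()},  # imputation values
        "mean": {k: float(v) for k, v in train_df.mean().items()},
        "std": {k: float(v) for k, v in train_df.std(ddof=0).items()},
    }
    path = Path(out_dir) / f"{file_name}.json"  # assumed file format, not improvelib's
    path.write_text(json.dumps(stats))
    return path


def sketch_transform_data(df, file_name, out_dir):
    """Hypothetical stand-in: apply the saved statistics to any set."""
    stats = json.loads((Path(out_dir) / f"{file_name}.json").read_text())
    out = df[stats["columns"]]                      # subset to the training features
    out = out.fillna(pd.Series(stats["median"]))    # impute with training medians
    mean = pd.Series(stats["mean"])
    std = pd.Series(stats["std"]).replace(0, 1.0)   # avoid dividing by zero
    return (out - mean) / std                       # scale with training mean/std


# Columns are feature IDs, the index holds sample IDs (made-up values).
train = pd.DataFrame(
    {"GENE_A": [1.0, 3.0], "GENE_B": [2.0, 6.0]}, index=["CL1", "CL2"]
)
test = pd.DataFrame(
    {"GENE_A": [float("nan"), 3.0], "GENE_B": [4.0, 8.0], "GENE_C": [9.0, 9.0]},
    index=["CL3", "CL4"],
)

with tempfile.TemporaryDirectory() as out_dir:
    sketch_determine_transform(train, "omics_transform", out_dir)
    test_scaled = sketch_transform_data(test, "omics_transform", out_dir)

print(test_scaled)
```

The point of the pattern is that the test set never influences the statistics: the missing GENE_A value is filled with the training median, scaling uses the training mean and standard deviation, and GENE_C, which was absent from training, is dropped.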