get_y_data_with_features

improvelib.utils.get_y_data_with_features(y_data_df, feature_df, column_name)

Takes a y data DataFrame and one or more feature DataFrames and returns a y data DataFrame containing only the rows whose IDs have features available in the provided feature DataFrame(s). All feature DataFrames in a list must share the same ID type (e.g., drug or cell). If a list is given, a row is retained only if features for its ID are available in every DataFrame in the list.

Used in preprocess.
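The filtering rule described above can be sketched with pandas. This is a minimal illustration, not improvelib's implementation. It assumes the feature DataFrames are indexed by ID and that `column_name` holds the matching IDs in the y data. The helper name `filter_y_by_features` and the toy tables are hypothetical.

```python
import pandas as pd
from typing import List, Union


def filter_y_by_features(
    y_data_df: pd.DataFrame,
    feature_df: Union[pd.DataFrame, List[pd.DataFrame]],
    column_name: str,
) -> pd.DataFrame:
    """Hypothetical sketch: keep y rows whose ID has features in every feature DataFrame."""
    # Treat a single DataFrame as a one-element list.
    feature_dfs = feature_df if isinstance(feature_df, list) else [feature_df]
    mask = pd.Series(True, index=y_data_df.index)
    for df in feature_dfs:
        # Feature DataFrames are indexed by ID; a row survives only if its ID
        # appears in the index of every feature DataFrame.
        mask &= y_data_df[column_name].isin(df.index)
    return y_data_df[mask]


# Toy data (hypothetical): response rows keyed by a cell ID column,
# plus two cell-level feature tables with partially overlapping IDs.
y = pd.DataFrame({"cell_id": ["c1", "c2", "c3"], "auc": [0.5, 0.7, 0.9]})
expr = pd.DataFrame({"g1": [1.0, 2.0]}, index=["c1", "c2"])
mut = pd.DataFrame({"m1": [0, 1]}, index=["c2", "c3"])

print(filter_y_by_features(y, expr, "cell_id")["cell_id"].tolist())         # ['c1', 'c2']
print(filter_y_by_features(y, [expr, mut], "cell_id")["cell_id"].tolist())  # ['c2']
```

Passing a list acts as an intersection: only `c2` has both expression and mutation features, so it is the only row kept.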

Parameters:

y_data_df : pd.DataFrame

Y data DataFrame.

feature_df : pd.DataFrame or list of pd.DataFrame

A feature DataFrame or a list of feature DataFrames that share the same ID type (drug or cell). The ID must be the index, as with all improvelib functions.

column_name : str

Name of the ID column in y_data_df that corresponds to the x data IDs (e.g., the drug or cell ID column).

Returns:

y_data_df : pd.DataFrame

Y data DataFrame containing only the rows with features available.

Example

Before determining the transformations on the training set, it is important to keep only the training rows that have features for both the drug and the cell, and to use only the features that appear in the training set. This can be done by calling get_y_data_with_features and get_features_in_y_data. The get_y_data_with_features step looks like this:

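# Keep only rows whose cell ID has omics features (omics is indexed by cell ID)
response_train = drp.get_y_data_with_features(response_train, omics, params['canc_col_name'])
# Keep only rows whose drug ID has drug features (drugs is indexed by drug ID)
response_train = drp.get_y_data_with_features(response_train, drugs, params['drug_col_name'])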
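To see what the two calls above do without improvelib installed, here is a toy reproduction. `filter_y_by_features` is a hypothetical stand-in for get_y_data_with_features that handles only the single-DataFrame case. The column names `improve_sample_id` and `improve_chem_id`, the `params` dict, and all data values are assumptions for illustration.

```python
import pandas as pd


def filter_y_by_features(y_data_df, feature_df, column_name):
    # Hypothetical stand-in for get_y_data_with_features (single DataFrame case):
    # keep rows whose ID in `column_name` appears in the feature index.
    return y_data_df[y_data_df[column_name].isin(feature_df.index)]


# Assumed column names; in a real model these come from the parameter file.
params = {"canc_col_name": "improve_sample_id", "drug_col_name": "improve_chem_id"}

response_train = pd.DataFrame({
    "improve_sample_id": ["s1", "s1", "s2", "s3"],
    "improve_chem_id": ["d1", "d2", "d1", "d1"],
    "auc": [0.1, 0.2, 0.3, 0.4],
})
omics = pd.DataFrame({"gene_a": [0.5, 0.6]}, index=["s1", "s2"])  # no features for s3
drugs = pd.DataFrame({"fp_0": [1, 0]}, index=["d1", "d3"])        # no features for d2

# Filter by cell features first, then by drug features.
response_train = filter_y_by_features(response_train, omics, params["canc_col_name"])
response_train = filter_y_by_features(response_train, drugs, params["drug_col_name"])

# Remaining (cell, drug) pairs: ("s1", "d1") and ("s2", "d1")
print(list(zip(response_train["improve_sample_id"], response_train["improve_chem_id"])))
```

Each call drops rows whose ID lacks features of one type. After both calls, every remaining row has both cell and drug features, which is the precondition for fitting transformations on the training set.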