get_y_data_with_features
improvelib.utils.get_y_data_with_features(y_data_df, feature_df, column_name)
Takes a y data DataFrame and one or more feature DataFrames, and returns a y data DataFrame containing only the rows that have features available for the provided feature type(s). If a list of feature DataFrames is given, all of them must share the same ID type (e.g. drug or cell), and a row is retained only if its ID has features in every DataFrame in the list.
Used in preprocess.
Parameters:
- y_data_df : pd.DataFrame
Y data DataFrame.
- feature_df : pd.DataFrame or list of pd.DataFrame
Feature DataFrame, or a list of feature DataFrames that share the same ID type (drug or cell). The IDs must be the index, as with all improvelib functions.
- column_name : str
Name of the ID column in y_data_df that corresponds to the x data (feature) IDs, e.g. the cell or drug ID column.
Returns:
- y_data_df : pd.DataFrame
Y data DataFrame containing only the rows with features available.
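The filtering described above can be sketched in plain pandas. This is a minimal sketch under the assumption that a row is kept when its value in column_name appears in the index of every feature DataFrame; the function name filter_y_by_features is hypothetical and this is not improvelib's own implementation.

```python
from typing import List, Union

import pandas as pd


def filter_y_by_features(
    y_data_df: pd.DataFrame,
    feature_df: Union[pd.DataFrame, List[pd.DataFrame]],
    column_name: str,
) -> pd.DataFrame:
    """Hypothetical sketch of the documented behavior (not improvelib's code)."""
    # Accept either a single DataFrame or a list of DataFrames with the same ID type.
    feature_dfs = feature_df if isinstance(feature_df, list) else [feature_df]
    # Start with every row selected, then drop rows whose ID is missing
    # from the index of any feature DataFrame.
    mask = pd.Series(True, index=y_data_df.index)
    for df in feature_dfs:
        mask = mask & y_data_df[column_name].isin(df.index)
    return y_data_df[mask]
```

Building a single boolean mask keeps the original row index intact, so the filtered rows can still be traced back to the unfiltered y data.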
Example
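Before determining the transformations from the training set, it is important to keep only the training rows that have features for both drug and cell, and to use only the features that appear in the training set. This can be done with get_y_data_with_features and get_features_in_y_data. For example, the following calls filter the training responses so that every remaining row has both cell (omics) and drug features: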
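# Keep only responses whose cell ID has omics features
response_train = drp.get_y_data_with_features(response_train, omics, params['canc_col_name'])
# Keep only responses whose drug ID has drug features
response_train = drp.get_y_data_with_features(response_train, drugs, params['drug_col_name'])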
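The two-step filtering in the example can be traced on toy data. This plain-pandas sketch assumes that get_y_data_with_features keeps a row only when its ID appears in the index of every feature DataFrame, as described above. The column names cell_id and drug_id stand in for params['canc_col_name'] and params['drug_col_name'], and all table contents are made up for illustration.

```python
import pandas as pd

# Toy training responses; "cell_id" and "drug_id" are hypothetical column names.
response_train = pd.DataFrame({
    "cell_id": ["C1", "C1", "C2", "C3"],
    "drug_id": ["D1", "D2", "D1", "D1"],
    "auc": [0.8, 0.5, 0.6, 0.9],
})

# Two omics tables indexed by cell ID; C3 has no mutation data.
gene_expression = pd.DataFrame({"g1": [1.0, 2.0, 3.0]}, index=["C1", "C2", "C3"])
mutations = pd.DataFrame({"m1": [0, 1]}, index=["C1", "C2"])
# Drug features indexed by drug ID; D2 has no descriptors.
drug_features = pd.DataFrame({"fp": [1]}, index=["D1"])

# Step 1: keep rows whose cell ID appears in every omics table (drops C3).
for omics_df in [gene_expression, mutations]:
    response_train = response_train[response_train["cell_id"].isin(omics_df.index)]

# Step 2: keep rows whose drug ID has drug features (drops D2).
response_train = response_train[response_train["drug_id"].isin(drug_features.index)]

print(response_train)
```

Only the (C1, D1) and (C2, D1) responses survive, since those are the only rows whose cell has both omics tables and whose drug has features.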