get_features_in_y_data
improvelib.utils.get_features_in_y_data(feature_df, y_data_df, column_name)
Takes a feature DataFrame and a y data DataFame and returns the feature DataFrame that contains only features that are present in the given y data DataFrame.
Used in preprocess.
Parameters:
- feature_dfpd.DataFrame
Feature DataFrame. ID must be index, as with all improvelib functions.
- y_data_dfpd.DataFrame
Y data DataFrame.
- column_namestr
Name of ID column for x data.
Returns:
- feature_dfpd.DataFrame
Feature DataFrame containing only the rows with features that are used in the y data.
Example
Before determining the transformations using the training set, it is important to only use features that are in the training set and have features for both drug and cell.
This can be easily performed by calling get_y_data_with_features
and get_features_in_y_data
like so:
print("Find intersection of training data.")
response_train = drp.get_y_data_with_features(response_train, omics, params['canc_col_name'])
response_train = drp.get_y_data_with_features(response_train, drugs, params['drug_col_name'])
omics_train = drp.get_features_in_y_data(omics, response_train, params['canc_col_name'])
drugs_train = drp.get_features_in_y_data(drugs, response_train, params['drug_col_name'])