get_response_data

improvelib.applications.drug_response_prediction.drp_utils.get_response_data(split_file, benchmark_dir, response_file, split_id='split_id', sep='\t')

Gets response data for a given split file.

Used in preprocess.

Parameters:

split_file (str, Path, list of str, list of Path): Name of the split file if it is in the benchmark data, otherwise the path to the split file. Can be a list of str or Path.
benchmark_dir (str, Path): Path to the benchmark data directory.
response_file (str): Name of the response file.
split_id (str, optional): Name of the column containing the split ID (default: 'split_id').
sep (str, optional): Separator for the response file (default: '\t').

Returns:

df (pd.DataFrame): Response dataframe for the given split.

Example

To load response data for the training set:

import improvelib.applications.drug_response_prediction.drp_utils as drp

response_train = drp.get_response_data(split_file=params["train_split_file"],
                                       benchmark_dir=params["input_dir"],
                                       response_file=params["y_data_file"])

When preprocessing data for all three stages, response data is typically loaded by looping through the stages as follows:

stages = {"train": params["train_split_file"],
          "val": params["val_split_file"],
          "test": params["test_split_file"]}

for stage, split_file in stages.items():

Within this loop, response data for each stage can be loaded with:

response_stage = drp.get_response_data(split_file=split_file,
                                       benchmark_dir=params["input_dir"],
                                       response_file=params["y_data_file"])
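The two fragments above can be combined into a single loop. The sketch below shows that pattern end to end without requiring improvelib or benchmark files. Note that load_response_sketch is a hypothetical stand-in, not improvelib's implementation: it assumes the split file lists one split ID per line and that the response file is tab-separated with a split_id column. The toy column names and values are invented for illustration only.

```python
import io

import pandas as pd

# Toy tab-separated response table (made-up values, illustration only).
RESPONSE_TSV = (
    "split_id\tsample_id\tdrug_id\tauc\n"
    "0\tS1\tD1\t0.91\n"
    "1\tS1\tD2\t0.42\n"
    "2\tS2\tD1\t0.77\n"
    "3\tS2\tD2\t0.65\n"
    "4\tS3\tD1\t0.58\n"
)

# Toy split files: one split ID per line (an assumed layout).
SPLITS = {"train": "0\n1\n2\n", "val": "3\n", "test": "4\n"}


def load_response_sketch(split_text, response_text, split_id="split_id", sep="\t"):
    """Hypothetical stand-in for drp.get_response_data.

    Keeps only the response rows whose split_id value appears in the split.
    """
    ids = [int(token) for token in split_text.split()]
    df = pd.read_csv(io.StringIO(response_text), sep=sep)
    return df[df[split_id].isin(ids)].reset_index(drop=True)


# Same loop shape as above: one response dataframe per stage.
responses = {}
for stage, split_text in SPLITS.items():
    responses[stage] = load_response_sketch(split_text, RESPONSE_TSV)

print({stage: len(df) for stage, df in responses.items()})
```

In a real preprocessing script, the call inside the loop would be drp.get_response_data with split_file, benchmark_dir, and response_file as shown above. Storing the results in a dictionary keyed by stage is one option; processing and saving each stage inside the loop works equally well.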