get_response_data
improvelib.applications.drug_response_prediction.drp_utils.get_response_data(split_file, benchmark_dir, response_file, split_id='split_id', sep='\t')
Gets response data for a given split file.
Used in preprocess.
Parameters:
- split_file : str, Path, List of str, List of Path
Name of split file if in benchmark data, otherwise path to split file. Can be a list of str or Path.
- benchmark_dir : str, Path
Path to benchmark data directory.
- response_file : str
Name of response file.
- split_id : str, optional
Name of column containing the split ID (default: 'split_id').
- sep : str, optional
Separator for response file (default: '\t', a tab).
Returns:
- df : pd.DataFrame
Response dataframe for given split.
Example
To load response data for the training set:
import improvelib.applications.drug_response_prediction.drp_utils as drp

response_train = drp.get_response_data(split_file=params["train_split_file"],
                                       benchmark_dir=params["input_dir"],
                                       response_file=params["y_data_file"])
To preprocess all three stages, response data is typically loaded by looping through the stages as follows:
stages = {"train": params["train_split_file"],
          "val": params["val_split_file"],
          "test": params["test_split_file"]}
for stage, split_file in stages.items():
Within this loop, response data for each stage can be loaded with:
    response_stage = drp.get_response_data(split_file=split_file,
                                           benchmark_dir=params["input_dir"],
                                           response_file=params["y_data_file"])
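To make the idea of "response data for a given split" concrete, here is a minimal standalone sketch. It is not improvelib's implementation: the helper select_rows_by_split is hypothetical, and it assumes each split file lists one 0-based row index per line that points into a tab-separated response file. The column names (sample_id, drug_id, auc) and the data are made up, and the sketch uses only the standard library, so it returns a list of dicts rather than a pd.DataFrame. The split_id parameter is not modelled here.

```python
import csv
import tempfile
from pathlib import Path


def select_rows_by_split(split_files, response_path, sep="\t"):
    """Hypothetical helper: keep the response rows listed in the split file(s).

    Assumes each split file holds one 0-based row index per line.
    """
    # Accept a single path or a list of paths, as the documented function does.
    if isinstance(split_files, (str, Path)):
        split_files = [split_files]

    # Collect the row indices from every split file, in order.
    wanted = []
    for path in split_files:
        with open(path) as fh:
            wanted.extend(int(line) for line in fh if line.strip())

    # Read the response file and keep only the requested rows.
    with open(response_path, newline="") as fh:
        rows = list(csv.DictReader(fh, delimiter=sep))
    return [rows[i] for i in wanted]


# Small demonstration with made-up data in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    response = tmp / "response.tsv"
    response.write_text(
        "sample_id\tdrug_id\tauc\n"
        "S1\tD1\t0.5\n"
        "S2\tD1\t0.7\n"
        "S3\tD2\t0.9\n"
    )
    train_split = tmp / "train.txt"
    train_split.write_text("0\n2\n")
    val_split = tmp / "val.txt"
    val_split.write_text("1\n")

    # One split file, then a list of split files combined into one result.
    train_rows = select_rows_by_split(train_split, response)
    combined_rows = select_rows_by_split([train_split, val_split], response)
```

Passing a list of split files simply concatenates the selected rows in the order the files are given, which mirrors how the documented split_file parameter accepts either a single name or a list.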