get_y_data

improvelib.utils.get_y_data(split_file, benchmark_dir, y_data_file, split_id='split_id', sep='\t')

Gets y data for a given split file.

Used in preprocess.

Parameters:

split_file (str, Path, list of str, list of Path): Name of the split file if it is in the benchmark data, otherwise the path to the split file. Can be a list of str or Path.
benchmark_dir (str, Path): Path to the benchmark data directory.
y_data_file (str): Name of the y data file.
split_id (str, optional): Name of the column containing the split ID (default: 'split_id').
sep (str, optional): Separator for the y data file (default: '\t').

Returns:

df (pd.DataFrame): y data DataFrame for the given split.

Example

To load y data for the training set:

response_train = drp.get_y_data(split_file=params["train_split_file"],
                                benchmark_dir=params["input_dir"],
                                y_data_file=params["y_data_file"])

When preprocessing data for all three stages, y data is typically loaded by looping through the stages as follows:

stages = {"train": params["train_split_file"],
          "val": params["val_split_file"],
          "test": params["test_split_file"]}

for stage, split_file in stages.items():

Within this loop, y data for each stage can be loaded with:

response_stage = drp.get_y_data(split_file=split_file,
                                benchmark_dir=params["input_dir"],
                                y_data_file=params["y_data_file"])
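
The two fragments above can be combined into one runnable sketch. To keep it self-contained, the example below does not call improvelib. It uses a hypothetical stand-in, load_y_for_split, whose behavior is an assumption and not the library's implementation. It assumes each split file lists one split ID per line, the y data file has a column named by split_id, and the benchmark directory contains splits/ and y_data/ subdirectories. The file names and toy values are illustrative only.

```python
import tempfile
from pathlib import Path

import pandas as pd


def load_y_for_split(split_file, benchmark_dir, y_data_file,
                     split_id="split_id", sep="\t"):
    """Hypothetical stand-in for get_y_data (NOT the improvelib code).

    Assumes each split file holds one split ID per line and that the
    y data file contains a column named by `split_id`.
    """
    benchmark_dir = Path(benchmark_dir)
    # Accept a single split file or a list of them.
    split_files = split_file if isinstance(split_file, list) else [split_file]
    ids = []
    for name in split_files:
        # Assumed layout: split files live under <benchmark_dir>/splits.
        ids.extend(pd.read_csv(benchmark_dir / "splits" / name, header=None)[0].tolist())
    # Assumed layout: y data lives under <benchmark_dir>/y_data.
    y = pd.read_csv(benchmark_dir / "y_data" / y_data_file, sep=sep)
    # Keep only the rows whose split ID appears in the split file(s).
    return y[y[split_id].isin(ids)].reset_index(drop=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "splits").mkdir()
    (root / "y_data").mkdir()

    # Toy y data: six rows with made-up response values.
    pd.DataFrame({"split_id": [0, 1, 2, 3, 4, 5],
                  "auc": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]}).to_csv(
        root / "y_data" / "response.tsv", sep="\t", index=False)

    # Toy split files, one split ID per line.
    for name, split_ids in {"train.txt": [0, 1, 2, 3],
                            "val.txt": [4],
                            "test.txt": [5]}.items():
        (root / "splits" / name).write_text("\n".join(map(str, split_ids)) + "\n")

    params = {"input_dir": root,
              "y_data_file": "response.tsv",
              "train_split_file": "train.txt",
              "val_split_file": "val.txt",
              "test_split_file": "test.txt"}

    stages = {"train": params["train_split_file"],
              "val": params["val_split_file"],
              "test": params["test_split_file"]}

    # Load y data once per stage and keep the results by stage name.
    responses = {}
    for stage, split_file in stages.items():
        responses[stage] = load_y_for_split(split_file=split_file,
                                            benchmark_dir=params["input_dir"],
                                            y_data_file=params["y_data_file"])
```

Storing the per-stage DataFrames in a dictionary keyed by stage name is one design choice among several. In a real preprocess script, the stand-in would be replaced by get_y_data with the same keyword arguments.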