build_ml_data_file_name
improvelib.utils.build_ml_data_file_name(data_format, stage)
Returns the name of the ML data file in the form of <stage>_data.<format>, e.g. train_data.pt.
Used in preprocess, train, and infer.
Parameters:
- data_formatstr
Format to save the data (e.g. ‘.tsv’). Values with or without the period (‘.’) are acceptable.
- stagestr
Stage of the data. Should be one of [‘train’, ‘val’, ‘test.’]
Returns:
- ml_data_file_namestr
Data file name
Example
Creates the appropriate file name to save the preprocessed data. For example, in preprocess:
data_fname = frm.build_ml_data_file_name(data_format=params["data_format"], stage=stage)
xdf.to_parquet(Path(params["output_dir"]) / data_fname)
Also insures that the appropriate files are loaded in train and infer. For example, to load training data:
train_data_fname = frm.build_ml_data_file_name(data_format=params["data_format"], stage="train")
train_data = pd.read_parquet(Path(params["input_dir"]) / train_data_fname)