Learning Curve Analysis Workflows

Learning curve analysis (LCA) evaluates how prediction performance improves as the training set size increases, providing insight into how well the model scales with data.

Workflow

The LCA workflow takes the training data (referred to as training shards) created by the Learning Curve Split Generator and runs the preprocess, train, and infer steps for every available training shard, across all specified splits of a given dataset.
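
The loop described above can be sketched as follows. This is a minimal illustration, not the workflow's actual implementation: the function names plan_lca_runs and run_lca, the run_stage callback, and the use of integer split IDs and shard sizes are all assumptions made for the example.

```python
from itertools import product

# Stage names taken from the text above; how each stage is actually
# launched is not specified here, so it is left to a caller-supplied callback.
STAGES = ("preprocess", "train", "infer")


def plan_lca_runs(splits, shard_sizes, stages=STAGES):
    """Return an ordered list of (split, shard_size, stage) tuples.

    Hypothetical helper: every shard of every split goes through all stages.
    """
    runs = []
    for split, size in product(splits, shard_sizes):
        for stage in stages:
            runs.append((split, size, stage))
    return runs


def run_lca(splits, shard_sizes, run_stage):
    """Execute every planned run and collect the inference results.

    run_stage(stage, split, shard_size) is a hypothetical callback that
    performs one stage and returns a score for the "infer" stage.
    """
    scores = {}
    for split, size, stage in plan_lca_runs(splits, shard_sizes):
        result = run_stage(stage, split, size)
        if stage == "infer":
            scores[(split, size)] = result
    return scores
```

With two splits and two shard sizes, plan_lca_runs yields twelve runs (2 splits x 2 shards x 3 stages), and run_lca returns one inference result per (split, shard size) pair.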

Metrics

The inference scores produced for each training shard can be aggregated and plotted using Postprocessing of Learning Curve Analysis (LCA). When the right tail of the curve flattens, model performance is no longer improving with additional training data.
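
As a rough illustration of the aggregation and the flattening check, the sketch below averages scores across splits for each shard size and compares the last two points of the resulting curve. It is not the postprocessing tool itself: the record layout (dictionaries with "split", "shard_size", and "score" keys), the function names, and the 1% relative tolerance are assumptions chosen for the example.

```python
from collections import defaultdict
from statistics import mean


def aggregate_scores(records):
    """Average the score over splits for each shard size.

    records: iterable of dicts with hypothetical keys
    "split", "shard_size", and "score".
    Returns a list of (shard_size, mean_score) sorted by shard size.
    """
    by_size = defaultdict(list)
    for rec in records:
        by_size[rec["shard_size"]].append(rec["score"])
    return sorted((size, mean(vals)) for size, vals in by_size.items())


def tail_has_flattened(curve, rel_tol=0.01):
    """Return True if the last two points differ by less than rel_tol.

    rel_tol is an assumed threshold (1% relative change), not a value
    taken from the workflow. Uses the absolute change, so it works for
    both error-type and score-type metrics.
    """
    if len(curve) < 2:
        return False
    prev, last = curve[-2][1], curve[-1][1]
    if prev == 0:
        return last == 0
    return abs(last - prev) / abs(prev) < rel_tol
```

A two-point comparison is the simplest possible check. A curve with noisy scores would likely need more splits per shard or a fit over several tail points before drawing a conclusion.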

References

1. A. Partin et al. “Learning curves for drug response prediction in cancer cell lines”, BMC Bioinformatics, 2021