Implementation of "Beyond neural scaling laws: beating power law scaling via data pruning" [1].
The implementation covers deep learning and prototype-based models and shows how a well-chosen data pruning strategy can shift the scaling of error with dataset size from a power law to an exponential.
Practical use cases are illustrated for TensorFlow object detection on mobile/edge devices and for prototype-based models using learning vector quantization (LVQ).
The dataprune.py module covers deep models, with illustrations for computer vision.
The dataprune1.py module extends the implementation to prototype-based models, with illustrations for LVQ models.
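As a rough illustration of the idea behind the deep-model path, the self-supervised metric described in [1] clusters example embeddings with k-means and scores each example by its distance to the nearest centroid; examples far from every centroid count as hard, and with abundant data the easy ones are pruned first. The sketch below is a minimal NumPy version under stated assumptions: the function names `kmeans`, `prototype_scores`, and `prune` are hypothetical and not taken from dataprune.py, and random vectors stand in for embeddings from a self-supervised model.

```python
import numpy as np


def kmeans(embeddings, n_clusters, n_iters=50, seed=0):
    """Plain Lloyd's k-means. Returns the centroids, shape (n_clusters, dim)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from randomly chosen distinct data points.
    idx = rng.choice(len(embeddings), size=n_clusters, replace=False)
    centroids = embeddings[idx].astype(float)
    for _ in range(n_iters):
        # Squared distance of every point to every centroid: shape (N, K).
        diff = embeddings[:, None, :] - centroids[None, :, :]
        labels = (diff ** 2).sum(axis=-1).argmin(axis=1)
        for k in range(n_clusters):
            members = embeddings[labels == k]
            if len(members):  # an empty cluster keeps its old centroid
                centroids[k] = members.mean(axis=0)
    return centroids


def prototype_scores(embeddings, centroids):
    """Distance to the nearest centroid; a larger score means a harder example."""
    diff = embeddings[:, None, :] - centroids[None, :, :]
    return np.linalg.norm(diff, axis=-1).min(axis=1)


def prune(scores, prune_fraction, keep_hard=True):
    """Return the sorted indices of the examples that survive pruning."""
    n_keep = len(scores) - int(round(prune_fraction * len(scores)))
    order = np.argsort(scores)  # easiest first, hardest last
    kept = order[len(order) - n_keep:] if keep_hard else order[:n_keep]
    return np.sort(kept)


# Hypothetical stand-in data: 200 embeddings of dimension 16.
rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 16))
centroids = kmeans(emb, n_clusters=5)
scores = prototype_scores(emb, centroids)
kept = prune(scores, prune_fraction=0.25)
print(len(kept))  # 150
```

The `keep_hard` switch reflects the observation in [1] that keeping hard examples helps when data is abundant, while keeping easy examples can be preferable when data is scarce.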
usage: dataprune.py [-h] -m -n -x [-p | -b | -a]

Executes self supervised learning metric for data pruning

options:
  -h, --help            show this help message and exit
  -m , --ssl_model      self supervised model type
  -n , --number_of_clusters
                        number of cluster under consideration
  -x , --prune_fraction
                        fraction for pruning the dataset
  -p, --prune           prune data set
  -b, --get_cluster_results
                        populates cluster folders with clustering results
  -a, --all             populates cluster folders with clustering results and pruned data set for all specifications
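For readers who want to see how such a command line could be wired up, the following is a sketch of an argparse parser consistent with the usage text above. It is reconstructed from the help output only and is not the actual source of dataprune.py: the argument types (`str`, `int`, `float`), the empty `metavar`, and the example value `some_model` are assumptions.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(
        prog="dataprune.py",
        description="Executes self supervised learning metric for data pruning",
    )
    # Required options; metavar="" is inferred from the "-m , --ssl_model" help layout.
    parser.add_argument("-m", "--ssl_model", required=True, metavar="", type=str,
                        help="self supervised model type")
    parser.add_argument("-n", "--number_of_clusters", required=True, metavar="", type=int,
                        help="number of cluster under consideration")
    parser.add_argument("-x", "--prune_fraction", required=True, metavar="", type=float,
                        help="fraction for pruning the dataset")
    # The usage line "[-p | -b | -a]" indicates an optional, mutually exclusive group.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("-p", "--prune", action="store_true", help="prune data set")
    mode.add_argument("-b", "--get_cluster_results", action="store_true",
                      help="populates cluster folders with clustering results")
    mode.add_argument("-a", "--all", action="store_true",
                      help="populates cluster folders with clustering results "
                           "and pruned data set for all specifications")
    return parser


# "some_model" is a placeholder; the accepted model names are not listed here.
args = build_parser().parse_args(["-m", "some_model", "-n", "10", "-x", "0.2", "-p"])
print(args.ssl_model, args.number_of_clusters, args.prune_fraction, args.prune)
```

Under these assumptions, an equivalent shell invocation would pass the same four arguments to dataprune.py, with at most one of -p, -b, or -a selected.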
[1] Sorscher, Ben, et al. Beyond neural scaling laws: beating power law scaling via data pruning. arXiv preprint arXiv:2206.14486 (2022).