EigenPro [1-3] is a fast, scalable, GPU-enabled solver for training kernel machines. It applies a projected stochastic gradient method with dual preconditioning to achieve major speed-ups. It is currently built on a PyTorch backend.
- Fast: EigenPro is the fastest kernel method at large scale.
- Plug-and-play: Our method learns a quality model with little hyper-parameter tuning in most cases.
- Scalable: The training time of one epoch is nearly linear in both model size and data size. This is the first kernel method that achieves such scalability without any compromise on testing performance.
- Multi-GPU and model parallelism: support for multiple GPUs and model parallelism is being added.
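Under the hood, EigenPro trains kernel machines of the form f(x) = Σ_j α_j K(x, z_j), where the z_j are the model centers. Below is a minimal PyTorch sketch of this model form with a plain minibatch SGD step under squared loss. It is illustrative only: the function and argument names are placeholders rather than the package's API, and it omits the dual preconditioning and projection steps that give EigenPro its speed.

```python
import torch

def sgd_step(alpha, centers, x_batch, y_batch, kernel_fn, lr=1.0):
    """One plain minibatch SGD step for the kernel model
    f(x) = sum_j alpha_j * K(x, z_j) under squared loss.

    Illustrative sketch only: EigenPro additionally preconditions this
    stochastic gradient with the top eigendirections of the kernel
    operator and projects updates onto the span of the centers.
    """
    K = kernel_fn(x_batch, centers)            # (m, p) kernel matrix
    residual = K @ alpha - y_batch             # (m, c) prediction error
    grad = K.T @ residual / x_batch.shape[0]   # (p, c) gradient w.r.t. alpha
    return alpha - lr * grad
```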
pip install git+ssh://git@github.com/EigenPro/EigenPro.git@main
Linux:
bash examples/run_fmnist.sh
Windows:
examples\run_fmnist.bat
Jupyter Notebook: examples/notebook.ipynb
See the files under examples/ for more details.
In the experiments described below, P denotes the number of centers (the model size) and d denotes the ambient dimension. All experiments use a Laplacian kernel with a bandwidth of 20.0.
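The Laplacian kernel here is K(x, z) = exp(-||x - z||_2 / bandwidth). A minimal PyTorch sketch (the function name and shapes are illustrative, not the package's API):

```python
import torch

def laplacian_kernel(x, z, bandwidth=20.0):
    """Laplacian kernel K(x, z) = exp(-||x - z||_2 / bandwidth).

    x: (n, d) tensor, z: (p, d) tensor of centers; returns an (n, p)
    kernel matrix. The default bandwidth of 20.0 matches the experiments.
    """
    dists = torch.cdist(x, z, p=2)       # pairwise Euclidean distances
    return torch.exp(-dists / bandwidth)
```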
We used features extracted from a pretrained MobileNetV2 network available in the timm library. The benchmarks process the full 5 million samples of CIFAR-5M with d = 1280 for one epoch, for two versions of EigenPro and for FALKON [4-6]. All of these experiments were run on a single A100 GPU. The maximum RAM we had access to was 1.2 TB, which was not sufficient for FALKON with 1M centers.
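The feature extraction can be reproduced roughly as in the sketch below. The timm identifier 'mobilenetv2_100' is an assumption (timm's standard MobileNetV2, whose pooled features are 1280-dimensional, matching d = 1280 above); loading and preprocessing the CIFAR-5M images is omitted.

```python
import timm
import torch

# MobileNetV2 from timm as a feature extractor: num_classes=0 drops the
# classifier head so the forward pass returns pooled 1280-dim features.
# ('mobilenetv2_100' is an assumed identifier for the network referenced above.)
model = timm.create_model('mobilenetv2_100', pretrained=True, num_classes=0).eval()

with torch.no_grad():
    images = torch.randn(8, 3, 224, 224)  # stand-in for a preprocessed image batch
    features = model(images)              # shape (8, 1280), i.e. d = 1280
```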
We used 10 million samples with d = 1024 for one epoch, for two versions of EigenPro and for FALKON. The features were extracted with an acoustic model (a VGG+BLSTM architecture from [7]) to align the lengths of the audio and the text transcriptions. All of these experiments were run on a single V100 GPU. The maximum RAM available for this experiment was 300 GB, which was not sufficient for FALKON with more than 128K centers.
- [1] Amirhesam Abedsoltan, Mikhail Belkin, Parthe Pandit, "Toward Large Kernel Models," Proceedings of the 40th International Conference on Machine Learning (ICML 2023), JMLR.org, 2023. Link
- [2] Siyuan Ma, Mikhail Belkin, "Kernel machines that adapt to GPUs for effective large batch training," Proceedings of the 2nd SysML Conference, 2019. Link
- [3] Siyuan Ma, Mikhail Belkin, "Diving into the shallows: a computational perspective on large-scale shallow learning," Advances in Neural Information Processing Systems 30 (NeurIPS 2017). Link
- [4] Giacomo Meanti, Luigi Carratino, Lorenzo Rosasco, Alessandro Rudi, "Kernel methods through the roof: handling billions of points efficiently," Advances in Neural Information Processing Systems, 2020. Link
- [5] Alessandro Rudi, Luigi Carratino, Lorenzo Rosasco, "FALKON: An optimal large scale kernel method," Advances in Neural Information Processing Systems, 2017. Link
- [6] Ulysse Marteau-Ferey, Francis Bach, Alessandro Rudi, "Globally Convergent Newton Methods for Ill-conditioned Generalized Self-concordant Losses," Advances in Neural Information Processing Systems, 2019. Link
- [7] Like Hui, Mikhail Belkin, "Evaluation of Neural Architectures Trained with Square Loss vs Cross-Entropy in Classification Tasks," International Conference on Learning Representations (ICLR 2021). Link