An efficient method for solving bilevel optimization problems arising in machine learning, specifically data denoising by importance learning, few-shot learning, and training-data poisoning.
This is the repository for the experiments of the paper "Penalty Method for Inversion-Free Deep Bilevel Optimization" (ACML 2021).
Bilevel optimization is at the center of several important machine learning problems, such as hyperparameter tuning, data denoising, few-shot learning, and data poisoning. Unlike simultaneous or multi-objective optimization, obtaining the exact descent direction for continuous bilevel optimization requires computing the inverse of the Hessian of the lower-level cost function, even for first-order methods. In this paper, we propose a new method for solving bilevel optimization, based on the penalty function, which avoids computing the inverse of the Hessian. We prove convergence of the method under mild conditions and show that it computes the exact hypergradient asymptotically. The small space and time complexity of our method allows us to solve large-scale bilevel optimization problems involving deep neural networks with up to 3.8M upper-level and 1.4M lower-level variables. We present results of our method for data denoising on the MNIST/CIFAR10/SVHN datasets, for few-shot learning on the Omniglot/Mini-Imagenet datasets, and for training-data poisoning on the MNIST/Imagenet datasets. In all experiments, our method outperforms or is comparable to previously proposed methods in both accuracy and run time.
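The penalty idea described above can be illustrated on a toy problem. The sketch below is a simplified illustration, not the paper's exact algorithm: the lower-level argmin is replaced by a squared-gradient penalty, and plain gradient descent is run on both variables while the penalty weight grows, so no Hessian inverse ever appears. The problem instance, step sizes, and schedule are all chosen for the example.

```python
import numpy as np

# Toy bilevel problem:  min_u f(u, v*(u))  s.t.  v*(u) = argmin_v g(u, v),
# with f(u, v) = (v - 3)^2 and g(u, v) = 0.5 * (v - u)^2, so v*(u) = u and
# the bilevel solution is u = v = 3.  The penalty reformulation minimizes
#     f(u, v) + (gamma / 2) * (dg/dv)^2
# jointly in (u, v) while gamma grows -- only first-order quantities appear.
u, v = 0.0, 0.0
for gamma in [1.0, 10.0, 100.0]:
    lr = 1.0 / (2.0 * gamma + 2.0)   # step size below 1/L for this quadratic
    for _ in range(2000):
        g_v = v - u                  # dg/dv at the current point
        v -= lr * (2.0 * (v - 3.0) + gamma * g_v)
        u -= lr * (-gamma * g_v)
print(round(u, 3), round(v, 3))      # prints 3.0 3.0
```

Both updates use only gradients of the penalized objective; classical implicit-differentiation approaches would instead need a Hessian-inverse-vector product for the hypergradient.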
The code for each experiment is in a separate folder of this repository. After installing the necessary packages (listed below), create a folder structure similar to this repository's. The code for the different settings and datasets of a particular experiment is included in its folder along with its dependencies. Some experiments use datasets from Keras, which become available automatically once Keras is installed; for the others, links to the sources of the datasets are included in the experiment descriptions below. Specific instructions for pre-processing the data are also included below, along with the code needed for that pre-processing. Since the same script can be used to test different experimental settings, we have identified the line numbers where you can make changes and run the code to compare against the results in the tables of the paper.
- TensorFlow 1.15
- Keras
- CleverHans
- h5py
- SciPy
- NumPy
Below we provide the files that need to be run to replicate the results of Penalty as reported in the paper. We have also included all the code for running the ApproxGrad comparisons in the corresponding directories.
Run test_synthetic.py after changing the settings of the test (e.g., the number of iterations or the location to store results).
Run test_bilevel_importance_learning_mnist.py with the appropriate noise level specified on line 62.
Run test_bilevel_importance_learning_mnist.py with the appropriate noise level specified on line 64.
Run test_bilevel_importance_learning_cifar10.py with the appropriate noise level specified on line 70.
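The "noise level" these scripts take is the fraction of training labels that are corrupted. For reference, label corruption of this kind can be sketched as follows; `corrupt_labels` is a hypothetical helper, and the uniform label-flip noise model is an assumption, not necessarily the scripts' exact implementation.

```python
import numpy as np

def corrupt_labels(y, noise_rate, num_classes=10, seed=0):
    """Flip a `noise_rate` fraction of labels to a different random class.
    Hypothetical helper: uniform flipping is an assumption."""
    rng = np.random.RandomState(seed)
    y_noisy = y.copy()
    # pick exactly noise_rate * len(y) examples to corrupt
    flip_idx = rng.choice(len(y), int(noise_rate * len(y)), replace=False)
    for i in flip_idx:
        y_noisy[i] = rng.choice([c for c in range(num_classes) if c != y[i]])
    return y_noisy

y = np.arange(1000) % 10
y_noisy = corrupt_labels(y, noise_rate=0.25)
print((y_noisy != y).mean())   # prints 0.25
```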
Obtain the SVHN data from here
Split the data into 72257 digits for training, 1000 digits for validation, and 26032 digits for testing using pre_process_svhn_data.py
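Since SVHN ships 73257 training and 26032 test digits, the split above amounts to holding out 1000 training digits for validation. A sketch of that step on dummy arrays follows; the function name and the random shuffle are assumptions about what pre_process_svhn_data.py does, and the real script works on the .mat files from the SVHN site.

```python
import numpy as np

def split_train_val(X, y, n_val=1000, seed=0):
    """Hold out n_val examples from SVHN's 73257 training digits for
    validation, leaving 72257 for training; the standard 26032-digit test
    set is kept as-is.  (A guess at pre_process_svhn_data.py's behavior.)"""
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(X))
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    return (X[train_idx], y[train_idx]), (X[val_idx], y[val_idx])

# usage on dummy data with proportionally scaled sizes
X, y = np.zeros((100, 32, 32, 3)), np.arange(100)
(train_X, train_y), (val_X, val_y) = split_train_val(X, y, n_val=10)
print(len(train_X), len(val_X))   # prints 90 10
```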
Run test_bilevel_importance_learning_svhn.py with the appropriate noise level specified on line 62.
Obtain the Omniglot and Mini-Imagenet datasets from the GitHub page of the paper "Meta-Learning for Semi-Supervised Few-Shot Classification"
Run test_bilevel_few_shot_learning_omniglot.py, setting N and K on lines 59 and 60 for N-way K-shot classification.
Run test_bilevel_few_shot_learning_miniimagenet.py, setting N and K on lines 37 and 38 for N-way K-shot classification.
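For reference, N-way K-shot means each episode presents N classes with K labeled support examples per class, plus query examples for evaluation. A generic episode sampler might look like the sketch below; it is illustrative only, and the scripts have their own data pipeline.

```python
import numpy as np

def sample_episode(X, y, N=5, K=1, Q=15, seed=0):
    """Sample an N-way K-shot episode: N classes, K support and Q query
    examples per class, with labels remapped to 0..N-1.  Generic sketch,
    not this repository's loader."""
    rng = np.random.RandomState(seed)
    classes = rng.choice(np.unique(y), size=N, replace=False)
    support, query = [], []
    for label, c in enumerate(classes):
        idx = rng.permutation(np.where(y == c)[0])[:K + Q]
        support += [(X[i], label) for i in idx[:K]]
        query += [(X[i], label) for i in idx[K:]]
    return support, query

# dummy data: 20 classes, 30 examples each
y = np.repeat(np.arange(20), 30)
X = np.zeros((len(y), 28, 28))
support, query = sample_episode(X, y, N=5, K=1, Q=15)
print(len(support), len(query))   # prints 5 75
```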
Run test_bilevel_poisoning_untargeted.py, specifying the number of poisoned points to add on line 15.
Run test_bilevel_poisoning_targeted.py, specifying the number of poisoned points to add on line 36.
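As a sanity check of what adding poisoned points does, here is a naive toy illustration: the paper crafts the poisoned points via bilevel optimization, whereas the hand-placed mislabeled points and the ridge-regularized linear model below are stand-ins chosen only to make the effect visible.

```python
import numpy as np

rng = np.random.RandomState(0)
# two well-separated Gaussian classes in 2-D
X = np.vstack([rng.randn(100, 2) + [-2, 0], rng.randn(100, 2) + [2, 0]])
y = np.r_[-np.ones(100), np.ones(100)]

def fit(X, y):
    # ridge-regularized least-squares separator (stand-in model)
    Xb = np.c_[X, np.ones(len(X))]
    return np.linalg.solve(Xb.T @ Xb + 1e-3 * np.eye(3), Xb.T @ y)

def acc(w, X, y):
    Xb = np.c_[X, np.ones(len(X))]
    return float(np.mean(np.sign(Xb @ w) == y))

n_poison = 40                       # number of poisoned points to add
X_p = np.vstack([X, np.tile([8.0, 0.0], (n_poison, 1))])
y_p = np.r_[y, -np.ones(n_poison)]  # mislabeled points deep in class +1
acc_clean = acc(fit(X, y), X, y)
acc_pois = acc(fit(X_p, y_p), X, y)
print(acc_clean, acc_pois)          # accuracy on clean data drops sharply
```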
Download the dogfish dataset from here and store it in a directory named dogfish_dataset
Run extract_inception_features.py to extract 2048-dimensional features for all the images
Run test_bilevel_clean_label_attack.py from outside dogfish_dataset.
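A clean-label attack of this kind optimizes a poison point to collide with the target in feature space while staying close to a base example in input space. The toy sketch below uses a random linear map as a stand-in for the 2048-dimensional Inception features and plain gradient descent on the feature-collision objective; it is not this repository's code.

```python
import numpy as np

rng = np.random.RandomState(0)
W = 0.01 * rng.randn(2048, 64)   # stand-in for the Inception feature map

def features(x):
    return x @ W

def craft_poison(base, target, beta=0.1, lr=0.5, steps=300):
    """Minimize ||f(p) - f(target)||^2 + beta * ||p - base||^2 by gradient
    descent -- the feature-collision objective, in toy form."""
    p = base.copy()
    f_t = features(target)
    for _ in range(steps):
        grad = 2.0 * (features(p) - f_t) @ W.T + 2.0 * beta * (p - base)
        p -= lr * grad
    return p

base = rng.randn(2048)    # a clean base-class example (toy vector)
target = rng.randn(2048)  # the example the attacker wants misclassified
p = craft_poison(base, target)
d0 = np.linalg.norm(features(base) - features(target))
d1 = np.linalg.norm(features(p) - features(target))
print(d1 < d0)   # prints True: the poison moved toward the target in feature space
```

The `beta` term keeps the poison visually close to the base example, which is what makes the attack "clean-label": the poison can plausibly keep the base class's label.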
If you use this package, please cite:
@misc{mehra2019penalty,
  title={Penalty Method for Inversion-Free Deep Bilevel Optimization},
  author={Akshay Mehra and Jihun Hamm},
  year={2019},
  eprint={1911.03432},
  archivePrefix={arXiv},
  primaryClass={cs.LG}
}