BenchOpt is a package to simplify comparisons of optimization algorithms and make them more transparent and reproducible.
This benchmark is dedicated to solvers for bilevel optimization problems:

$$\min_x f(x, z^*(x)) \quad \text{with} \quad z^*(x) \in \arg\min_z g(x, z),$$

where $f$ is the outer function and $g$ is the inner function.

1 - Regularization selection

In this problem, the inner function $g$ is defined by

$$g(x, z) = \frac{1}{n}\sum_{i=1}^n \ell(d_i, z) + \mathcal{R}(x, z),$$
where $d_1, \dots, d_n$ are training data samples, $z$ are the parameters of the machine learning model, and the loss function $\ell$ measures how well the model parameters $z$ predict the data $d_i$.
There is also a regularization $\mathcal{R}$ that is parametrized by the regularization strengths $x$, which aims at promoting a certain structure on the parameters $z$.
The outer function $f$ is defined as the unregularized loss on unseen data,

$$f(x, z) = \frac{1}{m}\sum_{j=1}^m \ell(d'_j, z),$$

where $d'_1, \dots, d'_m$ are validation data samples. Two instances of this problem are implemented.
The first is a binary logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$, with features $a_i\in\mathbb{R}^p$ and binary target $y_i=\pm1$.
For this problem, the loss is $\ell(d_i, z) = \log(1+\exp(-y_i a_i^T z))$, and the regularization is simply given by
$$\mathcal{R}(x, z) = \frac12\sum_{j=1}^p\exp(x_j)z_j^2,$$
that is, each coefficient of $z$ is independently regularized with strength $\exp(x_j)$.
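For concreteness, here is a minimal NumPy sketch of the inner objective $g$ in this binary case; the names (`inner_objective`, `A`, `y`) are ours for illustration and not part of the benchmark's API.

```python
import numpy as np

def inner_objective(x, z, A, y):
    """Inner objective g(x, z): average logistic loss + weighted ridge.

    A : (n, p) feature matrix, y : (n,) labels in {-1, +1},
    z : (p,) model parameters, x : (p,) log-regularization strengths.
    """
    margins = y * (A @ z)                       # y_i * a_i^T z
    loss = np.mean(np.log1p(np.exp(-margins)))  # average logistic loss
    reg = 0.5 * np.sum(np.exp(x) * z ** 2)      # per-coefficient ridge
    return loss + reg
```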
The second is a multiclass logistic regression problem, where the data is of the form $d_i = (a_i, y_i)$, with features $a_i\in\mathbb{R}^p$ and integer target $y_i\in \{1,\dots, k\}$, where $k$ is the number of classes.
For this problem, the loss is $\ell(d_i, z) = \text{CrossEntropy}(za_i, y_i)$, where $z$ is now a $k \times p$ matrix. The regularization is given by
$$\mathcal{R}(x, z) = \frac12\sum_{j=1}^k\exp(x_j)\|z_j\|^2,$$
that is, each row $z_j$ of $z$ is independently regularized with strength $\exp(x_j)$.
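Similarly, a minimal NumPy sketch of the multiclass inner objective (labels are 0-indexed here for convenience, and all names are illustrative rather than part of the benchmark):

```python
import numpy as np

def multiclass_inner_objective(x, z, A, y):
    """Inner objective g(x, z): cross-entropy + row-wise ridge.

    A : (n, p) features, y : (n,) integer labels in {0, ..., k-1},
    z : (k, p) parameter matrix, x : (k,) log-regularization strengths.
    """
    logits = A @ z.T                                  # (n, k) scores z a_i
    logits -= logits.max(axis=1, keepdims=True)       # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    loss = -np.mean(log_probs[np.arange(len(y)), y])  # average cross-entropy
    reg = 0.5 * np.sum(np.exp(x) * np.sum(z ** 2, axis=1))  # row-wise ridge
    return loss + reg
```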
2 - Hyper data cleaning
This problem was first introduced in [Fra2017].
In this problem, the data is the MNIST dataset.
The training set has been corrupted: with probability $p$, the label of an image, $y\in\{1,\dots,10\}$, is replaced by another random label in $\{1,\dots,10\}$.
We do not know beforehand which data has been corrupted.
We have a clean testing set, which has not been corrupted.
The goal is to fit a model on the corrupted training data that performs well on the test set.
To do so, a set of weights, one per training sample, is learned jointly with the model parameters.
Ideally, corrupted samples would get a weight of 0 and uncorrupted samples a weight of 1.
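To make the corruption process concrete, here is a small NumPy sketch of it; the function name and seed handling are ours, and in this simple sketch the new random label may coincide with the original one.

```python
import numpy as np

def corrupt_labels(y, p, seed=0):
    """Return a copy of y where each label is, with probability p,
    replaced by a label drawn uniformly from {1, ..., 10}."""
    rng = np.random.default_rng(seed)
    y = y.copy()
    mask = rng.random(len(y)) < p               # samples to corrupt
    y[mask] = rng.integers(1, 11, size=mask.sum())
    return y
```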
The problem is cast as a bilevel problem with $g$ given by

$$g(x, z) = \frac{1}{n}\sum_{i=1}^n \sigma(x_i)\,\ell(d_i, z) + C\|z\|^2,$$
where the $d_i$ are the corrupted training data, $\ell$ is the loss of a CNN parameterized by $z$, $\sigma$ is the sigmoid function, and $C$ is a small regularization constant.
Here the outer variable $x$ is a vector of dimension $n$, and the weight of data $i$ is given by $\sigma(x_i)$.
The test function is

$$f(x, z) = \frac{1}{m}\sum_{j=1}^m \ell(d'_j, z),$$

where $d'_1, \dots, d'_m$ are the uncorrupted test data.
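As an illustration, here is a minimal sketch of this weighted inner objective, taking the per-sample CNN losses as given; all names are ours and the CNN itself is abstracted away.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def datacleaning_inner_objective(x, per_sample_losses, z_flat, C):
    """Inner objective g(x, z): sigmoid-weighted training loss + ridge.

    x : (n,) outer variable (one entry per training sample),
    per_sample_losses : (n,) CNN losses ell(d_i, z) on the corrupted data,
    z_flat : flattened CNN parameters, C : small regularization constant.
    """
    weights = sigmoid(x)                         # per-sample weights in (0, 1)
    weighted_loss = np.mean(weights * per_sample_losses)
    reg = C * np.sum(z_flat ** 2)                # small ridge on the parameters
    return weighted_loss + reg
```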
Install
This benchmark can be run using the following commands:
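$ pip install -U benchopt
$ git clone https://github.com/benchopt/benchmark_bilevel
$ benchopt run benchmark_bilevel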
You can also use config files to set up the benchmark run:
$ benchopt run benchmark_bilevel --config config/X.yml
where X.yml is a config file; see https://benchopt.github.io/index.html#run-a-benchmark for an example. Note that this may launch a large grid search. When available, you can instead use the file X_best_params.yml to run an experiment with a single set of parameters per solver.
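For illustration only, such a config file might look like the following; the exact keys and values are an assumption on our part (they mirror the `benchopt run` CLI options), so refer to the linked documentation for the authoritative format.

```yaml
# Hypothetical config/X.yml; keys mirror `benchopt run` CLI options.
n-repetitions: 5
max-runs: 100
solver:
  - SABA  # solver names must match those defined in the benchmark
```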
If you use this benchmark in your research project, please cite the following paper:
@inproceedings{saba,
title = {A Framework for Bilevel Optimization That Enables Stochastic and Global Variance Reduction Algorithms},
booktitle = {Advances in {{Neural Information Processing Systems}} ({{NeurIPS}})},
author = {Dagr{\'e}ou, Mathieu and Ablin, Pierre and Vaiter, Samuel and Moreau, Thomas},
year = {2022}
}