A MATLAB implementation of Sever: A Robust Meta-Algorithm for Stochastic Optimization.
This project requires installation of the following packages:
The following are different methods for filtering points.
baselineGradient.m
: A baseline that removes the points with the largest gradients.
baselineLosses.m
: A baseline that removes the points with the largest losses.
baselineOracleL2.m
: A baseline that removes the points with the largest L2 norm with respect to a given point. Can be used in either the gradient or the data space.
filterSimple.m
: Our method, which projects gradients onto the top principal component and then removes points based on their resulting magnitude.
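The projection-based filter described above can be illustrated with a minimal sketch. This is not the repository's filterSimple.m: the toy gradient matrix `G`, the removal count `numRemove`, and the squared-projection score are assumptions chosen for illustration, and the actual removal rule in the code may differ.

```matlab
% Hypothetical sketch of a projection-based filter (not the repository's filterSimple.m).
% Toy "gradients": 20 inliers on the unit circle plus 2 obvious outliers (rows 21 and 22).
t = linspace(0, 2*pi, 21)';
t(end) = [];                          % 20 equally spaced angles
G = [cos(t), sin(t); 10 10; 11 9];    % one gradient per row
n = size(G, 1);

Gc = G - mean(G, 1);                  % center the gradients
[~, ~, V] = svd(Gc, 'econ');          % right singular vectors are the principal directions
v = V(:, 1);                          % top principal component
scores = (Gc * v) .^ 2;               % squared magnitude of each projection onto v

numRemove = 2;                        % assumed number of points to remove
[~, order] = sort(scores, 'descend');
keep = true(n, 1);
keep(order(1:numRemove)) = false;     % drop the points with the largest scores

disp(find(~keep)');                   % indices of the removed points
```

On this toy data the two planted outliers dominate the top principal direction, so they receive the largest scores and are the ones removed.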
The following are the code and data for our SVM evaluation.
data
: Folder containing the two datasets, corresponding to the Enron dataset and our synthetic dataset.
diaries
: Folder containing a collection of attacks for the two datasets. Subdirectories are split first by dataset, and then by corruption fraction and the method used for generating attacks. Each of these folders contains a variety of attacks, corresponding to different hyperparameter settings during generation.
testSingleAttack.m, testSingleSuite.m, and testAll.m
: Scripts for testing a single attack, a suite of attacks (i.e., all attacks for a particular corruption fraction and generation method), and all attacks.
aggregateScores.m and evaluateDefenses.m
: Parse and set various options, and then run the actual defenses and measure their accuracy.
train.m
: Train a (non-robust) classifier.
nabla_Loss.m, nabla_Loss_multiclass.m, process.m
: Compute gradients for single and multiclass classification.
filterByClass.m
: Runs the given filter function on a specified class.
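As a rough illustration of the per-point gradients that the filters consume in the classification setting, the sketch below computes subgradients of a hinge loss for a linear classifier. The choice of hinge loss, the variable names, and the toy data are all assumptions; nabla_Loss.m may use a different loss or a different interface.

```matlab
% Hypothetical sketch: per-point hinge-loss subgradients for a linear classifier.
% Assumed loss for point i: max(0, 1 - y_i * x_i' * w), with labels in {-1, +1}.
X = [1 0; 0 1; -1 -1];          % three data points, one per row
y = [1; -1; -1];                % labels
w = [0.5; 0.5];                 % current model

margins = y .* (X * w);         % y_i * x_i' * w for each point
active = margins < 1;           % points that incur a nonzero hinge loss
grads = -(active .* y) .* X;    % subgradient -y_i * x_i on active points, zero otherwise

disp(grads);
```

Each row of `grads` is the gradient contributed by one data point, which is the form a filter that scores individual points would expect.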
The following are the code and data for our linear regression evaluation.
data
: Folder containing the drug discovery dataset.
scriptOptions
: Folder containing different choices of parameters for the attacks, tuned to attack different defenses on different datasets. testAll.m documents which parameter choice is intended to produce which outcome, and parseOptionsLinReg.m parses the options.
testAll.m
: Script for running the attacks (with options as specified in scriptOptions) against all defenses.
linReg.m
: Trains a (non-robust) linear classifier.
linRegAttack.m
: A simple data poisoning attack on linear regression, as described in the paper.
filterLinReg.m
: Runs the filter with a chosen defense on the given dataset.
robustCentering.m
: Uses robust mean estimation to robustly center the data points, as described in the paper.
compute_gradients.m
: Given a dataset, a model, and a ridge parameter, computes the gradients of the model evaluated at the data points and the ridge parameter.
squaredLoss.m
: Computes the squared loss of the model on the dataset.
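To make the gradient and loss computations concrete, here is a minimal sketch for ridge regression. It assumes the per-point loss is 0.5 * (x_i' * w - y_i)^2 + 0.5 * lambda * ||w||^2 and reports the mean squared residual; the scaling conventions, variable names, and toy data are assumptions and may not match compute_gradients.m or squaredLoss.m.

```matlab
% Hypothetical sketch: per-point ridge-regression gradients and squared loss.
% Assumed per-point loss: 0.5 * (x_i' * w - y_i)^2 + 0.5 * lambda * norm(w)^2.
X = [1 0; 0 2; 1 1];                     % three data points, one per row
y = [1; 2; 3];                           % regression targets
w = [1; 1];                              % current model
lambda = 0.1;                            % ridge parameter

residuals = X * w - y;                   % prediction errors, one per point
grads = residuals .* X + lambda * w';    % row i is (x_i' * w - y_i) * x_i + lambda * w
loss = mean(residuals .^ 2);             % mean squared residual over the dataset

disp(grads);
disp(loss);
```

Only the third point has a nonzero residual here, so its gradient row differs from the other two, which carry only the ridge term.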
Figures in the paper can be approximately reproduced by running the following scripts. Note that these scripts currently operate on pre-computed data, which we include for convenience, but could be re-computed by running the appropriate scripts in other directories.
plotEnron.m
: Plots for SVM results on the Enron dataset.
plotSVMSynthetic.m
: Plots for SVM results on the synthetic dataset.
plotFigsLinReg.m
: Plots for linear regression results on the drug discovery dataset and the synthetic dataset.
writeErrs.m
: Writes accuracies to file, for plotting by other methods.
This repository is an implementation of our paper Sever: A Robust Meta-Algorithm for Stochastic Optimization in ICML 2019, authored by Ilias Diakonikolas, Gautam Kamath, Daniel M. Kane, Jerry Li, Jacob Steinhardt, and Alistair Stewart.
If you use our code or paper, we ask that you please cite:
@inproceedings{DiakonikolasKKLSS19,
author = {Diakonikolas, Ilias and Kamath, Gautam and Kane, Daniel M. and Li, Jerry and Steinhardt, Jacob and Stewart, Alistair},
title = {Sever: A Robust Meta-Algorithm for Stochastic Optimization},
booktitle = {Proceedings of the 36th International Conference on Machine Learning},
series = {ICML '19},
year = {2019},
pages = {1596--1606},
publisher = {JMLR, Inc.}
}