MISSION: Ultra Large-Scale Feature Selection using Count-Sketches
An ICML 2018 paper by Amirali Aghazadeh*, Ryan Spring*, Daniel LeJeune, Gautam Dasarathy, Anshumali Shrivastava, Richard G. Baraniuk
* These authors contributed equally and are listed alphabetically.
- All data files are formatted using the VW input format
- Build executables by running Makefile
- Mission Logistic Regression
// Hyperparameters
// Size of Top-K Heap
const size_t TOPK = (1 << 14) - 1;
// Size of Count-Sketch Array
const size_t D = (1 << 18) - 1;
// Number of Arrays in Count-Sketch
const size_t N = 3;
// Learning Rate
const float LR = 5e-1;
./mission_logistic train_data test_data
- Fine-Grained Mission Softmax Regression
// Hyperparameters
// Size of Top-K Heap
const size_t TOPK = (1 << 20) - 1;
// Number of Classes
const size_t K = 193;
// Size of Count-Sketch Array
const size_t D = (1 << 24) - 1;
// Number of Arrays in Count-Sketch
const size_t N = 3;
// Learning Rate
const float LR = 1e-2;
// Length of String Feature Representation
const size_t LEN = 12;
./fine_mission_softmax train_data test_data
- Coarse-Grained Mission Softmax Regression
// Hyperparameters
// Size of Top-K Heap
const size_t TOPK = (1 << 22) - 1;
// Number of Classes
const size_t K = 193;
// Size of Count-Sketch Array
const size_t D = (1 << 24) - 1;
// Number of Arrays in Count-Sketch
const size_t N = 3;
// Learning Rate
const float LR = 1e-1;
// Length of String Feature Representation
const size_t LEN = 12;
./coarse_mission_softmax [train_data_part_1 train_data_part_2 ... train_data_part_n] test_data
- Feature Hashing Softmax Regression
// Hyperparameters
// Number of Classes
const size_t K = 193;
// Size of Count-Sketch Array
const size_t D = (1 << 24) - 1;
// Learning Rate
const float LR = 1e-2;
// Length of String Feature Representation
const size_t LEN = 12;
./softmax [train_data_part_1 train_data_part_2 ... train_data_part_n] test_data
- Mission streams in the dataset via Memory-Mapped I/O instead of loading everything directly into memory -
Necessary for Tera-Scale Datasets - AVX SIMD optimization for fast Softmax Regression
- The code is currently optimized for the Splice-Site and DNA Metagenomics datasets.
- Fine-Grained Feature Set - Each class maintains a separate feature set, so there is a top-k heap for each class.
- Coarse-Grained Feature Set - All the classes share a common set of features, so there is only one top-k heap. -
Each feature is measured by its L1 Norm for all classes. - Data Parallelism - Each worker maintains a separate heap, while aggregating gradients in the same count-sketch.