This is the PyTorch implementation of our paper "How Do Adam and Training Strategies Help BNNs Optimization?", published at ICML 2021.
In this work, we explore the intrinsic reasons why Adam is superior to other optimizers such as SGD for BNN optimization, and provide analytical explanations that support specific training strategies. By visualizing the optimization trajectory, we show that BNN optimization takes place in an extremely rugged loss landscape, and that the second-order momentum in Adam is crucial for revitalizing weights that become dead due to activation saturation in BNNs. Based on this analysis, we derive a specific training scheme and achieve 70.5% top-1 accuracy on the ImageNet dataset, 1.1% higher than ReActNet with the same architecture.
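To make these two effects concrete, here is a minimal, self-contained sketch (not the released training code; the `BinaryActivation` class and the numbers are illustrative only): a clipped straight-through estimator zeroes the gradient of saturated activations, while Adam's division by the second-moment estimate restores near-full-size updates for weights that only receive tiny gradients.

```python
import torch

class BinaryActivation(torch.autograd.Function):
    """Sign activation with a clipped straight-through estimator (STE):
    inputs outside [-1, 1] receive zero gradient ("activation saturation")."""

    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)

    @staticmethod
    def backward(ctx, grad_output):
        x, = ctx.saved_tensors
        return grad_output * (x.abs() <= 1).float()

# (i) Saturation: the second pre-activation lies outside [-1, 1], so the
# corresponding weight receives no gradient and can become "dead" under SGD.
w = torch.tensor([0.1, 2.5], requires_grad=True)
BinaryActivation.apply(w).sum().backward()
print(w.grad)  # tensor([1., 0.])

# (ii) Adam's update lr * m / (sqrt(v) + eps) normalizes per-parameter step
# sizes: a weight that only receives tiny gradients still takes a near
# full-size step, whereas SGD scales the step directly with the gradient.
lr, eps = 1e-3, 1e-8
for g in (1e-4, 1e-1):            # a "nearly dead" and a healthy gradient
    m, v = g, g * g               # steady-state first / second moments
    print("SGD step:", lr * g, "| Adam step:", lr * m / (v ** 0.5 + eps))
```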
If you find our code useful for your research, please consider citing:
@inproceedings{liu2021how,
  title = {How do {Adam} and training strategies help {BNNs} optimization?},
  author = {Liu, Zechun and Shen, Zhiqiang and Li, Shichao and Helwegen, Koen and Huang, Dong and Cheng, Kwang-Ting},
  booktitle = {International Conference on Machine Learning},
  year = {2021},
  organization = {PMLR}
}
- Python 3, PyTorch 1.7.1, torchvision 0.8.2
- Download the ImageNet dataset (a data-loading sketch follows below)
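For reference, a standard torchvision pipeline for loading ImageNet looks like the sketch below; the dataset path is a placeholder, and the actual `run.sh` scripts configure data loading through their own arguments.

```python
import torchvision.datasets as datasets
import torchvision.transforms as transforms
from torch.utils.data import DataLoader

# Standard ImageNet preprocessing; the dataset root below is a placeholder.
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
train_set = datasets.ImageFolder(
    "/path/to/imagenet/train",          # placeholder path
    transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        normalize,
    ]))
train_loader = DataLoader(train_set, batch_size=256, shuffle=True,
                          num_workers=8, pin_memory=True)
```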
(1) Step 1: binarizing activations
- Change directory to `./step1/`
- Run `bash run.sh`

(2) Step 2: binarizing weights + activations (a sketch of how the two steps connect follows this list)
- Change directory to `./step2/`
- Run `bash run.sh`
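The sketch below illustrates how the two steps relate under assumed placeholder names (`build_reactnet_a`, the `binarize_weights` flag, and the checkpoint filename are not the repo's actual API): step 2 binarizes the weights of a network whose activations were already binarized in step 1, starting from the step-1 checkpoint.

```python
import torch
from torch import nn

def build_reactnet_a(binarize_weights: bool) -> nn.Module:
    # Stand-in for the repo's ReActNet-A definition; the flag only marks
    # whether weight binarization is switched on (step 2) or off (step 1).
    return nn.Sequential(nn.Conv2d(3, 8, kernel_size=3), nn.Hardtanh())

# Step 1: binary activations, real-valued weights, trained from scratch.
model_step1 = build_reactnet_a(binarize_weights=False)
# ... train with Adam inside ./step1/, then save the checkpoint:
torch.save(model_step1.state_dict(), "step1_checkpoint.pth.tar")

# Step 2: binarize the weights as well, initialized from the step-1 weights.
model_step2 = build_reactnet_a(binarize_weights=True)
state = torch.load("step1_checkpoint.pth.tar", map_location="cpu")
model_step2.load_state_dict(state, strict=False)
# ... continue training with Adam inside ./step2/.
```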
| Method | Backbone | Top-1 Acc. | FLOPs | Trained Model |
| --- | --- | --- | --- | --- |
| ReActNet | ReActNet-A | 69.4% | 0.87 x 10^8 | Model-ReAct |
| AdamBNN | ReActNet-A | 70.5% | 0.87 x 10^8 | Model-ReAct-AdamBNN-Training |
Zechun Liu, HKUST and CMU (zliubq at connect.ust.hk / zechunl at andrew.cmu.edu)
Zhiqiang Shen, CMU (zhiqians at andrew.cmu.edu)