My collection of notes on deep learning papers.
To take a look at some of my projects and notes on deep learning that are not directly related to literature research, go here: @episodeyang/deep_learning_notes
This repository is motivated by Andrew Ng's The Saturday Story, with the hope that eventually I will become a good deep learning researcher.
I will keep this todo list short. This is what I'm working on this week.
see the deep_learning_notes repo for the code.
-
Boltzmann, Entropy, and Kullback-Leibler Divergence
I was inspired to understand the physics foundations of Restricted Boltzmann Machines. In the first installment of a series of posts, I take a physicist's approach and derive Shannon's entropy from statistical mechanics. I then go on to derive various information-theoretic quantities. (Work in progress)
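For quick reference, these are the standard definitions the derivation connects (a sketch, not the full argument from the post):

```latex
% Boltzmann/Gibbs entropy of a distribution {p_i} over microstates,
% with Boltzmann constant k_B (physics convention):
S = -k_B \sum_i p_i \ln p_i

% Dropping k_B and allowing an arbitrary log base gives Shannon's entropy:
H(p) = -\sum_i p_i \log p_i

% Relative entropy (Kullback-Leibler divergence) between p and q:
D_{\mathrm{KL}}(p \,\|\, q) = \sum_i p_i \log \frac{p_i}{q_i}
```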
-
Bayesian optimization is closely related to statistical ideas of:
- optimal design of experiments, dating back to Kirstine Smith in 1918.
- response surface methods, dating back to Box and Wilson in 1951.
- Bayesian optimization, studied first by Kushner in 1964 and then Mockus in 1978.
Methodologically, it touches on several important machine learning areas: active learning, contextual bandits, and Bayesian nonparametrics.
- Started receiving serious attention in ML in 2007:
- Brochu, de Freitas & Ghosh, NIPS 2007 [preference learning]
- Krause, Singh & Guestrin, JMLR 2008 [optimal sensor placement]
- Srinivas, Krause, Kakade & Seeger, ICML 2010 [regret bounds]
- Brochu, Hoffman & de Freitas, UAI 2011 [portfolios]
- Interest exploded when it was realized that Bayesian optimization provides an excellent tool for finding good ML hyperparameters.
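A minimal sketch of the loop those hyperparameter-tuning tools run, assuming a Gaussian-process surrogate (scikit-learn) and an expected-improvement acquisition function; the toy objective `f` below is a stand-in for a real validation metric:

```python
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def f(x):
    """Toy objective standing in for e.g. validation score vs. a hyperparameter."""
    return -np.sin(3 * x) - x ** 2 + 0.7 * x


def expected_improvement(x_cand, gp, y_best):
    mu, sigma = gp.predict(x_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (mu - y_best) / sigma
    return (mu - y_best) * norm.cdf(z) + sigma * norm.pdf(z)


bounds = (-2.0, 2.0)
rng = np.random.default_rng(0)
X = rng.uniform(*bounds, size=(3, 1))             # a few random initial evaluations
y = f(X).ravel()

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
for _ in range(10):                               # fit surrogate, maximize acquisition, evaluate
    gp.fit(X, y)
    cand = np.linspace(*bounds, 500).reshape(-1, 1)
    ei = expected_improvement(cand, gp, y.max())
    x_next = cand[np.argmax(ei)].reshape(1, 1)
    X = np.vstack([X, x_next])
    y = np.append(y, f(x_next).ravel())

print("best x:", X[np.argmax(y)].item(), "best f(x):", y.max())
```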
- "Semi-supervised Knowledge Transfer for Deep Learning from Private Training Data" by Nicolas Papernot, Martín Abadi, Úlfar Erlingsson, Ian Goodfellow, Kunal Talwar https://openreview.net/forum?id=HkwoSDPgg¬eId=HkwoSDPgg
-
Notes on "Making Neural Programming Architectures Generalize via Recursion" (Cai, Shin & Song, ICLR 2017)
The authors propose to augment neural programs with a key abstraction: recursion. The work applies this idea to the Neural Programmer-Interpreter of Reed & de Freitas (2016) and demonstrates superior generalization and tractability, with provable guarantees.
Recursion greatly reduces the domain of each neural program component, which in turn greatly reduces the amount of training data required and makes the learned programs easier to interpret and validate.
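As an illustrative analogy of that domain-reduction argument (my own toy example, not the paper's model): grade-school addition written recursively, where the per-digit component only ever sees 10 x 10 x 2 = 200 possible inputs and can therefore be checked exhaustively, regardless of how long the numbers are:

```python
def add1(a_digit, b_digit, carry):
    """The only 'component' that needs verifying: one digit pair plus a carry bit."""
    s = a_digit + b_digit + carry
    return s % 10, s // 10                        # (output digit, new carry)


def add(a_digits, b_digits, carry=0):
    """Add two little-endian digit lists by recursing one digit at a time."""
    if not a_digits and not b_digits:
        return [carry] if carry else []
    a, *a_rest = a_digits or [0]
    b, *b_rest = b_digits or [0]
    digit, carry = add1(a, b, carry)
    return [digit] + add(a_rest, b_rest, carry)


# 472 + 839 = 1311, with digits stored least-significant first.
assert add([2, 7, 4], [9, 3, 8]) == [1, 1, 3, 1]
```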
-
Notes on "Understanding deep learning requires rethinking generalization" Google Brain
Through experimentation, authors shows how traditional approaches to generalization failes to explain why large neural networks generalize well in practice. They also explains why deep learning requires rethinking generalization.
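A hedged miniature of the paper's randomization test, substituting scikit-learn's digits dataset and `MLPClassifier` for the paper's CIFAR-10 and convolutional networks:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
rng = np.random.default_rng(0)
y_random = rng.permutation(y)                     # destroy any input-label relationship

for name, labels in [("true labels", y), ("random labels", y_random)]:
    net = MLPClassifier(hidden_layer_sizes=(512,), alpha=0.0,
                        max_iter=2000, random_state=0)
    net.fit(X, labels)
    print(f"{name}: training accuracy = {net.score(X, labels):.3f}")
# Both runs should reach (near-)perfect training accuracy;
# only the first has any hope of generalizing.
```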
-
ICLR 2016, Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [pdf]
gist: a three-stage pipeline (sketched below):
- zero out small weights (pruning): 9x-13x
- trained quantization: 27x-31x
- Huffman encoding: 35x-49x
without suffering any loss in accuracy.
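A hedged sketch of the first two stages applied to a single weight matrix (numpy plus scikit-learn k-means standing in for the trained pipeline; the retraining steps and the final Huffman coding of indices are omitted):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 256)).astype(np.float32)   # a stand-in layer

# Stage 1: prune -- zero out small-magnitude weights (here: the bottom 90%).
threshold = np.quantile(np.abs(W), 0.90)
mask = np.abs(W) >= threshold
W_pruned = W * mask

# Stage 2: trained quantization -- cluster the surviving weights into
# 2**5 = 32 shared values, so each weight is stored as a 5-bit index.
survivors = W_pruned[mask].reshape(-1, 1)
kmeans = KMeans(n_clusters=32, n_init=10, random_state=0).fit(survivors)
codebook = kmeans.cluster_centers_.ravel()
indices = kmeans.predict(survivors)
W_quantized = np.zeros_like(W)
W_quantized[mask] = codebook[indices]

# Stage 3 (not shown): Huffman-encode the sparse structure and codebook indices.
print("nonzero fraction:", mask.mean(), "distinct values:", len(codebook))
```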
-
ICLR 2017, Han et al., Dense-Sparse-Dense Training for Deep Neural Networks [pdf]
gist: sparse training followed by dense retraining improves network performance (sketched below):
- train normally
- mask out small weights (bimodal distribution), then retrain
- remove the mask, re-initialize the pruned weights to zero, then retrain the dense network.
profit: 1-2% abs. imprv. across vision, speech and caption tasks; 4-13% rel. imprv.
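A hedged PyTorch sketch of the dense-sparse-dense schedule on a toy regression problem; the model, data, and 30% sparsity level are placeholders, not the paper's settings:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(512, 20), torch.randn(512, 1)
model = nn.Sequential(nn.Linear(20, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()


def train(model, steps, masks=None):
    opt = torch.optim.SGD(model.parameters(), lr=1e-2)
    for _ in range(steps):
        opt.zero_grad()
        loss_fn(model(X), y).backward()
        opt.step()
        if masks is not None:                     # keep pruned weights pinned at zero
            with torch.no_grad():
                for p, m in zip(model.parameters(), masks):
                    p.mul_(m)


# D: train the dense network normally.
train(model, steps=200)

# S: mask out the smallest 30% of each weight matrix, then retrain under the mask.
masks = []
with torch.no_grad():
    for p in model.parameters():
        if p.dim() < 2:                           # leave biases unpruned
            masks.append(torch.ones_like(p))
            continue
        k = int(0.3 * p.numel())
        cutoff = p.abs().flatten().kthvalue(k).values
        masks.append((p.abs() > cutoff).float())
        p.mul_(masks[-1])
train(model, steps=200, masks=masks)

# D: drop the masks (pruned weights restart from zero) and retrain the dense network.
train(model, steps=200)
```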
The sources for this repo are mostly:
- various online reading lists
- conferences
- courses: probably the most important source, because they are structured
- friends' and colleagues' recommendations