Deep Learning Papers TLDR

My collection of notes on deep learning papers.

To take a look at some of my projects and notes on deep learning that are not directly related to literature research, go here: @episodeyang/deep_learning_notes

This repository is motivated by Andrew Ng's The Saturday Story, with the hope that eventually I will become a good deep learning researcher.

Current Week

I will keep this to-do list short; this is what I'm working on this week.

Neural Programmer-Interpreter implementation (PyTorch)

See the deep_learning_notes repo for the code.

Information Theory Notes (WIP)

  • Boltzmann, Entropy, and Kullback-Leibler Divergence

    I was inspired to understand the physics foundations of the Restricted Boltzmann Machine. In the first installment of a series of posts, I take a physicist's approach to deriving Shannon's entropy from statistical mechanics, then go on to derive various information-theoretic quantities. (Work in progress; the standard definitions involved are collected below for reference.)
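
    These are the standard textbook definitions the notes build on (Boltzmann entropy, Shannon entropy, and the KL divergence), not a summary of the post's own derivation:

    ```latex
    \begin{align}
      S &= k_B \ln W
        && \text{Boltzmann entropy over } W \text{ microstates} \\
      H(p) &= -\sum_i p_i \log p_i
        && \text{Shannon entropy of a distribution } p \\
      D_{\mathrm{KL}}(p \,\|\, q) &= \sum_i p_i \log \frac{p_i}{q_i}
        && \text{Kullback-Leibler divergence between } p \text{ and } q
    \end{align}
    ```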

Bayesian Optimization

Bayesian optimization is closely related to the statistical ideas of¹:

  • optimal design of experiments, dating back to Kirstine Smith in 1918;
  • response surface methods, dating back to Box and Wilson in 1951;
  • Bayesian optimization itself, first studied by Kushner in 1964 and then by Mockus in 1978.

Methodologically, it touches on several important machine learning areas: active learning, contextual bandits, and Bayesian nonparametrics.

  • Started receiving serious attention in ML in 2007:
    • Brochu, de Freitas & Ghosh, NIPS 2007 [preference learning]
    • Krause, Singh & Guestrin, JMLR 2008 [optimal sensor placement]
    • Srinivas, Krause, Kakade & Seeger, ICML 2010 [regret bounds]
    • Brochu, Hoffman & de Freitas, UAI 2011 [portfolios]
  • Interest exploded when it was realized that Bayesian optimization provides an excellent tool for finding good ML hyperparameters (a minimal sketch of the basic loop follows this list).
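
To make the hyperparameter use case concrete, here is a toy sketch of a 1-D Bayesian optimization loop with a Gaussian-process surrogate and an expected-improvement acquisition, using only NumPy and SciPy. The fixed RBF kernel, dense candidate grid, and the objective `f` (e.g. validation loss as a function of one hyperparameter) are illustrative assumptions, not anything from the papers above.

```python
import numpy as np
from scipy.stats import norm


def rbf(X1, X2, length=0.5):
    """Squared-exponential kernel with unit prior variance."""
    d = np.sum(X1**2, 1)[:, None] + np.sum(X2**2, 1)[None, :] - 2 * X1 @ X2.T
    return np.exp(-0.5 * d / length**2)


def bayes_opt(f, bounds=(0.0, 1.0), n_init=3, n_iter=20, noise=1e-6):
    """Minimize an expensive 1-D objective f with GP + expected improvement."""
    rng = np.random.default_rng(0)
    X = rng.uniform(*bounds, size=(n_init, 1))          # a few random starting points
    y = np.array([f(x[0]) for x in X])

    for _ in range(n_iter):
        # GP posterior mean/variance on a dense candidate grid
        Xs = np.linspace(*bounds, 200)[:, None]
        K = rbf(X, X) + noise * np.eye(len(X))
        Ks = rbf(X, Xs)
        Kinv = np.linalg.inv(K)
        mu = Ks.T @ Kinv @ y
        var = np.clip(1.0 - np.sum(Ks.T @ Kinv * Ks.T, axis=1), 1e-12, None)
        sigma = np.sqrt(var)

        # Expected improvement over the best observation so far (minimization)
        best = y.min()
        z = (best - mu) / sigma
        ei = (best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

        # Evaluate the true objective at the most promising candidate
        x_next = Xs[np.argmax(ei)]
        X = np.vstack([X, x_next[None, :]])
        y = np.append(y, f(x_next[0]))

    return X[np.argmin(y)], y.min()


# Usage: pretend validation loss is a bumpy function of a learning-rate knob.
if __name__ == "__main__":
    best_x, best_y = bayes_opt(lambda x: np.sin(6 * x) + 0.5 * x)
    print(best_x, best_y)
```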

ICLR 2017 Best Papers

Attention

Table of Contents

ICLR 2017 Best Papers

Neural Compression and Techniques

  • ICLR 2016, Han et al., Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding [pdf]

    gist: A three-stage pipeline:

    1. zero out small weights (pruning): 9x-13x compression
    2. trained quantization: 27x-31x
    3. Huffman encoding: 35x-49x

    without suffering any loss in accuracy. (The pruning stage is sketched after this list.)

  • ICLR 2017, Han et al., Dense-Sparse-Dense Training for Deep Neural Networks [pdf]

    gist: Sparse training followed by dense retraining improves network performance:

    1. train normally
    2. mask out small weights (bimodal distribution), then retrain
    3. remove the mask, re-initialize the pruned weights to zero, then retrain.

    profit: 12% absolute improvement across vision, speech, and caption tasks; 413% relative improvement.
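
To make the two gists above concrete, here is a minimal PyTorch sketch of magnitude pruning (stage 1 of Deep Compression; the quantization and Huffman-coding stages are omitted) and of the Dense-Sparse-Dense schedule that reuses it. This is not the papers' reference code: `train_epoch` is a hypothetical caller-supplied function that runs one epoch of ordinary training, and the sparsity level and epoch counts are illustrative placeholders.

```python
import torch
import torch.nn as nn
import torch.optim as optim


def magnitude_prune(model: nn.Module, sparsity: float = 0.5):
    """Zero out the smallest-magnitude weights of each Linear/Conv layer.

    Returns per-layer binary masks so they can be re-applied during the
    sparse retraining phase.
    """
    masks = {}
    for name, m in model.named_modules():
        if isinstance(m, (nn.Linear, nn.Conv2d)):
            w = m.weight.data
            threshold = w.abs().flatten().quantile(sparsity)
            mask = (w.abs() > threshold).float()
            w.mul_(mask)
            masks[name] = mask
    return masks


def dsd_train(model: nn.Module, train_epoch, sparsity: float = 0.5, epochs: int = 5):
    """Dense -> Sparse -> Dense training schedule (simplified)."""
    opt = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

    # 1. Dense: train normally.
    for _ in range(epochs):
        train_epoch(model, opt)

    # 2. Sparse: prune small weights, then retrain while re-applying the masks
    #    (here only once per epoch, a simplification of per-step masking).
    masks = magnitude_prune(model, sparsity)
    for _ in range(epochs):
        train_epoch(model, opt)
        for name, m in model.named_modules():
            if name in masks:
                m.weight.data.mul_(masks[name])

    # 3. Dense again: drop the masks -- the pruned weights restart from zero --
    #    and retrain the full network.
    for _ in range(epochs):
        train_epoch(model, opt)
```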

Sources

The sources of this repo are mostly:

  • various online reading lists
  • conferences
  • courses: probably the most important source, because they are structured
  • friends' and colleagues' recommendations

Other Repos

My lab-mate Nelson's notes can be seen here.