# tf-hypergrad

This is a simple example of gradient-based hyperparameter optimization, as discussed in "Gradient-based Hyperparameter Optimization through Reversible Learning" by Maclaurin, Duvenaud, and Adams (arXiv:1502.03492). We consider a simple linear regression model.

We learn per-iteration step sizes and a momentum parameter by differentiating the dev-set loss. Because the dev-set loss depends on the learned model parameters, we back-propagate through the process of learning those parameters with SGD with momentum; in effect, the entire training run is represented as one unrolled computation graph. Note that the paper uses special-cased implementations of certain arithmetic operations to avoid underflow when reversing training. We do not, since we only unroll training for a relatively small number of iterations (and it would be hard to do in TensorFlow). Also, unlike the paper, this code explicitly stores a separate array of weights for every training iteration, which would be unreasonable for large problems.
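
To make the structure concrete, here is a minimal sketch of the same idea written against the TensorFlow 2 eager API (the repository itself predates it). The data, the iteration count, and names such as `log_lrs` and `momentum_logit` are illustrative assumptions, not the repository's actual code.

```python
import tensorflow as tf

# Toy data: a linear regression problem with a train split and a dev split.
tf.random.set_seed(0)
n_train, n_dev, dim = 100, 50, 5
true_w = tf.random.normal([dim, 1])
x_train = tf.random.normal([n_train, dim])
y_train = x_train @ true_w + 0.1 * tf.random.normal([n_train, 1])
x_dev = tf.random.normal([n_dev, dim])
y_dev = x_dev @ true_w + 0.1 * tf.random.normal([n_dev, 1])

T = 20  # number of unrolled training iterations

# Hyperparameters we differentiate through: one step size per iteration
# (kept positive via exp) and a single momentum coefficient (kept in (0, 1)
# via a sigmoid).
log_lrs = tf.Variable(tf.fill([T], tf.math.log(0.05)))
momentum_logit = tf.Variable(0.0)

def train_loss(w):
    return tf.reduce_mean(tf.square(x_train @ w - y_train))

def dev_loss(w):
    return tf.reduce_mean(tf.square(x_dev @ w - y_dev))

with tf.GradientTape() as hyper_tape:
    lrs = tf.exp(log_lrs)
    mu = tf.sigmoid(momentum_logit)
    # Unrolled SGD with momentum. Every intermediate w_t stays live on the
    # tape, so the final dev loss is differentiable w.r.t. the hyperparameters.
    w = tf.zeros([dim, 1])
    v = tf.zeros([dim, 1])
    for t in range(T):
        with tf.GradientTape() as inner_tape:
            inner_tape.watch(w)
            loss_t = train_loss(w)
        g = inner_tape.gradient(loss_t, w)   # training gradient at step t
        v = mu * v - lrs[t] * g              # momentum update
        w = w + v
    final_dev_loss = dev_loss(w)

# Hypergradients: d(dev loss) / d(per-step log learning rates, momentum logit).
hypergrads = hyper_tape.gradient(final_dev_loss, [log_lrs, momentum_logit])

# A meta-optimizer can then update the hyperparameters from these hypergradients.
meta_opt = tf.keras.optimizers.Adam(0.01)
meta_opt.apply_gradients(zip(hypergrads, [log_lrs, momentum_logit]))
```

Keeping every intermediate weight vector reachable from the tape is the eager-mode analogue of the separate per-iteration weight arrays mentioned above, and it carries the same memory cost.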

See the paper and the comments in the code for more explanation of the method and the specific application. This was intended mainly as a means for me to learn TensorFlow. If you have any suggestions, please let me know.

Use the --log_dir option to specify where to write logging information. You can then visualize many diagnostics in TensorBoard, including the very long computation graph that represents learning.
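
For example (the entry-point script name below is a placeholder; use the actual script in this repository):

```bash
# Run training and write summaries to a log directory (hypothetical script name).
python hypergrad.py --log_dir /tmp/hypergrad_logs

# Point TensorBoard at the same directory to browse the diagnostics.
tensorboard --logdir /tmp/hypergrad_logs
```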
