ICLR 2019 Reproducibility report: H-detach #148
Conversation
Hi, please find below a review submitted by one of the reviewers:

Score: 7

Summary: The authors re-ran all of the major experiments to verify the validity of the h-detach algorithm. They communicated with the original authors to understand the problem and reproduce the work fairly. Additionally, they provide a CUDA-based implementation of the algorithm to speed up training.

Problem Statement: The report clearly states the problem addressed by the paper and provides a summary of the method.

Code: The authors re-used the original authors' repository but modified it to run experiments with other hyperparameter values and initial seeds. The report additionally provides a CUDA-based implementation of the original work to reduce its training time.

Communication with writers: The report outlines the communication with the original authors regarding the slow training speed of the original implementation. It highlights that the implementation is slow because it processes data sequentially, unlike the vanilla LSTM implementation. The original authors relayed that this was done to ensure the correctness of the h-detach algorithm.

Hyperparameter Search: The report experimented with different seed values for the copying task and different learning rates for the sequential MNIST task, but the authors did not perform or mention any hyperparameter sweep over the h-detach probability itself, which is the core of the work. However, the report notes that the original authors did sweep the h-detach probability, although they did not mention it in the paper; the report's authors found this in the original authors' comment on the OpenReview forum.

Ablation Study: The report performed both ablation studies mentioned in the paper and replicated the results. No additional ablation study was done.

Discussion on results: The report clearly states that the authors were able to reproduce the original work.

Recommendations for reproducibility: The report recommends that the original authors try the h-detach algorithm on stacked LSTMs; the original work mentions this as a future direction and outlines several ways of doing so. Additionally, the report recommends trying the algorithm on bidirectional RNNs with LSTM cells.

Overall organization and clarity: This report does a good job of reproducing the major results of the paper. It would have been interesting to see the results of a hyperparameter sweep over the h-detach probability. The report did not reproduce the results of the transfer copying task, which evaluates the generalization capability of the h-detach algorithm.

Confidence: 4
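For readers unfamiliar with the method under review, here is a minimal PyTorch sketch of the core h-detach idea as described in the reviews: at each time step, with some probability, the gradient through the hidden state h is blocked (via `.detach()`) while the cell-state path c stays intact. The class name `HDetachLSTM`, the `p_detach` default of 0.25, and the Python-level unrolling are illustrative assumptions, not the original authors' code, which (per the reviews) also includes a faster CUDA implementation.

```python
import torch
import torch.nn as nn

class HDetachLSTM(nn.Module):
    """Illustrative sketch of an LSTM unrolled with h-detach.

    With probability p_detach at each time step, the gradient path
    through the hidden state h is blocked (h.detach()), while the
    cell-state path c is left intact. Names and structure are
    assumptions for illustration, not the authors' exact code.
    """
    def __init__(self, input_size, hidden_size, p_detach=0.25):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.p_detach = p_detach

    def forward(self, x):  # x: (seq_len, batch, input_size)
        batch = x.size(1)
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        outputs = []
        for t in range(x.size(0)):
            # Stochastically block the gradient through h (training only).
            if self.training and torch.rand(1).item() < self.p_detach:
                h = h.detach()
            h, c = self.cell(x[t], (h, c))
            outputs.append(h)
        return torch.stack(outputs), (h, c)
```

Unrolling step by step in Python like this is exactly why the reference implementation is slower than a fused vanilla LSTM, which is what motivated the report authors' CUDA version discussed above.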
Hi, please find below a review submitted by one of the reviewers:

Score: 7

More details:

Problem statement: well understood.

Code: original code used; an extension in CUDA is also proposed to speed up the algorithm and make it competitive with the vanilla LSTM implementation in PyTorch. Well documented. The CUDA version and tensorboardX are missing from the requirements.

Communication with original authors: yes, sufficient.

Hyperparameter Search: no additional hyperparameter search; replication of results using the same hyperparameters.

Ablation Study: yes, 2 ablations, already studied in the original paper (gradient clipping, and …).

Discussion on results: The authors have reproduced most of the results in the original paper. The range of parameters tested is relatively small (only 2 random initializations, 2 learning rates, etc.). More particularly, …

Recommendations for reproducibility: not many; the report only recommends extension to other architectures/types of RNNs.

Overall organization and clarity: Good, overall clear. Some grammar/typos (e.g., the last paragraph on page 4).
Hi, please find below a review submitted by one of the reviewers:

Score: 8

Confidence: 5
Meta Reviewer Decision: Accept
Submission of ICLR 2019 reproducibility report for the paper: h-detach: Modifying the LSTM Gradient Towards Better Optimization
Issue number: #53