ICLR 2019 Reproducibility report: H-detach #148
Conversation
Hi, please find below a review submitted by one of the reviewers:

Score: 7

Summary: The authors re-ran all of the major experiments to verify the validity of the h-detach algorithm. They communicated with the original authors to understand the problem and reproduce the work fairly. Additionally, they provide a CUDA-based implementation of the algorithm to speed up training.

Problem Statement: The report clearly states the problem addressed by the paper and provides a summary of the method.

Code: The authors re-used the original authors' repository but modified it to run experiments with other hyperparameter values and initial seeds. The report additionally provides a CUDA-based implementation of the original work to reduce its training time.

Communication with writers: The report outlines the communication with the original authors regarding the slow training speed of the original implementation. It highlights that the implementation is slow because it processes data sequentially, unlike the vanilla LSTM implementation. The original authors relayed that this was done to ensure the correctness of the h-detach algorithm.

Hyperparameter Search: The report experimented with different seed values for the copying task and different learning rates for the sequential MNIST task, but the authors did not perform or mention any hyperparameter sweep over the h-detach probability itself, which is the core of the work. However, the report notes that the original authors did sweep the h-detach probability, although they did not mention it in the paper; the report's authors found this in the original authors' comment on the OpenReview forum.

Ablation Study: The report performed both ablation studies mentioned in the paper and replicated the results. No additional ablation study was done.

Discussion on results: The report clearly states that the authors were able to reproduce the original work.

Recommendations for reproducibility: The report recommends that the original authors try the h-detach algorithm on stacked LSTMs; the original work mentions this as a future direction and outlines several ways of doing so. Additionally, the report recommends trying the algorithm on bidirectional RNNs with LSTM cells.

Overall organization and clarity: This report does a good job of reproducing the major results of the paper. It would have been interesting to see the results of a hyperparameter sweep over the h-detach probability. The report did not reproduce the results of the transfer copying task, which evaluates the generalization capability of the h-detach algorithm.

Confidence: 4
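For readers unfamiliar with the method under review, here is a minimal PyTorch sketch of the core h-detach idea as described in the reviews: at each time step, with some probability, the gradient through the hidden state h is blocked (via `.detach()`) while the cell-state path c stays intact. The class name `HDetachLSTM`, the `p_detach` default of 0.25, and the Python-level unrolling are illustrative assumptions, not the original authors' code, which (per the reviews) also includes a faster CUDA implementation.

```python
import torch
import torch.nn as nn

class HDetachLSTM(nn.Module):
    """Illustrative sketch of an LSTM unrolled with h-detach.

    With probability p_detach at each time step, the gradient path
    through the hidden state h is blocked (h.detach()), while the
    cell-state path c is left intact. Names and structure are
    assumptions for illustration, not the authors' exact code.
    """
    def __init__(self, input_size, hidden_size, p_detach=0.25):
        super().__init__()
        self.cell = nn.LSTMCell(input_size, hidden_size)
        self.p_detach = p_detach

    def forward(self, x):  # x: (seq_len, batch, input_size)
        batch = x.size(1)
        h = x.new_zeros(batch, self.cell.hidden_size)
        c = x.new_zeros(batch, self.cell.hidden_size)
        outputs = []
        for t in range(x.size(0)):
            # Stochastically block the gradient through h (training only).
            if self.training and torch.rand(1).item() < self.p_detach:
                h = h.detach()
            h, c = self.cell(x[t], (h, c))
            outputs.append(h)
        return torch.stack(outputs), (h, c)
```

Unrolling step by step in Python like this is exactly why the reference implementation is slower than a fused vanilla LSTM, which is what motivated the report authors' CUDA version discussed above.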
Hi, please find below a review submitted by one of the reviewers:

Score: 7

More details:

Problem statement: well understood.

Code: original code used; an extension in CUDA is also proposed to speed up the algorithm and make it competitive with the vanilla LSTM implementation in PyTorch. Well documented. The CUDA version and tensorboardX are missing from the requirements.

Communication with original authors: yes, sufficient.

Hyperparameter Search: no additional hyperparameter search; replication of results using the same hyperparameters.

Ablation Study: yes, 2 ablations, already studied in the original paper (gradient clipping, and …).

Discussion on results: The authors have reproduced most of the results in the original paper. The range of parameters tested is relatively small (only 2 random initializations, 2 learning rates, etc.). More particularly, …

Recommendations for reproducibility: not many; the report only recommends extension to other architectures/types of RNNs.

Overall organization and clarity: Good, overall clear. Some grammar/typos (e.g., the last paragraph on page 4).
Hi, please find below a review submitted by one of the reviewers:

Score: 8

Confidence: 5
Meta Reviewer Decision: Accept
Submission of ICLR 2019 reproducibility report for the paper: h-detach: Modifying the LSTM Gradient Towards Better Optimization
Issue number: #53