Word and Constituent Boundaries in Hierarchical Multiscale Recurrent Neural Networks

Branches under development

Parser benchmark on Penn Treebank: ptb
Question Answering: train_qa

References

Corpus

Text8
Penn Treebank (partially available through NLTK)

>>> import nltk
>>> nltk.download('treebank')

Generate ground-truth boundary labels from the Penn Treebank (script under treebank/): python convert_boundary.py --path TARGET_PATH --threshold MIN_TOKENS
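As a rough illustration of what boundary labels derived from a parse can look like, here is a minimal, self-contained sketch. It is not the code in convert_boundary.py: the function name `boundary_labels`, the word-level labeling scheme (a token is marked 1 when a constituent ends at it), and the `min_tokens` parameter are all assumptions made for this example, and `min_tokens` is not claimed to match the semantics of `--threshold`.

```python
import re


def boundary_labels(bracketed, min_tokens=2):
    """Return (tokens, labels) for one bracketed parse string.

    Assumed scheme (hypothetical, for illustration only): labels[i] is 1
    when a constituent spanning at least `min_tokens` tokens ends at
    token i, and 0 otherwise.
    """
    # Split the parse into "(", ")", and bare symbols (node labels or words).
    pieces = re.findall(r"\(|\)|[^\s()]+", bracketed)
    tokens, labels = [], []
    stack = []  # index of the first token of each open constituent
    prev = None
    for piece in pieces:
        if piece == "(":
            stack.append(len(tokens))
        elif piece == ")":
            start = stack.pop()
            # Mark the last token if the closed constituent is long enough.
            if tokens and len(tokens) - start >= min_tokens:
                labels[-1] = 1
        elif prev != "(":
            # A symbol that does not directly follow "(" is a word,
            # not a node label.
            tokens.append(piece)
            labels.append(0)
        prev = piece
    return tokens, labels


tokens, labels = boundary_labels("(S (NP (DT the) (NN cat)) (VP (VBD sat)))")
print(tokens)  # ['the', 'cat', 'sat']
print(labels)  # [0, 1, 1]
```

In this example "cat" closes the two-token NP and "sat" closes the sentence, so both are marked as boundaries, while single-token constituents are ignored under the assumed `min_tokens=2`.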

Usage

Parser Benchmark

End-to-end training, testing, and evaluation on NYU HPC clusters:

sbatch ptb_pipe.sbt

Tuning configurations: modify hierarchical-rnn/config.yml
Relax, wait, and collect pickled output(s)

Updated Progress

F1 score of HM-RNN boundary detection:
1. (finished) Convert PTB parses to 1/0 boundary indicators and use them as ground-truth boundaries
2. (finished) Test trained HM-LSTM models on PTB and store the layer-wise boundary indicators
3. (finished) Calculate F1 scores of the HM-LSTM boundary indicators for some layers (TODO: plot figures)
4. (finished) Calculate BPC (bits per character, the LM evaluation metric) for these HM-LSTM models on PTB
5. Train more models; compare the correlation/trend between F1 and BPC
Statistical analysis with PCFGs from PTB:
1. (finished) Compute PCFGs from PTB
2. Pick the model whose HM-LSTM boundary indicators are the most syntactically meaningful / have the highest F1 score
3. Find out whether, and which, constituents detected by the HM-LSTM boundaries coincide with the PCFGs
QA on children's book dataset:
1. (finished) Set up data preprocessing and the pipeline to the HM-LSTM model
2. (finished) Tune to improve test precision
3. Replace self-trained embedding nets with GloVe pre-trained word embeddings
4. Beat the baseline performance of a vanilla LSTM
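
The two metrics used in the boundary-detection items above, F1 over binary boundary indicators and BPC, can be sketched as follows. This is a minimal illustration, not the repository's evaluation code: the function names `boundary_f1` and `bits_per_character` are hypothetical, and the sketch assumes gold and predicted indicators are already aligned position by position.

```python
import math


def boundary_f1(gold, pred):
    """Precision, recall, and F1 for aligned 1/0 boundary indicators."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    tp = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 1)
    fp = sum(1 for g, p in zip(gold, pred) if g == 0 and p == 1)
    fn = sum(1 for g, p in zip(gold, pred) if g == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


def bits_per_character(char_probs):
    """Mean negative log2 of the probability assigned to each true character."""
    return -sum(math.log2(p) for p in char_probs) / len(char_probs)


gold = [0, 1, 1, 0, 1]
pred = [0, 1, 0, 1, 1]
print(boundary_f1(gold, pred))          # precision, recall, and F1 are each 2/3
print(bits_per_character([0.5, 0.25]))  # 1.5
```

If a model reports its loss as mean cross-entropy in nats, dividing that value by ln 2 gives the same BPC figure.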

