by Tomas Mikolov, 2010-2012
Neural network based language models are nowadays among the most successful techniques for statistical language modeling. They can be easily applied to a wide range of tasks, including automatic speech recognition and machine translation, and provide significant improvements over classic backoff n-gram models. The 'rnnlm' toolkit can be used to train, evaluate and use such models.
The goal of this toolkit is to speed up research progress in the language modeling field: first, by providing a useful implementation that demonstrates some of the principles; second, by supporting empirical experiments in speech recognition and other applications; and third, by providing strong state-of-the-art baseline results against which future research that aims to "beat state of the art techniques" should be compared.
rnnlm-0.1h - an older version of the toolkit
[rnnlm-0.4b](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz) - latest version of the toolkit
Basic examples - very useful for a quick introduction (training, evaluation, hyperparameter selection, simple n-best list rescoring, etc.) - 35MB
Advanced examples - includes large scale experiments with speech lattices (n-best list rescoring, ...) - 235MB, by Stefan Kombrink
Slides from my presentation at Google - pdf
RNNLM is now integrated into the Kaldi toolkit! Check this.
Example of data generated by a 4-gram language model, by an RNN model and by an RNNME model (all models are trained on Broadcast news data, 400M/320M words) - check which generated sentences are easier to read!
Word projections from RNN-80 and [RNN-640](http://www.fit.vutbr.cz/~imikolov/rnnlm/word_projections-640.txt.gz) models trained on Broadcast news data + tool for computing the closest words. (extra large 1600-dimensional features from 3 models are here)
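To illustrate what "computing the closest words" can look like, here is a minimal Python sketch. It is not the tool distributed with the toolkit. It assumes a plain-text projection file with one word per line followed by its vector components (`word v1 v2 ... vn`), and it assumes cosine similarity as the closeness measure; the actual file layout and the measure used by the distributed tool may differ. The function names `parse_projections`, `cosine` and `closest_words` are hypothetical.

```python
import math


def parse_projections(lines):
    """Parse lines of the ASSUMED form 'word v1 v2 ... vn' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # a zero vector has no direction; treat as dissimilar
    return dot / (norm_u * norm_v)


def closest_words(vectors, query, top_n=5):
    """Return the top_n words most similar to `query` (ASSUMED measure: cosine)."""
    target = vectors[query]
    scored = [
        (cosine(target, vec), word)
        for word, vec in vectors.items()
        if word != query
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_n]]


if __name__ == "__main__":
    # Toy 3-dimensional vectors, invented purely for illustration.
    toy_lines = [
        "king 0.9 0.1 0.0",
        "queen 0.8 0.2 0.1",
        "apple 0.0 0.1 0.9",
    ]
    vecs = parse_projections(toy_lines)
    print(closest_words(vecs, "king", top_n=2))
```

With real projection files, the lines would come from the decompressed download instead of the toy list, after checking that the file really follows the assumed layout.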
Tomas Mikolov - tmikolov@gmail.com
Stefan Kombrink - kombrink@fit.vutbr.cz
We would like to thank all who have helped us with the development of this toolkit, either by providing advice or by testing it. Special thanks go to Anoop Deoras, Sanjeev Khudanpur, Scott Novotney, Stefan Kombrink, Dan Povey, YongZhe Shi, and Geoff Zweig.
Mikolov Tomáš: Statistical Language Models based on Neural Networks. PhD thesis, Brno University of Technology, 2012. All the details that did not make it into the papers, and more results on additional tasks.
Mikolov Tomáš, Sutskever Ilya, Deoras Anoop, Le Hai-Son, Kombrink Stefan, Černocký Jan: Subword Language Modeling with Neural Networks. Not published (rejected from ICASSP 2012). Using subwords as basic units for RNNLMs has several advantages: zero OOV rate, smaller model size and better speed. Just split the infrequent words into subword units.
Mikolov Tomáš, Deoras Anoop, Povey Daniel, Burget Lukáš, Černocký Jan: Strategies for Training Large Scale Neural Network Language Models, In: Proceedings of ASRU 2011. How to train an RNN LM on a single core on 400M words in a few days, with a 1% absolute improvement in WER on a state-of-the-art setup.
Mikolov Tomáš, Kombrink Stefan, Deoras Anoop, Burget Lukáš, Černocký Jan: RNNLM - Recurrent Neural Network Language Modeling Toolkit, In: ASRU 2011 Demo Session. Brief description of the RNN LM toolkit that is available on this website.
Mikolov Tomáš, Deoras Anoop, Kombrink Stefan, Burget Lukáš, Černocký Jan: Empirical Evaluation and Combination of Advanced Language Modeling Techniques, In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT. Comparison to other LMs shows that RNN LMs are state of the art by a large margin. Improvements increase with more training data.
Kombrink Stefan, Mikolov Tomáš, Karafiát Martin, Burget Lukáš: [Recurrent Neural Network based Language Modeling in Meeting Recognition](http://www.fit.vutbr.cz/~imikolov/rnnlm/ApplicationOfRNNinMeetingRecognition_IS2011.pdf), In: Proceedings of the 12th Annual Conference of the International Speech Communication Association (INTERSPEECH 2011), Florence, IT. An easy way to adapt an RNN LM, plus speedup tricks for rescoring (can be faster than 0.05 RT).
Deoras Anoop, Mikolov Tomáš, Kombrink Stefan, Karafiát Martin, Khudanpur Sanjeev: [Variational Approximation of Long-span Language Models for LVCSR](http://www.fit.vutbr.cz/research/groups/speech/publi/2011/deoras_icassp2011_5532.pdf), In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ. An RNN LM can be approximated by an n-gram model and used directly in the decoder at no computational cost.
Mikolov Tomáš, Kombrink Stefan, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: [Extensions of Recurrent Neural Network Language Model](http://www.fit.vutbr.cz/research/groups/speech/publi/2011/mikolov_icassp2011_5528.pdf), In: Proceedings of the 2011 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2011, Prague, CZ. Better results by using backpropagation through time and better speed by using classes.
Mikolov Tomáš, Karafiát Martin, Burget Lukáš, Černocký Jan, Khudanpur Sanjeev: [Recurrent neural network based language model](http://www.fit.vutbr.cz/research/groups/speech/publi/2010/mikolov_interspeech2010_IS100722.pdf), In: Proceedings of the 11th Annual Conference of the International Speech Communication Association (INTERSPEECH 2010), Makuhari, Chiba, JP. We show that an RNN LM can be trained by simple backpropagation, contrary to popular belief.
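The subword modeling entry in the list above suggests splitting infrequent words into subword units. The Python sketch below shows one simple way this preprocessing step could look; it is not the procedure from the paper. As a simplifying assumption, every word seen fewer than `min_count` times is split into single characters, and a `+` marker (a hypothetical convention) is attached to every non-final piece so that word boundaries stay recoverable. The names `build_vocab` and `split_rare` are invented for illustration.

```python
from collections import Counter


def build_vocab(tokens, min_count):
    """Return the set of words that occur at least `min_count` times."""
    counts = Counter(tokens)
    return {word for word, count in counts.items() if count >= min_count}


def split_rare(tokens, vocab, marker="+"):
    """Keep in-vocabulary words whole; split all other words into characters.

    ASSUMPTION: characters serve as the subword units, and `marker` is
    appended to every non-final piece of a split word. This only works
    cleanly if `marker` does not occur in the text itself.
    """
    output = []
    for token in tokens:
        if not token:
            continue  # nothing to emit for an empty token
        if token in vocab:
            output.append(token)
        else:
            chars = list(token)
            output.extend(ch + marker for ch in chars[:-1])
            output.append(chars[-1])
    return output


if __name__ == "__main__":
    # Toy training text, invented for illustration.
    train = "the cat sat on the mat the cat".split()
    vocab = build_vocab(train, min_count=2)  # keeps "the" and "cat"
    print(split_rare("the zebra sat".split(), vocab))
```

Because every unseen word can be written with pieces from a small, closed inventory, the resulting training text has no out-of-vocabulary tokens as long as all characters were seen in training.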