-
Notifications
You must be signed in to change notification settings - Fork 507
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Element-Research rnn #60
Comments
Great work on Element-Research/rnn! (1) I wrote it partly because I enjoy implementing things, and tend to learn a lot by doing so. It started off as a challenge to see whether I could implement RNNs without any cloning, because I always seem to run into subtle bugs with clones. It was then a fun challenge to see how efficient I could make things. (2) From my point of view, torch-rnn and rnn have slightly different design philosophies; I'd be interested to see if you agree. It seems to me that rnn is designed to be very general, with a lot of great modular pieces that give you a ton of freedom to hook up different kinds of recurrences. However this freedom comes at a price: the API has a much larger surface area, and although I haven't run benchmarks myself I imagine that it ends up being a bit more inefficient, especially with regards to memory. I think that as it stands, if you just want to use a basic RNN or LSTM on sequences, then torch-rnn is a good choice; if you want to do anything fancier and especially if you want to play around with different recurrences, then rnn is the way to go. (3) I'd be open to merging. Currently torch-rnn is in a weird state where there are some reusable modules that feel more like a library, and also an application built around those modules for character-level language modeling. I'd like for the reusable modules to be easy to use in other projects, so I had already been thinking about splitting them out into a separate "library" repo and making this repo entirely an application. Merging with rnn might be a good way to achieve this effect. Justin |
Nice to meet you @jcjohnson! (1) Clones are so evil. It took me a lot to make the clone + param sharing as efficient as possible : https://github.com/Element-Research/dpnn/blob/master/Module.lua#L32-L161. Took me too many iterations and unit tests to get that working. And I can't agree more about the learning from your own implementations bit. I am the same. (2) Yeah I just noticed that you basically implement LSTM/RNN without using any existing modules. That must have been a lot of work! It must indeed result in blazing speed as you basically keep the Lua overhead to a bare minimum. (3) I would love to merge your code into the rnn package. We could definitely benefit from the speedup. For backward-compatibility I would still need to preserve the current version of LSTM, FastLSTM and what not. But we could include your LSTM as SequenceLSTM, FastestLSTM, FullLSTM, or whatnot. Basically, I would love it if you could use rnn as your library repo. If you are up for it, just fork rnn and send a PR. If you want, I can get you commit rights to the repository. I could use the help anyway. So what do you say? |
@jcjohnson I started working on a PR to integrate LSTM. I want to modify it to work with T x N inputs instead of N x T (like the rest of rnn). Hope that works for you? |
I can definitely concur with memory problems of cloning. I've got an in-house cloning implementation that's somewhat similar to the one in In my case, it seems like cloning In any case thank you all for this wonderful work and I do look forward to having the fast clone-less LSTM modules merged into |
Although it's an amazing effort and a very advanced Given that the learning curve of Torch is already steeper than others due to Lua, Tensor interface whatnot, it should be considered to make the rnn implementation a bit simpler. In other high level frameworks like Keras and Lasagne, all you need is just one (1) class, |
@gokceneraslan Yes Also for new users, the main point of confusion is what to use to build their RNN : lstm, torch-rnn, rnn, char-rnn, Recurrent? So many RNN packages. It would be nice to consolidate the main components into a single package so as not to duplicate effort... This point was raised at the NVIDIA GTC last week. |
@jcjohnson Here is the SeqLSTM so far : Element-Research/rnn#207 . For now I made a simple unit test to match its behaviour to FastLSTM. I still need to modify it to take inputs of size seqlen x batchsize x inputsize instead of batchsize x seqlen x inputsize. Also needs benchmarks and documentation. |
Ok so SeqLSTM is merged, unit tested and documented. |
Hi @jcjohnson,
So I work on the Element-Research/rnn package. I have a couple of questions:
The last questions came out from the NVIDIA GPU Tech Conference this week. If you prefer, we can talk offline.
Regards,
Nicholas Leonard
The text was updated successfully, but these errors were encountered: