Train from scratch #170

brandenchan · 2019-12-12T14:33:44Z

This branch implements language model training from scratch

brandenchan · 2019-12-12T14:34:07Z

Currently, the models are not able to overfit even small datasets.

tholor · 2019-12-16T13:06:11Z

This might be caused by parameter initialization, or it might just be normal model behaviour for this combination of dataset, drop_out, low learning rate, etc.
We should run the same training with Google's original script and check whether that one actually overfits.
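
To illustrate the second hypothesis, here is a minimal sketch of how a low learning rate alone can make a model look unable to overfit within a fixed step budget. It is a toy logistic regression in plain Python, not the code from this branch and not Google's script. The function name, the dataset, and the learning rates are all assumptions chosen for illustration.

```python
import math


def train_tiny_classifier(data, lr, steps):
    """Fit a 1-feature logistic regression with full-batch gradient descent.

    Returns the mean cross-entropy loss on the training data after `steps` updates.
    """
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in data:
            p = 1.0 / (1.0 + math.exp(-(w * x + b)))
            grad_w += (p - y) * x
            grad_b += p - y
        w -= lr * grad_w / n
        b -= lr * grad_b / n

    loss = 0.0
    for x, y in data:
        p = 1.0 / (1.0 + math.exp(-(w * x + b)))
        loss -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return loss / n


# Hypothetical four-example dataset that is trivially separable.
tiny_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]

# Same model, same data, same number of steps; only the learning rate differs.
fast_loss = train_tiny_classifier(tiny_data, lr=1.0, steps=500)
slow_loss = train_tiny_classifier(tiny_data, lr=1e-4, steps=500)

print(f"lr=1.0   final train loss: {fast_loss:.4f}")
print(f"lr=1e-4  final train loss: {slow_loss:.4f}")
```

If the loss stays near its starting value only at the small learning rate, the hyperparameters are the more likely cause. If it stays high at every setting, initialization or a bug in the training loop becomes the stronger suspect.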

tholor

Looks good to me for now. This should cover the basic functionality. Upcoming PRs will add more features that make it possible to really scale this (incl. training on AWS spot instances).

tholor · 2020-01-22T14:19:01Z

Closing as this was already merged as a part of #188

brandenchan added 2 commits December 9, 2019 11:32

Start implementation

8c2de60

latest hyperparams

9e6c9c5

tholor and others added 4 commits December 16, 2019 14:27

merging latest master

35a2f6c

Add dataset checkpointing

e26a339

Fix lm pretraining

56a4d8f

rename and update basic example

aa8eecd

tholor approved these changes Jan 16, 2020


tholor closed this Jan 22, 2020
