Train from scratch #170

Closed
wants to merge 6 commits into from


brandenchan
Contributor

This branch implements language model training from scratch.

@brandenchan
Contributor Author

Currently, the models are not able to overfit even small datasets.

@tholor
Member

tholor commented Dec 16, 2019

This might be caused by parameter initialization, or it may just be the normal model behaviour for this combination of dataset, drop_out, low learning rate, etc.
We should run the same training using Google's original script and check whether that one actually overfits.
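
To make the learning-rate hypothesis concrete, here is a minimal sketch of an "overfit a tiny dataset" sanity check. It is not the code from this branch and does not use a language model: it assumes a toy logistic regression on eight hand-picked points, and the names `DATA`, `sigmoid`, and `final_loss` are hypothetical. It only illustrates that, with a fixed step budget, a learning rate that is too low can leave the training loss near its starting value even on data that is trivially separable.

```python
import math

# Hypothetical toy dataset: 8 linearly separable 2-D points with binary labels.
DATA = [
    ((1.0, 1.0), 1.0), ((2.0, 1.0), 1.0), ((1.0, 2.0), 1.0), ((2.0, 2.0), 1.0),
    ((-1.0, -1.0), 0.0), ((-2.0, -1.0), 0.0), ((-1.0, -2.0), 0.0), ((-2.0, -2.0), 0.0),
]


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def final_loss(learning_rate, steps=200):
    """Train logistic regression with full-batch gradient descent and
    return the mean cross-entropy on the training set after `steps` updates."""
    w0, w1, b = 0.0, 0.0, 0.0  # zero initialization, chosen for determinism
    n = len(DATA)
    for _ in range(steps):
        g0 = g1 = gb = 0.0
        for (x0, x1), y in DATA:
            err = sigmoid(w0 * x0 + w1 * x1 + b) - y  # gradient of loss w.r.t. logit
            g0 += err * x0
            g1 += err * x1
            gb += err
        w0 -= learning_rate * g0 / n
        w1 -= learning_rate * g1 / n
        b -= learning_rate * gb / n

    eps = 1e-12  # guards against log(0)
    total = 0.0
    for (x0, x1), y in DATA:
        p = sigmoid(w0 * x0 + w1 * x1 + b)
        total -= y * math.log(p + eps) + (1.0 - y) * math.log(1.0 - p + eps)
    return total / n


if __name__ == "__main__":
    # Same data, same step budget, two learning rates.
    for lr in (1.0, 1e-4):
        print(f"lr={lr}: final training loss = {final_loss(lr):.4f}")
```

If a check like this fails to drive the loss down at any reasonable learning rate, initialization or the model code becomes the more likely suspect; if it succeeds only at higher rates, the hyperparameters are.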

Member

@tholor tholor left a comment


Looks good to me for now. This should cover the basic functionality. Upcoming PRs will add more features that make it possible to really scale this (incl. training on AWS spot instances).

@tholor
Member

tholor commented Jan 22, 2020

Closing, as this was already merged as part of #188.

@tholor tholor closed this Jan 22, 2020
3 participants