migrate to sacremoses and add toktok tokenizer #361

Merged 2 commits on Aug 7, 2018

Conversation

keon (Contributor) commented Aug 7, 2018

The Moses tokenizer was removed from NLTK because of a licensing issue, and the current build is broken as a result.
This PR migrates to sacremoses to fix the build and resolves #306.
It also adds the toktok tokenizer, as suggested by @alvations.
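
For readers following along, here is a rough sketch of what selecting a tokenizer by name might look like after this kind of migration. It is not the diff from this PR: the function name `get_tokenizer` and the accepted names `"moses"` and `"toktok"` are assumptions made for illustration. The only library calls it relies on are `sacremoses.MosesTokenizer().tokenize` and `nltk.tokenize.toktok.ToktokTokenizer().tokenize`, and both imports are deferred so the whitespace fallback works without either package installed.

```python
def get_tokenizer(name=None):
    """Return a callable that maps a string to a list of tokens.

    Illustrative helper only; the name and the accepted values are
    assumptions, not the code merged in this PR.
    """
    if name is None:
        # Fallback: plain whitespace splitting, no third-party dependency.
        return str.split

    if name == "moses":
        try:
            # Moses now lives in the standalone sacremoses package.
            from sacremoses import MosesTokenizer
        except ImportError as err:
            raise ImportError(
                "sacremoses is required for the 'moses' tokenizer: "
                "pip install sacremoses"
            ) from err
        return MosesTokenizer().tokenize

    if name == "toktok":
        try:
            from nltk.tokenize.toktok import ToktokTokenizer
        except ImportError as err:
            raise ImportError(
                "nltk is required for the 'toktok' tokenizer: pip install nltk"
            ) from err
        return ToktokTokenizer().tokenize

    raise ValueError(f"unknown tokenizer: {name!r}")


# Whitespace fallback, no optional packages needed.
print(get_tokenizer()("Hello world"))  # ['Hello', 'world']
```

Deferring the imports keeps both packages optional, so a missing dependency only surfaces when that specific tokenizer is requested.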

@mttk mttk merged commit da8bfac into pytorch:master Aug 7, 2018
@keon keon deleted the fix branch August 7, 2018 20:37
alvations commented

BTW, you can also use Toktok as a standalone now =)

https://github.com/alvations/toktok

mttk (Contributor) commented Oct 19, 2018

@alvations thanks, will look into adding this soon :)

DavidHarrison added a commit to DavidHarrison/text that referenced this pull request Apr 25, 2019
Change documentation based on pytorch#361.
mttk pushed a commit that referenced this pull request May 7, 2019
Change documentation based on #361.
Linked issue: MosesTokenizer has been moved out of NLTK due to licensing issues