
Improve tokenizer performance of 20% #135

Merged 2 commits into 2.x from tokenizer-performance on Sep 1, 2017

Conversation

@goetas (Member) commented Aug 31, 2017

This is an alternative version of #130 (thanks @MichaelHeerklotz).

It takes the same approach, but it does not change any method signatures.

I've also added a basic benchmark:

Before:

Loading: 234.88136053085
Writing: 29.471714496613

After:

Loading: 189.25537729263
Writing: 30.500716209412

The performance change looks about the same (20%).
@MichaelHeerklotz, can you try it on your dataset?
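As a quick sanity check of the quoted "(20%)" figure, the relative improvement can be computed from the timings above (lower is better, so the change is `(before - after) / before`). This is just arithmetic on the numbers posted in the comment, not part of the benchmark itself:

```python
# Timings copied from the benchmark output above (units as reported).
before_loading = 234.88136053085
after_loading = 189.25537729263

# Relative improvement: how much faster loading got, as a percentage.
loading_improvement = (before_loading - after_loading) / before_loading * 100
print(f"Loading: {loading_improvement:.1f}% faster")  # comes out to roughly 19.4%, i.e. ~20%

before_writing = 29.471714496613
after_writing = 30.500716209412

# Writing is essentially unchanged (slightly slower, within noise).
writing_change = (before_writing - after_writing) / before_writing * 100
print(f"Writing: {writing_change:.1f}%")
```

This confirms the headline number: loading improves by about 19–20%, while writing stays roughly flat.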

@mattfarina (Member)

For SemVer compatibility I like the idea of not changing the method signature.

@goetas goetas merged commit b8afbae into 2.x Sep 1, 2017
@goetas goetas deleted the tokenizer-performance branch September 1, 2017 13:29
@goetas goetas mentioned this pull request Sep 1, 2017
@MichaelRoosz

looks good!

@mundschenk-at (Contributor)

Maybe a good reason for a new 2.x release? :)

@goetas (Member, Author) commented Sep 1, 2017 via email

@goetas goetas changed the title Tokenizer performance Improve tokenizer performance of 20% Sep 4, 2017
4 participants