performance of countTokens #68
Hi @pczekaj. I cannot reproduce this. When I benchmark it, even with your own sample text, gpt-tokenizer comes out faster. When including other samples in the benchmark (English, Chinese, French, code), it's even faster (3.5x faster than tiktoken). How are you running the benchmark? What tool do you use to benchmark?
@niieani I'm executing it as a jest test without any special benchmark software; I don't do anything special like invoking GC or warming up. The screenshot is from the IDE, but I get similar results when executing it on the command line.
I'm only checking total execution time, I don't track memory consumption, and changing the order of the test cases didn't affect the timing.
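For reference, a minimal timing harness like the one described above can be sketched in plain Node without jest. The `countTokensStub` below is a hypothetical stand-in (a whitespace splitter) so the sketch is self-contained; in a real comparison you would swap in the actual tokenizers, e.g. `countTokens` from gpt-tokenizer:

```javascript
// Minimal wall-clock benchmark, no jest and no GC control.
// Stand-in tokenizer so the sketch runs on its own; in practice replace with:
//   const { countTokens } = require('gpt-tokenizer');
const countTokensStub = (text) => text.split(/\s+/).filter(Boolean).length;

function benchmark(label, fn, iterations = 1000) {
  // Short warm-up pass so JIT compilation doesn't dominate the measurement.
  for (let i = 0; i < 10; i++) fn();
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn();
  const elapsed = performance.now() - start;
  console.log(`${label}: ${elapsed.toFixed(1)} ms for ${iterations} iterations`);
  return elapsed;
}

const sample = 'The quick brown fox jumps over the lazy dog. '.repeat(100);
const elapsed = benchmark('countTokens (stub)', () => countTokensStub(sample));
```

Note that without a warm-up pass, the first tokenizer measured in a run pays the JIT-compilation cost, which can skew a side-by-side comparison even when total execution time is all you track.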
Okay, I've tried it. Got a couple of fixes and additional optimizations incoming... 💨
🎉 This issue has been resolved in version 2.8.0 🎉
The release is available on:
Your semantic-release bot 📦🚀
Could you try again in 2.8.0 and let me know if it's any better?
@niieani 2.8.0 is a lot faster than 2.7.0. Execution time went down from 11,440 ms to just 615 ms, which is much faster than tiktoken. Thank you very much!
Perfect! Thanks for your feedback. Regards
I'm comparing the performance of gpt-tokenizer 2.7.0 and tiktoken 1.0.17 on an Intel-based Mac with node 22.11.0. I'm always getting worse times for gpt-tokenizer than for tiktoken. Am I doing something wrong, or is this expected?