Profile-Guided Optimization (PGO) benchmark results #1426
Thanks for opening this. If I read correctly, the improvements are in the 5-10% range, correct? The reason I ask is that if you care that much about tokenizer performance (in ML it's now mostly negligible runtime since it's not Python anymore), I encourage you to look at https://github.com/microsoft/BlingFire, which claims even faster tokenization (the fastest claim I'm aware of).
In general - yes, you are right. However, in some tests, like "BPE GPT2 encode, no cache", the improvements are up to 20%.
Hmm, it's interesting. What is the current purpose of these benchmarks?
Fair point. Even if you don't want to integrate PGO into the Tokenizers build pipeline with some predefined PGO workload - that's completely fine, I understand the difficulty of that approach. At least the numbers above could be interesting for Tokenizers users who care about performance (and have no way/time/money to switch to another tokenizer implementation). I hope the results are visible enough in this issue :) Thanks a lot for the links to the other tokenizers - I will try to optimize them with PGO as well.
Well, they exist to give an idea of how tokenizers performs on a particularly useful task - not enough to guide PGO :)
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Hi!
Writing this for the record. Maybe these results will be interesting to someone who is trying to achieve better performance with `tokenizers`, since the project cares about performance. I test Profile-Guided Optimization (PGO) on different kinds of software - the current results are available here (with a lot of other PGO-related information). That's why I tried to optimize `tokenizers` with PGO too.

Test environment
I performed tests on my Linux-based machine.
OS: Linux
Tokenizers version: `main` branch on commit `f1c23b868006ee27acdd31796677f82fa10d6bd7`
Benchmarks
As a benchmark, I use the built-in benchmarks, run with the `cargo bench -- --verbose` command from the Makefile (if you want to reproduce my results, please check #1425 first). For the PGO optimization phase, I use cargo-pgo with `cargo pgo optimize bench -- --verbose`. For the PGO training phase, I use the same benchmarks with `cargo pgo bench -- --verbose`. The full command sequence I used is sketched below.
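Roughly, the whole cargo-pgo workflow looks like this (a minimal sketch; the setup commands assume cargo-pgo and the LLVM tools are not installed yet):

```bash
# One-time setup: cargo-pgo plus the LLVM tools it needs for profile handling
cargo install cargo-pgo
rustup component add llvm-tools-preview

# Baseline numbers with the built-in benchmarks (from the Makefile)
cargo bench -- --verbose

# PGO training phase: build an instrumented binary and run the same benchmarks
# to collect runtime profiles
cargo pgo bench -- --verbose

# PGO optimization phase: rebuild with the collected profiles and re-run the
# benchmarks to compare against the baseline
cargo pgo optimize bench -- --verbose
```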
Results
I got the following results:
As you can see, in general the Tokenizers' performance can be improved with PGO. I think this information could be added somewhere in the documentation, so users are aware of PGO's effect on the Tokenizers' performance and can decide whether to apply PGO to their own Tokenizers builds.
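For users who want to try this on their own builds without cargo-pgo, a minimal sketch with plain rustc flags could look like the following (the profile directory and the choice of the built-in benchmarks as the training workload are illustrative assumptions, and `llvm-profdata` is assumed to be on the PATH):

```bash
# 1. Build with instrumentation and run a representative workload to collect profiles
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo bench -- --verbose

# 2. Merge the raw .profraw files into a single profile
#    (llvm-profdata comes from a system LLVM install or rustup's llvm-tools-preview)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 3. Rebuild using the merged profile and re-run the benchmarks to compare
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo bench -- --verbose
```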
I already see some PGO mentions in the CI scripts, but it's not clear whether the Tokenizers packages are PGO-optimized or not. As far as I can tell from the build scripts, they are not (but I could be wrong - please correct me in that case).
Please treat this issue just as a benchmark report - it's not an actual error, crash, or anything like that.