Profile-Guided Optimization (PGO) benchmark results #1426
Thanks for opening this. If I read correctly, the improvements are in the 5-10% range, correct? The reason I ask is that if you care that much about tokenizer performance (in ML it's now mostly negligible runtime since it's not Python anymore), I encourage you to look at https://github.com/microsoft/BlingFire, which claims even faster tokenization (the fastest claim I'm aware of).
In general - yes, you are right. However, in some tests, like "BPE GPT2 encode, no cache", the improvements are up to 20%.
Hmm, it's interesting. What is the current purpose of these benchmarks?
Fair point. Even if you don't want to integrate PGO into the Tokenizers build pipeline with some predefined PGO workload - that's completely fine, I understand the difficulty of that approach. At least the numbers above could be interesting for Tokenizers users who care about performance (and have no way/time/money to switch to another tokenizer implementation). I hope the results are visible enough in this issue :) Thanks a lot for the links to the other tokenizers - I will try to optimize them with PGO as well.
Well, they exist to give an idea of how tokenizers performs on a particularly useful task - not enough to guide PGO :)
This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 5 days.
Hi!
Writing this for the record. Maybe these results will be interesting to someone who is trying to achieve better performance with `tokenizers`, since the project cares about performance. I test Profile-Guided Optimization (PGO) on different kinds of software - the current results are available here (with a lot of other PGO-related information). That's why I tried to optimize `tokenizers` with PGO too.

Test environment
I performed tests on my Linux-based machine.
OS: Linux
Tokenizers version: `main` branch on commit `f1c23b868006ee27acdd31796677f82fa10d6bd7`
Benchmarks
As a benchmark, I use the built-in benchmarks, run with the `cargo bench -- --verbose` command from the Makefile (if you want to reproduce my results, please check #1425 first). For the PGO optimization phase, I use cargo-pgo with `cargo pgo optimize bench -- --verbose`. For the PGO training phase, I use the same benchmarks with `cargo pgo bench -- --verbose`. The full command sequence I used is sketched below.
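Roughly, the whole cargo-pgo workflow looks like this (a minimal sketch; the setup commands assume cargo-pgo and the LLVM tools are not installed yet):

```bash
# One-time setup: cargo-pgo plus the LLVM tools it needs for profile handling
cargo install cargo-pgo
rustup component add llvm-tools-preview

# Baseline numbers with the built-in benchmarks (from the Makefile)
cargo bench -- --verbose

# PGO training phase: build an instrumented binary and run the same benchmarks
# to collect runtime profiles
cargo pgo bench -- --verbose

# PGO optimization phase: rebuild with the collected profiles and re-run the
# benchmarks to compare against the baseline
cargo pgo optimize bench -- --verbose
```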
Results
I got the following results:
As you can see, in general the Tokenizers' performance can be improved with PGO. I think this information could be added somewhere in the documentation, so users are aware of PGO's effect on the Tokenizers' performance and can decide whether to apply PGO to their own Tokenizers builds.
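For users who want to try this on their own builds without cargo-pgo, a minimal sketch with plain rustc flags could look like the following (the profile directory and the choice of the built-in benchmarks as the training workload are illustrative assumptions, and `llvm-profdata` is assumed to be on the PATH):

```bash
# 1. Build with instrumentation and run a representative workload to collect profiles
RUSTFLAGS="-Cprofile-generate=/tmp/pgo-data" cargo bench -- --verbose

# 2. Merge the raw .profraw files into a single profile
#    (llvm-profdata comes from a system LLVM install or rustup's llvm-tools-preview)
llvm-profdata merge -o /tmp/pgo-data/merged.profdata /tmp/pgo-data

# 3. Rebuild using the merged profile and re-run the benchmarks to compare
RUSTFLAGS="-Cprofile-use=/tmp/pgo-data/merged.profdata" cargo bench -- --verbose
```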
I already see some PGO mentions in the CI scripts, but it's not clear whether the Tokenizers packages are PGO-optimized or not. As far as I can tell from the build scripts, they are not (but I could be wrong - please correct me in that case).
Please treat this issue just as a benchmark report - it's not an actual error, crash, or anything like that.