[FR] 90% speed up by refactoring and optimizing some code #385

Closed
genivia-inc opened this issue Apr 24, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@genivia-inc
Member

genivia-inc commented Apr 24, 2024

ugrep can run faster by refactoring the search logic: break up the large code block in advance() into separate functions that are dispatched more quickly, e.g. by a switch or function pointer, to skip conditionals. Breaking up this large function also helps the compiler a lot, because it can optimize several small functions better than one large function body that it has to analyze as a whole.
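
A minimal sketch of the dispatch idea, not ugrep's actual code: the `Searcher` class, its member names, and the three specializations are all hypothetical. The choice of search routine is made once when the pattern is set, so each call to `advance()` goes straight to a small specialized function instead of re-testing pattern properties.

```cpp
#include <cstddef>
#include <cstring>
#include <string>
#include <utility>

// Hypothetical illustration only; names and structure are assumptions.
struct Searcher {
  // Pointer to the specialized member function chosen for this pattern.
  using AdvanceFn = const char* (Searcher::*)(const char*, const char*) const;

  std::string needle;
  AdvanceFn advance_fn;

  explicit Searcher(std::string pat) : needle(std::move(pat)) {
    // Decide once, based on pattern characteristics.
    if (needle.empty())
      advance_fn = &Searcher::advance_empty;
    else if (needle.size() == 1)
      advance_fn = &Searcher::advance_char;
    else
      advance_fn = &Searcher::advance_string;
  }

  // Hot path: one indirect call, no per-call conditionals on the pattern.
  // Returns a pointer to the next match in [s, e), or nullptr if none.
  const char* advance(const char* s, const char* e) const {
    return (this->*advance_fn)(s, e);
  }

 private:
  // Empty pattern matches at the current position.
  const char* advance_empty(const char* s, const char*) const { return s; }

  // Single-byte pattern: delegate to memchr.
  const char* advance_char(const char* s, const char* e) const {
    return static_cast<const char*>(std::memchr(s, needle[0], e - s));
  }

  // Longer pattern: find the first byte with memchr, then verify with memcmp.
  const char* advance_string(const char* s, const char* e) const {
    const std::size_t n = needle.size();
    while (static_cast<std::size_t>(e - s) >= n) {
      s = static_cast<const char*>(std::memchr(s, needle[0], (e - s) - n + 1));
      if (s == nullptr) return nullptr;
      if (std::memcmp(s, needle.data(), n) == 0) return s;
      ++s;
    }
    return nullptr;
  }
};
```

Usage would look like `Searcher s("rol"); const char* hit = s.advance(buf, buf + len);`. Each specialization is small enough for the compiler to analyze on its own, which is the point of the refactoring described above.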

A bit of experimentation shows significant speed improvements are attainable on ARM64 NEON at least. So it is worth the effort to refactor this code that is not fully optimized by the compiler.

Even adding a dummy printf() statement makes the code run faster (!) despite the I/O overhead. So yeah, compiler optimizations aren't kicking in as much as I want them to at the moment. On a more serious note, this is not new to me. I taught graduate-level high-performance computing for several years. I will follow my own advice more closely in the next release cycles. It's just work, not difficult to do.

With these optimizations and omitting line counting when possible, such as for option -c, when searching a 13GB file we can go from

$ time ugrep -c rol en.txt
1171415
        4.54 real         2.86 user         1.40 sys

to a much lower timing

$ time ugrep -c rol en.txt
1171415
        2.40 real         0.83 user         1.39 sys

which runs about 90% faster on AArch64/NEON (4.54 s / 2.40 s ≈ 1.89× in wall-clock time). Other search options will see anywhere from 20% to 100% speedup on AArch64/NEON. Because the compiler's register allocation, instruction scheduling and alias analysis are improved, I expect these changes will also speed up searching with SSE2/AVX2. A quick test confirms this, with the same runs on Intel macOS giving a 15% speedup, and a 90% speedup when searching for the word "the".

Now I have to find time to work on this. Stay tuned!

genivia-inc added the "enhancement" (New feature or request) label Apr 24, 2024
genivia-inc changed the title from "[FR] Speeding search up by refactoring some code" to "[FR] Speed up search by refactoring some code" Apr 25, 2024
genivia-inc changed the title from "[FR] Speed up search by refactoring some code" to "[FR] 90% speed up by refactoring and optimizing some code" Apr 25, 2024
@genivia-inc
Member Author

genivia-inc commented Apr 29, 2024

OK, implemented and mostly tested over the weekend. Still some work to do. The executable is not larger, but faster. This update will be a lot faster on ARM devices that support NEON and AArch64.

  • updated SIMD algorithms
  • improved selection and specialization based on pattern characteristics
  • faster line counting, which is now super fast on NEON/AArch64 in particular, thanks to new vector code that I came up with, including a fast alternative to vaddvq_s8 for horizontal vector addition on NEON
  • fixed an obscure pattern match bug I found today while testing with a large generative test set I wrote some time ago to hit ugrep hard (that's how I found a bug in rg, which I mention in one of my articles)

All should be ready by next week to release 6.0.

@genivia-inc
Member Author

The ugrep 6.0 benchmarks are already posted: https://github.com/Genivia/ugrep-benchmarks

This shows that ugrep is one of the fastest grep tools. Please note that no grep can (or should) claim to always be the fastest, because different algorithms are involved, each with pros and cons.

Ugrep 6.0 will be released soon!
