Improve PCRE2 match performance for JIT and interpreted #13146

@straight-shoota straight-shoota commented Mar 3, 2023

This patch combines several independent changes that greatly improve PCRE2 match performance.

  • A global MatchContext is allocated only once and shared between all match executions. The JIT stack is assigned via a callback that returns the appropriate thread-local stack; the callback is not called if JIT is not used. This improves JIT performance.
  • MatchData is cached per instance and thread. This avoids re-allocating the backtracking stack, which improves interpreted performance.
  • Now that the pointer to MatchData no longer leaks outside the local scope, there is no need to allocate it with the GC. libpcre2 can manage its own memory, and the MatchData instances are handled in the bindings. This reduces GC pressure.
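
For illustration only, the per-instance, per-thread caching described in the second bullet could be sketched roughly as follows. This is not the code from this patch: `MatchBuffer` and `CachedPattern` are hypothetical names standing in for the natively allocated match data and the regex instance, and the sketch assumes that `Thread.current` (an undocumented part of the standard library) is acceptable as a hash key.

```crystal
# Hypothetical stand-in for a natively allocated match-data block.
# It only counts how often it gets allocated.
class MatchBuffer
  class_getter allocations : Int32 = 0

  def initialize
    @@allocations += 1
  end
end

# Hypothetical pattern object that lazily creates one buffer per thread
# and reuses it for every later match on that thread.
class CachedPattern
  @buffers = Hash(Thread, MatchBuffer).new
  @lock = Mutex.new

  # Returns the calling thread's buffer, allocating it on first use only.
  # `Thread.current` is assumed to be usable here (undocumented API).
  def buffer : MatchBuffer
    thread = Thread.current
    @lock.synchronize do
      @buffers[thread] ||= MatchBuffer.new
    end
  end
end

pattern = CachedPattern.new
first = pattern.buffer

# Repeated lookups on the same thread reuse the cached buffer.
1_000.times { pattern.buffer }

puts first.same?(pattern.buffer)
puts MatchBuffer.allocations
```

The point of the sketch is only that repeated matches on one thread reuse a single allocation rather than creating a new buffer per call, which is the effect the `0.0B/op` figures in the results below are consistent with.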

Using the benchmark program from #13144 (comment), I get these results:

$ bin/crystal run .test/bm-pcre2.cr --release             # (master)
starts_with? 284.80k (  3.51µs) (±18.92%)  20.1kB/op   7.43× slower
    matches?   2.12M (472.51ns) (±114.77%)    128B/op        fastest
$ bin/crystal run .test/bm-pcre2.cr --release             # (performance/pcre2-match_context)
starts_with?  18.26M ( 54.77ns) (±18.56%)  0.0B/op   1.52× slower
    matches?  27.67M ( 36.14ns) (±16.79%)  0.0B/op        fastest
$ bin/crystal run .test/bm-pcre2.cr --release -Duse_pcre1 # (master)
starts_with?  19.73M ( 50.67ns) (±14.41%)  16.0B/op   2.22× slower
    matches?  43.73M ( 22.87ns) (±17.42%)   0.0B/op        fastest

This shows a large improvement in match performance. The PCRE1 implementation is still significantly faster in JIT mode (matches?: 22.87ns vs. 36.14ns).
One possible factor is that the PCRE1 bindings are not thread-safe. I'll leave investigating this as a follow-up and consider the main regression resolved.

Resolves #13144

@straight-shoota straight-shoota force-pushed the performance/pcre2-match_context branch from 90e66ef to 5b41dcb Compare March 3, 2023 15:06
@straight-shoota straight-shoota marked this pull request as ready for review March 3, 2023 15:06
@straight-shoota straight-shoota added this to the 1.8.0 milestone Mar 3, 2023
@straight-shoota straight-shoota merged commit 30f5d64 into crystal-lang:master Mar 6, 2023
@straight-shoota straight-shoota deleted the performance/pcre2-match_context branch March 6, 2023 10:04
@straight-shoota straight-shoota modified the milestones: 1.8.0, 1.7.3 Mar 7, 2023
This pull request was closed.
Successfully merging this pull request may close these issues.

Regex performance regression on PCRE2