Improve PCRE2 match performance for JIT and interpreted #13146

straight-shoota · 2023-03-03T11:19:44Z

This patch combines several individual steps that together greatly improve PCRE2 match performance.

- Use a global MatchContext that is allocated only once and shared between all match executions. The JIT stack is assigned via a callback which returns the appropriate thread-local stack. The callback is not called if JIT is not used. This improves JIT performance.
- Cache MatchData per instance and thread. This avoids re-allocating the backtracking stack, which improves interpreted performance.
- Now that the pointer to MatchData does not leak outside the local scope, there is no need to allocate it with the GC. libpcre2 can manage its own memory, and the MatchData instances are handled in the bindings. This reduces GC pressure.
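
The "cache per instance and thread" idea from the second step can be illustrated with a minimal, self-contained sketch. This is not the code from this patch: `ScratchBuffer` is a hypothetical stand-in for a libpcre2 match data block, and `PerThreadCache` and `Matcher` are illustrative names invented for this example. It assumes `Thread.current` is usable as a hash key.

```crystal
# Hypothetical stand-in for an expensive-to-allocate match data block.
class ScratchBuffer
  getter data : Bytes

  def initialize(size : Int32)
    @data = Bytes.new(size)
  end
end

# Illustrative helper (not part of the patch): stores one value per thread
# and allocates it lazily on first access from that thread.
class PerThreadCache(T)
  @values : Hash(Thread, T)

  def initialize
    @values = {} of Thread => T
    @mutex = Mutex.new
  end

  # Returns the value cached for the current thread, creating it with the
  # given block if this thread has not requested one yet.
  def get(& : -> T) : T
    @mutex.synchronize do
      @values[Thread.current] ||= yield
    end
  end
end

# Each matcher instance owns its own cache, so the buffer is shared neither
# between instances nor between threads.
class Matcher
  def initialize
    @scratch = PerThreadCache(ScratchBuffer).new
  end

  def scratch : ScratchBuffer
    @scratch.get { ScratchBuffer.new(64) }
  end
end

m = Matcher.new
a = m.scratch
b = m.scratch
puts a.same?(b) # the second call reuses the buffer allocated by the first
```

In this sketch the allocation cost is paid once per instance and thread rather than on every match, which is the effect the second step describes for the backtracking stack.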

Using the benchmark program from #13144 (comment), I get these results:

```
$ bin/crystal run .test/bm-pcre2.cr --release             # (master)
starts_with? 284.80k (  3.51µs) (±18.92%)  20.1kB/op   7.43× slower
    matches?   2.12M (472.51ns) (±114.77%)    128B/op        fastest
$ bin/crystal run .test/bm-pcre2.cr --release             # (performance/pcre2-match_context)
starts_with?  18.26M ( 54.77ns) (±18.56%)  0.0B/op   1.52× slower
    matches?  27.67M ( 36.14ns) (±16.79%)  0.0B/op        fastest
$ bin/crystal run .test/bm-pcre2.cr --release -Duse_pcre1 # (master)
starts_with?  19.73M ( 50.67ns) (±14.41%)  16.0B/op   2.22× slower
    matches?  43.73M ( 22.87ns) (±17.42%)   0.0B/op        fastest
```

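This shows a great improvement in match performance: starts_with? drops from 3.51µs to 54.77ns per iteration (about 64× faster) and matches? from 472.51ns to 36.14ns (about 13× faster), with allocations falling to 0.0B/op in both cases. The PCRE1 implementation is still significantly more performant in JIT mode (matches?): 22.87ns versus 36.14ns.
A factor for this could be that the PCRE1 bindings are not thread safe. I'll leave investigation into this as a follow-up and consider the main regression resolved.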
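
The benchmark program itself lives in the linked comment on #13144 and is not reproduced here. A comparison of the same shape could be sketched roughly as follows, using the standard `Benchmark.ips` helper. The pattern and subject string are assumptions chosen for illustration, not the ones used for the numbers above.

```crystal
require "benchmark"

# Hypothetical pattern and subject; the original benchmark may use others.
regex = /foo/
subject = "foo bar baz"

Benchmark.ips do |x|
  # Anchored check through the String API.
  x.report("starts_with?") { subject.starts_with?(regex) }
  # Plain boolean match through the Regex API.
  x.report("matches?") { regex.matches?(subject) }
end
```

Building with `--release`, as in the runs above, matters for meaningful timings. The report lists iterations per second, time per iteration, relative deviation, bytes allocated per operation, and a comparison against the fastest entry.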

Resolves #13144

src/regex/lib_pcre2.cr

src/regex/pcre2.cr

Refactor MatchContext via callback

41a9830

straight-shoota added performance topic:stdlib:text labels Mar 3, 2023

straight-shoota self-assigned this Mar 3, 2023

straight-shoota marked this pull request as draft March 3, 2023 11:38

straight-shoota mentioned this pull request Mar 3, 2023

Regex performance regression on PCRE2 #13144

Closed

HertzDevil reviewed Mar 3, 2023

src/regex/lib_pcre2.cr (outdated, resolved)

straight-shoota added 3 commits March 3, 2023 16:06

Cache MatchData per instance and thread

e77be1d

Drop unnecessary general_context configuration

a9d843e

Normalize and fix all void pointer types

5b41dcb

straight-shoota force-pushed the performance/pcre2-match_context branch from 90e66ef to 5b41dcb Compare March 3, 2023 15:06

straight-shoota marked this pull request as ready for review March 3, 2023 15:06

HertzDevil reviewed Mar 3, 2023

src/regex/pcre2.cr (outdated, resolved)

Free MatchData in all threads

91d4187

HertzDevil approved these changes Mar 3, 2023

straight-shoota added this to the 1.8.0 milestone Mar 3, 2023

straight-shoota merged commit 30f5d64 into crystal-lang:master Mar 6, 2023

straight-shoota deleted the performance/pcre2-match_context branch March 6, 2023 10:04

straight-shoota added a commit that referenced this pull request Mar 7, 2023

Improve PCRE2 match performance for JIT and interpreted (#13146)

4efd3c4

straight-shoota modified the milestones: 1.8.0, 1.7.3 Mar 7, 2023

This pull request was closed.
