Optimization: removed level of indirection for SubRuleContext. #173

svladykin · 2024-08-09T01:02:45Z

Description of changes:

Instead of a Double id, use SubRuleContext directly everywhere. The SubRuleContext id is now only used internally, in equals and hashCode, for backward compatibility (not sure if this is actually needed, but tests check for it). The benchmarks are not stable (they need JMH with forking, proper warmup, and multiple longer iterations for individual benchmarks), but overall I see a 3-8% throughput improvement on some benchmarks with this easy fix.
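To make the shape of the change concrete, here is a minimal sketch, assuming a simplified context class. `ContextSketch`, `GeneratorSketch`, `getRuleName` and `getContexts` are hypothetical names for illustration, not the project's actual classes or signatures. The point is only that the context object carries the rule name itself, so no id-to-name map lookup is needed, while the id survives solely for equals and hashCode.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Simplified, hypothetical stand-in for SubRuleContext; not the project's actual class.
final class ContextSketch {
    private final long id;         // kept only so equals/hashCode stay id-based
    private final Object ruleName; // reachable directly, no id -> name lookup

    ContextSketch(long id, Object ruleName) {
        this.id = id;
        this.ruleName = ruleName;
    }

    Object getRuleName() {
        return ruleName;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ContextSketch && ((ContextSketch) o).id == id;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(id);
    }
}

// Hypothetical generator: maps a rule name to the contexts created for it.
final class GeneratorSketch {
    private final Map<Object, Set<ContextSketch>> nameToContext = new ConcurrentHashMap<>();
    private long nextId; // assumption: callers serialize calls to generate()

    ContextSketch generate(Object ruleName) {
        ContextSketch context = new ContextSketch(nextId++, ruleName);
        nameToContext.computeIfAbsent(ruleName, k -> ConcurrentHashMap.newKeySet()).add(context);
        return context;
    }

    Set<ContextSketch> getContexts(Object ruleName) {
        return nameToContext.get(ruleName);
    }
}
```

With this shape, code that previously held a Double and looked the name up in a second map can hold the context and call `getRuleName()` on it.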

Using long as the id instead of double because long is typically faster and can actually produce more unique 64-bit patterns than double: for long every 64-bit pattern is a valid, distinct value, while for double this is not true (NaN).
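The NaN point can be checked with a small standalone example. The class and method names below are made up for the illustration; the only library behavior relied on is that `Double.equals` and `Double.hashCode` are defined via `Double.doubleToLongBits`, which collapses every NaN to one canonical value.

```java
import java.util.HashSet;
import java.util.Set;

// Illustration only: two different 64-bit patterns that both decode to NaN.
final class IdBitPatterns {
    static final long NAN_A = 0x7ff8000000000000L;
    static final long NAN_B = 0x7ff8000000000001L;

    // As long keys the two patterns stay distinct.
    static int distinctAsLong() {
        Set<Long> ids = new HashSet<>();
        ids.add(NAN_A);
        ids.add(NAN_B);
        return ids.size();
    }

    // As boxed Double keys both are NaN and compare equal, so they collapse.
    static int distinctAsDouble() {
        Set<Double> ids = new HashSet<>();
        ids.add(Double.longBitsToDouble(NAN_A));
        ids.add(Double.longBitsToDouble(NAN_B));
        return ids.size();
    }

    public static void main(String[] args) {
        System.out.println(distinctAsLong());   // prints 2
        System.out.println(distinctAsDouble()); // prints 1
    }
}
```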

Changed only the types but did not rename variables all over the place because that would make the patch harder to review; it can be done later as needed.

Benchmark / Performance (for source code changes):

Read 213068 events
Finding Rules...
Lots: 10000
Lots: 20000
Lots: 30000
Lots: 40000
Lots: 50000
Lots: 60000
Lots: 70000
Lots: 80000
Lots: 90000
Lots: 100000
Lots: 110000
Lots: 120000
Lots: 130000
Lots: 140000
Lots: 150000
Lots: 160000
Lots: 170000
Lots: 180000
Lots: 190000
Lots: 200000
Lots: 210000
Lines: 213068, Msec: 11459
Events/sec: 18593.9
 Rules/sec: 130157.6
Reading citylots2
Read 213068 events
EXACT events/sec: 274218.8
WILDCARD events/sec: 176526.9
PREFIX events/sec: 271078.9
PREFIX_EQUALS_IGNORE_CASE_RULES events/sec: 272465.5
SUFFIX events/sec: 260474.3
SUFFIX_EQUALS_IGNORE_CASE_RULES events/sec: 266002.5
EQUALS_IGNORE_CASE events/sec: 233883.6
NUMERIC events/sec: 145937.0
ANYTHING-BUT events/sec: 129919.5
ANYTHING-BUT-IGNORE-CASE events/sec: 142902.7
ANYTHING-BUT-PREFIX events/sec: 153507.2
ANYTHING-BUT-SUFFIX events/sec: 150365.6
ANYTHING-BUT-WILDCARD events/sec: 160322.0
COMPLEX_ARRAYS events/sec: 32078.9
PARTIAL_COMBO events/sec: 49037.5
COMBO events/sec: 20434.3

Overall the benchmark results are not conclusive across different JVM versions: the same benchmark can be noticeably better or worse, and from what I see in the main branch history they are quite flaky. Will try to improve the benchmarks in a separate PR.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

baldawar · 2024-08-29T19:45:50Z

hey @svladykin is this ready to review? it's been in draft state, which has me confused about the state of the PR.

baldawar · 2024-08-29T19:48:03Z

src/main/software/amazon/event/ruler/SubRuleContext.java

-        private final Map<Object, Set<Double>> nameToIds = new ConcurrentHashMap<>();
-        private final Map<Double, Object> idToName = new ConcurrentHashMap<>();
+        private final Map<Object, Set<SubRuleContext>> nameToContext = new ConcurrentHashMap<>();
+        private long nextId;


Should this be AtomicLong ?

From what I see the only place where it is called is GenericMachine.addPatternRule() under synchronized(this), so AtomicLong does not make much sense.
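The reasoning in the reply can be sketched as follows. `IdSource` and `MachineSketch` are hypothetical names, and the assumption (taken from the reply above) is that id allocation is only ever reached through one method that already holds the owner's monitor. Under that assumption a plain `long` increment is safe.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical names; only the locking pattern mirrors the discussion.
final class IdSource {
    private long nextId; // plain long: every caller must already hold the owner's lock

    long next() {
        return nextId++;
    }
}

final class MachineSketch {
    private final IdSource ids = new IdSource();
    private final Set<Long> issued = ConcurrentHashMap.newKeySet();

    void addRule() {
        synchronized (this) { // the single entry point serializes all id allocation
            issued.add(ids.next());
        }
    }

    int issuedCount() {
        return issued.size();
    }

    // Runs addRule() from several threads and returns how many distinct ids were issued.
    static int runConcurrently(int threads, int perThread) {
        MachineSketch machine = new MachineSketch();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    machine.addRule();
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return machine.issuedCount();
    }

    public static void main(String[] args) {
        // Every increment happens under the lock, so no id is lost or duplicated.
        System.out.println(runConcurrently(4, 10_000)); // prints 40000
    }
}
```

If a second, unsynchronized call path to the counter were ever added, this argument would no longer hold and an `AtomicLong` (or the same lock) would be needed there.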

baldawar · 2024-08-29T20:15:26Z

src/main/software/amazon/event/ruler/NameState.java

@@ -175,12 +175,11 @@ boolean isEmpty() {
     * @param pattern The pattern used by the sub-rule to transition to this NameState.
     * @param isTerminal True indicates that the sub-rule is using pattern to match on the final event field.
     */
-    void addSubRule(final Object rule, final double subRuleId, final Patterns pattern, final boolean isTerminal) {
+    void addSubRule(final Object rule, final SubRuleContext subRuleId, final Patterns pattern, final boolean isTerminal) {


wondering if, here and elsewhere in the class, rule can be fetched from subRuleId instead

This is possible, but it looks like if we do that everywhere the patch will be much larger; at the same time I don't see much improvement from this change. Either option is fine with me.

I think that's an acceptable next step for this PR, though digging into the JMH variances would be a better place to focus right now.

svladykin · 2024-08-30T04:13:04Z

@baldawar I did not like the benchmark numbers for this PR, so I submitted a second one with JMH benchmarks, assuming that I will be able to get more meaningful performance numbers here once the second PR is merged. This is why it is still in draft status.

svladykin · 2024-09-17T04:51:41Z

Rebased on top of main to check JMH throughput numbers.

baldawar · 2024-09-19T16:27:27Z

odd, perf tests are a mixed bag when compared to https://github.com/aws/event-ruler/actions/runs/10805542552/job/30018080464 . Is it because most of the time is spent in other parts of the code? (my recent tests showed ByteMachine.getTransitionOn and code tied to the exists matcher to be quite expensive)

svladykin · 2024-09-19T20:09:56Z

Yes, I don't see any throughput improvement either. We can either consider this patch to be just a code cleanup or just close it and stop spending time on it. I'm good with either option.

I also noticed that JSON parsing takes around 60% of the time for simple rules, which makes me think that using a custom binary format instead of JSON could help a lot.

timbray · 2024-09-19T21:52:15Z

FWIW, Quamina has a custom hand crafted JSON parser for events, and the benefits of that were huge. But didn't bother for rule parsing.

baldawar · 2024-09-19T22:11:04Z

useful bookmark for the custom JSON parser :

Ruler's tests make parsing look like the worst offender because we keep adding / removing JSON entries to set up the tests. In the wild, ruler is mostly limited by the time spent doing array consistency checks and parsing numbers. There's a fair amount of usage of exists and anything-but matchers, which need cleanup (to follow how the wildcard matcher was implemented). Optimizing JSON parsing will still be meaningful, but so far I've found most folks to be content with the current speed.

For this change, I think it's worth merging if we can add a test that shows SubRuleContext is now faster. A benchmark would be overkill; a unit test with a bunch of for-loops, similar to ComparableNumberTest.java, is good enough IMO.

baldawar · 2024-10-01T17:37:00Z

@svladykin let me know if you're still working on this.

svladykin · 2024-10-01T17:47:14Z

Yes, was a bit busy lately, will catch up this week.

svladykin · 2024-10-11T06:06:20Z

Added a simple performance test which takes ~2500ms on the old code and ~1400ms on the new code.
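For readers who want a feel for what such a for-loop test measures, here is a rough sketch. It is not the test added in this PR; the class name, the workload, and the sizes are assumptions chosen for illustration. It contrasts resolving a rule name through a boxed Double key in a map with reading it straight from a context object. Absolute timings depend on the JVM and machine, so none are shown.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a crude for-loop timing, not a substitute for JMH.
final class IndirectionTimingSketch {
    static final class Context {
        final Object ruleName;

        Context(Object ruleName) {
            this.ruleName = ruleName;
        }
    }

    // Old shape: every access boxes the double id and goes through a map lookup.
    static long viaDoubleIds(int n, int rounds) {
        Map<Double, Object> idToName = new HashMap<>();
        double[] ids = new double[n];
        for (int i = 0; i < n; i++) {
            ids[i] = i;
            idToName.put(ids[i], "rule" + i);
        }
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < n; i++) {
                sum += idToName.get(ids[i]).hashCode();
            }
        }
        return sum;
    }

    // New shape: the context already holds the rule name.
    static long viaContexts(int n, int rounds) {
        Context[] contexts = new Context[n];
        for (int i = 0; i < n; i++) {
            contexts[i] = new Context("rule" + i);
        }
        long sum = 0;
        for (int r = 0; r < rounds; r++) {
            for (int i = 0; i < n; i++) {
                sum += contexts[i].ruleName.hashCode();
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 100_000;
        int rounds = 200;

        long t0 = System.nanoTime();
        long a = viaDoubleIds(n, rounds);
        long t1 = System.nanoTime();
        long b = viaContexts(n, rounds);
        long t2 = System.nanoTime();

        // Both paths visit the same names, so the checksums must agree.
        System.out.println("checksums equal: " + (a == b));
        System.out.println("double ids ms: " + (t1 - t0) / 1_000_000);
        System.out.println("contexts   ms: " + (t2 - t1) / 1_000_000);
    }
}
```

Returning a checksum from each loop keeps the JIT from discarding the work, but the usual caveats about hand-rolled timing (no warmup control, no forking) still apply.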

baldawar · 2024-10-15T16:08:26Z

Thanks @svladykin

baldawar reviewed Aug 29, 2024

svladykin force-pushed the id-self branch 2 times, most recently from af1fb23 to 4801b92 Compare September 17, 2024 04:49

svladykin marked this pull request as ready for review September 17, 2024 04:50

svladykin added 3 commits October 10, 2024 22:32

Use self instead of double id for SubRuleContext

0df2606

unused import

d729190

added a simple performance test

c1d3dc9

svladykin force-pushed the id-self branch from 4801b92 to c1d3dc9 Compare October 11, 2024 06:05

baldawar enabled auto-merge (squash) October 14, 2024 23:29

baldawar approved these changes Oct 15, 2024

baldawar merged commit d098560 into aws:main Oct 15, 2024
4 checks passed
