Replace two JavaCC lexers with a single JFlex one #2113

mateuszrzeszutek · 2021-01-25T17:09:07Z

I'll introduce a single property to control the sanitization in the next PR; this one just replaces the lexer implementation.

@breedx-splk if you find some time, could you run your petclinic benchmark against the agent built from this branch? I'd be very grateful for that 😊

mateuszrzeszutek · 2021-01-25T17:10:33Z

javaagent-api/javaagent-api.gradle

-    }
-  }
+def jflexTargetDir = file"${project.buildDir}/generated/jflex/sql"
+def jflexTask = tasks.create("sqlSanitizerJflex", org.xbib.gradle.plugin.JFlexTask) {


The jflex plugin was supposed to automatically create a task for this, but it's currently broken... I'll create an issue in their project. Meanwhile, a more manual approach still works.

breedx-splk · 2021-01-25T23:24:42Z

javaagent-api/src/main/jflex/SqlSanitizer.flex

+        // main query FROM clause
+        expectingTableName = true;
+        return false;
+      } else {


Would prefer to remove the redundant else when the if {} block returns.

I think this may be subjective, and I don't think there's an established convention in this codebase (I've seen lots of both styles). Should we add to https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/docs/contributing/style-guideline.md? IIRC @jkwatson has expressed this preference previously as well.
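For anyone weighing the two styles, here is a minimal sketch of what the preference amounts to. The class and method names are hypothetical and are not taken from SqlSanitizer.flex; the point is only that both forms behave identically, so the choice is purely stylistic.

```java
// Hypothetical illustration only; none of these names come from SqlSanitizer.flex.
final class ElseStyleExample {

  // Style A: keeps the `else` even though the `if` branch already returns.
  static String describeWithElse(int parenLevel) {
    if (parenLevel == 0) {
      return "main query";
    } else {
      return "subquery";
    }
  }

  // Style B: drops the redundant `else`; the behavior is identical.
  static String describeWithoutElse(int parenLevel) {
    if (parenLevel == 0) {
      return "main query";
    }
    return "subquery";
  }

  public static void main(String[] args) {
    for (int level = 0; level < 3; level++) {
      System.out.println(
          level + ": " + describeWithElse(level) + " / " + describeWithoutElse(level));
    }
  }
}
```

Style B saves one level of nesting, which is presumably why some reviewers prefer it; Style A keeps the two branches visually symmetric.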

breedx-splk

I really like this approach. I definitely appreciate that there are multiple duplicate blocks in type-specific normalizers that have collapsed into one simple statement now! 🎉

For those who wanted to see some additional context, @mateuszrzeszutek posted some benchmark results here:
#2065 (reply in thread)

I definitely glossed over the middle of the lexer impl, but didn't see anything too shocking. I'm going to also try running the other benchmarks (spring pet clinic rest + a6) and will report back.

breedx-splk · 2021-01-26T00:02:58Z

I ran the benchmarks this afternoon and got these results. Looking great! 🎉

| Which jar | avg (ms) | p95 (ms) |
|---|---|---|
| No agent | 77.91 | 103.46 |
| Latest snap (no cache, no jflex) | 314.61 | 499.00 |
| Latest snap normalizer disabled | 200.10 | 306.28 |
| Better-lexer | 128.27 | 265.94 |
| Better-lexer normalizer disabled | 101.03 | 148.36 |
| Better-lexer jdbc inst disabled | 80.17 | 99.85 |

breedx-splk · 2021-01-26T00:24:52Z

I also looked at GC. The snapshot build had 85 GCs and this branch only had 29 GCs. The total time in stop-the-world GC decreased from 0.578s in the snapshot to 0.403s in this branch.

More data from the benchmarks:

Latest SNAPSHOT GC

Better-lexer branch

breedx-splk · 2021-01-26T01:04:47Z

And with the profile config, we can see a very significant reduction in TLABs, and therefore in allocations! I guess it was a little surprising to me, because the app does so much HTTP as well (and I expected header and body parsing to also be hot), but the top methods clearly show that sanitization is a 🔥 path.

SNAPSHOT TLABs

Better Lexer TLABs

anuraaga · 2021-01-26T02:26:56Z

javaagent-api/src/main/jflex/SqlSanitizer.flex

@@ -0,0 +1,378 @@
+/*


While we're doing this, can we move to instrumentation-api folder? This isn't javaagent-specific. OK for another PR though.

I'll move the entire db package in another PR - there are ~4 classes that need to be moved together.

anuraaga · 2021-01-26T02:44:45Z

javaagent-api/src/main/jflex/SqlSanitizer.flex

+    if (statement == null) {
+        return new SqlStatementInfo(null, null, null);
+    }
+    SqlSanitizer sanitizer = new SqlSanitizer(new java.io.StringReader(statement));


Man, if only we could replace zzBuffer with statement, reading from Reader into another char[] is just so pointless here. Oh well, if we find we don't need to maintain the grammar much, we might vendor in the generated code and make some tweaks to it later.

Yeah, that would be awesome... and probably would improve the performance even more, since char[] copying seems to take a lot of time in Jason's perf test. Unfortunately jflex does not allow this degree of customisation.

The generated code is pretty much unreadable.
I think that maybe we could use a semi-manual approach: keep a flex grammar file + e.g. patch file with customizations applied to the generated code; whenever somebody has to change something they would have to generate+apply changes by themselves. And fix the patch when it breaks 😅
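To make the copying concern concrete, here is a rough sketch of what happens when a String is fed to a lexer through a Reader. The `fillBuffer` method is a simplified, hypothetical stand-in for a generated lexer's refill logic and is not JFlex's actual code; it only shows that every character of the statement ends up copied into a second `char[]`.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Arrays;

// Hypothetical sketch; this is NOT the code JFlex generates.
final class ReaderCopyExample {

  // Drains the reader into a growing char[] buffer, the way a lexer that
  // only accepts a Reader has to, even when the input is already in memory.
  static char[] fillBuffer(Reader reader) {
    char[] buffer = new char[16];
    int length = 0;
    try {
      int read;
      while ((read = reader.read(buffer, length, buffer.length - length)) != -1) {
        length += read;
        if (length == buffer.length) {
          // Buffer is full: grow it, which copies everything read so far again.
          buffer = Arrays.copyOf(buffer, buffer.length * 2);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return Arrays.copyOf(buffer, length);
  }

  public static void main(String[] args) {
    String statement = "SELECT * FROM users WHERE id = 42";
    char[] copied = fillBuffer(new StringReader(statement));
    System.out.println("chars copied: " + copied.length);
    System.out.println("same content: " + new String(copied).equals(statement));
  }
}
```

A lexer that could scan the statement's characters in place would skip this work entirely, which is the tweak being wished for above.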

anuraaga

Thanks!

mateuszrzeszutek · 2021-01-26T11:27:30Z

Thanks for performance tests @breedx-splk 🙏 It's great to see how much the response times & GC have improved - and this should get even better once we introduce the cache 🎉

I definitely appreciate that there are multiple duplicate blocks in type-specific normalizers that have collapsed into one simple statement now!

Those duplicates will disappear in the next PR, when I introduce one property to rule them all 😄

johnbley · 2021-01-26T15:47:10Z

javaagent-api/src/main/jflex/SqlSanitizer.flex

+CLOSE_PAREN       = ")"
+OPEN_COMMENT      = "/*"
+CLOSE_COMMENT     = "*/"
+IDENTIFIER        = ([:letter:] | "_") ([:letter:] | [0-9] | [_.])*


I'm so happy that the abomination I created to represent the unicode letter character class is gone!
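For readers unfamiliar with JFlex macros, the IDENTIFIER rule above can be approximated with a plain `java.util.regex` pattern. This is only a sketch: it assumes that JFlex's `[:letter:]` corresponds to Unicode letters (`\p{L}` in Java regex), and the class name is made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical approximation of the IDENTIFIER macro; not part of the PR.
final class IdentifierPatternExample {

  // First char: a letter or underscore. Rest: letters, digits, '_' or '.'.
  // Assumption: \p{L} stands in for JFlex's [:letter:] class.
  static final Pattern IDENTIFIER = Pattern.compile("[\\p{L}_][\\p{L}0-9_.]*");

  static boolean isIdentifier(String candidate) {
    return IDENTIFIER.matcher(candidate).matches();
  }

  public static void main(String[] args) {
    String[] samples = {"users", "schema.table_1", "_tmp", "zażółć", "1abc", ""};
    for (String sample : samples) {
      System.out.println("'" + sample + "' -> " + isIdentifier(sample));
    }
  }
}
```

Because `.` is allowed after the first character, a qualified name such as `schema.table_1` is matched as one identifier rather than as separate tokens.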

mateuszrzeszutek · 2021-01-26T18:17:19Z

(Rebased to get the netty 4.1 test fix)

trask

👍

mateuszrzeszutek requested review from johnbley and breedx-splk January 25, 2021 17:09

mateuszrzeszutek requested review from anuraaga, iNikem, jkwatson, pavolloffay, trask and tylerbenson as code owners January 25, 2021 17:09

breedx-splk reviewed Jan 25, 2021

breedx-splk approved these changes Jan 25, 2021

anuraaga reviewed Jan 26, 2021

Base automatically changed from master to main January 26, 2021 05:50

iNikem approved these changes Jan 26, 2021

anuraaga approved these changes Jan 26, 2021

mateuszrzeszutek mentioned this pull request Jan 26, 2021

feat(cassandra4): more attributes #1314

Merged

johnbley approved these changes Jan 26, 2021

Mateusz Rzeszutek added 4 commits January 26, 2021 19:16

Replace two JavaCC lexers with a single JFlex one

27ff463

Apply code review suggestions

eb2f516

spotless

c3bfb99

Fix tests

d56d116

mateuszrzeszutek force-pushed the better-lexer branch from 1de7e79 to d56d116 Compare January 26, 2021 18:16

trask approved these changes Jan 26, 2021

trask merged commit 20dadc1 into open-telemetry:main Jan 26, 2021

mateuszrzeszutek mentioned this pull request Jan 27, 2021

Use a single configuration property for db.statement sanitization #2125

Merged

mateuszrzeszutek deleted the better-lexer branch February 5, 2021 11:39
