Replace two JavaCC lexers with a single JFlex one #2113
}
}
def jflexTargetDir = file("${project.buildDir}/generated/jflex/sql")
def jflexTask = tasks.create("sqlSanitizerJflex", org.xbib.gradle.plugin.JFlexTask) {
The JFlex plugin was supposed to create a task for this automatically, but it's currently broken... I'll create an issue in their project. Meanwhile, a more manual approach still works.
// main query FROM clause
expectingTableName = true;
return false;
} else {
Would prefer to remove the redundant else when the if {} block returns.
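For illustration, a minimal Java sketch of the early-return shape, applied to a hypothetical table-name tracker loosely modeled on the expectingTableName flag in the lexer action quoted above. The class and method names (TableNameTracker, onKeyword, onIdentifier) are assumptions for this example, not code from the PR, where the equivalent logic lives in JFlex actions.

```java
// Hypothetical sketch: TableNameTracker is illustrative, not the PR's generated lexer.
public class TableNameTracker {
    boolean expectingTableName = false;
    String table = null;

    // Called for each keyword token.
    void onKeyword(String keyword) {
        if (keyword.equalsIgnoreCase("FROM")) {
            // main query FROM clause: the next identifier should be the table name
            expectingTableName = true;
            return;
        }
        // No else needed: the branch above has already returned.
        expectingTableName = false;
    }

    // Called for each identifier token; returns true once the table name is captured.
    boolean onIdentifier(String identifier) {
        if (!expectingTableName) {
            return false;
        }
        table = identifier;
        expectingTableName = false;
        return true;
    }
}
```

Feeding the tokens SELECT, name, FROM, users in that order leaves table set to "users"; the identifier before FROM is ignored because the guard clause returns early.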
I think this may be subjective, and I don't think there's an established convention in this codebase (I've seen lots of both styles). Should we add to https://github.com/open-telemetry/opentelemetry-java-instrumentation/blob/master/docs/contributing/style-guideline.md? IIRC @jkwatson has expressed this preference previously as well.
I really like this approach. I definitely appreciate that there are multiple duplicate blocks in type-specific normalizers that have collapsed into one simple statement now! 🎉
For those who wanted to see some additional context, @mateuszrzeszutek posted some benchmark results here:
#2065 (reply in thread)
I definitely glossed over the middle of the lexer impl, but didn't see anything too shocking. I'm going to also try running the other benchmarks (spring pet clinic rest + a6) and will report back.
I ran the benchmarks this afternoon and got these results:
[Benchmark result images not preserved]
And the TLAB comparison: [images not preserved: "SNAPSHOT TLABs", "Better Lexer TLABs"]
@@ -0,0 +1,378 @@
/*
While we're doing this, can we move this to the instrumentation-api folder? It isn't javaagent-specific. OK for another PR, though.
I'll move the entire db package in another PR; there are ~4 classes that need to be moved together.
if (statement == null) {
  return new SqlStatementInfo(null, null, null);
}
SqlSanitizer sanitizer = new SqlSanitizer(new java.io.StringReader(statement));
Man, if only we could replace zzBuffer with statement; reading from a Reader into another char[] is just so pointless here. Oh well: if we find we don't need to maintain the grammar much, we might vendor in the generated code and make some tweaks to it later.
Yeah, that would be awesome... and it would probably improve performance even more, since char[] copying seems to take a lot of time in Jason's perf test. Unfortunately JFlex does not allow this degree of customisation, and the generated code is pretty much unreadable.
I think we could maybe use a semi-manual approach: keep a flex grammar file plus, e.g., a patch file with customizations applied to the generated code; whenever somebody has to change something, they would have to regenerate the code and apply the patch themselves. And fix the patch when it breaks 😅
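To make the cost being discussed concrete, here is a hedged Java sketch contrasting the two input paths: copying the statement out of a StringReader into a growable char[] (roughly what a Reader-based scanner, such as a JFlex-generated one, does to fill its internal buffer) versus scanning the String in place. InputCopyDemo and its methods are hypothetical, and counting parentheses merely stands in for real tokenization.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.nio.CharBuffer;
import java.util.Arrays;

// Hypothetical illustration; not JFlex's generated code.
public class InputCopyDemo {

    // Stand-in for tokenization: counts '(' and ')' characters.
    static int countParens(CharSequence text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '(' || c == ')') {
                count++;
            }
        }
        return count;
    }

    // Reader path: every character is first copied into a separate buffer,
    // and the buffer is copied again each time it has to grow.
    static int countParensViaReader(String statement) throws IOException {
        Reader in = new StringReader(statement);
        char[] buffer = new char[16];
        int length = 0;
        int read;
        while ((read = in.read(buffer, length, buffer.length - length)) != -1) {
            length += read;
            if (length == buffer.length) {
                buffer = Arrays.copyOf(buffer, buffer.length * 2);
            }
        }
        return countParens(CharBuffer.wrap(buffer, 0, length));
    }

    // Direct path: scans the String in place, with no extra char[] allocation.
    static int countParensDirect(String statement) {
        return countParens(statement);
    }
}
```

Both paths produce the same count; the Reader path only adds allocation and copying, which is the overhead the comments above would like to eliminate.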
Thanks!
Thanks for the performance tests @breedx-splk 🙏 It's great to see how much the response times and GC have improved, and this should get even better once we introduce the cache 🎉
Those duplicates will disappear in the next PR, when I introduce one property to rule them all 😄
CLOSE_PAREN = ")"
OPEN_COMMENT = "/*"
CLOSE_COMMENT = "*/"
IDENTIFIER = ([:letter:] | "_") ([:letter:] | [0-9] | [_.])*
I'm so happy that the abomination I created to represent the unicode letter character class is gone!
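The IDENTIFIER macro can be cross-checked against a rough java.util.regex equivalent. This sketch assumes that JFlex's [:letter:] class corresponds to the Unicode letter category (\p{L} in Java regex, the same categories Character.isLetter accepts); IdentifierCheck is a hypothetical helper, not part of the PR.

```java
import java.util.regex.Pattern;

// Hypothetical helper approximating the JFlex macro
//   IDENTIFIER = ([:letter:] | "_") ([:letter:] | [0-9] | [_.])*
public class IdentifierCheck {
    // \p{L}: any Unicode letter; 0-9: ASCII digits only, as in the macro; '.' is literal here.
    private static final Pattern IDENTIFIER = Pattern.compile("[\\p{L}_][\\p{L}0-9_.]*");

    static boolean isIdentifier(String token) {
        return IDENTIFIER.matcher(token).matches();
    }
}
```

With this pattern, a dotted name such as schema.users_2 matches as one identifier, non-ASCII letters are accepted without a hand-written character class, and a token starting with a digit (2fast) is rejected.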
Branch updated from 1de7e79 to d56d116.
(Rebased to get the netty 4.1 test fix.)
👍
I'll introduce a single property to control sanitization in the next PR; this one just replaces the lexer implementation.
@breedx-splk if you find some time, could you run your petclinic benchmark against the agent built from this branch? I'd be very grateful for that 😊