Combine date processor patterns into single parser #83942
base: main
Pinging @elastic/es-data-management (Team:Data Management)
Hi @danhermann, I've created a changelog YAML for you.
Mostly LGTM -- but even so I don't think it's fair for me to give a load-bearing +1 on this.
@@ -72,10 +72,22 @@
this.targetField = targetField;
this.formats = formats;
this.dateParsers = new ArrayList<>(this.formats.size());
List<String> javaFormats = new ArrayList<>(this.formats.size());
Nit: this can be final
for (String format : formats) {
DateFormat dateFormat = DateFormat.fromString(format);
dateParsers.add((params) -> dateFormat.getFunction(format, newDateTimeZone(params), newLocale(params)));
if (DateFormat.Java == dateFormat) {
javaFormats.add(format);
It's probably worth doing our future selves a solid and adding a comment that explains what this is doing and why. Maybe something like:
// pull out the java formats separately so they can all be processed as a single combined date parser (see below)
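A sketch of the partitioning step under review, with the suggested comment and the `final` nit applied. It assumes, as a simplification, that the only named formats are ISO8601, UNIX, UNIX_MS and TAI64N and that every other string is treated as a Java time pattern; `FormatPartition` and `NAMED_FORMATS` are hypothetical names, not the PR's code.

```java
import java.util.ArrayList;
import java.util.List;

final class FormatPartition {
    // Hypothetical stand-in for the named formats that DateFormat.fromString is assumed to
    // recognize; any other string is treated as a Java time pattern.
    private static final List<String> NAMED_FORMATS = List.of("ISO8601", "UNIX", "UNIX_MS", "TAI64N");

    static boolean isJavaPattern(String format) {
        return !NAMED_FORMATS.contains(format);
    }

    // Returns [javaFormats, namedFormats], each preserving the order of the input list.
    static List<List<String>> partition(List<String> formats) {
        final List<String> javaFormats = new ArrayList<>(formats.size());
        final List<String> namedFormats = new ArrayList<>();
        for (String format : formats) {
            // pull out the java formats separately so they can all be processed
            // as a single combined date parser
            if (isJavaPattern(format)) {
                javaFormats.add(format);
            } else {
                namedFormats.add(format);
            }
        }
        return List.of(javaFormats, namedFormats);
    }

    public static void main(String[] args) {
        List<List<String>> parts = partition(List.of("yyyy-MM-dd", "UNIX", "dd/MM/yyyy"));
        System.out.println(parts.get(0)); // prints [yyyy-MM-dd, dd/MM/yyyy]
        System.out.println(parts.get(1)); // prints [UNIX]
    }
}
```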
LGTM (but, like I said, I think you should get a second set of 👀 on this)
The original motivation for this PR is the poor performance of the date processor when multiple formats are specified. For each format that fails to match the input, an exception is thrown by the JavaDateFormatter::doParse method (despite its somewhat confusing javadoc claim that it does not throw), and profiling has shown that to be quite expensive in a number of common use cases. One potential solution, proposed in #83801, is to provide a method that really does not throw exceptions on parsing failures. That solution would resolve all the performance problems with exceptions thrown for date parsing failures, with no change in behavior for the date processor. Another approach, proposed originally by @joegallo, groups all the Java time formats specified in the date processor's
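To make the per-format cost concrete, here is a simplified sketch in plain java.time (not the Elasticsearch classes) of the behavior described above: each pattern gets its own independent formatter, and every mismatch is reported through a thrown and caught DateTimeParseException before the next pattern is tried. `PerFormatParsing` and its counter are hypothetical, for illustration only.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.List;

final class PerFormatParsing {
    // Counts how many DateTimeParseExceptions were thrown (and caught) along the way.
    static int exceptionsThrown = 0;

    // One independent parser per format: every pattern that fails to match costs one
    // thrown exception before the next pattern is tried.
    static LocalDate parseWithSeparateParsers(String input, List<String> patterns) {
        DateTimeParseException lastFailure = null;
        for (String pattern : patterns) {
            DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern);
            try {
                return LocalDate.parse(input, formatter);
            } catch (DateTimeParseException e) {
                exceptionsThrown++;
                lastFailure = e;
            }
        }
        throw new IllegalArgumentException("no pattern matched [" + input + "]", lastFailure);
    }

    public static void main(String[] args) {
        List<String> patterns = List.of("dd/MM/yyyy", "MM.dd.yyyy", "yyyy-MM-dd");
        LocalDate date = parseWithSeparateParsers("2024-01-02", patterns);
        System.out.println(date + " after " + exceptionsThrown + " exceptions"); // prints 2024-01-02 after 2 exceptions
    }
}
```

With N formats and input that matches only the last one, this pattern throws N - 1 exceptions per document, which is the cost both #83801 and this PR aim to remove.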
Can you explain this a little bit more? The code appears to only throw an exception once all parsers have been tried. I'm not super familiar with the way the ingest node date processor works, so is it that it throws an exception for each date processor? Also, how is this expensive? I would not expect throwing a single exception for a document to be so expensive. Is this still a problem after #83764? Have we done another flame graph since that change?
That's the core issue this PR is addressing -- the date processor creates a distinct
In addition to the usual reasons Java exceptions are slow and not recommended for flow control in tight loops, ingest pipelines tend to have deep stack traces, which are especially expensive to capture. We have both profiler results and a number of bug reports in which the date processor accounts for more running time than the other 15 or 20 processors in the pipeline combined.
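A small illustration, assuming a standard JVM, of why stack depth matters: Throwable's constructor calls fillInStackTrace(), which records every active frame, so an exception created deep inside a pipeline carries a correspondingly long trace, and getStackTrace() has to materialize all of those frames. `StackDepthDemo` is a hypothetical name; the recursion simply stands in for a deep pipeline call stack.

```java
final class StackDepthDemo {
    // Recurse to the requested depth, then create (but do not throw) an exception there.
    static Exception exceptionAtDepth(int depth) {
        if (depth == 0) {
            return new Exception("created deep in the stack");
        }
        return exceptionAtDepth(depth - 1);
    }

    public static void main(String[] args) {
        for (int depth : new int[] {10, 100, 500}) {
            // The recorded trace grows with the number of active frames at creation time.
            int frames = exceptionAtDepth(depth).getStackTrace().length;
            System.out.println("depth " + depth + " -> " + frames + " frames captured");
        }
    }
}
```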
Combines all custom patterns into a single parser so that no more than a single exception is thrown while searching for a matching pattern. This significantly improves performance in scenarios where multiple patterns are attempted and was suggested by @joegallo as a possible alternative to #83801.
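A minimal sketch of the combined-parser idea in plain java.time, assuming every custom pattern is a DateTimeFormatter pattern and that only LocalDate results are needed; `CombinedDateParser` is a hypothetical name, not the class added in this PR. It probes each pattern through `DateTimeFormatter.toFormat().parseObject(String, ParsePosition)`, which follows the java.text.Format contract of signalling a mismatch by returning null, and creates a single DateTimeParseException only after every pattern has failed.

```java
import java.text.ParsePosition;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.temporal.TemporalAccessor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not the Elasticsearch implementation.
final class CombinedDateParser {
    private final List<String> patterns;
    private final List<DateTimeFormatter> formatters;

    CombinedDateParser(List<String> patterns) {
        if (patterns.isEmpty()) {
            throw new IllegalArgumentException("at least one pattern is required");
        }
        this.patterns = List.copyOf(patterns);
        final List<DateTimeFormatter> built = new ArrayList<>(patterns.size());
        for (String pattern : patterns) {
            built.add(DateTimeFormatter.ofPattern(pattern));
        }
        this.formatters = List.copyOf(built);
    }

    LocalDate parse(String input) {
        for (DateTimeFormatter formatter : formatters) {
            ParsePosition pos = new ParsePosition(0);
            // Returns null and sets the error index on a mismatch instead of throwing.
            Object parsed = formatter.toFormat().parseObject(input, pos);
            // Accept only a match that consumed the entire input.
            if (parsed != null && pos.getErrorIndex() < 0 && pos.getIndex() == input.length()) {
                // Assumes the pattern yields a full date; time-only patterns would fail here.
                return LocalDate.from((TemporalAccessor) parsed);
            }
        }
        // No pattern matched: throw exactly one exception for the whole set of patterns.
        throw new DateTimeParseException("unable to parse [" + input + "] with patterns " + patterns, input, 0);
    }

    public static void main(String[] args) {
        CombinedDateParser parser = new CombinedDateParser(List.of("yyyy-MM-dd", "dd/MM/yyyy"));
        System.out.println(parser.parse("02/01/2024")); // prints 2024-01-02
        try {
            parser.parse("not a date");
        } catch (DateTimeParseException e) {
            System.out.println(e.getMessage()); // the only exception created for this input
        }
    }
}
```

Pattern order still matters in this sketch: the first pattern that consumes the whole input wins, as it would when formats are tried one at a time.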
Relates to #73918