[Ingest Pipeline] Date processor performance issues - Discussion #73918
Pinging @elastic/es-core-features (Team:Core/Features)
I spent some time looking into this and saw a significant increase (~10x) in the execution time of the date processor between the 7.4.2 and 7.5.0 releases. There were no changes to the date processor itself in that timeframe, but there were changes to several of the classes it depends on.
I am not sure why #46654 could affect the performance. Maybe the creation of a DateFormatter has become more expensive? We used to have a mini optimisation when a format is a single pattern (not using ||), but it was removed in #48703; just adding it back did not change anything for me.
I have tried running some benchmarks to get flame graphs, and I am not sure there really is a performance change; the flame graph from a 6-minute profile does not show more time being spent there. There was a change in how the time is calculated for ingest processors: https://github.com/elastic/elasticsearch/pull/46241/files
Creating date formatter objects is indeed expensive if you need to create millions of them. Since these classes are also not thread-safe, consider using a static ThreadLocal to keep an instance (or a map of instances) around for each thread.
I made a simple ThreadLocalMap for you here: |
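The linked helper is not shown above, but a minimal sketch of the per-thread formatter cache described in the previous comment might look like the following. Class and method names here are my own, and `java.text.SimpleDateFormat` is used purely as a familiar example of a non-thread-safe formatter class:

```java
import java.text.SimpleDateFormat;
import java.util.HashMap;
import java.util.Map;

// Sketch of the ThreadLocal caching idea: each thread holds its own
// pattern -> formatter map, so non-thread-safe formatter instances are
// never shared between threads, and each pattern is parsed only once
// per thread instead of once per document.
public final class FormatterCache {
    private static final ThreadLocal<Map<String, SimpleDateFormat>> CACHE =
            ThreadLocal.withInitial(HashMap::new);

    private FormatterCache() {}

    public static SimpleDateFormat get(String pattern) {
        // computeIfAbsent only constructs the formatter on first use
        return CACHE.get().computeIfAbsent(pattern, SimpleDateFormat::new);
    }
}
```

Within one thread, repeated calls with the same pattern return the same instance, which avoids the per-document construction cost discussed in this thread.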
Thanks for looking into this further, @pgomulka. The flame graph results are helpful. While there were significant changes in #46241, the date processor is not async, so I would expect any timing changes to be equally relevant to the other processors in the example pipeline above. I'm going to talk to @martijnvg and see if we can narrow down some of the possibilities here.
Pinging @elastic/es-data-management (Team:Data Management)
#92880 will make a difference here, too.
Version 8.7.0 contained several related performance changes for ingest pipelines. We're going to close this for now but will reopen if necessary.
When working with Beats and Elastic Agent integrations, there are occasions where ingest pipelines are slower than intended, and benchmarking stats have pointed us towards the date processor as the culprit. With integrations that can have tens or hundreds of processors, a single date processor still takes up nearly as much time as all the rest combined, regardless of pipeline complexity.
Just to give a small example, reproduced on a freshly deployed ESS cluster with 3 nodes, 8 GB of RAM, and version 7.13.1:
First, I create a new ingest pipeline that uses most of the processors available today:
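The original pipeline definition was not captured here. Purely as an illustration, a minimal pipeline combining a date processor with a couple of cheap processors might look like this (the pipeline id and field names are hypothetical):

```console
PUT _ingest/pipeline/date-bench-test
{
  "processors": [
    { "set": { "field": "event.kind", "value": "event" } },
    { "lowercase": { "field": "message" } },
    {
      "date": {
        "field": "timestamp",
        "formats": ["yyyy-MM-dd'T'HH:mm:ss.SSSZ"],
        "target_field": "@timestamp"
      }
    }
  ]
}
```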
After this, I start ingesting some test documents. I tested with only 80 documents, because the results I wanted to demonstrate were already visible:
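The actual test documents were not captured here. As an illustration, assuming a pipeline named `date-bench-test` and a hypothetical index name, a document can be sent through the pipeline like this:

```console
POST logs-test/_doc?pipeline=date-bench-test
{
  "message": "Hello World",
  "timestamp": "2021-06-08T12:34:56.789+0000"
}
```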
After ingesting about 80 documents, this is the output of the node stats API for ingest metrics:
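The stats output itself was not captured in this copy of the issue, but per-processor ingest timings of the kind described below can be retrieved with the node stats API, for example:

```console
GET _nodes/stats/ingest?filter_path=nodes.*.ingest.pipelines
```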
As you can see, the difference is already quite significant: while most processors spent less than 1 millisecond in total across the 80 documents, the date processor, with a single date format, is already at 26 ms. This gap will only keep growing, and would be much larger at realistic ingest rates, which are far higher than in my test scenario.