[BUG] Performance Regression in 2.14 and 3.0 hourly_aggs in http_logs workload #13345
Comments
Tagging @getsaurabh02 @msfroh @bbarani to see if they are aware of any changes. In parallel, I'm looking through the commit history to see if I can find a commit which could have caused this.
One of the commits (from the same day the regression started) touches the aggregation path slightly: 8332859 [Can be evaluated to see whether it had any impact]
@mgodwan This looks related to #13179, where @bowenlan-amzn added a cluster setting to dynamically disable the filter rewrite optimization. Based on the description, it reduces the threshold that decides whether filters are rewritten from 1024 to 24. This means that if the date histogram aggregation includes more than 24 buckets (e.g. an hourly aggregation of 1 day), we won't use the optimization. After this change, we will probably see a regression for date_histogram_hourly_agg of the big5 workload. That will be handled once the long-term solution is merged next.
The change causing this adds a dynamic cluster setting that lowers the threshold for applying our optimization to date histograms. The threshold is the number of filters rewritten from the date histogram. The previous value of 1024 was reported to cause a regression on the pmc workload. Since it's a dynamic setting, it won't actually cause a regression for users; instead, it gives them the ability to tune it for their workload. The PR for the long-term fix: #13317
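For anyone wanting to tune this on their own cluster, here is a minimal sketch of building the cluster settings update request. The setting name `search.max_aggregation_rewrite_filters`, the host `http://localhost:9200`, and the helper `build_settings_request` are assumptions for illustration and should be checked against the PR and the docs for your version. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumed setting name; verify against the PR / documentation for your version.
SETTING = "search.max_aggregation_rewrite_filters"


def build_settings_request(base_url, max_filters):
    """Build (but do not send) a PUT _cluster/settings request.

    base_url and max_filters are illustrative inputs, not recommended values.
    """
    body = {"persistent": {SETTING: max_filters}}
    return urllib.request.Request(
        url=f"{base_url}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Hypothetical local cluster; 1024 is the previous threshold mentioned above.
req = build_settings_request("http://localhost:9200", 1024)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
```

Sending it would be a matter of passing `req` to `urllib.request.urlopen` against a reachable cluster with suitable credentials.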
Thanks @bowenlan-amzn
Is this setting enabled for the benchmark setup where we are seeing the regression?
The setting is a threshold. This operation in the http_logs workload currently exceeds the threshold, so our previous optimization is disabled, hence the regression.
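To make the threshold behaviour concrete, here is a rough sketch of the decision as described in this thread. It is not the actual OpenSearch implementation: the function names, the 7-day time range, and the exact comparison (`<=` against the threshold) are assumptions chosen for illustration.

```python
import math
from datetime import datetime, timedelta


def bucket_count(start, end, interval):
    """Number of fixed-interval buckets needed to cover [start, end)."""
    return math.ceil((end - start) / interval)


def uses_filter_rewrite(num_buckets, max_rewrite_filters):
    """Hypothetical check: rewrite only when the bucket count fits the threshold."""
    return 0 < num_buckets <= max_rewrite_filters


# Illustrative range only; not taken from the http_logs workload definition.
start = datetime(2024, 1, 1)
end = start + timedelta(days=7)
hourly = bucket_count(start, end, timedelta(hours=1))

print(hourly)                             # number of hourly buckets in 7 days
print(uses_filter_rewrite(hourly, 1024))  # previous threshold
print(uses_filter_rewrite(hourly, 24))    # lowered threshold
```

Under these assumptions, an hourly aggregation over several days fits the old threshold of 1024 but exceeds the new one of 24, so the optimization is skipped and the query falls back to the slower path.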
@bowenlan-amzn Do we need to revisit the threshold defaults in that case, since the current ones have been shown to cause a regression?
Fix/Improvements merged in
Describe the bug
https://opensearch.org/benchmarks
The P90 latency observed for the hourly_aggs query has regressed over the last week.
Related component
Search:Aggregations
Expected behavior
Latency should not increase
Additional Details
No response