-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Rollup] Only allow aggregating on multiples of configured interval #32052
[Rollup] Only allow aggregating on multiples of configured interval #32052
Conversation
We need to limit the search request aggregations to whole multiples of the configured interval for both histogram and date_histogram. Otherwise, agg buckets won't overlap with the rolled up buckets and the results will be incorrect. For histogram, the validation is very simple: request must be >= the config, and modulo evenly. Dates are more tricky. - If both request and config are fixed dates, we can convert to millis and treat them just like the histo - If both are calendar, we make sure the request is >= the config with a static lookup map that ranks the calendar values relatively. All calendar units are "singles", so they are evenly divisible already - We disallow any other combination (one fixed, one calendar, etc)
Pinging @elastic/es-search-aggs |
Oh, note, I want to describe the fixed/calendar semantics just a bit in the rollup docs... but I want to link to the date_histo docs once they are merged. So I'll revisit the docs some more after that PR goes in. |
Jenkins, run sample packaging tests |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good @polyfractal , I left a comment regarding one unit large interval that could be considered as fixed time interval (s
, m
, h
and d
).
x-pack/plugin/rollup/src/main/java/org/elasticsearch/xpack/rollup/RollupJobIdentifierUtils.java
Show resolved
Hide resolved
can only search/aggregate the rollup data with an interval that is greater-than or equal to the configured rollup interval. | ||
|
||
For example, if data is rolled up at hourlyintervals, the <<rollup-search>> API can aggregate on any time interval | ||
hourly or greater. Intervals that are less than an hour will throwan exception, since the data simply doesn't |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: throwan => "throw an"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: hourlyintervals => hourly intervals
@jimczi updated the docs to include a better description of the time units. Once the time docs PR merges I'll include a link to that as well. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, I left one question. Thanks for adding the docs.
@@ -190,7 +190,7 @@ GET /sensor_rollup/_rollup_search | |||
"timeline": { | |||
"date_histogram": { | |||
"field": "timestamp", | |||
"interval": "7d" | |||
"interval": "1w" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why changing to a calendar interval here ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I was changing units to fixed and just got carried away without thinking what I was doing. Thanks for spotting
…32052) We need to limit the search request aggregations to whole multiples of the configured interval for both histogram and date_histogram. Otherwise, agg buckets won't overlap with the rolled up buckets and the results will be incorrect. For histogram, the validation is very simple: request must be >= the config, and modulo evenly. Dates are more tricky. - If both request and config are fixed dates, we can convert to millis and treat them just like the histo - If both are calendar, we make sure the request is >= the config with a static lookup map that ranks the calendar values relatively. All calendar units are "singles", so they are evenly divisible already - We disallow any other combination (one fixed, one calendar, etc)
Backport of elastic#32052, with some changes. In 6.5, we forbid mixed intervals (one fixed, one calendar). But for a 6.4.x bugfix release this would be a pretty extreme break. So we continue to allow mixed intervals in 6.4, but log a deprecation warning. 6.4 also doesn't enforce that the request is a multiple of the config. We also log a deprecation if that is not the case, but don't prevent the query. Finally, mixed intervals are only used if a better job cap can't be found. To support the case where we have to compare fixed and calendar, there is a table of rough conversions (e.g. 1 day == 86400000ms). This lets us crudely ensure the user doesn't try to query `2d` on an index that is set for `1w`.
* 6.x: Mute test watcher usage stats output [Rollup] Fix FullClusterRestart test TEST: Disable soft-deletes in ParentChildTestCase TEST: Disable randomized soft-deletes settings Integrates soft-deletes into Elasticsearch (#33222) drop `index.shard.check_on_startup: fix` (#32279) Fix AwaitsFix issue number Mute SmokeTestWatcherWithSecurityIT testsi [DOCS] Moves ml folder from x-pack/docs to docs (#33248) TEST: mute more SmokeTestWatcherWithSecurityIT tests [DOCS] Move rollup APIs to docs (#31450) [DOCS] Rename X-Pack Commands section (#33005) Fixes SecurityIntegTestCase so it always adds at least one alias (#33296) TESTS: Fix Random Fail in MockTcpTransportTests (#33061) (#33307) MINOR: Remove Dead Code from PathTrie (#33280) (#33306) Fix pom for build-tools (#33300) Lazy evaluate java9home (#33301) SQL: test coverage for JdbcResultSet (#32813) Work around to be able to generate eclipse projects (#33295) Different handling for security specific errors in the CLI. Fix for #33230 (#33255) [ML] Refactor delimited file structure detection (#33233) SQL: Support multi-index format as table identifier (#33278) Enable forbiddenapis server java9 (#33245) [MUTE] SmokeTestWatcherWithSecurityIT flaky tests Add region ISO code to GeoIP Ingest plugin (#31669) (#33276) Don't be strict for 6.x Update serialization versions for custom IndexMetaData backport Replace IndexMetaData.Custom with Map-based custom metadata (#32749) Painless: Fix Bindings Bug (#33274) SQL: prevent duplicate generation for repeated aggs (#33252) TEST: Mute testMonitorClusterHealth Fix serialization of empty field capabilities response (#33263) Fix nested _source retrieval with includes/excludes (#33180) [DOCS] TLS file resources are reloadable (#33258) Watcher: Ensure TriggerEngine start replaces existing watches (#33157) Ignore module-info in jar hell checks (#33011) Fix docs build after #33241 [DOC] Repository GCS ADC not supported (#33238) Upgrade to latest Gradle 4.10 (#32801) Fix/30904 cluster formation part2 (#32877) Move file-based discovery to core (#33241) HLRC: add client side RefreshPolicy (#33209) [Kerberos] Add unsupported languages for tests (#33253) Watcher: Reload properly on remote shard change (#33167) Fix classpath security checks for external tests. (#33066) [Rollup] Only allow aggregating on multiples of configured interval (#32052) Added deprecation warning for rescore in scroll queries (#33070) Apply settings filter to get cluster settings API (#33247) [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability (#32743) HLRC: create base timed request class (#33216) HLRC: Use Optional in validation logic (#33104) Painless: Add Bindings (#33042)
…33284) Backport of #32052, with some changes. In 6.5, we forbid mixed intervals (one fixed, one calendar). But for a 6.4.x bugfix release this would be a pretty extreme break. So we continue to allow mixed intervals in 6.4, but log a deprecation warning. 6.4 also doesn't enforce that the request is a multiple of the config. We also log a deprecation if that is not the case, but don't prevent the query. Finally, mixed intervals are only used if a better job cap can't be found. To support the case where we have to compare fixed and calendar, there is a table of rough conversions (e.g. 1 day == 86400000ms). This lets us crudely ensure the user doesn't try to query `2d` on an index that is set for `1w`.
We need to limit request aggregations to whole multiples of the configured interval for both
histogram
anddate_histogram
. Otherwise, agg buckets won't overlap with the rolled buckets and the results will be incorrect.For histogram, the validation is very simple: request must be >= the config, and modulo evenly.
Dates are tricksier.
As an aside, the validation of dates was apparently broken before... it converted both request and config intervals to millis without regard to fixed vs calendar. So this is both an enhancement and a bugfix :)
/cc @colings86 this is what we were chatting about the other day
/cc @jen-huang @timroes