Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Rollup] Only allow aggregating on multiples of configured interval #32052

Merged
merged 13 commits into from
Aug 29, 2018

Conversation

polyfractal
Copy link
Contributor

We need to limit request aggregations to whole multiples of the configured interval for both histogram and date_histogram. Otherwise, agg buckets won't overlap with the rolled buckets and the results will be incorrect.

For histogram, the validation is very simple: request must be >= the config, and modulo evenly.

Dates are tricksier.

  • If both request and config are fixed dates, we can convert to millis and treat them just like a regular histo (request must be >= the config, and modulo evenly)
  • If both are calendar, we make sure the request is >= the config with a static lookup map that ranks the calendar values relatively. All calendar units are "singles", so they are evenly divisible already
  • We disallow any other combination (one fixed and one calendar, etc)

As an aside, the validation of dates was apparently broken before... it converted both request and config intervals to millis without regard to fixed vs calendar. So this is both an enhancement and a bugfix :)

/cc @colings86 this is what we were chatting about the other day

/cc @jen-huang @timroes

We need to limit the search request aggregations to whole multiples
of the configured interval for both histogram and date_histogram.
Otherwise, agg buckets won't overlap with the rolled up buckets
and the results will be incorrect.

For histogram, the validation is very simple: request must be >= the config,
and modulo evenly.

Dates are more tricky.
- If both request and config are fixed dates, we can convert to millis
and treat them just like the histo
- If both are calendar, we make sure the request is >= the config with
a static lookup map that ranks the calendar values relatively.  All
calendar units are "singles", so they are evenly divisible already
- We disallow any other combination (one fixed, one calendar, etc)
@polyfractal polyfractal added >enhancement review v7.0.0 :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.4.0 labels Jul 13, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@polyfractal
Copy link
Contributor Author

Oh, note, I want to describe the fixed/calendar semantics just a bit in the rollup docs... but I want to link to the date_histo docs once they are merged. So I'll revisit the docs some more after that PR goes in.

@polyfractal
Copy link
Contributor Author

Jenkins, run sample packaging tests

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good @polyfractal , I left a comment regarding one unit large interval that could be considered as fixed time interval (s, m, h and d).

can only search/aggregate the rollup data with an interval that is greater-than or equal to the configured rollup interval.

For example, if data is rolled up at hourlyintervals, the <<rollup-search>> API can aggregate on any time interval
hourly or greater. Intervals that are less than an hour will throwan exception, since the data simply doesn't
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: throwan => "throw an"

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: hourlyintervals => hourly intervals

@lcawl lcawl added v6.4.1 and removed v6.4.0 labels Aug 23, 2018
@polyfractal
Copy link
Contributor Author

@jimczi updated the docs to include a better description of the time units. Once the time docs PR merges I'll include a link to that as well.

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, I left one question. Thanks for adding the docs.

@@ -190,7 +190,7 @@ GET /sensor_rollup/_rollup_search
"timeline": {
"date_histogram": {
"field": "timestamp",
"interval": "7d"
"interval": "1w"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why changing to a calendar interval here ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, I was changing units to fixed and just got carried away without thinking what I was doing. Thanks for spotting

@polyfractal polyfractal added v6.5.0 and removed review labels Aug 29, 2018
@polyfractal polyfractal merged commit d93b2a2 into elastic:master Aug 29, 2018
polyfractal added a commit that referenced this pull request Aug 29, 2018
…32052)

We need to limit the search request aggregations to whole multiples
of the configured interval for both histogram and date_histogram.
Otherwise, agg buckets won't overlap with the rolled up buckets
and the results will be incorrect.

For histogram, the validation is very simple: request must be >= the config,
and modulo evenly.

Dates are more tricky.
- If both request and config are fixed dates, we can convert to millis
and treat them just like the histo
- If both are calendar, we make sure the request is >= the config with
a static lookup map that ranks the calendar values relatively.  All
calendar units are "singles", so they are evenly divisible already
- We disallow any other combination (one fixed, one calendar, etc)
polyfractal added a commit to polyfractal/elasticsearch that referenced this pull request Aug 30, 2018
Backport of elastic#32052, with some changes.

In 6.5, we forbid mixed intervals (one fixed, one calendar).  But for a
6.4.x bugfix release this would be a pretty extreme break.  So we continue
to allow mixed intervals in 6.4, but log a deprecation warning.

6.4 also doesn't enforce that the request is a multiple of the config.
We also log a deprecation if that is not the case, but don't prevent the
query.

Finally, mixed intervals are only used if a better job cap can't be found.

To support the case where we have to compare fixed and calendar, there is
a table of rough conversions (e.g. 1 day == 86400000ms).  This lets us
crudely ensure the user doesn't try to query `2d` on an index that is set
for `1w`.
dnhatn added a commit that referenced this pull request Sep 1, 2018
* 6.x:
  Mute test watcher usage stats output
  [Rollup] Fix FullClusterRestart test
  TEST: Disable soft-deletes in ParentChildTestCase
  TEST: Disable randomized soft-deletes settings
  Integrates soft-deletes into Elasticsearch (#33222)
  drop `index.shard.check_on_startup: fix` (#32279)
  Fix AwaitsFix issue number
  Mute SmokeTestWatcherWithSecurityIT testsi
  [DOCS] Moves ml folder from x-pack/docs to docs (#33248)
  TEST: mute more SmokeTestWatcherWithSecurityIT tests
  [DOCS] Move rollup APIs to docs (#31450)
  [DOCS] Rename X-Pack Commands section (#33005)
  Fixes SecurityIntegTestCase so it always adds at least one alias (#33296)
  TESTS: Fix Random Fail in MockTcpTransportTests (#33061) (#33307)
  MINOR: Remove Dead Code from PathTrie (#33280) (#33306)
  Fix pom for build-tools (#33300)
  Lazy evaluate java9home (#33301)
  SQL: test coverage for JdbcResultSet (#32813)
  Work around to be able to generate eclipse projects (#33295)
  Different handling for security specific errors in the CLI. Fix for #33230 (#33255)
  [ML] Refactor delimited file structure detection (#33233)
  SQL: Support multi-index format as table identifier (#33278)
  Enable forbiddenapis server java9 (#33245)
  [MUTE] SmokeTestWatcherWithSecurityIT flaky tests
  Add region ISO code to GeoIP Ingest plugin (#31669) (#33276)
  Don't be strict for 6.x
  Update serialization versions for custom IndexMetaData backport
  Replace IndexMetaData.Custom with Map-based custom metadata (#32749)
  Painless: Fix Bindings Bug (#33274)
  SQL: prevent duplicate generation for repeated aggs (#33252)
  TEST: Mute testMonitorClusterHealth
  Fix serialization of empty field capabilities response (#33263)
  Fix nested _source retrieval with includes/excludes (#33180)
  [DOCS] TLS file resources are reloadable (#33258)
  Watcher: Ensure TriggerEngine start replaces existing watches (#33157)
  Ignore module-info in jar hell checks (#33011)
  Fix docs build after #33241
  [DOC] Repository GCS ADC not supported (#33238)
  Upgrade to latest Gradle 4.10  (#32801)
  Fix/30904 cluster formation part2 (#32877)
  Move file-based discovery to core (#33241)
  HLRC: add client side RefreshPolicy (#33209)
  [Kerberos] Add unsupported languages for tests (#33253)
  Watcher: Reload properly on remote shard change (#33167)
  Fix classpath security checks for external tests. (#33066)
  [Rollup] Only allow aggregating on multiples of configured interval (#32052)
  Added deprecation warning for rescore in scroll queries (#33070)
  Apply settings filter to get cluster settings API (#33247)
  [Rollup] Re-factor Rollup Indexer into a generic indexer for re-usability   (#32743)
  HLRC: create base timed request class (#33216)
  HLRC: Use Optional in validation logic (#33104)
  Painless: Add Bindings (#33042)
jimczi pushed a commit that referenced this pull request Sep 3, 2018
…33284)

Backport of #32052, with some changes.

In 6.5, we forbid mixed intervals (one fixed, one calendar).  But for a
6.4.x bugfix release this would be a pretty extreme break.  So we continue
to allow mixed intervals in 6.4, but log a deprecation warning.

6.4 also doesn't enforce that the request is a multiple of the config.
We also log a deprecation if that is not the case, but don't prevent the
query.

Finally, mixed intervals are only used if a better job cap can't be found.

To support the case where we have to compare fixed and calendar, there is
a table of rough conversions (e.g. 1 day == 86400000ms).  This lets us
crudely ensure the user doesn't try to query `2d` on an index that is set
for `1w`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>enhancement :StorageEngine/Rollup Turn fine-grained time-based data into coarser-grained data v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants