Fail variable_width_histogram that collects from many #58619

Merged

Conversation

@nik9000 nik9000 (Member) commented Jun 26, 2020

Adds an explicit check to variable_width_histogram to stop it from
trying to collect from many buckets because it can't. I tried to make it
do so but that is more than an afternoon's project, sadly. So for now we
just disallow it.

Relates to #42035
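
For illustration, the check described above might look roughly like the sketch below. This is reconstructed from the description, not copied from the diff; the collectsFromSingleBucket flag, the helper name, and the message text are assumptions.

```java
// Sketch only, reconstructed from the PR description rather than the diff.
final class CollectsFromSingleBucketCheck {
    // Refuse to build the aggregator when it would have to collect from more
    // than one owning bucket; the exception message here is an approximation.
    static void check(boolean collectsFromSingleBucket) {
        if (collectsFromSingleBucket == false) {
            throw new IllegalArgumentException(
                "[variable_width_histogram] cannot be nested under an aggregation that collects from many buckets"
            );
        }
    }
}
```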

@elasticmachine elasticmachine (Collaborator) commented

Pinging @elastic/es-analytics-geo (:Analytics/Aggregations)

@elasticmachine elasticmachine added the Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)) label Jun 26, 2020
@nik9000 nik9000 requested a review from polyfractal June 26, 2020 19:37
@nik9000 nik9000 (Member, Author) commented Jun 26, 2020

I think to fix this we'd want to think about replacing mergeBuckets(long[]) with mergeBuckets(LongFunction), and maybe give it a treatment similar to variable_width_histogram's, where a "budgeting" mechanism accumulates a pile of buckets to merge and then does the expensive merge process fewer times.
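
The "budgeting" idea reads roughly like the sketch below; the class name and numbers are invented for illustration and are not taken from either aggregator.

```java
// Illustrative sketch of a "wasted bucket" budget: defer the expensive merge
// until enough buckets are wasted, then double the budget so the number of
// merges stays small as the bucket count grows.
class WastedBucketBudget {
    private long budget = 1_024; // starting budget, arbitrary for this sketch
    private long wasted = 0;

    /** Record newly wasted buckets; returns true when a merge should run now. */
    boolean shouldMerge(long newlyWasted) {
        wasted += newlyWasted;
        if (wasted < budget) {
            return false;
        }
        wasted = 0;
        budget *= 2; // doubling amortizes the cost of the repeated rewrites
        return true;
    }
}
```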

@jamesdorfman jamesdorfman (Contributor) commented

Hey, good catch with this!

I think the main cause of this problem relates to how the initial_buffer is initialized.
In VariableWidthHistogramAggregationBuilder we do

```java
private int initialBuffer = Math.min(10 * this.shardSize, 50000);
```

This will certainly lead to an explosion of buckets if the aggregation is nested, since each sub-aggregation could have 50k buckets.
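
To put rough numbers on that, the worst case under a multi-bucket parent looks like this (the parent bucket count and shardSize below are assumptions for illustration, not values from this thread):

```java
// Illustrative arithmetic only: a parent agg producing 1,000 buckets, each
// owning a variable_width_histogram buffer sized by the formula above.
public class BufferExplosionExample {
    public static void main(String[] args) {
        long parentBuckets = 1_000;                                 // assumed parent bucket count
        long bufferPerParentBucket = Math.min(10 * 5_000, 50_000);  // shardSize = 5,000 assumed -> 50,000
        System.out.println(parentBuckets * bufferPerParentBucket);  // 50,000,000 buffered values per shard
    }
}
```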

One way to fix this could be to just lower the default size of the initial_buffer parameter.
Math.min(10 * this.shardSize, 50000) seems very high, and if I recall correctly I chose it fairly arbitrarily.

I'm not quite sure what you mean by

replacing mergeBuckets(long[]) with mergeBuckets(LongFunction)

would you mind explaining?

@nik9000 nik9000 (Member, Author) commented Jun 29, 2020

I'm not quite sure what you mean by

replacing mergeBuckets(long[]) with mergeBuckets(LongFunction)

would you mind explaining?

It's pretty much the TODO that you left. I'm not 100% sure it is needed, but it feels like it'd be fairly important.

The trouble is right here. That assertion isn't valid when collecting from many buckets. The second parameter, which I usually call owningBucketOrd, is set to the ordinal of the bucket that "owns" this collection. It's only 0 for the first owning bucket, meaning that if the agg is at the top level it is always 0. Aggregations, post the removal of asMultiBucketAggregator, have to be "aware" of the bucket they are collecting for.

It'd be fairly simple to handle this by changing the collector member to an ObjectArray<CollectionStrategy> and looking up the strategy with the owningBucketOrd. That works fine. You have to "shift" the ordinals of the buckets that you collect, but that is fine because the agg has a maximum number of buckets that each owningBucketOrd can collect, so it is enough to shift by owningBucketOrd * numBuckets.
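
A minimal sketch of that ordinal bookkeeping, with names invented for illustration rather than taken from the Elasticsearch source:

```java
// Each owningBucketOrd gets its own disjoint range of bucket ordinals, so the
// "shift" is just owningBucketOrd * maxBucketsPerOwner.
class PerOwnerOrdinals {
    private final long maxBucketsPerOwner; // the agg's per-owner bucket cap

    PerOwnerOrdinals(long maxBucketsPerOwner) {
        this.maxBucketsPerOwner = maxBucketsPerOwner;
    }

    long globalOrd(long owningBucketOrd, long localOrd) {
        assert localOrd < maxBucketsPerOwner;
        return owningBucketOrd * maxBucketsPerOwner + localOrd;
    }

    long localOrd(long globalOrd) {
        return globalOrd % maxBucketsPerOwner;
    }

    long owningOrd(long globalOrd) {
        return globalOrd / maxBucketsPerOwner;
    }
}
```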

The trouble comes when merging buckets. It's both tricky to implement and inefficient. Tricky because the mergeMap that you get is only for the owningBucketOrd that you are collecting, but the rewrites have to cover all of the owningBucketOrds that you have already collected, so you have to build the right mergeMap. It's inefficient because that mergeMap is quite large and because the rewrites have to rewrite all of the buckets, not just the buckets in this owningBucketOrd.

auto_date_histogram deals with the inefficiency by only rewriting when a certain number of buckets are "wasted". It has a fairly high budget and doubles it every time it rewrites, so the total rewriting work stays amortized linear no matter how many buckets it sees. You could do that too, but I'm not sure it is worth it. auto_date_histogram comes by this behavior fairly naturally; you'd need some juggling here to get it.

I think it'd be OK to do it the "inefficient but correct" way, but just getting it correct is tricky.

And now we come to my comment about mergeBuckets: doing it the inefficient-but-correct way is going to make a zillion mergeMaps, and they are going to be big. But a LongFunction, or rather a LongBinaryOperator, could replace them. In the worst case you'd wrap the mergeMap, maybe even wrapping a "smaller" merge map. In the best case, like for moveLastCluster, it would be something like i -> i < index ? i : i + 1. So it isn't required, but it feels nice.
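
The shape of that change might look something like the sketch below. The signatures are an assumption about the proposal, not code from this PR; it is shown with a LongUnaryOperator because the example merge function above takes a single ordinal, while a LongBinaryOperator variant could additionally take the owningBucketOrd.

```java
import java.util.function.LongUnaryOperator;

// Illustrative signatures only, not the merged code.
abstract class BucketMergingSketch {
    // Current shape (roughly): the caller materializes a full merge map,
    // one slot per existing bucket ordinal.
    abstract void mergeBuckets(long[] mergeMap);

    // Proposed shape: pass a function, so cheap rewrites never allocate a map.
    abstract void mergeBuckets(LongUnaryOperator mergeMap);

    // The moveLastCluster case from the comment above: shift every ordinal at
    // or after `index` up by one to make room for the moved cluster.
    static LongUnaryOperator moveLastClusterMergeMap(long index) {
        return ord -> ord < index ? ord : ord + 1;
    }
}
```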

@polyfractal polyfractal (Contributor) left a comment

👍, and cheers for the explanation. Makes sense to lock this down for now until we modify it to handle multiple buckets. It does seem reasonably tricky to do that efficiently, compared to auto-date.

@jamesdorfman jamesdorfman (Contributor) commented

Thanks for the detailed explanation! That makes a lot of sense :)

@nik9000 nik9000 merged commit 32bdf85 into elastic:master Jun 30, 2020
nik9000 added a commit to nik9000/elasticsearch that referenced this pull request Jun 30, 2020
nik9000 added a commit that referenced this pull request Jun 30, 2020
Labels
:Analytics/Aggregations (Aggregations), >non-issue, Team:Analytics (Meta label for analytical engine team (ESQL/Aggs/Geo)), v7.9.0, v8.0.0-alpha1
5 participants