add auto_scale option to histogram configuration to auto scale interval #8139

nreese · 2016-08-31T20:21:32Z

pull request for issue #8138

elasticmachine · 2016-08-31T20:21:34Z

Can one of the admins verify this patch?

nreese · 2016-09-01T16:08:19Z

I signed the CLA after seeing the failed check.

Bargs · 2016-09-13T18:36:21Z

jenkins, test this

Bargs · 2016-09-13T20:40:16Z

Hey @nreese, this is pretty cool! I tested the functionality briefly and I can already see how this will be useful. A few things I noticed:

- The interval form input in the UI isn't getting updated when a filter is activated.
- It would be nice if the interval input could be disabled when auto-scale is selected so it could just go full auto. We have an existing issue where intervals that are too small can crash the browser. It would be really cool to kill two birds with one stone by enhancing the auto-scale feature to work without a range filter (it would need to query the min/max value for the field for the given time range and then calculate the interval). Is that something you'd be interested in tackling?
- I think you need to rebase on master to get the tests passing.

Bargs · 2016-09-13T20:42:16Z

src/ui/public/agg_types/buckets/histogram.js

+    return (Math.trunc(num) + '').length;
+  }
+
+  function roundToNearest(num, roundDigit) {


If I understand the purpose of this function correctly, I think you could just use Lodash's round with a negative precision https://lodash.com/docs/3.10.1#round
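
For readers unfamiliar with negative precision, here is a minimal sketch of what it does, written without Lodash so it runs standalone. The helper name `roundToPrecision` is hypothetical and only meant for a precision of zero or below; Lodash's documented example `_.round(4060, -2)` returns the same 4100.

```javascript
// Hypothetical stand-in for the negative-precision case of _.round.
// Intended for precision <= 0 only; this is a sketch, not Lodash's code.
function roundToPrecision(num, precision) {
  // precision -2 means "round to the nearest 100", so the step is 10^2
  const step = Math.pow(10, -precision);
  return Math.round(num / step) * step;
}

console.log(roundToPrecision(4060, -2)); // 4100
console.log(roundToPrecision(1234, -2)); // 1200
```

If this matches what roundToNearest is for, the custom function could be dropped in favor of the library call.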

Bargs · 2016-09-13T21:12:07Z

If you decide to extend auto-scale to non-filtered queries like I mentioned above, I think it might make sense to do the pre-fetch of the min/max regardless of whether range filters exist or not. There are a number of things that could affect the real range of the dataset other than the filters, like a handwritten filter in the query bar, the time picker, or even a filter on another field. This would have the added benefit of decoupling the auto-interval logic from the range filter syntax, which could always change and cause breakages.
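
To make the idea concrete, a rough sketch of how an interval could be derived once the real min/max are known. Everything here is an assumption for illustration: the function name `computeAutoInterval`, the fallback of 1 for an empty range, and the choice to round up to one significant digit. It is not the logic in this PR.

```javascript
// Hypothetical helper: derive a histogram interval from a pre-fetched
// min/max so the chart ends up with at most about `numBuckets` buckets.
function computeAutoInterval(min, max, numBuckets) {
  const range = max - min;
  if (!(range > 0) || !(numBuckets > 0)) {
    return 1; // degenerate input: fall back to a fixed interval (assumption)
  }
  const raw = range / numBuckets;
  // Round up to one significant digit so the interval is a "nice" number.
  const magnitude = Math.pow(10, Math.floor(Math.log10(raw)));
  return Math.ceil(raw / magnitude) * magnitude;
}

// With the 50 buckets used as NUM_BUCKETS in this PR:
console.log(computeAutoInterval(0, 1000, 50)); // 20
console.log(computeAutoInterval(0, 4700, 50)); // 100 (raw 94 rounded up)
```

Because the inputs are only a min, a max, and a bucket count, nothing in this calculation depends on the range filter syntax.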

Bargs · 2016-09-13T21:20:07Z

src/ui/public/agg_types/buckets/histogram.js

 export default function HistogramAggDefinition(Private) {
  let BucketAggType = Private(AggTypesBucketsBucketAggTypeProvider);
  let createFilter = Private(AggTypesBucketsCreateFilterHistogramProvider);
+  let queryFilter = Private(FilterBarQueryFilterProvider);
+  const NUM_BUCKETS = 50;


Maybe this could be exposed as an additional form input when auto_scale is selected?

nreese · 2016-09-14T20:57:14Z

I have made the recommended changes. Looks like rebasing on master added a ton of commits.

I can look into pre-fetching the min/max. Are there any visualizations with a pre-fetch that I can use as an example?

Bargs · 2016-09-14T21:07:03Z

Heh, yeah something went off the rails with that rebase! Here's what I would do to try to fix it:

1. Fetch the most recent changes from master and make sure your local master branch is up to date
2. Check out your PR branch auto_scale
3. Run git rebase --onto master 15bddfc14c9d0c3b212c28e8c38ac1e106795aa1 auto_scale
4. Force push auto_scale to your fork

Alternatively you could also use git cherry-pick

Bargs · 2016-09-14T21:14:50Z

As for the pre-fetch, we may be breaking new ground. @spalger do you know if anything like this exists already? (see my above comments for context). Any suggestions on an implementation approach?

spalger · 2016-09-15T19:51:36Z

Yeah, pre-fetching aggregation specific data is definitely unprecedented. The closest thing we have now is the way that we prefetch the field_stats to pick the correct indices, although that mechanism could potentially be extended to send other pre-fetch requests.

Auto scaling the number of buckets for a query would essentially require executing the entire query twice: once to figure out the matching documents and calculate the min/max of relevant fields, then a second time with the full aggregation tree. Since the search context for both of these requests will be the same (by design) it is possible that there aren't huge performance implications with this approach, but we should try to verify that somehow.
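
A sketch of what the two request bodies might look like, assuming the standard Elasticsearch `min`, `max`, and `histogram` aggregations. The function names, the aggregation names (`field_min`, `field_max`, `histogram_buckets`), and the integer interval are all hypothetical; no such pre-fetch mechanism exists in Kibana today.

```javascript
// First request: no hits, just the min and max of the field under the
// same query (search context) as the real request.
function buildMinMaxRequest(query, field) {
  return {
    size: 0,
    query: query,
    aggs: {
      field_min: { min: { field: field } },
      field_max: { max: { field: field } }
    }
  };
}

// Second request: the real histogram, with an interval derived from the
// first response. Rounding up to a whole number is an assumption.
function buildHistogramRequest(query, field, min, max, numBuckets) {
  const interval = Math.max(1, Math.ceil((max - min) / numBuckets));
  return {
    size: 0,
    query: query,
    aggs: {
      histogram_buckets: { histogram: { field: field, interval: interval } }
    }
  };
}

// Usage with a made-up first response (values are illustrative only).
const query = { match_all: {} };
const first = buildMinMaxRequest(query, 'price');
const fakeResponse = { aggregations: { field_min: { value: 0 }, field_max: { value: 1000 } } };
const second = buildHistogramRequest(
  query,
  'price',
  fakeResponse.aggregations.field_min.value,
  fakeResponse.aggregations.field_max.value,
  50
);
console.log(JSON.stringify(first));
console.log(JSON.stringify(second));
```

Both bodies carry the same `query`, which is the point made above about the shared search context.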

That said, it seems like a very serious project, and I don't necessarily want to encourage you down that path @nreese. That's not to say you shouldn't try though 😄

PS: if you check the "Allow edits from maintainers" checkbox we could help fix things like the history issue

nreese · 2016-09-15T20:51:57Z

@Bargs, Thanks for the git help. Looks like the rebase is happy now.

I think the best long-term solution is to move the parameter to Elasticsearch instead of making the client issue multiple requests. It would make sense to update the histogram aggregation API so you could ask for a histogram with 25 buckets instead of providing an interval. That way, Elasticsearch could do all the work in a single request.

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "buckets" : 25
            }
        }
    }
}

What are your thoughts? How should I proceed with this pull request? The only thing missing is making the number of buckets a configurable field. I can add that. What would be a good maximum value? A user should not create a histogram asking for ten thousand buckets. Should the maximum value be configurable in the global settings?

@spalger, I checked the "Allow edits from maintainers" box.

Bargs · 2016-09-20T18:09:30Z

@nreese I agree, adding this option to the elasticsearch API would be ideal. I think there's some precedent for such an option in the terms agg size option. We essentially want the same thing for histograms.

Could you file a feature request ticket on the ES repo? I'd be interested to see if they'd be willing to implement such a feature before moving forward with this PR. I think getting auto scale to work smoothly with and without range filters is important, otherwise it's going to be difficult to clearly explain to users how this feature works in the context of a small tooltip.

nreese · 2016-09-20T21:00:36Z

@Bargs I have filed the issue with ES: elastic/elasticsearch#20590

Bargs · 2016-09-21T00:30:04Z

Thanks @nreese, I added my 2 cents to the ticket. Let's see what they say!

Bargs · 2016-10-17T18:03:43Z

@nreese the team discussed this today and we decided it makes the most sense to wait for an implementation in ES. Even if we implemented the ideal kibana solution with two requests, it's not going to be as efficient as it would be if ES made it a part of the histogram agg API so we could do everything in one request.

So the best course of action right now would be to add a comment to elastic/elasticsearch#9572 explaining your need.

nreese · 2016-10-17T19:06:56Z

Did the internal discussion include keeping the pull request as-is, without any pre-fetch to Elasticsearch? This feature is really useful as filters are applied to the x-axis field, and it avoids a lot of the complications.

Bargs · 2016-10-17T21:58:49Z

@nreese yes, we also talked about the PR's current state. The problem is that the current fix only takes the range filter into account. As I mentioned in a previous comment, there are a number of things that could affect the real range of the dataset other than the current field's filter, like a handwritten filter in the query bar, the time picker, or even a filter on another field. The auto-scale feature would also need to work in the absence of a filter. None of that can be accomplished without either a pre-fetch or additional support in ES.

I appreciate that the current implementation solves a particular use case, but when we implement new features we need to make sure it scales to all Kibana users.

jccq · 2016-10-17T22:46:37Z

Guys, I believe you should reconsider.

How much more likely is one to COLLAPSE an ES cluster by setting a histogram wrong (e.g. having one outlier that forces a million buckets) than to suffer because of an extra, trivially easy query (max/min?) per Kibana dashboard refresh?
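
To put numbers on the outlier case, a back-of-the-envelope sketch. The function name and the values are hypothetical, and it assumes the histogram returns every bucket between the min and the max, including empty ones.

```javascript
// Rough count of buckets a fixed-interval histogram spans between min and max.
function estimateBucketCount(min, max, interval) {
  return Math.floor((max - min) / interval) + 1;
}

// Typical values between 0 and 100, interval 1
console.log(estimateBucketCount(0, 100, 1)); // 101
// The same data plus a single outlier at 1,000,000
console.log(estimateBucketCount(0, 1000000, 1)); // 1000001
```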

Really, the balance is 1B to 1 here.

Given that the Kibana-side implementation would basically have an almost equivalent API to the one ES will eventually have, to me it's a no-brainer.

Also, have you noticed that this issue ultimately depends on something that's marked "high hanging fruit"?

If you care about Kibana adoption as a data exploration tool, please reconsider, as the temporary solution is not bad at all.

Bargs · 2016-10-17T23:07:56Z

@jccq I agree, that's exactly why auto_scale needs to work across the board and not just in one specific scenario. Fixing this the right way in Kibana is a non-trivial task. It might actually take us more time to implement it in Kibana than in ES, and we'd end up with an inferior solution. That's why I'd recommend throwing your weight behind elastic/elasticsearch#9572, it's the best solution for everyone involved.

jccq · 2016-10-18T07:37:16Z

@Bargs please excuse me if I misunderstand.

I do not believe a ES solution is either problematic or will take at all comparably to waiting for ES to close a "high hanging fruit" feature? (note the other is marked "Stalled")
elastic/elasticsearch#12316

If you would please reopen this issue, we will work on a PR for this. Would this be agreeable?

tbragin · 2016-10-18T13:28:37Z

@jccq I'm not sure I completely understand -- the issue in Elasticsearch is open. I'm not sure what "stalled" indicates, but you can certainly comment on that issue and ask. You are welcome to work with the Elasticsearch team to propose a solution :)

Bargs added the review label Sep 8, 2016

Bargs self-assigned this Sep 8, 2016

Bargs reviewed Sep 13, 2016
Bargs added updates_needed and removed review labels Sep 13, 2016

nreese added 3 commits September 15, 2016 14:38

add auto_scale option to histogram configuration to auto scale interval

5e0a573

use lodash round, handle pinned filters, and use toString instead of …

dce0b62

…string concat

add auto_scale option to histogram configuration to auto scale interval

1c8c2f2

nreese force-pushed the auto_scale branch from 8a83aaa to 1c8c2f2 Compare September 15, 2016 20:40

Bargs mentioned this pull request Sep 21, 2016

Enhance histogram aggregation - specify number of buckets instead of interval elastic/elasticsearch#20590

Closed

Bargs mentioned this pull request Sep 21, 2016

Support dynamic interval and fixed buckets for histogram aggregation elastic/elasticsearch#9572

Closed

Bargs added discuss and removed updates_needed labels Oct 7, 2016

Bargs closed this Oct 17, 2016

Bargs removed the discuss label Oct 17, 2016
