Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add auto_scale option to histogram configuration to auto scale interval #8139

Closed
wants to merge 3 commits into from

Conversation

nreese
Copy link
Contributor

@nreese nreese commented Aug 31, 2016

pull request for issue #8138

@elasticmachine
Copy link
Contributor

Can one of the admins verify this patch?

@nreese
Copy link
Contributor Author

nreese commented Sep 1, 2016

I signed the CLA after seeing the failed check.

@Bargs Bargs added the review label Sep 8, 2016
@Bargs Bargs self-assigned this Sep 8, 2016
@Bargs
Copy link
Contributor

Bargs commented Sep 13, 2016

jenkins, test this

@Bargs
Copy link
Contributor

Bargs commented Sep 13, 2016

Hey @nreese, this is pretty cool! I tested the functionality briefly and I can already see how this will be useful. A few things I noticed:

  • Interval form input in the UI isn't getting updated when a filter is activated
  • It would be nice if the interval input could be disabled when auto-scale is selected and it could just go full auto. We have an existing issue where intervals that are too small can crash the browser. It would be really cool to kill two birds with one stone by enhancing to auto-scale feature to work without a range filter (it would need to query the min/max value for the field for the given time range and then calculate the interval). Is that something you'd be interested in tackling?
  • I think you need to rebase on master to get the tests passing.

return (Math.trunc(num) + '').length;
}

function roundToNearest(num, roundDigit) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I understand the purpose of this function correctly, I think you could just use Lodash's round with a negative precision https://lodash.com/docs/3.10.1#round

@Bargs
Copy link
Contributor

Bargs commented Sep 13, 2016

If you decide to extend auto-scale to non-filtered queries like I mentioned above, I think it might make sense to do the pre-fetch of the min/max regardless of whether range filters exist or not. There are a number of things that could affect the real range of the dataset other than the filters, like a handwritten filter in the query bar, the time picker, or even a filter on another field. This would have the added benefit of decoupling the auto-interval logic from the range filter syntax, which could always change and cause breakages.

export default function HistogramAggDefinition(Private) {
let BucketAggType = Private(AggTypesBucketsBucketAggTypeProvider);
let createFilter = Private(AggTypesBucketsCreateFilterHistogramProvider);
let queryFilter = Private(FilterBarQueryFilterProvider);
const NUM_BUCKETS = 50;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe this could be exposed as an additional form input when auto_scale is selected?

@nreese
Copy link
Contributor Author

nreese commented Sep 14, 2016

I have made the recommended changes. Looks like rebasing master adding a ton of commits

I can look into pre-fetching the min/max. Are there any visualizations with a pre-fetch that I can use as an example?

@Bargs
Copy link
Contributor

Bargs commented Sep 14, 2016

Heh, yeah something went off the rails with that rebase! Here's what I would do to try to fix it:

  1. Fetch the most recent changes from master and make sure your local master branch is up to date
  2. Check out your PR branch auto_scale
  3. Run git rebase --onto master 15bddfc14c9d0c3b212c28e8c38ac1e106795aa1 auto_scale
  4. Force push auto_scale to your fork

Alternatively you could also use git cherry-pick

@Bargs
Copy link
Contributor

Bargs commented Sep 14, 2016

As for the pre-fetch, we may be breaking new ground. @spalger do you know if anything like this exists already? (see my above comments for context). Any suggestions on an implementation approach?

@spalger
Copy link
Contributor

spalger commented Sep 15, 2016

Yeah, pre-fetching aggregation specific data is definitely unprecedented. The closest thing we have now is the way that we prefetch the field_stats to pick the correct indices, although that mechanism could potentially be extended to send other pre-fetch requests.

Auto scaling the number of buckets for a query would essentially require executing the entire query twice, once to figure out the matching documents and calculate the min/max of relevant fields, then the second to with the full aggregation tree. Since the search context for both of these requests will be the same (by design) it is possible that there aren't huge performance implications with this approach, but we should try to verify that somehow.

That said, it seems like a very serious project, and I don't necessarily want to encourage you down that path @nreese. That's not to say you shouldn't try though 😄


PS: if you check the "Allow edits from maintainers" checkbox we could help fix things like the history issue

image

@nreese
Copy link
Contributor Author

nreese commented Sep 15, 2016

@Bargs, Thanks for the git help. Looks like the rebase is happy now.

I think the best long term solution is to move the parameter to Elasticsearch instead making the client issue multiple requests. It would make sense to update the histogram aggregation API so you could ask for a histogram with 25 buckets instead of providing an interval. That way, Elasticsearch could do all the work in a single request.

{
    "aggs" : {
        "prices" : {
            "histogram" : {
                "field" : "price",
                "buckets" : 25
            }
        }
    }
}

What are your thoughts? How should I proceed with this pull request? The only thing missing is making the number of buckets a configurable field. I can add that. What would be a good maximum value? A user should not create a histogram asking for ten thousand buckets. Should the maximum value be configurable in the global settings?

@spalger, I checked the "Allow edits from maintainers" box.

@Bargs
Copy link
Contributor

Bargs commented Sep 20, 2016

@nreese I agree, adding this option to the elasticsearch API would be ideal. I think there's some precedent for such an option in the terms agg size option. We essentially want the same thing for histograms.

Could you file a feature request ticket on the ES repo? I'd be interested to see if they'd be willing to implement such a feature before moving forward with this PR. I think getting auto scale to work smoothly with and without range filters is important, otherwise it's going to be difficult to clearly explain to users how this feature works in the context of a small tooltip.

@nreese
Copy link
Contributor Author

nreese commented Sep 20, 2016

@Bargs I have filed the issue with ES, 20590

@Bargs
Copy link
Contributor

Bargs commented Sep 21, 2016

Thanks @nreese, I added my 2 cents to the ticket. Let's see what they say!

@Bargs
Copy link
Contributor

Bargs commented Oct 17, 2016

@nreese the team discussed this today and we decided it makes the most sense to wait for an implementation in ES. Even if we implemented the ideal kibana solution with two requests, it's not going to be as efficient as it would be if ES made it a part of the histogram agg API so we could do everything in one request.

So the best course of action right now would be to add a comment to elastic/elasticsearch#9572 explaining your need.

@Bargs Bargs closed this Oct 17, 2016
@Bargs Bargs removed the discuss label Oct 17, 2016
@nreese
Copy link
Contributor Author

nreese commented Oct 17, 2016

Did the internal discussion include keeping the pull request as-is without any prefetch to ElasticSearch? This feature is really useful as filters are applied to the x-axis field and avoids lots of the complications.

@Bargs
Copy link
Contributor

Bargs commented Oct 17, 2016

@nreese yes, we also talked about the PR's current state. The problem is that the current fix only takes the range filter into account. As I mentioned in a previous comment, there are a number of things that could affect the real range of the dataset other than the current field's filter, like a handwritten filter in the query bar, the time picker, or even a filter on another field. The auto-scale feature would also need to work in the absence of a filter. None of that can be accomplished without either a pre-fetch or additional support in ES.

I appreciate that the current implementation solves a particular use case, but when we implement new features we need to make sure it scales to all Kibana users.

@jccq
Copy link

jccq commented Oct 17, 2016

Guys i believe you should reconsider.

How much more likely is one to COLLAPSE a ES cluster by setting a histogram wrong (e.g. having 1 outlier that forces a million bucket) than suffer because of.. an extra query silly easy query (max min.?) per kibana dashboard refresh?

really the balance is 1B to 1 here.

given that the Kibana side implementation would besically have an almost equivalent API than ES will evenctually have.. to me its a no brainer.

Also do have you notice that this issue ultimately depends on something that's marked "high hanging fruit".

If you care about Kibana adoption as data exploration tool, please reconsider as the temp solution is absolutely not a bit bad.

@Bargs
Copy link
Contributor

Bargs commented Oct 17, 2016

@jccq I agree, that's exactly why auto_scale needs to work across the board and not just in one specific scenario. Fixing this the right way in Kibana is a non-trivial task. It might actually take us more time to implement it in Kibana than in ES, and we'd end up with an inferior solution. That's why I'd recommend throwing your weight behind elastic/elasticsearch#9572, it's the best solution for everyone involved.

@jccq
Copy link

jccq commented Oct 18, 2016

@Bargs please excuse me if i misunderstand

I do not believe a ES solution is either problematic or will take at all comparably to waiting for ES to close a "high hanging fruit" feature? (note the other is marked "Stalled")
elastic/elasticsearch#12316

if you would please reopen this issue we will be working on a PR for this. Would this be agreeable?

@tbragin
Copy link
Contributor

tbragin commented Oct 18, 2016

@jccq I'm not sure I completely understand -- the issue in Elasticsearch is open. I'm not sure what "stalled" indicates, but you can certainly comment on that issue and ask. You are welcome to work with the Elasticsearch team to propose a solution :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants