add auto_scale option to histogram configuration to auto scale interval #8139
Conversation
Can one of the admins verify this patch?
I signed the CLA after seeing the failed check.
jenkins, test this
Hey @nreese, this is pretty cool! I tested the functionality briefly and I can already see how this will be useful. A few things I noticed:
```js
  return (Math.trunc(num) + '').length;
}

function roundToNearest(num, roundDigit) {
```
If I understand the purpose of this function correctly, I think you could just use Lodash's `round` with a negative precision: https://lodash.com/docs/3.10.1#round
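As an illustration of that suggestion, a minimal sketch of `_.round` with a negative precision (the surrounding call site is an assumption, not code from this PR):

```js
// _.round with a negative precision rounds to the left of the decimal
// point, which is what roundToNearest appears to be doing by hand.
const _ = require('lodash');

_.round(4862, -2); // => 4900
_.round(4862, -3); // => 5000
```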
If you decide to extend auto-scale to non-filtered queries like I mentioned above, I think it might make sense to do the pre-fetch of the min/max regardless of whether range filters exist or not. There are a number of things that could affect the real range of the dataset other than the filters, like a handwritten filter in the query bar, the time picker, or even a filter on another field. This would have the added benefit of decoupling the auto-interval logic from the range filter syntax, which could always change and cause breakages.
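To make that concrete, here is a rough sketch of what such a pre-fetch body could look like; the field name and the query placeholder are assumptions, not code from this PR:

```js
// Sketch only: reuse the visualization's full search context (query bar
// input, time picker, all filters) and ask only for the field's min/max.
const searchContextQuery = { match_all: {} }; // stand-in for the real query/filters

const preFetchBody = {
  size: 0,
  query: searchContextQuery,
  aggs: {
    field_min: { min: { field: 'bytes' } }, // 'bytes' is a hypothetical field
    field_max: { max: { field: 'bytes' } }
  }
};
```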
```js
export default function HistogramAggDefinition(Private) {
  let BucketAggType = Private(AggTypesBucketsBucketAggTypeProvider);
  let createFilter = Private(AggTypesBucketsCreateFilterHistogramProvider);
  let queryFilter = Private(FilterBarQueryFilterProvider);
  const NUM_BUCKETS = 50;
```
Maybe this could be exposed as an additional form input when auto_scale is selected?
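For illustration only, one way a bucket-count input could drive the interval; the rounding scheme here is an assumption, not the PR's exact logic:

```js
// Sketch: derive the interval from a target bucket count instead of a
// hard-coded NUM_BUCKETS. numBuckets would come from the proposed form input.
function autoInterval(min, max, numBuckets) {
  const raw = (max - min) / numBuckets;
  if (raw <= 0) return 1; // degenerate range: everything in one bucket
  // round up to one significant digit so the interval is human-friendly
  const magnitude = Math.pow(10, Math.floor(Math.log10(raw)));
  return Math.ceil(raw / magnitude) * magnitude;
}

autoInterval(0, 4862, 50); // => 100
```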
I have made the recommended changes. Looks like rebasing master added a ton of commits. I can look into pre-fetching the min/max. Are there any visualizations with a pre-fetch that I can use as an example?
Heh, yeah something went off the rails with that rebase! Here's what I would do to try to fix it: …

Alternatively you could also use …
As for the pre-fetch, we may be breaking new ground. @spalger do you know if anything like this exists already? (see my above comments for context). Any suggestions on an implementation approach?
Yeah, pre-fetching aggregation-specific data is definitely unprecedented. The closest thing we have now is the way that we pre-fetch the `field_stats` to pick the correct indices, although that mechanism could potentially be extended to send other pre-fetch requests.

Auto scaling the number of buckets for a query would essentially require executing the entire query twice: once to figure out the matching documents and calculate the min/max of the relevant fields, then a second time with the full aggregation tree. Since the search context for both of these requests will be the same (by design), it is possible that there aren't huge performance implications with this approach, but we should try to verify that somehow. That said, it seems like a very serious project, and I don't necessarily want to encourage you down that path @nreese. That's not to say you shouldn't try though 😄

PS: if you check the "Allow edits from maintainers" checkbox we could help fix things like the history issue
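A rough sketch of the two-request sequence described above; `search` stands in for whatever transport layer Kibana would use, and the index-free bodies and aggregation names are made up for illustration:

```js
// Sketch of the two-request flow: same search context for both requests,
// the first fetching only the field's stats, the second the real histogram.
async function histogramWithAutoInterval(search, query, field, numBuckets) {
  // Request 1: identical search context, stats only.
  const statsResp = await search({
    size: 0,
    query,
    aggs: { field_stats: { stats: { field } } }
  });
  const { min, max } = statsResp.aggregations.field_stats;

  // Derive an interval that yields roughly numBuckets buckets.
  const interval = Math.max(1, Math.ceil((max - min) / numBuckets));

  // Request 2: the full aggregation tree with the computed interval.
  return search({
    size: 0,
    query,
    aggs: { hist: { histogram: { field, interval } } }
  });
}
```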
@Bargs, thanks for the git help. Looks like the rebase is happy now. I think the best long-term solution is to move the parameter to Elasticsearch instead of making the client issue multiple requests. It would make sense to update the histogram aggregation API so you could ask for a histogram with 25 buckets instead of providing an interval. That way, Elasticsearch could do all the work in a single request.

What are your thoughts? How should I proceed with this pull request? The only thing missing is making the number of buckets a configurable field. I can add that. What would be a good maximum value? A user should not create a histogram asking for ten thousand buckets. Should the maximum value be configurable in the global settings? @spalger, I checked the "Allow edits from maintainers" box.
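Purely to visualize the proposal, here is what such a request might look like; note that `buckets` is a hypothetical parameter that does not exist in the histogram aggregation API:

```js
// HYPOTHETICAL: 'buckets' is the proposed parameter, not part of the real
// Elasticsearch histogram aggregation at the time of this discussion.
const proposedBody = {
  size: 0,
  aggs: {
    prices: {
      histogram: {
        field: 'price', // hypothetical field name
        buckets: 25     // ES would compute the interval server-side
      }
    }
  }
};
```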
@nreese I agree, adding this option to the elasticsearch API would be ideal. I think there's some precedent for such an option in the … Could you file a feature request ticket on the ES repo? I'd be interested to see if they'd be willing to implement such a feature before moving forward with this PR. I think getting auto scale to work smoothly with and without range filters is important; otherwise it's going to be difficult to clearly explain to users how this feature works in the context of a small tooltip.
Thanks @nreese, I added my 2 cents to the ticket. Let's see what they say!
@nreese the team discussed this today and we decided it makes the most sense to wait for an implementation in ES. Even if we implemented the ideal Kibana solution with two requests, it's not going to be as efficient as it would be if ES made it a part of the histogram agg API so we could do everything in one request. So the best course of action right now would be to add a comment to elastic/elasticsearch#9572 explaining your need.
Did the internal discussion include keeping the pull request as-is, without any prefetch to Elasticsearch? This feature is really useful when filters are applied to the x-axis field, and it avoids a lot of the complications.
@nreese yes, we also talked about the PR's current state. The problem is that the current fix only takes the range filter into account. As I mentioned in a previous comment, there are a number of things that could affect the real range of the dataset other than the current field's filter, like a handwritten filter in the query bar, the time picker, or even a filter on another field. The auto-scale feature would also need to work in the absence of a filter. None of that can be accomplished without either a pre-fetch or additional support in ES. I appreciate that the current implementation solves a particular use case, but when we implement new features we need to make sure it scales to all Kibana users.
Guys, I believe you should reconsider. How much more likely is one to collapse an ES cluster by setting a histogram wrong (e.g. a single outlier that forces a million buckets) than to suffer because of an extra, trivially cheap query (max/min?) per Kibana dashboard refresh? Really, the balance is a billion to one here. Given that the Kibana-side implementation would basically have an almost equivalent API to the one ES will eventually have, to me it's a no-brainer. Also, have you noticed that this issue ultimately depends on something that's marked "high hanging fruit"? If you care about Kibana adoption as a data exploration tool, please reconsider, as the temporary solution is not bad at all.
@jccq I agree, that's exactly why auto_scale needs to work across the board and not just in one specific scenario. Fixing this the right way in Kibana is a non-trivial task. It might actually take us more time to implement it in Kibana than in ES, and we'd end up with an inferior solution. That's why I'd recommend throwing your weight behind elastic/elasticsearch#9572, it's the best solution for everyone involved.
@Bargs please excuse me if I misunderstand. I do not believe a Kibana-side solution is problematic, nor would it take anywhere near as long as waiting for ES to close a "high hanging fruit" feature (note the other issue is marked "Stalled"). If you would please reopen this issue, we will work on a PR for this. Would this be agreeable?
@jccq I'm not sure I completely understand -- the issue in Elasticsearch is open. I'm not sure what "Stalled" indicates, but you can certainly comment on that issue and ask. You are welcome to work with the Elasticsearch team to propose a solution :)
pull request for issue #8138