-
Notifications
You must be signed in to change notification settings - Fork 8.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
other and missing bucket support for terms agg #15525
Conversation
@ppisljar looking good! A few comments, observations and open questions below. I realize this is early, but want to state we probably need a better value than Rather than Seems like some sort of bug. I chart the top 25 values and only produce one. However, if I check other we see additional values. I'm not sure if this is a side effect of other or a pie charts normal approach for visualizing too many slices in a small area. It looks like one single white value, maybe there's a better way for handling this. The other bucket does not seem to update with time Filtering on a dashboard seems to work well. We'll want to make sure the label here is updated as well. Open questions:
|
thanks @alexfrancoeur
my answers to open questions:
|
@alexfrancoeur @ppisljar I agree with Alex that the UI text "Show Other" should be more specific. Ping me when you're ready to discuss the text. I can also help with the wording for default values. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sweeto! this is going to be a yuge addition.
This is an early PR, so some of my comments might be stale (if so, then ignore).
There's two things to evaluate I think:
- I would decouple others and missing in two separte checkbox UIs.
- I'd do determination of what the actual query has to become up front, in a pre-flight request instead of a postFlight request. It'd preserve the contract of
Terms
better imho.
type="number" | ||
min="1" | ||
> | ||
<div class="vis-editor-agg-form-row"> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After using these for a while, I'd separate out both options. By combining them it does reduce the footprint a little, but it also makes it less clear what exactly is going to happen.
So I'd just have two parallel options:
- A 'show other' checkbox and the other bucket label as one thing we can toggle.
- A 'show missing' checkbox and the missing bucket label as one thing we can toggle.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also think two separate settings would be better. I am not sure if we would still need a checkbox for the "show missing" or if we could just use a textfield for that, labeled something like: "Replace missing values with" and if the field is empty just don't use it, and if it's set to something replace missing values with it. I think we don't need a "user wants to show missing, but don't want to specify a custom label" option for that.
@@ -60,11 +62,33 @@ export function AggTypesBucketsTermsProvider(Private) { | |||
return agg.getFieldDisplayName() + ': ' + params.order.display; | |||
}, | |||
createFilter: createFilter, | |||
postFlightRequest: async (aggConfigs, aggConfig, searchSourceAggs, resp, nestedSearchSource) => { | |||
const filterAgg = buildOtherBucketAgg(aggConfigs, searchSourceAggs, aggConfig, resp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this code-path should not be hit when the other functionality is turned off
}).then(async resp => { | ||
for (let i = 0; i < vis.aggs.length; i++) { | ||
const agg = vis.aggs[i]; | ||
if (!agg.type || !agg.type.postFlightRequest) continue; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add postFlightRequest
to every type but just have it be the identity-function. Then we don't have to do any typechecking here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should rather be as failsafe as possible and check if the function exists here before calling it, instead of relying on some complete other place always attaching the identity function for us.
@@ -0,0 +1,143 @@ | |||
import _ from 'lodash'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this module seems like an unnecessary abstraction imho, it just scatters the code. I'd add these functions as methods on the Terms
bucket agg.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since this file alrady contains 179 lines, we shouldn't merge it into the terms class for better readability and there is not much of a harm to make them an own module. Maybe just rename the file to begin with an underscore to make it more clear, this are private helper functions.
return; | ||
} | ||
|
||
filterAggDsl.filters[key] = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
imho it'd help readability when not creating temp vars. Just access this as resultAgg.filters
return []; | ||
}; | ||
|
||
const getAggConfigResult = (responseAggs, aggId, bucketKey) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we have a bug when splitting charts/slices. we get something like:
at getAggConfigResult (http://localhost:5601/prj/bundles/kibana.bundle.js?v=8467:246452:19)
at http://localhost:5601/prj/bundles/kibana.bundle.js?v=8467:246460:85
at http://localhost:5601/prj/bundles/commons.bundle.js?v=8467:21259:15
at baseForOwn (http://localhost:5601/prj/bundles/commons.bundle.js?v=8467:20232:14)
at http://localhost:5601/prj/bundles/commons.bundle.js?v=8467:21229:18
at Function.<anonymous> (http://localhost:5601/prj/bundles/commons.bundle.js?v=8467:21532:13)
at getAggConfigResult (http://localhost:5601/prj/bundles/kibana.bundle.js?v=8467:246459:20)
at updateMissingBucket (http://localhost:5601/prj/bundles/kibana.bundle.js?v=8467:246555:28)
at Object._callee$ (http://localhost:5601/prj/bundles/kibana.bundle.js?v=8467:246151:17)
at tryCatch (http://localhost:5601/prj/bundles/commons.bundle.js?v=8467:81181:40)```
_.each(responseAggs, agg => { | ||
resultBuckets = [ | ||
...resultBuckets, | ||
...getAggConfigResult(agg.aggs, aggId, bucketKey) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we probably don't want to recurse if we don't have agg.aggs. Or put a guard on top. in any case, we probably want a base-case in this recursive function somewhere.
const agg = vis.aggs[i]; | ||
if (!agg.type || !agg.type.postFlightRequest) continue; | ||
const nestedSearchSource = new SearchSource().inherits(searchSource); | ||
resp = await agg.type.postFlightRequest(vis.aggs, agg, searchSource.get('aggs')(), resp, nestedSearchSource); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a 100% convinced this postFlightRequest
is the correct abstraction.
This only works for terms, where we can conceive of something like "other", where this configuration requires more calls. But this wouldn't transfer to other aggregations, if a postflight makes sense. This also bleeds into courier in kind of a weird way. Also, the toDSL functionality is now broken for Terms. The configuration of terms and all its outputs (dsl, responses, ...) no longer matches the actual result.
Could we flip this? Instead of introducing a postFlightRequest
, do preliminary requests at the beginning.
- We could do something similar how the
geo_hash
aggregation rewrites the query ingetRequestAggs
(https://github.com/thomasneirynck/kibana/blob/05cbb929658590024e231a0791311c8b859e194f/src/ui/public/agg_types/buckets/geo_hash.js#L106-L106). This would mean of course that we make this method (getRequestAggs) async. - If the above doesn't work, we could introduce an async
preFlight
method that rewrites the query-object. This could do all the intermediate calls required to produce the final query.
In either case, I think it would preserve the contract of the "Terms" configuration better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i don't think that is possible, we discussed this with @nreese and found few reasons why to do the postFlight instead of preFlight ... i think the main one was that i need the response for this to work
- i can not construct correct filters i need for the 'other' bucket construction if i don't have the response from original query back as this could be nested
- we are updating the actual response with fake buckets, which can't really happen in pre-flight
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea would be that you make the original request in the preflight, and collect the other responses in a similar fashion as here, except for the last one. Your last DSL will then be the correct one, matching the terms-config. Right now, we do the first request (not matching the terms config) and stringing together all the subsequent ones in the postflight.
const filter = _.cloneDeep(bucket.filter) || currentAgg.createFilter(bucketKey); | ||
delete filter.meta; | ||
const migratedFilter = migrateFilter(filter.query || filter); | ||
const newFilters = [...filters, migratedFilter]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might want to use the buildQueryFromFilters helper here. It'll migrate and clean (remove meta prop) the filters exactly how we do it in SearchSource and return a valid bool query body that you can use.
|
||
// create not filters for all the buckets | ||
const notKeys = agg.buckets.map(bucket => bucket.key); | ||
filterAggDsl.filters[key].bool.must_not[0].terms[aggWithOtherBucket.params.field.name] = notKeys; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think you want to use a terms
query. If a user is aggregating on an analyzed field (with field data turned on) these term queries won't match properly. Scripted fields also won't work with the current implementation. I think a negated phrases filter would work. It'll create match_phrase queries, so my only question would be whether the terms agg supports any data types that won't work with match_phrase.
It could also make more sense to use the agg's createFilter method, that way you know you're getting a valid filter for this particular agg and field type. It already uses a filter constructor under the hood. Negate them all and then combine them with a bool using the buildQueryFromFilters
helper I mentioned above.
(as a general rule, if you ever find yourself manually building query DSL in your code, there's a good chance you should be using one of our filter/query abstractions instead)
@ppisljar some quick weekend feedback. I noticed that the behavior of an empty pie chart is a bit odd. Not sure if it's related to the PR. At first I didn't realize I was missing data, just thought that there was an issue loading the pie chart. However, if I resize the window the error message occurs. Check out this gif A couple of additional comments below. As a Kibana end user - I don't know the difference between Should a user have a choice to choose a missing and/or other bucket? I understand that elasticsearch uses I feel we're running into issues here within the UI that we have in the past, using elasticsearch terminology vs. what would be intuitive for building the visualization. I don't have any good recommendations at the moment but it's something to think about. Maybe we can discuss some options early next week? |
@ppisljar Let's try this: Group other values into separate bucket Label for other bucket Show missing values Label for missing values Question: are the missing & null values in their own bucket or in the "other" bucket |
aba94e8
to
ae0e053
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried to break it and haven't found a way so far. Everything seems to work as expected, even if you try to nest different aggregations in weird ways.
I have some more code suggestions and questions, but since I am now away for 2 weeks, these shouldn't block this PR if it would get ready otherwise.
Also I would like to see some more tests and documentation (especially in the terms_other_bucket_helper
file), but that's anyway on the todo list.
src/ui/public/agg_types/agg_type.js
Outdated
* A function that will be called after the main request has been made | ||
* and should return an updated response | ||
*/ | ||
this.postFlightRequest = config.postFlightRequest || null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add the parameters, that will be passed to the postFlightRequest
function to the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And that the response type is allowed to be async
(or a promise).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is the AggConfig
class tied towards the courier request handler, or could this possibly be used with other request handlers? As far as I understand, the AggConfig
will always be used when using the default editor with schemas. That doesn't necessarily mean you need the courier request handler though?
If that's the case, I think we should rather implement the postFlightRequest
outside of the courier request handler and inside the generic calling the request handler (in visualize.js
?). If there are reasons against it, I would at least document here, that the postFlightRequest only works when using the courier request handler.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
currently default editor Data tab is tied to the AggConfig and to Courier. I think AggCpnfigs are only used with courier.
@@ -0,0 +1,143 @@ | |||
import _ from 'lodash'; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think since this file alrady contains 179 lines, we shouldn't merge it into the terms class for better readability and there is not much of a harm to make them an own module. Maybe just rename the file to begin with an underscore to make it more clear, this are private helper functions.
@@ -49,6 +49,7 @@ export function FilterBarClickHandlerProvider(Notifier, Private) { | |||
} | |||
} | |||
}) | |||
.flatten() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So simple and yet so powerful! ✨
@@ -26,7 +28,7 @@ const CourierRequestHandlerProvider = function (Private, courier, timefilter) { | |||
return new Promise((resolve, reject) => { | |||
if (shouldQuery()) { | |||
delete vis.reload; | |||
searchSource.onResults().then(resp => { | |||
searchSource.onResults().then(async resp => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the async
here isn't needed?
@@ -35,15 +37,22 @@ const CourierRequestHandlerProvider = function (Private, courier, timefilter) { | |||
}; | |||
|
|||
searchSource.rawResponse = resp; | |||
resolve(resp); | |||
|
|||
resolve(_.cloneDeep(resp)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason for cloning this, was the spy panel? So this would be redundant with the refactored requests panel? (Doesn't mean that we should remove it here, I just want to make sure I understand it.)
|
||
|
||
}).then(async resp => { | ||
for (let i = 0; i < vis.aggs.length; i++) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use for (const agg of vis.aggs)
here, since we don't need the index i
anywhere inside the loop.
const filterAgg = buildOtherBucketAgg(aggConfigs, searchSourceAggs, aggConfig, resp); | ||
nestedSearchSource.set('aggs', filterAgg); | ||
const response = await nestedSearchSource.fetchAsRejectablePromise(); | ||
// todo: refactor to not have side effects |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that comment can go away, seems, the function doesn't have any side effects anymore.
* @param otherAgg: AggConfig of the aggregation with other bucket | ||
*/ | ||
const getOtherAggTerms = (requestAgg, key, otherAgg) => { | ||
return requestAgg['other-filter'].filters.filters[key].bool.must_not.map(filter => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe this version looks a bit nicer:
return requestAgg['other-filter'].filters.filters[key].bool.must_not.filter(filter =>
filter.match_phrase && filter.match_phrase[otherAgg.params.field.name]
).map(filter =>
filter.match_phrase[otherAgg.params.field.name].query
);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM with one final comment. While these new options are keyboard accessible, I don't seen a bounding box or indicator that I'm in the checkbox. Would you mind adding the appropriate styling here? We should visually indicate that the box is in focus.
9a11a10
to
70b6a57
Compare
1a98010
to
f4ce5e3
Compare
…tRequest function
84c55f0
to
faf0eca
Compare
@ppisljar This is great. Nice job. There is one point of UI confusion. When adding a filter on the The created filters work great and make a lot of sense |
thanks @nreese, above is not relevant to this PR, as the same happens with any NOT filter. please create an issue for it. |
release note: 'other' and 'missing' bucket for the terms aggregation
resolves #1961