-
Notifications
You must be signed in to change notification settings - Fork 24.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add parent-aggregation to parent-join module #34210
Conversation
Pinging @elastic/es-search-aggs |
hooray! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @centic9 . This is a great start ! I only checked the aggregator impl for the moment and left some comments.
...arent-join/src/main/java/org/elasticsearch/join/aggregations/ChildrenToParentAggregator.java
Outdated
Show resolved
Hide resolved
...arent-join/src/main/java/org/elasticsearch/join/aggregations/ChildrenToParentAggregator.java
Outdated
Show resolved
Hide resolved
...arent-join/src/main/java/org/elasticsearch/join/aggregations/ChildrenToParentAggregator.java
Outdated
Show resolved
Hide resolved
As stated in the PR, we can avoid collecting buckets into the multiple-array as long as it is the same bucket as before
We can combine both aggregators into an abstract class, only the filter needs to be defined differently. Also the lookup via seenParents is not necessary any more after the previous change.
Refactored code to have an abstract class for similar code so the two aggregations only contain the different parts. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review @centic9 , I left more comments regarding the complexity and costs of this new aggregation. What do you think of restricting the new aggregation to the root level like @martijnvg proposed ? The memory and complexity required to register all buckets in the children => parent
case seems really prohibitive and I think that the majority of use cases would be fine using it as the root aggregation.
...nt-join/src/main/java/org/elasticsearch/join/aggregations/AbstractParentChildAggregator.java
Outdated
Show resolved
Hide resolved
...arent-join/src/main/java/org/elasticsearch/join/aggregations/ChildrenToParentAggregator.java
Outdated
Show resolved
Hide resolved
This commit adds a new ParentJoinAggregator that implements a join using global ordinals in a way that can be reused by the `children` and the upcoming `parent` aggregation. This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes: * It uses a dense bit array instead of a long array when the aggregation does not have any parent. * It uses a single aggregator per bucket if it is nested under another aggregation. For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy). This change is also required for elastic#34210 which adds the `parent` aggregation in the parent-join module. Relates elastic#34508
) This commit adds a new ParentJoinAggregator that implements a join using global ordinals in a way that can be reused by the `children` and the upcoming `parent` aggregation. This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes: * It uses a dense bit array instead of a long array when the aggregation does not have any parent. * It uses a single aggregator per bucket if it is nested under another aggregation. For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy). This change is also required for #34210 which adds the `parent` aggregation in the parent-join module. Relates #34508
) This commit adds a new ParentJoinAggregator that implements a join using global ordinals in a way that can be reused by the `children` and the upcoming `parent` aggregation. This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes: * It uses a dense bit array instead of a long array when the aggregation does not have any parent. * It uses a single aggregator per bucket if it is nested under another aggregation. For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy). This change is also required for #34210 which adds the `parent` aggregation in the parent-join module. Relates #34508
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@centic9 I merged #34845 so you should be able to use the new ParentJoinAggregator
to implement the parent
aggregator. It should be identical to the ParentToChildrenAggregator
except that the inQuery
and outQuery
should be inversed.
Can you merge master and apply the changes in your code ? Also can you change the title of the pr to reflect the status (we're not in the first steps anymore ;) ).
The required changes are now even less as the new base class ParentJoinAggregator implements collecting buckets in a different way.
Thanks for all this preparatory work! I have now updated the parent-aggregation to make use of the new code. Mostly removing code, test still pass locally. |
) This commit adds a new ParentJoinAggregator that implements a join using global ordinals in a way that can be reused by the `children` and the upcoming `parent` aggregation. This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes: * It uses a dense bit array instead of a long array when the aggregation does not have any parent. * It uses a single aggregator per bucket if it is nested under another aggregation. For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy). This change is also required for #34210 which adds the `parent` aggregation in the parent-join module. Relates #34508
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry @centic9 I was out last week. I left some comments but we're getting close. Can you also add an entry in the documentation next to https://github.com/elastic/elasticsearch/blob/master/docs/reference/aggregations/bucket/children-aggregation.asciidoc ?
...arent-join/src/main/java/org/elasticsearch/join/aggregations/ChildrenToParentAggregator.java
Outdated
Show resolved
Hide resolved
modules/parent-join/src/test/java/org/elasticsearch/join/aggregations/ParentIT.java
Outdated
Show resolved
Hide resolved
modules/parent-join/src/test/java/org/elasticsearch/join/aggregations/ParentIT.java
Outdated
Show resolved
Hide resolved
Adjust import/javadoc, remove commented code/TODOs
Hi Jim, I have now fixed the TODOs in the tests, added docu and merged in latest master to resolve a conflict in one test-case, so should be good to go from my side now, let me know if anything is still missing or incorrect :) |
test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @centic9, the pr looks good. Can you change the title ?
I triggered a CI build to check if the tests pass, I'll monitor the result.
The failure is not related to this pr. Let's try another one. |
test this please |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @centic9, tests pass ! I'll merge the pr in master and 6x
Perfect, nice that we get this into 6.x as well! |
Add `parent` aggregation, a special single bucket aggregation that joins children documents to their parent.
Relates: elastic/elasticsearch#34210 This commit adds Parent Aggregation to the high level client.
Relates: elastic/elasticsearch#34210 This commit adds Parent Aggregation to the high level client.
Relates: elastic/elasticsearch#34210 This commit adds Parent Aggregation to the high level client.
Relates: elastic/elasticsearch#34210 This commit adds Parent Aggregation to the high level client.
Relates: elastic/elasticsearch#34210 This commit adds Parent Aggregation to the high level client. (cherry picked from commit 1051a57)
This is a first try at implementing the parent-aggregation as part of the parent-join module. See #9705 for some previous discussion about this.
This is probably not fully merge-able yet, I would like to get feedback on a few points that I could not clarify myself: