Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add parent-aggregation to parent-join module #34210

Merged
merged 9 commits into from
Nov 8, 2018

Conversation

centic9
Copy link
Contributor

@centic9 centic9 commented Oct 2, 2018

This is a first try at implementing the parent-aggregation as part of the parent-join module. See #9705 for some previous discussion about this.

This is probably not fully merge-able yet, I would like to get feedback on a few points that I could not clarify myself:

  • For implementation I mostly do the same as the child-aggregation does with the main difference of only reporting each parent-document once as part of a bucket
  • Does this approach look feasible? It seems to work for our local use-cases, but there might be edge-cases that we do not trigger or detect right now.
  • Is there a way to do the Map<Integer, Set> in ChildrenToParentAggregator.doPostCollection() in a better way? I expect this will create quite some memory-churn. Something like IntMap from Trove or HPPC would be better, but wasn't sure if I can add a dependency on HPPC to the module.
  • new unit-tests and all other existing tests do pass, is there any other integration-style testing that we should add?

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@Kamapcuc
Copy link

Kamapcuc commented Oct 2, 2018

hooray!

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @centic9 . This is a great start ! I only checked the aggregator impl for the moment and left some comments.

As stated in the PR, we can avoid collecting buckets into the
multiple-array as long as it is the same bucket as before
We can combine both aggregators into an abstract class, only the filter
needs to be defined differently.

Also the lookup via seenParents is not necessary any more after the
previous change.
@centic9
Copy link
Contributor Author

centic9 commented Oct 4, 2018

Refactored code to have an abstract class for similar code so the two aggregations only contain the different parts.
For Child-To-Parent I now also do a check on the existing bucket before switching to an array and thus could remove the costly lookup in the Map.

@rjernst rjernst removed the review label Oct 10, 2018
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the late review @centic9 , I left more comments regarding the complexity and costs of this new aggregation. What do you think of restricting the new aggregation to the root level like @martijnvg proposed ? The memory and complexity required to register all buckets in the children => parent case seems really prohibitive and I think that the majority of use cases would be fine using it as the root aggregation.

jimczi added a commit to jimczi/elasticsearch that referenced this pull request Oct 25, 2018
This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for elastic#34210 which adds the `parent` aggregation in the parent-join module.

Relates elastic#34508
jimczi added a commit that referenced this pull request Oct 26, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
jimczi added a commit that referenced this pull request Oct 26, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@centic9 I merged #34845 so you should be able to use the new ParentJoinAggregator to implement the parent aggregator. It should be identical to the ParentToChildrenAggregator except that the inQuery and outQuery should be inversed.
Can you merge master and apply the changes in your code ? Also can you change the title of the pr to reflect the status (we're not in the first steps anymore ;) ).

The required changes are now even less as the new base class
ParentJoinAggregator implements collecting buckets in a different
way.
@centic9
Copy link
Contributor Author

centic9 commented Oct 26, 2018

Thanks for all this preparatory work! I have now updated the parent-aggregation to make use of the new code. Mostly removing code, test still pass locally.

kcm pushed a commit that referenced this pull request Oct 30, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry @centic9 I was out last week. I left some comments but we're getting close. Can you also add an entry in the documentation next to https://github.com/elastic/elasticsearch/blob/master/docs/reference/aggregations/bucket/children-aggregation.asciidoc ?

@centic9
Copy link
Contributor Author

centic9 commented Nov 7, 2018

Hi Jim, I have now fixed the TODOs in the tests, added docu and merged in latest master to resolve a conflict in one test-case, so should be good to go from my side now, let me know if anything is still missing or incorrect :)

@jimczi
Copy link
Contributor

jimczi commented Nov 8, 2018

test this please

Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @centic9, the pr looks good. Can you change the title ?
I triggered a CI build to check if the tests pass, I'll monitor the result.

@jimczi
Copy link
Contributor

jimczi commented Nov 8, 2018

The failure is not related to this pr. Let's try another one.

@jimczi
Copy link
Contributor

jimczi commented Nov 8, 2018

test this please

@centic9 centic9 changed the title First steps for adding Parent-Aggregation to parent-join module Add parent-aggregation to parent-join module Nov 8, 2018
Copy link
Contributor

@jimczi jimczi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @centic9, tests pass ! I'll merge the pr in master and 6x

@jimczi jimczi merged commit d351422 into elastic:master Nov 8, 2018
@centic9 centic9 deleted the parent_join_parent_aggregation branch November 8, 2018 13:35
@centic9
Copy link
Contributor Author

centic9 commented Nov 8, 2018

Perfect, nice that we get this into 6.x as well!

jimczi pushed a commit that referenced this pull request Nov 8, 2018
Add `parent` aggregation, a special single bucket aggregation that joins children documents to their parent.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Mar 18, 2019
Relates: elastic/elasticsearch#34210

This commit adds Parent Aggregation to the high level client.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Mar 19, 2019
Relates: elastic/elasticsearch#34210

This commit adds Parent Aggregation to the high level client.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Mar 20, 2019
Relates: elastic/elasticsearch#34210

This commit adds Parent Aggregation to the high level client.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Mar 21, 2019
Relates: elastic/elasticsearch#34210

This commit adds Parent Aggregation to the high level client.
russcam added a commit to elastic/elasticsearch-net that referenced this pull request Apr 3, 2019
Relates: elastic/elasticsearch#34210

This commit adds Parent Aggregation to the high level client.

(cherry picked from commit 1051a57)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants