Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check self references in metric agg after last doc collection (#33593) #34001

Merged
merged 8 commits into from
Oct 25, 2018
Merged

Check self references in metric agg after last doc collection (#33593) #34001

merged 8 commits into from
Oct 25, 2018

Conversation

cbismuth
Copy link
Contributor

I've read what has been said about a post-collection hook, it should probably be done in LeafBucketCollector as it is in BucketCollector.

I've started to work on it, but before spending more time looking in which places this hook should be called, I would like to suggest the small change in this PR.

Make it more generic doesn't seem obvious to me as the second argument of the LeafBucketCollectorBase is an Object instance.

I'm not a Lucene guru, I may be totally wrong.

@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

@colings86
Copy link
Contributor

I've read what has been said about a post-collection hook, it should probably be done in LeafBucketCollector as it is in BucketCollector.

Could you explain your reasoning here? I would expect that we could check for self-references by overriding the doPostCollection() method in ScriptedMetricAggregator?

Also, I think in its current form the self-reference check will not be run in the query in the search request does not match all documents since the collect method is only called for each document that matches the query but we will only call the check if we collect all documents in the index.

@cbismuth
Copy link
Contributor Author

Could you explain your reasoning here? I would expect that we could check for self-references by overriding the doPostCollection() method in ScriptedMetricAggregator?

No good reason in my reasoning. I missed the o.e.s.a.AggregatorBase#doPostCollection entry point while browsing the code base.

Also, I think in its current form the self-reference check will not be run in the query in the search request does not match all documents since the collect method is only called for each document that matches the query but we will only call the check if we collect all documents in the index.

Yes, I was afraid of that, as all tests are using the o.a.l.s.MatchAllDocsQuery API and no Painless script I tried from the documentation made the debugger stop in the o.e.s.a.LeafBucketCollectorBase#collect API.

Thank you for your help @colings86, I'll update my PR. Feel free to ask me to open a new one and close this one to avoid the last noisy commit.

@cbismuth
Copy link
Contributor Author

PR updated with proper hook implementation and tested with a Mockito#verify call to check CollectionUtils.ensureNoSelfReferences is called only once per metric aggregation.

IndexSettings indexSettings,
MultiBucketConsumerService.MultiBucketConsumer bucketConsumer,
MappedFieldType... fieldTypes) throws IOException {
aggregator = spy(super.createAggregator(query, aggregationBuilder, indexSearcher, indexSettings, bucketConsumer, fieldTypes));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Im not particularly a fan of using spy() here. There are a couple of reasons for this:

  1. The class under is being wrapped by Mockito which feels a bit weird to me because we are not really testing the class we intend to test. It also has the danger of the Mockito usage being extended in future and us accidentally bypassing the actual logic in the ScriptedMetricAggregator itself
  2. We are creating a ensureNoSelfReferencesInAggState() method purely so we can do this spying which feels a bit ugly to me.

I wonder if we need to do this spying at all and instead could maybe just rely on the existing tests that check we catch when a self-reference occurs and not worry about ensuring its only called once? It feels to me like the downsides of doing this mockito check might outweigh the benefits.

@rjernst do you have any thoughts on this?

Copy link
Contributor Author

@cbismuth cbismuth Sep 27, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand and I do agree with you @colings86.

Another idea I can suggest would be to throw an exception from the ScriptedMetricAggregator if we do ensure no self references in agg state more than once.

EDIT But adding a counter and throwing an exception also seems overwhelming to me.

I let @rjernst give us his opinion on this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with @colings86. I would rely on/improve existing tests. Mocking should not be necessary here. A mock script can be added to the existing tests which creates a cycle, and ScriptedMetricAggregator can call the normal ensureNoSelfReferences method directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @rjernst, I'll remove unnecessary mocking.

…tric_agg_exec

# Conflicts:
#	server/src/test/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregatorTests.java
@cbismuth
Copy link
Contributor Author

cbismuth commented Oct 1, 2018

I've merged master into my branch, conflicts are resolved and PR is ready for review.

@cbismuth
Copy link
Contributor Author

cbismuth commented Oct 9, 2018

Hi, the latest comment in this PR is: do we need to test with a spied ScriptedMetricAggregator to verify single call to ensureNoSelfReferences or should we rely on existing tests?

This PR has been updated with master and all tests are green, I stay available for any necessary change.

@cbismuth
Copy link
Contributor Author

I've updated this PR to remove unnecessary mocking and rely on existing tests, PR is ready for review.

Copy link
Contributor

@colings86 colings86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cbismuth thanks for updating. This LGTM. I'll set off a test run

@colings86
Copy link
Contributor

@elasticmachine test this please

@colings86
Copy link
Contributor

@cbismuth the build failed but it looks unrelated to your changes. Would you be able to update your branch with the latest master and I'll kick off another CI run?

@cbismuth
Copy link
Contributor Author

@colings86, branch is up-to-date with master, thank you 👍

@colings86
Copy link
Contributor

@elasticmachine test this please

@colings86 colings86 merged commit 70871b5 into elastic:master Oct 25, 2018
@colings86
Copy link
Contributor

@cbismuth thanks for working on this PR. I've now merged it to master

colings86 pushed a commit that referenced this pull request Oct 25, 2018
(#34001)

* Check self references in metric agg after last doc collection (#33593)

* Revert 0aff5a3 (#33593)

* Check self refs in metric agg only once in post collection hook
(#33593)

* Remove unnecessary mocking (#33593)
@colings86
Copy link
Contributor

I have also backported this to 6.x. The backport is still pending on 6.5 and 6.4 and I'll do those backports tomorrow

@cbismuth
Copy link
Contributor Author

Great! Thank you @colings86.

@cbismuth cbismuth deleted the 33593_no_ensure_self_ref_after_scripted_metric_agg_exec branch October 25, 2018 17:39
colings86 pushed a commit that referenced this pull request Oct 26, 2018
(#34001)

* Check self references in metric agg after last doc collection (#33593)

* Revert 0aff5a3 (#33593)

* Check self refs in metric agg only once in post collection hook
(#33593)

* Remove unnecessary mocking (#33593)
colings86 pushed a commit that referenced this pull request Oct 26, 2018
(#34001)

* Check self references in metric agg after last doc collection (#33593)

* Revert 0aff5a3 (#33593)

* Check self refs in metric agg only once in post collection hook
(#33593)

* Remove unnecessary mocking (#33593)
@colings86
Copy link
Contributor

Backports to 6.5 and 6.4 branches now complete

jasontedor added a commit to jasontedor/elasticsearch that referenced this pull request Oct 26, 2018
* master: (74 commits)
  XContent: Check for bad parsers (elastic#34561)
  Docs: Align prose with snippet (elastic#34839)
  document the search context is freed if the scroll is not extended (elastic#34739)
  Test: Lookup node versions on rest test start (elastic#34657)
  SQL: Return error with ORDER BY on non-grouped. (elastic#34855)
  Reduce channels in AbstractSimpleTransportTestCase (elastic#34863)
  [DOCS] Updates Elasticsearch monitoring tasks (elastic#34339)
  Check self references in metric agg after last doc collection (elastic#33593) (elastic#34001)
  [Docs] Add `indices.query.bool.max_clause_count` setting (elastic#34779)
  Add 6.6.0 version to master (elastic#34847)
  Test: ensure char[] doesn't being with prefix (elastic#34816)
  Remove static import from HLRC doc snippet (elastic#34834)
  Logging: server: clean up logging (elastic#34593)
  Logging: tests: clean up logging (elastic#34606)
  SQL: Fix edge case: `<field> IN (null)` (elastic#34802)
  [Test] Mute FullClusterRestartIT.testShrink() until test is fixed
  SQL: Introduce ODBC mode, similar to JDBC (elastic#34825)
  SQL: handle X-Pack or X-Pack SQL not being available in a more graceful way (elastic#34736)
  [Docs] Add explanation for code snippets line width (elastic#34796)
  CCR: Rename follow-task parameters and stats (elastic#34836)
  ...
kcm pushed a commit that referenced this pull request Oct 30, 2018
#34001)

* Check self references in metric agg after last doc collection (#33593)

* Revert 0aff5a3 (#33593)

* Check self refs in metric agg only once in post collection hook (#33593)

* Remove unnecessary mocking (#33593)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants