Check self references in metric agg after last doc collection (#33593) #34001
Pinging @elastic/es-search-aggs
Could you explain your reasoning here? I would expect that we could check for self-references by overriding the Also, I think in its current form the self-reference check will not be run if the query in the search request does not match all documents, since the collect method is only called for each document that matches the query, but we only call the check once we have collected all documents in the index.
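To make the concern above concrete, here is a minimal sketch of why a check gated on "all documents in the index were collected" never fires for a query that matches only a subset, while a hook invoked once after collection always does. The class and method names (`SketchAggregator`, `collectWithInlineCheck`, `postCollection`) are hypothetical stand-ins for illustration and are not the Elasticsearch API.

```java
// Hypothetical stand-in for an aggregator; NOT the Elasticsearch API.
class SketchAggregator {
    private final int totalDocsInIndex;
    private int collected = 0;
    int checksRun = 0;

    SketchAggregator(int totalDocsInIndex) {
        this.totalDocsInIndex = totalDocsInIndex;
    }

    // Called once per document that matches the query.
    // The check only runs when every document in the index has been collected.
    void collectWithInlineCheck() {
        collected++;
        if (collected == totalDocsInIndex) {
            checksRun++;
        }
    }

    // Called once per document that matches the query; no check here.
    void collect() {
        collected++;
    }

    // Assumed to be called exactly once, after the last matching document.
    void postCollection() {
        checksRun++;
    }
}

class PostCollectionSketch {
    // Number of times the check runs when it lives inside collect.
    static int runInlineCheck(int totalDocs, int matchingDocs) {
        SketchAggregator agg = new SketchAggregator(totalDocs);
        for (int i = 0; i < matchingDocs; i++) {
            agg.collectWithInlineCheck();
        }
        return agg.checksRun;
    }

    // Number of times the check runs when it lives in a post-collection hook.
    static int runWithHook(int totalDocs, int matchingDocs) {
        SketchAggregator agg = new SketchAggregator(totalDocs);
        for (int i = 0; i < matchingDocs; i++) {
            agg.collect();
        }
        agg.postCollection();
        return agg.checksRun;
    }

    public static void main(String[] args) {
        System.out.println("inline, match all:    " + runInlineCheck(10, 10));
        System.out.println("inline, match subset: " + runInlineCheck(10, 3));
        System.out.println("hook, match subset:   " + runWithHook(10, 3));
    }
}
```

With a query matching 3 of 10 documents, the inline variant skips the check entirely, which is the gap described in the comment above.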
No good reason in my reasoning. I missed the
Yes, I was afraid of that, as all tests are using the Thank you for your help @colings86, I'll update my PR. Feel free to ask me to open a new one and close this one to avoid the last noisy commit.
PR updated with proper hook implementation and tested with a
IndexSettings indexSettings,
MultiBucketConsumerService.MultiBucketConsumer bucketConsumer,
MappedFieldType... fieldTypes) throws IOException {
    aggregator = spy(super.createAggregator(query, aggregationBuilder, indexSearcher, indexSettings, bucketConsumer, fieldTypes));
I'm not particularly a fan of using `spy()` here. There are a couple of reasons for this:
- The class under test is being wrapped by `Mockito`, which feels a bit weird to me because we are not really testing the class we intend to test. It also has the danger of the `Mockito` usage being extended in future and us accidentally bypassing the actual logic in the `ScriptedMetricAggregator` itself.
- We are creating an `ensureNoSelfReferencesInAggState()` method purely so we can do this spying, which feels a bit ugly to me.
I wonder if we need to do this spying at all and instead could maybe just rely on the existing tests that check we catch when a self-reference occurs, and not worry about ensuring it's only called once? It feels to me like the downsides of doing this Mockito check might outweigh the benefits.
@rjernst do you have any thoughts on this?
I understand and I do agree with you @colings86.
Another idea I can suggest would be to throw an exception from the `ScriptedMetricAggregator` if we check for self-references in the agg state more than once.
EDIT: But adding a counter and throwing an exception also seems excessive to me.
I'll let @rjernst give us his opinion on this.
I agree with @colings86. I would rely on/improve existing tests. Mocking should not be necessary here. A mock script can be added to the existing tests which creates a cycle, and ScriptedMetricAggregator can call the normal ensureNoSelfReferences method directly.
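As a rough illustration of the kind of cycle such a mock script would create, and of how a self-reference check can detect it, here is a self-contained sketch. `SelfReferenceSketch` and `ensureNoCycle` are hypothetical names; this is not the Elasticsearch `ensureNoSelfReferences` implementation, only one common way to do the check, using an identity-based set of ancestors during a depth-first walk.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; NOT the Elasticsearch implementation.
class SelfReferenceSketch {

    // Throws if a map or iterable contains itself, directly or indirectly.
    static void ensureNoCycle(Object value) {
        // Identity-based set: containers are compared by reference, so we
        // never call hashCode()/equals() on a possibly cyclic structure.
        Set<Object> ancestors = Collections.newSetFromMap(new IdentityHashMap<Object, Boolean>());
        visit(value, ancestors);
    }

    private static void visit(Object value, Set<Object> ancestors) {
        Iterable<?> children;
        if (value instanceof Map) {
            children = ((Map<?, ?>) value).values();
        } else if (value instanceof Iterable) {
            children = (Iterable<?>) value;
        } else {
            return; // scalars cannot contain references
        }
        if (!ancestors.add(value)) {
            throw new IllegalArgumentException("state contains a self-reference");
        }
        for (Object child : children) {
            visit(child, ancestors);
        }
        // Remove on the way back up so shared (non-cyclic) references are allowed.
        ancestors.remove(value);
    }

    public static void main(String[] args) {
        // What a "mock script" could do to the aggregation state: state.self = state
        Map<String, Object> state = new HashMap<>();
        List<Object> values = new ArrayList<>();
        state.put("values", values);
        ensureNoCycle(state); // passes: no cycle yet

        state.put("self", state);
        try {
            ensureNoCycle(state);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

A test built this way exercises the real check through an ordinary cyclic input, so no spying on the aggregator is needed.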
Thank you @rjernst, I'll remove unnecessary mocking.
…tric_agg_exec # Conflicts: # server/src/test/java/org/elasticsearch/search/aggregations/metrics/ScriptedMetricAggregatorTests.java
I've merged |
Hi, the latest comment in this PR is: do we need to test with a spied This PR has been updated with |
I've updated this PR to remove unnecessary mocking and rely on existing tests. The PR is ready for review.
@cbismuth thanks for updating. This LGTM. I'll set off a test run
@elasticmachine test this please
@cbismuth the build failed but it looks unrelated to your changes. Would you be able to update your branch with the latest master and I'll kick off another CI run?
@colings86, branch is up-to-date with
@elasticmachine test this please
@cbismuth thanks for working on this PR. I've now merged it to master.
I have also backported this to 6.x. The backport is still pending on 6.5 and 6.4 and I'll do those backports tomorrow.
Great! Thank you @colings86.
Backports to 6.5 and 6.4 branches now complete.
* master: (74 commits)
  XContent: Check for bad parsers (elastic#34561)
  Docs: Align prose with snippet (elastic#34839)
  document the search context is freed if the scroll is not extended (elastic#34739)
  Test: Lookup node versions on rest test start (elastic#34657)
  SQL: Return error with ORDER BY on non-grouped. (elastic#34855)
  Reduce channels in AbstractSimpleTransportTestCase (elastic#34863)
  [DOCS] Updates Elasticsearch monitoring tasks (elastic#34339)
  Check self references in metric agg after last doc collection (elastic#33593) (elastic#34001)
  [Docs] Add `indices.query.bool.max_clause_count` setting (elastic#34779)
  Add 6.6.0 version to master (elastic#34847)
  Test: ensure char[] doesn't being with prefix (elastic#34816)
  Remove static import from HLRC doc snippet (elastic#34834)
  Logging: server: clean up logging (elastic#34593)
  Logging: tests: clean up logging (elastic#34606)
  SQL: Fix edge case: `<field> IN (null)` (elastic#34802)
  [Test] Mute FullClusterRestartIT.testShrink() until test is fixed
  SQL: Introduce ODBC mode, similar to JDBC (elastic#34825)
  SQL: handle X-Pack or X-Pack SQL not being available in a more graceful way (elastic#34736)
  [Docs] Add explanation for code snippets line width (elastic#34796)
  CCR: Rename follow-task parameters and stats (elastic#34836)
  ...
I've read what has been said about a post-collection hook; it should probably be done in `LeafBucketCollector`, as it is in `BucketCollector`.
I've started to work on it, but before spending more time looking at where this hook should be called, I would like to suggest the small change in this PR.
Making it more generic doesn't seem obvious to me, as the second argument of the `LeafBucketCollectorBase` is an `Object` instance.
I'm not a Lucene guru, so I may be totally wrong.