Fix completion suggester's score tie-break #34508

jimczi · 2018-10-16T10:08:52Z

The shard-level suggestion sort uses a different tie-break than the one used
to merge the responses from different shards. When scores are equal, the former
compares the internal document identifier, whereas the latter compares the
surface form first. Because of this discrepancy, the merge sort can reorder the
shard suggestions, and some suggestion outputs end up linked to the wrong
documents. This change fixes the bug by duplicating the Lucene collector so that
it can apply the same tie-break strategy as the merge sort. This logic will be
removed when https://issues.apache.org/jira/browse/LUCENE-8529 is fixed.

Closes #34378
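
To make the discrepancy concrete, here is a minimal sketch of the two orderings described above. `TieBreakSketch`, `Suggestion`, and `topN` are hypothetical names invented for this illustration; they are not the Lucene collector or the Elasticsearch classes touched by this PR, and the sketch only assumes the orderings as stated in the description (score descending, then either doc id, or surface form followed by doc id).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class TieBreakSketch {

    /** Hypothetical stand-in for one shard-level suggestion; not a Lucene or Elasticsearch class. */
    static final class Suggestion {
        final String surfaceForm;
        final int docId;
        final float score;

        Suggestion(String surfaceForm, int docId, float score) {
            this.surfaceForm = surfaceForm;
            this.docId = docId;
            this.score = score;
        }
    }

    /** Shard-level order as described before the fix: score descending, then doc id ascending. */
    static final Comparator<Suggestion> SHARD_ORDER_BEFORE =
        Comparator.comparingDouble((Suggestion s) -> s.score)
            .reversed()
            .thenComparingInt((Suggestion s) -> s.docId);

    /** Merge order as described: score descending, then surface form, then doc id. */
    static final Comparator<Suggestion> MERGE_ORDER =
        Comparator.comparingDouble((Suggestion s) -> s.score)
            .reversed()
            .thenComparing((Suggestion s) -> s.surfaceForm)
            .thenComparingInt((Suggestion s) -> s.docId);

    /** Sorts a copy of the hits with the given order and keeps the first n entries. */
    static List<Suggestion> topN(List<Suggestion> hits, Comparator<Suggestion> order, int n) {
        List<Suggestion> sorted = new ArrayList<>(hits);
        sorted.sort(order);
        return sorted.subList(0, Math.min(n, sorted.size()));
    }

    public static void main(String[] args) {
        // Two suggestions with the same score on one shard.
        List<Suggestion> shardHits = Arrays.asList(
            new Suggestion("banana", 0, 1.0f),
            new Suggestion("apple", 1, 1.0f));

        // The doc id tie-break puts "banana" first; the merge order puts "apple" first.
        System.out.println(topN(shardHits, SHARD_ORDER_BEFORE, 1).get(0).surfaceForm); // banana
        System.out.println(topN(shardHits, MERGE_ORDER, 1).get(0).surfaceForm);        // apple
    }
}
```

With equal scores the two comparators disagree on which entry comes first, which is the kind of mismatch this change removes by using the same tie-break at both levels.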

elasticmachine · 2018-10-16T10:08:55Z

Pinging @elastic/es-search-aggs

mayya-sharipova

Thanks @jimczi, your PR makes sense.

The only thing that is not clear to me is why score (decr), completion key (incr), document id (incr) can't be a natural sorting order of SuggestScoreDoc. Maybe there are some other reasons.

jimczi · 2018-10-19T15:57:07Z

Thanks @mayya-sharipova

> score (decr), completion key (incr), document id (incr) can't be a natural sorting order of SuggestScoreDoc?

I think we can have this discussion on the Lucene issue?

This commit adds a new ParentJoinAggregator that implements a join using global ordinals in a way that can be reused by the `children` and the upcoming `parent` aggregation. This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:

* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.

For the latter case we use a `MultiBucketAggregatorWrapper` in the factory to ensure that each instance of the aggregator handles a single bucket. This is more in line with the strategy we use for other aggregations, such as the `terms` aggregation, since the number of buckets to handle should be low (thanks to the breadth_first strategy). This change is also required for elastic#34210, which adds the `parent` aggregation in the parent-join module.

Relates elastic#34508
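
As a rough illustration of the first change, the sketch below contrasts a per-ordinal `long` array with a bit set. `OrdinalTrackingSketch`, `LongArrayTracker`, and `BitSetTracker` are hypothetical names, and `java.util.BitSet` stands in for whatever bit array the aggregator actually uses; this is not the ParentJoinAggregator code. The assumption is that when there is only one bucket, it is enough to record whether an ordinal was collected, so one bit per ordinal can replace eight bytes per ordinal.

```java
import java.util.Arrays;
import java.util.BitSet;

final class OrdinalTrackingSketch {

    /** Remembers the bucket of each ordinal: one long per ordinal, -1 means "not collected". */
    static final class LongArrayTracker {
        private final long[] ordToBucket;

        LongArrayTracker(int maxOrd) {
            ordToBucket = new long[maxOrd];
            Arrays.fill(ordToBucket, -1L);
        }

        void collect(int ord, long bucket) {
            ordToBucket[ord] = bucket;
        }

        boolean exists(int ord) {
            return ordToBucket[ord] != -1L;
        }
    }

    /** Remembers only membership: one bit per ordinal, sufficient when there is a single bucket. */
    static final class BitSetTracker {
        private final BitSet seen;

        BitSetTracker(int maxOrd) {
            seen = new BitSet(maxOrd);
        }

        void collect(int ord) {
            seen.set(ord);
        }

        boolean exists(int ord) {
            return seen.get(ord);
        }
    }

    public static void main(String[] args) {
        LongArrayTracker sparseInfo = new LongArrayTracker(8);
        BitSetTracker dense = new BitSetTracker(8);

        // Collect ordinal 3 into the single bucket 0 with both trackers.
        sparseInfo.collect(3, 0L);
        dense.collect(3);

        System.out.println(sparseInfo.exists(3) + " " + dense.exists(3));
        System.out.println(sparseInfo.exists(2) + " " + dense.exists(2));
    }
}
```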

jimczi added 2 commits October 16, 2018 11:56

fix style

26792b1

jimczi added >bug :Search Relevance/Suggesters "Did you mean" and suggestions as you type v7.0.0 v6.5.0 labels Oct 16, 2018

jimczi added 3 commits October 16, 2018 12:30

remove redundant public modifier

039dc72

fix style

2bc3ab2

Merge branch 'master' into bug/completion_suggester_tiebreak

d9473a5

mayya-sharipova self-requested a review October 17, 2018 21:03

fix random test

d99c110

mayya-sharipova approved these changes Oct 19, 2018

jimczi merged commit fba5d39 into elastic:master Oct 19, 2018

jimczi deleted the bug/completion_suggester_tiebreak branch October 19, 2018 17:46

jimczi added a commit that referenced this pull request Oct 19, 2018

#34508: fix compil after backport

f8db72f

jimczi added a commit that referenced this pull request Oct 19, 2018

[TEST] Fix sporadic failures in CompletionSuggestSearchIT#testTiebreak

ba87c54

Relates #34508

jimczi added a commit that referenced this pull request Oct 19, 2018

[TEST] Fix sporadic failures in CompletionSuggestSearchIT#testTiebreak

b77ca32

Relates #34508

jimczi mentioned this pull request Oct 25, 2018

Refactor children aggregator into a generic ParentJoinAggregator #34845

Merged

kcm pushed a commit that referenced this pull request Oct 30, 2018

[TEST] Fix sporadic failures in CompletionSuggestSearchIT#testTiebreak

8c01c7c

Relates #34508

colings86 added v7.0.0-beta1 and removed v7.0.0 labels Feb 7, 2019
