Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix completion suggester's score tie-break #34508

Merged
merged 6 commits into from
Oct 19, 2018

Conversation

jimczi
Copy link
Contributor

@jimczi jimczi commented Oct 16, 2018

The shard suggestion sort uses a different tie-break than the one that is used
to merge different shards responses. The former uses the internal document identifier
when scores are the same whereas the latter compares the surface form first.
Because of this discrepancy some suggestion outputs are linked to the wrong documents
because the merge sort reorders the shard suggestions differently. This change
fixes this bug by duplicating the Lucene collector in order to be able to apply the
same tiebreak strategy than the merge sort. This logic will be removed when
https://issues.apache.org/jira/browse/LUCENE-8529 is fixed.

Closes #34378

The shard suggestion sort uses a different tie-break than the one that is used
to merge different shards responses. The former uses the internal document identifier
when scores are the same whereas the latter compares the surface form first.
Because of this discrepancy some suggestion outputs are linked to the wrong documents
because the merge sort reorders the shard suggestions differently. This change
fixes this bug by duplicating the Lucene collector in order to be able to apply the
same tiebreak strategy than the merge sort. This logic will be removed when
https://issues.apache.org/jira/browse/LUCENE-8529 is fixed.

Closes elastic#34378
@jimczi jimczi added >bug :Search Relevance/Suggesters "Did you mean" and suggestions as you type v7.0.0 v6.5.0 labels Oct 16, 2018
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-search-aggs

Copy link
Contributor

@mayya-sharipova mayya-sharipova left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @jimczi , your PR makes sense

The only thing that is not clear for me why score (decr), completion key (incr), document id(incr) can't be a natural sorting order of SuggestScoreDoc? May be, there are some other reasons.

@jimczi
Copy link
Contributor Author

jimczi commented Oct 19, 2018

Thanks @mayya-sharipova

score (decr), completion key (incr), document id(incr) can't be a natural sorting order of SuggestScoreDoc?

I think we can have this discussion on the Lucene issue ?

@jimczi jimczi merged commit fba5d39 into elastic:master Oct 19, 2018
@jimczi jimczi deleted the bug/completion_suggester_tiebreak branch October 19, 2018 17:46
jimczi added a commit that referenced this pull request Oct 19, 2018
The shard suggestion sort uses a different tie-break than the one that is used
to merge different shards responses. The former uses the internal document identifier
when scores are the same whereas the latter compares the surface form first.
Because of this discrepancy some suggestion outputs are linked to the wrong documents
because the merge sort reorders the shard suggestions differently. This change
fixes this bug by duplicating the Lucene collector in order to be able to apply the
same tiebreak strategy than the merge sort. This logic will be removed when
https://issues.apache.org/jira/browse/LUCENE-8529 is fixed.

Closes #34378
jimczi added a commit that referenced this pull request Oct 19, 2018
jimczi added a commit to jimczi/elasticsearch that referenced this pull request Oct 25, 2018
This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for elastic#34210 which adds the `parent` aggregation in the parent-join module.

Relates elastic#34508
jimczi added a commit that referenced this pull request Oct 26, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
jimczi added a commit that referenced this pull request Oct 26, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
kcm pushed a commit that referenced this pull request Oct 30, 2018
The shard suggestion sort uses a different tie-break than the one that is used
to merge different shards responses. The former uses the internal document identifier
when scores are the same whereas the latter compares the surface form first.
Because of this discrepancy some suggestion outputs are linked to the wrong documents
because the merge sort reorders the shard suggestions differently. This change
fixes this bug by duplicating the Lucene collector in order to be able to apply the
same tiebreak strategy than the merge sort. This logic will be removed when
https://issues.apache.org/jira/browse/LUCENE-8529 is fixed.

Closes #34378
kcm pushed a commit that referenced this pull request Oct 30, 2018
)

This commit adds a new ParentJoinAggregator that implements a join using global ordinals
in a way that can be reused by the `children` and the upcoming `parent` aggregation.
This new aggregator is a refactor of the existing ParentToChildrenAggregator with two main changes:
* It uses a dense bit array instead of a long array when the aggregation does not have any parent.
* It uses a single aggregator per bucket if it is nested under another aggregation.
For the latter case we use a `MultiBucketAggregatorWrapper` in the factory in order to ensure that each
instance of the aggregator handles a single bucket. This is more inlined with the strategy we use for other
aggregations like `terms` aggregation for instance since the number of buckets to handle should be low (thanks to the breadth_first strategy).
This change is also required for #34210 which adds the `parent` aggregation in the parent-join module.

Relates #34508
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Search Relevance/Suggesters "Did you mean" and suggestions as you type v6.5.0 v7.0.0-beta1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Completion suggester returns mangled documents
4 participants