Broken sort on multiple-level nested documents #32130

Closed
JulienColin opened this issue Jul 17, 2018 · 5 comments · Fixed by #32204
Labels
>bug :Search/Search Search-related issues that do not fall into other categories

Comments

@JulienColin

Elasticsearch version : 6.3.1 and below

JVM version : 1.8.0_171

OS version : Ubuntu 16.04 LTS

Expected behaviour :
Correct sort. The family with id=2 should get a sort value of 30 in the example below.

Problem description :
The sort is wrong when querying a three-level nested object model and sorting the parent objects on a field from the lowest level. In the example below, the family with id=2 gets a sort value of 10 when it should be 30 (the value 10 does not even appear in the document with id=2).

Steps to reproduce:

  1. Create index
    PUT tree
    {
      "settings": { "number_of_shards": 1, "number_of_replicas": 0 }
    }

  2. Put mapping
    PUT tree/family/_mapping
    {
      "properties": {
        "name": { "type": "keyword" },
        "members": {
          "type": "nested",
          "properties": {
            "firstname": { "type": "keyword" },
            "color": { "type": "keyword" },
            "levels": {
              "type": "nested",
              "properties": {
                "strength": { "type": "integer" }
              }
            }
          }
        }
      }
    }

  3. Insert data (bulk index API)
    POST _bulk
    { "index" : { "_index" : "tree", "_type" : "family", "_id" : "1" } }
    {"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":10}},{"firstName":"Serge","color":"brown","levels":{"strength":15}},{"firstName":"Marie","color":"brown","levels":{"strength":20}}]}
    { "index" : { "_index" : "tree", "_type" : "family", "_id" : "2" } }
    {"name":"Simpson","members":[{"firstName":"Homer","color":"brown","levels":{"strength":30}},{"firstName":"Lisa","color":"brown","levels":{"strength":40}},{"firstName":"Marge","color":"brown","levels":{"strength":60}}]}
    { "index" : { "_index" : "tree", "_type" : "family", "_id" : "3" } }
    {"name":"Simpson","members":[{"firstName":"Bart","color":"yellow","levels":{"strength":70}},{"firstName":"Snowball","color":"yellow","levels":{"strength":80}},{"firstName":"Maggie","color":"yellow","levels":{"strength":90}},{"firstName":"Gandpa","color":"brown","levels":{"strength":95}}]}

  4. Query
    GET tree/_search
    {
      "query": {
        "bool": {
          "filter": [
            { "term": { "name": { "value": "Simpson" } } },
            {
              "nested": {
                "path": "members",
                "query": {
                  "bool": {
                    "filter": [
                      { "term": { "members.color": { "value": "brown" } } }
                    ]
                  }
                }
              }
            }
          ]
        }
      },
      "sort": [
        {
          "members.levels.strength": {
            "order": "asc",
            "nested": {
              "path": "members",
              "filter": { "term": { "members.color": { "value": "brown" } } },
              "nested": { "path": "members.levels" }
            }
          }
        }
      ]
    }

  5. Results
    {
      "hits": {
        "total": 2,
        "max_score": null,
        "hits": [
          {
            "_index": "tree",
            "_type": "family",
            "_id": "2",
            "_score": null,
            "_source": {
              "name": "Simpson",
              "members": [
                { "firstName": "Homer", "color": "brown", "levels": { "strength": 30 } },
                { "firstName": "Lisa", "color": "brown", "levels": { "strength": 40 } },
                { "firstName": "Marge", "color": "brown", "levels": { "strength": 60 } }
              ]
            },
            "sort": [ 10 ]
          },
          ...
        ]
      }
    }
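
For reference, the expected sort values can be worked out by hand from the documents in step 3. The sketch below is not Elasticsearch code; it is a plain Python recomputation under the assumption that an ascending sort takes the minimum `strength` among the members that pass the `color = brown` filter, which is what the expected value of 30 implies. The names `FAMILIES` and `expected_sort_value` are made up for this illustration.

```python
from typing import Dict, Optional

# The three documents from step 3, keyed by _id.
FAMILIES: Dict[str, dict] = {
    "1": {"name": "Doe", "members": [
        {"firstName": "John", "color": "brown", "levels": {"strength": 10}},
        {"firstName": "Serge", "color": "brown", "levels": {"strength": 15}},
        {"firstName": "Marie", "color": "brown", "levels": {"strength": 20}},
    ]},
    "2": {"name": "Simpson", "members": [
        {"firstName": "Homer", "color": "brown", "levels": {"strength": 30}},
        {"firstName": "Lisa", "color": "brown", "levels": {"strength": 40}},
        {"firstName": "Marge", "color": "brown", "levels": {"strength": 60}},
    ]},
    "3": {"name": "Simpson", "members": [
        {"firstName": "Bart", "color": "yellow", "levels": {"strength": 70}},
        {"firstName": "Snowball", "color": "yellow", "levels": {"strength": 80}},
        {"firstName": "Maggie", "color": "yellow", "levels": {"strength": 90}},
        {"firstName": "Gandpa", "color": "brown", "levels": {"strength": 95}},
    ]},
}


def expected_sort_value(family: dict, color: str = "brown") -> Optional[int]:
    """Minimum strength among the members of the given color (assumed asc/min semantics)."""
    strengths = [
        member["levels"]["strength"]
        for member in family["members"]
        if member["color"] == color
    ]
    return min(strengths) if strengths else None


# Only families named "Simpson" match the query part of the request.
expected = {
    family_id: expected_sort_value(family)
    for family_id, family in FAMILIES.items()
    if family["name"] == "Simpson"
}
print(expected)  # {'2': 30, '3': 95}
```

Under that assumption the family with id=2 should sort on 30, so the reported value of 10 can only come from the document with id=1.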


Note that the query above returns the correct result if the index API is used instead of the bulk API, with the commands below :
    POST tree/family
    {"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":10}},{"firstName":"Serge","color":"brown","levels":{"strength":15}},{"firstName":"Marie","color":"brown","levels":{"strength":20}}]}
    POST tree/family
    {"name":"Simpson","members":[{"firstName":"Homer","color":"brown","levels":{"strength":30}},{"firstName":"Lisa","color":"brown","levels":{"strength":40}},{"firstName":"Marge","color":"brown","levels":{"strength":60}}]}
    POST tree/family
    {"name":"Simpson","members":[{"firstName":"Bart","color":"yellow","levels":{"strength":70}},{"firstName":"Snowball","color":"yellow","levels":{"strength":80}},{"firstName":"Maggie","color":"yellow","levels":{"strength":90}},{"firstName":"Gandpa","color":"brown","levels":{"strength":95}}]}
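
To compare the two indexing paths from a script, the bulk request body from step 3 can be generated rather than typed by hand. The helper below is a hypothetical sketch (the name `build_bulk_body` is not part of any client library); it only relies on the documented bulk format: one action line followed by one source line per document, newline-delimited, with a trailing newline.

```python
import json
from typing import Dict


def build_bulk_body(index: str, doc_type: str, docs: Dict[str, dict]) -> str:
    """Build a newline-delimited bulk body with one 'index' action per document."""
    lines = []
    for doc_id, source in docs.items():
        # Action line: which index, type and id the next line is indexed under.
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        # Source line: the document itself, serialized on a single line.
        lines.append(json.dumps(source))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


body = build_bulk_body(
    "tree",
    "family",
    {
        "1": {"name": "Doe", "members": [
            {"firstName": "John", "color": "brown", "levels": {"strength": 10}},
        ]},
        "2": {"name": "Simpson", "members": [
            {"firstName": "Homer", "color": "brown", "levels": {"strength": 30}},
        ]},
    },
)
print(body)
```

The resulting string can be sent as the body of `POST _bulk` with any HTTP client.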

See following discussion : https://discuss.elastic.co/t/issue-sorting-nested-documents-indexed-via-bulk/139164

@jasontedor jasontedor added the :Search/Search Search-related issues that do not fall into other categories label Jul 17, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-search-aggs

@cbuescher
Member

@JulienColin thanks for raising this here, and thanks for the great reproduction. I was able to see similar behaviour locally on 6.3.0.
For anybody interested in reproducing this quickly, I put the parts of the above reproduction together in a Console script that is easy to copy and paste: https://gist.github.com/cbuescher/f9c8c2132d2667d3e907a6283d3f171a

@cbuescher
Member

What is indeed weird is that, in the case of bulk indexing, the sort value for document "2" seems to get picked up from the smallest "strength" value in document "1". If I e.g. change document "1" to {"name":"Doe","members":[{"firstName":"John","color":"brown","levels":{"strength":12}}]} in the bulk example, I get "12" as the sort value of doc "2" in the response.

@cbuescher cbuescher added the >bug label Jul 18, 2018
@polyfractal
Contributor

This might be related to the problem under discussion in #31554

Not quite the same (no missing fields), but similar symptoms: wrong sort values getting picked up.

@JulienColin
Author

Hello, thank you @cbuescher for your help, and @polyfractal for pointing out the similarities.
I am not sure the issues are the same, though: mine is reproducible 100% of the time, whereas #31554 seems to reproduce only under very precise circumstances. Anyway, it would be interesting to see whether the fix proposed there fixes this issue as well.

jimczi added a commit that referenced this issue Jul 20, 2018
The parent filter for nested sort should always match **all** parents regardless
of the child queries. It is used to find the boundaries of a single parent and we use
the child query to match all the filters set in the nested tree so there is no need to
repeat the nested filters.
With this change we ensure that we build bitset filters
only to find the root docs (or the docs at the level where the sort applies) that can be reused
among queries.

Closes #31554
Closes #32130
Closes #31783

Co-authored-by: Dominic Bevacqua <bev@treatwell.com>
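
The commit message says the parent filter is used to find the boundaries of a single parent. The toy model below sketches why an incomplete parent filter could produce the symptom reported in this issue. It is not Elasticsearch or Lucene code: it assumes a flat layout where each parent's children are stored immediately before the parent, collapses the three levels of the report into two, and the "incomplete" parent set is a hypothetical input chosen for illustration. `SEGMENT`, `all_parents` and `min_child_value` are made-up names.

```python
from typing import List, Optional, Set, Tuple

# (kind, family_id, strength); strength is None for parent rows.
# Assumed layout: children come first, then their parent.
Row = Tuple[str, str, Optional[int]]

SEGMENT: List[Row] = [
    ("child", "1", 10), ("child", "1", 15), ("child", "1", 20), ("parent", "1", None),
    ("child", "2", 30), ("child", "2", 40), ("child", "2", 60), ("parent", "2", None),
]


def all_parents(segment: List[Row]) -> Set[int]:
    """Positions of every parent row, regardless of any query."""
    return {pos for pos, row in enumerate(segment) if row[0] == "parent"}


def min_child_value(segment: List[Row], parent_pos: int, parent_bits: Set[int]) -> Optional[int]:
    """Smallest child value between the previous marked parent and parent_pos."""
    # The previous marked parent is the lower boundary of this parent's block.
    previous = max((p for p in parent_bits if p < parent_pos), default=-1)
    values = [
        row[2]
        for row in segment[previous + 1:parent_pos]
        if row[0] == "child" and row[2] is not None
    ]
    return min(values) if values else None


# Parent set that matches all parents: family 2 only sees its own children.
print(min_child_value(SEGMENT, 7, all_parents(SEGMENT)))  # 30

# Hypothetical parent set that misses family 1: its children leak into family 2.
print(min_child_value(SEGMENT, 7, {7}))  # 10
```

In this model, a parent set that skips family 1 makes family 2 inherit the value 10, which matches the wrong sort value in the report, and changing the first child's value would change the leaked value in the same way as the experiment described earlier in the thread. Whether the real code path fails in exactly this manner is not established by the thread itself.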
yrodiere added a commit to hibernate/hibernate-search that referenced this issue Jun 9, 2020
ES 6.2, 6.3.0, 6.3.1 and 6.3.2 and below have a bug that prevents double-nested
sorts from working: elastic/elasticsearch#32130

In our case, DistanceSearchSortBaseIT and FieldSearchSortBaseIT were
failing with parameter IndexFieldLocation.IN_NESTED_TWICE.
yrodiere added a commit to yrodiere/hibernate-search that referenced this issue Jun 17, 2020
ES 6.2, 6.3.0, 6.3.1 and 6.3.2 and below have a bug that prevents double-nested
sorts from working: elastic/elasticsearch#32130

In our case, DistanceSearchSortBaseIT and FieldSearchSortBaseIT were
failing with parameter IndexFieldLocation.IN_NESTED_TWICE.
wklaczynski pushed a commit to wklaczynski/hibernate-search that referenced this issue Feb 14, 2021
ES 6.2, 6.3.0, 6.3.1 and 6.3.2 and below have a bug that prevents double-nested
sorts from working: elastic/elasticsearch#32130

In our case, DistanceSearchSortBaseIT and FieldSearchSortBaseIT were
failing with parameter IndexFieldLocation.IN_NESTED_TWICE.