Multiple levels of field collapse #24855

Closed
eskibars opened this issue May 24, 2017 · 4 comments
Labels
>enhancement :Search/Search Search-related issues that do not fall into other categories

@eskibars
Contributor

Describe the feature:
Currently, we support one level of field collapse plus inner hits (https://www.elastic.co/guide/en/elasticsearch/reference/5.4/search-request-collapse.html#_expand_collapse_results). However, some users need to collapse on multiple levels while keeping the features that field collapse provides (e.g. pagination), for example to return just the top item for each of two tiers of collapsing. Those users have a few options:

  1. Ignore the pagination requirement and do terms+terms+top_hits (which will be fine for some users)
  2. In a single query, ask for a very large inner_hits size and then reduce the set client-side
  3. Perform multiple queries: 1 for the first collapse level and then iterate through these for secondary queries
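
As a rough illustration of option 1, the request body could be built along these lines. This is a minimal sketch, not a recommended query: the `country` and `city` field names, the `address` match query, the aggregation names (`by_outer`, `by_inner`, `top_doc`), and the bucket sizes are all assumptions chosen for the example.

```python
import json


def nested_terms_top_hits(outer_field, inner_field, outer_size=10, inner_size=10):
    """Build a search body that buckets on two fields and keeps one top hit per inner bucket."""
    return {
        # Only the aggregation buckets are of interest, so skip the plain hit list.
        "size": 0,
        # Illustrative query; any query could be used here.
        "query": {"match": {"address": "victoria"}},
        "aggs": {
            "by_outer": {  # hypothetical aggregation name
                "terms": {"field": outer_field, "size": outer_size},
                "aggs": {
                    "by_inner": {  # hypothetical aggregation name
                        "terms": {"field": inner_field, "size": inner_size},
                        "aggs": {
                            # One top-scoring document per (outer, inner) bucket.
                            "top_doc": {"top_hits": {"size": 1}}
                        },
                    }
                },
            }
        },
    }


body = nested_terms_top_hits("country", "city")
print(json.dumps(body, indent=2))
```

Because the results come back as aggregation buckets rather than hits, this shape offers no `from`/`size` pagination over the collapsed results, which is the limitation option 1 accepts.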

When option 1 is not acceptable for business reasons, that leaves options 2 and 3. Option 2 is a fairly abusive/expensive way to solve the problem when you only want the top result under each top-level collapse group. Option 3 adds a lot of client code as well as round trips (which increase overall query latency).
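
To make the cost of option 2 concrete, the client-side reduction might look like the sketch below. It assumes the usual search response layout for collapsed hits with `inner_hits`, that inner hits arrive already sorted, and that the second-level field is available in `_source`. The helper name, the `by_country` inner hits name, and the sample response are hand-written for illustration and are not real Elasticsearch output.

```python
def top_per_second_level(response, inner_name, second_field):
    """For each collapsed group, keep the first inner hit seen per second-level value."""
    reduced = []
    for group in response["hits"]["hits"]:
        seen = {}
        for hit in group["inner_hits"][inner_name]["hits"]["hits"]:
            key = hit["_source"].get(second_field)
            # Inner hits are assumed sorted, so the first hit per key is its top hit.
            seen.setdefault(key, hit)
        reduced.append(list(seen.values()))
    return reduced


# Hand-written sample shaped like a collapse response with a large inner_hits size.
sample = {
    "hits": {"hits": [
        {"_id": "1", "inner_hits": {"by_country": {"hits": {"hits": [
            {"_id": "1", "_score": 3.0, "_source": {"country": "Canada", "city": "Toronto"}},
            {"_id": "2", "_score": 2.0, "_source": {"country": "Canada", "city": "Saskatoon"}},
            {"_id": "3", "_score": 1.0, "_source": {"country": "Canada", "city": "Toronto"}},
        ]}}}},
    ]}
}

result = top_per_second_level(sample, "by_country", "city")
ids = [[hit["_id"] for hit in group] for group in result]
print(ids)
```

Every inner hit has to be fetched and transferred before most of them are discarded, which is why this approach scales poorly when only one result per second-level value is wanted.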

Unfortunately, opening up multiple levels also opens the query up for abuse in different ways (e.g. asking for high depth).

I wanted to open up for discussion the possibility of having multiple levels of field collapse, even if there were additional restrictions (e.g. only 2 levels, the second level can only return a much smaller result set, etc.).

@eskibars eskibars added discuss :Search/Search Search-related issues that do not fall into other categories labels May 24, 2017
@emmerich

emmerich commented Jul 7, 2017

Just a quick note that I came across this requirement myself and decided to go with the terms+terms+top_hits approach. With 7000 documents returned by the query before aggregations, I found that query time tripled when I added the final top_hits aggregation (terms+top_hits was fine, terms+terms was fine, terms+terms+top_hits was awful).

I raised this as a forum post here: https://discuss.elastic.co/t/top-hits-performance-inside-2-levels-of-terms-aggregations/92266

But it's perhaps useful to see it here too.

@tarunramsinghani

+1

We are also using terms->terms->top_hits to solve a problem where multiple levels of collapse with pagination would be very useful. But the question is: performance-wise, would it be better to use a parent->child structure to solve this problem instead of N-level nesting for collapse?

@andyb-elastic
Contributor

@elastic/es-search-aggs

@mayya-sharipova mayya-sharipova self-assigned this Jun 19, 2018
@mayya-sharipova mayya-sharipova removed the help wanted adoptme label Jun 19, 2018
mayya-sharipova added a commit that referenced this issue Jun 27, 2018
Introduce collapsing on multiple fields

The `field` parameter of the `collapse` request can now take, in addition
to a string, an array of fields on which to collapse.

Limitation: all fields in a field collapsing request must be
of the same type: either all keyword or all numeric.

Example request:
```json
{
    "query": {
        "match": {
            "address": "victoria"
        }
    },
    "collapse" : {
        "field" : ["country", "city"]
    }
}
```

Example response:
```json
{
    ...
    "hits": [
        {
            ...
            "fields": {
                "country": [
                    "Canada"
                ],
                "city": [
                    "Saskatoon"
                ]
            }
        },
        {
            ...,
            "fields": {
                "country": [
                    "Canada"
                ],
                "city": [
                    "Toronto"
                ]
            }
        },
        {
            ...,
            "fields": {
                "country": [
                    "UK"
                ],
                "city": [
                    "London"
                ]
            }
        }
    ]
}
```
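
One way to read the example response is that each hit stands for a distinct combination of the collapse field values. The sketch below emulates that reading on the client side; it is an interpretation of the example only, not the implementation from the commit, and the helper name and sample hits are hypothetical.

```python
def collapse_on_fields(hits, fields):
    """Keep the first hit for each distinct combination of the given field values."""
    seen = set()
    collapsed = []
    for hit in hits:
        # Field values arrive as lists, so convert them to tuples to make the key hashable.
        key = tuple(tuple(hit["fields"].get(name, [])) for name in fields)
        if key not in seen:
            seen.add(key)
            collapsed.append(hit)
    return collapsed


# Hand-written hits, assumed to be in ranking order already.
hits = [
    {"_id": "a", "fields": {"country": ["Canada"], "city": ["Saskatoon"]}},
    {"_id": "b", "fields": {"country": ["Canada"], "city": ["Toronto"]}},
    {"_id": "c", "fields": {"country": ["Canada"], "city": ["Toronto"]}},
    {"_id": "d", "fields": {"country": ["UK"], "city": ["London"]}},
]

kept = [hit["_id"] for hit in collapse_on_fields(hits, ["country", "city"])]
print(kept)
```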

Breaking changes:
The internal format between nodes for TopDocs for a collapsing request
has been changed.

TODO:
1. Limit the number of fields for multiple collapsing
2. Return a 4xx response instead of a 5xx for field types on which
    collapsing can't be done (all types except keyword or numeric)

Closes #24855
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Jul 5, 2018
mayya-sharipova added a commit that referenced this issue Jul 13, 2018
* Put second level collapse under inner_hits

Closes #24855
mayya-sharipova added a commit that referenced this issue Jul 13, 2018
Put second level collapse under inner_hits

Closes #24855
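
For readers following the thread, a request using a second-level collapse under `inner_hits` might be built roughly as follows. This is a sketch based on the commit title and on the second-level collapsing shape described in the Elasticsearch collapse documentation as I understand it; the helper name, the `by_first` inner hits name, the field names, the query, and the size are assumptions for illustration.

```python
import json


def two_level_collapse(first_field, second_field, inner_name="by_first", inner_size=3):
    """Build a search body that collapses on one field and collapses again inside inner_hits."""
    return {
        # Illustrative query; any query could be used here.
        "query": {"match": {"address": "victoria"}},
        "collapse": {
            "field": first_field,  # first collapse level
            "inner_hits": {
                "name": inner_name,  # hypothetical name for the inner hits block
                "size": inner_size,  # number of second-level groups to return per hit
                "collapse": {"field": second_field},  # second collapse level
            },
        },
    }


request_body = two_level_collapse("country", "city")
print(json.dumps(request_body, indent=2))
```

Nesting the second level under `inner_hits` keeps pagination on the top-level hits while bounding the second level by the inner hits `size`, which lines up with the restrictions floated in the original description.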
@stojan-jovic

Wondering why we gave up on the approach introduced in #31557? Are there plans to support something like that in the near future?

7 participants