Term vectors query returns nothing in nested inner objects if term vectors are stored in the index #21625

Closed
gronostajo opened this issue Nov 17, 2016 · 21 comments · Fixed by #32728
Labels
>bug >docs General docs changes :Search/Search Search-related issues that do not fall into other categories

Comments

@gronostajo

gronostajo commented Nov 17, 2016

Elasticsearch version: 5.0.0, 2.4.1

Plugins installed: []

JVM version: 1.8.0_112-b15 64-bit

OS version: Windows 10 v. 1607 (build 14393.447)

Description of the problem including expected versus actual behavior:

When making a _termvectors query for an object in a "type": "nested" field, the response comes back with "term_vectors" : { } if term vectors are stored ("store": true in the mapping). If the object isn't nested or term vectors aren't stored, the returned object is filled in properly (examples below). When term vector storing is enabled, highlighting seems significantly faster even though I can't retrieve the vectors, as if they were in fact present.

Steps to reproduce:

1.  PUT /mcve
2.  PUT /mcve/ex/_mapping
    {
      "properties": {
        "parent": {
          "type": "nested",
          "properties": {
            "value": {"type": "text", "term_vector": "with_positions_offsets", "store": true}
          }
        }
      }
    }
3.  PUT /mcve/ex/ample
    {
      "parent": {
        "value": "hello world"
      }
    }
4.  GET /mcve/ex/ample/_termvectors
    {
      "fields": ["parent.value"]
    }

Response:

{
  "_index" : "mcve",
  "_type" : "ex",
  "_id" : "ample",
  "_version" : 1,
  "found" : true,
  "took" : 1,
  "term_vectors" : { }
}

However, if I disable storing term vectors by changing this line in mapping:

"value": {"type": "text", "term_vector": "with_positions_offsets", "store": true}

to this:

"value": {"type": "text"}

the result will be:

{
  "_index" : "mcve",
  "_type" : "ex",
  "_id" : "ample",
  "_version" : 1,
  "found" : true,
  "took" : 60,
  "term_vectors" : {
    "parent.value" : {
      "field_statistics" : {
        "sum_doc_freq" : 2,
        "doc_count" : 1,
        "sum_ttf" : 2
      },
      "terms" : {
        /* ...snip... */
      }
    }
  }
}

But highlighting queries will be slower (tested on a ~80 MB index with ~600 documents), as if term vectors were in fact built previously.
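
For anyone scripting this check, the two responses above can be told apart by testing whether "term_vectors" actually contains terms for the requested field. Below is a minimal Python sketch, assuming the response has already been parsed into a dict. `has_term_vectors` is a hypothetical helper name, not part of any Elasticsearch client, and the per-term details in the second sample are placeholders because they were snipped in the output above.

```python
def has_term_vectors(response: dict, field: str) -> bool:
    """Return True if a parsed _termvectors response holds terms for `field`."""
    field_vectors = response.get("term_vectors", {}).get(field)
    return bool(field_vectors and field_vectors.get("terms"))


# Shape of the response seen with "term_vector": "with_positions_offsets".
empty_response = {
    "_index": "mcve", "_type": "ex", "_id": "ample",
    "_version": 1, "found": True, "took": 1,
    "term_vectors": {},
}

# Shape of the response seen with the plain {"type": "text"} mapping.
# Per-term details are placeholders (assumption): they were snipped above.
filled_response = {
    "_index": "mcve", "_type": "ex", "_id": "ample",
    "_version": 1, "found": True, "took": 60,
    "term_vectors": {
        "parent.value": {
            "field_statistics": {"sum_doc_freq": 2, "doc_count": 1, "sum_ttf": 2},
            "terms": {"hello": {}, "world": {}},
        }
    },
}

print(has_term_vectors(empty_response, "parent.value"))
print(has_term_vectors(filled_response, "parent.value"))
```

Note that "found": true is returned in both cases, so it cannot be used to detect the empty result.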

Provide logs (if relevant):

--

Related SO question

@jpountz
Contributor

jpountz commented Nov 17, 2016

This is a tricky problem. The reason why the term vectors API works when term vectors are disabled is that it ignores the fact that the sub-documents are nested, which is probably a bug. I don't think we thought about how the term vectors API should work with nested documents. Out of curiosity, what is your use case for retrieving term vectors on nested docs?

@jacool

jacool commented Jun 4, 2017

It seems the issue is not related to storing term vectors but to the "with_positions_offsets" option, as can be seen from this sequence:

PUT /mcve

PUT /mcve/ex/_mapping
{
  "properties": {
    "parent": {
      "type": "nested",
      "properties": {
        "value": { "type": "text",  "store": true }
      } } } }

PUT /mcve/ex/ample
{ "parent": { "value": "hello world" } }

GET /mcve/ex/ample/_termvectors
{ "fields": [ "parent.value" ] }

The last command produces the expected term-vectors.
However adding "term_vector": "with_positions_offsets" breaks the _termvectors functionality.
Reproducible with ES 5.4.0

@jpountz
Contributor

jpountz commented Jun 5, 2017

The issue is that the term vectors API was designed with the idea that there would be a single document composed of 0 or more fields. However, when indexing nested documents, you have multiple sub-documents that each have their own term vectors, so we'd have to return an array of term vectors for each nested path.

@jacool

jacool commented Jun 5, 2017

This is not what happens with the simple example we used above. Try to replace the indexing command above with this

PUT /mcve/ex/ample
{ "parent": [{ "value": "hello world " }, { "value": "having world" } ] }

If you then request the _termvectors, you'll see that some kind of merging is performed between the results for the inner documents. For us this merging happens to make sense; however, the fact that the functionality disappears once with_positions_offsets is declared seems like a bug.
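
To make the two shapes under discussion concrete, here is a toy Python model. It is not Elasticsearch code: the whitespace `analyze` function is a stand-in for a real analyzer, and all function names are made up for illustration. It contrasts one term-frequency map per inner document (the "array per nested path" shape) with a single merged map (roughly what the merging described above appears to produce).

```python
from collections import Counter


def analyze(text):
    # Stand-in for a real analyzer (assumption): lowercase, split on whitespace.
    return text.lower().split()


def per_subdocument_term_freqs(values):
    # One term-frequency map per inner document.
    return [dict(Counter(analyze(value))) for value in values]


def merged_term_freqs(values):
    # A single map with frequencies summed across all inner documents.
    merged = Counter()
    for value in values:
        merged.update(analyze(value))
    return dict(merged)


# The two inner "value" fields from the indexing command above.
values = ["hello world ", "having world"]

per_doc = per_subdocument_term_freqs(values)
merged = merged_term_freqs(values)

print(per_doc)
print(merged)
```

In the merged form it is no longer possible to tell which inner document a term came from, which is the information an array per nested path would preserve.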

@fbaligand
Contributor

I'm really interested in a fix for this issue.

@bkazez

bkazez commented Jan 13, 2018

I'm interested in this too. At the very least, it would be helpful to return an error if someone tries to get term vectors for a nested document, since the documentation never mentions that this isn't supported.

Meanwhile, is there another way to debug analyzers for nested docs?

@clintongormley clintongormley added :Search/Search Search-related issues that do not fall into other categories and removed :Term Vectors labels Feb 14, 2018
@mayya-sharipova mayya-sharipova changed the title Term vectors query returns nothing in nested objects if term vectors are stored Term vectors query returns nothing in nested inner objects if term vectors are stored in the index Mar 21, 2018
@mayya-sharipova
Contributor

cc @elastic/es-search-aggs

We currently still have this issue.

Term vectors that are stored in the index for nested inner values are not returned when requested,
while term vectors that are generated on the fly for nested inner values are returned.

A way to reproduce

PUT tm
{
  "mappings": {
    "doc": {
      "properties": {
        "users1" : {
          "type" : "nested",
          "properties": {
            "name" : {"type": "text"}
          }
        },
        "users2" : {
          "type" : "nested",
          "properties": {
            "name" : {"type": "text", "term_vector": "yes"}
          }
        }
      }
    }
  }
}

PUT tm/doc/1
{
  "users1": {"name": "John Snow"},
  "users2": {"name": "John Snow"}
}

GET tm/doc/1/_termvectors?fields=users1.name
//will return term vectors

GET tm/doc/1/_termvectors?fields=users2.name
//will NOT return term vectors

Possible resolutions:

  • accept this as a feature and document this behaviour
  • return stored term vectors of nested inner values
  • not return generated term vectors of nested inner values

@rbower54

rbower54 commented Apr 4, 2018

Watching and very interested when this is remedied.

@jimczi
Contributor

jimczi commented Jul 16, 2018

We discussed this issue internally and decided to document the limitations of this API regarding nested fields. Using this expert API with nested fields isn't recommended; instead, a copy field can be used on the main document. This doesn't mean that a fix is not possible, but it is unlikely that we'll work on it in the near future, so updating the documentation should help users avoid falling into this trap.

@jimczi jimczi added >docs General docs changes and removed team-discuss labels Jul 16, 2018
@fbaligand
Contributor

Thanks Jim for this answer.
This is not what I had hoped for, but thanks for discussing this issue.

@mayya-sharipova mayya-sharipova self-assigned this Aug 8, 2018
@mayya-sharipova mayya-sharipova removed the help wanted adoptme label Aug 8, 2018
mayya-sharipova added a commit to mayya-sharipova/elasticsearch that referenced this issue Aug 9, 2018
Requesting _termvectors on a nested field or any sub-fields of a nested field
returns empty results.

Closes elastic#21625
mayya-sharipova added a commit that referenced this issue Aug 23, 2018
Requesting _termvectors on a nested field or any sub-fields of a nested field
returns empty results.

Closes #21625
@mayya-sharipova
Contributor

Reopening, as disabling term_vectors on nested fields breaks backward compatibility. We need to investigate more how to address it.

@mayya-sharipova
Contributor

@gronostajo @fbaligand @rbower54
We would like to learn more about your use case for requesting _termvectors on nested fields. What is your application for them? Thanks.

@fbaligand
Contributor

Hi,

I have a complex analyzer (numerous filters), that extracts one term from a big text.
I want to see what the resulting term is for a given document.

@dnm1977

dnm1977 commented Mar 14, 2019

Am I correct in thinking that this also means any highlighting that uses term vectors in nested fields will not work either? E.g., when I set term_vector="with_positions_offsets" on a nested field and then use the unified highlighter on that field, I get no highlights. (Even when not using a span query, just a simple Boolean.)

Having the ability to store term vector offsets to speed up highlighting on nested documents would be tremendously helpful for our search application. [edit: As you may know, the "plain" highlighter is extremely slow for large documents, and some of our nested documents can be large.]

[edit: also does not work with term_vector="yes" as per @mayya-sharipova's post above.]

@jimczi
Contributor

jimczi commented May 7, 2019

Nested documents are indexed as separate documents in the index, so it is expected that a query that targets the root document cannot access these nested documents. One workaround is to use include_in_root in the definition of the nested field. This duplicates the field at the root level and makes it available for _termvectors and highlighting. Since we don't plan to handle these nested fields automatically and a workaround exists (include_in_root), I am closing this issue.

@jimczi jimczi closed this as completed May 7, 2019
@fbaligand
Contributor

Thanks for the workaround @jimczi !

@vaanessaromero

Where do you put the include_in_root? @jimczi

@fbaligand
Contributor

@vaanessaromero
In the index mapping:

PUT my_index
{
  "mappings": {
    "_doc": {
      "properties": {
        "user": {
          "type": "nested",
          "include_in_root": true
        }
      }
    }
  }
}
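
If you build mappings from code, the same idea can be combined with the term_vector setting from the original report. This is only a sketch, not tested against a live cluster: `nested_mapping_with_root_copy` is a hypothetical helper, the field names "parent" and "value" are borrowed from the issue description, and the resulting dict is meant as the body of a typed `_mapping` request as used in 5.x/6.x.

```python
import json


def nested_mapping_with_root_copy(nested_field, inner_field,
                                  term_vector="with_positions_offsets"):
    """Build a mapping body whose nested field is also copied to the root document."""
    return {
        "properties": {
            nested_field: {
                "type": "nested",
                # Workaround from this thread: duplicate inner fields at root level.
                "include_in_root": True,
                "properties": {
                    inner_field: {"type": "text", "term_vector": term_vector}
                },
            }
        }
    }


mapping = nested_mapping_with_root_copy("parent", "value")

# Body for e.g. PUT /mcve/ex/_mapping (index and type names from the issue description).
print(json.dumps(mapping, indent=2))
```

Keep in mind that this duplicates the inner fields at the root level, so the index will hold them twice.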

@fbaligand
Contributor

fbaligand commented May 9, 2019

Hi,

After some tests, I realized that with Elasticsearch 6.5, if I just don't specify the term_vector setting, it works out of the box!

So, in the example given in the issue description, it works with this mapping:

PUT /mcve/ex/_mapping
{
  "properties": {
    "parent": {
      "type": "nested",
      "include_in_root": false,
      "properties": {
        "value": {
          "type": "text",
          "store": true
        }
      }
    }
  }
}

By the way, that's weird, because the default value is term_vector: no

@jimczi
Contributor

jimczi commented May 9, 2019

After some tests, I realized that with Elasticsearch 6.5, if I just don't specify the term_vector setting, it works out of the box!

It works because term vectors are generated on the fly if they don't exist in the index (term_vector: no):
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-termvectors.html#_example_generating_term_vectors_on_the_fly
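
As a rough mental model of why the two cases in this thread behave differently, here is a simplified Python sketch. It is a toy, not the actual Elasticsearch implementation: the function name, the dict-based "mapping", and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def term_vectors_for_root(field, mapping, stored_on_root, source_text):
    """Toy model: use stored vectors when the mapping asks for them, else generate."""
    if mapping.get(field, {}).get("term_vector", "no") != "no":
        # Stored vectors are looked up on the root document only. For a nested
        # field they live on the separate sub-documents, so nothing is found here.
        return stored_on_root.get(field, {})
    # term_vector: no -> generate on the fly (whitespace split as a stand-in analyzer).
    return dict(Counter(source_text.lower().split()))


# users1.name has no term_vector setting; users2.name uses "term_vector": "yes".
mapping = {
    "users1.name": {"type": "text"},
    "users2.name": {"type": "text", "term_vector": "yes"},
}
stored_on_root = {}  # the root document holds no stored vectors for nested fields

generated = term_vectors_for_root("users1.name", mapping, stored_on_root, "John Snow")
stored = term_vectors_for_root("users2.name", mapping, stored_on_root, "John Snow")

print(generated)
print(stored)
```

Under these assumptions the first lookup yields terms and the second comes back empty, which matches the users1/users2 reproduction earlier in the thread.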

@fbaligand
Contributor

Thanks for the clarification @jimczi !
