Term vectors query returns nothing in nested inner objects if term vectors are stored in the index #21625
This is a tricky problem. The reason why the term vectors API works when term vectors are disabled is that it ignores the fact that the sub-documents are nested, which is probably a bug. I don't think we thought about how the term vectors API should work with nested documents. Out of curiosity, what is your use-case for retrieving term vectors on nested docs? |
It seems the issue is not related to storing term vectors but to the "with_positions_offsets" option. As can be seen by this sequence:
The last command produces the expected term-vectors. |
The issue is that the term vectors API has been designed with the idea in mind that there would be a single document composed of 0 or more fields. However, when indexing with nested documents, you have multiple sub-documents that each have their own term vectors, so we'd have to return an array of term vectors for each nested path. |
This is not what happens with the simple example we used above. Try to replace the indexing command above with this
And then request the _termvectors; you'll see that some kind of merging is performed between the results representing the inner documents. For us this merging happens to make sense; however, the fact that the functionality disappears once with_positions_offsets is declared seems like a bug. |
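To make the two behaviours discussed above concrete, here is a toy sketch in Python. It is not how Lucene or the term vectors API computes anything: lowercase whitespace splitting stands in for a real analyzer, only term frequencies are tracked, and the second name ("John Doe") is a made-up value for illustration. It contrasts one term vector per nested sub-document with a single merged result across the inner documents.

```python
from collections import Counter


def toy_term_vector(text):
    """Term -> frequency, using lowercase whitespace splitting as a stand-in analyzer."""
    return Counter(text.lower().split())


def per_nested_doc(values):
    """One toy term vector per nested sub-document (the 'array per nested path' idea)."""
    return [dict(toy_term_vector(value)) for value in values]


def merged(values):
    """A single toy term vector with frequencies summed across all sub-documents."""
    total = Counter()
    for value in values:
        total.update(toy_term_vector(value))
    return dict(total)


# Hypothetical values of a nested "name" field in two inner documents.
names = ["John Snow", "John Doe"]
print(per_nested_doc(names))
print(merged(names))
```

In the per-document form, "john" has frequency 1 in each entry; in the merged form it has frequency 2, which is the kind of merging described above.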
I'm really interested in a fix for this issue. |
I'm interested in this too. At the very least, it would be helpful to return an error if someone tries to get term vectors for a nested document, since the documentation never mentions that this isn't supported. Meanwhile, is there another way to debug analyzers for nested docs? |
cc @elastic/es-search-aggs We currently still have this issue. Term vectors stored in the index on nested inner values are not returned when requested. A way to reproduce:
PUT tm
{
"mappings": {
"doc": {
"properties": {
"users1" : {
"type" : "nested",
"properties": {
"name" : {"type": "text"}
}
},
"users2" : {
"type" : "nested",
"properties": {
"name" : {"type": "text", "term_vector": "yes"}
}
}
}
}
}
}
PUT tm/doc/1
{
"users1": {"name": "John Snow"},
"users2": {"name": "John Snow"}
}
GET tm/doc/1/_termvectors?fields=users1.name
//will return term vectors
GET tm/doc/1/_termvectors?fields=users2.name
//will NOT return term vectors
Possible resolutions:
|
Watching and very interested when this is remedied. |
We discussed this issue internally and we've decided to document the limitations of this API regarding nested fields. Using this expert API with nested fields shouldn't be recommended; instead, a copy field can be used on the main document. This doesn't mean that a fix is not possible, but it is unlikely that we'll work on it in the near future, so updating the documentation should help users avoid falling into this trap. |
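For readers wondering what the copy field suggestion could look like, here is a minimal sketch that builds a mapping body for the `tm` index from the reproduction earlier in this thread. It assumes the suggestion refers to the `copy_to` mapping parameter; the root-level field name `users2_names` is hypothetical, and the body has not been verified against a cluster.

```python
import json


def copy_field_mapping():
    """Mapping body that copies the nested name into a root-level field holding term vectors."""
    return {
        "mappings": {
            "doc": {
                "properties": {
                    "users2": {
                        "type": "nested",
                        "properties": {
                            # copy_to targets a field on the main (root) document
                            "name": {"type": "text", "copy_to": "users2_names"}
                        },
                    },
                    # hypothetical root-level field; term vectors are stored here
                    "users2_names": {"type": "text", "term_vector": "yes"},
                }
            }
        }
    }


print(json.dumps(copy_field_mapping(), indent=2))
```

The printed JSON would be sent as the body of PUT tm, and term vectors would then be requested on the root-level field (fields=users2_names) rather than on users2.name.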
Thanks Jim for this answer. |
Requesting _termvectors on a nested field or any sub-fields of a nested field returns empty results. Closes elastic#21625
Reopening, as disabling term_vectors on nested fields breaks backward compatibility. We need to investigate more how to address it. |
@gronostajo @fbaligand @rbower54 |
Hi, I have a complex analyzer (numerous filters), that extracts one term from a big text. |
Am I correct to think that this also means that any highlighting that uses term vectors in nested fields will also not work? E.g., when I set set Having the ability to store term vector offsets to speed up highlighting on nested documents would be tremendously helpful for our search application. [edit: As you may know, the "plain" highlighter is extremely slow for large documents, and some of our nested documents can be large.] [edit: also does not work with |
Nested documents are indexed as separate documents in the index, so it is expected that a query that targets the root document cannot access these nested documents. One workaround for this is to use |
Thanks for the workaround @jimczi ! |
Where do you put the include_in_root? @jimczi |
@vaanessaromero
|
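As a rough illustration of where include_in_root goes: it is a parameter of the nested field itself, set next to "type": "nested". The sketch below builds such a mapping body, reusing the `tm` index and `users2` field from the reproduction earlier in this thread. Treat it as an assumption about the intended workaround, not a verified fix, since it has not been run against a cluster.

```python
import json


def include_in_root_mapping():
    """Mapping body with include_in_root set on the nested field."""
    return {
        "mappings": {
            "doc": {
                "properties": {
                    "users2": {
                        "type": "nested",
                        # also index the nested fields as flattened fields of the root document
                        "include_in_root": True,
                        "properties": {
                            "name": {"type": "text", "term_vector": "yes"}
                        },
                    }
                }
            }
        }
    }


print(json.dumps(include_in_root_mapping(), indent=2))
```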
Hi, After some tests, I realized that with Elasticsearch 6.5, if I just don't specify So, in the example given in the issue description, it works with this mapping:
By the way, that's weird, because the default value is |
It works because term vectors are generated on the fly if they don't exist in the index ( |
Thanks for the clarification @jimczi ! |
Elasticsearch version: 5.0.0, 2.4.1
Plugins installed: []
JVM version: 1.8.0_112-b15 64-bit
OS version: Windows 10 v. 1607 (build 14393.447)
Description of the problem including expected versus actual behavior:
When making a _termvectors query to an object in a "type": "nested" field, the response is returned with "term_vectors" : { } if vectors are stored ("store": true in mapping). If the object isn't nested or term vectors aren't stored, the object returned is filled properly (examples below). When TV storing is enabled then, despite the fact that I'm unable to retrieve them, highlighting seems to be significantly faster, as if term vectors were present.
Steps to reproduce:
PUT /mcve
Response:
However, if I disable storing term vectors by changing this line in mapping:
to this:
the result will be:
But highlighting queries will be slower (tested on a ~80 MB index with ~600 documents), as if term vectors were in fact built previously.
Provide logs (if relevant):
--
Related SO question