[FEATURE] Improved multi vector support using Nested fields #1065
Comments
Question:
|
It needs a code change from k-NN plugin to adapt the feature. Lucene introduced a new Query type One additional field is required for
The feature works only with the Lucene engine, because Faiss and nmslib use our own custom Query. |
Expected behavior
1. Create a knn field with the lucene engine
2. Index data
3. Query data
4. Should return two documents (the current implementation returns one document)
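For readers who want to reproduce the steps above, here is a minimal sketch of the three request bodies, written as Python dicts. The index name, field names, dimension, and vector values are made up for illustration; only the overall shape (a `knn_vector` inside a `nested` field with `"engine": "lucene"`, and a `knn` clause wrapped in a `nested` query) follows the k-NN plugin's documented request format. It only builds the bodies and does not send anything to a cluster.

```python
import json

# Hypothetical names and values, chosen only for this sketch.
INDEX = "my-knn-index"

# Step 1: a nested field that holds a knn_vector using the lucene engine.
mapping = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "nested_field": {
                "type": "nested",
                "properties": {
                    "my_vector": {
                        "type": "knn_vector",
                        "dimension": 2,
                        "method": {"name": "hnsw", "space_type": "l2", "engine": "lucene"},
                    }
                },
            }
        }
    },
}

# Step 2: two documents, each carrying two nested vectors.
docs = {
    "1": {"nested_field": [{"my_vector": [1.0, 1.0]}, {"my_vector": [1.1, 1.1]}]},
    "2": {"nested_field": [{"my_vector": [5.0, 5.0]}, {"my_vector": [5.1, 5.1]}]},
}

# Step 3: a k-NN query with k=2, wrapped in a nested query on the same path.
query = {
    "query": {
        "nested": {
            "path": "nested_field",
            "query": {"knn": {"nested_field.my_vector": {"vector": [1.0, 1.0], "k": 2}}},
        }
    }
}


def requests_to_send():
    """Return (method, path, body) tuples in the order they would be issued."""
    reqs = [("PUT", f"/{INDEX}", mapping)]
    reqs += [("PUT", f"/{INDEX}/_doc/{doc_id}", body) for doc_id, body in docs.items()]
    reqs.append(("GET", f"/{INDEX}/_search", query))
    return reqs


if __name__ == "__main__":
    for method, path, body in requests_to_send():
        print(method, path)
        print(json.dumps(body, indent=2))
```

With the behavior requested in this issue, the search in step 3 should return both documents; the implementation at the time of the issue returns only document 1, because both of the two nearest vectors belong to it.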
|
It won't. After the change, a query might return k results where it returned fewer than k results before. If a query returned more than k results before, its results will be the same after this change. |
Let's say I index two docs with a nested field. The first doc has vectors [A,B] and the second doc has vectors [C,D]. A, B, C, D represent different "items" in my corpus. I perform a search with k=2 and the most similar items are A and B. I expect only the first doc to be returned. This was the previous behavior, correct? Now suppose (A,B) and (C,D) represent "items", and A, B, C, D are chunks that represent these items. The new enhancement will retrieve both docs for k=2, correct? |
Also, can you please provide an example of how to use this feature with neural search? Specifically, given a nested field of strings, how can I construct a nested field of vectors using the text embedding processor? |
That is correct.
This is correct. |
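The two behaviors confirmed above can be illustrated with a small brute-force simulation. This is only a sketch of the semantics being discussed, not the plugin's or Lucene's actual implementation: the vectors, document ids, and function names are invented, and exact distance computation stands in for the approximate HNSW search.

```python
import heapq
from math import dist

# (parent_doc_id, chunk_vector) pairs; values are hypothetical.
chunks = [
    ("doc1", (1.0, 1.0)),  # A
    ("doc1", (1.1, 1.1)),  # B
    ("doc2", (5.0, 5.0)),  # C
    ("doc2", (5.1, 5.1)),  # D
]


def topk_children_then_collapse(query, k):
    """Previous behavior: pick the k nearest child vectors, then map them to parents."""
    nearest = heapq.nsmallest(k, chunks, key=lambda c: dist(query, c[1]))
    parents = []
    for parent, _ in nearest:
        if parent not in parents:
            parents.append(parent)
    return parents


def topk_parents(query, k):
    """Requested behavior: score each parent by its best child, then pick k parents."""
    best = {}
    for parent, vec in chunks:
        d = dist(query, vec)
        if parent not in best or d < best[parent]:
            best[parent] = d
    return heapq.nsmallest(k, best, key=best.get)


if __name__ == "__main__":
    q = (1.0, 1.0)
    print(topk_children_then_collapse(q, 2))  # ['doc1']
    print(topk_parents(q, 2))                 # ['doc1', 'doc2']
```

In both cases doc1 stays ranked first; the second approach only fills the remaining slots up to k with further parent documents.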
Ok, so let's say I intentionally modeled my application's data around the first scenario. If I upgrade to 2.12, would it not break my app? Before 2.12 I get one doc. After I upgrade, I get two docs in the result. My app's functionality has changed. |
I wouldn't say it breaks an app. That is a wrong way of using the nested field. If you rely only on the k value, the behavior is non-deterministic. For example, in the scenario above, if there are two segments with doc1 in segment1 and doc2 in segment2, then with k=2 you will get both documents as results even before this change. |
That question should be asked in the neural-search repo. There is a GitHub issue for it: opensearch-project/neural-search#482 |
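The segment dependence described above can be sketched in the same brute-force style. This assumes, for illustration only, that the pre-change search takes the k nearest child vectors within each segment, collapses them to parents, and then merges the per-segment hits; the data and function names are hypothetical, and the real search is approximate rather than exact.

```python
import heapq
from math import dist

# Hypothetical chunk vectors: A and B belong to doc1, C and D to doc2.
A, B, C, D = (1.0, 1.0), (1.1, 1.1), (5.0, 5.0), (5.1, 5.1)


def search_segment(segment, query, k):
    """Top-k child vectors in one segment, collapsed to {parent: best distance}."""
    nearest = heapq.nsmallest(k, segment, key=lambda c: dist(query, c[1]))
    hits = {}
    for parent, vec in nearest:
        hits[parent] = min(hits.get(parent, float("inf")), dist(query, vec))
    return hits


def search(segments, query, k):
    """Merge per-segment hits and keep the k best parents overall.

    A parent and its nested children always live in the same segment,
    so a plain dict update is enough to merge.
    """
    merged = {}
    for segment in segments:
        merged.update(search_segment(segment, query, k))
    return heapq.nsmallest(k, merged, key=merged.get)


one_segment = [[("doc1", A), ("doc1", B), ("doc2", C), ("doc2", D)]]
two_segments = [[("doc1", A), ("doc1", B)], [("doc2", C), ("doc2", D)]]

if __name__ == "__main__":
    q = (1.0, 1.0)
    print(search(one_segment, q, 2))   # ['doc1']
    print(search(two_segments, q, 2))  # ['doc1', 'doc2']
```

The same data and the same query give one or two documents depending only on how the documents happen to be distributed across segments, which is why relying on k to limit results to a single parent is fragile.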
One could debate what a good data model is, but there could be valid reasons for choosing this data modeling design. Regardless of whether the user made a good data modeling decision, we don't govern or restrict how users design their data model. I suggest we have an index configuration like "nested_vector_mode" = SINGLE | MULTI. It could default to "SINGLE". At least someone would then have the option to change the config to "MULTI" in case this causes a breaking change. |
The meaning of |
Can you think of a scenario where the ranking is changed? Before we only return 2 results because the 5 most similar things are in those two documents. Now we return 5. Can you think of a scenario where the 2 original results might end up being ranked lower after the change? |
No such case, unless the user reranks the returned results. |
Hi all, Two questions here,
Thanks |
|
I have an index that already uses parent join. Would that conflict with this feature? As far as I know, an index can have only one parent join field. |
It won't conflict with this feature. This feature does not use parent join internally. |
Is your feature request related to a problem?
Related to #675
What solution would you like?
Use the Parent Join feature support (apache/lucene#12434) to retrieve all the documents for a given query, instead of using child documents, which results in fewer hits.