Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for knn_vector property type #524
Add support for knn_vector property type #524
Changes from 6 commits
de8678e
bfd5286
46c8f96
4092963
deeb591
43bbdbf
23934a8
4ed819a
c9e8335
4349749
84cd2ee
b72dfa3
51cf5c1
d24dbee
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is not OpenSearch core field type, this is provided by the plugin (https://github.com/opensearch-project/k-NN), afaik we do not (and probably should not) make it a part of the standard Java client but part of the k-NN plugin APIs, @VachaShah @dblock what do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. This effectively creates a chicken/egg problem, which is why plugins are "awesome" :)
@maltehedderich Is there a mechanism that lets us add a k-nn high level REST client JAR to the k-nn plugin with this code? or maybe it's a waste of time and it makes more sense to add support for k-nn to the new transport that doesn't have a dependency on core? See #326 and #262
On the other hand, in other clients we are adding plugin code into the client. We just namespace it clearly, so is there a way to namespace all k-nn code here under
something.plugins.k-nn
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick replies.
While namespaces probably work well for most plugins, this is difficult in the case of the KNN plugin due to its tight integration with core components.
The naming for both cluster and index-specific settings makes it difficult to identify them as plugin settings. As a user, if KNN settings remain under index settings, I would expect to be able to modify and read them through the Index Settings functionality in the Java client.
Moreover, I'm not aware of other plugins introducing custom field types. I think many users who use the OpenSearch KNN functionalities also use other field types (those who want a pure vector database have many alternatives). So it would be important to be able to define the mappings together with your other mappings.
I believe that in order to solve this cleanly we would either have to incorporate the functionality of the KNN plugin into the OpenSearch core or fundamentally consider how to deal with extensions to field types and index settings.
Personally, I think integrating vector search capabilities into the OpenSearch core is the best solution. However, if this is not in line with your product vision I am also willing to spend time on alternative solutions, we should just clarify in advance where we really want to go.
As for the namespacing solution, it's achievable but would necessitate several class extensions and potentially affect usability. Regarding incorporating this into the new Transport Client, I'm unsure how this would look like and would need more information about your vision for that integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most if not all plugins have tight integration with core components
There are many, even within OpenSearch core plugins, fe
mapper-murmur3
introducesmurmur3
custom field type.I think more broadly we need an extensible mechanism to support plugin specific APIs in the Java client, we have talked about that in the past in scope of different issues (#377, #262) but have not conclude anything on the subject.
We need to get back to the extensibility part of OpenSearch Java client. The major drawback I see is that by adding
k-NN
plugin to the core client we put all other plugins in unfair and disadvantageous position, I think we need to work towards long term goal instead of taking the simple path of baking everything in.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How is "core plugin" defined? I am only aware of the distinction between "Bundled Plugins" and "Additional Plugins" (according to installing plugins). Unlike murmur3, k-NN belongs to the Bundled Plugins and should therefore be present in the vast majority of OpenSearch deployments.
The discussions sound to me less like extensibility and more like the creation of a workaround through a kind of low level client. This could then of course be used as a basis for plugin specific libraries, but would itself not make the features of the Java client (typing, builders etc.) usable for e.g. users of the k-NN plugin.
According to my understanding of extensibility, there would need to be a standardized way to extend the components of the Java client with plugin specific types and functionalities. So that mappings, index settings and co can all be defined together via the builder without the need for custom HTTP requests.
The distinction between "bundled plugins" and "additional plugins" already puts all "additional plugins" at a disadvantage. In addition, I think it is questionable how much of the "plugin character" is left if they are shipped with all but the minimal distributions.
As you said, there are already several ideas and discussions about this topic, not only here in the Java Client repo. In my opinion it is important that we come to a holistic decision how to deal with "bundled" and "additional" plugins within the clients. How do you think we can reach such a decision most effectively?
The rough concepts I have seen for this so far are:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @maltehedderich
Sorry, I should have been more clear, your distinction is very correct, by core plugins I meant the plugins included in OpenSearch repository and bundled with it.
There should be no distinction, I just provided an example of field type added in the plugin (since you mentioned you were not aware of), no emphasize on core - it just happened to be a plugin we bundle with OpenSearch.
The solution (as I see) it is the combination of plugin specific APIs that could be plugged into OpenSearch Java client. The very rough idea of how it could look like (it is not formalized anywhere):
But it does not include plugin specific mapping types, index settings, query builders, etc. , I think we could work it through by taking the
k-NN
plugin as the example and formalize the approach.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From my current perspective, I think we definitely want something like that, but (as you already said) it only solves the problem with plugin specific APIs. However, in the case of the k-NN plugin, the standard APIs are used almost exclusively (except for training custom models?).
Nevertheless, I can understand your concerns about including plugin-specific code in the core library and will be happy to participate in possible solutions.