Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for knn_vector property type #524

Merged
merged 14 commits into from
Jun 20, 2023
Merged
Show file tree
Hide file tree
Changes from 6 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@ Inspired from [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
## [Unreleased 2.x]

### Added
- Add support for knn_vector field type ([#529](https://github.com/opensearch-project/opensearch-java/pull/524))

### Dependencies

Expand Down
Original file line number Diff line number Diff line change
Expand Up @@ -124,6 +124,7 @@ public enum FieldType implements JsonEnum {

DenseVector("dense_vector"),
maltehedderich marked this conversation as resolved.
Show resolved Hide resolved

KnnVector("knn_vector"),
Copy link
Collaborator

@reta reta Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this is not OpenSearch core field type, this is provided by the plugin (https://github.com/opensearch-project/k-NN), afaik we do not (and probably should not) make it a part of the standard Java client but part of the k-NN plugin APIs, @VachaShah @dblock what do you think?

Copy link
Member

@dblock dblock Jun 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right. This effectively creates a chicken/egg problem, which is why plugins are "awesome" :)

@maltehedderich Is there a mechanism that lets us add a k-nn high level REST client JAR to the k-nn plugin with this code? or maybe it's a waste of time and it makes more sense to add support for k-nn to the new transport that doesn't have a dependency on core? See #326 and #262

On the other hand, in other clients we are adding plugin code into the client. We just namespace it clearly, so is there a way to namespace all k-nn code here under something.plugins.k-nn?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the quick replies.

While namespaces probably work well for most plugins, this is difficult in the case of the KNN plugin due to its tight integration with core components.

The naming for both cluster and index-specific settings makes it difficult to identify them as plugin settings. As a user, if KNN settings remain under index settings, I would expect to be able to modify and read them through the Index Settings functionality in the Java client.

Moreover, I'm not aware of other plugins introducing custom field types. I think many users who use the OpenSearch KNN functionalities also use other field types (those who want a pure vector database have many alternatives). So it would be important to be able to define the mappings together with your other mappings.

I believe that in order to solve this cleanly we would either have to incorporate the functionality of the KNN plugin into the OpenSearch core or fundamentally consider how to deal with extensions to field types and index settings.
Personally, I think integrating vector search capabilities into the OpenSearch core is the best solution. However, if this is not in line with your product vision I am also willing to spend time on alternative solutions, we should just clarify in advance where we really want to go.

As for the namespacing solution, it's achievable but would necessitate several class extensions and potentially affect usability. Regarding incorporating this into the new Transport Client, I'm unsure how this would look like and would need more information about your vision for that integration.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While namespaces probably work well for most plugins, this is difficult in the case of the KNN plugin due to its tight integration with core components.

Most if not all plugins have tight integration with core components

Moreover, I'm not aware of other plugins introducing custom field types.

There are many, even within OpenSearch core plugins, fe mapper-murmur3 introduces murmur3 custom field type.

I believe that in order to solve this cleanly we would either have to incorporate the functionality of the KNN plugin into the OpenSearch core or fundamentally consider how to deal with extensions to field types and index settings.

I think more broadly we need an extensible mechanism to support plugin specific APIs in the Java client, we have talked about that in the past in scope of different issues (#377, #262) but have not conclude anything on the subject.

As for the namespacing solution, it's achievable but would necessitate several class extensions and potentially affect usability. Regarding incorporating this into the new Transport Client, I'm unsure how this would look like and would need more information about your vision for that integration.

We need to get back to the extensibility part of OpenSearch Java client. The major drawback I see is that by adding k-NN plugin to the core client we put all other plugins in unfair and disadvantageous position, I think we need to work towards long term goal instead of taking the simple path of baking everything in.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are many, even within OpenSearch core plugins, fe mapper-murmur3 introduces murmur3 custom field type.

How is "core plugin" defined? I am only aware of the distinction between "Bundled Plugins" and "Additional Plugins" (according to installing plugins). Unlike murmur3, k-NN belongs to the Bundled Plugins and should therefore be present in the vast majority of OpenSearch deployments.

I think more broadly we need an extensible mechanism to support plugin specific APIs in the Java client, we have talked about that in the past in scope of different issues (#377, #262) but have not conclude anything on the subject.

The discussions sound to me less like extensibility and more like the creation of a workaround through a kind of low level client. This could then of course be used as a basis for plugin specific libraries, but would itself not make the features of the Java client (typing, builders etc.) usable for e.g. users of the k-NN plugin.
According to my understanding of extensibility, there would need to be a standardized way to extend the components of the Java client with plugin specific types and functionalities. So that mappings, index settings and co can all be defined together via the builder without the need for custom HTTP requests.

We need to get back to the extensibility part of OpenSearch Java client. The major drawback I see is that by adding k-NN plugin to the core client we put all other plugins in unfair and disadvantageous position, I think we need to work towards long term goal instead of taking the simple path of baking everything in.

The distinction between "bundled plugins" and "additional plugins" already puts all "additional plugins" at a disadvantage. In addition, I think it is questionable how much of the "plugin character" is left if they are shipped with all but the minimal distributions.

As you said, there are already several ideas and discussions about this topic, not only here in the Java Client repo. In my opinion it is important that we come to a holistic decision how to deal with "bundled" and "additional" plugins within the clients. How do you think we can reach such a decision most effectively?

The rough concepts I have seen for this so far are:

  1. Integration into core client libraries with appropriate namespacing
  2. Generic low level clients provided by the core client libraries
  3. Development of plugin-specific clients which may build up on the core clients

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @maltehedderich

How is "core plugin" defined? I am only aware of the distinction between "Bundled Plugins" and "Additional Plugins" (according to installing plugins).

Sorry, I should have been more clear, your distinction is very correct, by core plugins I meant the plugins included in OpenSearch repository and bundled with it.

The distinction between "bundled plugins" and "additional plugins" already puts all "additional plugins" at a disadvantage.

There should be no distinction, I just provided an example of field type added in the plugin (since you mentioned you were not aware of), no emphasize on core - it just happened to be a plugin we bundle with OpenSearch.

The rough concepts I have seen for this so far are:

  1. Integration into core client libraries with appropriate namespacing
  2. Generic low level clients provided by the core client libraries
  3. Development of plugin-specific clients which may build up on the core clients

The solution (as I see) it is the combination of plugin specific APIs that could be plugged into OpenSearch Java client. The very rough idea of how it could look like (it is not formalized anywhere):

public class OpenSearchClient extends ApiClient<OpenSearchTransport, OpenSearchClient> {
	public <PluginClient> PluginClient plugin(Class<PluginClient> plugin) {
		....
	}
}
class OpenSearchKnnPluginClient {
  ....
}

But it does not include plugin specific mapping types, index settings, query builders, etc. , I think we could work it through by taking the k-NN plugin as the example and formalize the approach.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my current perspective, I think we definitely want something like that, but (as you already said) it only solves the problem with plugin specific APIs. However, in the case of the k-NN plugin, the standard APIs are used almost exclusively (except for training custom models?).

Nevertheless, I can understand your concerns about including plugin-specific code in the core library and will be happy to participate in possible solutions.

;

private final String jsonValue;
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1,229 @@
/*
* SPDX-License-Identifier: Apache-2.0
*
* The OpenSearch Contributors require contributions made to
* this file be licensed under the Apache-2.0 license or a
* compatible open source license.
*/

/*
maltehedderich marked this conversation as resolved.
Show resolved Hide resolved
* Licensed to Elasticsearch B.V. under one or more contributor
* license agreements. See the NOTICE file distributed with
* this work for additional information regarding copyright
* ownership. Elasticsearch B.V. licenses this file to you under
* the Apache License, Version 2.0 (the "License"); you may
* not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing,
* software distributed under the License is distributed on an
* "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
* KIND, either express or implied. See the License for the
* specific language governing permissions and limitations
* under the License.
*/

/*
* Modifications Copyright OpenSearch Contributors. See
* GitHub history for details.
*/

package org.opensearch.client.opensearch._types.mapping;

import java.util.Map;
import java.util.function.Function;

import javax.annotation.Nullable;

import org.opensearch.client.json.JsonData;
import org.opensearch.client.json.JsonpDeserializable;
import org.opensearch.client.json.JsonpDeserializer;
import org.opensearch.client.json.JsonpMapper;
import org.opensearch.client.json.JsonpSerializable;
import org.opensearch.client.json.ObjectBuilderDeserializer;
import org.opensearch.client.json.ObjectDeserializer;
import org.opensearch.client.util.ApiTypeHelper;
import org.opensearch.client.util.ObjectBuilder;
import org.opensearch.client.util.ObjectBuilderBase;

import jakarta.json.stream.JsonGenerator;

// typedef: _types.mapping.KnnVectorMethod

@JsonpDeserializable
public class KnnVectorMethod implements JsonpSerializable {
/**
* Builder for {@link KnnVectorMethod}.
*/

public static class Builder extends ObjectBuilderBase implements ObjectBuilder<KnnVectorMethod> {
private String name;

@Nullable
private String spaceType;

@Nullable
private String engine;

@Nullable
private Map<String, JsonData> parameters;

/**
* Required - API name: {@code name}
*/
public final Builder name(String value) {
this.name = value;
return this;
}

/**
* API name: {@code space_type}
*/
public final Builder spaceType(@Nullable String value) {
this.spaceType = value;
return this;
}

/**
* API name: {@code engine}
*/
public final Builder engine(@Nullable String value) {
this.engine = value;
return this;
}

/**
* API name: {@code parameters}
*/
public final Builder parameters(@Nullable Map<String, JsonData> map) {
this.parameters = _mapPutAll(this.parameters, map);
return this;
}

/**
* Builds a {@link KnnVectorMethod}.
*
* @throws NullPointerException
* if some of the required fields are null.
*/
public KnnVectorMethod build() {
_checkSingleUse();

return new KnnVectorMethod(this);
}
}

/**
* Json deserializer for {@link KnnVectorMethod}
*/
public static final JsonpDeserializer<KnnVectorMethod> _DESERIALIZER = ObjectBuilderDeserializer
.lazy(Builder::new, KnnVectorMethod::setupKnnVectorMethodDeserializer);

public static KnnVectorMethod of(Function<Builder, ObjectBuilder<KnnVectorMethod>> fn) {
return fn.apply(new Builder()).build();
}

protected static void setupKnnVectorMethodDeserializer(
ObjectDeserializer<KnnVectorMethod.Builder> op) {

op.add(Builder::name, JsonpDeserializer.stringDeserializer(), "name");
op.add(Builder::spaceType, JsonpDeserializer.stringDeserializer(), "space_type");
op.add(Builder::engine, JsonpDeserializer.stringDeserializer(), "engine");
op.add(Builder::parameters, JsonpDeserializer.stringMapDeserializer(JsonData._DESERIALIZER), "parameters");

}

// ---------------------------------------------------------------------------------------------

private final String name;

@Nullable
private final String spaceType;

@Nullable
private final String engine;

@Nullable
private final Map<String, JsonData> parameters;

private KnnVectorMethod(Builder builder) {

this.name = ApiTypeHelper.requireNonNull(builder.name, this, "name");
this.spaceType = builder.spaceType;
this.engine = builder.engine;
this.parameters = builder.parameters;

}

/**
* Required - API name: {@code name}
*/
public final String name() {
return this.name;
}

/**
* API name: {@code space_type}
*/
@Nullable
public final String spaceType() {
return this.spaceType;
}

/**
* API name: {@code engine}
*/
@Nullable
public final String engine() {
return this.engine;
}

// ---------------------------------------------------------------------------------------------

/**
* API name: {@code parameters}
*/
@Nullable
public final Map<String, JsonData> parameters() {
return this.parameters;
}

// ---------------------------------------------------------------------------------------------

/**
* Serialize this object to JSON.
*/
public void serialize(JsonGenerator generator, JsonpMapper mapper) {
generator.writeStartObject();
serializeInternal(generator, mapper);
generator.writeEnd();
}

protected void serializeInternal(JsonGenerator generator, JsonpMapper mapper) {

generator.writeKey("name");
generator.write(this.name);

if (this.spaceType != null) {
generator.writeKey("space_type");
generator.write(this.spaceType);
}

if (this.engine != null) {
generator.writeKey("engine");
generator.write(this.engine);
}

if (this.parameters != null) {
generator.writeKey("parameters");
for (Map.Entry<String, JsonData> item0 : this.parameters.entrySet()) {
generator.writeKey(item0.getKey());
item0.getValue().serialize(generator, mapper);
}
}

}

}
Loading