Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add embedding support api #528

Merged
merged 38 commits into from
Sep 22, 2023
Merged

Add embedding support api #528

merged 38 commits into from
Sep 22, 2023

Conversation

maheshrajamani
Copy link
Contributor

@maheshrajamani maheshrajamani commented Sep 15, 2023

What this PR does:

Support for vectorize method
Fixed the update operation for vector data

Which issue(s) this PR fixes:
Fixes #524

Checklist

  • Changes manually tested
  • Automated Tests added/updated
  • Documentation added/updated
  • CLA Signed: DataStax CLA

@maheshrajamani maheshrajamani requested a review from a team as a code owner September 15, 2023 14:26
@maheshrajamani maheshrajamani self-assigned this Sep 15, 2023
@@ -15,17 +15,23 @@ public record CreateCollectionOperation(
String name,
boolean vectorSearch,
int vectorSize,
String vectorFunction)
String vectorFunction,
String vectorize)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this a JSON serialization of Vectorize service configuration?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the value is stored part of table's comment.

@@ -66,6 +72,9 @@ protected QueryOuterClass.Query getCreateTable(String keyspace, String table) {
+ vectorSize
+ ">, "
+ " PRIMARY KEY (key))";
if (vectorize != null) {
createTableWithVector = createTableWithVector + " WITH comment = '" + vectorize + "'";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if it matters, but if vectorize is serialized JSON it could still have characters that require escaping (esp. apostrophes themselves). Although I guess those would lead to CQL query failure, not corrupted data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what kind of characters to handle since this is just a text value, If needed I will add permitted character for the vectorize options.

Copy link
Contributor

@tatu-at-datastax tatu-at-datastax left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, looks good overall -- I added (quite) a few suggestions; feel free to use or ignore whatever makes (or doesn't make) sense.

{
"findOneAndUpdate": {
"filter" : {"_id" : "id"},
"update" : {"$setOnInsert" : {"$vectorize" : "New York"}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is the difference between this $setOnInsert and upper $set

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When upsert option is set to true during update operation, we will create new document if no document found based on condition. setOnInsert will work when this a new document.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

$setOnInsert is only applied in case a new document is inserted, but not if one exists.
Used for things like adding id field(s), or creation-time.

@maheshrajamani maheshrajamani merged commit fd7479e into main Sep 22, 2023
3 checks passed
@maheshrajamani maheshrajamani deleted the add-embedding-support-api branch September 22, 2023 18:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Vectorize] Implement embedding service call
4 participants