vectorization on demand #1258
# Conflicts: # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/FindOneAndReplaceCommandResolver.java # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/FindOneAndUpdateCommandResolver.java # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/UpdateManyCommandResolver.java # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/UpdateOneCommandResolver.java
# Conflicts: # src/main/java/io/stargate/sgv2/jsonapi/service/embedding/DataVectorizerService.java # src/test/java/io/stargate/sgv2/jsonapi/service/embedding/operation/DataVectorizerTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/CommandResolverWithVectorizerTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/UpdateManyCommandResolverTest.java
LGTM 👍
# Conflicts: # src/main/java/io/stargate/sgv2/jsonapi/service/embedding/DataVectorizer.java # src/main/java/io/stargate/sgv2/jsonapi/service/embedding/DataVectorizerService.java # src/test/java/io/stargate/sgv2/jsonapi/service/embedding/operation/DataVectorizerTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/CommandResolverWithVectorizerTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/resolver/model/impl/UpdateManyCommandResolverTest.java
Went through it: a couple of improvements, but also 2 bugs.
# Conflicts: # src/main/java/io/stargate/sgv2/jsonapi/service/embedding/DataVectorizer.java # src/main/java/io/stargate/sgv2/jsonapi/service/embedding/DataVectorizerService.java # src/main/java/io/stargate/sgv2/jsonapi/service/operation/collections/ReadAndUpdateCollectionOperation.java # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/FindOneAndReplaceCommandResolver.java # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/FindOneAndUpdateCommandResolver.java # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/UpdateManyCommandResolver.java # src/main/java/io/stargate/sgv2/jsonapi/service/resolver/UpdateOneCommandResolver.java # src/test/java/io/stargate/sgv2/jsonapi/service/operation/collections/ReadAndUpdateCollectionOperationRetryTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/operation/collections/ReadAndUpdateCollectionOperationTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/operation/collections/SerialConsistencyOverrideOperationTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/resolver/CommandResolverWithVectorizerTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/resolver/FindOneAndReplaceCommandResolverTest.java # src/test/java/io/stargate/sgv2/jsonapi/service/resolver/UpdateManyCommandResolverTest.java
dataVectorizerService.constructDataVectorizer(dataApiRequestInfo, commandContext);
// TODO: only SetOperation and Replacement may create one embeddingUpdateOperation, Refactor
// when there are multiple
final EmbeddingUpdateOperation embeddingUpdateOperation = embeddingUpdateOperations.get(0);
Why pick get(0) when we have a list? I understand we currently support only one vector, but it would be better to use Multi and merge here, so we don't have to revisit this code once multiple vectorize fields are supported.
Solved
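For illustration, here is a minimal sketch of the reviewer's suggestion to process the whole list rather than `get(0)`. It is not the code that was merged. It swaps in the JDK's `CompletableFuture` for Multi so that it stays self-contained, and every name in it (`VectorizeAllSketch`, `EmbeddingOp`, `vectorizeAll`, `fakeEmbed`) is hypothetical rather than taken from the project.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Hypothetical sketch: none of these types are the project's real classes.
final class VectorizeAllSketch {

  /** Stand-in for an embedding update: the text to embed plus a slot for the result. */
  static final class EmbeddingOp {
    final String vectorizeContent;
    volatile float[] vector; // filled in once the embedding call completes

    EmbeddingOp(String vectorizeContent) {
      this.vectorizeContent = vectorizeContent;
    }
  }

  /**
   * Vectorizes every operation in the list and completes when all are done.
   * The embed function stands in for the asynchronous embedding provider call.
   */
  static CompletableFuture<List<EmbeddingOp>> vectorizeAll(
      List<EmbeddingOp> ops, Function<String, CompletableFuture<float[]>> embed) {
    CompletableFuture<?>[] pending =
        ops.stream()
            .map(op -> embed.apply(op.vectorizeContent).thenAccept(v -> op.vector = v))
            .toArray(CompletableFuture[]::new);
    return CompletableFuture.allOf(pending).thenApply(ignored -> ops);
  }

  /** Fake embedder for the demo: a one-element vector holding the text length. */
  static CompletableFuture<float[]> fakeEmbed(String text) {
    return CompletableFuture.completedFuture(new float[] {text.length()});
  }

  public static void main(String[] args) {
    List<EmbeddingOp> ops = List.of(new EmbeddingOp("hello"), new EmbeddingOp("hi"));
    vectorizeAll(ops, VectorizeAllSketch::fakeEmbed).join();
    ops.forEach(op -> System.out.println(op.vectorizeContent + " -> " + op.vector[0]));
  }
}
```

With a single operation in the list this behaves like the `get(0)` version, and it keeps working unchanged if more operations are added later.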
// when there are multiple
final EmbeddingUpdateOperation embeddingUpdateOperation = embeddingUpdateOperations.get(0);
return dataVectorizer
.vectorize(embeddingUpdateOperation.vectorizeContent())
Looks like this method is called multiple times when the update touches multiple documents, as with UpdateMany. Let's lazily cache the vector inside the EmbeddingUpdateOperation.
Aaron and I discussed this part and decided not to do the caching here; it could be a separate improvement.
I understand that what you did in DataVectorizer previously avoids this problem.
Will create a ticket.
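As a reference point for that follow-up ticket, the lazy cache suggested above could look roughly like the sketch below. It only illustrates the memoization idea under simplifying assumptions: `LazyVectorSketch`, `CachedEmbeddingOp`, and the `embed` function are hypothetical stand-ins, not the project's EmbeddingUpdateOperation or DataVectorizer.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

// Hypothetical sketch: these are stand-ins, not the project's real classes.
final class LazyVectorSketch {

  /** Holds the text to embed and remembers the embedding once it has been requested. */
  static final class CachedEmbeddingOp {
    private final String vectorizeContent;
    private CompletableFuture<float[]> cachedVector; // null until the first request

    CachedEmbeddingOp(String vectorizeContent) {
      this.vectorizeContent = vectorizeContent;
    }

    /** Calls embed on the first request only; later requests reuse the same future. */
    synchronized CompletableFuture<float[]> vector(
        Function<String, CompletableFuture<float[]>> embed) {
      if (cachedVector == null) {
        // Note: a failed future is cached too; a real version might reset it on error.
        cachedVector = embed.apply(vectorizeContent);
      }
      return cachedVector;
    }
  }

  /** Simulates an update matching three documents and returns how often embed ran. */
  static int embeddingCallsForThreeDocuments() {
    AtomicInteger calls = new AtomicInteger();
    Function<String, CompletableFuture<float[]>> countingEmbed =
        text -> {
          calls.incrementAndGet();
          return CompletableFuture.completedFuture(new float[] {text.length()});
        };
    CachedEmbeddingOp op = new CachedEmbeddingOp("text shared by every matched document");
    for (int document = 0; document < 3; document++) {
      op.vector(countingEmbed).join();
    }
    return calls.get();
  }

  public static void main(String[] args) {
    System.out.println("embedding calls: " + embeddingCallsForThreeDocuments());
  }
}
```

Caching the future rather than the finished vector means that concurrent requests also share one in-flight embedding call.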
@@ -110,4 +110,7 @@ public int compare(ActionWithLocator o1, ActionWithLocator o2) {
return o1.path().compareTo(o2.path());
}
}

public record UpdateOperationResult(
Java doc?
Solved
@@ -30,7 +30,7 @@ public List<A> actions() {
* @param doc Document to apply operation to
* @return True if document was modified by operation; false if not.
Update the Javadoc here.
Solved
@@ -227,77 +223,6 @@ public void dynamicFilterCondition() throws Exception {
});
}

@Test
public void dynamicFilterConditionSetVectorize() throws Exception {
Instead of removing the test, can we modify it to show what the data would look like? That will be helpful when changes are made around it.
I have tests in the DocumentUpdaterTest class, which should make this clear.
The reason for deleting this one is that we no longer vectorize the updateClause.
Instead, we vectorize the document if it is found. Here in UpdateManyCommandResolverTest.java, we don't actually mimic a document returned by the DB.
@@ -420,183 +413,6 @@ public void findOneAndDelete() throws Exception {
});
}

@Test
public void findOneAndReplace() throws Exception {
Instead of removing it, can we change the test to show what the object looks like when $vectorize is used?
Same as above.
# Conflicts: # src/main/java/io/stargate/sgv2/jsonapi/service/embedding/DataVectorizer.java # src/main/java/io/stargate/sgv2/jsonapi/service/updater/DocumentUpdater.java
For these update commands: findOneAndUpdate, updateOne, updateMany, findOneAndReplace
We need to vectorize the update clause or replacement document as needed, that is when:
OR
Vectorization on demand requires that the findOperation has already executed, so we can no longer vectorize these four update commands at the commandResolver level. Instead, vectorization is postponed to ReadAndUpdateOperation, which uses the documentUpdater and vectorizes only as needed.
As you can see, DataVectorizer no longer has vectorizeUpdateClause, since vectorization for update commands is postponed to the operation level. The refactored documentUpdater returns an updateResponse that may contain an EmbeddingUpdateOperation, and we update $vector after applying vectorization to $vectorize.
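The flow described above can be sketched as follows. This is a simplified illustration, not the PR's implementation: the document is a plain `Map`, the names `OnDemandVectorizeSketch`, `UpdateResult`, `applySet`, and `update` are hypothetical, and the condition for "as needed" is assumed here to be "the `$vectorize` text differs from the stored one", which may not match the PR's exact rules.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch: a Map stands in for the JSON document found by the read step.
final class OnDemandVectorizeSketch {

  /** What the updater reports back: was the doc changed, and is an embedding needed? */
  record UpdateResult(boolean modified, Optional<String> textToVectorize) {}

  /** Applies a $set-style update to the found document without calling the embedder. */
  static UpdateResult applySet(Map<String, Object> doc, Map<String, ?> setFields) {
    boolean modified = false;
    Optional<String> textToVectorize = Optional.empty();
    for (Map.Entry<String, ?> e : setFields.entrySet()) {
      if (!Objects.equals(doc.get(e.getKey()), e.getValue())) {
        doc.put(e.getKey(), e.getValue());
        modified = true;
        // Assumption: an embedding is needed only when the $vectorize text changed.
        if ("$vectorize".equals(e.getKey()) && e.getValue() instanceof String text) {
          textToVectorize = Optional.of(text);
        }
      }
    }
    return new UpdateResult(modified, textToVectorize);
  }

  /** Operation-level step: update first, then embed and set $vector only if required. */
  static UpdateResult update(
      Map<String, Object> doc, Map<String, ?> setFields, Function<String, float[]> embed) {
    UpdateResult result = applySet(doc, setFields);
    result.textToVectorize().ifPresent(text -> doc.put("$vector", embed.apply(text)));
    return result;
  }

  public static void main(String[] args) {
    Map<String, Object> doc = new HashMap<>();
    doc.put("$vectorize", "old text");
    // Fake embedder for the demo: a one-element vector holding the text length.
    Function<String, float[]> embed = text -> new float[] {text.length()};
    UpdateResult first = update(doc, Map.of("$vectorize", "new text"), embed);
    UpdateResult second = update(doc, Map.of("$vectorize", "new text"), embed);
    System.out.println(first.modified() + " then " + second.modified());
  }
}
```

Because the embedder is only called after the stored document has been compared with the update, an update that leaves `$vectorize` unchanged costs no embedding call.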