Add vectorsearch training workload #333
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
"target_index_num_vectors": 1000,
Can we remove "target_index_num_vectors" from the param file?
}
],
"corpora": [
{
"name": "cohere",
"base-url": "https://dbyiw3u3rf9yr.cloudfront.net/corpora/vectorsearch/cohere-wikipedia-22-12-en-embeddings",
"target-index": "{{ target_index_name }}",
Calling out here that this target-index param is not used anywhere in the workload, but it's necessary due to OSB validation. I'm not sure what the solution is, but I opened an issue about this.
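The `"target-index": "{{ target_index_name }}"` entry is filled in from the workload parameters at load time; OSB renders workload files as Jinja2 templates. The sketch below is a simplified, stdlib-only stand-in for that substitution (a regex, not Jinja2: no filters, defaults, or control flow), and the helpers `render` and `unused_params` are hypothetical. It illustrates the two review points above: a placeholder needs a matching parameter, while a params-file key that no template references, like `target_index_num_vectors` in this snippet, has no effect. The parameter values are illustrative assumptions.

```python
import json
import re

# Matches "{{ name }}" placeholders; a hypothetical simplification of Jinja2.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template, params):
    """Replace each {{ name }} with params[name]; fail loudly if it is undefined."""
    def substitute(match):
        name = match.group(1)
        if name not in params:
            raise KeyError(f"template references undefined parameter: {name}")
        return str(params[name])
    return PLACEHOLDER.sub(substitute, template)


def unused_params(template, params):
    """Return params-file keys that this template never references."""
    return set(params) - set(PLACEHOLDER.findall(template))


corpus_template = """{
  "name": "cohere",
  "target-index": "{{ target_index_name }}"
}"""

# Illustrative params; target_index_num_vectors is not referenced by the snippet.
params = {"target_index_name": "target_index", "target_index_num_vectors": 1000}

corpus = json.loads(render(corpus_template, params))
print(corpus["target-index"])                  # target_index
print(unused_params(corpus_template, params))  # {'target_index_num_vectors'}
```

Checking for unused keys only makes sense against every template in the workload, not a single snippet, since a key may be referenced elsewhere.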
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
LGTM
@finnroblin Overall, LGTM. As per the best practices specified in the README, please provide a sample summary output of train-test in the PR description.
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
LGTM
* Add vectorsearch training workload
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
* Addressed Vijay feedback and ignores error if model DNE
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
* Added documentation to VS readme
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
---------
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
(cherry picked from commit 29d9715)
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* Add vectorsearch training workload
* Addressed Vijay feedback and ignores error if model DNE
* Added documentation to VS readme
---------
(cherry picked from commit 29d9715)
Signed-off-by: Finn Roblin <finnrobl@amazon.com>
Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
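One commit message mentions that the workload now ignores the error when the model does not exist ("DNE"). A common way to make a pre-run cleanup step idempotent is to treat "not found" as success while letting every other error propagate. The sketch below assumes a hypothetical client whose `delete_model` raises a hypothetical `ModelNotFoundError`; it is not the OSB runner code or the k-NN plugin client API.

```python
class ModelNotFoundError(Exception):
    """Hypothetical error raised when the requested model ID does not exist."""


class FakeModelClient:
    """In-memory stand-in for a kNN model API (hypothetical, for illustration)."""

    def __init__(self, model_ids=()):
        self.models = set(model_ids)

    def delete_model(self, model_id):
        if model_id not in self.models:
            raise ModelNotFoundError(model_id)
        self.models.remove(model_id)


def delete_model_if_exists(client, model_id):
    """Delete a previously trained model before re-training.

    Returns True if a model was deleted and False if there was nothing to
    delete. Only the not-found case is swallowed, so real failures still surface.
    """
    try:
        client.delete_model(model_id)
        return True
    except ModelNotFoundError:
        return False


client = FakeModelClient({"test-model"})
print(delete_model_if_exists(client, "test-model"))  # True: model removed
print(delete_model_if_exists(client, "test-model"))  # False: already gone, no error
```

Catching only the not-found case lets the first run against a fresh cluster proceed, without masking failures such as a lost connection.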
Description
Adds the train-test vectorsearch workload to benchmark kNN operations that require training, such as Faiss IVF. Please see issue #332 for context.

This PR adds a schedule to train kNN algorithms using the train-knn-model operation proposed in OSB PR 556, and it depends on the operation runners in that PR. It also requires an additional index in the vectorsearch workload.json to hold training data.

The train-test workload on my branch works on the faiss-sift-128 dataset without breaking backwards compatibility with other vectorsearch workloads. Please feel free to clone my forks (OSB, OSB Workload) to investigate workload behavior, as there are no unit tests in the OSB workloads framework.

Issues Resolved
Closes #332
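The description refers to kNN methods that need a training step, such as Faiss IVF. IVF partitions the vector space into `nlist` cells whose centroids are learned with k-means on a sample of vectors (the role of the additional training-data index), then searches only the `nprobe` cells nearest to a query. The pure-Python sketch below is a toy illustration of that technique, not the Faiss or k-NN plugin implementation; `nlist` and `nprobe` follow Faiss terminology, the data is synthetic, and for simplicity the same vectors are used for training and indexing.

```python
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v, centroids):
    return min(range(len(centroids)), key=lambda c: sq_dist(v, centroids[c]))


def train_kmeans(vectors, nlist, iters=10, seed=0):
    """Learn nlist centroids with Lloyd's k-means (the IVF 'training' step)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = [[] for _ in range(nlist)]
        for v in vectors:
            buckets[nearest_centroid(v, centroids)].append(v)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a cell ends up empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def build_ivf(vectors, centroids):
    """Assign each indexed vector ID to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        lists[nearest_centroid(v, centroids)].append(i)
    return lists


def search(query, vectors, centroids, lists, k=1, nprobe=1):
    """Scan only the nprobe cells closest to the query, then rank candidates exactly."""
    cells = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))[:nprobe]
    candidates = [i for c in cells for i in lists[c]]
    return sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]


rng = random.Random(42)
training = ([[rng.gauss(0, 0.1), rng.gauss(0, 0.1)] for _ in range(50)]
            + [[rng.gauss(5, 0.1), rng.gauss(5, 0.1)] for _ in range(50)])
centroids = train_kmeans(training, nlist=2)
lists = build_ivf(training, centroids)
print(search(training[7], training, centroids, lists, k=1, nprobe=1))  # [7]
```

Raising `nprobe` toward `nlist` trades speed for recall; at `nprobe == nlist` the search degenerates to an exhaustive scan.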
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.
Sample Output: