Perform dimensionality reduction on the embeddings #47
Thoughts about truncation vs folding vs other techniques:

I'm gravitating towards addition-based folding instead of average or max. The folding would mean 512 floating point addition operations per embedding. Simple.
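A minimal sketch of the addition-based fold (in Python for illustration; the function name is mine). A 768-dimensional embedding is reduced to 256 by summing each run of three consecutive components, which is 2 additions per output dimension × 256 outputs = 512 additions per embedding:

```python
def fold_embedding(embedding: list[float], factor: int = 3) -> list[float]:
    """Reduce dimensionality by summing each consecutive run of `factor` values."""
    assert len(embedding) % factor == 0, "length must be divisible by the fold factor"
    return [sum(embedding[i:i + factor]) for i in range(0, len(embedding), factor)]

vec = [0.1] * 768
folded = fold_embedding(vec)
print(len(folded))  # 256
# Additions per embedding: (factor - 1) * len(folded) = 2 * 256 = 512
```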
While researching #46 I saw that since text-embedding-004 the API supports outputDimensionality reduction; it's part of the parameters section of the payload: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#advanced-use

According to the docs, the "reduction" is a simple truncation. Note that a code example suggests reduction to 256. Note also that autoTruncate is on by default.

If we go for a dimensionality reduction to 256, that would cut the storage size to one third (768 / 3 = 256) and cut the retrieval processing time as well. But since this affects accuracy and precision, we'd definitely benefit from a reranking step: https://github.com/CsabaConsulting/InspectorGadgetApp/issues/39
Since this parameter is not available through the Gemini Dart API anyway, and the bandwidth aspect of the saving is probably not that important for us (requests are only sporadic, although the history can pile up over time, so we'd rather benefit from the storage and processing time savings), we'd use the workaround of performing the reduction ourselves. However, I think we should perform a fold instead of a truncation. That way we'd merge every three dimensions into one, potentially not discarding any dimension entirely. The merging would still cause precision loss, but I suspect not as much as simply throwing out two thirds of the dimensions (512 of 768).
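To make the truncation-vs-fold trade-off measurable, here is a toy comparison of how well each method preserves pairwise cosine similarity. Note the heavy caveat: random Gaussian vectors do not model real embedding geometry (where trailing dimensions may carry less information), so this only illustrates the mechanics of such a measurement, not which method actually loses less on our data:

```python
import math
import random

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def truncate(v: list[float], dims: int = 256) -> list[float]:
    return v[:dims]

def fold(v: list[float], factor: int = 3) -> list[float]:
    return [sum(v[i:i + factor]) for i in range(0, len(v), factor)]

random.seed(42)
pairs = [([random.gauss(0, 1) for _ in range(768)],
          [random.gauss(0, 1) for _ in range(768)]) for _ in range(50)]

# Mean absolute change in cosine similarity introduced by each reduction.
trunc_err = sum(abs(cosine(a, b) - cosine(truncate(a), truncate(b)))
                for a, b in pairs) / len(pairs)
fold_err = sum(abs(cosine(a, b) - cosine(fold(a), fold(b)))
               for a, b in pairs) / len(pairs)
print(f"mean |delta cosine| truncation: {trunc_err:.4f}, folding: {fold_err:.4f}")
```

Running the same harness over a sample of our real stored embeddings would give an evidence-based answer before committing to either reduction.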