
Perform dimensionality reduction on the embeddings #47

Closed
MrCsabaToth opened this issue Aug 30, 2024 · 3 comments
Labels
enhancement New feature or request RAG Retrieval Augmented Generation related

Comments

@MrCsabaToth
Member

While researching #46 I saw that, starting with text-embedding-004, the API supports output dimensionality reduction via the outputDimensionality field in the parameters section of the payload: https://cloud.google.com/vertex-ai/generative-ai/docs/model-reference/text-embeddings-api#advanced-use

{
  "instances": [
    { "content": "TEXT",
      "task_type": "TASK_TYPE",
      "title": "TITLE"
    }
  ],
  "parameters": {
    "autoTruncate": AUTO_TRUNCATE,
    "outputDimensionality": OUTPUT_DIMENSIONALITY
  }
}

According to the docs, the "reduction" is a simple truncation:

Used to specify output embedding size. If set, output embeddings will be truncated to the size specified.

Note that a code example in the docs reduces the output to 256 dimensions. Note also that autoTruncate is on by default.

If we go for a dimensionality reduction to 256, that would cut the storage size to one third (768 / 3 = 256) and reduce the retrieval processing time as well. But since this affects accuracy and precision, we'd definitely benefit from reranking: https://github.com/CsabaConsulting/InspectorGadgetApp/issues/39
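The storage claim can be sanity-checked with back-of-the-envelope arithmetic. This sketch assumes each dimension is stored as a 4-byte float32; the actual database encoding may differ.

```python
# Storage per embedding, assuming float32 (4 bytes per dimension).
FULL_DIMS = 768
REDUCED_DIMS = 256
BYTES_PER_FLOAT = 4

full_size = FULL_DIMS * BYTES_PER_FLOAT        # 3072 bytes per embedding
reduced_size = REDUCED_DIMS * BYTES_PER_FLOAT  # 1024 bytes per embedding
print(reduced_size / full_size)                # storage cut to one third
```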

Since this parameter is not available through the Gemini Dart API anyway, and the bandwidth aspect of the savings is probably not that important for us (requests are only sporadic, though the history can pile up over time, so we'd mainly benefit from the storage and processing-time savings), we'd use the workaround of performing the reduction ourselves. However, I think we should perform a fold instead of a truncation. This way we'd merge every three dimensions into one, so no dimension's information is discarded outright. The merging would still cause some precision loss, but I suspect not as much as simply throwing out two thirds of the dimensions (512 of 768).
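The fold-instead-of-truncate idea can be sketched in a few lines. The app itself is Dart, so this Python snippet is purely illustrative of the transform, not an implementation proposal:

```python
import numpy as np

def fold_embedding(vec, group_size=3):
    """Fold an embedding by summing consecutive groups of dimensions.

    A 768-dim vector with group_size=3 becomes 256-dim; every output
    component is the sum of three adjacent input components, so no
    dimension is discarded outright (unlike truncation).
    """
    vec = np.asarray(vec, dtype=np.float64)
    assert vec.size % group_size == 0, "length must be divisible by group size"
    return vec.reshape(-1, group_size).sum(axis=1)

embedding = np.arange(768, dtype=np.float64)  # stand-in for a real embedding
folded = fold_embedding(embedding)            # shape (256,)
truncated = embedding[:256]                   # shape (256,)
```

Both paths end up with 256 dimensions, but the fold touches every input dimension while the truncation discards the last 512 outright.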

@MrCsabaToth
Member Author

The folding would mean 512 floating-point addition operations per embedding. Simple.
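The operation count checks out: each of the 256 output components is a sum of 3 inputs, i.e. 2 additions, and 256 × 2 = 512.

```python
# Counting the additions implied by 3-to-1 folding of a 768-dim vector.
group_size = 3
output_dims = 768 // group_size             # 256 output components
additions = output_dims * (group_size - 1)  # 2 additions per component
print(additions)                            # 512
```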

@MrCsabaToth MrCsabaToth added the RAG Retrieval Augmented Generation related label Aug 30, 2024
@MrCsabaToth
Member Author

MrCsabaToth commented Aug 31, 2024

Thoughts about truncation vs folding vs other techniques:

| Technique | Mechanism | Note | Advantages | Disadvantages | Use cases |
|---|---|---|---|---|---|
| Truncation | Discards less important dimensions | | Simple, computationally efficient | Potential for significant information loss, doesn't consider data structure | When the first few dimensions capture most information, exploratory analysis |
| Folding | Combines multiple dimensions into one | Addition, average, or maximum within folded groups | Can preserve more information than truncation | Choice of aggregation function is crucial, less interpretable | When you have many features and want to retain some information from less important ones |
| PCA | Linear projection onto principal components | Principal Component Analysis | Interpretable, captures global variance, computationally efficient for moderate dimensions | Assumes linear relationships, sensitive to scaling | Feature selection, data visualization, noise reduction |
| t-SNE | Nonlinear, preserves local structure | t-distributed Stochastic Neighbor Embedding | Excellent for visualization, captures nonlinear relationships | Computationally expensive, difficult to interpret, sensitive to hyperparameters | Data visualization, clustering, exploring high-dimensional data |
| UMAP | Nonlinear, preserves local and global structure | Uniform Manifold Approximation and Projection | Fast, scalable, good for visualization and clustering | Can be sensitive to hyperparameters, might require more tuning | Similar to t-SNE, but faster and more scalable |

@MrCsabaToth
Member Author

I'm gravitating towards addition-based folding instead of average or max.
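The three aggregation choices on a toy vector, as an illustrative Python sketch. One observation worth noting: with a fixed group size, sum and mean folds differ only by a constant factor, so cosine-similarity rankings over the reduced vectors would be identical; the choice mainly matters for magnitude-sensitive metrics, while max behaves qualitatively differently.

```python
import numpy as np

def fold(vec, group_size=3, how="sum"):
    """Fold consecutive groups of dimensions with a chosen aggregator."""
    groups = np.asarray(vec, dtype=np.float64).reshape(-1, group_size)
    agg = {"sum": groups.sum, "mean": groups.mean, "max": groups.max}[how]
    return agg(axis=1)

v = [1, 2, 3, -4, 0, 4]          # two groups of three
print(fold(v, how="sum"))        # [6. 0.]
print(fold(v, how="mean"))       # [2. 0.]
print(fold(v, how="max"))        # [3. 4.]
```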
