
community: Fix FastEmbedEmbeddings #24462

Merged: 7 commits, Jul 30, 2024

@Anush008 Anush008 commented Jul 20, 2024

Description

This PR:

  • Fixes the validation error in FastEmbedEmbeddings.
  • Adds support for batch_size, parallel params.
  • Removes support for very old FastEmbed versions.
  • Updates the FastEmbed doc with the new params.
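
As a rough illustration of what a batch_size parameter typically controls, here is a minimal, self-contained sketch. The names batched, embed_in_batches, and fake_embed are hypothetical and exist only for this example; fake_embed stands in for a real model, and none of this is FastEmbed's or LangChain's actual implementation.

```python
from typing import Callable, Iterator, List, Sequence


def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `batch_size` long."""
    if batch_size < 1:
        raise ValueError("batch_size must be a positive integer")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


def embed_in_batches(
    texts: Sequence[str],
    embed_batch: Callable[[Sequence[str]], List[List[float]]],
    batch_size: int = 256,  # assumption: mirrors the default shown in the diff
) -> List[List[float]]:
    """Embed `texts` chunk by chunk and concatenate the results in order."""
    vectors: List[List[float]] = []
    for chunk in batched(texts, batch_size):
        vectors.extend(embed_batch(chunk))
    return vectors


def fake_embed(chunk: Sequence[str]) -> List[List[float]]:
    """Hypothetical stand-in for a model: one 1-d vector per text."""
    return [[float(len(text))] for text in chunk]
```

A smaller batch_size generally lowers peak memory use at the cost of more calls into the model; the output order is the same either way.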

Associated Issues:

  • Resolves langchain-ai#24039
  • Resolves qdrant/fastembed#296


@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. community Related to langchain-community Ɑ: embeddings Related to text embedding models module 🤖:docs Changes to documentation and examples, like .md, .rst, .ipynb files. Changes to the docs/ folder 🤖:improvement Medium size change to existing code to handle new use-cases labels Jul 20, 2024
@Anush008 Anush008 marked this pull request as draft July 20, 2024 04:42
@Anush008 Anush008 marked this pull request as ready for review July 20, 2024 04:50
@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Jul 20, 2024
@Anush008 (Contributor Author) commented:

cc @eyurtsev

@ccurme (Collaborator) left a comment:

FastEmbed 0.1 was current in January. Are we sure we want to drop support for it? Should we continue to handle both < 0.2 and >= 0.2 separately?
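
For context, handling both version ranges separately would usually start with a check like the one below. This is only a sketch of the alternative the reviewer raises; the merged PR dropped support for old versions instead. The helper names parse_major_minor and is_legacy_fastembed are hypothetical, and the parser assumes a version string with at least a major and a minor numeric component.

```python
from typing import Tuple


def parse_major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a string such as '0.1.3'."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def is_legacy_fastembed(version: str) -> bool:
    """Return True for versions below 0.2, which would need the old code path."""
    return parse_major_minor(version) < (0, 2)
```

In real code the version string could be read with importlib.metadata.version("fastembed"), and each branch would then need its own maintained code path, which is the cost the reviewer and author are weighing.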

@@ -48,12 +52,24 @@ class FastEmbedEmbeddings(BaseModel, Embeddings):
The available options are: "default" and "passage"
"""

batch_size: int = 256
@ccurme (Collaborator) commented:

(nit) If the FastEmbed default changes, this will be out of date. I would prefer to make this Optional and not pass it in when it is None.
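
The pattern suggested here can be sketched as follows: keep the field Optional and forward only the values the caller actually set, so the underlying library's own defaults apply otherwise. build_embed_kwargs and stub_embed are hypothetical names for illustration; stub_embed is a stand-in for a library call and its default of 256 is an assumption taken from the diff above, not a statement about FastEmbed's API.

```python
from typing import Any, Dict, List, Optional


def build_embed_kwargs(
    batch_size: Optional[int] = None,
    parallel: Optional[int] = None,
) -> Dict[str, Any]:
    """Return only the options that were set explicitly (not None)."""
    candidates = {"batch_size": batch_size, "parallel": parallel}
    return {name: value for name, value in candidates.items() if value is not None}


def stub_embed(
    texts: List[str],
    batch_size: int = 256,  # assumed library-side default
    parallel: Optional[int] = None,
) -> Dict[str, Any]:
    """Hypothetical library call that owns its own defaults."""
    return {"count": len(texts), "batch_size": batch_size, "parallel": parallel}
```

Calling stub_embed(texts, **build_embed_kwargs()) leaves every default to the library, while build_embed_kwargs(batch_size=8) overrides only that one option. A single dictionary filter like this also covers additional optional fields without one conditional per parameter.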

@Anush008 (Contributor Author) replied:

Will do.

@Anush008 (Contributor Author) replied:

I see this could also apply to model_name and max_length, which would lead to quite a bit of duplicated code because of the conditionals.

So I propose keeping it as is. WDYT?

@Anush008 (Contributor Author) commented:

Quoting @ccurme: "FastEmbed 0.1 was current in January. Are we sure we want to drop support for it?"

There have been several improvements since then. We highly recommend moving to the latest version.

So yes, dropping it would be fine.

@dosubot dosubot bot added the lgtm PR looks good. Use to confirm that a PR is ready for merging. label Jul 30, 2024
@ccurme ccurme merged commit 51b1544 into langchain-ai:master Jul 30, 2024
45 checks passed
@Anush008 Anush008 deleted the fastembed-fix branch July 30, 2024 16:43
olgamurraft pushed a commit to olgamurraft/langchain that referenced this pull request Aug 16, 2024
## Description

This PR:
- Fixes the validation error in `FastEmbedEmbeddings`.
- Adds support for `batch_size`, `parallel` params.
- Removes support for very old FastEmbed versions.
- Updates the FastEmbed doc with the new params.

Associated Issues:
- Resolves langchain-ai#24039
- Resolves #qdrant/fastembed#296
Development

Successfully merging this pull request may close these issues.

Validation error for FastEmbedEmbeddings - extra fields not permitted
2 participants