Pass kwargs to encoder #482
Comments
Matryoshka embeddings are handled in another issue; please comment in #476. Due to the dynamic batching nature, models such as Jina need to handle the prompt template at the instance level. Sentence Transformers implements this at the per-batch level; in Infinity, a batch might span multiple tenants.
Ok, my bad, thanks for the explanation. I was under the impression that, similarly to what has been done in vLLM, we could pass custom args to the `encode` method for each request. So this means that if I want to use a model like Jina for a specific task, such as `model = SentenceTransformer("jinaai/jina-embeddings-v3", trust_remote_code=True, model_kwargs={'default_task': 'retrieval.query'})`, I need to find a way to pass it through the
Thanks for the hard work on this lib!
Hi, looking into this as well. Could this possibly be done by baking/hard-coding it into the model when passing it in, rather than through kwargs? Thanks for all the hard work on the lib, it's a pleasure to use. Hoping to pick up the Matryoshka embeddings issue.
@s04 What do you mean by hard-coding? Currently it uses the default template / no template. |
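One way to read "baking it in": bind the kwarg at load time so every `encode` call uses it, rather than threading it through per request. A minimal sketch with a stand-in `encode` function (the function and its `task` values are hypothetical placeholders, not Infinity's API):

```python
from functools import partial

def encode(texts, task="retrieval.passage", **kwargs):
    # Stand-in for a model's encode(); a real model (e.g. Jina v3)
    # would dispatch on `task` internally.
    return [f"{task}:{t}" for t in texts]

# "Bake" the task in once, at instance creation, instead of per request:
encode_query = partial(encode, task="retrieval.query")

queries = encode_query(["Two cute cats."])
# every call now runs with task="retrieval.query"
```

The trade-off is that one loaded model instance then serves exactly one task, which sidesteps the per-batch/multi-tenant problem above at the cost of flexibility.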
Feature request
Models like https://huggingface.co/BAAI/bge-m3 and https://huggingface.co/jinaai/jina-embeddings-v3 can take extra kwargs as input to the `encode` function, such as `task=...` for Jina v3 or `return_dense=False/True` for bge-m3. It would be great if we could pass these kwargs either when using the async engine via the Python API

`engine.embed(sentences=[...], additional_args=**kwargs)`

or when sending requests to an endpoint created using your Docker image

`r = requests.post("http://0.0.0.0:7997/embeddings", json={"model": "test_model", "input": ["Two cute cats."], "task": "text-matching"})`
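If per-request kwargs were supported, the dynamic batcher would need to avoid mixing requests with different kwargs in one batch (the multi-tenant concern raised above). A minimal sketch of grouping requests by their kwargs before batching (all names are hypothetical, not Infinity's internals):

```python
from collections import defaultdict

def group_by_kwargs(requests):
    """Group (text, kwargs) pairs so each batch shares identical kwargs."""
    groups = defaultdict(list)
    for text, kwargs in requests:
        # dicts are unhashable; use a sorted tuple of items as the group key
        key = tuple(sorted(kwargs.items()))
        groups[key].append(text)
    return dict(groups)

batches = group_by_kwargs([
    ("Two cute cats.", {"task": "text-matching"}),
    ("A query.", {"task": "retrieval.query"}),
    ("Another match.", {"task": "text-matching"}),
])
# yields two batches: one per distinct task value
```

The cost is lower batch utilization when many distinct kwarg combinations are in flight, which may be why this isn't trivial to add.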
Motivation
This could also be used to handle `truncate_dim` for Matryoshka embeddings.

Might be linked to: #476
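The `truncate_dim` operation itself is simple: keep the leading components of the embedding and L2-renormalize. A sketch of what handling it would amount to (plain Python, not Infinity's implementation):

```python
import math

def truncate_embedding(vec, truncate_dim):
    """Keep the first truncate_dim components and L2-renormalize."""
    head = vec[:truncate_dim]
    norm = math.sqrt(sum(x * x for x in head)) or 1.0
    return [x / norm for x in head]

emb = truncate_embedding([3.0, 4.0, 12.0], truncate_dim=2)
# first two components [3, 4], renormalized by their norm 5 -> [0.6, 0.8]
```

This only gives useful results for models trained with a Matryoshka objective, where the leading dimensions carry most of the signal.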
Your contribution
I could try to implement it in my free time, but I don't have much currently, plus I'm still navigating the code. Any pointers on where to start are welcome.