I have a TEI embedding model endpoint created like this:
import huggingface_hub
from huggingface_hub import create_inference_endpoint

repository = "thenlper/gte-large"  # "BAAI/bge-reranker-large-base"
endpoint_name = "gte-large-001"
namespace = "MoritzLaurer"  # your user or organization name

# check if an endpoint with this name already exists from previous tests
available_endpoints_names = [endpoint.name for endpoint in huggingface_hub.list_inference_endpoints()]
if endpoint_name in available_endpoints_names:
    endpoint_exists = True
else:
    endpoint_exists = False
print("Does the endpoint already exist?", endpoint_exists)

# create a new endpoint
if not endpoint_exists:
    endpoint = create_inference_endpoint(
        endpoint_name,
        repository=repository,
        namespace=namespace,
        framework="pytorch",
        task="sentence-similarity",
        # see the available hardware options here: https://huggingface.co/docs/inference-endpoints/pricing#pricing
        accelerator="gpu",
        vendor="aws",
        region="us-east-1",
        instance_size="x1",
        instance_type="nvidia-a10g",
        min_replica=2,
        max_replica=4,
        type="protected",
        custom_image={
            "health_route": "/health",
            "env": {
                "MAX_BATCH_TOKENS": "16384",
                "MAX_CONCURRENT_REQUESTS": "512",
                "MAX_BATCH_REQUESTS": "124",
                "MODEL_ID": "/repository",
            },
            "url": "ghcr.io/huggingface/text-embeddings-inference:latest",
        },
    )
    print("Waiting for endpoint to be created")
    endpoint.wait()
    print("Endpoint ready")
# if an endpoint with this name already exists, get the existing endpoint
else:
    endpoint = huggingface_hub.get_inference_endpoint(name=endpoint_name, namespace=namespace)
    if endpoint.status in ["paused", "scaledToZero"]:
        print("Resuming endpoint")
        endpoint.resume()
        print("Waiting for endpoint to start")
        endpoint.wait()
    print("Endpoint ready")
Based on the docs here, I should be able to call it like this:
from huggingface_hub import InferenceClient

client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url,
)
This results in the following (hard to interpret) error message:

HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://c5hhcabur7dqwyj7.us-east-1.aws.endpoints.huggingface.cloud/ (Request ID: nEd4Xz) Make sure 'sentence-similarity' task is supported by the model.
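For what it's worth, the same behavior can presumably be reproduced with plain requests, which suggests the client is posting the payload to the endpoint root rather than to a task route. This is only a debugging sketch; the payload shape ({"inputs": {"source_sentence": ..., "sentences": [...]}}) and the HF_TOKEN environment variable are my assumptions:

import os
import requests

# assumed payload shape for the sentence-similarity task
payload = {
    "inputs": {
        "source_sentence": "Machine learning is so easy.",
        "sentences": [
            "Deep learning is so straightforward.",
            "This is so difficult, like rocket science.",
            "I can't believe how much I struggled with this.",
        ],
    }
}
# the endpoint is "protected", so an HF token is required (assumed in HF_TOKEN)
headers = {"Authorization": f"Bearer {os.environ['HF_TOKEN']}"}

# POST to the endpoint root: presumably returns the same 422
r = requests.post(endpoint.url, json=payload, headers=headers)
print(r.status_code, r.text)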
It does work when making the /similarity route from TEI explicit:
from huggingface_hub import InferenceClient

client = InferenceClient()
client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
    model=endpoint.url + "/similarity",
)
# output: [0.9319057, 0.81048536, 0.75192505]
Seems like the route is not set correctly by the client.
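In the meantime, a possible workaround sketch (my own, not an official fix) is to bind the client to the suffixed URL once, so every call hits TEI's /similarity route without passing model= each time:

from huggingface_hub import InferenceClient

# bind the client to the TEI /similarity route once
similarity_client = InferenceClient(model=endpoint.url + "/similarity")

scores = similarity_client.sentence_similarity(
    "Machine learning is so easy.",
    other_sentences=[
        "Deep learning is so straightforward.",
        "This is so difficult, like rocket science.",
        "I can't believe how much I struggled with this.",
    ],
)
print(scores)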
Thanks for reporting this with a reproducible example @MoritzLaurer. I'm figuring out a solution to avoid this kind of problem, where we don't call the correct endpoint because of differences between the Inference API and Inference Endpoints (similar to #2484). Will keep you posted.