$ docker logs tei-reranking-server
2024-07-23T04:25:26.629348Z INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, hf_api_token: None, hostname: "427706bc91b6", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-23T04:25:26.635961Z INFO hf_hub: /root/.cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-23T04:25:28.061305Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-23T04:25:28.061547Z INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 254.674µs
2024-07-23T04:25:28.900779Z WARN text_embeddings_router: router/src/lib.rs:165: Could not find a Sentence Transformers config
2024-07-23T04:25:28.900827Z INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-23T04:25:28.925017Z INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-23T04:25:51.644147Z INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
Error: Could not create backend
Caused by:
Could not start backend: Runtime compute cap 90 is not compatible with compile time compute cap 80
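To make sense of the error: CUDA binaries (cubins) built for one compute capability only run on GPUs of the same major version with an equal or higher minor version, so a build targeting cap 80 (Ampere) cannot run on cap 90 (Hopper). A minimal sketch of that compatibility rule (the mapping and helper function below are illustrative, not part of TEI):

```python
# Sketch: map a CUDA compute capability to its architecture family and
# check whether a cubin built for one capability can run on another.
# Assumption (per the CUDA binary-compatibility rule): a cubin built for
# cap X.y runs only on devices of cap X.z with z >= y; there is no
# cross-major binary compatibility.

ARCH_BY_MAJOR = {
    7: "Volta/Turing",
    8: "Ampere/Ada",
    9: "Hopper",
}

def is_binary_compatible(compile_cap: int, runtime_cap: int) -> bool:
    """Caps are given as integers, e.g. 80 for 8.0, 90 for 9.0."""
    same_major = compile_cap // 10 == runtime_cap // 10
    minor_ok = runtime_cap % 10 >= compile_cap % 10
    return same_major and minor_ok

# The failure from the logs: the TEI :1.2 image was compiled for cap 80
# (Ampere), but an H100 reports runtime cap 90 (Hopper).
print(ARCH_BY_MAJOR[90 // 10])          # Hopper
print(is_binary_compatible(80, 90))     # False -> backend refuses to start
print(is_binary_compatible(80, 86))     # True  -> works on e.g. RTX 30xx
```

This is why the fix is a different image tag rather than a runtime flag: the kernels have to be compiled for the Hopper architecture in the first place.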
I root-caused the failure: the ghcr.io/huggingface/text-embeddings-inference:1.2 image used by the reranking and embedding services is incompatible with some GPUs. For example, my H100 card is Hopper architecture, so it should use the image ghcr.io/huggingface/text-embeddings-inference:hopper-1.5 instead. See the compatibility matrix at https://github.com/huggingface/text-embeddings-inference/tree/main
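For reference, a minimal docker-compose sketch of the tag change, assuming a service named tei-reranking-server (the port mapping and volume below are illustrative, not taken from the repo's compose file):

```yaml
services:
  tei-reranking-server:
    # hopper-1.5 targets compute capability 9.0 (H100); pick the tag
    # matching your GPU architecture from the TEI compatibility matrix.
    image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
    ports:
      - "8808:80"
    volumes:
      - ./data:/data
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
```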
I filed a PR to fix this issue. Please correct me if I am wrong, or let me know if there is a better fix.
PeterYang12 added a commit to PeterYang12/GenAIExamples that referenced this issue on Jul 23, 2024:
Embedding and reranking services failed to run on GPU H100.
Change the image tag and use CPU for these services. This PR will
fix opea-project#442
Signed-off-by: PeterYang12 <yuhan.yang@intel.com>
…a-project#443)
wangkl2 added a commit to wangkl2/GenAIExamples that referenced this issue on Dec 11, 2024.
Hi maintainers,
I followed the README at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gpu and tried to deploy ChatQnA with docker-compose. However, tei-reranking-server and tei-embedding-server failed with the error logs shown above.