[ChatQnA] Failed to run reranking and embedding service on H100. #442

Closed
PeterYang12 opened this issue Jul 23, 2024 · 0 comments · Fixed by #443

Comments

@PeterYang12 (Contributor)

Hi maintainers,
I followed the README at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gpu and tried to deploy ChatQnA with docker-compose. However, tei-reranking-server and tei-embedding-server failed to start. Here are the error logs:

$ docker logs tei-reranking-server
2024-07-23T04:25:26.629348Z  INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, hf_api_token: None, hostname: "427706bc91b6", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-23T04:25:26.635961Z  INFO hf_hub: /root/.cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-23T04:25:28.061305Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-23T04:25:28.061547Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 254.674µs
2024-07-23T04:25:28.900779Z  WARN text_embeddings_router: router/src/lib.rs:165: Could not find a Sentence Transformers config
2024-07-23T04:25:28.900827Z  INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-23T04:25:28.925017Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-23T04:25:51.644147Z  INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Runtime compute cap 90 is not compatible with compile time compute cap 80
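
For context, the "compute cap" values in this error are CUDA compute capabilities: 90 corresponds to Hopper (H100) and 80 to Ampere (A100), so a binary compiled only for compute capability 8.0 refuses to start on a 9.0 device. If you want to confirm what your card reports, recent drivers expose this via nvidia-smi (a minimal check, assuming your driver is new enough to support the compute_cap query field):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader

On an H100 this should print 9.0.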

I root-caused the failure: the image ghcr.io/huggingface/text-embeddings-inference:1.2 used by the reranking and embedding services is incompatible with some GPUs. For example, my H100 is Hopper architecture, so it should use the image ghcr.io/huggingface/text-embeddings-inference:hopper-1.5. See the compatibility table at https://github.com/huggingface/text-embeddings-inference/tree/main
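
For anyone hitting the same error, the change amounts to swapping the image tag in the GPU compose file. A rough sketch of the edit, assuming the service layout of the ChatQnA GPU compose file (service and container names here are illustrative, not copied from the repo):

  tei-embedding-service:
    # was: ghcr.io/huggingface/text-embeddings-inference:1.2
    image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
    container_name: tei-embedding-server

  tei-reranking-service:
    # was: ghcr.io/huggingface/text-embeddings-inference:1.2
    image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
    container_name: tei-reranking-server

The hopper-1.5 tag is built for compute capability 9.0, which is what the runtime check in the error above expects.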

I have filed a PR to fix this issue. Please correct me if I am wrong, or let me know if you have a better fix.

PeterYang12 added a commit to PeterYang12/GenAIExamples that referenced this issue Jul 23, 2024
Embedding and reranking services failed to run on GPU H100.
Change the image tag and use CPU for these services. This PR will
fix opea-project#442

Signed-off-by: PeterYang12 <yuhan.yang@intel.com>
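
Note that the merged fix takes a different route than the hopper-1.5 tag suggested above: per the commit message, it moves the embedding and reranking services onto the CPU image instead. A hedged sketch of that variant, with the same caveat that the service names and the exact cpu-1.5 tag should be checked against the TEI releases page:

  tei-embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5

  tei-reranking-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5

Running these two services on CPU sidesteps the compute-capability check entirely, at the cost of embedding and reranking throughput.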
yogeshmpandey pushed a commit to hteeyeoh/GenAIExamples that referenced this issue Aug 12, 2024
…a-project#443)

Embedding and reranking services failed to run on GPU H100.
Change the image tag and use CPU for these services. This PR will
fix opea-project#442

Signed-off-by: PeterYang12 <yuhan.yang@intel.com>
wangkl2 added a commit to wangkl2/GenAIExamples that referenced this issue Dec 11, 2024
…a-project#442)

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>