[ChatQnA] Failed to run reranking and embedding service on H100. #442

Closed
PeterYang12 opened this issue Jul 23, 2024 · 0 comments · Fixed by #443

Comments

@PeterYang12 (Contributor)

Hi maintainers,
I followed the README at https://github.com/opea-project/GenAIExamples/tree/main/ChatQnA/docker/gpu and tried to deploy ChatQnA with docker-compose. However, tei-reranking-server and tei-embedding-server failed to start. Here are the error logs:

$ docker logs tei-reranking-server
2024-07-23T04:25:26.629348Z  INFO text_embeddings_router: router/src/main.rs:140: Args { model_id: "BAA*/***-********-*ase", revision: None, tokenization_workers: None, dtype: None, pooling: None, max_concurrent_requests: 512, max_batch_tokens: 16384, max_batch_requests: None, max_client_batch_size: 32, auto_truncate: false, hf_api_token: None, hostname: "427706bc91b6", port: 80, uds_path: "/tmp/text-embeddings-inference-server", huggingface_hub_cache: Some("/data"), payload_limit: 2000000, api_key: None, json_output: false, otlp_endpoint: None, cors_allow_origin: None }
2024-07-23T04:25:26.635961Z  INFO hf_hub: /root/.cargo/git/checkouts/hf-hub-1aadb4c6e2cbe1ba/b167f69/src/lib.rs:55: Token file not found "/root/.cache/huggingface/token"
2024-07-23T04:25:28.061305Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:20: Starting download
2024-07-23T04:25:28.061547Z  INFO download_artifacts: text_embeddings_core::download: core/src/download.rs:37: Model artifacts downloaded in 254.674µs
2024-07-23T04:25:28.900779Z  WARN text_embeddings_router: router/src/lib.rs:165: Could not find a Sentence Transformers config
2024-07-23T04:25:28.900827Z  INFO text_embeddings_router: router/src/lib.rs:169: Maximum number of tokens per request: 512
2024-07-23T04:25:28.925017Z  INFO text_embeddings_core::tokenization: core/src/tokenization.rs:23: Starting 112 tokenization workers
2024-07-23T04:25:51.644147Z  INFO text_embeddings_router: router/src/lib.rs:194: Starting model backend
Error: Could not create backend

Caused by:
    Could not start backend: Runtime compute cap 90 is not compatible with compile time compute cap 80
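
For context, the "compute cap" values in this error are CUDA compute capabilities: 90 corresponds to Hopper (H100) and 80 to Ampere (A100), so a binary compiled only for compute capability 8.0 refuses to start on a 9.0 device. If you want to confirm what your card reports, recent drivers expose this via nvidia-smi (a minimal check, assuming your driver is new enough to support the compute_cap query field):

$ nvidia-smi --query-gpu=name,compute_cap --format=csv,noheader

On an H100 this should print 9.0.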

I root-caused the failure: the image ghcr.io/huggingface/text-embeddings-inference:1.2 used by the reranking and embedding services is incompatible with some GPUs. For example, my H100 is Hopper architecture, so it should use the image ghcr.io/huggingface/text-embeddings-inference:hopper-1.5. See the compatibility table at https://github.com/huggingface/text-embeddings-inference/tree/main
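
For anyone hitting the same error, the change amounts to swapping the image tag in the GPU compose file. A rough sketch of the edit, assuming the service layout of the ChatQnA GPU compose file (service and container names here are illustrative, not copied from the repo):

  tei-embedding-service:
    # was: ghcr.io/huggingface/text-embeddings-inference:1.2
    image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
    container_name: tei-embedding-server

  tei-reranking-service:
    # was: ghcr.io/huggingface/text-embeddings-inference:1.2
    image: ghcr.io/huggingface/text-embeddings-inference:hopper-1.5
    container_name: tei-reranking-server

The hopper-1.5 tag is built for compute capability 9.0, which is what the runtime check in the error above expects.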

I have filed a PR to fix this issue. Please correct me if I am wrong, or let me know if you have a better fix.

PeterYang12 added a commit to PeterYang12/GenAIExamples that referenced this issue Jul 23, 2024
Embedding and reranking services failed to run on GPU H100.
Change the image tag and use CPU for these services. This PR will
fix opea-project#442

Signed-off-by: PeterYang12 <yuhan.yang@intel.com>
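
Note that the merged fix takes a different route than the hopper-1.5 tag suggested above: per the commit message, it moves the embedding and reranking services onto the CPU image instead. A hedged sketch of that variant, with the same caveat that the service names and the exact cpu-1.5 tag should be checked against the TEI releases page:

  tei-embedding-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5

  tei-reranking-service:
    image: ghcr.io/huggingface/text-embeddings-inference:cpu-1.5

Running these two services on CPU sidesteps the compute-capability check entirely, at the cost of embedding and reranking throughput.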
yogeshmpandey pushed a commit to hteeyeoh/GenAIExamples that referenced this issue Aug 12, 2024
…a-project#443)

Embedding and reranking services failed to run on GPU H100.
Change the image tag and use CPU for these services. This PR will
fix opea-project#442

Signed-off-by: PeterYang12 <yuhan.yang@intel.com>
wangkl2 added a commit to wangkl2/GenAIExamples that referenced this issue Dec 11, 2024
…a-project#442)

Signed-off-by: Wang, Kai Lawrence <kai.lawrence.wang@intel.com>