OpenAIEmbeddings causes CUDA bug #27266

Open
5 tasks done
pengfeihe2024 opened this issue Oct 11, 2024 · 2 comments
Labels
🤖:bug: Related to a bug, vulnerability, unexpected error with an existing feature
stale: Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed

pengfeihe2024 commented Oct 11, 2024

Checked other resources

  • I added a very descriptive title to this issue.
  • I searched the LangChain documentation with the integrated search.
  • I used the GitHub search to find a similar question and didn't find it.
  • I am sure that this is a bug in LangChain rather than my code.
  • The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).

Example Code

Working code

from openai import OpenAI

# Talk to the local vLLM server through its OpenAI-compatible API.
client = OpenAI(
    base_url="http://localhost:8000/v1",
    api_key="token-abc123",
)

models = client.models.list()
model = 'BAAI/bge-en-icl'

responses = client.embeddings.create(
    input=[
        "Hello my name is",
        "The best thing about vLLM is that it supports many different models",
        "annual wellness",
        "What is an Annual Wellness Visit? An Annual Wellness Visit (ANNUAL WELLNESS VISIT) is a yearly appointment with your healthcare provider focused on preventive care."
    ],
    model=model,
)
for data in responses.data:
    # print(data.embedding)  # list of float of len 4096
    print(len(data.embedding))

Non-working code (triggers the vLLM index-select error on some token sequences)

from langchain_openai import OpenAIEmbeddings

# Same vLLM endpoint and model as the working script, but via the LangChain wrapper.
embeddings = OpenAIEmbeddings(
    openai_api_base="http://localhost:8000/v1",
    openai_api_key="token-abc123",
    model="BAAI/bge-en-icl",
    openai_api_type="openai",
    chunk_size=1,
)
# text = "what is an annual anual visit"
text = "annual wellness"  # this input triggers the error
query_result = embeddings.embed_query(text)
print(len(query_result))
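
A possible workaround to try, not confirmed anywhere in this thread: as far as I know, OpenAIEmbeddings by default tokenizes the text client-side with tiktoken and sends arrays of token IDs instead of strings, whereas a plain OpenAI client call with string input sends the strings as-is. If the vLLM server reads those IDs against the model's own vocabulary, some of them could fall outside its embedding table. The sketch below sets check_embedding_ctx_length=False so that raw strings are sent. build_embedding_kwargs and make_embeddings are hypothetical helper names used only for illustration.

```python
def build_embedding_kwargs(base_url: str, api_key: str, model: str) -> dict:
    """Keyword arguments for OpenAIEmbeddings that should send raw strings."""
    return {
        "openai_api_base": base_url,
        "openai_api_key": api_key,
        "model": model,
        # Assumption: disabling the context-length check also skips the
        # client-side tiktoken step, so the server receives text, not token IDs.
        "check_embedding_ctx_length": False,
    }


def make_embeddings(**kwargs):
    """Construct the LangChain wrapper from the kwargs built above."""
    # Imported lazily so build_embedding_kwargs can be inspected
    # even where langchain_openai is not installed.
    from langchain_openai import OpenAIEmbeddings

    return OpenAIEmbeddings(**kwargs)
```

Calling make_embeddings(**build_embedding_kwargs("http://localhost:8000/v1", "token-abc123", "BAAI/bge-en-icl")).embed_query("annual wellness") against the running vLLM server would show whether the error goes away. If it does, that points at the pre-tokenized input rather than at vLLM's kernels.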

Error Message and Stack Trace (if applicable)

../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [27,0,0], thread: [63,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
ERROR 10-10 22:51:09 engine.py:157] RuntimeError('CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`')
ERROR 10-10 22:51:09 engine.py:157] Traceback (most recent call last):
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 155, in start
ERROR 10-10 22:51:09 engine.py:157]     self.run_engine_loop()
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 218, in run_engine_loop
ERROR 10-10 22:51:09 engine.py:157]     request_outputs = self.engine_step()
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 236, in engine_step
ERROR 10-10 22:51:09 engine.py:157]     raise e
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/multiprocessing/engine.py", line 227, in engine_step
ERROR 10-10 22:51:09 engine.py:157]     return self.engine.step()
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/engine/llm_engine.py", line 1264, in step
ERROR 10-10 22:51:09 engine.py:157]     outputs = self.model_executor.execute_model(
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/executor/gpu_executor.py", line 130, in execute_model
ERROR 10-10 22:51:09 engine.py:157]     output = self.driver_worker.execute_model(execute_model_req)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/worker_base.py", line 327, in execute_model
ERROR 10-10 22:51:09 engine.py:157]     output = self.model_runner.execute_model(
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
ERROR 10-10 22:51:09 engine.py:157]     return func(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/worker/embedding_model_runner.py", line 115, in execute_model
ERROR 10-10 22:51:09 engine.py:157]     hidden_states = model_executable(**execute_model_kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 22:51:09 engine.py:157]     return self._call_impl(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 22:51:09 engine.py:157]     return forward_call(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama_embedding.py", line 41, in forward
ERROR 10-10 22:51:09 engine.py:157]     return self.model.forward(input_ids, positions, kv_caches,
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 329, in forward
ERROR 10-10 22:51:09 engine.py:157]     hidden_states, residual = layer(
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 22:51:09 engine.py:157]     return self._call_impl(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 22:51:09 engine.py:157]     return forward_call(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 251, in forward
ERROR 10-10 22:51:09 engine.py:157]     hidden_states = self.self_attn(
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 22:51:09 engine.py:157]     return self._call_impl(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 22:51:09 engine.py:157]     return forward_call(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/models/llama.py", line 178, in forward
ERROR 10-10 22:51:09 engine.py:157]     qkv, _ = self.qkv_proj(hidden_states)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
ERROR 10-10 22:51:09 engine.py:157]     return self._call_impl(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
ERROR 10-10 22:51:09 engine.py:157]     return forward_call(*args, **kwargs)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 367, in forward
ERROR 10-10 22:51:09 engine.py:157]     output_parallel = self.quant_method.apply(self, input_, bias)
ERROR 10-10 22:51:09 engine.py:157]   File "/home/pii/anaconda3/envs/vllm/lib/python3.10/site-packages/vllm/model_executor/layers/linear.py", line 135, in apply
ERROR 10-10 22:51:09 engine.py:157]     return F.linear(x, layer.weight, bias)
ERROR 10-10 22:51:09 engine.py:157] RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
CRITICAL 10-10 22:51:09 launcher.py:72] AsyncLLMEngine has failed, terminating server process
INFO:     127.0.0.1:33778 - "POST /v1/embeddings HTTP/1.1" 500 Internal Server Error
...
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [27,0,0], thread: [28,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [27,0,0], thread: [29,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [27,0,0], thread: [30,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
../aten/src/ATen/native/cuda/Indexing.cu:1231: indexSelectSmallIndex: block: [27,0,0], thread: [31,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
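
For anyone reading the trace: the assertion `srcIndex < srcSelectDimSize` comes from PyTorch's CUDA index-select kernel and fires when an index is not smaller than the size of the dimension being selected from. In an embedding lookup that usually means a token ID is greater than or equal to the number of rows in the embedding table. The later cuBLAS error is probably a follow-on failure after the device-side assert. Below is a pure-Python sketch of the same bounds check. The vocabulary size and token IDs are made-up numbers, not values taken from this model or this request.

```python
def find_out_of_range_ids(token_ids, vocab_size):
    """Return (position, token_id) pairs an embedding table of vocab_size rows cannot serve."""
    return [
        (pos, tid)
        for pos, tid in enumerate(token_ids)
        if not 0 <= tid < vocab_size
    ]


# Hypothetical values: a 32,000-row embedding table and one ID from a larger vocabulary.
vocab_size = 32_000
token_ids = [17, 45_000, 31_999]

bad = find_out_of_range_ids(token_ids, vocab_size)
print(bad)  # [(1, 45000)]
```

Running a check like this on the token IDs that actually reach the server would show whether any of them exceed the model's vocabulary.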

Description

I am testing embedding creation against a vLLM endpoint using the LangChain embedding wrapper. The non-working code, based on LangChain's OpenAIEmbeddings, triggers a CUDA error on the vLLM side.
I believe the bug is in LangChain's OpenAIEmbeddings because I have both working code based on the OpenAI client and non-working code based on LangChain, and both target the same server. In addition, neither quantization nor parallelization is enabled on the vLLM side.

To reproduce the error:

  1. Install the packages vLLM requires and run vllm serve BAAI/bge-en-icl.
  2. Run the two scripts above.
  3. The working code runs fine on any text input. The non-working code fails on some token sequences; for example, it fails for the input text "annual wellness".
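
To narrow down why the two scripts behave differently, it may help to compare the request bodies they send to /v1/embeddings. The OpenAI embeddings API accepts the input field as a string, a list of strings, a list of token IDs, or a list of token ID lists. The sketch below classifies a captured JSON body into one of those shapes. How the body is captured (for example a logging proxy in front of vLLM) is left open, and classify_embedding_input is a hypothetical helper name.

```python
import json


def classify_embedding_input(body: bytes) -> str:
    """Classify the `input` field of a /v1/embeddings JSON request body."""
    value = json.loads(body)["input"]
    if isinstance(value, str):
        return "string"
    if isinstance(value, list) and value:
        first = value[0]
        if isinstance(first, str):
            return "list of strings"
        if isinstance(first, int):
            return "list of token IDs"
        if isinstance(first, list):
            return "list of token ID lists"
    return "unknown"


# Made-up example bodies; the token IDs are placeholders, not real tokenizer output.
text_body = json.dumps({"model": "BAAI/bge-en-icl", "input": ["annual wellness"]}).encode()
ids_body = json.dumps({"model": "BAAI/bge-en-icl", "input": [[101, 202]]}).encode()

print(classify_embedding_input(text_body))
print(classify_embedding_input(ids_body))
```

If the LangChain script turns out to send token ID lists while the OpenAI client script sends strings, that difference is the first thing to investigate.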

System Info

System Information

OS: Linux
OS Version: #129~20.04.1-Ubuntu SMP Wed Aug 7 13:07:13 UTC 2024
Python Version: 3.9.20 (main, Oct 3 2024, 07:27:41)
[GCC 11.2.0]

Package Information

langchain_core: 0.3.10
langchain: 0.3.3
langchain_community: 0.2.7
langsmith: 0.1.130
langchain_experimental: 0.0.62
langchain_openai: 0.2.2
langchain_text_splitters: 0.3.0

Optional packages not installed

langgraph
langserve

Other Dependencies

aiohttp: 3.10.8
async-timeout: 4.0.3
dataclasses-json: 0.6.7
httpx: 0.27.2
jsonpatch: 1.33
numpy: 1.26.4
openai: 1.51.0
orjson: 3.10.7
packaging: 24.1
pydantic: 2.7.4
PyYAML: 6.0.2
requests: 2.32.3
requests-toolbelt: 1.0.0
SQLAlchemy: 2.0.35
tenacity: 8.5.0
tiktoken: 0.7.0
typing-extensions: 4.12.2

@dosubot dosubot bot added the 🤖:bug Related to a bug, vulnerability, unexpected error with an existing feature label Oct 11, 2024
@kodychik

Hi can I work on this issue?
Thanks

dosubot bot commented Jan 12, 2025

Hi, @pengfeihe2024. I'm Dosu, and I'm helping the LangChain team manage their backlog. I'm marking this issue as stale.

Issue Summary

  • You reported a CUDA error with the OpenAIEmbeddings class during the embed_query method.
  • The error persists despite updating to the latest version of LangChain.
  • Example code was provided to illustrate the problem.
  • @kodychik has shown interest in working on this issue.

Next Steps

  • Please confirm if this issue is still relevant with the latest version of the LangChain repository. If so, you can keep the discussion open by commenting here.
  • If there is no further activity, this issue will be automatically closed in 7 days.

Thank you for your understanding and contribution!

@dosubot dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jan 12, 2025