Inference with glm9b-chat fails on H20 GPU, version 0.16.3 #2544
Comments
This error usually means an OOM or an unexpected process exit. An H20 should be more than enough to run glm9b, so it feels like the driver or something else is killing the process unexpectedly.
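If it helps to narrow it down, commands along these lines can distinguish the two cases (hypothetical diagnostics; adjust to your environment):
# Check whether the kernel OOM killer terminated the worker process
dmesg -T | grep -iE "out of memory|oom"
# Watch GPU memory while sending a request, to see whether usage spikes before the crash
nvidia-smi --query-gpu=memory.used,memory.total --format=csv -l 1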
I ran into the same problem. Deploying qwen72B on 2x A100 40G works fine, and on the H20 the deployment itself succeeds, but as soon as I chat with the model it errors out and the service restarts.
Hi, I hit this problem as well. A Qwen2.5-32B model runs fine on an A30 GPU, but on the company's newly purchased H20 it crashes as soon as it runs; the GPU memory monitoring shows the process dying and restarting. I tried switching transformers versions with no luck, and upgrading xinference to 1.0.1 didn't help either...
Are there any logs from the crash?
After the model starts up normally, it crashes as soon as a request is sent; I didn't find any obvious error in the logs.
I'm using the transformers backend, but I found a solution in a vllm issue, and after testing it the model runs now.
Even though torch 2.3.1 requires nvidia-cublas-cu12==12.1.3.1, everything still works after upgrading that package.
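For reference, the workaround amounts to something like this (a sketch; the exact cuBLAS version to install may vary by setup):
# Upgrade the cuBLAS wheel past the 12.1.x release that torch 2.3.1 pins;
# pip will warn about the dependency conflict, but the model runs afterwards
pip install --upgrade nvidia-cublas-cu12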
This issue was closed because it has been inactive for 5 days since being marked as stale. |
System Info
Version: 0.16.3
GPU: H20
CUDA: 12.1
Running Xinference with Docker?
Version info
0.16.3
The command used to start Xinference
nohup env XINFERENCE_HOME=/home/root/.cache XINFERENCE_MODEL_SRC=modelscope xinference-local --log-level debug --host 0.0.0.0 --port 9997 > output.log 2>&1 &
Reproduction
Load the model from the web UI.
Error log:
2024-11-12 08:45:41,477 xinference.core.model 5261 DEBUG [request 7f8510d6-a0d2-11ef-af1b-06cfd44f9164] Enter chat, args: ModelActor(glm4-chat-0),[{'role': 'user', 'content': '你好'}],{'frequency_penalty': 0.0, 'max_tokens': 512, 'presence_penalty': 0.0, 'temperature': 0.7, 'top_p': ..., kwargs: raw_params={'frequency_penalty': 0.0, 'max_tokens': 512, 'presence_penalty': 0.0, 'stream': True, 'temperature'...
2024-11-12 08:45:41,478 xinference.core.model 5261 DEBUG [request 7f8510d6-a0d2-11ef-af1b-06cfd44f9164] Leave chat, elapsed time: 0 s
2024-11-12 08:45:41,478 xinference.core.model 5261 DEBUG After request chat, current serve request count: 0 for the model glm4-chat
2024-11-12 08:45:41,486 transformers.generation.configuration_utils 5261 INFO loading configuration file /home/root/.cache/cache/glm4-chat-pytorch-9b/generation_config.json
2024-11-12 08:45:41,486 transformers.generation.configuration_utils 5261 INFO Generate config GenerationConfig {
"do_sample": true,
"eos_token_id": [
151329,
151336,
151338
],
"max_length": 128000,
"pad_token_id": 151329,
"temperature": 0.8,
"top_p": 0.8
}
2024-11-12 08:45:42,850 xinference.api.restful_api 4440 ERROR Chat completion stream got an error: Remote server 0.0.0.0:33031 closed
Traceback (most recent call last):
File "/root/miniconda3/envs/yu/lib/python3.10/site-packages/xinference/api/restful_api.py", line 1974, in stream_results
async for item in iterator:
File "/root/miniconda3/envs/yu/lib/python3.10/site-packages/xoscar/api.py", line 340, in anext
return await self._actor_ref.xoscar_next(self._uid)
File "/root/miniconda3/envs/yu/lib/python3.10/site-packages/xoscar/backends/context.py", line 230, in send
result = await self._wait(future, actor_ref.address, send_message) # type: ignore
File "/root/miniconda3/envs/yu/lib/python3.10/site-packages/xoscar/backends/context.py", line 115, in _wait
return await future
File "/root/miniconda3/envs/yu/lib/python3.10/site-packages/xoscar/backends/core.py", line 84, in _listen
raise ServerClosed(
xoscar.errors.ServerClosed: Remote server 0.0.0.0:33031 closed
2024-11-12 08:45:43,146 xinference.core.worker 4582 WARNING Process 0.0.0.0:33031 is down.
Symptom: the model is already loaded into GPU memory; as soon as the chat endpoint is called, the error above appears immediately and then the model reloads.
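The crashing request can be reproduced with a single call to the OpenAI-compatible chat endpoint, a minimal sketch using the host/port from the launch command above and the model name, message, and stream flag from the debug log:
# Send the same chat request that appears in the debug log
curl http://127.0.0.1:9997/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "glm4-chat", "messages": [{"role": "user", "content": "你好"}], "stream": true}'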
Expected behavior
Chat should work normally.