Cuda failure 'peer access is not supported between these two devices' #406
Comments
I ran into the same issue on a G5.12xlarge instance with WizardLM-30b-fp16.
I get this error even when trying the workaround mentioned by @mspronesti, interestingly on the same G5.12xlarge instance type.
@nivibilla I tried the above workaround in a notebook on a g5.12xlarge instance in SageMaker and it worked for me. I also tried reinstalling vllm from source after adding this step:
os.environ["NCCL_IGNORE_DISABLED_P2P"] = '1'
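For anyone landing here, a minimal sketch of how that workaround is typically applied (assuming that setting the variable in the driver process before the engine starts is sufficient; the model and parallelism values are simply the ones from this issue):

```python
import os

# Workaround discussed in this thread: tell NCCL to ignore the lack of
# peer-to-peer support between the A10G GPUs. Set it before vLLM starts
# its workers.
os.environ["NCCL_IGNORE_DISABLED_P2P"] = "1"

from vllm import LLM

# Same setup as the reproduction below: 4-way tensor parallelism.
llm = LLM(model="openlm-research/open_llama_13b", tensor_parallel_size=4)
```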
Thanks for reporting the issue! This should be fixed by #397, which should be merged soon. You can retry with that fix.
Should be fixed by #397. Please re-open if you run into any new issues.
@zhuohan123 Sorry for the delayed response, I will test this out, thanks!
@mspronesti I tried adding the step to the source and doing the pip install; however, I am getting the following error: AttributeError: 'NoneType' object has no attribute 'fs'. Is there any other way to solve this issue?
I've tried installing from the Git repo with pip, and it works for me.
@nivibilla, so basically you cloned the Git repo, made the change, and used the pip install command to build it, correct? Could you share your existing versions and the worker.py file? I tried the same, but it doesn't work.
I didn't make any changes to the repo. Just do:
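The exact command was dropped from the quote above; presumably it was an install straight from the GitHub repository, something along the lines of:

```
pip install git+https://github.com/vllm-project/vllm.git
```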
This issue was fixed in #397, but the changes from that PR have not been released yet. However, if you install directly from the repository as shown above, you get the fix.
That looks like pyarrow is missing to me. |
Thanks @nivibilla and @mspronesti - it works perfectly now!
Usage stats collection is enabled. To disable this, run the following command:
ray disable-usage-stats
before starting Ray. See https://docs.ray.io/en/master/cluster/usage-stats.html for more details.
2023-07-08 23:11:34,236 INFO worker.py:1610 -- Started a local Ray instance. View the dashboard at 127.0.0.1:8265
INFO 07-08 23:11:35 llm_engine.py:60] Initializing an LLM engine with config: model='openlm-research/open_llama_13b', tokenizer='openlm-research/open_llama_13b', tokenizer_mode=auto, dtype=torch.float16, use_dummy_weights=False, download_dir=None, use_np_weights=False, tensor_parallel_size=4, seed=0)
INFO 07-08 23:11:35 tokenizer.py:28] For some LLaMA-based models, initializing the fast tokenizer may take a long time. To eliminate the initialization time, consider using 'hf-internal-testing/llama-tokenizer' instead of the original tokenizer.
(Worker pid=4225) Exception raised in creation task: The actor died because of an error raised in its creation task, ray::Worker.__init__() (pid=4225, ip=172.31.68.176, actor_id=5dc662848f950df8d330eb8a01000000, repr=<vllm.worker.worker.Worker object at 0x7f4e9ea814e0>)
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 40, in __init__
(Worker pid=4225) _init_distributed_environment(parallel_config, rank,
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/vllm/worker/worker.py", line 307, in _init_distributed_environment
(Worker pid=4225) torch.distributed.all_reduce(torch.zeros(1).cuda())
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1451, in wrapper
(Worker pid=4225) return func(*args, **kwargs)
(Worker pid=4225) File "/opt/conda/lib/python3.10/site-packages/torch/distributed/distributed_c10d.py", line 1700, in all_reduce
(Worker pid=4225) work = default_pg.allreduce([tensor], opts)
(Worker pid=4225) torch.distributed.DistBackendError: NCCL error in: ../torch/csrc/distributed/c10d/ProcessGroupNCCL.cpp:1275, internal error, NCCL version 2.14.3
(Worker pid=4225) ncclInternalError: Internal check failed.
(Worker pid=4225) Last error:
(Worker pid=4225) Cuda failure 'peer access is not supported between these two devices'
Code:
from vllm import LLM

llm = LLM(model="openlm-research/open_llama_13b", tensor_parallel_size=4)
Env:
A single EC2 G5.12xlarge instance with 4 A10G GPUs.
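For context on the error itself: the A10G GPUs in a G5.12xlarge are not connected over NVLink and do not report CUDA peer access to each other, which is what NCCL is complaining about. A quick way to confirm this on the instance, using standard PyTorch APIs (a diagnostic sketch, not part of the original report):

```python
import torch

# Check whether each pair of visible GPUs supports direct peer access.
# On a g5.12xlarge (4x A10G) every pair is expected to report False.
n = torch.cuda.device_count()
for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```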