Segmentation fault #12

Open
quantumiracle opened this issue Oct 18, 2022 · 5 comments

Comments

@quantumiracle

Hi,

When I run python train.py --task=ShadowHandOver --algo=ppo, I get the following error:

Algorithm:  ppo
Python
Averaging factor:  0.01
Obs type: full_state
Not connected to PVD
+++ Using GPU PhysX
Physics Engine: PhysX
Physics Device: cuda:0
GPU Pipeline: enabled
WARNING: lavapipe is not a conformant vulkan implementation, testing use only.
...
Unhandled descriptor set 433
Unhandled descriptor set 1788307008
Segmentation fault (core dumped)

I know this may not be an issue with the repo itself but rather a compatibility problem with the NVIDIA GPU driver. I'm posting here in case anyone has a solution.

My test GPU is an NVIDIA RTX A6000 (NVIDIA-SMI 510.85.02, Driver Version: 510.85.02, CUDA Version: 11.6).

@quantumiracle
Author

An update. Here is what vulkaninfo reports:

	Devices: count = 9
		GPU id 	: 0 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

		GPU id 	: 1 (llvmpipe (LLVM 12.0.0, 256 bits))
		Layer-Device Extensions: count = 0

		GPU id 	: 2 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

		GPU id 	: 3 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

		GPU id 	: 4 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

		GPU id 	: 5 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

		GPU id 	: 6 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

		GPU id 	: 7 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

		GPU id 	: 8 (NVIDIA RTX A6000)
		Layer-Device Extensions: count = 0

One device (GPU id 1) may not be using the NVIDIA driver but llvmpipe instead. Selecting this device leads to the segmentation fault.
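
For anyone who wants to check their own machine, the listing above can be scanned for software rasterizers with a short script. This is only a sketch: the function names are hypothetical, the sample text is a shortened copy of the output above, and treating "llvmpipe" and "lavapipe" as the markers of a software device is an assumption. It also says nothing about how these Vulkan ids relate to CUDA device indices.

```python
import re

# Matches lines such as "GPU id  : 1 (llvmpipe (LLVM 12.0.0, 256 bits))".
DEVICE_LINE = re.compile(r"GPU id\s*:\s*(\d+)\s*\((.*)\)\s*$")

# Assumption: these substrings identify Mesa software rasterizers.
SOFTWARE_RENDERERS = ("llvmpipe", "lavapipe")


def list_vulkan_devices(vulkaninfo_text):
    """Return {device id: device name} parsed from vulkaninfo text."""
    devices = {}
    for line in vulkaninfo_text.splitlines():
        match = DEVICE_LINE.search(line)
        if match:
            devices[int(match.group(1))] = match.group(2)
    return devices


def software_device_ids(devices):
    """Return the sorted ids whose name looks like a software rasterizer."""
    return sorted(
        dev_id
        for dev_id, name in devices.items()
        if any(marker in name.lower() for marker in SOFTWARE_RENDERERS)
    )


# Shortened copy of the vulkaninfo listing from this thread.
sample = """
Devices: count = 9
    GPU id  : 0 (NVIDIA RTX A6000)
    Layer-Device Extensions: count = 0

    GPU id  : 1 (llvmpipe (LLVM 12.0.0, 256 bits))
    Layer-Device Extensions: count = 0

    GPU id  : 2 (NVIDIA RTX A6000)
    Layer-Device Extensions: count = 0
"""

devices = list_vulkan_devices(sample)
print(software_device_ids(devices))  # prints [1]
```

Any id reported here is a device to avoid when choosing where to run.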

@gemcollector

I also ran into this problem when using Isaac Gym. Have you managed to fix it?

@cypypccpy
Collaborator

cypypccpy commented Oct 27, 2022

Hi @gemcollector ,

Isaac Gym currently runs only on NVIDIA GPUs. You can specify which GPU to run on (for example cuda:2) by adding
--rl_device=cuda:2 --sim_device=cuda:2
to the command line at startup, which avoids running Isaac Gym on a device without a CUDA driver.
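
If you launch training from a script, the same flags can be assembled programmatically. The sketch below only builds the command string from the options mentioned in this thread (--task, --algo, --rl_device, --sim_device); the helper name build_train_command is hypothetical and not part of the repo.

```python
import shlex


def build_train_command(task, algo, cuda_index):
    """Build the train.py argument list with both devices set to one CUDA GPU."""
    if cuda_index < 0:
        raise ValueError("cuda_index must be non-negative")
    device = f"cuda:{cuda_index}"
    return [
        "python",
        "train.py",
        f"--task={task}",
        f"--algo={algo}",
        # Use the same device for RL and simulation, as suggested above.
        f"--rl_device={device}",
        f"--sim_device={device}",
    ]


cmd = build_train_command("ShadowHandOver", "ppo", 2)
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd) from the repo directory if you prefer not to type the flags by hand.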

Hope this can help you.

@cypypccpy
Collaborator

Hi @quantumiracle ,

Thank you for sharing; this information is very valuable!

@quantumiracle
Author

@gemcollector As a temporary workaround, I just specify the device as suggested by @cypypccpy so that the run avoids the device using llvmpipe.


3 participants