failed to EGL with glad. #59

Open
houyaokun opened this issue Nov 21, 2023 · 15 comments

Comments

@houyaokun

Error: failed to EGL with glad.
Does this error occur because I didn't install EGL properly?
When I run "ldconfig -p | grep libEGL" in the terminal, I get the following output:
libEGL_nvidia.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_nvidia.so.0
libEGL_nvidia.so.0 (libc6) => /lib/i386-linux-gnu/libEGL_nvidia.so.0
libEGL_mesa.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so
Can you please guide me on what to do next? Thank you very much.
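
As a rough way to read the ldconfig listing above, here is a minimal Python sketch that groups the libEGL entries by name and checks whether a vendor-specific library (for example the NVIDIA one) is registered. The function names are made up for illustration, and the sample text is copied from the output above. Finding the library only shows that it is installed; it does not prove that the driver can actually initialize EGL.

```python
import subprocess


def parse_egl_libraries(ldconfig_output: str) -> dict[str, list[str]]:
    """Map each libEGL* library name to the paths listed by `ldconfig -p`."""
    libraries: dict[str, list[str]] = {}
    for line in ldconfig_output.splitlines():
        line = line.strip()
        if not line.startswith("libEGL") or "=>" not in line:
            continue
        name = line.split()[0]
        path = line.split("=>", 1)[1].strip()
        libraries.setdefault(name, []).append(path)
    return libraries


def has_vendor_library(libraries: dict[str, list[str]], vendor: str) -> bool:
    """Return True if a vendor library such as libEGL_nvidia.so.0 is listed."""
    return any(name.startswith(f"libEGL_{vendor}") for name in libraries)


def read_ldconfig() -> str:
    """Read the live linker cache (Linux only); not called in this demo."""
    result = subprocess.run(
        ["ldconfig", "-p"], capture_output=True, text=True, check=True
    )
    return result.stdout


# Sample copied from the output posted above.
SAMPLE = """\
libEGL_nvidia.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_nvidia.so.0
libEGL_nvidia.so.0 (libc6) => /lib/i386-linux-gnu/libEGL_nvidia.so.0
libEGL_mesa.so.0 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /lib/x86_64-linux-gnu/libEGL.so
"""

libs = parse_egl_libraries(SAMPLE)
print("NVIDIA EGL library listed:", has_vendor_library(libs, "nvidia"))
print("Mesa EGL library listed:", has_vendor_library(libs, "mesa"))
```

To check a live system instead of the pasted sample, pass `read_ldconfig()` to `parse_egl_libraries`.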

@2488583886

Hello, I encountered the same error. Could you please share how you solved it?

@houyaokun
Author

> Hello, I encountered the same err, could you please share how you solve this?

I just rendered in headless mode, and the error disappeared.

@qureshinomaan

Hi,
Sorry for the naive question, but how do you run in headless mode? I have tried setting cfg.show_gui to false in datarenderer.py. I have also unset $DISPLAY. However, I still get this error.

Thanks!

@lukashermann
Collaborator

@qureshinomaan what exactly are you running? The datarenderer in calvin_env is not used during training; we only used it to render the dataset once after recording it with teleoperation. During training, it is the rollout callbacks that use the calvin_env simulator. As a quick fix, you could disable them during training, since they only evaluate the performance while training runs. However, you would still need to render at some point for the full evaluation after the training is done. To disable them, add ~callbacks/rollout and ~callbacks/rollout_lh to the command-line arguments for the training.
Also, headless rendering is enabled by default for the rollouts and the evaluation. Does your computer have a graphics card? EGL renders on the GPU, so it would fail if you don't have one.

What is the output if you run this script in calvin_env?
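
To illustrate the quick fix described above, here is a minimal Python sketch that assembles a training command with the two rollout callbacks removed. The helper name build_training_command and the placeholder dataset path are assumptions for illustration; only training.py and the two ~callbacks overrides come from this thread.

```python
import sys

# Overrides named in the comment above; the leading "~" asks the
# configuration system to drop that entry.
ROLLOUT_OVERRIDES = ["~callbacks/rollout", "~callbacks/rollout_lh"]


def build_training_command(overrides, disable_rollouts=True):
    """Return an argv list for training.py (hypothetical helper)."""
    command = [sys.executable, "training.py", *overrides]
    if disable_rollouts:
        command += ROLLOUT_OVERRIDES
    return command


# Placeholder path; replace it with the real dataset location.
command = build_training_command(["datamodule.root_data_dir=/path/to/dataset/"])
print(" ".join(command))
```

Passing this list to subprocess.run(command, check=True) from the directory that contains training.py would start the run without going through a shell, so the leading "~" is never subject to tilde expansion. When typing the command by hand, quoting the two overrides is a safe habit.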

@qureshinomaan

qureshinomaan commented Jan 7, 2024

Hi @lukashermann!
Thanks a lot for responding!
I am using the following command with the debug dataset:

$ python training.py datamodule.root_data_dir=/path/to/dataset/ datamodule/datasets=vision_lang_shm

I am running this on a machine with a 3080Ti GPU with 16GB VRAM. In the environment, torch is properly installed (torch.cuda.is_available() returns True).

I get the following output:

 | Name               | Type                   | Params
--------------------------------------------------------------
0 | perceptual_encoder | ConcatEncoders         | 174 K 
1 | plan_proposal      | PlanProposalNetwork    | 13.9 M
2 | plan_recognition   | PlanRecognitionNetwork | 36.0 M
3 | visual_goal        | VisualGoalEncoder      | 4.4 M 
4 | language_goal      | LanguageGoalEncoder    | 5.1 M 
5 | action_decoder     | LogisticPolicyNetwork  | 13.8 M
--------------------------------------------------------------
73.2 M    Trainable params
0         Non-trainable params
73.2 M    Total params
146.424   Total estimated model params size (MB)
Sanity Checking: 0it [00:00, ?it/s]pybullet build time: Nov 28 2023 23:51:11
[2024-01-07 14:26:30,850][calvin_agent.wrappers.calvin_env_wrapper][WARNING] - Couldn't find correct EGL device. Setting EGL_VISIBLE_DEVICE=0. When using DDP with many GPUs this can lead to OOM errors. Did you install PyBullet correctly? Please refer to calvin env README
[2024-01-07 14:26:30,851][calvin_agent.wrappers.calvin_env_wrapper][INFO] - EGL_DEVICE_ID 0 <==> CUDA_DEVICE_ID 0
argv[0]=--width=200
argv[1]=--height=200
[2024-01-07 14:26:30,958][calvin_env.envs.play_table_env][INFO] - Loading EGL plugin (may segfault on misconfigured systems)...
failed to EGL with glad.

@lukashermann
Collaborator

EGL has nothing to do with torch; it is the GPU renderer of pybullet.
Could you still run the script that I linked and copy the output here?

cd calvin_env/egl_check
bash build.sh  # should have been built automatically, but try running this again
python list_egl_options.py

@lukashermann
Collaborator

Anyway, this is not an issue with our repository, but with pybullet. Did you try following issues like this one?

bulletphysics/bullet3#3737

@qureshinomaan

Here is the output of the commands you asked me to run:

----------Default-------------
Starting EGL query
b'EGL device choice: -1 of 0.\neglInitialize() failed with error: 3008\n'
number of EGL devices: 0

I think there is a mismatch between the EGL driver and the CUDA driver on my system. I have seen similar issues in the Habitat and ai2thor repositories as well. I set up the repository on another system, following the same instructions, and was able to run it.
The only difference was the CUDA version (it worked with CUDA 12.0 but not with 11.7).

Anyways, thanks for your help!
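
For readers trying to interpret "eglInitialize() failed with error: 3008", here is a small Python lookup sketch. It assumes the check script prints the EGL error code in hexadecimal, which this thread does not confirm; the names in the table are the standard EGL error constants. Under that assumption, 3008 would correspond to EGL_BAD_DISPLAY.

```python
# Standard EGL error codes as defined in the Khronos EGL headers.
EGL_ERROR_NAMES = {
    0x3000: "EGL_SUCCESS",
    0x3001: "EGL_NOT_INITIALIZED",
    0x3002: "EGL_BAD_ACCESS",
    0x3003: "EGL_BAD_ALLOC",
    0x3004: "EGL_BAD_ATTRIBUTE",
    0x3005: "EGL_BAD_CONFIG",
    0x3006: "EGL_BAD_CONTEXT",
    0x3007: "EGL_BAD_CURRENT_SURFACE",
    0x3008: "EGL_BAD_DISPLAY",
    0x3009: "EGL_BAD_MATCH",
    0x300A: "EGL_BAD_NATIVE_PIXMAP",
    0x300B: "EGL_BAD_NATIVE_WINDOW",
    0x300C: "EGL_BAD_PARAMETER",
    0x300D: "EGL_BAD_SURFACE",
    0x300E: "EGL_CONTEXT_LOST",
}


def describe_egl_error(code_text: str) -> str:
    """Translate a printed error code, ASSUMED to be hexadecimal, to its name."""
    code = int(code_text, 16)
    return EGL_ERROR_NAMES.get(code, f"unknown (0x{code:04X})")


print(describe_egl_error("3008"))
```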

@lukashermann
Collaborator

I don't think the CUDA driver is relevant here; the NVIDIA driver might be.

@Patricia1019

> > Hello, I encountered the same err, could you please share how you solve this?
>
> I just render in headless mode, then the error disappeared.

Sorry but can you teach me how to render in headless mode?

@lukashermann
Collaborator

It renders in headless mode by default. Which error do you get?

@COST-97

COST-97 commented Apr 18, 2024

Hello,
I get the same error, failed to EGL with glad., even with show_gui=False.
The output of the commands
cd calvin_env/egl_check
bash build.sh  # should have been built automatically, but try running this again
python list_egl_options.py
is:
----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query
b'EGL device choice: -1 of 1.\n'
number of EGL devices: 1
----------Option #1 (id=0)-------------
Starting EGL query
EGL device choice: 0 of 1 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query

The output of the command
ldconfig -p | grep libEGL
is:
libEGL_mesa.so.0 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL_mesa.so.0
libEGL.so.1 (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so.1
libEGL.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libEGL.so

NVIDIA driver:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 525.105.17 Tue Mar 28 18:02:59 UTC 2023
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-44) (GCC)

@lukashermann
Collaborator

Are you sure the NVIDIA drivers are correctly installed? What is the output of nvidia-smi? The output of list_egl_options.py should list the NVIDIA card.
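
One way to check that point automatically is sketched below in Python: it collects the GL_RENDERER lines from the list_egl_options.py output and reports whether any of them is something other than a software rasterizer. The function names and the list of software renderer names are assumptions for illustration, and the sketch expects one field per line. The sample is taken from the output posted above, where only llvmpipe (Mesa's CPU renderer) appears.

```python
# Names of common Mesa software rasterizers (illustrative, not exhaustive).
SOFTWARE_RENDERERS = ("llvmpipe", "softpipe", "swrast")


def list_renderers(egl_check_output: str) -> list[str]:
    """Return the value of every GL_RENDERER= line, in order of appearance."""
    prefix = "GL_RENDERER="
    renderers = []
    for line in egl_check_output.splitlines():
        line = line.strip()
        if line.startswith(prefix):
            renderers.append(line[len(prefix):])
    return renderers


def is_software_renderer(renderer: str) -> bool:
    """Return True if the renderer string names a known CPU rasterizer."""
    return any(name in renderer.lower() for name in SOFTWARE_RENDERERS)


def has_hardware_renderer(egl_check_output: str) -> bool:
    """Return True if at least one listed renderer is not a CPU rasterizer."""
    return any(
        not is_software_renderer(renderer)
        for renderer in list_renderers(egl_check_output)
    )


# Sample taken from the output posted above.
SAMPLE = """\
----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
----------Option #1 (id=0)-------------
Starting EGL query
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
"""

print("Renderers found:", list_renderers(SAMPLE))
print("Hardware renderer available:", has_hardware_renderer(SAMPLE))
```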

@Caixy1113

Hi, sorry, but can you help me with the same bug? I tried to run the command
python evaluation/evaluate_policy.py --dataset_path $CALVIN_ROOT/dataset/calvin_debug_dataset --train_folder $CALVIN_ROOT/calvin_models/calvin_agent/checkpoints/D_D_static_rgb_baseline --checkpoint $CALVIN_ROOT/calvin_models/calvin_agent/checkpoints/D_D_static_rgb_baseline/mcil_baseline.ckpt
and got the same error:
pybullet build time: May 10 2024 10:39:45
Global seed set to 0
trying to load lang data from: /home/cxy/calvin/dataset/calvin_debug_dataset/training/lang_annotations/auto_lang_ann.npy
trying to load lang data from: /home/cxy/calvin/dataset/calvin_debug_dataset/validation/lang_annotations/auto_lang_ann.npy
argv[0]=--width=200
argv[1]=--height=200
failed to EGL with glad.

I also tried list_egl_options.py and got:
----------Default-------------
Starting EGL query
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query
b'EGL device choice: -1 of 9.\n'
number of EGL devices: 9
----------Option #1 (id=0)-------------
Starting EGL query
EGL device choice: 0 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=0
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #2 (id=1)-------------
Starting EGL query
EGL device choice: 1 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=1
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #3 (id=2)-------------
Starting EGL query
EGL device choice: 2 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=2
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #4 (id=3)-------------
Starting EGL query
EGL device choice: 3 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=NVIDIA Corporation
CUDA_DEVICE=3
GL_RENDERER=NVIDIA GeForce RTX 3090/PCIe/SSE2
GL_VERSION=3.3.0 NVIDIA 545.23.06
GL_SHADING_LANGUAGE_VERSION=3.30 NVIDIA via Cg compiler
Completeing EGL query

----------Option #5 (id=4)-------------
Starting EGL query
EGL device choice: 4 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD131: Permission denied

libEGL warning: failed to open /dev/dri/renderD131: Permission denied

eglInitialize() failed with error: 3008

----------Option #6 (id=5)-------------
Starting EGL query
EGL device choice: 5 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD130: Permission denied

libEGL warning: failed to open /dev/dri/renderD130: Permission denied

eglInitialize() failed with error: 3008

----------Option #7 (id=6)-------------
Starting EGL query
EGL device choice: 6 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD129: Permission denied

libEGL warning: failed to open /dev/dri/renderD129: Permission denied

eglInitialize() failed with error: 3008

----------Option #8 (id=7)-------------
Starting EGL query
EGL device choice: 7 of 9 (from EGL_VISIBLE_DEVICE)
libEGL warning: failed to open /dev/dri/renderD128: Permission denied

libEGL warning: failed to open /dev/dri/renderD128: Permission denied

eglInitialize() failed with error: 3008

----------Option #9 (id=8)-------------
Starting EGL query
EGL device choice: 8 of 9 (from EGL_VISIBLE_DEVICE)
Loaded EGL 1.5 after reload.
GL_VENDOR=Mesa/X.org
GL_RENDERER=llvmpipe (LLVM 12.0.0, 256 bits)
GL_VERSION=4.5 (Core Profile) Mesa 21.2.6
GL_SHADING_LANGUAGE_VERSION=4.50
Completeing EGL query

Hope for your reply.
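
The "Permission denied" warnings above refer to render nodes under /dev/dri. Here is a minimal Python sketch that lists those nodes and reports which ones the current user cannot open for reading and writing. The helper names are made up for illustration. Since the NVIDIA devices with ids 0 to 3 initialize successfully in the output above, these warnings may well be unrelated to the "failed to EGL with glad." error, so treat this only as one diagnostic step.

```python
import glob
import os


def find_render_nodes(pattern: str = "/dev/dri/renderD*") -> list[str]:
    """Return the DRM render nodes present on this machine, sorted by name."""
    return sorted(glob.glob(pattern))


def inaccessible_nodes(paths: list[str]) -> list[str]:
    """Return the paths the current user cannot both read and write."""
    return [path for path in paths if not os.access(path, os.R_OK | os.W_OK)]


nodes = find_render_nodes()
blocked = inaccessible_nodes(nodes)
print("Render nodes found:", nodes)
print("Not accessible to this user:", blocked)
```

If some nodes are reported as blocked, running "ls -l /dev/dri" shows which group owns them.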

@xiaofeifei-1

> Hello, I encountered the same err, could you please share how you solve this?

me too!
