[xformers/benchmarks/benchmark_encoder.py] fatal error: cuda.h: No such file or directory + AttributeError: module 'triton' has no attribute 'code_gen' #516

Open
0xdevalias opened this issue Nov 10, 2022 · 6 comments

Comments

@0xdevalias

🐛 Bug

Trying to follow along with:

/tmp/tmpzfde7mdr/main.c:2:10: fatal error: cuda.h: No such file or directory
 #include "cuda.h"
          ^~~~~~~~
compilation terminated.
  0%|                                                    | 0/28 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 21, in layer_norm_fw
KeyError: ('2-.-0-.-0--7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-d962222789c30252d492a16cca3bf467-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, 'i32', 'i32', 'fp32'), (True, 256), (True, True, True, True, True, True, (True, False), (True, False), (False,)))
During handling of the above exception, another exception occurred:
..snip..
  AttributeError: module 'triton' has no attribute 'code_gen'

Command

To Reproduce

Steps to reproduce the behavior:

  1. !conda run -n dreambooth --live-stream python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16

Expected behavior

The benchmark would run successfully.

Environment

⇒ python -m torch.utils.collect_env

Collecting environment information...
PyTorch version: 1.13.0+cu117
Is debug build: False
CUDA used to build PyTorch: 11.7
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.6 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.24.3
Libc version: glibc-2.27

Python version: 3.10.6 (main, Oct 24 2022, 16:07:47) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.4.0-124-generic-x86_64-with-glibc2.27
Is CUDA available: True
CUDA runtime version: Could not collect
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3090
Nvidia driver version: 515.65.01
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] functorch==1.13.0
[pip3] mypy==0.812
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.23.4
[pip3] pytorch-lightning==1.8.0.post1
[pip3] torch==1.13.0
[pip3] torchmetrics==0.10.2
[pip3] torchvision==0.14.0
[conda] cudatoolkit               11.3.1               h2bc3f7f_2
[conda] functorch                 1.13.0                   pypi_0    pypi
[conda] numpy                     1.23.4                   pypi_0    pypi
[conda] pytorch-lightning         1.8.0.post1              pypi_0    pypi
[conda] torch                     1.13.0                   pypi_0    pypi
[conda] torchmetrics              0.10.2                   pypi_0    pypi
[conda] torchvision               0.14.0                   pypi_0    pypi

Additional context

Testing the following parameters: 
 {
    "activation": [
        "relu"
    ],
    "attention_name": [
        "favor",
        "blocksparse",
        "global",
        "linformer",
        "local",
        "nystrom",
        "orthoformer",
        "random",
        "scaled_dot_product",
        "compositional",
        "fourier_mix",
        "lambda",
        "pooling",
        "visual"
    ],
    "autocast": [
        true
    ],
    "batch_size": [
        32
    ],
    "causal": [
        false
    ],
    "embed_dim": [
        256
    ],
    "feedforward_name": [
        "MLP"
    ],
    "heads": [
        16
    ],
    "sequence_length": [
        576,
        1024
    ]
}
  0%|                                                    | 0/28 [00:00<?, ?it/s]Testing: xFormerEncoderBlock(
  (pose_encoding): SinePositionalEmbedding()
  (wrap_att): PostNorm(
    (norm): FusedLayerNorm()
    (sublayer): Residual(
      (layer): MultiHeadDispatch(
        (attention): FavorAttention(
          (attn_drop): Dropout(p=0.1, inplace=True)
          (feature_map): SMReg()
        )
        (in_proj_container): InputProjection(
          (q_proj): Linear(in_features=256, out_features=256, bias=True)
          (k_proj): Linear(in_features=256, out_features=256, bias=True)
          (v_proj): Linear(in_features=256, out_features=256, bias=True)
        )
        (resid_drop): Dropout(p=0.1, inplace=False)
        (proj): Linear(in_features=256, out_features=256, bias=True)
      )
    )
  )
  (wrap_ff): PostNorm(
    (norm): FusedLayerNorm()
    (sublayer): Residual(
      (layer): MLP(
        (mlp): Sequential(
          (0): Linear(in_features=256, out_features=1024, bias=True)
          (1): ReLU()
          (2): Dropout(p=0.1, inplace=False)
          (3): Linear(in_features=1024, out_features=256, bias=True)
          (4): Dropout(p=0.1, inplace=False)
        )
      )
    )
  )
) 32 576 256 True cuda favor
/tmp/tmpzfde7mdr/main.c:2:10: fatal error: cuda.h: No such file or directory
 #include "cuda.h"
          ^~~~~~~~
compilation terminated.
  0%|                                                    | 0/28 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "<string>", line 21, in layer_norm_fw
KeyError: ('2-.-0-.-0--7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-d962222789c30252d492a16cca3bf467-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, 'i32', 'i32', 'fp32'), (True, 256), (True, True, True, True, True, True, (True, False), (True, False), (False,)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 223, in layer_norm
    return _LayerNorm.apply(x, weight, bias, eps)
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 97, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 73, in forward
    layer_norm_fw[(M,)](
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/runtime/jit.py", line 106, in launcher
    return self.run(*args, grid=grid, **kwargs)
  File "<string>", line 41, in layer_norm_fw
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/compiler.py", line 1239, in compile
    so = _build(fn.__name__, src_path, tmpdir)
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/compiler.py", line 1169, in _build
    ret = subprocess.check_call(cc_cmd)
  File "/opt/conda/envs/dreambooth/lib/python3.10/subprocess.py", line 369, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpzfde7mdr/main.c', '-O3', '-I/usr/local/cuda/include', '-I/opt/conda/envs/dreambooth/include/python3.10', '-I/tmp/tmpzfde7mdr', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpzfde7mdr/layer_norm_fw.cpython-310-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/benchmarks/benchmark_encoder.py", line 379, in <module>
    outputs = test_xformer_encoder_block(**constants, **params)  # type: ignore
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/benchmarks/benchmark_encoder.py", line 181, in test_xformer_encoder_block
    return benchmark_model(
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/benchmarks/benchmark_encoder.py", line 133, in benchmark_model
    _train_for_several_steps(num_steps=num_warmup, **warm_up_args)
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/benchmarks/benchmark_encoder.py", line 99, in _train_for_several_steps
    output = block(inputs)
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/factory/block_factory.py", line 231, in forward
    x = self.wrap_att(inputs=[q, k, v], att_mask=att_mask)
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/components/residual.py", line 165, in forward
    return self.norm(x)
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 193, in forward
    return layer_norm(x, self.weight, self.bias, self.epsilon)
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 224, in layer_norm
    except (triton.code_gen.OutOfResources, RuntimeError) as e:
AttributeError: module 'triton' has no attribute 'code_gen'
ERROR conda.cli.main_run:execute(49): `conda run python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16` failed. (See above for error)
@0xdevalias

⇒ find / -name cuda.h
/opt/conda/envs/dreambooth/lib/python3.10/site-packages/nvidia/cuda_runtime/include/cuda.h
/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/opt/conda/lib/python3.7/site-packages/nvidia/cuda_runtime/include/cuda.h
/opt/conda/pkgs/pytorch-1.12.0-py3.7_cuda11.3_cudnn8.3.2_0/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
find: '/proc/tty/driver': Permission denied
/usr/include/linux/cuda.h

@0xdevalias

0xdevalias commented Nov 10, 2022

One step closer it seems!

⇒ conda install -c nvidia cuda-libraries-dev
⇒ find / -name cuda.h
 /opt/conda/envs/dreambooth/lib/python3.10/site-packages/nvidia/cuda_runtime/include/cuda.h
 /opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
+ /opt/conda/envs/dreambooth/include/cuda.h
 /opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
 /opt/conda/lib/python3.7/site-packages/nvidia/cuda_runtime/include/cuda.h
 /opt/conda/pkgs/pytorch-1.12.0-py3.7_cuda11.3_cudnn8.3.2_0/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
 /opt/conda/pkgs/cuda-cudart-dev-11.8.89-0/include/cuda.h
 find: '/proc/tty/driver': Permission denied
 /usr/include/linux/cuda.h

Yet I'm still getting the error: fatal error: cuda.h: No such file or directory

I noticed that the call to gcc doesn't seem to pass this include path in its -I flags. I haven't dug deeper into the relevant code to figure out why, or how to potentially change that:

subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpzfde7mdr/main.c', '-O3', '-I/usr/local/cuda/include', '-I/opt/conda/envs/dreambooth/include/python3.10', '-I/tmp/tmpzfde7mdr', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpzfde7mdr/layer_norm_fw.cpython-310-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.
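
One way to confirm this is to check each `-I` directory from the failing command for the header. The following is a minimal sketch, not part of Triton or xformers; the helper name `include_dirs_with_header` is made up for illustration, and it only handles the `-I<dir>` form seen in the command above:

```python
import os
from typing import List


def include_dirs_with_header(cc_cmd: List[str], header: str = "cuda.h") -> List[str]:
    """Return the -I directories in a compiler command that contain `header`.

    Hypothetical helper for illustration; it only handles the `-I<dir>` form,
    not `-I <dir>` or `-isystem`.
    """
    hits = []
    for arg in cc_cmd:
        if arg.startswith("-I") and len(arg) > 2:
            include_dir = arg[2:]
            if os.path.isfile(os.path.join(include_dir, header)):
                hits.append(include_dir)
    return hits


if __name__ == "__main__":
    # Include flags copied from the CalledProcessError above.
    cmd = [
        "/usr/bin/gcc",
        "-I/usr/local/cuda/include",
        "-I/opt/conda/envs/dreambooth/include/python3.10",
        "-I/tmp/tmpzfde7mdr",
    ]
    print(include_dirs_with_header(cmd))
```

An empty list would mean that none of the directories handed to gcc contain cuda.h, which would be consistent with the error even though the header now exists at /opt/conda/envs/dreambooth/include/cuda.h.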

@0xdevalias

0xdevalias commented Nov 10, 2022



⇒ whereis libcuda.so
libcuda: /usr/lib/x86_64-linux-gnu/libcuda.so
⇒  find / -name libcuda.so
/opt/conda/envs/dreambooth/lib/stubs/libcuda.so
/opt/conda/pkgs/cuda-driver-dev-11.8.89-0/lib/stubs/libcuda.so
find: '/proc/tty/driver': Permission denied
/usr/lib/x86_64-linux-gnu/libcuda.so

⇒  ls -la /usr/local/cuda
lrwxrwxrwx 1 root root 17 Nov 10 06:36 /usr/local/cuda -> /tmp/tmpgyc5dwz3/
⇒  echo $CUDA_HOME


⇒  python -c "from sysconfig import get_paths; print(get_paths()['include'])"
/opt/conda/envs/dreambooth/include/python3.10

@0xdevalias

Setting CUDA_HOME seemed to allow it to progress a little bit more, and run into a new/different error:

!CUDA_HOME=/opt/conda/envs/dreambooth conda run -n dreambooth --live-stream python3 xformers/benchmarks/benchmark_encoder.py --activations relu  --plot -emb 256 -bs 32 -heads 16
Traceback (most recent call last):
  File "<string>", line 21, in layer_norm_fw
KeyError: ('2-.-0-.-0--7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-d962222789c30252d492a16cca3bf467-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, 'i32', 'i32', 'fp32'), (True, 256), (True, True, True, True, True, True, (True, False), (True, False), (False,)))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 223, in layer_norm
    return _LayerNorm.apply(x, weight, bias, eps)
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 97, in decorate_fwd
    return fwd(*args, **kwargs)
  File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 73, in forward
    layer_norm_fw[(M,)](
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/runtime/jit.py", line 106, in launcher
    return self.run(*args, grid=grid, **kwargs)
  File "<string>", line 41, in layer_norm_fw
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/compiler.py", line 1256, in compile
    asm, shared, kernel_name = _compile(fn, signature, device, constants, configs[0], num_warps, num_stages,
  File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/compiler.py", line 901, in _compile
    name, asm, shared_mem = _triton.code_gen.compile_ttir(backend, module, device, num_warps, num_stages, extern_libs, cc)
RuntimeError: `ptxas` was searched in TRITON_PTXAS_PATH, /usr/local/cuda/bin/ or PATH but a working version could not be found.
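
The error message names three places where `ptxas` was looked for. As a rough sketch of how one might check those same places by hand (this mirrors the wording of the RuntimeError only; it is an assumption, not Triton's actual lookup code, and the function names are made up for illustration):

```python
import os
import shutil
from typing import List, Mapping, Optional


def ptxas_candidates(env: Mapping[str, str]) -> List[str]:
    """List ptxas locations in the order the error message names them.

    Sketch only: this follows the text of the RuntimeError, not Triton's
    real implementation.
    """
    candidates = []
    override = env.get("TRITON_PTXAS_PATH")
    if override:
        candidates.append(override)
    candidates.append("/usr/local/cuda/bin/ptxas")
    on_path = shutil.which("ptxas", path=env.get("PATH", ""))
    if on_path:
        candidates.append(on_path)
    return candidates


def first_executable(paths: List[str]) -> Optional[str]:
    """Return the first path that is an executable file, else None."""
    for path in paths:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


if __name__ == "__main__":
    found = first_executable(ptxas_candidates(os.environ))
    print(found or "no ptxas found in any of the searched locations")
```

This only checks that a file exists and is executable; the error speaks of a "working version", so a real check would presumably also need to run the binary.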

The ptxas error may be related to this:

@0xdevalias

I wonder if the stuff I figured out in the following will help here? (to explore when I get a chance):

blefaudeux added a commit to blefaudeux/xformers that referenced this issue Nov 22, 2022
@blefaudeux
Contributor

The "triton has no code_gen attribute" error is unrelated; it is tied to a recent triton update, sorry about that. Fixed in #528
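
For context, the AttributeError comes from the `except (triton.code_gen.OutOfResources, RuntimeError)` clause in layer_norm.py: the attribute lookup itself fails before any exception matching happens. A minimal sketch of a version-tolerant lookup follows. It is not the fix from #528; the candidate paths other than `code_gen.OutOfResources` are assumptions to verify against the installed Triton, and the helper names are hypothetical:

```python
from types import SimpleNamespace
from typing import Any, Optional, Sequence

# Assumed locations of OutOfResources across Triton releases. The first is
# the one referenced in the traceback; the others are guesses to verify.
CANDIDATE_PATHS = (
    "code_gen.OutOfResources",
    "compiler.OutOfResources",
    "runtime.autotuner.OutOfResources",
)


def resolve_attr(root: Any, dotted: str) -> Optional[Any]:
    """Walk a dotted attribute path, returning None if any part is missing."""
    obj = root
    for part in dotted.split("."):
        obj = getattr(obj, part, None)
        if obj is None:
            return None
    return obj


def find_out_of_resources(triton_module: Any, paths: Sequence[str] = CANDIDATE_PATHS) -> type:
    """Return the first candidate that is an exception class, else RuntimeError."""
    for dotted in paths:
        candidate = resolve_attr(triton_module, dotted)
        if isinstance(candidate, type) and issubclass(candidate, BaseException):
            return candidate
    return RuntimeError


if __name__ == "__main__":
    class FakeOutOfResources(Exception):
        pass

    # Stand-in for a Triton build that no longer has `code_gen`.
    fake_triton = SimpleNamespace(compiler=SimpleNamespace(OutOfResources=FakeOutOfResources))
    print(find_out_of_resources(fake_triton).__name__)
```

With the real module, submodules that have not been imported yet will not resolve through `getattr`, in which case the sketch falls back to RuntimeError.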

blefaudeux added a commit to blefaudeux/xformers that referenced this issue Nov 22, 2022
fmassa pushed a commit that referenced this issue Nov 24, 2022
* removing the fp16 blocksparse crutch

make the softmax kernel bfloat16 compatible

* partial fix for #516

* nit, adding bfloat16 to the layernorm benchmark

* dead code removal, improve code coverage

Co-authored-by: Benjamin Lefaudeux <benjamin@photoroom.com>