
Blocksparse crashes in the encoder benchmark #24

Closed
blefaudeux opened this issue Oct 22, 2021 · 1 comment · Fixed by #25
Labels: bug, ongoing

@blefaudeux (Contributor) commented:

🐛 Bug

Running python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16 -mlp MLP -a blocksparse with the latest Triton crashes with an illegal memory access.

The dedicated blocksparse matmul benchmark runs fine, and so does the unit test.

Expected behavior

Well, not crashing

Environment

PyTorch version: 1.9.1+cu111
Is debug build: False
CUDA used to build PyTorch: 11.1
ROCM used to build PyTorch: N/A

OS: Ubuntu 20.04.1 LTS (x86_64)
GCC version: (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
Clang version: Could not collect
CMake version: version 3.16.3
Libc version: glibc-2.31

Python version: 3.8 (64-bit runtime)
Python platform: Linux-5.4.0-52-generic-x86_64-with-glibc2.17
Is CUDA available: True
CUDA runtime version: 10.1.243
GPU models and configuration: 
GPU 0: Quadro GP100
GPU 1: Quadro GP100

Nvidia driver version: 450.80.02
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.5
[pip3] pytorch-sphinx-theme==0.0.24
[pip3] torch==1.9.1+cu111
[pip3] torch-tb-profiler==0.2.1
[pip3] torchaudio==0.9.1
[pip3] torchvision==0.10.1+cu111
[conda] numpy                     1.19.5                   pypi_0    pypi
[conda] pytorch-sphinx-theme      0.0.24                   pypi_0    pypi
[conda] torch                     1.9.1+cu111              pypi_0    pypi
[conda] torch-tb-profiler         0.2.1                    pypi_0    pypi
[conda] torchaudio                0.9.1                    pypi_0    pypi
[conda] torchvision               0.10.1+cu111             pypi_0    pypi

Additional context

Looks like something that can happen depending on the layout; it is not super clear why.
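
For reference, a minimal sketch (not the xformers code itself) of the kind of block layout a blocksparse attention kernel consumes; the shapes and the banded pattern below are assumptions picked to match the benchmark flags (16 heads, block size 32):

```python
import torch

heads, seq, block = 16, 256, 32
blocks = seq // block  # 8 x 8 grid of blocks

# Example layout: a (heads, blocks, blocks) tensor of 0/1 entries, where a 1
# marks a block of the attention matrix that is actually computed.
layout = torch.zeros(heads, blocks, blocks, dtype=torch.long)
for i in range(blocks):
    for j in range(max(0, i - 1), min(blocks, i + 2)):
        layout[:, i, j] = 1  # diagonal band: each block attends to its neighbours

# Sanity checks of the sort that would catch a layout/shape mismatch before it
# turns into an illegal memory access inside the kernel.
assert seq % block == 0, "sequence length must be a multiple of the block size"
assert layout.sum(dim=-1).min() > 0, "every query block should attend to at least one key block"
```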

@blefaudeux self-assigned this on Oct 22, 2021
@blefaudeux added the bug and ongoing labels on Oct 22, 2021
@blefaudeux (Contributor, Author) commented:

Not a stride issue: by default our q and k do have a strange stride pattern (due to the head splitting), but passing in a dummy tensor with standard decreasing strides leads to the same memory access crash.
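
To illustrate the stride patterns being compared here, a small sketch with hypothetical shapes matching the benchmark flags (batch 32, 16 heads, embedding 256); this is not the xformers code itself:

```python
import torch

bs, seq, heads, emb = 32, 256, 16, 256
head_dim = emb // heads

x = torch.randn(bs, seq, emb)

# Typical head split: view + transpose leaves q with non-standard strides.
q_split = x.view(bs, seq, heads, head_dim).transpose(1, 2)  # (bs, heads, seq, head_dim)
print(q_split.stride())         # (65536, 16, 256, 1): not monotonically decreasing
print(q_split.is_contiguous())  # False

# Dummy tensor with standard decreasing strides; per the comment above,
# feeding such a tensor to the kernel still triggers the same crash.
q_dummy = q_split.contiguous()
print(q_dummy.stride())         # (65536, 4096, 16, 1): standard decreasing strides
print(q_dummy.is_contiguous())  # True
```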

xwhan pushed a commit to xwhan/xformers that referenced this issue Feb 8, 2022
[chore] update to python 3.8 and cuda 11.1 on CI