CUBLAS_STATUS_NOT_INITIALIZED Error #33

Open
championsnet opened this issue Mar 8, 2024 · 0 comments
Labels
bug Something isn't working

Comments

@championsnet
Collaborator

When extracting embeddings with the progen2-xlarge model, I sometimes get the following error after the encoding step has finished:

Traceback (most recent call last):
  File "/cluster/home/estamkopoulo/plmfit_workspace/plmfit/plmfit.py", line 102, in <module>
    model.extract_embeddings(data_type=args.data_type, layer=args.layer,
  File "/cluster/home/estamkopoulo/plmfit_workspace/plmfit/plmfit/models/pretrained_models.py", line 240, in extract_embeddings
    model_output = self.py_model(batch[0])
                   ^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/apps/nss/gcc-8.2.0/python/3.11.2/x86_64/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/home/estamkopoulo/plmfit_workspace/plmfit/plmfit/language_models/progen2/models/progen/modeling_progen.py", line 636, in forward
    transformer_outputs = self.transformer(
                          ^^^^^^^^^^^^^^^^^
  File "/cluster/apps/nss/gcc-8.2.0/python/3.11.2/x86_64/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/home/estamkopoulo/plmfit_workspace/plmfit/plmfit/language_models/progen2/models/progen/modeling_progen.py", line 508, in forward
    outputs = block(
              ^^^^^^
  File "/cluster/apps/nss/gcc-8.2.0/python/3.11.2/x86_64/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/home/estamkopoulo/plmfit_workspace/plmfit/plmfit/language_models/progen2/models/progen/modeling_progen.py", line 266, in forward
    attn_outputs = self.attn(
                   ^^^^^^^^^^
  File "/cluster/apps/nss/gcc-8.2.0/python/3.11.2/x86_64/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/home/estamkopoulo/plmfit_workspace/plmfit/plmfit/language_models/progen2/models/progen/modeling_progen.py", line 155, in forward
    qkv = self.qkv_proj(hidden_states)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/apps/nss/gcc-8.2.0/python/3.11.2/x86_64/lib64/python3.11/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/cluster/apps/nss/gcc-8.2.0/python/3.11.2/x86_64/lib64/python3.11/site-packages/torch/nn/modules/linear.py", line 114, in forward
    return F.linear(input, self.weight, self.bias)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
../c10/core/DynamicCast.h:95: cast_and_store: block: [61649,0,0], thread: [50,0,0] Assertion `false` failed.
../c10/core/DynamicCast.h:95: cast_and_store: block: [59867,0,0], thread: [50,0,0] Assertion `false` failed.
../c10/core/DynamicCast.h:95: cast_and_store: block: [77201,0,0], thread: [50,0,0] Assertion `false` failed.
../c10/core/DynamicCast.h:95: cast_and_store: block: [79960,0,0], thread: [82,0,0] Assertion `false` failed.
../c10/core/DynamicCast.h:95: cast_and_store: block: [90094,0,0], thread: [114,0,0] Assertion `false` failed.
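
CUDA kernels are launched asynchronously, so the Python frame in the traceback above (`F.linear` in `qkv_proj`) may not be where the failure actually originated; the device-side assertions printed after the `RuntimeError` hint that an earlier kernel may have failed. A minimal sketch for getting a synchronous traceback on the next run is below. The helper name `enable_sync_cuda_errors` is hypothetical, and the snippet assumes it runs before any CUDA call is made in the process (for example at the very top of `plmfit.py`).

```python
import os


def enable_sync_cuda_errors() -> str:
    """Ask the CUDA runtime to launch kernels synchronously.

    Hypothetical helper, not part of plmfit. It only sets the standard
    CUDA_LAUNCH_BLOCKING environment variable; the setting takes effect
    only if it is applied before CUDA is initialized in this process.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"
    return os.environ["CUDA_LAUNCH_BLOCKING"]


# Call this before the first CUDA call so that the error surfaces
# at the operation that actually failed.
enable_sync_cuda_errors()
```

Exporting `CUDA_LAUNCH_BLOCKING=1` in the job script before starting Python should have the same effect. Synchronous launches slow the run down, so this is meant for debugging only.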

The log file:

#---------Logger initiated with name "extract_embeddings_gb1_progen2-xlarge_layer-last_mean" at 2024-03-08 18:49:21.584684---------#
Available GPUs : 1
Running on NVIDIA A100 80GB PCIe
Encoding 149361 sequences....
Initiating categorical encoding
Memory needed for encoding: 158322660B
First sequence tokens: [1, 16, 20, 28, 14, 15, 13, 15, 17, 11, 14, 23, 15, 14, 11, 9, 23, 23, 23, 9, 5, 25, 8, 5, 5, 23, 5, 9, 14, 25, 10, 14, 20, 28, 5, 17, 8, 17, 11, 25, 8, 11, 9, 26, 23, 28, 8, 8, 5, 23, 14, 23, 10, 23, 25, 23, 9, 15, 9, 25, 15, 10, 20, 11, 19, 15, 8, 19, 17, 22, 16, 5, 23, 28, 9, 25, 15, 7, 9, 25, 5, 21, 14, 15, 11, 23, 8, 8, 21, 9, 25, 25, 15, 10, 15, 15, 17, 25, 10, 13, 19, 20, 19, 23, 15, 5, 20, 15, 13, 11, 5, 15, 21, 5, 15, 14, 9, 9, 11, 21, 15, 23, 10, 19, 15, 15, 5, 9, 7, 15, 10, 21, 5, 11, 21, 21, 8, 15, 15, 21, 8, 15, 15, 12, 15, 8, 19, 21, 10, 15, 9, 21, 12, 15, 5, 11, 23, 16, 22, 28, 10, 22, 19, 28, 20, 15, 23, 25, 15, 12, 25, 8, 11, 9, 15, 7, 5, 21, 8, 13, 21, 22, 15, 13, 10, 15, 22, 14, 8, 23, 13, 11, 22, 21, 22, 23, 19, 20, 23, 10, 15, 12, 26, 25, 28, 7, 16, 9, 17, 15, 8, 15, 15, 11, 19, 23, 8, 25, 8, 5, 15, 16, 22, 16, 15, 21, 22, 15, 22, 21, 25, 8, 15, 20, 21, 20, 25, 20, 23, 15, 16, 11, 15, 12, 15, 22, 11, 19, 22, 12, 22, 20, 12, 28, 21, 12, 23, 19, 15, 9, 12, 12, 12, 12, 12, 12, 2]
Categorical encoding finished
Encoding completed! 18.9484s
Extracting embeddings for 149361 sequences...

Job resource settings:
gres: 80g
mem-per-cpu: 90g
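
As a quick sanity check on the figures in the log, the reported "Memory needed for encoding" can be put in relation to the number of sequences. The sketch below only redoes that arithmetic with the two numbers copied from the log. It says nothing about the memory taken by the model weights or by activations during the forward pass, which are not reported here.

```python
# Figures copied from the log file above.
num_sequences = 149_361
encoding_bytes = 158_322_660

# Bytes of encoded input per sequence, and the total in MiB.
bytes_per_sequence, remainder = divmod(encoding_bytes, num_sequences)
encoding_mib = encoding_bytes / 2**20

print(f"{bytes_per_sequence} B per sequence (remainder {remainder})")
print(f"{encoding_mib:.1f} MiB in total")
# prints:
# 1060 B per sequence (remainder 0)
# 151.0 MiB in total
```

The encoded input therefore comes to about 151 MiB, which is small next to the 80 GB of the A100 reported in the log.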

BikiasT added the bug label on Mar 9, 2024