
Failed CI on A100 #2064

Closed
xuzhao9 opened this issue Nov 27, 2023 · 1 comment
xuzhao9 commented Nov 27, 2023

The test test_llama_v2_7b_16h_example_cuda started failing between the 20231115 and 20231116 nightly runs.

Failed workflow: https://github.com/pytorch/benchmark/actions/runs/7006721966/job/19059198530

Detailed error and command to reproduce:

$ python run.py llama_v2_7b_16h -d cuda --accuracy
fp64 golden ref were not generated for llama_v2_7b_16h. Setting accuracy check to cosine
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
Traceback (most recent call last):
  File "/data/users/xzhao9/git/benchmark/torchbenchmark/util/env_check.py", line 510, in check_accuracy
    correct_result = run_n_iterations(
  File "/data/users/xzhao9/git/benchmark/torchbenchmark/util/env_check.py", line 395, in run_n_iterations
    _model_iter_fn(mod, inputs, contexts, optimizer, collect_outputs=False)
  File "/data/users/xzhao9/git/benchmark/torchbenchmark/util/env_check.py", line 393, in _model_iter_fn
    return forward_pass(mod, inputs, contexts, collect_outputs)
  File "/data/users/xzhao9/git/benchmark/torchbenchmark/util/env_check.py", line 370, in forward_pass
    return mod(*inputs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 820, in forward
    outputs = self.model(
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 708, in forward
    layer_outputs = decoder_layer(
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 424, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 321, in forward
    query_states = self.q_proj(hidden_states)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1510, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1519, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/xzhao9/.conda/envs/py38/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 116, in forward
    return F.linear(input, self.weight, self.bias)
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling `cublasCreate(handle)`
Running eval method from llama_v2_7b_16h on cuda in eager mode with input batch size 1 and precision fp16.
Accuracy:              eager_1st_run_fail

Bisection workflow: https://github.com/pytorch/benchmark/actions/runs/6985353191
Root cause commit: 12b2dd16b050e6495910fc564517fbb51dde1f20 (pytorch/pytorch@12b2dd1)
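For anyone triaging a similar failure, the error surfaces the first time a GPU matmul forces `cublasCreate()`. A minimal smoke test that reproduces that code path, independent of the benchmark harness, might look like the sketch below. This is illustrative only (the helper name and messages are not from this issue), and it assumes PyTorch is installed:

```python
# Hypothetical smoke test: run a tiny fp16 matmul on the GPU to force
# cuBLAS handle creation, and report the outcome instead of crashing.
def cublas_smoke_test():
    try:
        import torch
    except ImportError:
        return "torch not installed"
    if not torch.cuda.is_available():
        return "no CUDA device"
    try:
        a = torch.randn(8, 8, device="cuda", dtype=torch.float16)
        (a @ a).sum().item()  # matmul routes through cuBLAS, triggering cublasCreate()
        return "ok"
    except RuntimeError as e:  # e.g. CUBLAS_STATUS_NOT_INITIALIZED
        return f"cuBLAS error: {e}"

if __name__ == "__main__":
    print(cublas_smoke_test())
```

On a healthy A100 setup this prints `ok`; when the environment hits the bug above it would instead report the `CUBLAS_STATUS_NOT_INITIALIZED` error, which is commonly caused by insufficient free GPU memory or a driver/toolkit mismatch.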

xuzhao9 commented Jan 15, 2024

Fixed by upstream.

@xuzhao9 xuzhao9 closed this as completed Jan 15, 2024
facebook-github-bot pushed a commit that referenced this issue Jan 25, 2024
Summary:
This PR partially reverts #2095, since #2064 seems not to be an issue anymore.

Pull Request resolved: #2124

Reviewed By: suez1224

Differential Revision: D53093766

Pulled By: xuzhao9

fbshipit-source-id: 157a01dec22e48b5ee1cb1260070a6d270aec4f8