test_trace_and_metrics fails if PyTorch has CUDA support. #6292

Closed
ysiraichi opened this issue Jan 11, 2024 · 1 comment · Fixed by #6302

ysiraichi commented Jan 11, 2024

🐛 Bug

PJRT_DEVICE=CUDA python test/test_profiler.py

The command above fails with:

2024-01-11 02:59:59.706960: I torch_xla/csrc/runtime/pjrt_computation_client.cc:167] Initializing PjRt GPU client...
2024-01-11 02:59:59.707055: I torch_xla/csrc/runtime/pjrt_computation_client.cc:200] Getting StreamExecutorGpuClient for node_id=0, num_nodes=1
2024-01-11 02:59:59.730538: E external/xla/xla/stream_executor/cuda/cuda_driver.cc:282] failed call to cuInit: CUDA_ERROR_NOT_INITIALIZED: initialization error
2024-01-11 02:59:59.730584: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:137] retrieving CUDA diagnostic information for host: qgpu3
2024-01-11 02:59:59.730600: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:144] hostname: qgpu3
2024-01-11 02:59:59.730688: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:168] libcuda reported version is: 530.30.2
2024-01-11 02:59:59.730731: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:172] kernel reported version is: 530.30.2
2024-01-11 02:59:59.730745: I external/xla/xla/stream_executor/cuda/cuda_diagnostics.cc:253] kernel version seems to match DSO: 530.30.2
Process Process-33:
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 315, in _bootstrap
    self.run()
  File "/usr/local/lib/python3.8/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File "test/test_profiler.py", line 74, in train_worker
    test_profile_mp_mnist.train_mnist(
  File "xla/test/test_profile_mp_mnist.py", line 102, in train_mnist
    sample_count=600000 // flags.batch_size // xm.xrt_world_size())
  File "xla/torch_xla/core/xla_model.py", line 127, in xrt_world_size
    return runtime.world_size()
  File "xla/torch_xla/runtime.py", line 87, in wrapper
    return fn(*args, **kwargs)
  File "xla/torch_xla/runtime.py", line 149, in world_size
    if torch_xla._XLAC._xla_get_replication_devices_count() == 0:
RuntimeError: Bad StatusOr access: FAILED_PRECONDITION: No visible GPU devices.

Environment

Additional context

This feels similar to the issue solved by #5960 in the benchmarking scripts. Basically, we initialize CUDA in the parent process and then fork with multiprocessing, so the forked workers cannot initialize CUDA themselves.
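
For context, here is a minimal, torch_xla-free sketch of the failure mode (illustrative only; it assumes a CUDA-enabled PyTorch build with a visible GPU, and is not the test's actual code path):

import multiprocessing as mp

import torch


def child():
    # A forked child cannot safely re-initialize CUDA: plain PyTorch raises
    # "Cannot re-initialize CUDA in forked subprocess", and the PJRT GPU
    # client fails with CUDA_ERROR_NOT_INITIALIZED as in the log above.
    torch.ones(1, device="cuda")


if __name__ == "__main__":
    torch.cuda.init()             # parent process initializes CUDA first
    ctx = mp.get_context("fork")  # the default start method on Linux
    p = ctx.Process(target=child)
    p.start()
    p.join()
    # Creating the context with mp.get_context("spawn"), or launching a
    # fresh interpreter via subprocess, avoids inheriting the parent's
    # CUDA state and works as expected.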

cc @miladm @JackCaoG

ysiraichi commented:

One way around this issue could be to use the subprocess library instead (the same solution as #5960). What do you think?
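
Roughly something like this (the worker script path and flag below are placeholders, not the real test entry point): each worker runs in a fresh interpreter and initializes CUDA itself instead of inheriting a forked CUDA context.

import os
import subprocess
import sys


def run_worker(script, *args):
    # Launch the worker in a new process with a clean CUDA state.
    env = dict(os.environ, PJRT_DEVICE="CUDA")
    return subprocess.run([sys.executable, script, *args], env=env, check=True)


# e.g. run_worker("test/some_train_worker.py", "--fake_data")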
