Modifying PjRt device at runtime doesn't work. #5942

ysiraichi · 2023-11-29T21:09:02Z

🐛 Bug

Not sure whether this is a bug or intended behavior, but once the PjRt client is initialized with a device, it apparently can't be changed. If we try to change it, PyTorch/XLA doesn't complain and executes everything on the initially configured device.

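import os

import torch
import torch_xla.core.xla_model as xm

for dev in ("CPU", "CUDA", "CPU"):
    # Try to switch the PjRt device for this iteration.
    os.environ["PJRT_DEVICE"] = dev
    print(f"Supported devices for {dev}:", xm.get_xla_supported_devices(devkind=dev))
    # Once the first iteration has initialized the runtime, this keeps
    # returning a device on the original backend (CPU here).
    device = xm.xla_device()
    a = torch.rand(5, 5, device=device)
    r = a @ a
    xm.mark_step()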

Supported devices for CPU: ['xla:0']  # executed on CPU:0
Supported devices for CUDA: None      # executed on CPU:0 (no error or warning)
Supported devices for CPU: ['xla:0']  # executed on CPU:0

Expected behavior

Either issue a warning (or, even better, an error), or make it possible to change devices at runtime.

Environment

PyTorch/XLA: 402166b

Additional context

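This came up when trying to use the recently upstreamed benchmark. The function is_xla_device_available (inside benchmarks/util.py) is called for each enabled accelerator in the same process, so every call after the first ends up using the device the runtime was first initialized with.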
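Since the device cannot be switched inside a running process, one way to probe several accelerators is to run each probe in its own child process. This is a minimal sketch, assuming one subprocess per device is acceptable; `probe_in_subprocess` and `PROBE_CODE` are hypothetical names, not the benchmark's actual code. The child code here only echoes `PJRT_DEVICE` so the sketch runs without torch_xla installed; a real probe would import torch_xla and query the devices instead.

```python
import os
import subprocess
import sys

# Hypothetical stand-in probe executed in the child process. A real probe
# would import torch_xla and query devices; this one just reports which
# PJRT_DEVICE value the child process saw.
PROBE_CODE = "import os; print(os.environ['PJRT_DEVICE'])"


def probe_in_subprocess(device: str, code: str = PROBE_CODE) -> str:
    """Run `code` in a fresh Python process with PJRT_DEVICE=device.

    Each child starts its own runtime, so whatever device the parent
    process may already have initialized cannot leak into this probe.
    """
    env = dict(os.environ, PJRT_DEVICE=device)
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the probe fails
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for dev in ("CPU", "CUDA", "CPU"):
        print(dev, "->", probe_in_subprocess(dev))
```

The trade-off is one interpreter start-up (and one runtime initialization) per probed device, in exchange for each probe seeing exactly the device it asked for.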


ysiraichi · 2023-11-29T21:09:35Z

cc @JackCaoG @miladm

JackCaoG · 2023-11-29T21:24:06Z

I think this is intended; we don't really expect users to switch the PJRT device after the program has been initialized. In fact, we don't expect users to change most (if not all) environment variables after the program has started, since most of them are stored in static variables in C++.
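The effect of reading an environment variable once into a C++ static can be mimicked in Python with a cached function: the first call fixes the value, and later changes to `os.environ` are ignored without any error. This is a hypothetical illustration (the `device_type` name and the `"CPU"` default are assumptions), not PyTorch/XLA's actual code.

```python
import functools
import os


# Hypothetical analogue of a C++ function-local static: PJRT_DEVICE is read
# on the first call only, and the result is cached for the rest of the process.
@functools.lru_cache(maxsize=None)
def device_type() -> str:
    return os.environ.get("PJRT_DEVICE", "CPU")


if __name__ == "__main__":
    os.environ["PJRT_DEVICE"] = "CPU"
    print(device_type())  # CPU
    os.environ["PJRT_DEVICE"] = "CUDA"
    print(device_type())  # still CPU: the change is silently ignored
```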

ysiraichi · 2023-11-29T21:28:57Z

Then, maybe it would be better to error out, wouldn't it?

JackCaoG · 2023-11-29T21:31:14Z

Hmm, that means we would need to check PJRT_DEVICE every time we call some variant of xla_device. From this perspective, using an env var is annoying. @will-cromar, any thoughts?
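A minimal sketch of what such a per-call check could look like, assuming a hypothetical `checked_device()` wrapper standing in for the `xla_device` variants; it is not part of PyTorch/XLA.

```python
import os
from typing import Optional

# Hypothetical record of the device type the "runtime" was initialized with.
_initialized_device: Optional[str] = None


def checked_device() -> str:
    """Hypothetical xla_device() variant that re-reads PJRT_DEVICE on every
    call and raises if it no longer matches the initialized device type."""
    global _initialized_device
    current = os.environ.get("PJRT_DEVICE")
    if current is None:
        raise RuntimeError("PJRT_DEVICE is not set")
    if _initialized_device is None:
        # First call: treat this as runtime initialization.
        _initialized_device = current
    elif current != _initialized_device:
        raise RuntimeError(
            f"PJRT_DEVICE changed from {_initialized_device!r} to {current!r} "
            "after the runtime was initialized"
        )
    return _initialized_device
```

Each call costs one environment lookup and one string comparison. That is cheap, but the check would have to be repeated in every entry point that returns a device.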

ysiraichi · 2023-11-29T21:35:01Z

Ouch. What about having an API for initializing the PjRt client with a given PjRt device? It would make "doing the wrong thing" (i.e. changing the device) very hard.

will-cromar · 2023-11-29T22:17:12Z

"Ouch. What about having an API for initializing the PjRt client with a given PjRt device? It would make 'doing the wrong thing' (i.e. changing the device) very hard."

I like that idea. We can add a warning/error to runtime.set_device_type if the runtime is already initialized and discourage using the environment variables directly within code.
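A minimal sketch of the proposed guard, assuming hypothetical module-level state. The `set_device_type` below only mirrors the name of `runtime.set_device_type`, the `strict` flag is an assumption, and `init_runtime` stands in for whatever actually starts the PjRt client. Before initialization the setter just records the choice; afterwards, a different device raises, or with `strict=False` warns and is ignored.

```python
import os
import warnings
from typing import Optional

# Hypothetical runtime state: the device type the runtime was started with,
# or None while the runtime has not been initialized yet.
_runtime_device: Optional[str] = None


def set_device_type(device: str, strict: bool = True) -> None:
    """Hypothetical guarded setter in the spirit of runtime.set_device_type."""
    if _runtime_device is not None and device != _runtime_device:
        msg = (
            f"Runtime already initialized with {_runtime_device!r}; "
            f"cannot switch to {device!r}"
        )
        if strict:
            raise RuntimeError(msg)
        warnings.warn(msg)
        return  # leave PJRT_DEVICE untouched
    os.environ["PJRT_DEVICE"] = device


def init_runtime() -> str:
    """Hypothetical one-shot initialization that freezes the device type."""
    global _runtime_device
    if _runtime_device is None:
        _runtime_device = os.environ.get("PJRT_DEVICE", "CPU")
    return _runtime_device
```

Routing all device selection through one setter keeps the check in a single place, instead of re-validating the environment variable on every device lookup.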

ysiraichi added the xla:gpu label Nov 29, 2023

will-cromar mentioned this issue Nov 30, 2023

Error when changing PJRT_DEVICE after runtime initialized #5948 (Merged)

lezcano mentioned this issue Dec 1, 2023

Failing Torchbench Models: tracking issue #5932 (Open)

ysiraichi closed this as completed Dec 11, 2023
