
Remove or simplify hardcoded lists of device types #6235

Merged: 18 commits into master on Jan 10, 2024

Conversation


will-cromar (Collaborator) commented Dec 26, 2023

Allows new device plugins without registering the device type in our code! See #6242 for broader context of this cleanup.

  • Add placeholder PLUGIN device type for unknown devices
    • Store the actual device name (according to PjRtClient::platform_name) in the DeviceType, since we still need to pass the device name string across Python/C++ boundary
    • Replace explicit lists of devices with patterns
  • Deprecate devkind argument, since we only support one PJRT backend at a time
  • Remove explicit references to device types that do not have any special cases (XPU, ROCM)
  • Remove deprecated GPU device type
  • Simplify several test skip conditions that had lists of device types

Tested with CPU plugin in #6253 where I change the platform name to TEST to simulate an unknown device type. test_operations.py passes with that plugin.
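The placeholder scheme described above can be sketched in Python (hypothetical names; the actual implementation lives in the C++ runtime): known backends keep their enum value, while any unrecognized `platform_name` maps to a `PLUGIN` placeholder but retains its real name string.

```python
from enum import Enum

class XlaDeviceType(Enum):
    CPU = 'CPU'
    TPU = 'TPU'
    CUDA = 'CUDA'
    PLUGIN = 'PLUGIN'

class DeviceType:
    """Hypothetical sketch: unknown platforms become PLUGIN,
    but the actual platform name string is preserved."""

    def __init__(self, platform_name: str):
        name = platform_name.upper()
        try:
            self.type = XlaDeviceType(name)
        except ValueError:
            self.type = XlaDeviceType.PLUGIN  # unknown device plugin
        self.type_name = name  # real name survives for round-tripping

d = DeviceType('test')        # simulated unknown device, as in #6253
print(d.type.name, d.type_name)  # PLUGIN TEST
```

This mirrors how the CPU plugin test renames its platform to TEST: the device still works end to end because only the name string crosses the Python/C++ boundary, not a member of a hardcoded list.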

will-cromar changed the title from "[WIP] Remove or simplify hardcoded lists of device types" to "Remove or simplify hardcoded lists of device types" on Jan 9, 2024
will-cromar marked this pull request as ready for review January 9, 2024 00:21
@@ -71,6 +71,11 @@ class IfrtComputationClient : public ComputationClient {

std::string GetDefaultDevice() const override;

torch_xla::DeviceType GetDeviceType() const override {
  return torch_xla::DeviceType(
      absl::AsciiStrToUpper(client_->platform_name()));
}
Collaborator:
The platform_name will be something like CUDA or TPU?

will-cromar (Collaborator Author):

Yeah

vanbasten23 (Collaborator) left a comment:

LGTM. Thanks for the simplification!

@@ -72,7 +72,7 @@ def is_xla_tensor(tensor):


def parse_xla_device(device):
-    m = re.match(r'(CPU|TPU|GPU|ROCM|CUDA|XPU|NEURON):(\d+)$', device)
+    m = re.match(r'([A-Z]+):(\d+)$', device)
Collaborator:

Trying to think: where would it fail if the user has a typo in PJRT_DEVICE?

will-cromar (Collaborator Author):

The device name here comes from the PjRtClient::platform_name, which is not guaranteed to match the PJRT_DEVICE name.

If there's a typo in PJRT_DEVICE, the runtime will fail to initialize and throw an error.
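For illustration, the relaxed pattern accepts any uppercase platform name followed by an ordinal, so a plugin-provided name parses the same way as a built-in one (a minimal standalone sketch of the pattern from the diff above):

```python
import re

# Relaxed device pattern from this PR: any uppercase platform name
# followed by ':' and an ordinal, e.g. "TPU:0" or a plugin's "TEST:1".
_DEVICE_RE = re.compile(r'([A-Z]+):(\d+)$')

def parse_xla_device(device: str):
    """Return (device_type, ordinal), or None if the string does not match."""
    m = _DEVICE_RE.match(device)
    if m is None:
        return None
    return m.group(1), int(m.group(2))

print(parse_xla_device('TPU:0'))   # ('TPU', 0)
print(parse_xla_device('TEST:3'))  # ('TEST', 3)
print(parse_xla_device('tpu:0'))   # None: lowercase does not match
```

Note the pattern no longer rejects unknown names; as discussed above, a bad PJRT_DEVICE value fails earlier, at runtime initialization, not here.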

} // namespace

std::string DeviceType::XlaDeviceTypeToString(XlaDeviceType hw_type) {
XLA_CHECK(hw_type != XlaDeviceType::PLUGIN) << "PLUGIN type name unknown";
Collaborator:
When will this function be called? Are we not expecting this to be called on a plugin?

will-cromar (Collaborator Author):

This function is used when constructing a device from the XlaDeviceType enum type rather than the device name in a string. E.g. here:

default_device_type_ =
std::make_shared<DeviceType>(static_cast<XlaDeviceType>(type));

If all we get is the placeholder XlaDeviceType::PLUGIN type, then we don't know the actual platform_name of the device type for toString. I think this is a relatively rare case in our code, so I can try to remove it. I have a more ambitious refactoring effort going in #6261.
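The failure mode being guarded against can be sketched in Python (a hypothetical mirror of the C++ check, not the actual implementation): once only the bare enum value is available, the PLUGIN placeholder carries no platform name to stringify.

```python
class PluginNameUnknown(Exception):
    pass

# Concrete backends map enum -> name trivially; PLUGIN cannot,
# because the real platform name was only ever held as a string.
_ENUM_TO_NAME = {'CPU': 'CPU', 'TPU': 'TPU', 'CUDA': 'CUDA'}

def xla_device_type_to_string(hw_type: str) -> str:
    if hw_type == 'PLUGIN':
        raise PluginNameUnknown('PLUGIN type name unknown')
    return _ENUM_TO_NAME[hw_type]

print(xla_device_type_to_string('TPU'))  # TPU
```

This is why the enum-to-DeviceType construction path is the rare case worth removing in the follow-up refactor mentioned above.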

type_name_(type_name) {}

// TODO(wcromar): do we even want this default constructor?
DeviceType() : DeviceType(XlaDeviceType::CPU){};
Collaborator:

I feel like we should eventually let PyTorch/XLA auto-detect the device based on some rules (libtpu, CUDA version, etc.). I can't really think of a case where we would want to default-construct a CPU device...

will-cromar (Collaborator Author):

I'm 90% sure this constructor is never called. Logically, the "default" device type is managed by xla_backend_impl.

will-cromar (Collaborator Author):

Actually, we definitely don't need this. Upstream provides a better default constructor: https://github.com/pytorch/pytorch/blob/16d69290c6d037a25e32220b9517597d04dbd0bf/torch/csrc/lazy/backend/backend_device.cpp#L14-L15

I'm just going to delete this now instead of leaving the TODO.

@will-cromar will-cromar merged commit 050a240 into master Jan 10, 2024
20 checks passed