
Make CUDA OpenXLA fallback the default. #7630

Merged
merged 7 commits into master from ysiraichi/make-cuda-fallback-default
Jul 24, 2024

Conversation

ysiraichi (Collaborator):

Partially fixes: #7342

This PR changes the default device for running OpenXLA fallback operations from CPU to CUDA. So, instead of specifying XLA_FALLBACK_CUDA=1, to run fallback operations on CUDA the user must make sure that all of the following hold (see the sketch after this list):

  1. XLA_FALLBACK_CPU (newly introduced) is not set
  2. The DeviceType of the current ComputationClient is CUDA
  3. PyTorch was compiled with CUDA support
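For illustration, the decision roughly amounts to a predicate like the following. This is a minimal sketch, not the PR's actual code: ShouldFallbackOnCUDA, IsCurrentDeviceCUDA, and IsPyTorchCompiledWithCUDA are hypothetical names, and only the XLA_FALLBACK_CPU check mirrors what the PR describes.

// Minimal sketch of the fallback-device decision described above.
// IsCurrentDeviceCUDA() and IsPyTorchCompiledWithCUDA() are assumed
// helpers standing in for the real checks.
bool ShouldFallbackOnCUDA() {
  // (1) Setting XLA_FALLBACK_CPU forces fallback operations onto CPU.
  if (runtime::sys_util::GetEnvBool("XLA_FALLBACK_CPU", false)) {
    return false;
  }
  // (2) The current ComputationClient's DeviceType must be CUDA, and
  // (3) PyTorch must have been compiled with CUDA support.
  return IsCurrentDeviceCUDA() && IsPyTorchCompiledWithCUDA();
}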

I have also changed the test_fallback function so that it no longer sets or unsets the XLA_FALLBACK_CPU flag; it just runs the fallback operation. From this PR onwards, PyTorch/XLA should automatically detect when it cannot run the CUDA OpenXLA fallback.

That said, one can force CPU fallback execution with the newly introduced XLA_FALLBACK_CPU environment variable.

cc @miladm @vanbasten23 @JackCaoG

miladm (Collaborator) left a comment:


@ysiraichi

Approving to unblock. Let's confirm the above items, please.

@@ -11,6 +11,8 @@ static void fail(const char* name) {

namespace c10::cuda {

DeviceIndex device_count() noexcept { return 0; }
Collaborator:


IIRC, we unit-test this function in the Python layer. Do we need a unit test for the C++ layer?

ysiraichi (Collaborator, Author):


Do we? Can you point me to the source location? These functions are supposed to be implemented by PyTorch when it is compiled with CUDA support. This file provides implementations for the case where PyTorch is compiled without CUDA support; they need to be supplied, since we would otherwise get an undefined reference.
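For illustration, such a stub file looks roughly like this. This is a hedged sketch: the device_count definition mirrors the diff above, while the include and the comments are assumptions about how the file is organized.

// Stub ("phony") implementations of c10::cuda symbols. When PyTorch is
// built without CUDA support, these symbols are still referenced at link
// time, so no-op definitions must be supplied to avoid undefined
// references. A CUDA-enabled PyTorch build provides the real ones.
#include <c10/core/Device.h>  // c10::DeviceIndex

namespace c10::cuda {

// No CUDA support was compiled in, so report zero visible devices.
DeviceIndex device_count() noexcept { return 0; }

}  // namespace c10::cuda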

@@ -55,6 +55,10 @@ std::string DeviceType::toString() const {
return absl::StrCat(type_name_, ":");
}

XlaDeviceType DeviceType::getType() const {
Collaborator:


ditto

ysiraichi (Collaborator, Author) commented Jul 8, 2024:


Do we need a unit test for this function? It only casts an integer into an enum.
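For reference, the body is essentially a one-line cast. A sketch, assuming the device type is stored as an integer member; the accessor name type() here is hypothetical:

XlaDeviceType DeviceType::getType() const {
  // The device type is stored as a plain integer; getType() simply
  // reinterprets it as the XlaDeviceType enum.
  return static_cast<XlaDeviceType>(type());
}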

miladm (Collaborator) commented Jul 8, 2024:

cc @zpcore

@@ -51,8 +52,33 @@ std::vector<std::string> GetFallbackOperations() {
// Before each modified function below, we shall specify what has changed,
// if there was any.

Collaborator:


Nit (we can fix in a follow-up): we should rename this file to aten_fallback.cpp.

zpcore (Collaborator) commented Jul 8, 2024:

> cc @zpcore

Thanks, looks like the fallback is enabled by default. We should see the performance update tomorrow.

ysiraichi (Collaborator, Author):

As @miladm pointed out, I will still run benchmarks with this PR, so it might only land tomorrow.

ysiraichi (Collaborator, Author):

I confirmed this PR doesn't introduce any new regressions.

@miladm I will merge this PR and add the XLA_FALLBACK_CPU description, along with the file renaming suggested by @JackCaoG, in a follow-up PR.

ysiraichi (Collaborator, Author):

Actually, I think I will wait for #7647. It looks like a relevant issue.

-bool UseCUDAFallback() {
-  return runtime::sys_util::GetEnvBool("XLA_FALLBACK_CUDA", false);
+// Decide whether to run OpenXLA fallback operations on CUDA.
+bool UseCUDAFallback(const c10::OperatorHandle& op) {
Collaborator:


Since we have another type of CUDA fallback, how about the name "OpenXLAFallbackOnCUDA"?

@@ -11,6 +11,8 @@ static void fail(const char* name) {

namespace c10::cuda {

DeviceIndex device_count() noexcept { return 0; }
Collaborator:


It returns 0 because it's a phony implementation?
Also, could you add a comment to this file describing that it is a phony implementation and what the purpose of the file is?

vanbasten23 (Collaborator) left a comment:


LGTM with minor comments

ysiraichi force-pushed the ysiraichi/make-cuda-fallback-default branch from 462a877 to 34a3ac1 on July 23, 2024 at 13:27.
Comment on lines +24 to +34
// List of operations that should fall back to CPU instead of GPU.
static std::unordered_set<std::string> _force_fallback_on_cpu{
// This operation is a simple memory access that transforms the given
// 1-element tensor into a Scalar.
//
// Although it makes sense to run this operation on CPU (since the
// output will get copied back to CPU anyway), this also fixes a
// particular issue with moco benchmark.
// More details: https://github.com/pytorch/xla/issues/7647
"aten::_local_scalar_dense",
};
ysiraichi (Collaborator, Author):


Just to be completely transparent: aten::_local_scalar_dense is here for two reasons (see the sketch after this list):

  1. It just makes sense to run it on CPU, since the output (a Scalar) also lives on CPU
  2. As a temporary fix for #7647 ([torchbench] moco fails to run with CUDA OpenXLA fallback).
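A sketch of how the set could gate the decision inside UseCUDAFallback(op). Only the set contents come from the PR; the control flow below is an illustration, and the device-type and CUDA-build checks from the PR description are elided.

// Hypothetical sketch: consult the force-on-CPU list before choosing
// the fallback device for a given operator.
bool UseCUDAFallback(const c10::OperatorHandle& op) {
  // Operators on the force-CPU list never fall back to CUDA.
  // op.operator_name().name yields names such as
  // "aten::_local_scalar_dense".
  if (_force_fallback_on_cpu.count(op.operator_name().name) > 0) {
    return false;
  }
  // Otherwise, fall back to CUDA unless XLA_FALLBACK_CPU is set
  // (device-type and build checks elided in this sketch).
  return !runtime::sys_util::GetEnvBool("XLA_FALLBACK_CPU", false);
}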

ysiraichi (Collaborator, Author):

@miladm @JackCaoG @vanbasten23 @zpcore In case you want to take another look at this PR, I added the following changes:

I will merge this one tomorrow.

ysiraichi merged commit 806de83 into master on Jul 24, 2024
23 checks passed