OpenXLA pin update #6530

Merged: 3 commits merged into master on Feb 14, 2024

Conversation

@yeounoh (Contributor) commented Feb 13, 2024

This moves the pin to

  • strip_prefix = "xla-b166243711f71b0a55daa1eda36b1dc745886784",

and the libtpu build to

  • _libtpu_version = '0.1.dev20240213'
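
For reference, a minimal sketch of where these two values are typically pinned; the file names and rule attributes below are assumptions and may differ from the actual pytorch/xla tree:

```
# Hypothetical WORKSPACE-style pin for the OpenXLA source archive; the real rule in
# pytorch/xla may carry additional attributes (e.g. local patches under openxla_patches/).
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "xla",
    strip_prefix = "xla-b166243711f71b0a55daa1eda36b1dc745886784",
    urls = [
        "https://github.com/openxla/xla/archive/b166243711f71b0a55daa1eda36b1dc745886784.tar.gz",
    ],
)

# Hypothetical setup.py-style constant selecting the matching libtpu nightly build.
_libtpu_version = '0.1.dev20240213'
```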

Locally tested:

>>> import torch_xla.core.xla_model as xm
>>> xm.xla_device()
WARNING:root:PJRT is now the default runtime. For more information, see https://github.com/pytorch/xla/blob/master/docs/pjrt.md
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1707861898.453343   82066 pjrt_api.cc:100] GetPjrtApi was found for tpu at /root/.local/lib/python3.8/site-packages/libtpu/libtpu.so
I0000 00:00:1707861898.453434   82066 pjrt_api.cc:79] PJRT_Api is set for device type tpu
I0000 00:00:1707861898.453443   82066 pjrt_api.cc:146] The PJRT plugin has PJRT API version 0.40. The framework PJRT API version is 0.40.
device(type='xla', index=0)

yeounoh self-assigned this on Feb 13, 2024
@lsy323 (Collaborator) commented Feb 13, 2024

Thanks @yeounoh! openxla_patches/stablehlo_quant_seralization.diff is still needed to work around openxla/stablehlo#1812.

yeounoh requested review from GleasonK and sdasgup3 on February 13, 2024 at 22:22
@yeounoh (Contributor, Author) commented Feb 13, 2024

> Thanks @yeounoh! openxla_patches/stablehlo_quant_seralization.diff is still needed to work around openxla/stablehlo#1812.

We can drop it, based on our verification:

root@0e0864097658:/workspace/pytorch/xla# python test/stablehlo/test_pt2e_qdq.py 
2024-02-13 22:31:03.922291: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:root:PJRT is now the default runtime. For more information, see https://github.com/pytorch/xla/blob/master/docs/pjrt.md
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1707863465.033336   91383 pjrt_api.cc:100] GetPjrtApi was found for tpu at /workspace/_libtpu.so
I0000 00:00:1707863465.033427   91383 pjrt_api.cc:79] PJRT_Api is set for device type tpu
I0000 00:00:1707863465.033435   91383 pjrt_api.cc:146] The PJRT plugin has PJRT API version 0.40. The framework PJRT API version is 0.40.
..s
----------------------------------------------------------------------
Ran 3 tests in 3.115s

OK (skipped=1)

@lsy323 (Collaborator) commented Feb 14, 2024

Thanks @yeounoh!

@lsy323 (Collaborator) commented Feb 14, 2024

Not sure if we should also check the resnet18 speed? I remember it was verified in the previous pin update. Here is the script from the previous pin update.

@JackCaoG (Collaborator) commented:

Let's do a quick run of resnet to make sure we don't have any visible regression.
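
For reference, a quick ResNet check like this is typically kicked off with the fake-data ImageNet training script (a sketch; the model and flag values below are assumptions, not the exact command used here):

```
python test/test_train_mp_imagenet.py \
  --model=resnet50 \
  --fake_data \
  --num_epochs=2 \
  --batch_size=128   # rerun with --batch_size=256 for the second measurement
```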

@yeounoh (Contributor, Author) commented Feb 14, 2024

With the default batch size of 128:

| Training Device=xla:0/0 Epoch=2 Step=1200 Loss=0.00135 Rate=1784.07 GlobalRate=1781.73 Time=18:48:34
| Training Device=xla:0/1 Epoch=2 Step=1200 Loss=0.00135 Rate=1784.08 GlobalRate=1781.74 Time=18:48:34
| Training Device=xla:0/2 Epoch=2 Step=1200 Loss=0.00135 Rate=1783.65 GlobalRate=1781.69 Time=18:48:34

With batch size 256:

| Training Device=xla:0/1 Epoch=2 Step=160 Loss=0.00135 Rate=1818.24 GlobalRate=1806.78 Time=18:54:50
| Training Device=xla:0/2 Epoch=2 Step=180 Loss=0.00135 Rate=1818.24 GlobalRate=1807.91 Time=18:54:53
| Training Device=xla:0/3 Epoch=2 Step=180 Loss=0.00135 Rate=1818.23 GlobalRate=1808.03 Time=18:54:53
| Training Device=xla:0/1 Epoch=2 Step=180 Loss=0.00135 Rate=1818.29 GlobalRate=1808.04 Time=18:54:53
| Training Device=xla:0/0 Epoch=2 Step=180 Loss=0.00135 Rate=1817.99 GlobalRate=1807.90 Time=18:54:53

No visible regression on ResNet.

yeounoh merged commit 27d1f70 into master on Feb 14, 2024
2 checks passed
cota added a commit that referenced this pull request Feb 20, 2024
Bump the pinned XLA version to fix GPU builds with CUDA11.
Note that there are only 13 commits between the new pin and
the previous one:

```
$ git log --oneline b1662437^..419a3d73
419a3d736 [xla] Do not include absl headers into xla/types.h
1a4ec9190 [xla:gpu] Add initialization guard to make sure we have exactly one NCCL clique initialization in progress
1365d31a8 [xla] Fix test compilation for environments without cuda
86e231a58 [xla:gpu] Add support for legacy API custom calls in AddressComputationFusionRewriter
82e775381 Fix broken build for convert_memory_placement_to_internal_annotations_test
db973b7fb Integrate LLVM at llvm/llvm-project@bc66e0cf9feb
09c7c0818 Fix gcd simplification of div.
04af47afd PR #9400: Move Gt(Max) optimization after all other HandleCompare optimizations
06c8c19d8 Fix pad indexing map with interior padding.
a27177d76 [XLA:GPU] Implement GpuPriorityFusion::Run instead of calling InstructionFusion::Run.
8a5491aa8 Don't require the argument of ReducePrecision to be a tensor.
50b3b8c40 [XLA] Add a way for an HLO runner to run instructions in isolation.
e020e2e9b [XLA:GPU] Add coalescing heuristic.
b16624371 Add support for unpinned_host for host memory offloading. XLA does not currently differentiate between pinned and unpinned.
```

Fixes #6530.
amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024