OpenXLA pin update #6530

Merged: 3 commits merged into master on Feb 14, 2024

Conversation

@yeounoh (Contributor) commented Feb 13, 2024

This moves the pin to

  • strip_prefix = "xla-b166243711f71b0a55daa1eda36b1dc745886784",

and the libtpu build to

  • _libtpu_version = '0.1.dev20240213'
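
For reference, a minimal sketch of where these two values are typically pinned; the file names and rule attributes below are assumptions and may differ from the actual pytorch/xla tree:

```
# Hypothetical WORKSPACE-style pin for the OpenXLA source archive; the real rule in
# pytorch/xla may carry additional attributes (e.g. local patches under openxla_patches/).
load("@bazel_tools//tools/build_defs/repo:http.bzl", "http_archive")

http_archive(
    name = "xla",
    strip_prefix = "xla-b166243711f71b0a55daa1eda36b1dc745886784",
    urls = [
        "https://github.com/openxla/xla/archive/b166243711f71b0a55daa1eda36b1dc745886784.tar.gz",
    ],
)

# Hypothetical setup.py-style constant selecting the matching libtpu nightly build.
_libtpu_version = '0.1.dev20240213'
```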

Locally tested:

>>> import torch_xla.core.xla_model as xm
>>> xm.xla_device()
WARNING:root:PJRT is now the default runtime. For more information, see https://github.com/pytorch/xla/blob/master/docs/pjrt.md
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1707861898.453343   82066 pjrt_api.cc:100] GetPjrtApi was found for tpu at /root/.local/lib/python3.8/site-packages/libtpu/libtpu.so
I0000 00:00:1707861898.453434   82066 pjrt_api.cc:79] PJRT_Api is set for device type tpu
I0000 00:00:1707861898.453443   82066 pjrt_api.cc:146] The PJRT plugin has PJRT API version 0.40. The framework PJRT API version is 0.40.
device(type='xla', index=0)

yeounoh self-assigned this on Feb 13, 2024
@lsy323 (Collaborator) commented Feb 13, 2024

Thanks @yeounoh! openxla_patches/stablehlo_quant_seralization.diff is still needed to work around openxla/stablehlo#1812.

yeounoh requested review from GleasonK and sdasgup3 on February 13, 2024 at 22:22
@yeounoh (Contributor, Author) commented Feb 13, 2024

> Thanks @yeounoh! openxla_patches/stablehlo_quant_seralization.diff is still needed to work around openxla/stablehlo#1812.

We can drop it, based on our verification:

root@0e0864097658:/workspace/pytorch/xla# python test/stablehlo/test_pt2e_qdq.py 
2024-02-13 22:31:03.922291: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
WARNING:root:PJRT is now the default runtime. For more information, see https://github.com/pytorch/xla/blob/master/docs/pjrt.md
WARNING:root:libtpu.so and TPU device found. Setting PJRT_DEVICE=TPU.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
I0000 00:00:1707863465.033336   91383 pjrt_api.cc:100] GetPjrtApi was found for tpu at /workspace/_libtpu.so
I0000 00:00:1707863465.033427   91383 pjrt_api.cc:79] PJRT_Api is set for device type tpu
I0000 00:00:1707863465.033435   91383 pjrt_api.cc:146] The PJRT plugin has PJRT API version 0.40. The framework PJRT API version is 0.40.
..s
----------------------------------------------------------------------
Ran 3 tests in 3.115s

OK (skipped=1)

@lsy323 (Collaborator) commented Feb 14, 2024

Thanks @yeounoh!

@lsy323 (Collaborator) commented Feb 14, 2024

Not sure if we should also check the resnet18 speed? I remember it was verified in the previous pin update. Here is the script from the previous pin update.

@JackCaoG (Collaborator) commented:

Let's do a quick run of resnet to make sure we don't have any visible regression.
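
For reference, a quick ResNet check like this is typically kicked off with the fake-data ImageNet training script (a sketch; the model and flag values below are assumptions, not the exact command used here):

```
python test/test_train_mp_imagenet.py \
  --model=resnet50 \
  --fake_data \
  --num_epochs=2 \
  --batch_size=128   # rerun with --batch_size=256 for the second measurement
```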

@yeounoh (Contributor, Author) commented Feb 14, 2024

With the default batch size of 128:

| Training Device=xla:0/0 Epoch=2 Step=1200 Loss=0.00135 Rate=1784.07 GlobalRate=1781.73 Time=18:48:34
| Training Device=xla:0/1 Epoch=2 Step=1200 Loss=0.00135 Rate=1784.08 GlobalRate=1781.74 Time=18:48:34
| Training Device=xla:0/2 Epoch=2 Step=1200 Loss=0.00135 Rate=1783.65 GlobalRate=1781.69 Time=18:48:34

With batch size 256:

| Training Device=xla:0/1 Epoch=2 Step=160 Loss=0.00135 Rate=1818.24 GlobalRate=1806.78 Time=18:54:50
| Training Device=xla:0/2 Epoch=2 Step=180 Loss=0.00135 Rate=1818.24 GlobalRate=1807.91 Time=18:54:53
| Training Device=xla:0/3 Epoch=2 Step=180 Loss=0.00135 Rate=1818.23 GlobalRate=1808.03 Time=18:54:53
| Training Device=xla:0/1 Epoch=2 Step=180 Loss=0.00135 Rate=1818.29 GlobalRate=1808.04 Time=18:54:53
| Training Device=xla:0/0 Epoch=2 Step=180 Loss=0.00135 Rate=1817.99 GlobalRate=1807.90 Time=18:54:53

No visible regression on ResNet.

yeounoh merged commit 27d1f70 into master on Feb 14, 2024
2 checks passed
cota added a commit that referenced this pull request Feb 20, 2024
Bump the pinned XLA version to fix GPU builds with CUDA11.
Note that there are only 13 commits between the new pin and
the previous one:

```
$ git log --oneline b1662437^..419a3d73
419a3d736 [xla] Do not include absl headers into xla/types.h
1a4ec9190 [xla:gpu] Add initialization guard to make sure we have exactly one NCCL clique initialization in progress
1365d31a8 [xla] Fix test compilation for environments without cuda
86e231a58 [xla:gpu] Add support for legacy API custom calls in AddressComputationFusionRewriter
82e775381 Fix broken build for convert_memory_placement_to_internal_annotations_test
db973b7fb Integrate LLVM at llvm/llvm-project@bc66e0cf9feb
09c7c0818 Fix gcd simplification of div.
04af47afd PR #9400: Move Gt(Max) optimization after all other HandleCompare optimizations
06c8c19d8 Fix pad indexing map with interior padding.
a27177d76 [XLA:GPU] Implement GpuPriorityFusion::Run instead of calling InstructionFusion::Run.
8a5491aa8 Don't require the argument of ReducePrecision to be a tensor.
50b3b8c40 [XLA] Add a way for an HLO runner to run instructions in isolation.
e020e2e9b [XLA:GPU] Add coalescing heuristic.
b16624371 Add support for unpinned_host for host memory offloading. XLA does not currently differentiate between pinned and unpinned.
```

Fixes #6530.
amithrm pushed a commit to amithrm/xla that referenced this pull request Mar 1, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024