
Fix XLA tensor storage device by using XlaDeviceToAtenDevice. #5743

Merged · 2 commits · Oct 31, 2023

Conversation

ysiraichi
Collaborator

Discussion: pytorch/pytorch#111506

This PR replaces backendDeviceToAtenDevice with XlaDeviceToAtenDevice when instantiating the storage of a newly created XLA tensor.

Problem: there are two issues:

  1. backendDeviceToAtenDevice doesn't take the device type into account
  2. XLA uses the XlaDeviceType enum as its device type, while eager mode uses c10::DeviceType

Solution: use the XlaDeviceToAtenDevice function, since we know the input is an XLA device anyway.
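To illustrate the mismatch, here is a hypothetical Python sketch. It is not the actual torch_xla code (which is C++); all names below are simplified stand-ins. The point is that a generic backend-to-ATen conversion that ignores the backend's own device-type enum can produce the wrong ATen device, while an XLA-specific conversion always maps to an `xla` device.

```python
from enum import Enum

# Illustrative stand-in for torch_xla's internal XlaDeviceType enum
# (the real type lives in C++; these names are simplified).
class XlaDeviceType(Enum):
    CPU = 0
    TPU = 1
    GPU = 2

def backend_device_to_aten_device(ordinal):
    # Hypothetical generic conversion: it only sees an ordinal, so it
    # cannot know the backend's device type and falls back to a default.
    return f"cpu:{ordinal}"

def xla_device_to_aten_device(device_type, ordinal):
    # Hypothetical XLA-aware conversion: every XLA device, whatever its
    # underlying XlaDeviceType (TPU, GPU, ...), is an "xla" device from
    # ATen's point of view.
    return f"xla:{ordinal}"

# The generic path loses the device type; the XLA-aware path does not.
print(backend_device_to_aten_device(0))                 # cpu:0 (wrong for a TPU)
print(xla_device_to_aten_device(XlaDeviceType.TPU, 0))  # xla:0
```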

@ysiraichi
Collaborator Author

@JackCaoG The failing CI doesn't seem to be related to this PR. Can we re-run it?

@JackCaoG
Collaborator

Let me also give you write access so you can rerun the CI yourself.

@alanwaketan
Collaborator

I wonder: have you experienced any similar bugs at the Python level?

This was originally a hack; see: https://github.com/pytorch/xla/blob/master/torch_xla/csrc/tensor.h#L334

We don't expect people to actually use storage_.

@ysiraichi
Collaborator Author

As of now, we do hit that bug when running this example. That said, there are two things I'd like to clarify:

  • We did decide to solve that specific bug in some other way
  • I still think we should be consistent with ATen devices here
    • e.g. XLATensorImpl also calls bridge::XlaDeviceToAtenDevice when constructing itself

@alanwaketan
Collaborator

LGTM.

@alanwaketan
Collaborator


Got it. Hopefully we can remove this workaround on fake storage soon.

@ysiraichi ysiraichi merged commit 944136f into pytorch:master Oct 31, 2023
17 checks passed
@lezcano added the xla:gpu label on Dec 1, 2023