
tpu ci module refactor #7

Merged
merged 286 commits into from
Nov 16, 2023
Conversation

mbzomowski
Collaborator

ysiraichi and others added 29 commits November 16, 2023 21:47
…h#5751)

* fix squeeze op lowering issue when dim is not in sorted order

* remove debug info

* remove debug info

* refactor BuildSqueezedDimensions
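The squeeze fix above concerns lowering when multiple dims arrive unsorted. A minimal Python sketch of why ordering matters when removing several dimensions (illustrative only, not the actual C++ `BuildSqueezedDimensions` lowering):

```python
def squeeze_dims(shape, dims):
    """Remove the size-1 dimensions listed in `dims` (given in any order)."""
    result = list(shape)
    # Delete in descending index order so earlier indices stay valid
    # after each removal; an unsorted pass would shift later indices.
    for d in sorted(dims, reverse=True):
        if result[d] == 1:
            del result[d]
    return result
```

With this ordering, `squeeze_dims((1, 3, 1, 5), [2, 0])` and `squeeze_dims((1, 3, 1, 5), [0, 2])` both yield `[3, 5]`.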
…ytorch#5777)

* Move pure dtype conversion functions to `dtype.cpp`

* remove comments

* better names

* fix includes

* formatting

* consolidate

* fix test build

* more explicit names

* remove extra line
…Module (pytorch#5745)

Co-authored-by: Siyuan Liu <lsiyuan@google.coim>
* delete nccl_distributed

* remove async_task

* remove unique

* Remove hashing

* more random cleanup

* formatting

* remove util.cc

* Revert "remove unique"

This reverts commit ebe4567.

* Use upstream Unique
* Make the pjrt gpu allocator configurable

* the default value changed from 0.9 to 0.75

* return default GpuAllocatorConfig

---------

Co-authored-by: wangang.wa <wangang.wa@alibaba-inc.com>
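The allocator commit above makes the PJRT GPU allocator configurable and lowers the default memory fraction from 0.9 to 0.75. A hedged sketch of what such a config might look like (field and function names are illustrative, not the actual C++ `GpuAllocatorConfig`):

```python
from dataclasses import dataclass

@dataclass
class GpuAllocatorConfig:
    """Illustrative stand-in for a configurable PJRT GPU allocator config."""
    preallocate: bool = True
    # Per the commit notes, the default fraction of GPU memory the
    # allocator may claim changed from 0.9 to 0.75.
    memory_fraction: float = 0.75

def default_allocator_config() -> GpuAllocatorConfig:
    # "return default GpuAllocatorConfig": callers who pass no overrides
    # get the new defaults.
    return GpuAllocatorConfig()
```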
* [SPMD] move SPMD package to torch_xla/experimental/spmd, introduce shadow xla DTensor API.

* support backward compatibility of the old imports

* Move spmd out of experimental

* Update spmd.md for distributed/spmd
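Moving the SPMD package while keeping the old imports working is a standard deprecation-shim pattern. A minimal sketch, assuming a wrapper at the old path that forwards to the new one and warns (names stand in for the torch_xla modules, they are not the exact API):

```python
import warnings

# New home (stand-in for torch_xla.distributed.spmd).
def mark_sharding(tensor, mesh, partition_spec):
    return (tensor, mesh, partition_spec)

# Old home (stand-in for the experimental path) keeps a thin wrapper so
# existing user code keeps working while nudging callers to migrate.
def mark_sharding_deprecated(*args, **kwargs):
    warnings.warn(
        "this module moved to torch_xla.distributed.spmd; update your imports",
        DeprecationWarning,
        stacklevel=2,
    )
    return mark_sharding(*args, **kwargs)
```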
* Transfer data directly to the device (pytorch#5752)

* Remove `populate_fn` from `TensorSource`

* Make TensorSource an interface

* Re-enable pjrt_computation_client_test

* server -> device

* add comment

* fix outbound data metric

* formatting

* implement byte_strides in TensorSource

* more formatting

* remove extra deps

* add missing deps

* Revert "server -> device"

This reverts commit 6384516.

* Use `at::Tensor`'s layout for byte strides

* Downcast at::Tensor if required

* formatting

* Simplify AtenSource

* fix build

* formatting

* fix typo that makes us ignore input type

* Revert "Simplify AtenSource"

This reverts commit 4225deb.

* Skip hanging test

* fix gil deadlock

* formatting
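The commits above replace `populate_fn` with a `TensorSource` interface exposing raw bytes and byte strides, so the runtime can transfer data directly to the device. A hedged Python sketch of the shape of that interface (the real one is C++; names and the row-major stride math are illustrative):

```python
from abc import ABC, abstractmethod

class TensorSource(ABC):
    """Interface-style source: exposes raw data plus byte strides."""

    @abstractmethod
    def data(self) -> bytes: ...

    @abstractmethod
    def byte_strides(self) -> list: ...

class AtenSource(TensorSource):
    """Stand-in for a source backed by an at::Tensor's own layout."""

    def __init__(self, shape, element_size, raw: bytes):
        self._shape = shape
        self._element_size = element_size
        self._raw = raw

    def data(self) -> bytes:
        return self._raw

    def byte_strides(self) -> list:
        # Dense row-major strides in bytes: innermost dim moves by one
        # element, each outer dim by the product of the inner extents.
        strides = []
        stride = self._element_size
        for dim in reversed(self._shape):
            strides.append(stride)
            stride *= dim
        return list(reversed(strides))
```

For a float32 tensor of shape `(2, 3)`, `byte_strides()` gives `[12, 4]`.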
* lower full

* update test for full op

* formatting
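"Lower full" above refers to lowering the `full` op (a tensor filled with a constant). Its semantics reduce to broadcasting a scalar to a shape; a pure-Python sketch of that meaning, not the XLA lowering itself:

```python
def full(shape, fill_value):
    """Build a nested list of the given shape filled with fill_value."""
    if not shape:
        return fill_value
    return [full(shape[1:], fill_value) for _ in range(shape[0])]
```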
…er (pytorch#5770)

* Add GKE support and various usability improvements in CheckpointManager

* Bug fix for async checkpointing fully sharded state dicts
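The async-checkpointing fix above touches the usual hazard in that pattern: the state dict must be snapshotted before the background write, or training updates race with serialization. A minimal sketch of the idea, with illustrative names rather than the CheckpointManager API:

```python
import copy
import threading

def async_save(state_dict, save_fn):
    """Snapshot state on the caller's thread, write it out in the background."""
    # Deep-copy first so later training-step mutations cannot corrupt
    # the bytes being written.
    snapshot = copy.deepcopy(state_dict)
    writer = threading.Thread(target=save_fn, args=(snapshot,))
    writer.start()
    return writer  # caller can join() before the next checkpoint
```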
* Record the lazy tracing time (C++) in metrics

* Delete torch_patches/.torch_pin
* port sandeep unbounded dynamism change
* Enable unbounded dynamism using env var, add more guards for unbounded dynamism code path

---------

Co-authored-by: Siyuan Liu <lsiyuan@google.coim>
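Gating the unbounded-dynamism code path behind an environment variable, as the commit above describes, is typically a read-once boolean flag. A sketch under that assumption; the variable name here is illustrative, not the one torch_xla actually reads:

```python
import os

def unbounded_dynamism_enabled() -> bool:
    """Illustrative env-var feature gate for an experimental code path."""
    return os.environ.get("EXPERIMENTAL_UNBOUNDED_DYNAMISM", "0") == "1"
```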
* Use TSL threadpool

* remove multiwait

* fix test build

* Move threadpool namespace

* formatting

* fix test build

* Use BlockingCounter
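The multiwait removal above replaces per-task waiting with a `BlockingCounter`, mirroring the absl/TSL primitive: schedule N tasks, block until all N report done. A Python sketch of that synchronization pattern (the project's actual code uses the C++ TSL types):

```python
import threading

class BlockingCounter:
    """Blocks wait() until decrement_count() has been called `count` times."""

    def __init__(self, count: int):
        self._count = count
        self._cv = threading.Condition()

    def decrement_count(self):
        with self._cv:
            self._count -= 1
            if self._count == 0:
                self._cv.notify_all()

    def wait(self):
        with self._cv:
            while self._count > 0:
                self._cv.wait()
```

Typical use: hand each pooled task a reference to the counter, have it call `decrement_count()` on completion, and `wait()` on the scheduling thread.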
@mbzomowski mbzomowski merged commit 26b52c3 into master Nov 16, 2023
1 of 2 checks passed