
Add IsVirtualDeivce() and refactor #6726

Merged: 1 commit merged into master on Mar 13, 2024

Conversation

@yeounoh (Contributor) commented Mar 12, 2024

This addresses the following to support #6322

  • Add IsVirtualDeivce()
  • Remove the const qualifier from the tensors argument of the XLAGraphExecutor::Compile method, since the tensors can be updated during the auto-sharding pass.
  • Refactor to remove redundant code.

Comment on lines -44 to +42

- data_ = data;
+ data_.reset(data.get());
Collaborator:

What's the difference between these two?

Contributor Author:

Better memory management: this refactor reuses the existing data_ to hold the new data.

@@ -346,7 +346,7 @@ class XLAGraphExecutor : public torch::lazy::LazyGraphExecutor {
   std::vector<size_t> SetBufferDonors(LoweringContext* lowering_ctx);

   // We don't use upstream Compile to have BuildInputOutputAliases.
-  CompilationResult Compile(const std::vector<XLATensorPtr>& tensors,
+  CompilationResult Compile(std::vector<XLATensorPtr>& tensors,
Collaborator:

I can see how this will become problematic for dynamo, since in dynamo we first dry-run the compilation and don't execute the compiled program. Please leave a TODO somewhere for the dynamo integration with auto-sharding.

Contributor Author:

Makes sense. I will also add a section in the design doc.

Contributor Author:

  // We don't use upstream Compile to have BuildInputOutputAliases.
  // TODO(yeounoh) auto-sharding can change tensors shardings, which needs to be
  // accounted for in Dynamo integration.
  CompilationResult Compile(std::vector<XLATensorPtr>& tensors,
                            absl::Span<const std::string> devices,
                            const SyncTensorCollection& coll,
                            PostOrderData* po_data,
                            const std::vector<torch::lazy::Value>& ir_values);

sharding_specs.push_back("");
}
sharding_specs.push_back(
    GetXLAShardingSpec(bridge::GetXlaTensor(tensor)));
Collaborator:

What will GetXLAShardingSpec return when sharding_spec is null?

Contributor Author:

An empty string (""), as expected.

@yeounoh force-pushed the add_virtual_deivce_utils branch from 11cc726 to 381077a on March 13, 2024 00:30
@yeounoh force-pushed the add_virtual_deivce_utils branch from 381077a to b21f260 on March 13, 2024 05:21
@yeounoh merged commit 7d89913 into master on Mar 13, 2024
2 checks passed