Updated the examples folder readme file #7208
Conversation
examples/README.md
Outdated
```shell
pip install --no-deps --pre torchvision -i https://download.pytorch.org/whl/nightly/cu118
```

## Run the example
-You can run all models directly. Only environment you want to set is `PJRT_DEVICE`.
+You can run all models directly. The only environment you want to set is `PJRT_DEVICE`.
Add "variable" after environment for specificity.
Done! Updated in the latest commit.
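As a side note on the `PJRT_DEVICE` discussion above, here is a minimal sketch of how that environment variable is typically consumed; the toy `Linear` model and the `TPU` value are illustrative assumptions, and the usual invocation is simply `PJRT_DEVICE=TPU python train_resnet_base.py`.

```python
# Minimal sketch (assumed pattern): PJRT_DEVICE selects the backend that
# xm.xla_device() resolves to. Setting it in-process mirrors prefixing the
# command line with PJRT_DEVICE=TPU.
import os

os.environ.setdefault("PJRT_DEVICE", "TPU")  # or "CPU"/"CUDA", depending on the host

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                   # device chosen via PJRT_DEVICE
model = torch.nn.Linear(8, 8).to(device)   # toy stand-in for the example models
loss = model(torch.randn(2, 8, device=device)).sum()
loss.backward()
xm.mark_step()                             # flush the lazily-traced graph to the device
```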
examples/README.md
Outdated
  - `train_resnet_flash_attention_fsdp_v2.py`: Combines flash attention with FSDP, showcasing the integration of custom kernels with FSDP for scalable and efficient model training.

- fsdp: A trainer implementation to run a decoder-only model using FSDP (Fully Sharded Data Parallelism).
  - `train_decoder_only_fsdp_v2.py`: Employs FSDP for training the decoder-only model, demonstrating parallel training of large transformer models on TPUs.
Employs FSDP
-> Employs FSDPv2 (FSDP algorithm implemented with PyTorch/XLA GSPMD)
Completed
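Since FSDPv2 comes up in this thread, the sketch below roughly illustrates the GSPMD-backed FSDPv2 wrapping pattern; it is not the code in this PR. The wrapped `torch.nn.Linear` stand-in and the single-axis mesh are assumptions, and module paths may differ across PyTorch/XLA releases.

```python
# Hedged sketch of FSDPv2 (FSDP implemented on PyTorch/XLA GSPMD) setup;
# the Linear layer is a stand-in for the decoder-only model.
import numpy as np
import torch
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
from torch_xla.experimental.spmd_fully_sharded_data_parallel import (
    SpmdFullyShardedDataParallel as FSDPv2,
)

xr.use_spmd()  # switch the runtime into SPMD (GSPMD) execution mode

num_devices = xr.global_runtime_device_count()
# One-dimensional device mesh whose "fsdp" axis is what FSDPv2 shards over.
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ("fsdp",))
xs.set_global_mesh(mesh)

model = FSDPv2(torch.nn.Linear(1024, 1024))  # parameters sharded along the "fsdp" axis
```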
examples/README.md
Outdated
- flash_attention: A trainer implementation to run a decoder-only model using Flash Attention.
  - `train_decoder_only_flash_attention.py`: Incorporates flash attention, an efficient attention mechanism, utilizing custom kernels for accelerated training.
  - `train_resnet_flash_attention_fsdp_v2.py`: Combines flash attention with FSDP, showcasing the integration of custom kernels with FSDP for scalable and efficient model training.
with FSDP
-> with FSDPv2
Done in latest commit
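As a companion to the flash-attention entries discussed above, here is a rough sketch of calling the custom (Pallas) flash-attention kernel directly. The tensor shapes, the `causal` keyword, and the import path are assumptions that may vary between releases; treat it as illustrative, not as the code in this PR.

```python
# Hedged sketch: calling the custom flash-attention kernel on an XLA device.
# Shapes follow the common [batch, num_heads, seq_len, head_dim] convention.
import torch
import torch_xla.core.xla_model as xm
from torch_xla.experimental.custom_kernel import flash_attention

device = xm.xla_device()
q = torch.randn(2, 8, 1024, 64, device=device)
k = torch.randn(2, 8, 1024, 64, device=device)
v = torch.randn(2, 8, 1024, 64, device=device)

out = flash_attention(q, k, v, causal=True)  # used in place of vanilla attention
xm.mark_step()
```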
This PR updates the README file in the examples folder, as requested in the linked issue.
cc: @duncantech @JackCaoG
The following is the Git diff for the changed files:
Changes:
## Run the example
-You can run all models directly. Only environment you want to set is `PJRT_DEVICE`.
+You can run all models directly. The only environment you want to set is `PJRT_DEVICE`.
+## Examples and Description
+- `train_resnet_base.py`: A minimal example of training ResNet50. This is the baseline example for comparing performance with other training strategies.
+- `train_decoder_only_base.py`: A minimal example of training a decoder-only model. This serves as a baseline for comparison with other training strategies.
+- `train_resnet_amp.py`: Shows how to use Automatic Mixed Precision (AMP) with PyTorch/XLA to improve performance. This example demonstrates the benefits of AMP for reducing memory usage and accelerating training.
+
+- data_parallel: A trainer implementation to run ResNet50 on multiple devices using data-parallel.
+  - `train_resnet_ddp.py`: Shows how to use PyTorch's DDP implementation for distributed training on TPUs. This example showcases how to integrate PyTorch's DDP with PyTorch/XLA for distributed training.
+  - `train_resnet_spmd_data_parallel.py`: Leverages SPMD (Single Program Multiple Data) for distributed training. It shards the batch dimension across multiple devices and demonstrates how to achieve higher performance than DDP for specific workloads.
+  - `train_resnet_xla_ddp.py`: Shows how to use PyTorch/XLA's built-in DDP implementation for distributed training on TPUs. It demonstrates the benefits of distributed training and the simplicity of using PyTorch/XLA's DDP.
+- debug: A trainer implementation to run ResNet50 with debug mode.
+  - `train_resnet_profile.py`: Captures performance insights with PyTorch/XLA's profiler to identify bottlenecks. Helps diagnose and optimize model performance.
+  - `train_resnet_benchmark.py`: Provides a simple way to benchmark PyTorch/XLA, measuring device execution and tracing time for overall efficiency analysis.
+- flash_attention: A trainer implementation to run a decoder-only model using Flash Attention.
+  - `train_decoder_only_flash_attention.py`: Incorporates flash attention, an efficient attention mechanism, utilizing custom kernels for accelerated training.
+  - `train_resnet_flash_attention_fsdp_v2.py`: Combines flash attention with FSDP, showcasing the integration of custom kernels with FSDP for scalable and efficient model training.
+- fsdp: A trainer implementation to run a decoder-only model using FSDP (Fully Sharded Data Parallelism).
+  - `train_decoder_only_fsdp_v2.py`: Employs FSDP for training the decoder-only model, demonstrating parallel training of large transformer models on TPUs.
+  - `train_resnet_fsdp_auto_wrap.py`: Demonstrates FSDP (Fully Sharded Data Parallel) for model training, automatically wrapping model parts based on size or type criteria.
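To make the `train_resnet_ddp.py` entry in the diff above more concrete, the following is a minimal sketch of the PyTorch-DDP-on-XLA pattern it describes; the toy `Linear` model, random data, and hyperparameters are placeholders, not the actual script.

```python
# Hedged sketch of PyTorch DDP running on XLA devices, as referenced by
# train_resnet_ddp.py; model and data here are toy placeholders.
import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp
import torch_xla.distributed.xla_backend  # registers the "xla" process-group backend
from torch.nn.parallel import DistributedDataParallel as DDP


def _mp_fn(index):
    dist.init_process_group("xla", init_method="xla://")
    device = xm.xla_device()

    model = DDP(torch.nn.Linear(128, 10).to(device), gradient_as_bucket_view=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for _ in range(10):
        optimizer.zero_grad()
        x = torch.randn(32, 128, device=device)
        loss = model(x).sum()
        loss.backward()
        optimizer.step()
        xm.mark_step()  # materialize the traced training step on the device


if __name__ == "__main__":
    xmp.spawn(_mp_fn)
```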