Updated the examples folder readme file #7208
Conversation
examples/README.md
Outdated
```shell
pip install --no-deps --pre torchvision -i https://download.pytorch.org/whl/nightly/cu118
```

## Run the example
-You can run all models directly. Only environment you want to set is `PJRT_DEVICE`.
+You can run all models directly. The only environment you want to set is `PJRT_DEVICE`.
Add "variable" after environment for specificity.
Done! Updated in the latest commit.
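As a side note on the `PJRT_DEVICE` discussion above, here is a minimal sketch of how that environment variable is typically consumed; the toy `Linear` model and the `TPU` value are illustrative assumptions, and the usual invocation is simply `PJRT_DEVICE=TPU python train_resnet_base.py`.

```python
# Minimal sketch (assumed pattern): PJRT_DEVICE selects the backend that
# xm.xla_device() resolves to. Setting it in-process mirrors prefixing the
# command line with PJRT_DEVICE=TPU.
import os

os.environ.setdefault("PJRT_DEVICE", "TPU")  # or "CPU"/"CUDA", depending on the host

import torch
import torch_xla.core.xla_model as xm

device = xm.xla_device()                   # device chosen via PJRT_DEVICE
model = torch.nn.Linear(8, 8).to(device)   # toy stand-in for the example models
loss = model(torch.randn(2, 8, device=device)).sum()
loss.backward()
xm.mark_step()                             # flush the lazily-traced graph to the device
```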
examples/README.md
Outdated
  - `train_resnet_flash_attention_fsdp_v2.py`: Combines flash attention with FSDP, showcasing the integration of custom kernels with FSDP for scalable and efficient model training.

- fsdp: A trainer implementation to run a decoder-only model using FSDP (Fully Sharded Data Parallelism).
  - `train_decoder_only_fsdp_v2.py`: Employs FSDP for training the decoder-only model, demonstrating parallel training of large transformer models on TPUs.
Employs FSDP
-> Employs FSDPv2 (FSDP algorithm implemented with PyTorch/XLA GSPMD)
Completed
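Since FSDPv2 comes up in this thread, the sketch below roughly illustrates the GSPMD-backed FSDPv2 wrapping pattern; it is not the code in this PR. The wrapped `torch.nn.Linear` stand-in and the single-axis mesh are assumptions, and module paths may differ across PyTorch/XLA releases.

```python
# Hedged sketch of FSDPv2 (FSDP implemented on PyTorch/XLA GSPMD) setup;
# the Linear layer is a stand-in for the decoder-only model.
import numpy as np
import torch
import torch_xla.runtime as xr
import torch_xla.distributed.spmd as xs
from torch_xla.experimental.spmd_fully_sharded_data_parallel import (
    SpmdFullyShardedDataParallel as FSDPv2,
)

xr.use_spmd()  # switch the runtime into SPMD (GSPMD) execution mode

num_devices = xr.global_runtime_device_count()
# One-dimensional device mesh whose "fsdp" axis is what FSDPv2 shards over.
mesh = xs.Mesh(np.arange(num_devices), (num_devices,), ("fsdp",))
xs.set_global_mesh(mesh)

model = FSDPv2(torch.nn.Linear(1024, 1024))  # parameters sharded along the "fsdp" axis
```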
examples/README.md
Outdated
- flash_attention: A trainer implementation to run a decoder-only model using Flash Attention.
  - `train_decoder_only_flash_attention.py`: Incorporates flash attention, an efficient attention mechanism, utilizing custom kernels for accelerated training.
  - `train_resnet_flash_attention_fsdp_v2.py`: Combines flash attention with FSDP, showcasing the integration of custom kernels with FSDP for scalable and efficient model training.
with FSDP
-> with FSDPv2
Done in latest commit
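As a companion to the flash-attention entries discussed above, here is a rough sketch of calling the custom (Pallas) flash-attention kernel directly. The tensor shapes, the `causal` keyword, and the import path are assumptions that may vary between releases; treat it as illustrative, not as the code in this PR.

```python
# Hedged sketch: calling the custom flash-attention kernel on an XLA device.
# Shapes follow the common [batch, num_heads, seq_len, head_dim] convention.
import torch
import torch_xla.core.xla_model as xm
from torch_xla.experimental.custom_kernel import flash_attention

device = xm.xla_device()
q = torch.randn(2, 8, 1024, 64, device=device)
k = torch.randn(2, 8, 1024, 64, device=device)
v = torch.randn(2, 8, 1024, 64, device=device)

out = flash_attention(q, k, v, causal=True)  # used in place of vanilla attention
xm.mark_step()
```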
This PR updates the README file in the examples folder, as requested in the linked issue.
cc: @duncantech @JackCaoG
The following is the Git diff for the changed files:
Changes:
## Run the example
-You can run all models directly. Only environment you want to set is `PJRT_DEVICE`.
+You can run all models directly. The only environment you want to set is `PJRT_DEVICE`.
+## Examples and Description
+- `train_resnet_base.py`: A minimal example of training ResNet50. This is the baseline example for comparing performance with other training strategies.
+- `train_decoder_only_base.py`: A minimal example of training a decoder-only model. This serves as a baseline for comparison with other training strategies.
+- `train_resnet_amp.py`: Shows how to use Automatic Mixed Precision (AMP) with PyTorch/XLA to improve performance. This example demonstrates the benefits of AMP for reducing memory usage and accelerating training.
+
+- data_parallel: A trainer implementation to run ResNet50 on multiple devices using data-parallel.
+  - `train_resnet_ddp.py`: Shows how to use PyTorch's DDP implementation for distributed training on TPUs. This example showcases how to integrate PyTorch's DDP with PyTorch/XLA for distributed training.
+  - `train_resnet_spmd_data_parallel.py`: Leverages SPMD (Single Program Multiple Data) for distributed training. It shards the batch dimension across multiple devices and demonstrates how to achieve higher performance than DDP for specific workloads.
+  - `train_resnet_xla_ddp.py`: Shows how to use PyTorch/XLA's built-in DDP implementation for distributed training on TPUs. It demonstrates the benefits of distributed training and the simplicity of using PyTorch/XLA's DDP.
+- debug: A trainer implementation to run ResNet50 with debug mode.
+  - `train_resnet_profile.py`: Captures performance insights with PyTorch/XLA's profiler to identify bottlenecks. Helps diagnose and optimize model performance.
+  - `train_resnet_benchmark.py`: Provides a simple way to benchmark PyTorch/XLA, measuring device execution and tracing time for overall efficiency analysis.
+- flash_attention: A trainer implementation to run a decoder-only model using Flash Attention.
+  - `train_decoder_only_flash_attention.py`: Incorporates flash attention, an efficient attention mechanism, utilizing custom kernels for accelerated training.
+  - `train_resnet_flash_attention_fsdp_v2.py`: Combines flash attention with FSDP, showcasing the integration of custom kernels with FSDP for scalable and efficient model training.
+- fsdp: A trainer implementation to run a decoder-only model using FSDP (Fully Sharded Data Parallelism).
+  - `train_decoder_only_fsdp_v2.py`: Employs FSDP for training the decoder-only model, demonstrating parallel training of large transformer models on TPUs.
+  - `train_resnet_fsdp_auto_wrap.py`: Demonstrates FSDP (Fully Sharded Data Parallel) for model training, automatically wrapping model parts based on size or type criteria.
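To make the `train_resnet_ddp.py` entry in the diff above more concrete, the following is a minimal sketch of the PyTorch-DDP-on-XLA pattern it describes; the toy `Linear` model, random data, and hyperparameters are placeholders, not the actual script.

```python
# Hedged sketch of PyTorch DDP running on XLA devices, as referenced by
# train_resnet_ddp.py; model and data here are toy placeholders.
import torch
import torch.distributed as dist
import torch_xla.core.xla_model as xm
import torch_xla.distributed.xla_multiprocessing as xmp
import torch_xla.distributed.xla_backend  # registers the "xla" process-group backend
from torch.nn.parallel import DistributedDataParallel as DDP


def _mp_fn(index):
    dist.init_process_group("xla", init_method="xla://")
    device = xm.xla_device()

    model = DDP(torch.nn.Linear(128, 10).to(device), gradient_as_bucket_view=True)
    optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

    for _ in range(10):
        optimizer.zero_grad()
        x = torch.randn(32, 128, device=device)
        loss = model(x).sum()
        loss.backward()
        optimizer.step()
        xm.mark_step()  # materialize the traced training step on the device


if __name__ == "__main__":
    xmp.spawn(_mp_fn)
```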