Commit d9c439b

Merge branch 'main' into issue2637
markstur authored Nov 2, 2023
2 parents 7dba683 + 53e4142
Showing 2 changed files with 5 additions and 8 deletions.
intermediate_source/FSDP_adavnced_tutorial.rst: 8 changes (4 additions, 4 deletions)
@@ -74,8 +74,8 @@ summarization using WikiHow dataset. The main focus of this tutorial is to
 highlight different available features in FSDP that are helpful for training
 large scale model above 3B parameters. Also, we cover specific features for
 Transformer based models. The code for this tutorial is available in `Pytorch
-Examples
-<https://github.com/HamidShojanazeri/examples/tree/FSDP_example/distributed/FSDP/>`__.
+examples
+<https://github.com/pytorch/examples/tree/main/distributed/FSDP/>`__.
 
 
 *Setup*
@@ -97,13 +97,13 @@ Please create a `data` folder, download the WikiHow dataset from `wikihowAll.csv
 `wikihowSep.cs <https://ucsb.app.box.com/s/7yq601ijl1lzvlfu4rjdbbxforzd2oag>`__,
 and place them in the `data` folder. We will use the wikihow dataset from
 `summarization_dataset
-<https://github.com/HamidShojanazeri/examples/blob/FSDP_example/distributed/FSDP/summarization_dataset.py>`__.
+<https://github.com/pytorch/examples/blob/main/distributed/FSDP/summarization_dataset.py>`__.
 
 Next, we add the following code snippets to a Python script “T5_training.py”.
 
 .. note::
    The full source code for this tutorial is available in `PyTorch examples
-   <https://github.com/HamidShojanazeri/examples/tree/FSDP_example/distributed/FSDP>`__.
+   <https://github.com/pytorch/examples/tree/main/distributed/FSDP/>`__.
 
 1.3 Import necessary packages:
 
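A note on the data setup described in the hunk above: before running `T5_training.py`, the tutorial expects both WikiHow CSVs in a local `data` folder. A minimal pre-flight sketch, assuming the files are named `wikihowAll.csv` and `wikihowSep.csv` (the link text above reads `wikihowSep.cs`, so the exact filename is an assumption) and that `data/` sits in the working directory:

from pathlib import Path

# Filenames and location are assumptions taken from the tutorial text above.
data_dir = Path("data")
for name in ("wikihowAll.csv", "wikihowSep.csv"):
    csv_path = data_dir / name
    if not csv_path.is_file():
        raise FileNotFoundError(f"expected {csv_path}; download it into ./data first")
print("WikiHow CSVs found; ready to run T5_training.py")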
recipes_source/recipes/tuning_guide.py: 5 changes (1 addition, 4 deletions)
@@ -193,15 +193,12 @@ def fused_gelu(x):
 #
 # numactl --cpunodebind=N --membind=N python <pytorch_script>
 
-###############################################################################
-# More detailed descriptions can be found `here <https://software.intel.com/content/www/us/en/develop/articles/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html>`_.
-
 ###############################################################################
 # Utilize OpenMP
 # ~~~~~~~~~~~~~~
 # OpenMP is utilized to bring better performance for parallel computation tasks.
 # ``OMP_NUM_THREADS`` is the easiest switch that can be used to accelerate computations. It determines number of threads used for OpenMP computations.
-# CPU affinity setting controls how workloads are distributed over multiple cores. It affects communication overhead, cache line invalidation overhead, or page thrashing, thus proper setting of CPU affinity brings performance benefits. ``GOMP_CPU_AFFINITY`` or ``KMP_AFFINITY`` determines how to bind OpenMP* threads to physical processing units. Detailed information can be found `here <https://software.intel.com/content/www/us/en/develop/articles/how-to-get-better-performance-on-pytorchcaffe2-with-intel-acceleration.html>`_.
+# CPU affinity setting controls how workloads are distributed over multiple cores. It affects communication overhead, cache line invalidation overhead, or page thrashing, thus proper setting of CPU affinity brings performance benefits. ``GOMP_CPU_AFFINITY`` or ``KMP_AFFINITY`` determines how to bind OpenMP* threads to physical processing units.
 
 ###############################################################################
 # With the following command, PyTorch run the task on N OpenMP threads.
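The OpenMP text retained in the hunk above pairs naturally with a concrete invocation. A minimal sketch, with `my_script.py` as a hypothetical script name and the thread count and core list as illustrative values rather than tuned recommendations:

# Shell form (one line), combining the env knobs and numactl binding described above:
#
#   OMP_NUM_THREADS=16 GOMP_CPU_AFFINITY="0-15" numactl --cpunodebind=0 --membind=0 python my_script.py
#
# The thread count can also be set from inside the script:
import torch

torch.set_num_threads(16)  # intra-op thread pool; plays the role OMP_NUM_THREADS does for OpenMP builds
print(torch.get_num_threads())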
