
Commit

final touches
leandro committed Apr 13, 2022
1 parent 71d8829 commit 5fb8e15
Showing 2 changed files with 19 additions and 13 deletions.
14 changes: 7 additions & 7 deletions docs/source/perf_train_gpu_one.mdx
@@ -17,11 +17,13 @@ In this section we have a look at a few tricks to reduce the memory footprint an

|Method|Speed|Memory|
|:-----|:----|:-----|
-|Gradient accumulation| No | Yes |
-|Gradient checkpointing| No| Yes |
-|Mixed precision training| Yes | (No) |
-|Batch size| Yes | Yes |
-|Optimizer choice| (No) | Yes |
+| Gradient accumulation | No | Yes |
+| Gradient checkpointing | No | Yes |
+| Mixed precision training | Yes | (No) |
+| Batch size | Yes | Yes |
+| Optimizer choice | Yes | Yes |
+| DataLoader | Yes | No |
+| DeepSpeed Zero | No | Yes |

A value in brackets means that it might not be strictly the case but is usually either not a main concern or negligible. Before we start, make sure you have installed the following libraries:
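As a quick illustration, several of the methods in the table above can be toggled directly through `TrainingArguments`. A minimal sketch, with illustrative values rather than recommendations:

```python
from transformers import TrainingArguments

# Illustrative values only; tune them for your model and hardware.
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,   # batch size
    gradient_accumulation_steps=8,   # gradient accumulation
    gradient_checkpointing=True,     # gradient checkpointing
    fp16=True,                       # mixed precision training
    optim="adafactor",               # optimizer choice
    dataloader_num_workers=4,        # DataLoader tuning
)
```

DeepSpeed ZeRO is configured separately, by pointing the `deepspeed` argument of `TrainingArguments` at a config file.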

@@ -648,8 +650,6 @@ Activation:
```
- Deployment in Notebooks: see this [guide](main_classes/deepspeed#deployment-in-notebooks).
-- `accelerate`: use: ... (XXX: Sylvain/Leandro?) _CUSTOM CONFIG NOT SUPPORTED, YET_
- Custom training loop: This is somewhat complex but you can study how this is implemented in [HF Trainer](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py) - simply search for `deepspeed` in the code; a minimal sketch of the basic pattern follows below.
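The following is a rough sketch of that pattern, assuming DeepSpeed is installed and a `ds_config.json` file exists; the toy model and data are placeholders for your own setup:

```python
import deepspeed
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data so the sketch is self-contained;
# substitute your transformer model and dataset.
model = torch.nn.Linear(10, 2)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)
loss_fn = torch.nn.CrossEntropyLoss()

# "ds_config.json" is a placeholder path to your DeepSpeed config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

for inputs, labels in dataloader:
    inputs = inputs.to(model_engine.device)
    labels = labels.to(model_engine.device)
    loss = loss_fn(model_engine(inputs), labels)
    model_engine.backward(loss)  # DeepSpeed scales the loss internally
    model_engine.step()          # optimizer (and scheduler, if configured) step
```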
18 changes: 12 additions & 6 deletions docs/source/performance.mdx
@@ -14,11 +14,13 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

-# Performance
+# Performance and Scalability

Training larger and larger transformer models and deploying them to production comes with a range of challenges. During training, your model can require more GPU memory than is available or be very slow to train, and when you deploy it for inference it can be overwhelmed by the throughput required in the production environment. This documentation is designed to help you navigate these challenges and find the best setting for your use case. We split the guides into training and inference, as they come with different challenges and solutions. Within each of them we have separate guides for different kinds of hardware settings (e.g. single- vs. multi-GPU for training, or CPU vs. GPU for inference).

-This document serves as an overview entry point for the methods that could be useful for your scenario.
+![perf_overview](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/perf_overview.png)
+
+This document serves as an overview and entry point for the methods that could be useful for your scenario.

## Training

@@ -48,25 +50,29 @@ _Coming soon_

Efficient inference with large models in a production environment can be as challenging as training them. In the following sections we go through the steps to run inference on CPU and single/multi-GPU setups.
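As a baseline before the hardware-specific guides below, here is a minimal inference sketch that runs on GPU when one is available and falls back to CPU otherwise; the checkpoint name is just an example:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Example checkpoint; substitute your own model.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

inputs = tokenizer("This movie was great!", return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```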


### CPU

-_TODO_
+_Coming soon_

### Single GPU

-_TODO_
+_Coming soon_

### Multi-GPU

-_TODO_
+_Coming soon_

### Specialized Hardware

_Coming soon_

## Hardware

In the hardware section you can find tips and tricks for building your own deep learning rig.

[Go to hardware section](perf_hardware)


## Contribute

This document is far from complete and a lot more needs to be added, so if you have additions or corrections to make, please don't hesitate to open a PR. If you aren't sure, start an Issue and we can discuss the details there.
