
Commit

final touches
leandro committed Apr 13, 2022
1 parent 71d8829 commit 5fb8e15
Showing 2 changed files with 19 additions and 13 deletions.
14 changes: 7 additions & 7 deletions docs/source/perf_train_gpu_one.mdx
@@ -17,11 +17,13 @@ In this section we have a look at a few tricks to reduce the memory footprint an

|Method|Speed|Memory|
|:-----|:----|:-----|
-|Gradient accumulation| No | Yes |
-|Gradient checkpointing| No| Yes |
-|Mixed precision training| Yes | (No) |
-|Batch size| Yes | Yes |
-|Optimizer choice| (No) | Yes |
+| Gradient accumulation | No | Yes |
+| Gradient checkpointing | No | Yes |
+| Mixed precision training | Yes | (No) |
+| Batch size | Yes | Yes |
+| Optimizer choice | Yes | Yes |
+| DataLoader | Yes | No |
+| DeepSpeed Zero | No | Yes |

A value in brackets means that it might not be strictly the case but is usually either not a main concern or negligible. Before we start, make sure you have installed the following libraries:
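As a quick illustration, several of the methods in the table above can be toggled directly through `TrainingArguments`. A minimal sketch, with illustrative values rather than recommendations:

```python
from transformers import TrainingArguments

# Illustrative values only; tune them for your model and hardware.
args = TrainingArguments(
    output_dir="outputs",
    per_device_train_batch_size=4,   # batch size
    gradient_accumulation_steps=8,   # gradient accumulation
    gradient_checkpointing=True,     # gradient checkpointing
    fp16=True,                       # mixed precision training
    optim="adafactor",               # optimizer choice
    dataloader_num_workers=4,        # DataLoader tuning
)
```

DeepSpeed ZeRO is configured separately, by pointing the `deepspeed` argument of `TrainingArguments` at a config file.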

@@ -648,8 +650,6 @@ Activation:
```
- Deployment in Notebooks: see this [guide](main_classes/deepspeed#deployment-in-notebooks).
-- `accelerate`: use: ... (XXX: Sylvain/Leandro?) _CUSTOM CONFIG NOT SUPPORTED, YET_
- Custom training loop: This is somewhat complex but you can study how this is implemented in [HF Trainer](https://github.com/huggingface/transformers/blob/master/src/transformers/trainer.py) - simply search for `deepspeed` in the code; a minimal sketch of the basic pattern follows below.
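The following is a rough sketch of that pattern, assuming DeepSpeed is installed and a `ds_config.json` file exists; the toy model and data are placeholders for your own setup:

```python
import deepspeed
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy model and data so the sketch is self-contained;
# substitute your transformer model and dataset.
model = torch.nn.Linear(10, 2)
dataset = TensorDataset(torch.randn(64, 10), torch.randint(0, 2, (64,)))
dataloader = DataLoader(dataset, batch_size=8)
loss_fn = torch.nn.CrossEntropyLoss()

# "ds_config.json" is a placeholder path to your DeepSpeed config.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config="ds_config.json",
)

for inputs, labels in dataloader:
    inputs = inputs.to(model_engine.device)
    labels = labels.to(model_engine.device)
    loss = loss_fn(model_engine(inputs), labels)
    model_engine.backward(loss)  # DeepSpeed scales the loss internally
    model_engine.step()          # optimizer (and scheduler, if configured) step
```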
18 changes: 12 additions & 6 deletions docs/source/performance.mdx
@@ -14,11 +14,13 @@ See the License for the specific language governing permissions and
limitations under the License.
-->

-# Performance
+# Performance and Scalability

Training larger and larger transformer models and deploying them to production comes with a range of challenges. During training, your model can require more GPU memory than is available or be very slow to train, and when you deploy it for inference it can be overwhelmed by the throughput required in the production environment. This documentation is designed to help you navigate these challenges and find the best setting for your use case. We split the guides into training and inference, as they come with different challenges and solutions. Within each of them we have separate guides for different kinds of hardware settings (e.g. single- vs. multi-GPU for training, or CPU vs. GPU for inference).

-This document serves as an overview entry point for the methods that could be useful for your scenario.
+![perf_overview](https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/perf_overview.png)
+
+This document serves as an overview and entry point for the methods that could be useful for your scenario.

## Training

@@ -48,25 +50,29 @@ _Coming soon_

Efficient inference with large models in a production environment can be as challenging as training them. In the following sections we go through the steps to run inference on CPU and single/multi-GPU setups.
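As a baseline before the hardware-specific guides below, here is a minimal inference sketch that runs on GPU when one is available and falls back to CPU otherwise; the checkpoint name is just an example:

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Example checkpoint; substitute your own model.
name = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name)

device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model.eval()

inputs = tokenizer("This movie was great!", return_tensors="pt").to(device)
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.softmax(dim=-1))
```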


### CPU

-_TODO_
+_Coming soon_

### Single GPU

-_TODO_
+_Coming soon_

### Multi-GPU

-_TODO_
+_Coming soon_

### Specialized Hardware

_Coming soon_

## Hardware

In the hardware section you can find tips and tricks for building your own deep learning rig.

[Go to hardware section](perf_hardware)


## Contribute

This document is far from complete and a lot more needs to be added, so if you have additions or corrections to make, please don't hesitate to open a PR. If you aren't sure, start an Issue and we can discuss the details there.
