[fbsync] Clarification for training resnext101_32x8d on ImageNet (#4390)
Summary:
* Fix training resuming in references/segmentation

* Clarification for training resnext101_32x8d

* Update references/classification/README.md

Reviewed By: kazhang

Differential Revision: D30898330

fbshipit-source-id: 195c24c57ad3abe2e23e08b3b9251db68790914c

Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
2 people authored and facebook-github-bot committed Sep 13, 2021
1 parent 16e774a commit ff5c5b3
Showing 1 changed file with 6 additions and 1 deletion.
references/classification/README.md
@@ -40,12 +40,17 @@ python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\

### ResNext-101 32x8d

-On 8 nodes, each with 8 GPUs (for a total of 64 GPUS)
```
python -m torch.distributed.launch --nproc_per_node=8 --use_env train.py\
--model resnext101_32x8d --epochs 100
```

+Note that the above command corresponds to a single node with 8 GPUs. If you use
+a different number of GPUs and/or a different batch size, then the learning rate
+should be scaled accordingly. For example, the pretrained model provided by
+`torchvision` was trained on 8 nodes, each with 8 GPUs (for a total of 64 GPUs),
+with `--batch_size 16` and `--lr 0.4`, instead of the current defaults,
+which are respectively batch_size=32 and lr=0.1.


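The note added in this commit is an instance of the linear learning-rate scaling rule: when the global batch size (nodes × GPUs per node × per-GPU batch size) changes, scale the learning rate by the same factor. Below is a minimal sketch of that arithmetic, assuming the linear scaling rule; the `scale_lr` helper is hypothetical, for illustration only, and not part of torchvision's reference scripts.

```python
# Minimal sketch of linear learning-rate scaling, assuming the linear
# scaling rule implied by the README note. `scale_lr` is a hypothetical
# helper for illustration, not part of torchvision's reference scripts.

def scale_lr(base_lr: float, base_global_batch: int,
             nodes: int, gpus_per_node: int, batch_per_gpu: int) -> float:
    """Scale the learning rate in proportion to the global batch size."""
    global_batch = nodes * gpus_per_node * batch_per_gpu
    return base_lr * global_batch / base_global_batch

# Defaults from the note: 1 node x 8 GPUs x batch_size=32 -> global batch 256, lr=0.1
base_lr = 0.1
base_global_batch = 1 * 8 * 32  # 256

# Pretrained resnext101_32x8d: 8 nodes x 8 GPUs x batch_size=16 -> global batch 1024
lr = scale_lr(base_lr, base_global_batch, nodes=8, gpus_per_node=8, batch_per_gpu=16)
print(lr)  # 0.4, matching the `--lr 0.4` mentioned in the note
```

The numbers are consistent with the note: the pretrained model's global batch (1024) is 4× the default (256), so its learning rate is 4 × 0.1 = 0.4.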