
Automatically and linearly scale the learning rate of the SSL encoder to the number of GPUs #667

Merged
merged 10 commits into main from maxilse/scale_lr on Mar 22, 2022

Conversation

@maxilse (Contributor) commented on Feb 22, 2022

The learning rate is now linearly scaled by the number of GPUs available, e.g., with a base lr = 0.001 and 8 GPUs available, the effective lr = 0.008.
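A minimal sketch of the scaling rule, for illustration only (the helper name `scale_learning_rate` and its signature are hypothetical and not the actual code in ssl_container.py):

```python
def scale_learning_rate(base_lr: float, num_gpus_per_node: int, num_nodes: int = 1) -> float:
    """Linearly scale the base learning rate by the total number of GPUs used for training."""
    total_gpus = max(1, num_gpus_per_node * num_nodes)  # CPU-only runs keep the base learning rate
    return base_lr * total_gpus

# Example from the description: base lr 0.001 on a single node with 8 GPUs -> 0.008
print(scale_learning_rate(0.001, num_gpus_per_node=8))  # 0.008
```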

I tested it for SimCLR and BYOL (https://ml.azure.com/experiments/id/81fa8775-1a25-47ae-9fe7-c13a6b91a421?wsid=/subscriptions/db9fc1d1-b44e-45a8-902d-8c766c255568/resourceGroups/innereyerg/providers/Microsoft.MachineLearningServices/workspaces/innereye4ws&tid=72f988bf-86f1-41af-91ab-2d7cd011db47).

In both cases the results are as expected:

  • SimCLR: more GPUs give better representations because the effective batch is larger, i.e., there are more negative examples.
  • BYOL: more GPUs speed up training but do not improve the results.

I did not add a new test since we already have these two tests: test_simclr_num_gpus() and test_simclr_num_nodes().

@maxilse maxilse requested a review from ant0nsc February 22, 2022 15:47
@fepegar (Contributor) left a comment


LGTM in general, I just added some questions.

Resolved review threads:

  • InnerEye/ML/SSL/lightning_containers/ssl_container.py (2 threads, outdated)
  • CHANGELOG.md
@maxilse maxilse enabled auto-merge (squash) February 22, 2022 17:44
@ant0nsc previously approved these changes on Mar 16, 2022
@maxilse maxilse merged commit 6791dce into main Mar 22, 2022
@maxilse maxilse deleted the maxilse/scale_lr branch March 22, 2022 10:23