
Automatically and linearly scale the learning rate of the SSL encoder to the number of GPUs #667

Merged
merged 10 commits into main from maxilse/scale_lr on Mar 22, 2022

Conversation

@maxilse (Contributor) commented on Feb 22, 2022

The learning rate is now linearly scaled by the number of GPUs available, e.g., with a base lr = 0.001 and 8 GPUs available, the effective lr = 0.008.
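A minimal sketch of the scaling rule, for illustration only (the helper name `scale_learning_rate` and its signature are hypothetical and not the actual code in ssl_container.py):

```python
def scale_learning_rate(base_lr: float, num_gpus_per_node: int, num_nodes: int = 1) -> float:
    """Linearly scale the base learning rate by the total number of GPUs used for training."""
    total_gpus = max(1, num_gpus_per_node * num_nodes)  # CPU-only runs keep the base learning rate
    return base_lr * total_gpus

# Example from the description: base lr 0.001 on a single node with 8 GPUs -> 0.008
print(scale_learning_rate(0.001, num_gpus_per_node=8))  # 0.008
```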

I tested it for SimCLR and BYOL (https://ml.azure.com/experiments/id/81fa8775-1a25-47ae-9fe7-c13a6b91a421?wsid=/subscriptions/db9fc1d1-b44e-45a8-902d-8c766c255568/resourceGroups/innereyerg/providers/Microsoft.MachineLearningServices/workspaces/innereye4ws&tid=72f988bf-86f1-41af-91ab-2d7cd011db47).

In both cases the results are as expected:

  • SimCLR: more GPUs give better representations because the effective batch is larger, i.e., there are more negative examples.
  • BYOL: more GPUs speed up training but do not improve the results.

I did not add a new test since we already have these two tests: test_simclr_num_gpus() and test_simclr_num_nodes().

@maxilse maxilse requested a review from ant0nsc February 22, 2022 15:47
@fepegar (Contributor) left a comment


LGTM in general, I just added some questions.

Resolved review threads:

  • InnerEye/ML/SSL/lightning_containers/ssl_container.py (2 threads, outdated)
  • CHANGELOG.md
@maxilse maxilse enabled auto-merge (squash) February 22, 2022 17:44
@ant0nsc previously approved these changes on Mar 16, 2022
@maxilse maxilse merged commit 6791dce into main Mar 22, 2022
@maxilse maxilse deleted the maxilse/scale_lr branch March 22, 2022 10:23