[feature] Add support for OffloadModel to enable training large models on 1 GPU. #432

Merged: 57 commits, Feb 26, 2021

Commits on Dec 30, 2020

  1. clean start

    blefaudeux committed Dec 30, 2020
    f166609

Commits on Jan 4, 2021

  1. fc1310a

Commits on Jan 5, 2021

  1. 26cfd92
  2. 0363630

Commits on Jan 7, 2021

  1. ff44ddd

Commits on Jan 11, 2021

  1. f20e3f8

Commits on Jan 12, 2021

  1. hack, enable testing ViT + offload, python3 benchmarks/oss.py --epochs 2 --optim_type oss_offload_ddp --batch_size=32 --model vit_large_patch16_224

    blefaudeux committed Jan 12, 2021
    3bcea0a

Commits on Jan 13, 2021

  1. 62c15e4
  2. minor, stashing

    blefaudeux committed Jan 13, 2021
    43c56cd

Commits on Jan 22, 2021

  1. 042daa3
  2. unit test fix

    blefaudeux committed Jan 22, 2021
    850c5bf

Commits on Feb 1, 2021

  1. 52e0be4

Commits on Feb 4, 2021

  1. 1f6c018

Commits on Feb 5, 2021

  1. 8490dd8
  2. 9ec3892
  3. d4e929d
  4. spring cleaning

    blefaudeux committed Feb 5, 2021
    8e92a4c
  5. 6bfeaed

Commits on Feb 12, 2021

  1. [offload] Add support for activation offloading + other changes (#367)

    * initial fwd/bwd commit
    
    * checkpoint work
    
    * modify shard loop
    
    * activation offloading and test to start with
    
    * fix lint errors
    
    * update comments
    
    * fix lint
    
    * remove unused var
    
    * remove commented out lines
    
    * modify name
    
    * remove break
    
    * remove profiler comments
    
    * avoid saving inputs
    
    * fix lint errors
    
    Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
    anj-s and Anjali Sridhar committed Feb 12, 2021
    e1c0a7a
  2. [offload] Add support for fp16 training (#374)

    * initial fwd/bwd commit
    
    * checkpoint work
    
    * modify shard loop
    
    * activation offloading and test to start with
    
    * fix lint errors
    
    * update comments
    
    * fix lint
    
    * remove unused var
    
    * remove commented out lines
    
    * modify name
    
    * remove break
    
    * remove profiler comments
    
    * add support for fp16
    
    * add unit tests
    
    * fix lint errors
    
    * fix test failure
    
    Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
    anj-s and Anjali Sridhar committed Feb 12, 2021
    c2ac144
  3. [offload] Add support for activation checkpointing for all layers. (#381)
    
    * initial fwd/bwd commit
    
    * checkpoint work
    
    * modify shard loop
    
    * activation offloading and test to start with
    
    * fix lint errors
    
    * update comments
    
    * fix lint
    
    * remove unused var
    
    * remove commented out lines
    
    * modify name
    
    * remove break
    
    * remove profiler comments
    
    * add support for fp16
    
    * add unit tests
    
    * fix lint errors
    
    * fix test failure
    
    * cp work, incorrect output dimensions still need to be fixed
    
    * fixed activation outputs
    
    * intermediate cp of work
    
    * add tests
    
    * fix lint errors
    
    Co-authored-by: Anjali Sridhar <anj@devfair0443.h2.fair>
    anj-s and Anjali Sridhar committed Feb 12, 2021
    bff0cdb

Commits on Feb 17, 2021

  1. add support for microbatches

    Anjali Sridhar committed Feb 17, 2021
    0ca26a2
  2. revert benchmark config changes

    Anjali Sridhar committed Feb 17, 2021
    0b70ffa

Commits on Feb 19, 2021

  1. add parametrization

    Anjali Sridhar committed Feb 19, 2021
    9cdea8b
  2. fix lint errors and tests

    Anjali Sridhar committed Feb 19, 2021
    38c541d
  3. skip test for 1.5

    Anjali Sridhar committed Feb 19, 2021
    8e32380
  4. fix lint errors

    Anjali Sridhar committed Feb 19, 2021
    0d7201f
  5. skip test if there are no GPUs

    Anjali Sridhar committed Feb 19, 2021
    cbe8acc
  6. fix lint errors

    Anjali Sridhar committed Feb 19, 2021
    2dff98e
  7. fix lint errors

    Anjali Sridhar committed Feb 19, 2021
    239713d

Commits on Feb 22, 2021

  1. move experimental to the fairscale repo

    Anjali Sridhar committed Feb 22, 2021
    7867f4f
  2. lint error fixes

    Anjali Sridhar committed Feb 22, 2021
    256d4b4
  3. modify test imports

    Anjali Sridhar committed Feb 22, 2021
    7bc20fc
  4. lint error fixes

    Anjali Sridhar committed Feb 22, 2021
    9ad7c12
  5. 78f4906
  6. move offload files to the experimental directory

    Anjali Sridhar committed Feb 22, 2021
    822e38e
  7. move tests and benchmarks to their folder

    Anjali Sridhar committed Feb 22, 2021
    c9a02be
  8. fix mypy errors

    Anjali Sridhar committed Feb 22, 2021
    595399b

Commits on Feb 23, 2021

  1. cp intermediate working benchmarks

    Anjali Sridhar committed Feb 23, 2021
    60cecaa
  2. more changes

    Anjali Sridhar committed Feb 23, 2021
    cbfdb27
  3. Merge branch 'master' into offload_experimental

    Anjali Sridhar committed Feb 23, 2021
    8870d17
  4. Merge branch 'offload_experimental' into seq_benchmark

    Anjali Sridhar committed Feb 23, 2021
    cea1426
  5. split benchmark configs

    Anjali Sridhar committed Feb 23, 2021
    4306696
  6. Merge branch 'split-benchmark-configs' into seq_benchmark

    Anjali Sridhar committed Feb 23, 2021
    ce575df
  7. remove print statements

    Anjali Sridhar committed Feb 23, 2021
    697887c
  8. fix lint errors

    Anjali Sridhar committed Feb 23, 2021
    e7336e9
  9. remove unused print

    Anjali Sridhar committed Feb 23, 2021
    e73d04b

Commits on Feb 24, 2021

  1. stress testing

    Anjali Sridhar committed Feb 24, 2021
    65f2f92
  2. remove unused file

    Anjali Sridhar committed Feb 24, 2021
    2d0d7f5
  3. change param name

    Anjali Sridhar committed Feb 24, 2021
    b8c493f
  4. fix merge conflicts

    Anjali Sridhar committed Feb 24, 2021
    f02b6e2
  5. lint fixes

    Anjali Sridhar committed Feb 24, 2021
    3867297

Commits on Feb 25, 2021

  1. Merge branch 'master' into offload_experimental

    Anjali Sridhar committed Feb 25, 2021
    1eb8082
  2. move file to the right folder

    Anjali Sridhar committed Feb 25, 2021
    8e56a5b
  3. offload_experimental

    Anjali Sridhar committed Feb 25, 2021
    59199b9
  4. add doc string

    Anjali Sridhar committed Feb 25, 2021
    ef50486
  5. add error message

    Anjali Sridhar committed Feb 25, 2021
    13bb24c