[NeMo-UX] Make TE and Apex dependencies optional #9732

Merged 3 commits on Jul 15, 2024

Commits on Jul 15, 2024

  1. [NeMo-UX] Make TE and Apex dependencies optional (#9550)

    * Provide a pure pytorch/jit path to avoid required dependency on TE and Apex
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * add missing file
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * add minimal gpt pretraining example
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * fix pre-training datamodule initialization
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * add non-te/non-apex test
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * add comment to pretraining script
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * use microbatch calculator from mcore
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    * fix nemo 2 test name
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * update Mcore commit for CI
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * replace apex microbatch calculator with megatron's in more places
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    * fix missing import
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * fix typo
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * fix missed apex import
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * move imports
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * move imports
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    * add types to command-line args
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * bug fix
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * fix path
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Disable distributed optimizer in nemo 2.0 test
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    * fix optimizer config
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * update checkpointing
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * move import
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    * fix failing unit test
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * fix failing test
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    * Updating num_weights check of RETRO due to underlying changes from mcore RETRO MLM
    
    Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
    
    * fix typo
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    * remove stale warning
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * fix lora notebook
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * fix small typo
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * add import guards to gemma2
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    
    * Apply isort and black reformatting
    
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    
    ---------
    
    Signed-off-by: ashors1 <ashors@nvidia.com>
    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com>
    Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
    Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
    Co-authored-by: Eric Harper <complex451@gmail.com>
    Co-authored-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com>
    Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
    5 people committed Jul 15, 2024
    Commit 8141139
  2. fix cherry-pick

    Signed-off-by: ashors1 <ashors@nvidia.com>
    ashors1 committed Jul 15, 2024
    Commit 81302ba
  3. Apply isort and black reformatting

    Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
    ashors1 committed Jul 15, 2024
    Commit db92504
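
Several commit messages above mention import guards and a pure PyTorch fallback path. The snippet below is a minimal sketch of that general pattern, not code taken from this PR's diff. The helper names `optional_import` and `select_backend` are hypothetical, and the preference order (TE first, then Apex, then plain PyTorch) is an assumption made for illustration.

```python
import importlib


def optional_import(module_name):
    """Try to import a module by name.

    Returns (module, True) when the import succeeds and (None, False)
    when the module is not installed. Hypothetical helper for illustration.
    """
    try:
        return importlib.import_module(module_name), True
    except ImportError:
        # ModuleNotFoundError is a subclass of ImportError, so a missing
        # package lands here instead of crashing at import time.
        return None, False


def select_backend(have_te, have_apex):
    """Pick an implementation based on which optional packages exist.

    The ordering is an assumption for this sketch: prefer Transformer
    Engine, then Apex, and otherwise fall back to plain PyTorch.
    """
    if have_te:
        return "transformer_engine"
    if have_apex:
        return "apex"
    return "pytorch"
```

In a real module one might call `optional_import("transformer_engine")` and `optional_import("apex")` once at import time, store the resulting flags, and branch on them wherever an optional kernel would otherwise be required.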