* Provide a pure PyTorch/JIT path so that TE and Apex are no longer required dependencies
Signed-off-by: ashors1 <ashors@nvidia.com>
* add missing file
Signed-off-by: ashors1 <ashors@nvidia.com>
* add minimal GPT pretraining example
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix pretraining datamodule initialization
Signed-off-by: ashors1 <ashors@nvidia.com>
* add non-TE/non-Apex test
Signed-off-by: ashors1 <ashors@nvidia.com>
* add comment to pretraining script
Signed-off-by: ashors1 <ashors@nvidia.com>
* use microbatch calculator from mcore
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* fix NeMo 2.0 test name
Signed-off-by: ashors1 <ashors@nvidia.com>
* update mcore commit for CI
Signed-off-by: ashors1 <ashors@nvidia.com>
* replace Apex microbatch calculator with Megatron's in more places
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* fix missing import
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix typo
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix missed Apex import
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* move imports
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* move imports
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* add types to command-line args
Signed-off-by: ashors1 <ashors@nvidia.com>
* bug fix
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix path
Signed-off-by: ashors1 <ashors@nvidia.com>
* Disable distributed optimizer in NeMo 2.0 test
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* fix optimizer config
Signed-off-by: ashors1 <ashors@nvidia.com>
* update checkpointing
Signed-off-by: ashors1 <ashors@nvidia.com>
* move import
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* fix failing unit test
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix failing test
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* Update num_weights check for RETRO to match underlying changes in mcore RETRO MLM
Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com>
* Apply isort and black reformatting
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
* fix typo
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
* remove stale warning
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix LoRA notebook
Signed-off-by: ashors1 <ashors@nvidia.com>
* fix small typo
Signed-off-by: ashors1 <ashors@nvidia.com>
* add import guards to gemma2
Signed-off-by: ashors1 <ashors@nvidia.com>
* Apply isort and black reformatting
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
---------
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com>
Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Co-authored-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com>
Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>