Release 2.0.0rc1 #9786

Closed
wants to merge 173 commits into from

ko3n1g (Collaborator) commented Jul 18, 2024

🚀 PR to release NeMo 2.0.0rc1.

📝 Please remember the following to-dos before merging:

  • Fill in the comment Highlights
  • Review the comment Detailed Changelogs

🚨 Please also remember not to delete the headings of the task commits. They are required by the post-merge automation.

🙏 Please merge this PR only if the CI workflow has completed successfully.

borisfom and others added 30 commits July 8, 2024 14:50
* Nemotron ONNX export fixed

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Cleanup

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

* Addressing code review comments

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>

---------

Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com>
Co-authored-by: Eric Harper <complex451@gmail.com>
Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com>
* add slurm files to .gitignore

* add differentiable decode to SDXL VAE

* Optionally return predicted noise during the single step sampling process
* also change  `get_gamma` as a new function to use inside other
  functions which may interact with sampling (e.g. draft+)

* debugging sdunet converter script

* Added SD/SDXL conversion script from HF to NeMo
* added 'from_nemo' config for VAE

* tmp commit, please make changes (oci is super slow, cannot even run vim)

* new inference yaml works

* add logging to autoencoder

* !(dont squash) Added enabling support for LinearWrapper for SDLoRA

* added samples_per_batch and fsdp arguments to SDXL inference

* added extra optionally wrapper to FSDP

* remove unnecessary comments

* remove unnecessary comments

* Apply isort and black reformatting

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>

---------

Signed-off-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
Co-authored-by: Rohit Jena <rohitkumarj@nvidia.com>
Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: yaoyu-33 <yaoyu-33@users.noreply.github.com>
* add NemoQueryLLMPyTorch class for triton query of in-framework models

* nemo_export.py changes to better support in-framework models

* separate out in-framework version of triton deploy script

* add generate() function to MegatronLLMDeployable to allow for direct use in export tests

* use NemoQueryLLMPyTorch in deploy tests

* add warning message for when MegatronLLMDeployable overrides transformer_engine

* remove enable_streaming argument from deploy_inframework_triton.py since MegatronLLMDeployable does not support streaming
add query_inframework.py since original query.py does not work with in-framework deployments

* Apply isort and black reformatting

Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>

* skip trtllm support check if in_framework testing

* remove unused imports

* run_existing_checkpoints was passing wrong prompts argument for in-framework mode

* fix unused import in query_inframework.py

---------

Signed-off-by: jukim-nv <jukim-nv@users.noreply.github.com>
Co-authored-by: jukim-nv <jukim-nv@users.noreply.github.com>
Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
* Use FP8 in GPT TP2 test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add hydra options to use TE, TP overlap and FP8

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Override presence checks in hydra

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* WIP: Add debug code

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>

* Add more debug code

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>

* Add more debug code

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>

* Remove debug code and change underlying transformer layer to TE

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Override hydra error

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove tp overlap from the test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Change runner for fp8 tests

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* fix

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Add tp overlap test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove TP overlap from tests. It is unsupported in docker environment

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Adjust GPT PP2 test to use FP8. Change optimizer in TP2 test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

* Remove env overrides from GPT PP2 test

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>

---------

Signed-off-by: Jan Baczek <jbaczek@nvidia.com>
Signed-off-by: jbaczek <jbaczek@users.noreply.github.com>
Co-authored-by: jbaczek <jbaczek@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
…variety of tensors (#9641)

* enables default data step in megatron parallel to operate on a wider variety of tensors coming out of the dataloader

* handles the case where a batch is empty

* Apply isort and black reformatting

Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com>

* Allows the default data step to operate on more types
than just dictionaries

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

---------

Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com>
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
Co-authored-by: jomitchellnv <jomitchellnv@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
* wip contrastive reranker

Signed-off-by: arendu <adithya.r@gmail.com>

* wip

Signed-off-by: arendu <adithya.r@gmail.com>

* wip

Signed-off-by: arendu <adithya.r@gmail.com>

* working reranker training and validation

Signed-off-by: arendu <adithya.r@gmail.com>

* default peft for reranker

Signed-off-by: arendu <adithya.r@gmail.com>

* validation time update

Signed-off-by: arendu <adithya.r@gmail.com>

* reranker test

Signed-off-by: arendu <adithya.r@gmail.com>

* reranker inference

Signed-off-by: arendu <adithya.r@gmail.com>

* reranker inference

Signed-off-by: arendu <adithya.r@gmail.com>

* Apply isort and black reformatting

Signed-off-by: arendu <arendu@users.noreply.github.com>

* updates

Signed-off-by: arendu <adithya.r@gmail.com>

* Apply isort and black reformatting

Signed-off-by: arendu <arendu@users.noreply.github.com>

* updates

Signed-off-by: arendu <adithya.r@gmail.com>

* Apply isort and black reformatting

Signed-off-by: arendu <arendu@users.noreply.github.com>

* also can support rlhf style reward model loss

Signed-off-by: arendu <adithya.r@gmail.com>

* Apply isort and black reformatting

Signed-off-by: arendu <arendu@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: arendu <arendu@users.noreply.github.com>

* typo in cicd

Signed-off-by: arendu <adithya.r@gmail.com>

---------

Signed-off-by: arendu <adithya.r@gmail.com>
Signed-off-by: arendu <arendu@users.noreply.github.com>
Signed-off-by: Adi Renduchintala <adithya.r@gmail.com>
Co-authored-by: arendu <arendu@users.noreply.github.com>
* unpin transformers

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* guard deprecated imports

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix import guards

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix import guards

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* try fixing

Signed-off-by: Chen Cui <chcui@nvidia.com>

* disable HF tests

Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>

* try fixing

Signed-off-by: Chen Cui <chcui@nvidia.com>

* hard code model lists

Signed-off-by: Chen Cui <chcui@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>

* hard code model lists

Signed-off-by: Chen Cui <chcui@nvidia.com>

---------

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: Chen Cui <chcui@nvidia.com>
Signed-off-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: cuichenx <cuichenx@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: Dmytro Pykhtar <dpykhtar@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: cuichenx <cuichenx@users.noreply.github.com>
* Added CPU offloading docs

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>

* Tech writer review

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>

---------

Signed-off-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: Selvaraj Anandaraj <selvaraja@login-eos02.eos.clusters.nvidia.com>
Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* Update llama-3 PEFT notebook to download model from NGC

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

* Fix broken link in llama-3 PEFT tutorial README

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

* Fix broken code block in llama 3 PEFT tutorial README

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

* Copy-edits to Llama-3 8B PEFT tutorial README

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

* Fix broken link

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

* Minor formatting fixes

Signed-off-by: Shashank Verma <shashank3959@gmail.com>

---------

Signed-off-by: Shashank Verma <shashank3959@gmail.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: ashors1 <ashors@nvidia.com>
* add lita

Signed-off-by: Slyne Deng <slyned@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Slyne <Slyne@users.noreply.github.com>

* add part of the tutorial and fix format

Signed-off-by: slyne deng <slyned@nvidia.com>

* add tutorial

Signed-off-by: slyne deng <slyned@nvidia.com>

* fix Tutorial ckpt conversion

Signed-off-by: slyne deng <slyned@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: Slyne <Slyne@users.noreply.github.com>

* update cicd

Signed-off-by: Slyne Deng <slyned@nvidia.com>

* add to CICD test

Signed-off-by: Slyne Deng <slyned@nvidia.com>

* changes based on review comments

Signed-off-by: Slyne Deng <slyned@nvidia.com>

* fix bot warning

Signed-off-by: Slyne Deng <slyned@nvidia.com>

* update cicd main

Signed-off-by: Slyne Deng <slyned@nvidia.com>

* fix cicd ckpt conversion

Signed-off-by: Slyne Deng <slyned@nvidia.com>

---------

Signed-off-by: Slyne Deng <slyned@nvidia.com>
Signed-off-by: Slyne <Slyne@users.noreply.github.com>
Signed-off-by: slyne deng <slyned@nvidia.com>
Co-authored-by: Slyne Deng <slyned@nvidia.com>
Co-authored-by: Slyne <Slyne@users.noreply.github.com>
Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
* Parametrize FPS group



* Apply isort and black reformatting



* Change default to False



* Add logic to new ckptIO



* Turn on parallel save by default



---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Signed-off-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
* huvu/mcore_t5 first commit from local

* removing DEBUGGING prints

* cleaning megatron_lm_encoder_decoder_model.py code

* cleaning code

* adding Github action test

* only run mcore T5 test

* only run mcore T5 test

* only run mcore T5 test

* only run mcore T5 test

* reset .github/workflows/cicd-main.yml

* reset .github/workflows/cicd-main.yml

* adding condition self.mcore_t5 when running self.build_transformer_config()

* refactor megatron_lm_encoder_decoder_model.py to not use self.model

* only run T5-related tests

* remove all self.model

* reset cicd file

* reset cicd file

* updating codes remove duplicate if/else; adding mcore/transformer_engine to config file

* adjust +model.mcore_t5=True

* fix training for non-mcore, bf16, O2

* reset cicd-main.yml

---------

Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: Pablo Garay <pagaray@nvidia.com>
* adding mamba support

* fix import mixins

* rm convert jamba

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* more cleanups

* use GPT text gen

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* fixing gbs in TP converter

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* add reqs

* add tutorial

* minor fix to tutorial

* moving finetuning files

Signed-off-by: arendu <adithya.r@gmail.com>

* moving finetuning files

Signed-off-by: arendu <adithya.r@gmail.com>

* address comments

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* address comments

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* address comments

* add mamba dependencies

* add mcore tag

* modify dockerfile ci

* modify dockerfile ci

* fix TP>1 to TP1

* add inference, update based on latest mcore commits

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* minor fix

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* minor fix

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* bug fix, tutorial update

---------

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
Signed-off-by: arendu <adithya.r@gmail.com>
Co-authored-by: Ali Taghibakhshi <ataghibakhsh@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com>
Co-authored-by: arendu <adithya.r@gmail.com>
Signed-off-by: Ryan <rlangman@nvidia.com>
* commit to eval/sft/peft

* update MCORE_COMMIT

* address Chen's comments, updating retro unit test

* Apply isort and black reformatting

Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>

---------

Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com>
Co-authored-by: Huy Vu2 <huvu@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
* Allow non-strict load



* Point to non-strict load MCore branch



* Avoid module level StrictHandling



* Use MCore fork



* Update to MCore fix



* Restore backward compatibility



* Update flag defaults



* Update MCore tag



* Update PyT Dist interface



* Update to latest core_r0.8.0



---------

Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com>
Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix legacy ds padding bug

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* avoid code repetition

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix typo

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
…variety of tensors - second try (#9671)

* enables default data step in megatron parallel to operate on a wider variety of tensors coming out of the dataloader

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

* handles the case where a batch is empty

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com>
Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

* Allows the default data step to operate on more types
than just dictionaries

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com>

---------

Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com>
Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com>
Co-authored-by: jomitchellnv <jomitchellnv@users.noreply.github.com>
Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
* Fix when optimizers are setup for PEFT

* Apply isort and black reformatting



* Init DDP inside PEFT

* Apply isort and black reformatting



* Some fixes, loss seems to become nan with peft for some reason

* Apply isort and black reformatting



* Loss goes down on fp32

* Apply isort and black reformatting



* Simplifying FNMixin

* Apply isort and black reformatting



* Fix bug with new checkpoint-io

* Apply isort and black reformatting



* Fix failing test: test_peft_on_train_epoch_start_with_adapter

* Apply isort and black reformatting



---------

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: Chen Cui <chcui@nvidia.com>
Co-authored-by: ashors1 <ashors@nvidia.com>
* refactor: README
* refactor: Use new README in `setup.py`

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Remove mask if use fusion mask

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: hsiehjackson <hsiehjackson@users.noreply.github.com>

---------

Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com>
Signed-off-by: hsiehjackson <hsiehjackson@users.noreply.github.com>
Co-authored-by: hsiehjackson <hsiehjackson@users.noreply.github.com>
akoumpa and others added 4 commits August 6, 2024 11:52
* nemo ux mixtral 8x22b config

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add mixtral 8x22b recipe

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add note

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix type hint

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix type hint

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
)

* Fix logging of consumed samples in MegatronDataSampler

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>

* Remove unused import

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: hemildesai <hemildesai@users.noreply.github.com>
Co-authored-by: hemildesai <hemildesai@users.noreply.github.com>
* update default PTL logging directories



* fix logger versions



* fix failed test



* add better documentation for 'update_logger_directory'



---------

Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
* wrap task config save in a try/except



* move fiddle import



---------

Signed-off-by: ashors1 <ashors@nvidia.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
janekl and others added 11 commits August 7, 2024 11:43
* Use the trtllm-build command directly for quantized checkpoints and remove the dependency on modelopt for this

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Fix error messages

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Move setting max_seq_len level up

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

* Apply isort and black reformatting

Signed-off-by: janekl <janekl@users.noreply.github.com>

* Bump ModelOpt version

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>

---------

Signed-off-by: Jan Lasek <janek.lasek@gmail.com>
Signed-off-by: janekl <janekl@users.noreply.github.com>
Co-authored-by: janekl <janekl@users.noreply.github.com>
* Fix transcription move_to_device

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* fix

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

* Fix Canary's transcribe after introducing dataclass for mini-batch representation

Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Piotr Żelasko <pzelasko@nvidia.com>
…e // head… (#9994)

* Use kv_channels to enable cases where head_dim != hidden_size // head_num

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add head_dim to exporter

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Drop default values for kv_channels

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
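
The `kv_channels` change above concerns models whose per-head dimension is set independently of `hidden_size // head_num`. A minimal sketch of that resolution logic follows. `resolve_head_dim` and its parameter names are hypothetical and for illustration only; this is not the code from the PR.

```python
from typing import Optional


def resolve_head_dim(hidden_size: int, head_num: int, kv_channels: Optional[int] = None) -> int:
    """Return the per-head dimension (hypothetical helper, not NeMo code).

    If kv_channels is given, it is used as-is, so head_dim may differ from
    hidden_size // head_num. Otherwise fall back to the conventional ratio.
    """
    if kv_channels is not None:
        return kv_channels
    if hidden_size % head_num != 0:
        raise ValueError("hidden_size must be divisible by head_num when kv_channels is not set")
    return hidden_size // head_num


# Conventional case: head_dim is derived from the hidden size.
print(resolve_head_dim(4096, 32))  # 128

# Decoupled case: 3072 // 16 would give 192, but the explicit value wins.
print(resolve_head_dim(3072, 16, kv_channels=256))  # 256
```

An exporter that assumes `hidden_size // head_num` would compute 192 in the second case, which may be why the same commit series adds `head_dim` to the exporter.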
* [lhotse] Support for NeMo tarred manifests with offset field

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* typo fix

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* fix basename

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* relieve heavy CPU memory usage for super-long tarred recordings

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

* Tests and fixes

Signed-off-by: Piotr Żelasko <petezor@gmail.com>

---------

Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
…#9929)

Signed-off-by: paul-gibbons <paul@gibbonspaul.com>
Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Co-authored-by: Paul Gibbons <87940629+paul-gibbons@users.noreply.github.com>
Co-authored-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Make MegatronStrategy.parallelism return ParallelismConfig

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* Make ParallelismConfig a dataclass

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add note on import cycle

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* update structure

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* update structure

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* add image

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

* address comments

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>

---------

Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com>
ko3n1g (Collaborator, Author) commented Aug 7, 2024

Highlights

Training

Features and Model architectures

  • PEFT: QLoRA support, LoRA/QLoRA for Mixture-of-Experts (MoE) dense layer
  • State Space Models & Hybrid Architecture support (Mamba2 and NV-Mamba2-hybrid)
  • Support Nemotron, Minitron, Gemma2, Qwen, RAG

Multimodal

  • NeVA: Add SOTA LLM backbone support (Mixtral/LLaMA3) and suite of model parallelism support (PP/EP)
  • Support Language Instructed Temporal-Localization Assistant (LITA) on top of video NeVA

ASR

  • SpeechLM and SALM
  • Adapters for Canary Customization
  • PyTorch allocator in PyTorch 2.2 improves training speed by up to 30% for all ASR models
  • CUDA Graphs for Transducer Inference
  • Replaced webdataset with Lhotse, giving up to 2x speedup
  • Transcription Improvements: Speedup and QoL Changes
  • ASR Prompt Formatter for multimodal Canary

* Fix torch version for tts asr import check test

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>

* Ignore torch requirement

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>

* Update base image used for import check

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>

---------

Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com>
Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
farhadrgh and others added 2 commits August 9, 2024 11:44
* rm torch version check

Signed-off-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com>

* bump min torch version

Signed-off-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com>

* rm version

Signed-off-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com>

---------

Signed-off-by: Farhad Ramezanghorbani <farhad.ghorbani@gmail.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
* Moe doc fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* JG fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
ericharper and others added 3 commits August 12, 2024 00:28
* comment docs

Signed-off-by: eharper <eharper@nvidia.com>

* fix link

Signed-off-by: eharper <eharper@nvidia.com>

* comment

Signed-off-by: eharper <eharper@nvidia.com>

* fix noindex syntax

Signed-off-by: eharper <eharper@nvidia.com>

---------

Signed-off-by: eharper <eharper@nvidia.com>
* ci: Token permission to cancel Workflow run

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Use template

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Combine cleanup and main job

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>