forked from NVIDIA/NeMo
use torch scaled_dot_product_attention #1
Draft
WoodieDudy wants to merge 466 commits into main from sdpa-asr
Conversation
…IDIA#9715) * Allow non-strict load * Point to non-strict load MCore branch * Avoid module level StrictHandling * Use MCore fork * Update to MCore fix * Restore backward compatibility * Update flag defaults * Update MCore tag * Update PyT Dist interface * Update to latest core_r0.8.0 --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix legacy ds padding bug Signed-off-by: dimapihtar <dpihtar@gmail.com> * Apply isort and black reformatting Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> * avoid code repetition Signed-off-by: dimapihtar <dpihtar@gmail.com> * fix typo Signed-off-by: dimapihtar <dpihtar@gmail.com> --------- Signed-off-by: dimapihtar <dpihtar@gmail.com> Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com> Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
…variety of tensors - second try (NVIDIA#9671) * enables default data step in megatron parallel to operate on a wider variety of tensors coming out of the dataloader Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> * handles the case where a batch is empty Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> * Apply isort and black reformatting Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com> Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> * Allows the default data step to operate on more types than just dictionaries Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> * Apply isort and black reformatting Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com> --------- Signed-off-by: Jonathan Mitchell <jomitchell@nvidia.com> Signed-off-by: jomitchellnv <jomitchellnv@users.noreply.github.com> Co-authored-by: jomitchellnv <jomitchellnv@users.noreply.github.com> Co-authored-by: John St. John <jstjohn@users.noreply.github.com>
…A#9647) * Fix when optimizers are setup for PEFT * Apply isort and black reformatting * Init DDP inside PEFT * Apply isort and black reformatting * Some fixes, loss seems to become nan with peft for some reason * Apply isort and black reformatting * Loss goes down on fp32 * Apply isort and black reformatting * Simplifying FNMixin * Apply isort and black reformatting * Fix bug with new checkpoint-io * Apply isort and black reformatting * Fix failing test: test_peft_on_train_epoch_start_with_adapter * Apply isort and black reformatting --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: ashors1 <ashors@nvidia.com>
* refactor: README * refactor: Use new README in `setup.py` Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* Remove mask if use fusion mask Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> * Apply isort and black reformatting Signed-off-by: hsiehjackson <hsiehjackson@users.noreply.github.com> --------- Signed-off-by: Cheng-Ping Hsieh <chsieh@nvidia.com> Signed-off-by: hsiehjackson <hsiehjackson@users.noreply.github.com> Co-authored-by: hsiehjackson <hsiehjackson@users.noreply.github.com>
…DIA#9690) (NVIDIA#9694) * Move tensorstore import inline * Moving AsyncFinalizableCheckpointIO import inline * Wrap AsyncCompatibleCheckpointIO in try/catch inside pl.py * Moving gpt_layer_specs import inline * Apply isort and black reformatting --------- Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* add container * modify tutorial * modify tutorial * modify tutorial --------- Co-authored-by: Ali Taghibakhshi <ataghibakhsh@login-eos01.eos.clusters.nvidia.com>
Signed-off-by: Elena Rastorgueva <erastorgueva@nvidia.com> Co-authored-by: Elena Rastorgueva <80532067+erastorgueva-nv@users.noreply.github.com>
…#9650) (NVIDIA#9691) * Nemotron export - fixing megatron_export.py (NVIDIA#9625) * Nemotron ONNX export fixed * Cleanup * Addressing code review comments --------- * Including all trainable-params in a PEFT-checkpoint * Apply isort and black reformatting * Small fixes to make model-importer work * Fixing failing tests --------- Signed-off-by: Boris Fomitchev <bfomitchev@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com> Co-authored-by: Boris Fomitchev <borisfom@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: Chen Cui <chcui@nvidia.com> Co-authored-by: ashors1 <ashors@nvidia.com>
* [NeMo-UX] Make TE and Apex dependencies optional (NVIDIA#9550) * Provide a pure pytorch/jit path to avoid required dependency on TE and Apex Signed-off-by: ashors1 <ashors@nvidia.com> * add missing file Signed-off-by: ashors1 <ashors@nvidia.com> * add minimal gpt pretraining example Signed-off-by: ashors1 <ashors@nvidia.com> * fix pre-training datamodule initialization Signed-off-by: ashors1 <ashors@nvidia.com> * add non-te/non-apex test Signed-off-by: ashors1 <ashors@nvidia.com> * add comment to pretraining script Signed-off-by: ashors1 <ashors@nvidia.com> * use microbatch calculator from mcore Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * fix nemo 2 test name Signed-off-by: ashors1 <ashors@nvidia.com> * update Mcore commit for CI Signed-off-by: ashors1 <ashors@nvidia.com> * replace apex microbatch calculator with megatron's in more places Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * fix missing import Signed-off-by: ashors1 <ashors@nvidia.com> * fix typo Signed-off-by: ashors1 <ashors@nvidia.com> * fix missed apex import Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Signed-off-by: ashors1 <ashors@nvidia.com> * move imports Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Signed-off-by: ashors1 <ashors@nvidia.com> * move imports Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * add types to command-line args Signed-off-by: ashors1 <ashors@nvidia.com> * bug fix Signed-off-by: ashors1 <ashors@nvidia.com> * fix path Signed-off-by: ashors1 <ashors@nvidia.com> * Disable distributed optimizer in nemo 2.0 test 
Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * fix optimizer config Signed-off-by: ashors1 <ashors@nvidia.com> * update checkpointing Signed-off-by: ashors1 <ashors@nvidia.com> * move import Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * fix failing unit test Signed-off-by: ashors1 <ashors@nvidia.com> * fix failing test Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * Updating num_weights check of RETRO due to underlying changes from mcore RETRO MLM Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com> * Apply isort and black reformatting Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com> * fix typo Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * remove stale warning Signed-off-by: ashors1 <ashors@nvidia.com> * fix lora notebook Signed-off-by: ashors1 <ashors@nvidia.com> * fix small typo Signed-off-by: ashors1 <ashors@nvidia.com> * add import guards to gemma2 Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> --------- Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com> Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com> Co-authored-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com> Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com> * fix cherry-pick Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black 
reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> --------- Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Signed-off-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com> Signed-off-by: huvunvidia <huvunvidia@users.noreply.github.com> Co-authored-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com> Co-authored-by: huvunvidia <86480512+huvunvidia@users.noreply.github.com> Co-authored-by: huvunvidia <huvunvidia@users.noreply.github.com>
* minor 2.0 bug fix when TE/Apex not installed Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> --------- Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com> Co-authored-by: Pablo Garay <palenq@gmail.com> Co-authored-by: ashors1 <ashors@nvidia.com>
…v variable (NVIDIA#9736) (NVIDIA#9750) Signed-off-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com>
Signed-off-by: He Huang (Steve) <105218074+stevehuang52@users.noreply.github.com>
* Fix issue with prompt_defaults Signed-off-by: smajumdar <titu1994@gmail.com> * Add core level support for grad map tracking Signed-off-by: smajumdar <titu1994@gmail.com> * Add core level support for grad map tracking Signed-off-by: smajumdar <titu1994@gmail.com> * Apply isort and black reformatting Signed-off-by: titu1994 <titu1994@users.noreply.github.com> * Add tutorial and update repr of formatters Signed-off-by: smajumdar <titu1994@gmail.com> * Update docs Signed-off-by: smajumdar <titu1994@gmail.com> --------- Signed-off-by: smajumdar <titu1994@gmail.com> Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
…al_batch_size (NVIDIA#9707) (NVIDIA#9753) Signed-off-by: ashors1 <ashors@nvidia.com> Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com> Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
* fix serialization of partial function * update serialization to handle value.args Signed-off-by: srabhi <srabhi@nvidia.com> * add unit test Signed-off-by: srabhi <srabhi@nvidia.com> * remove redundant code from unit-test Signed-off-by: srabhi <srabhi@nvidia.com> --------- Signed-off-by: srabhi <srabhi@nvidia.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
…or (NVIDIA#9682) * Speeds up copying of necessary artifact files with SaveRestoreConnector Previously, the SaveRestoreConnector would copy and untar entire checkpoints just to copy out a tokenizer. For models >100GB, this led to timeouts since only rank=0 did this work, while other ranks moved on and waited at an all-gather barrier (observed NCCL timeout at 10min). Signed-off-by: Terry Kong <terryk@nvidia.com> * cleanup Signed-off-by: Terry Kong <terryk@nvidia.com> * black formatting Signed-off-by: Terry Kong <terryk@nvidia.com> * Apply isort and black reformatting Signed-off-by: terrykong <terrykong@users.noreply.github.com> Signed-off-by: Terry Kong <terryk@nvidia.com> * restoring logic to previous tempdir logic Signed-off-by: Terry Kong <terryk@nvidia.com> * nlp overrides too Signed-off-by: Terry Kong <terryk@nvidia.com> * respect return_config Signed-off-by: Terry Kong <terryk@nvidia.com> * some unit tests Signed-off-by: Terry Kong <terryk@nvidia.com> * nodbg Signed-off-by: Terry Kong <terryk@nvidia.com> * Apply isort and black reformatting Signed-off-by: terrykong <terrykong@users.noreply.github.com> * correct typing Signed-off-by: Terry Kong <terryk@nvidia.com> * Fixes directory issue Signed-off-by: Terry Kong <terryk@nvidia.com> * Apply isort and black reformatting Signed-off-by: terrykong <terrykong@users.noreply.github.com> --------- Signed-off-by: Terry Kong <terryk@nvidia.com> Signed-off-by: terrykong <terrykong@users.noreply.github.com> Co-authored-by: terrykong <terrykong@users.noreply.github.com> Co-authored-by: Eric Harper <complex451@gmail.com>
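The commit message above describes avoiding a full untar when only one artifact (such as a tokenizer) is needed. A minimal sketch of that general idea using only Python's standard `tarfile` module follows; it is not the SaveRestoreConnector code, and the function name `extract_single_member` and the archive layout are hypothetical, chosen for illustration.

```python
import tarfile
from pathlib import Path


def extract_single_member(archive_path, member_name, out_dir):
    """Copy one named file out of a tar archive without unpacking the rest.

    Hypothetical helper for illustration; not part of NeMo.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    # "r:*" lets tarfile detect the compression (or lack of it) on its own.
    with tarfile.open(archive_path, "r:*") as tar:
        member = tar.getmember(member_name)  # raises KeyError if the name is absent
        src = tar.extractfile(member)        # None for entries that are not regular files
        if src is None:
            raise ValueError(f"{member_name} is not a regular file")
        target = out_dir / Path(member_name).name
        # Only the requested member is written to disk.
        target.write_bytes(src.read())
    return target
```

For an uncompressed archive the other members' data can be skipped by seeking; for a compressed archive the stream still has to be decompressed up to the requested member, so the saving there is mainly in disk writes.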
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
* Add checkpoints section Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Fix title Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * update Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Add section on ".qnemo" checkpoints (NVIDIA#9503) * Add 'Quantized Checkpoints' section Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Address review comments Signed-off-by: Jan Lasek <janek.lasek@gmail.com> --------- Signed-off-by: Jan Lasek <janek.lasek@gmail.com> * Distributed checkpointing user guide (NVIDIA#9494) * Describe shardings and entrypoints Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Strategies, optimizers, finalize entrypoints Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Transformations Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Integration Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Add link from intro Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Apply grammar suggestions Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Explain the example Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Apply review suggestions Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * Add zarr and torch_dist explanation --------- Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> * add subsection Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * Update docs/source/checkpoints/intro.rst Co-authored-by: Chen Cui <chcui@nvidia.com> Signed-off-by: Yu Yao <54727607+yaoyu-33@users.noreply.github.com> * address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix code block Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * address comments Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * formatting Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> * fix Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> --------- Signed-off-by: yaoyu-33 <yaoyu.094@gmail.com> Signed-off-by: Jan Lasek <janek.lasek@gmail.com> Signed-off-by: Mikołaj Błaż <mblaz@nvidia.com> Signed-off-by: Yu Yao 
<54727607+yaoyu-33@users.noreply.github.com> Co-authored-by: Jan Lasek <janek.lasek@gmail.com> Co-authored-by: mikolajblaz <mikolajblaz@users.noreply.github.com> Co-authored-by: Chen Cui <chcui@nvidia.com>
* ci: Add workflow for code-freeze Signed-off-by: Oliver Koenig <okoenig@nvidia.com> * ci: Add workflow for releasing NeMo Toolkit Signed-off-by: Oliver Koenig <okoenig@nvidia.com> --------- Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
WoodieDudy force-pushed the sdpa-asr branch 2 times, most recently from 95ea37c to c82fbc3 (July 18, 2024 10:20)
Signed-off-by: WoodieDudy <goshagks@gmail.com>
* 24.07 vboost numbers Signed-off-by: Malay Nagda <malayn@nvidia.com> * 175b 512gpus Signed-off-by: Malay Nagda <malayn@nvidia.com> --------- Signed-off-by: Malay Nagda <malayn@nvidia.com> Co-authored-by: Sangkug Lym <slym@nvidia.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
* fix mamba convert/ add test * Apply isort and black reformatting Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> * add mamba test * fix ngroup in cicd --------- Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com> Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com>
NVIDIA#10127) * Resolve merge conflicts with consumed sample logging Signed-off-by: John St John <jstjohn@nvidia.com> * Add test file that captures the predict step error Signed-off-by: John St John <jstjohn@nvidia.com> * Add fixme comment around proper checkpoint nemo2 handling Signed-off-by: John St John <jstjohn@nvidia.com> * Skip megatron training test on CPU nodes Signed-off-by: John St John <jstjohn@nvidia.com> * Move output_log to last arg for compatibility Signed-off-by: John St John <jstjohn@nvidia.com> * try setting the default root dir in predict to avoid writing artifacts to cwd Signed-off-by: John St John <jstjohn@nvidia.com> * Handle the new check for batch samplers to enable predict_step Signed-off-by: John St John <jstjohn@nvidia.com> * Only reset the global microbatch, not entire parallel state Signed-off-by: John St John <jstjohn@nvidia.com> * Destroy the right sets of state in test of lightning trainer Signed-off-by: John St John <jstjohn@nvidia.com> * Fix typo and rename state resetting functions Signed-off-by: John St John <jstjohn@nvidia.com> * Run test in a subprocess to avoid contaminating global state Signed-off-by: John St John <jstjohn@nvidia.com> --------- Signed-off-by: John St John <jstjohn@nvidia.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
* add nemotron * add nemotron exporter. make converted model identical * Apply isort and black reformatting Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com> * add more config * Apply isort and black reformatting Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com> * add config * Apply isort and black reformatting Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com> * import refactor * Apply isort and black reformatting Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com> * refactor config * add 22B config --------- Signed-off-by: suiyoubi <suiyoubi@users.noreply.github.com> Co-authored-by: suiyoubi <suiyoubi@users.noreply.github.com>
* Riva and k2 ASR WFST decoding (2) (NVIDIA#9391) * upload Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * [pre-commit.ci] auto fixes from pre-commit.com hooks for more information, see https://pre-commit.ci * add comments and use case Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * Apply isort and black reformatting Signed-off-by: GNroy <GNroy@users.noreply.github.com> * add initial doc Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * fix doc and k2+cuda eval Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * isolate decoder components installation and fix suggestions Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> * Apply isort and black reformatting Signed-off-by: GNroy <GNroy@users.noreply.github.com> * fix trailing newline Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: GNroy <GNroy@users.noreply.github.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: GNroy <GNroy@users.noreply.github.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Add DdpParamParityChecker Callback Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Improve messaging Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Rename to DdpParityChecker Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Add ddp test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * rename to ddp_parity_checker Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove red. 
imports Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * test fix Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * missing import Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * ignore test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * add missing import Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * another missing import Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * make limit_val_batches int Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * remove dup file Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * AG groups decisions on DDP parity Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * fix test Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> * Exclude from pytest Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Add L2_NeMo_2_GPT_DDP_Param_Parity_check to NeMo_CICD_Test.needs Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Aleksandr Laptev <alaptev@nvidia.com> Signed-off-by: GNroy <GNroy@users.noreply.github.com> Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: Aleksandr
Laptev <alaptev@nvidia.com> Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: GNroy <GNroy@users.noreply.github.com> Co-authored-by: Vladimir Bataev <vbataev@nvidia.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
…ter restoring from a checkpoint (NVIDIA#10225) Signed-off-by: ashors1 <ashors@nvidia.com>
* Update TRTLLM 0.12 * Add model config * Change config * Change deploy script * Apply isort and black reformatting Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> * Remove parameter --------- Signed-off-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: meatybobby <meatybobby@users.noreply.github.com> Co-authored-by: Onur Yilmaz <35306097+oyilmaz-nvidia@users.noreply.github.com>
Signed-off-by: Ante Jukić <ajukic@nvidia.com>
…VIDIA#10234) Signed-off-by: Hemil Desai <hemild@nvidia.com>
…nd offsets in manifest (NVIDIA#10198) * Add tests for LazyNeMoIterator and fix case with manifest_only=True and offsets in manifest Signed-off-by: Piotr Żelasko <petezor@gmail.com> * Address code review Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix tests Signed-off-by: Piotr Żelasko <petezor@gmail.com> * fix tests Signed-off-by: Piotr Żelasko <petezor@gmail.com> --------- Signed-off-by: Piotr Żelasko <petezor@gmail.com>
…ckpoints (NVIDIA#9939) * perform serialization using relative paths to allow users to move checkpoints after they're saved Signed-off-by: ashors1 <ashors@nvidia.com> * Apply isort and black reformatting Signed-off-by: ashors1 <ashors1@users.noreply.github.com> * remove unused import Signed-off-by: ashors1 <ashors@nvidia.com> * fix artifact load Signed-off-by: ashors1 <ashors@nvidia.com> * fix path artifact Signed-off-by: ashors1 <ashors@nvidia.com> * remove unused import Signed-off-by: ashors1 <ashors@nvidia.com> --------- Signed-off-by: ashors1 <ashors@nvidia.com> Signed-off-by: ashors1 <ashors1@users.noreply.github.com> Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
* Add MemoryProfileCallback Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com> * Remove reference cycles, save snapshot on specific ranks Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Remove unnecessary imports Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> * Apply isort and black reformatting Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com> * Update docstring Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> --------- Signed-off-by: Shriya Palsamudram <spalsamudram@nvidia.com> Signed-off-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com> Signed-off-by: Shriya Rishab <69161273+ShriyaPalsamudram@users.noreply.github.com> Co-authored-by: ShriyaPalsamudram <ShriyaPalsamudram@users.noreply.github.com>
Signed-off-by: Dong Hyuk Chang <donghyukc@nvidia.com> Co-authored-by: Dong Hyuk Chang <donghyukc@nvidia.com>
…rocessing (NVIDIA#10052) Flow matching generative model with SSL pretraining framework Signed-off-by: Pin-Jui Ku <pku@nvidia.com> Co-authored-by: Kuray107 <Kuray107@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
* Move nemotron transformers + tokenizer imports inline to reduce number of required deps Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> * Apply isort and black reformatting Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> --------- Signed-off-by: Marc Romeyn <mromeijn@nvidia.com> Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com> Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
* Wrap CPU model init with megatron_lazy_init_context Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Cleanup checkpoint-dir if saving fails Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> * Apply isort and black reformatting Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> --------- Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com> Signed-off-by: akoumpa <akoumpa@users.noreply.github.com> Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: titu1994 <titu1994@users.noreply.github.com>
Signed-off-by: WoodieDudy <goshagks@gmail.com>
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
Signed-off-by: WoodieDudy <goshagks@gmail.com>
Signed-off-by: WoodieDudy <WoodieDudy@users.noreply.github.com>
No description provided.
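For context on the title: PyTorch 2.0 and later provide `torch.nn.functional.scaled_dot_product_attention`, which computes softmax(QKᵀ/√d)·V and can dispatch to fused kernels. Since the PR carries no description, the following is only a dependency-free sketch of the arithmetic that operation performs for a single head, with no mask, dropout, or batching. The name `sdpa_reference` is hypothetical, and none of this is taken from the PR's diff.

```python
import math


def sdpa_reference(q, k, v):
    """Single-head attention on plain lists: softmax(q k^T / sqrt(d)) v.

    q: L_q vectors of length d; k: L_k vectors of length d; v: L_k vectors of length d_v.
    Illustrative reference only; assumes no mask and no dropout.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for q_i in q:
        # Scaled dot product of this query with every key.
        scores = [scale * sum(a * b for a, b in zip(q_i, k_j)) for k_j in k]
        # Numerically stable softmax over the key axis.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Weighted sum of the value vectors.
        out.append([sum(w * v_j[c] for w, v_j in zip(weights, v)) for c in range(len(v[0]))])
    return out
```

A hand-written loop like this materializes the full score matrix row by row; the appeal of the fused PyTorch call is that it can avoid doing so, which is presumably the motivation for the change.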