fix mixtraltopk #10366

Merged: 4 commits merged into main on Sep 8, 2024

@akoumpa (Collaborator) commented on Sep 6, 2024

What does this PR do ?

Fixes the Mixtral config's router top-k value: 1 -> 2, so each token is routed to two experts instead of one.
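
For context, here is a minimal sketch of what the top-k value controls in a mixture-of-experts router. It is plain Python for illustration only, not the NeMo or Megatron implementation; `route_top_k` and the logit values are hypothetical. It assumes the expert weights are a softmax taken over the selected logits only.

```python
import math


def route_top_k(logits, k):
    """Pick the k highest-scoring experts and return (indices, weights)."""
    if not 1 <= k <= len(logits):
        raise ValueError("k must be between 1 and the number of experts")
    # Indices of the k largest router logits, highest first.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the selected logits only, so the weights sum to 1.
    peak = max(logits[i] for i in top)
    exps = [math.exp(logits[i] - peak) for i in top]
    total = sum(exps)
    return top, [e / total for e in exps]


# Hypothetical router logits for one token across 8 experts.
router_logits = [0.1, 2.0, -1.0, 1.5, 0.0, 0.3, -0.5, 0.7]

print(route_top_k(router_logits, k=1))  # one expert receives all of the weight
print(route_top_k(router_logits, k=2))  # weight is split across two experts
```

With `k=1` a single expert handles the token; with `k=2` the outputs of two experts are combined using the returned weights, which is the behaviour this PR restores for Mixtral.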

Collection: [Note which collection this PR will affect]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

  • You can potentially add a usage example below
# Add a code snippet demonstrating how to use this 

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
akoumpa self-assigned this on Sep 6, 2024
akoumpa added and removed the Run CICD label on Sep 8, 2024
akoumpa merged commit f666682 into main on Sep 8, 2024
143 of 144 checks passed
akoumpa deleted the akoumparouli/nemo_ux_mixtral_config_fix branch on September 8, 2024 15:52
ssh-meister added a commit that referenced this pull request Sep 9, 2024
* [🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3396356 ! (#10353)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>

* [NeMo-UX] Turn on mcore performance optimizations (#10209)

* expose TP overlap

Signed-off-by: Jieming Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add tp overlap recipes

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* turn on pipeline parallel overlap

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* refactor

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* Update base.py

Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>

* Update megatron_parallel.py

Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>

* remove env var

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* add optimization config

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix typo

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* refactor into megatron parallel setup

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* refactor

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix config ordering, add wgrad deferral

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* cleanup

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* use config

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* clean

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* enable wgrad deferral

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add grad bucket size

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* move everything into a callback

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* cleanup

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix imports

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* move userbuffer init

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* cleanup

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* fix VP

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* address comments

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* add gradient accum guard

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* Update base.py

Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>

* address comments

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* address comments

Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

---------

Signed-off-by: Jieming Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jieming Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>

* [NeMo-UX] checkpointing improvements (#10241)

* save model weights and artifacts to separate directories

Signed-off-by: ashors1 <ashors@nvidia.com>

* add save_artifacts_on_train_end

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* do not save optimizer states in final checkpoint

Signed-off-by: ashors1 <ashors@nvidia.com>

* WIP support for saving only last k optimizer states

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* minor cleanup

Signed-off-by: ashors1 <ashors@nvidia.com>

* Revert support for saving last k optimizer states. This will be addressed in a subsequent PR.

* use storage_options to determine when to skip saving optimizer states

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* fix variable names, make checkpoint load work when optimizer states don't exist in the checkpoint

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* FSDP updates, provide option to save optimizer states on_train_end

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* simplify implementation, remove save_best_model option

Signed-off-by: ashors1 <ashors@nvidia.com>

* update default value of ckpt_include_optimizer for fsdp

Signed-off-by: ashors1 <ashors@nvidia.com>

* remove unused imports

Signed-off-by: ashors1 <ashors@nvidia.com>

* remove unused import

Signed-off-by: ashors1 <ashors@nvidia.com>

* cleanup

Signed-off-by: ashors1 <ashors@nvidia.com>

* make storage_options optional again

Signed-off-by: ashors1 <ashors@nvidia.com>

* fix failing tests

Signed-off-by: ashors1 <ashors@nvidia.com>

* address some comments

Signed-off-by: ashors1 <ashors@nvidia.com>

* use save_weights_only to determine whether to save optimizer states

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* add some comments

Signed-off-by: ashors1 <ashors@nvidia.com>

* fix tests

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* fixes

Signed-off-by: ashors1 <ashors@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: ashors1 <ashors1@users.noreply.github.com>

* remove unnecessary line

Signed-off-by: ashors1 <ashors@nvidia.com>

---------

Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>

* [Nemo Unit Tests] Split CPU unit tests (#10365)

* Split CPU unit tests

* Split CPU unit tests

* Fix:Run pytest in specific paths

* Fix:Run pytest in specific paths

* Fix:Run pytest in specific paths

* ci: Fix checkout of secrets detector (#10381)

* ci: Fix checkout of secrets detector

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* only log consumed samples during training (#10371)

Signed-off-by: ashors1 <ashors@nvidia.com>

* Alit/mamba 2 0 migration (#10338)

* [NeMo-UX] Checkpointing fixes (#10376)

* remove save_best_model from default logger

Signed-off-by: ashors1 <ashors@nvidia.com>

* fix broken checkpoint restore

Signed-off-by: ashors1 <ashors@nvidia.com>

* fix fsdp

Signed-off-by: ashors1 <ashors@nvidia.com>

* rename weights path to avoid confusion

Signed-off-by: ashors1 <ashors@nvidia.com>

* Revert "rename weights path to avoid confusion". We'll add this in a separate PR

This reverts commit 72bae8b.

---------

Signed-off-by: ashors1 <ashors@nvidia.com>

* add auto configurator to NeMo (#10270)

* add base configs

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add auto configurator functionality

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* add runner

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add end-to-end example for auto configurator

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add unit tests for auto configurator

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add GPT configs

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add GPT configs

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* switch to dataclass

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* switch to dataclass

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix dataclasses usage

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* remove unused imports

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove extra function

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix docstring style

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* take Config object as input for model

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* add nemotron support

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* remove search_config.py

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* move configs creation to Basic class

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* move to common basic class

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* rename main config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove base configs for models

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: artbataev <artbataev@users.noreply.github.com>

* change auto conf functionality

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix docstring

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* remove unused imports

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add changes

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove activations_checkpoint_num_layers

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove gbs from config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix logs

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix performance calculation

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix end-to-end example

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix model config

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* minor changes

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* minor changes

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* fix unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* Apply isort and black reformatting

Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>

* add README

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix README

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix README

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix readme

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix readme

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove extra arg

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* remove unused imports

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* add nemo-run installation

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

* fix unit tests

Signed-off-by: dimapihtar <dpihtar@gmail.com>

---------

Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>

* fix mixtraltopk (#10366)

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>

* ci: Fix release tag (#10367)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Akoumparouli/nemo ux tokenizer fix (#10351)

* save tokenizer to disk

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Track Hf tokenizer assets

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* raise exception if dst file exists

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* minor

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* remove print

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add tokenizercontext

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add TokenizerContext

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* restore tokenizer from separate dir

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* update artifact __init__.py

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* TokenizerContext connector

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* bugfix on_import_ckpt

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rm code

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Drop tokenizercontext

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* drop tokenizer load from tokenizercontext

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* undo

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Move to util function

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* use save_hf_tokenizer_assets

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* add tokenizer restoration in resume.py

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* bot fixes

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* rm

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* wrap tokenizer restoration in try/catch

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* load_artifacts

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* param fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* more fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* lazy import tensorboard

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* move code out of file context manager

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Allow skippable artifacts

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* rebase fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* checkpoint structure change update

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* Add option to resume from specific path in AutoResume (#10373)

* Add option to resume from specific path in AutoResume

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Fix path

Signed-off-by: Hemil Desai <hemild@nvidia.com>

---------

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* ci: Cleanup of release-freeze automation (#10392)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Toggle pre-release (#10394)

* ci: Toggle pre-release

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* f

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Toggle pre-release (#10395)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Toggle pre-release (#10396)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Automate pre-release (#10397)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Akoumparouli/nemo ux validate dataset asset accessibility (#10309)

* Add validate_dataset_asset_accessibility

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Add CI tests for validate_dataset_asset_accessibility

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* fix for zipped lists

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>

* fix

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>

* [🤠]: Howdy folks, let's bump NeMo `2.1.0rc0` ! (#10399)

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ko3n1g <16716991+ko3n1g@users.noreply.github.com>

* ci: Update baseline (#10400)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci(chore): Minor change (#10401)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Swap merge/cherry-pick order (#10389)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Fix release tag (#10402)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Ko3n1g/ci/fix release workflow 2 (#10403)

* ci: Improve release workflow

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Fix cherry-picking

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Send Slack alert on failed cherry pick (#10404)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Allow concurrent docker system prune (#10405)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Use PAT for cherry-picking (#10406)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Alit/mamba ux cicd (#10370)

* add mamba init

* more ssm

* add 370m

* add hybrid

* fix issue

* integrate model and tokenizer config for ssm

* add all mamba configs

* modify state re pattern

* revert gpt stuff

* remove SSM class and training script

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* remove faulty export

* add script to test

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* some recent fixes

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* test script tp/pp1

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* add cicd

* include MLM mamba dist ckpt commit

* add license head and address more comments

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

* add guard

* remove guard from TransformerConfig

* update scripts

* Apply isort and black reformatting

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>

---------

Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: Ali Taghibakhshi <ataghibakhsh@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>

* ci: Allow default token to write workflows (#10407)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: More permissions for cherry-pick automation (#10409)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Overhaul cherry-pick workflow (#10410)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Ignore failures on cherry-picking (#10411)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Minor change (#10412)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Fix cherry-pick config (#10413)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Minor change (#10414)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Minor change (#10415)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Remove dead code (#10416)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* Ko3n1g/ci/test cherry picking 2 (#10417)

* ci: Cherrypick continue on error

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Fix cherry pick branch

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

---------

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Small test (#10419)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* ci: Small fix (#10420)

Signed-off-by: Oliver Koenig <okoenig@nvidia.com>

* [NeMo-UX] Integrating CLI (#10300)

* Adding nemo-run to requirements

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Updating nemo-run entrypoint inside setup.py

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Remove nemo-run from requirements until we have a pypi package

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Update entrypoint naming

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Setting up cli recipe for llama3-8b

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move AutoTokenizer import inline for starcoder

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move AutoTokenizer import inline for starcoder2

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Use target for factories inside llama3_8b

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Update other recipes

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Fix some bugs in the recipes

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Adding some examples

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Adding repl example

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Starting to add a notebook example as well

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Fix wrong imports

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Apply isort and black reformatting

Signed-off-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com>

* Fix wrong imports

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Fix typo + add script with default executor

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Add nemo-run to Dockerfile.ci

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Adding copyright to recipes

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Adding guides to recipes dir

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Adding hatchling to Dockerfile.ci

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Move install to different line

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* fix install

Signed-off-by: Hemil Desai <hemild@nvidia.com>

* Move llama3_pretraining to scripts for now

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Remove img folder & use images from release instead

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Updating default of num_nodes in all recipes

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Adding tests for all recipes

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>

* Adding docstrings

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>

* Fix failing tests inside test_mixtral_8x7b_64k

Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>

* Rename fabric to _fabric to avoid name collision with package fabric

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

* add rename comment

Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>

---------

Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>

---------

Signed-off-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Signed-off-by: Jieming Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Signed-off-by: Jimmy Zhang <jiemingz@nvidia.com>
Signed-off-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Signed-off-by: ashors1 <ashors@nvidia.com>
Signed-off-by: ashors1 <ashors1@users.noreply.github.com>
Signed-off-by: Oliver Koenig <okoenig@nvidia.com>
Signed-off-by: dimapihtar <dpihtar@gmail.com>
Signed-off-by: dimapihtar <dimapihtar@users.noreply.github.com>
Signed-off-by: artbataev <artbataev@users.noreply.github.com>
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Signed-off-by: akoumpa <akoumpa@users.noreply.github.com>
Signed-off-by: Hemil Desai <hemild@nvidia.com>
Signed-off-by: JRD971000 <JRD971000@users.noreply.github.com>
Signed-off-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Signed-off-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com>
Signed-off-by: marcromeyn <marcromeyn@users.noreply.github.com>
Signed-off-by: Marc Romeijn <mromeijn@nvidia.com>
Signed-off-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: oliver könig <okoenig@nvidia.com>
Co-authored-by: pablo-garay <7166088+pablo-garay@users.noreply.github.com>
Co-authored-by: JimmyZhang12 <67203904+JimmyZhang12@users.noreply.github.com>
Co-authored-by: Jieming Zhang <jiemingz@nvidia.com>
Co-authored-by: JimmyZhang12 <JimmyZhang12@users.noreply.github.com>
Co-authored-by: Anna Shors <71393111+ashors1@users.noreply.github.com>
Co-authored-by: ashors1 <ashors1@users.noreply.github.com>
Co-authored-by: Pablo Garay <palenq@gmail.com>
Co-authored-by: Ali Taghibakhshi <71892896+JRD971000@users.noreply.github.com>
Co-authored-by: Dmytro Pykhtar <37850217+dimapihtar@users.noreply.github.com>
Co-authored-by: dimapihtar <dimapihtar@users.noreply.github.com>
Co-authored-by: artbataev <artbataev@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <153118171+akoumpa@users.noreply.github.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Co-authored-by: akoumpa <akoumpa@users.noreply.github.com>
Co-authored-by: Hemil Desai <hemild@nvidia.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ko3n1g <16716991+ko3n1g@users.noreply.github.com>
Co-authored-by: Ali Taghibakhshi <ataghibakhsh@login-eos01.eos.clusters.nvidia.com>
Co-authored-by: JRD971000 <JRD971000@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: marcromeyn <marcromeyn@users.noreply.github.com>
Co-authored-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
adityavavre pushed a commit to adityavavre/NeMo that referenced this pull request Sep 15, 2024
Signed-off-by: Alexandros Koumparoulis <akoumparouli@nvidia.com>
Co-authored-by: Marc Romeyn <mromeijn@nvidia.com>
Signed-off-by: adityavavre <aditya.vavre@gmail.com>

2 participants