use torch scaled_dot_product_attention #1

Draft
wants to merge 697 commits into
base: main
78f57fe
force optimizer.param_groups to match mcore_optimizer.param_groups af…
ashors1 Aug 22, 2024
5269caf
Update TRTLLM 0.12 (#10215)
meatybobby Aug 22, 2024
42c2910
Tutorial: audio codec inference (#10186)
anteju Aug 22, 2024
753c70e
Move trt imports in nemo.collections.llm inside respective functions …
hemildesai Aug 23, 2024
d4d6a5b
Add tests for LazyNeMoIterator and fix case with metadata_only=True a…
pzelasko Aug 23, 2024
1c90b5e
[NeMo-UX] Fix a serialization bug that prevents users from moving che…
ashors1 Aug 23, 2024
6d1be93
Add MemoryProfileCallback (#10166)
ShriyaPalsamudram Aug 23, 2024
d415621
Lower bound transformers to support nemotron (#10240)
thomasdhc Aug 23, 2024
7cc99e9
[Audio] SSL Pretraining framework for flow-matching model for audio p…
Kuray107 Aug 24, 2024
8d9cfee
Revert torchrun fix for model import (#10251)
akoumpa Aug 26, 2024
642c97a
[NeMo-UX[ Move nemotron imports inline (#10255)
marcromeyn Aug 26, 2024
8210e9c
Wrap CPU model init with megatron_lazy_init_context (#10219)
akoumpa Aug 26, 2024
fad3414
sdpa work
WoodieDudy Aug 26, 2024
6f6fc27
Merge branch 'main' into sdpa-asr
titu1994 Aug 26, 2024
941c7f5
Apply isort and black reformatting
titu1994 Aug 26, 2024
ea8f49b
Bump `Dockerfile.ci` (2024-08-22) (#10227)
ko3n1g Aug 26, 2024
69973f9
salm export trtllm (#10245)
Slyne Aug 26, 2024
59a3e96
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to ef85bc9 ! (#10250)
ko3n1g Aug 27, 2024
49f13fb
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 01ca03f ! (#10266)
ko3n1g Aug 27, 2024
2f422dd
Load model in the target export precision by default in PTQ (#10267)
janekl Aug 27, 2024
fd75162
Add WandbPlugin, NsysPlugin and PreemptionPlugin to nemo.lightning.ru…
hemildesai Aug 27, 2024
38800cd
[NeMo-UX] Handle absolute logger directories in nemo_logger (#10259)
ashors1 Aug 27, 2024
57aa305
Add sdxl notebook (#10139)
Victor49152 Aug 27, 2024
19668e5
Add Llama31 Config (#10260)
suiyoubi Aug 27, 2024
c7c3eae
Added offloading support for LoRA adapters (#10237)
sanandaraj5597 Aug 27, 2024
f53600a
Add Qwen2 to Nemo 2 (#10258)
suiyoubi Aug 27, 2024
e68f981
Lazy import tokenizers (#10213)
akoumpa Aug 28, 2024
5ff7f22
add rampup bs documentation (#9884) (#10289)
dimapihtar Aug 28, 2024
4805fe9
Add Starcoder to Nemo 2 (#10230)
suiyoubi Aug 28, 2024
2438fa9
comment out ASR_dev_run_Speech_To_Text_HF_Finetuning until fixed (#10…
pablo-garay Aug 28, 2024
5040546
Adding a Garbage-collection callback for a synchronized garbage-colle…
gautham-kollu Aug 28, 2024
1d2d507
Do not overwrite wandb name in NeMo Logger (#10265)
hemildesai Aug 28, 2024
5bbfa53
Bump `Dockerfile.ci` (2024-08-28) (#10278)
ko3n1g Aug 28, 2024
60ac8aa
Multimodal trtllm export and infer script (#10287)
Slyne Aug 28, 2024
a860e6b
[TTS] Add config and modules for 22khz and 44khz audio codec (#10107)
rlangman Aug 28, 2024
f45422a
Add example script to run NeMo 2.0 llama pretraining with NeMo-Run (#…
hemildesai Aug 28, 2024
22f0bb0
Add FSDP for NeMo 2.0 (#9748)
blahBlahhhJ Aug 29, 2024
9796b69
Export fp8 te nemo to trt-llm (#10096)
Laplasjan107 Aug 29, 2024
3ed93c1
Bugfix: loading scaling factors for pyt 24.07 (#10297)
Laplasjan107 Aug 29, 2024
006d65f
Sanity checks for unfinished checkpoints removal (#10228)
jbieniusiewi Aug 29, 2024
cdf61f9
allow disabling validation (#10273)
maanug-nv Aug 29, 2024
736a6fc
make torch_dist ckpt strategy as default (#9852) (#10291)
dimapihtar Aug 29, 2024
ea0f69f
TRT-LLM 0.12 + ModelOpt 0.17.0 updates (#10301)
janekl Aug 29, 2024
eff7ddd
add documentation for reset_lr feature (#9639) (#10290)
dimapihtar Aug 29, 2024
3ebe567
[NeMo UX] expose `num_dataset_builder_threads` argument (#10281)
ashors1 Aug 29, 2024
d0128da
Disable SP (#10282)
akoumpa Aug 29, 2024
81f18f6
ci: Selective triggering (#10195)
ko3n1g Aug 29, 2024
4d5f1aa
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 9ab31cb ! (#10311)
ko3n1g Aug 30, 2024
b5d1d5f
Log Gradient Norms (#10244)
maanug-nv Aug 30, 2024
d886151
Add support for LoRA on vLLM (#10009)
apanteleev Aug 30, 2024
1ce9089
Flexible passing args to TensorRTLLM in nemo_export.py (#10315)
janekl Aug 30, 2024
e5f22a8
add back HF Finetune script to CI (#10308)
nithinraok Aug 30, 2024
a777a44
Add Yi 1.5 34b Neva support (#10083)
HuiyingLi Aug 30, 2024
0ba9979
move to cpu only for log probs (#10316)
nithinraok Aug 30, 2024
b87e1e3
[NeMo-UX] Don't create attention mask for GPTs (#10242)
JimmyZhang12 Aug 30, 2024
9a22005
Make get_optim_config iterable (#10318)
akoumpa Aug 31, 2024
b698ae5
Fix llama3 pretraining NeMo 2.0 script (#10307)
hemildesai Aug 31, 2024
78357ae
Support TE-DPA For Stable Diffusion (#10288)
alpha0422 Sep 2, 2024
8cd751b
fix tokenizer restoration (#10336)
akoumpa Sep 3, 2024
9472fc3
remove virtual pipeline parallel apex dependency (#10317)
ashors1 Sep 3, 2024
ac89593
Add option to selectively load context in nemo.lightning.io (#10279)
hemildesai Sep 3, 2024
8eb1827
Add EP to mixtral-8x22b recipe (#10337)
akoumpa Sep 3, 2024
ab6aba3
Bugfix: export to trt-llm multi_block_mode flag (#10334)
Laplasjan107 Sep 3, 2024
a1fd899
fix (#10339)
yaoyu-33 Sep 3, 2024
0d2d7c4
Add comment to address a frequently asked question (#10321)
cuichenx Sep 4, 2024
dd02d02
Fix async checkpointing in nemo.lightning (#10324)
hemildesai Sep 4, 2024
32ba985
[Draft]Add Nemotron4 recipes and Long Context Recipe (#10262)
BoxiangW Sep 4, 2024
8134f33
[NeMo-UX] Adding copyright to collections.llm & lightning (#10345)
marcromeyn Sep 4, 2024
73bec06
added support for FC model in Diarization with ASR and timestamps (#1…
KunalDhawan Sep 4, 2024
d8efee9
Remove apply_query_key_layer_scaling for GPT models (#10349)
suiyoubi Sep 4, 2024
7738b1d
remove grad clipping from mixed_precision plugin (#10303)
akoumpa Sep 5, 2024
19f904e
Add option to selectively restore model weights and optimizer states …
hemildesai Sep 5, 2024
e6db2f3
alltoall (#10357)
malay-nagda Sep 5, 2024
a567380
Fix links (#10359)
ericharper Sep 5, 2024
a9746a6
Improve TE import guards (#10322)
ashors1 Sep 5, 2024
5bd2b89
ci: Detect secrets (#10343)
ko3n1g Sep 5, 2024
fdf1979
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 3396356 ! (#10353)
ko3n1g Sep 5, 2024
1d5de59
[NeMo-UX] Turn on mcore performance optimizations (#10209)
JimmyZhang12 Sep 6, 2024
34393c6
[NeMo-UX] checkpointing improvements (#10241)
ashors1 Sep 6, 2024
ad5ef75
[Nemo Unit Tests] Split CPU unit tests (#10365)
pablo-garay Sep 6, 2024
95944ee
ci: Fix checkout of secrets detector (#10381)
ko3n1g Sep 6, 2024
7ba0681
only log consumed samples during training (#10371)
ashors1 Sep 6, 2024
62c1dce
Alit/mamba 2 0 migration (#10338)
JRD971000 Sep 7, 2024
9e372d3
[NeMo-UX] Checkpointing fixes (#10376)
ashors1 Sep 7, 2024
cda2a63
add auto configurator to NeMo (#10270)
dimapihtar Sep 7, 2024
f666682
fix mixtraltopk (#10366)
akoumpa Sep 8, 2024
e1f375e
ci: Fix release tag (#10367)
ko3n1g Sep 8, 2024
a26ed2f
Akoumparouli/nemo ux tokenizer fix (#10351)
akoumpa Sep 8, 2024
dd63de1
Add option to resume from specific path in AutoResume (#10373)
hemildesai Sep 8, 2024
6f1c414
ci: Cleanup of release-freeze automation (#10392)
ko3n1g Sep 8, 2024
ab82b56
ci: Toggle pre-release (#10394)
ko3n1g Sep 8, 2024
bcf7e0f
ci: Toggle pre-release (#10395)
ko3n1g Sep 8, 2024
21cb949
ci: Toggle pre-release (#10396)
ko3n1g Sep 8, 2024
30385aa
ci: Automate pre-release (#10397)
ko3n1g Sep 8, 2024
2404c4e
Akoumparouli/nemo ux validate dataset asset accessibility (#10309)
akoumpa Sep 8, 2024
9921e6c
[🤠]: Howdy folks, let's bump NeMo `2.1.0rc0` ! (#10399)
github-actions[bot] Sep 8, 2024
f6cd74b
ci: Update baseline (#10400)
ko3n1g Sep 8, 2024
94c5fd8
ci(chore): Minor change (#10401)
ko3n1g Sep 8, 2024
41502ff
ci: Swap merge/cherry-pick order (#10389)
ko3n1g Sep 8, 2024
19382eb
ci: Fix release tag (#10402)
ko3n1g Sep 8, 2024
73a8ef8
Ko3n1g/ci/fix release workflow 2 (#10403)
ko3n1g Sep 8, 2024
a4f95f1
ci: Send Slack alert on failed cherry pick (#10404)
ko3n1g Sep 8, 2024
0e5e5d5
ci: Allow concurrent docker system prune (#10405)
ko3n1g Sep 8, 2024
46e908e
ci: Use PAT for cherry-picking (#10406)
ko3n1g Sep 8, 2024
9f9bf4d
Alit/mamba ux cicd (#10370)
JRD971000 Sep 8, 2024
a95f3a2
ci: Allow default token to write workflows (#10407)
ko3n1g Sep 8, 2024
4bf8101
ci: More permissions for cherry-pick automation (#10409)
ko3n1g Sep 8, 2024
0a40662
ci: Overhaul cherry-pick workflow (#10410)
ko3n1g Sep 8, 2024
0d0e724
ci: Ignore failures on cherry-picking (#10411)
ko3n1g Sep 8, 2024
52c7f2a
ci: Minor change (#10412)
ko3n1g Sep 8, 2024
7d27792
ci: Fix cherry-pick config (#10413)
ko3n1g Sep 8, 2024
91863d2
ci: Minor change (#10414)
ko3n1g Sep 8, 2024
48fab9d
ci: Minor change (#10415)
ko3n1g Sep 8, 2024
573d910
ci: Remove dead code (#10416)
ko3n1g Sep 8, 2024
14c3d4a
Ko3n1g/ci/test cherry picking 2 (#10417)
ko3n1g Sep 8, 2024
aab78f0
ci: Small test (#10419)
ko3n1g Sep 8, 2024
b7ee0b8
ci: Small fix (#10420)
ko3n1g Sep 8, 2024
d12fbbd
[NeMo-UX] Integrating CLI (#10300)
marcromeyn Sep 9, 2024
fb39fad
[Nemo Unit Tests] Split GPU unit tests (#10380)
pablo-garay Sep 9, 2024
dc61f7a
Support Energon as dataloader in NeVA (#10305)
paul-gibbons Sep 9, 2024
8e3d65d
24.07 perf numbers (#10253)
malay-nagda Sep 9, 2024
ba7962e
remove scripts (#10427)
JRD971000 Sep 9, 2024
176c54f
Neva update to NeMo 2.0 (#10292)
yaoyu-33 Sep 9, 2024
e6f6a48
CICD: Attempt fix for False positive (not all tests have run) (#10436)
pablo-garay Sep 9, 2024
8f0d0c7
[Nemo CICD] Make flaky test optional (#10438)
pablo-garay Sep 9, 2024
4259169
[Nemo CICD] Make flaky test optional (#10442)
pablo-garay Sep 10, 2024
ae243d4
ci: Fix secrets detector on forks (#10426)
ko3n1g Sep 10, 2024
c3e6a6e
[Nemo CICD] Make flaky test optional (#10446)
pablo-garay Sep 10, 2024
18d81b1
Expand pyproject.toml to include package metadata for uv (#10350)
pstjohn Sep 10, 2024
1be9cc1
Make flaky test optional (#10448)
pablo-garay Sep 10, 2024
766ded5
[Nemo CICD] Make flaky test optional (#10450)
pablo-garay Sep 10, 2024
3c4def6
[Nemo CICD] Make flaky test optional (#10452)
pablo-garay Sep 10, 2024
79c4786
Make flaky test optional (#10456)
pablo-garay Sep 11, 2024
0df6610
Make flaky test optional (#10459)
pablo-garay Sep 11, 2024
fd8c6a4
add parakeet-tdt_ctc-110m model (#10461)
nithinraok Sep 11, 2024
1163e1e
Bump `Dockerfile.ci` (2024-09-09) (#10423)
ko3n1g Sep 11, 2024
46aa1ee
MCORE interface for TP-only FP8 AMAX reduction (#10437)
erhoo82 Sep 11, 2024
2089c53
Support MCORE Distributed Optimizer (#10363)
erhoo82 Sep 11, 2024
2c21e0a
Optional test needs optional field set true (#10475)
pablo-garay Sep 12, 2024
c611e53
Fix nemo run entrypoints (#10464)
hemildesai Sep 12, 2024
70f4426
[Nemo CICD] Make flaky test optional (#10476)
pablo-garay Sep 12, 2024
057041c
add ci tests for Auto Configurator (#10390)
dimapihtar Sep 12, 2024
cb3df0b
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 76f9f48 ! (#10477)
ko3n1g Sep 13, 2024
3a60491
[NeMo-UX] Rename weights path during resume (#10391)
ashors1 Sep 15, 2024
b5798de
[NeMo-UX] Use custom `BatchProgress` class which does not restore sta…
ashors1 Sep 15, 2024
9621be2
Remove Apex dependency if not using norm (#10468)
cuichenx Sep 16, 2024
0f8a531
Update adapter saving logic to be compatible with `save_weights_only`…
cuichenx Sep 16, 2024
62deef0
Akoumparouli/nemo ux update param name (#10441)
akoumpa Sep 16, 2024
99af1ce
Draft: Expose MCore Cudagraph interface (#10121)
JimmyZhang12 Sep 16, 2024
cc494c9
[NeMo-UX] Add token drop callback and optimize mixtral configs (#10361)
JimmyZhang12 Sep 16, 2024
a250726
fix partial audio transcription order: (#10379)
nithinraok Sep 16, 2024
d419955
ci: Fix hyperlink to PR (#10494)
ko3n1g Sep 16, 2024
16568d7
Flaky test optional until fixed (#10495)
pablo-garay Sep 16, 2024
8ff8804
fix NeMoLogger log -> log_dir rename (#10498)
akoumpa Sep 17, 2024
f6a905c
ci: Fix base branch of secrets detector (#10501)
ko3n1g Sep 17, 2024
9f67409
Call reload_model_params only if there's no optimizer state (#10470)
akoumpa Sep 17, 2024
cda4be3
ci: Disable flaky secrets test (#10503)
ko3n1g Sep 17, 2024
df3575a
Add missing import guards for causal_conv1d and mamba_ssm dependencie…
janekl Sep 17, 2024
308eaac
Update doc for fp8 trt-llm export (#10444)
Laplasjan107 Sep 17, 2024
16a1e0c
[SD] TE-DPA: disbale use te-dpa in inference flow. (#10488)
alpha0422 Sep 17, 2024
da993db
Add py-modules to pyproject.toml (#10509)
thomasdhc Sep 17, 2024
a7d1896
Add nemo2 conversion scripts for export (#10375)
meatybobby Sep 17, 2024
848bdfb
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 0bda578 ! (#10518)
ko3n1g Sep 18, 2024
bb4b5c6
Update modelopt to 0.17.0 (#10489)
janekl Sep 18, 2024
0504c92
add save_last_n_optim_states flag (#10098)
dimapihtar Sep 18, 2024
bb721f8
fix asr finetune (#10508)
stevehuang52 Sep 18, 2024
f4ef524
add a feature to drop checkpoint layers (#10200)
dimapihtar Sep 18, 2024
cd861e2
move test (#10529)
ashors1 Sep 18, 2024
07c1c80
handle logging case where grad_norm is None (#10457)
akoumpa Sep 19, 2024
05573d7
Make nemo_run dependency optional (llm/__init__ ) (#10453)
akoumpa Sep 19, 2024
8a244ff
move mamba installation (#10447)
akoumpa Sep 19, 2024
28851be
Update inference tests scripts and models (#10505)
janekl Sep 19, 2024
b721f12
Adds Llama 3.1 405b configurations (#10472)
Elnifio Sep 19, 2024
3e251a0
FP8 plugin recipes (#10208)
maanug-nv Sep 19, 2024
259744e
[nemo-ux] Added nemotron recipes and tests (#10432)
ahmadki Sep 19, 2024
3653bed
Pass mock to GPTDatasetConfig (#10435)
akoumpa Sep 19, 2024
7354740
added energon dataloader for neva training (#10451)
yashaswikarnati Sep 19, 2024
9e1ce6f
Add unit tests for model configs in nemo.collections.llm (#10497)
hemildesai Sep 19, 2024
8d3e561
nemo-ux: optim & model state restore test (#10325)
akoumpa Sep 19, 2024
45ff28f
Add copyright headers to nemo llm examples (#10543)
hemildesai Sep 19, 2024
d2d2aa0
upgrade librosa version to fix librosa.display.specshow issue, matplo…
github-actions[bot] Sep 20, 2024
44d2ae7
replace unbiased with correction (#10555)
nithinraok Sep 20, 2024
bc10d7c
Akoumparouli/nemo ux ckpt conv bugfix (#10558)
akoumpa Sep 21, 2024
d2af2a4
add autoresume to nemo 2 test (#10556)
ashors1 Sep 21, 2024
cfc9a6c
ci: Add original author as reviewer to cherry-pick (#10566)
ko3n1g Sep 22, 2024
e7e55b2
ci: Improve title of cherry-picked PR (#10568)
ko3n1g Sep 22, 2024
0ee4d7e
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to c394f78 ! (#10562)
ko3n1g Sep 22, 2024
ed23cc7
ci: Further improve cherry pick title (#10569)
ko3n1g Sep 22, 2024
f7f7d1a
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 811a26a ! (#10565)
ko3n1g Sep 22, 2024
cb5c2b7
ci: Send link to release page (#10570)
ko3n1g Sep 22, 2024
38c0e3d
ci: Add label to cherry pick PR (#10574)
ko3n1g Sep 22, 2024
eb274ab
[🤠]: Howdy folks, let's bump `Dockerfile.ci` to 8e69382 ! (#10577)
ko3n1g Sep 23, 2024
110db0c
Remove running validating after finetuning (#10560)
huvunvidia Sep 23, 2024
9ed0d6c
bugfix (#10561)
maanug-nv Sep 23, 2024
c6d1b7d
remove exp dir (#10460)
JRD971000 Sep 23, 2024
6400bd5
ci: Send direct alert on failed cherry-pick (#10588)
ko3n1g Sep 23, 2024
7439b13
Add ConfigValidation plugin to nemo.lightning.run (#10541)
hemildesai Sep 23, 2024
c02ea12
Fix pps issue on nemo export (#10544)
oyilmaz-nvidia Sep 23, 2024
53a10a7
fix type error in llm collection (#10552)
stevehuang52 Sep 24, 2024
6023b80
ci: Safer sequence escaping (#10595)
ko3n1g Sep 24, 2024
c4e4157
ci: Fix issues with version bump (#10467)
ko3n1g Sep 24, 2024
810d07f
ci: Add missing test specs (#10597)
ko3n1g Sep 24, 2024
0fad1c1
Extending modelopt spec for TEDotProductAttention (#10523)
janekl Sep 24, 2024
849e7e0
Update Multi_Task_Adapters.ipynb (#10600)
pzelasko Sep 24, 2024
f351f64
Change default for always_save_context to True (#10547)
athitten Sep 24, 2024
70bc06b
Import guard for SimpleMultiModalDataModule (#10592)
akoumpa Sep 24, 2024
9d5a1aa
add support for train_time_interval to consider hydra object (#10559)
nithinraok Sep 24, 2024
877144a
Move update_config_with_dtype_overrides logging to debug (#10602)
akoumpa Sep 24, 2024
0ec10d2
ci: Wrap into quotes (#10616)
ko3n1g Sep 25, 2024
e8304d6
Romeyn/sampler (#10525)
akoumpa Sep 25, 2024
e35a659
Add inference optimization blog post announcement to README (#10623)
pzelasko Sep 25, 2024
dcc3a16
Fix mb_calculator import in lora tutorial (#10624)
BoxiangW Sep 26, 2024
51f47f1
Fix LoRA contiguous tensor (#10611)
cuichenx Sep 26, 2024
016c1e4
Fix Clip initializing issue in r2.0.0 (#10585)
yaoyu-33 Sep 26, 2024
eee0137
Adding T5 to NeMo 2.0 (#10263)
huvunvidia Sep 26, 2024
a98c5ed
ci: Add CICD result feedback (#10629)
ko3n1g Sep 26, 2024
a6c2fef
.nemo conversion bug fix (#10598)
dimapihtar Sep 26, 2024
38e5e09
ci: Fix mention (#10635)
ko3n1g Sep 26, 2024
4a9a226
Fix asr warnings (#10469)
nithinraok Sep 26, 2024
ab4859b
ci: Fix hyperlink for feedback (#10637)
ko3n1g Sep 26, 2024
3e31500
sdpa flag to false & sdpa_backend arg
WoodieDudy Sep 26, 2024
d604f8b
Apply isort and black reformatting
WoodieDudy Sep 26, 2024
41acec1
change arg name
WoodieDudy Sep 26, 2024
e2aab5b
Apply isort and black reformatting
WoodieDudy Sep 26, 2024
5e66cad
Support LoRA in TensorRTMMExporter (#10347)
meatybobby Sep 26, 2024
5b88aaa
Nemo ux HF import tests (#10274)
akoumpa Sep 27, 2024
a725511
chore(ci): Increase shm to 64gb (#10656)
ko3n1g Sep 27, 2024
fdaf607
Add lazy init for export (#10613)
akoumpa Sep 27, 2024
4f59502
Update modelopt layer spec for Mixtral (#10660)
janekl Sep 27, 2024
d51d8b9
Update llm recipe README to add a note about handling multi-process j…
hemildesai Sep 27, 2024
cbb1344
Support Canary parallel inference (#9517)
karpnv Sep 27, 2024
23c7de1
adding resume pretraining to CICD (#10640)
huvunvidia Sep 27, 2024
fd78cc6
Require setuptools>=70 and update deprecated api (#10659)
thomasdhc Sep 30, 2024
9913441
Akoumparouli/fix get tokenizer list (#10596)
akoumpa Sep 30, 2024
d664b74
[NeMo-UX] Support `save_last="link"` (#10548)
ashors1 Sep 30, 2024
c0a05f6
Update the downloading path (#10662)
Victor49152 Sep 30, 2024
32503fd
ci: Stability to CI/CD (#10694)
ko3n1g Oct 1, 2024
7660730
Merge branch 'main' into sdpa-asr
nithinraok Oct 1, 2024
86e60c3
fix config args
WoodieDudy Oct 1, 2024
f712628
Apply isort and black reformatting
WoodieDudy Oct 1, 2024
fd78849
add condition on version
WoodieDudy Oct 3, 2024
1aec220
Apply isort and black reformatting
WoodieDudy Oct 3, 2024
e978045
update condition on version
WoodieDudy Oct 5, 2024
18e30ed
remove condition on torch version
WoodieDudy Oct 5, 2024
c95dc01
Apply isort and black reformatting
WoodieDudy Oct 5, 2024
ca21430
move code to init
WoodieDudy Oct 7, 2024
6741826
Apply isort and black reformatting
WoodieDudy Oct 7, 2024
5dee79f
refactor
WoodieDudy Oct 8, 2024
de3835d
Apply isort and black reformatting
WoodieDudy Oct 8, 2024
8af1241
refactor
WoodieDudy Oct 8, 2024
2 changes: 2 additions & 0 deletions .github/CODEOWNERS
@@ -0,0 +1,2 @@
.github/ @pablo-garay @ko3n1g
Dockerfile.ci @pablo-garay @ko3n1g
35 changes: 35 additions & 0 deletions .github/ISSUE_TEMPLATE/dev_container_bug_report.md
@@ -0,0 +1,35 @@
---
container pulled on date: mm/dd/yyyy
name: Dev container - Bug report
about: Create a report to help us improve
title: ''
labels: bug
assignees: ''

---

**Describe the bug**

A clear and concise description of what the bug is.

**Steps/Code to reproduce bug**

Please list *minimal* steps or code snippet for us to be able to reproduce the bug.

A helpful guide on how to craft a minimal bug report: http://matthewrocklin.com/blog/work/2018/02/28/minimal-bug-reports.


**Expected behavior**

A clear and concise description of what you expected to happen.

**Environment overview (please complete the following information)**

- Environment location: Docker
- Method of install: Please specify exact commands you used to install.
- If method of install is [Docker], provide `docker pull` & `docker run` commands used

**Additional context**

Add any other context about the problem here.
Example: GPU model
7 changes: 7 additions & 0 deletions .github/labeler.yml
@@ -34,6 +34,13 @@ TTS:
- tests/collections/tts/**
- tests/collections/common/tokenizers/text_to_speech/**

Audio:
- nemo/collections/audio/**/*
- examples/audio/**/*
- tutorials/audio/**/*
- docs/source/audio/**/*
- tests/collections/audio/**

core:
- nemo/core/**/*
- tests/core/**
23 changes: 0 additions & 23 deletions .github/scripts/slackHelper.sh

This file was deleted.

76 changes: 76 additions & 0 deletions .github/workflows/_test_template.yml
@@ -0,0 +1,76 @@
name: ~test template

on:
  workflow_call:
    inputs:
      RUNNER:
        type: string
        description: Runner to use for test
        required: true
      TIMEOUT:
        type: number
        description: Max runtime of test in minutes
        required: false
        default: 10
      SCRIPT:
        type: string
        description: Test script to execute
        required: true
      AFTER_SCRIPT:
        type: string
        description: Script to run after main test
        required: false
        default: ":"
      IS_OPTIONAL:
        type: boolean
        description: Failure will not cancel other tests if set to true
        required: false
        default: false
    outputs:
      conclusion:
        description: Conclusion of main test step
        value: ${{ jobs.main.outputs.conclusion }}
      log:
        description: Last 2000 characters of the test step's log
        value: ${{ jobs.main.outputs.log }}
jobs:

  main:
    runs-on: ${{ inputs.RUNNER }}
    outputs:
      conclusion: ${{ steps.main.conclusion }}
      log: ${{ steps.main.outputs.log }}
    steps:
      - name: Docker system cleanup
        run: |
          docker system prune -a --filter "until=48h" --force || true

      - name: Docker pull image
        run: |
          docker pull nemoci.azurecr.io/nemo_container_${{ github.run_id }}

      - id: main
        name: Run main script
        timeout-minutes: ${{ inputs.TIMEOUT }}
        run: |
          mkdir -p ${{ github.run_id }}
          cd ${{ github.run_id }}/
          set +e
          (
            set -e

            docker run --rm --runtime=nvidia --gpus all --shm-size=64g --env TRANSFORMERS_OFFLINE=0 --env HYDRA_FULL_ERROR=1 --volume /mnt/datadrive/TestData:/home/TestData nemoci.azurecr.io/nemo_container_${{ github.run_id }} bash -c '${{ inputs.SCRIPT }}'
          ) 2> >(tee err.log)

          EXIT_CODE=$?

          echo "log=$(tail -c 2000 err.log | base64 -w 0)" >> "$GITHUB_OUTPUT"

          exit $EXIT_CODE

      - uses: "NVIDIA/NeMo/.github/actions/cancel-workflow@main"
        if: failure() && inputs.IS_OPTIONAL == false
      - name: after_script
        if: always() && inputs.AFTER_SCRIPT != ':'
        run: |
          docker run --rm --runtime=nvidia --gpus all --shm-size=64g --env TRANSFORMERS_OFFLINE=0 --env HYDRA_FULL_ERROR=1 --volume /mnt/datadrive/TestData:/home/TestData nemoci.azurecr.io/nemo_container_${{ github.run_id }} bash -c '${{ inputs.AFTER_SCRIPT }}'
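
The main step in this template captures the container's stderr, keeps the last 2000 characters, and base64-encodes them so the text fits on a single line of `$GITHUB_OUTPUT`. The round trip can be sketched roughly as follows. The helper names `encode_log_tail` and `decode_log` are hypothetical and do not appear in the workflow. As a simplification, the sketch writes stderr to a temporary file instead of piping it through `tee`, and it uses `base64 | tr -d '\n'` in place of the GNU-specific `-w 0` flag.

```shell
#!/usr/bin/env bash

# Hypothetical helper: run a command, keep only the last N bytes of its
# stderr, and print them base64-encoded on a single line.
# The command's own exit code is returned to the caller.
encode_log_tail() {
  local max_bytes="$1"
  shift
  local log_file exit_code=0
  log_file="$(mktemp)"
  # stdout is discarded here; only stderr is kept for the log.
  "$@" >/dev/null 2>"$log_file" || exit_code=$?
  tail -c "$max_bytes" "$log_file" | base64 | tr -d '\n'
  rm -f "$log_file"
  return "$exit_code"
}

# Hypothetical consumer side: turn the single-line value back into text.
decode_log() {
  printf '%s' "$1" | base64 --decode
}

# Usage: a failing command still yields its log tail and its exit code.
status=0
encoded="$(encode_log_tail 2000 bash -c 'echo "step failed" >&2; exit 3')" || status=$?
echo "exit code: $status"
decode_log "$encoded"
```

Encoding the tail avoids multi-line values in the output file; a caller that reads the `log` output would need a decode step like `decode_log` before displaying it.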
6 changes: 3 additions & 3 deletions .github/workflows/changelog-build.yml
@@ -1,13 +1,13 @@
name: 'Changelog Build (Release)'

on:
workflow_dispatch:
push:
tags:
- '*'

jobs:
changelog:
if: startsWith(github.ref, 'refs/tags/')
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v2
@@ -39,7 +39,7 @@ jobs:
ignorePreReleases: "false"
failOnError: "false"
fromTag: ${{ steps.previous_tag.outputs.tag_name }}
- toTag: ${{ github.ref_name }}
+ toTag: ${{ github.ref_name || github.sha }}

- name: Print Changelog
run: |
136 changes: 123 additions & 13 deletions .github/workflows/cherry-pick-release-commit.yml
@@ -1,28 +1,138 @@
name: Create PR to main with cherry-pick from release

on:
pull_request_target:
push:
branches:
- 'r*.*.*'
types: ["closed"]
- main

jobs:
cherry-pick-release-commit:
name: Cherry-pick release commit
main:
runs-on: ubuntu-latest
environment:
name: main
steps:
- name: Checkout
uses: actions/checkout@v3
with:
fetch-depth: 0
- name: github-cherry-pick-action v1.0.3
uses: carloscastrojumo/github-cherry-pick-action@bb0869df47c27be4ae4c7a2d93d22827aa5a0054
with:
branch: main
labels: |
cherry-pick
reviewers: |
${{ github.event.pull_request.user.login }}
token: ${{ secrets.PAT }}


- name: Cherry pick
env:
GH_TOKEN: ${{ secrets.PAT }}
run: |
set -x
set +e

git config --global user.email "nemo-bot@nvidia.com"
git config --global user.name "NeMo Bot"

SHA=$(git rev-list --no-merges -n 1 HEAD)
MESSAGE=$(git log -n 1 --pretty=format:%s $SHA)
PR_ID=$(echo $MESSAGE | awk -F'#' '{print $2}' | awk -F')' '{print $1}' )
USERNAME=$(git log -n 1 --pretty=format:%ae $SHA | awk -F'@' '{print $1}')

PR=$(curl -L \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer $GH_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/NVIDIA/NeMo/pulls/$PR_ID)
PR_TITLE=$(echo -E $PR | jq '.title' | tr -d '"')

LABELS=$(echo -E $PR | jq '.labels | [.[].name] | join(",")' | tr -d '"')
AUTHOR=$(echo -E $PR | jq '.user.login' | tr -d '"')

TARGET_BRANCHES=$(echo "$LABELS" | grep -o 'r[^,]*')

if [[ $TARGET_BRANCHES == '' ]]; then
echo Nothing to cherry-pick
exit 0
fi

echo "$TARGET_BRANCHES" | while read -r RELEASE_BRANCH ; do
TARGET_BRANCH_EXISTS_OK=$([[ "$(git ls-remote --heads origin refs/heads/$RELEASE_BRANCH)" != "" ]] && echo true || echo false)

if [[ "$TARGET_BRANCH_EXISTS_OK" == "false" ]]; then
echo Release branch does not yet exist, will not cherry-pick
continue
fi

(
git fetch origin $RELEASE_BRANCH:$RELEASE_BRANCH
git switch --force-create cherry-pick-$PR_ID-$RELEASE_BRANCH $RELEASE_BRANCH
git cherry-pick $SHA
git push -u origin --force cherry-pick-$PR_ID-$RELEASE_BRANCH
git checkout ${CI_DEFAULT_BRANCH:-main}
)

CHERRYPICK_SUCCESSFUL=$?

if [[ $CHERRYPICK_SUCCESSFUL -eq 0 ]]; then
PR_URL="https://github.com/NVIDIA/NeMo/pull/$PR_ID"

PAYLOAD=$(jq \
-n \
-c \
--arg TITLE "Cherry pick \`$PR_TITLE ($PR_ID)\` into \`$RELEASE_BRANCH\`" \
--arg HEAD "cherry-pick-$PR_ID-$RELEASE_BRANCH" \
--arg RELEASE_BRANCH "$RELEASE_BRANCH" \
--arg BODY "[🤖]: Hi @$AUTHOR 👋,<br><br>we've cherry picked #$PR_ID into \`$RELEASE_BRANCH\` for you! 🚀<br><br>Please review and approve this cherry pick at your convenience\!" \
'{
"title": $TITLE,
"head": $HEAD,
"base": $RELEASE_BRANCH,
"body": $BODY
}'
)

NEW_PR=$(curl -L \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer $GH_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/NVIDIA/NeMo/pulls \
-d "$PAYLOAD")

NEW_PR_ID=$(echo -E $NEW_PR | jq '.number')
curl -L \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer $GH_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/NVIDIA/NeMo/pulls/$NEW_PR_ID/requested_reviewers \
-d '{"reviewers":["'$AUTHOR'"]}'

curl -L \
-X POST \
-H "Accept: application/vnd.github+json" \
-H "Authorization: Bearer $GH_TOKEN" \
-H "X-GitHub-Api-Version: 2022-11-28" \
https://api.github.com/repos/NVIDIA/NeMo/issues/$NEW_PR_ID/labels \
-d '{"labels":["Run CICD"]}'

else
URL="https://github.com/NVIDIA/NeMo/pull/$PR_ID"

MESSAGE='{
"blocks": [
{
"type": "section",
"text": {
"type": "mrkdwn",
"text": ":alert: Cherrypick bot 🤖: Hey <@'$USERNAME'>: Cherry-pick of <'$URL'|#'$PR_ID'> failed (3-way merge impossible). Please resolve manually and create a PR.\n\ncc: <@${{ secrets.SLACK_WEBHOOK_ADMIN }}>"
}
}
]
}'

curl -X POST -H "Content-type: application/json" --data "$MESSAGE" ${{ secrets.SLACK_WEBHOOK }}

fi

done



env:
GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
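
The cherry-pick step derives two things from plain strings: the PR number from the commit subject, and the target release branches from the PR's comma-joined labels. The sketch below isolates both steps so they can be tried outside the workflow. The function names are hypothetical. `extract_pr_id` mirrors the `awk` pipeline from the script, while `release_branches_from_labels` is a stricter variant of the `grep -o 'r[^,]*'` filter. It assumes release branches are named like `r2.0.0`, as the former `'r*.*.*'` trigger suggests.

```shell
#!/usr/bin/env bash

# Extract the first "#<number>" from a commit subject, mirroring the awk
# pipeline used in the workflow script.
extract_pr_id() {
  printf '%s\n' "$1" | awk -F'#' '{print $2}' | awk -F')' '{print $1}'
}

# Hypothetical stricter label filter: print one release branch per line,
# keeping only labels that start with r<major>.<minor>.<patch>.
# Assumption: release branches follow that naming scheme.
release_branches_from_labels() {
  printf '%s\n' "$1" | tr ',' '\n' | grep -E '^r[0-9]+\.[0-9]+\.[0-9]+' || true
}

extract_pr_id "Add FSDP for NeMo 2.0 (#9748)"            # prints 9748
release_branches_from_labels "bug,core,r2.0.0,Run CICD"  # prints r2.0.0
```

Matching whole labels rather than any substring starting with `r` keeps labels such as `core` from yielding spurious branch names. When a subject carries two PR references, for example a cherry-picked commit, `extract_pr_id` returns the first one.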