Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support Canary parallel inference #9517

Merged
merged 35 commits into from
Sep 27, 2024
Merged

Support Canary parallel inference #9517

merged 35 commits into from
Sep 27, 2024

Conversation

karpnv
Copy link
Collaborator

@karpnv karpnv commented Jun 21, 2024

What does this PR do ?

Support Canary at transcribe_speech_parallel.py script

Collection: ASR

#python3 ./examples/asr/transcribe_speech_parallel.py model=./canary-1b.nemo predict_ds.manifest_filepath=./manifest.json output_path=/tmp trainer.devices=-1

PR Type:

  • [ V] New Feature
  • Bugfix
  • Documentation

Who can review?

@pzelasko

@github-actions github-actions bot added the ASR label Jun 21, 2024
@karpnv karpnv requested a review from pzelasko June 21, 2024 13:28
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
@@ -109,7 +112,9 @@ class ParallelTranscriptionConfig:
# att_context_size can be set for cache-aware streaming models with multiple look-aheads
att_context_size: Optional[list] = None

trainer: TrainerConfig = TrainerConfig(devices=-1, accelerator="gpu", strategy="ddp")
trainer: TrainerConfig = TrainerConfig(
devices=-1, accelerator="gpu", strategy="ddp", use_distributed_sampler=False
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

be careful with distributed sampler setting here: non-lhotse datasets still likely require True. it might be better to just override this for EncDec model?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, rm use_distributed_sampler=False

@@ -72,7 +72,7 @@ def __getitem__(self, cuts: CutSet) -> tuple[torch.Tensor, torch.Tensor, torch.T
prompts = None
prompts_lens = None

return audio, audio_lens, prompts_with_answers, prompts_with_answers_lens, prompts, prompts_lens
return audio, audio_lens, prompts_with_answers, prompts_with_answers_lens, prompts, prompts_lens, cuts
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not ideal as returning cuts here will transfer the data held in-memory across dataloading worker subprocesses to the main training/inference loop process. we should return cuts.drop_recordings() instead.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@pzelasko
Copy link
Collaborator

pzelasko commented Jun 24, 2024

+ like we talked offline, we have to work around global_rank being set incorrectly before trainer.predict() is called - the dataloader has to be initialized with the correct global_rank (and world_size) in order for lhotse's distributed sampler to work correctly

@titu1994
Copy link
Collaborator

trainer.global_rank is apriori set by PTL in slurm environment. It does not require model to be built or it's functions to be called c

Is this a case where PTL cannot apriori detect global rank ?

karpnv and others added 6 commits July 1, 2024 09:51
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
@karpnv karpnv requested a review from pzelasko July 1, 2024 17:07
Copy link
Contributor

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Jul 16, 2024
Copy link
Contributor

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Jul 24, 2024
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
@karpnv karpnv reopened this Aug 2, 2024
karpnv and others added 4 commits August 2, 2024 14:33
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Copy link
Contributor

github-actions bot commented Sep 8, 2024

This PR is stale because it has been open for 14 days with no activity. Remove stale label or comment or update or this will be closed in 7 days.

@github-actions github-actions bot added the stale label Sep 8, 2024
Copy link
Contributor

This PR was closed because it has been inactive for 7 days since being marked as stale.

@github-actions github-actions bot closed this Sep 15, 2024
@karpnv karpnv reopened this Sep 16, 2024
@karpnv
Copy link
Collaborator Author

karpnv commented Sep 27, 2024

Jenkins

@karpnv karpnv added Run CICD and removed Run CICD labels Sep 27, 2024
Copy link
Contributor

[🤖]: Hi @karpnv 👋,

I just wanted to let you know that, you know, a CICD pipeline for this PR just finished successfully ✨

So it might be time to merge this PR or like to get some approvals 🚀

But I'm just a 🤖 so I'll leave it you what to do next.

Have a great day!

//cc @ko3n1g

Copy link
Contributor

[🤖]: Hi @karpnv 👋,

I just wanted to let you know that, you know, a CICD pipeline for this PR just finished successfully ✨

So it might be time to merge this PR or like to get some approvals 🚀

But I'm just a 🤖 so I'll leave it you what to do next.

Have a great day!

//cc @ko3n1g

1 similar comment
Copy link
Contributor

[🤖]: Hi @karpnv 👋,

I just wanted to let you know that, you know, a CICD pipeline for this PR just finished successfully ✨

So it might be time to merge this PR or like to get some approvals 🚀

But I'm just a 🤖 so I'll leave it you what to do next.

Have a great day!

//cc @ko3n1g

@karpnv karpnv merged commit cbb1344 into main Sep 27, 2024
295 of 302 checks passed
@karpnv karpnv deleted the karpnv/canary_parallel branch September 27, 2024 22:19
monica-sekoyan pushed a commit that referenced this pull request Oct 14, 2024
* add Canary cats

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler=False

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* update lhotse

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* fix global_rank

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* OmegaConf.set_struct

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* review fix

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>

* predict_step return

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
Co-authored-by: karpnv <karpnv@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: pzelasko <pzelasko@users.noreply.github.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
* add Canary cats

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler=False

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* update lhotse

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* fix global_rank

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* OmegaConf.set_struct

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* review fix

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>

* predict_step return

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
Co-authored-by: karpnv <karpnv@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: pzelasko <pzelasko@users.noreply.github.com>
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
tomlifu pushed a commit to tomlifu/NeMo that referenced this pull request Oct 25, 2024
* add Canary cats

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler=False

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* update lhotse

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* fix global_rank

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* OmegaConf.set_struct

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* review fix

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>

* predict_step return

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
Co-authored-by: karpnv <karpnv@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: pzelasko <pzelasko@users.noreply.github.com>
Signed-off-by: Lifu Zhang <tomzhanglf@gmail.com>
hainan-xv pushed a commit to hainan-xv/NeMo that referenced this pull request Nov 5, 2024
* add Canary cats

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler=False

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* rm use_distributed_sampler

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* update lhotse

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* fix global_rank

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* OmegaConf.set_struct

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* review fix

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: karpnv <karpnv@users.noreply.github.com>

* Apply isort and black reformatting

Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>

* predict_step return

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>

---------

Signed-off-by: Nikolay Karpov <nkarpov@nvidia.com>
Signed-off-by: karpnv <karpnv@users.noreply.github.com>
Signed-off-by: Piotr Żelasko <petezor@gmail.com>
Signed-off-by: pzelasko <pzelasko@users.noreply.github.com>
Co-authored-by: karpnv <karpnv@users.noreply.github.com>
Co-authored-by: Piotr Żelasko <petezor@gmail.com>
Co-authored-by: pzelasko <pzelasko@users.noreply.github.com>
Signed-off-by: Hainan Xu <hainanx@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants