
Demanding CPU Utilization? #5

Open

JeromeNi opened this issue Jan 10, 2022 · 3 comments

@JeromeNi
I created my own dataset based on the provided templates. The training set consists of around 100 hours of audio (about 18,000 utterances), while the dev evaluation set consists of around 2,000 utterances. I also extracted all mel-spectrograms beforehand instead of computing them on the fly. I trained with 1 GPU (V100).

First, I found that even loading the dev set takes around an hour. During training, the code sometimes just hangs at a single step, and during that time I see 100% CPU utilization for all the workers while GPU utilization in nvidia-smi is 0%. I tried setting num_workers to 0, 8, and 80 (the total number of CPUs), and this happens in all three cases. With 80 workers, I only managed to complete an initial validation check and two training epochs in around 10 hours.

Is this normal, and is there any way to speed it up?

Thanks for your help!

@dhchoi99 (Owner)

Yes, that also happened to me. I found that the Parselmouth/Praat augmentation hangs for some unknown reason. It seems the problem occurs when an inappropriate voice segment is passed to Parselmouth.

Switching the order in the dataset code from (1) crop the audio, (2) augment the audio to (1) augment the audio, (2) crop the audio worked fine for me.
I fixed it in 2a234ba, so please try the most recent master branch.
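
For reference, a minimal sketch of that reordering is below (function and argument names are hypothetical; the actual change is in the repo's dataset code). The point is that the Praat-based perturbation always sees the full utterance instead of a possibly too-short crop:

```python
# Hypothetical sketch of the reordering described above. `augment_fn`
# stands in for the Parselmouth/Praat-based perturbation.
import random
from typing import Callable

import numpy as np


def old_order(wav: np.ndarray, crop_len: int,
              augment_fn: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """1. crop, 2. augment -- Praat may hang on a short or degenerate crop."""
    start = random.randint(0, max(0, len(wav) - crop_len))
    return augment_fn(wav[start:start + crop_len])


def new_order(wav: np.ndarray, crop_len: int,
              augment_fn: Callable[[np.ndarray], np.ndarray]) -> np.ndarray:
    """1. augment, 2. crop -- the perturbation always sees the full utterance."""
    augmented = augment_fn(wav)
    start = random.randint(0, max(0, len(augmented) - crop_len))
    return augmented[start:start + crop_len]
```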

@dhchoi99 (Owner) commented Feb 9, 2022

After fixing the issue from YannickJadoul/Parselmouth#68 (f7bddba),
dataloader speed seems to be a problem related to hardware specs.

For me, when using

  • 1 Tesla V100 GPU + 32 Intel(R) Xeon(R) Silver 4110 CPUs,
  • batch_size=32, num_workers=16,
  • no pre-extracted mel-spectrograms,

I averaged about 1.4 s/it while looping over the first epoch.

Although it may depend on the length of each utterance, your case sounds somewhat odd.
Could you share your hardware specs?
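
For context, the loader configuration behind those numbers is roughly the sketch below (the dataset class is a dummy placeholder, not the repo's actual Dataset, and the real settings live in the config files):

```python
# Rough, self-contained sketch of the DataLoader settings used for the
# timing above; only batch_size and num_workers come from the comment.
import torch
from torch.utils.data import DataLoader, Dataset


class DummyWavDataset(Dataset):
    """Placeholder standing in for the repo's actual training dataset."""

    def __len__(self) -> int:
        return 18000

    def __getitem__(self, idx: int) -> torch.Tensor:
        return torch.zeros(16000)  # stand-in for one second of audio


loader = DataLoader(
    DummyWavDataset(),
    batch_size=32,    # matches the benchmark above
    num_workers=16,   # matches the benchmark above
    shuffle=True,
)
```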

dhchoi99 reopened this Feb 9, 2022
@JeromeNi (Author) commented Feb 12, 2022

I was previously using an IBM Power9 (ppc64le) node with 80 CPUs and 1 of the 4 Tesla V100s on that node, as multi-GPU training got stuck during initialization. I tried num_workers values from 0 to 80, and more workers was always faster, but never faster than 14 s/it, and it was prone to getting stuck on some iterations for much, much longer. However, that might be because I was adapting NANSY to 16 kHz LibriSpeech utterances, which are very long.

I haven't tried the newest commit here yet; I will check whether the issue is resolved once some server bandwidth opens up between my current projects.
