fix: compute fbank on selected device #1529

Merged · 2 commits merged into develop on Nov 7, 2023
Conversation

@hbredin (Member) commented Nov 5, 2023

@asr-pub: this is a more generic attempt at solving #1522, as it uses the internal self.device that can be set by WeSpeakerPretrainedEmbedding.to(device) (and does not force using the GPU when available). Can you please try it out and confirm that this solves your issue?

@juanmc2005: a side effect of this PR is that it should also solve #1518. Can you please try it out and confirm?
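
For reference, here is a minimal sketch of the pattern this PR follows (a hypothetical wrapper with assumed names, not the actual pyannote code): the wrapper remembers whatever device it is moved to via .to(device) and moves the waveform there before extracting fbank features, instead of hard-coding CUDA.

import torch
import torchaudio.compliance.kaldi as kaldi

class FbankEmbedding:
    # hypothetical wrapper illustrating the device-handling pattern
    def __init__(self):
        self.device = torch.device("cpu")

    def to(self, device):
        # remember the device selected by the caller; never force CUDA
        self.device = torch.device(device)
        return self

    def compute_fbank(self, waveform):
        # waveform: (num_channels, num_samples), moved to the selected device
        # so that fbank extraction runs there as well
        return kaldi.fbank(waveform.to(self.device), num_mel_bins=80)

# fbank is computed on whichever device the caller picked
embedding = FbankEmbedding().to("cuda" if torch.cuda.is_available() else "cpu")
features = embedding.compute_fbank(torch.randn(1, 16000))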

@juanmc2005 (Collaborator)

@hbredin I confirm this solves #1518

@asr-pub commented Nov 5, 2023

@hbredin Based on my previous testing, besides compute_fbank (which this PR moves to the selected GPU), torch.vstack also runs very slowly on the CPU.

waveform_batch = torch.vstack(waveforms)
# (batch_size, 1, num_samples) torch.Tensor
mask_batch = torch.vstack(masks)
# (batch_size, num_frames) torch.Tensor

@hbredin (Member, Author) commented Nov 5, 2023

I don't observe this behavior on Google Colab (T4):

import torch
waveforms = [torch.randn(1, 160000) for i in range(32)]

%%timeit
torch.vstack(waveforms)
# 3.63 ms ± 380 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

gpu = torch.device("cuda")

%%timeit
torch.vstack([w.to(gpu) for w in waveforms])
# 6.37 ms ± 55.8 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Can you please double check?

@asr-pub commented Nov 5, 2023

Sure, I'll run the program tomorrow when I'm at work and report the results.

@asr-pub commented Nov 6, 2023

@hbredin Hello, I've tested the behavior of torch.vstack on the CPU. When I run the following code, the program consumes a significant amount of CPU resources, as shown in the figure below: it occupies 128 cores. When multiple processes run torch.vstack on the CPU in parallel, performance becomes very slow. You can run this code on your machine and check the CPU usage with the top command.

import torch
from loguru import logger
import time

a = torch.randn((1, 1, 160000))
b = (a,)

start = time.perf_counter()
for i in range(1000000):
    torch.vstack(b)
end = time.perf_counter()

logger.info(f"takes {end-start:>.2f}")


# output
# 2023-11-06 20:05:48.691 | INFO     | __main__:<module>:13 - takes 44.06

[screenshot: top showing the benchmark process occupying 128 CPU cores]

@hbredin (Member, Author) commented Nov 6, 2023

Thanks. But I think we are still missing a comparison with the GPU, aren't we?

@asr-pub commented Nov 6, 2023

Thanks. But I think we are still missing a comparison with the GPU, aren't we?

GPU A100

import torch
from loguru import logger
import time

a = torch.randn((1, 1, 160000))
a = a.cuda()
b = (a,)

start = time.perf_counter()
for i in range(1000000):
    torch.vstack(b)
end = time.perf_counter()

logger.info(f"takes {end-start:>.2f}")

# output
# 2023-11-06 20:34:31.094 | INFO     | __main__:<module>:14 - takes 12.19


@hbredin (Member, Author) commented Nov 6, 2023

Can you please check what happens when reducing OMP_NUM_THREADS?
See https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#utilize-openmp
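
Not part of this PR, but for reference, a minimal sketch (assuming the benchmark above is saved as a standalone script; the script name and the thread count of 4 are arbitrary): the thread pool can be limited either from the shell or from within Python before any parallel work starts.

# from the shell, before launching the benchmark:
#   export OMP_NUM_THREADS=4
#   python benchmark_vstack.py

# or from inside the script, before any parallel work starts:
import torch

torch.set_num_threads(4)            # intra-op parallelism (OpenMP pool)
torch.set_num_interop_threads(4)    # inter-op parallelism; must be set early
print(torch.get_num_threads(), torch.get_num_interop_threads())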

@asr-pub commented Nov 7, 2023

Can you please check what happens when reducing OMP_NUM_THREADS? See https://pytorch.org/tutorials/recipes/recipes/tuning_guide.html#utilize-openmp

With export OMP_NUM_THREADS=64, it occupies 64 cores. The runtime is reduced, which came as a pleasant surprise to me.

2023-11-07 14:20:28.237 | INFO     | __main__:<module>:13 - takes 25.43

@grazder commented Nov 7, 2023

#1523

I tried the fix from #1523 and it didn't work for me (I described it in that issue), so this fix looks more workable to me.

@hbredin merged commit 40fa67b into develop on Nov 7, 2023
3 checks passed
@hbredin deleted the fix/fbank_device branch on Nov 7, 2023