ConvTasNet pretrained huggingface model inference setup #697

Open
Rodolfo-S opened this issue May 8, 2024 · 1 comment
Labels
question Further information is requested

Comments

Contributor

Rodolfo-S commented May 8, 2024

I'm trying to run inference with this pretrained ConvTasNet single-source enhancement model from Hugging Face, and I'm getting notably poor output.

I tried passing an ~18.5 s, 16 kHz clean speech clip mixed with -40 dB white Gaussian noise. The output seemed to have about the same SNR as the input, and its scaling ballooned well past +/-1 (max sample value around 1500). Additionally, the speech itself sounds slightly distorted.
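To put numbers on "about the same SNR" and "scaling ballooned", a small measurement helper can be useful. The following is a minimal sketch using only the standard library. It assumes that "-40 dB" means a noise RMS of -40 dB relative to full scale (1.0), which is only one possible reading of the description above. A 440 Hz tone stands in for the clean speech, and all function names are hypothetical.

```python
import math
import random


def rms(x):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def add_white_noise(clean, noise_db, seed=0):
    """Add white Gaussian noise with an RMS of noise_db dB re full scale (assumed reading)."""
    rng = random.Random(seed)
    sigma = 10 ** (noise_db / 20)
    return [v + rng.gauss(0.0, sigma) for v in clean]


def snr_db(reference, estimate):
    """SNR of estimate against reference, treating (estimate - reference) as noise."""
    noise = [e - r for r, e in zip(reference, estimate)]
    return 20 * math.log10(rms(reference) / rms(noise))


def peak(x):
    """Largest absolute sample value."""
    return max(abs(v) for v in x)


# One second of a 440 Hz tone at 16 kHz stands in for the clean speech clip.
sample_rate = 16000
clean = [0.5 * math.sin(2 * math.pi * 440 * n / sample_rate) for n in range(sample_rate)]
noisy = add_white_noise(clean, noise_db=-40)

input_snr = snr_db(clean, noisy)
input_peak = peak(noisy)
print(f"input SNR: {input_snr:.1f} dB, input peak: {input_peak:.3f}")
```

Applying the same `snr_db` and `peak` calls to the model output gives a direct before/after comparison. Note that this plain SNR is sensitive to output gain, so an output peaking near 1500 would need to be rescaled to the reference level before the SNR figure means much.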

I should note that I also tried passing just the clean speech to the model and got similar results, as far as added distortion goes.
image

I'm trying to figure out whether I've configured everything correctly to run inference using LambdaOverlapAdd. I mostly used the "Process large audio files" notebook as a reference. Here's my code.

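import torch
from asteroid.dsp.overlap_add import LambdaOverlapAdd

# Encoder kernel size and stride, taken from the model config on Hugging Face
kernel_size = 32
stride = 16

# Load the pretrained single-source enhancement model
model = torch.hub.load('mpariente/asteroid', 'conv_tasnet', 'JorisCos/ConvTasNet_Libri1Mix_enhsingle_16k')

continuous_nnet = LambdaOverlapAdd(
    nnet=model,
    n_src=1,
    window_size=kernel_size,
    hop_size=stride,
    window=None,
    reorder_chunks=False
)

# Shape (batch, channels, time); cast to float32 to match the model weights
in_tensor = torch.from_numpy(noisy_audio[None, None, :]).float()

# Disable autograd so the output can be converted to NumPy
with torch.no_grad():
    out_tensor = continuous_nnet(in_tensor)

out_wav = out_tensor.numpy().squeeze()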

Here, noisy_audio is the 1-D noisy speech signal, and window_size and hop_size were inferred from the config provided on the Hugging Face page for the model.
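For context on what these values mean in samples and time, here is a minimal sketch of the arithmetic. It is not a confirmed diagnosis. It assumes window_size and hop_size are counted in samples and describe the chunks fed to the model, which should be checked against the Asteroid documentation. The 64000/32000 pair is a purely hypothetical comparison point (4 s chunks with 50% overlap), and the helper names are made up for illustration. The gain calculation illustrates a general overlap-add property and says nothing about how LambdaOverlapAdd normalizes internally.

```python
import math


def samples_to_ms(n_samples, sample_rate):
    """Duration in milliseconds of n_samples at sample_rate."""
    return 1000.0 * n_samples / sample_rate


def num_chunks(n_samples, window_size, hop_size):
    """Chunks needed to cover n_samples, with the last chunk padded if necessary."""
    if n_samples <= window_size:
        return 1
    return 1 + math.ceil((n_samples - window_size) / hop_size)


def overlap_add_gain(window, hop_size, n_frames=8):
    """Sum shifted copies of window and return the gain in the steady-state middle."""
    total = [0.0] * (hop_size * (n_frames - 1) + len(window))
    for frame in range(n_frames):
        for i, w in enumerate(window):
            total[frame * hop_size + i] += w
    return total[len(total) // 2]


sample_rate = 16000
n_samples = int(18.5 * sample_rate)  # length of the ~18.5 s clip

# Values used in the snippet above
window_ms = samples_to_ms(32, sample_rate)
chunks_small = num_chunks(n_samples, 32, 16)

# Hypothetical alternative: 4 s chunks with 50% overlap
chunks_large = num_chunks(n_samples, 64000, 32000)

# Overlap-add gain at 50% overlap: rectangular (no window) versus periodic Hann
rect = [1.0] * 32
hann = [0.5 - 0.5 * math.cos(2 * math.pi * i / 32) for i in range(32)]
gain_rect = overlap_add_gain(rect, 16)
gain_hann = overlap_add_gain(hann, 16)

print(window_ms, chunks_small, chunks_large, gain_rect, gain_hann)
```

Under these assumptions, a 32-sample window covers only 2 ms of audio per chunk, and summing unwindowed chunks at 50% overlap doubles the amplitude, whereas a periodic Hann window sums to unity gain.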

Is there something I'm missing or doing wrong here?

@Rodolfo-S Rodolfo-S added the question Further information is requested label May 8, 2024
@MonolithFoundation

The result looks extremely bad.


2 participants