ConvTasNet pretrained huggingface model inference setup #697

Rodolfo-S · 2024-05-08T00:18:49Z

I'm trying to run inference with this pretrained ConvTasNet single-source enhancement model on Hugging Face, and I'm getting notably poor output.

I tried passing an ~18.5 s, 16 kHz clean speech clip mixed with -40 dB white Gaussian noise. The output seemed to have about the same SNR as the input, and its scale ballooned well past ±1 (max sample value around 1500). Additionally, the speech itself sounds slightly distorted.
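To make "about the same SNR" and the peak value concrete, one option is to measure them directly. The sketch below assumes that "-40 dB" means the noise power sits 40 dB below the speech power (a +40 dB SNR mixture), which is an interpretation, not something stated in the post. `mix_at_snr` and `snr_db` are hypothetical helpers (not Asteroid functions), and the sine wave only stands in for the speech clip.

```python
import numpy as np


def mix_at_snr(clean: np.ndarray, snr_db: float, rng: np.random.Generator) -> np.ndarray:
    """Add white Gaussian noise so that 10*log10(P_clean / P_noise) == snr_db."""
    noise = rng.standard_normal(clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # Scale the noise so its power equals p_clean / 10**(snr_db / 10).
    noise *= np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + noise


def snr_db(reference: np.ndarray, estimate: np.ndarray) -> float:
    """SNR of `estimate` w.r.t. `reference`, treating their difference as noise."""
    residual = estimate - reference
    return 10 * np.log10(np.sum(reference ** 2) / np.sum(residual ** 2))


sr = 16_000
t = np.arange(int(18.5 * sr)) / sr
clean = 0.5 * np.sin(2 * np.pi * 220 * t)  # stand-in for the speech clip (assumption)
noisy = mix_at_snr(clean, snr_db=40.0, rng=np.random.default_rng(0))

print(f"input SNR: {snr_db(clean, noisy):.1f} dB")  # 40.0 dB by construction
print(f"peak |x|: {np.max(np.abs(noisy)):.3f}")
```

Note that this plain SNR is scale-sensitive: an output that is a clean copy of the speech but 1500 times louder would still score very poorly against the reference.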

I should note that I also tried passing just the clean speech to the model and got similar results, as far as added distortion goes.
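Because the output gain is suspect, a scale-invariant measure is a fairer way to quantify the distortion heard on clean input. Below is a minimal NumPy sketch of SI-SDR (project the estimate onto the reference, then compare target energy with residual energy), plus a least-squares rescale that maps an arbitrarily scaled output back to the reference level. Whether this pretrained model was trained with a scale-invariant loss, which would leave its output gain unconstrained, is an assumption worth checking in its training config; `si_sdr` and `rescale_to` are hypothetical helper names.

```python
import numpy as np


def si_sdr(reference: np.ndarray, estimate: np.ndarray, eps: float = 1e-8) -> float:
    """Scale-invariant SDR in dB (zero-mean signals, projection onto reference)."""
    reference = reference - reference.mean()
    estimate = estimate - estimate.mean()
    # Scaling of the reference that best explains the estimate (least squares).
    alpha = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = alpha * reference
    residual = estimate - target
    return 10 * np.log10((np.sum(target ** 2) + eps) / (np.sum(residual ** 2) + eps))


def rescale_to(reference: np.ndarray, estimate: np.ndarray) -> np.ndarray:
    """Apply the least-squares gain that brings `estimate` to the level of `reference`."""
    gain = np.dot(estimate, reference) / np.dot(estimate, estimate)
    return gain * estimate


rng = np.random.default_rng(0)
clean = rng.standard_normal(16_000)
estimate = 1500.0 * clean + 0.01 * rng.standard_normal(16_000)  # huge gain, small error
print(f"SI-SDR: {si_sdr(clean, estimate):.1f} dB")  # not affected by the 1500x gain
print(f"peak after rescale: {np.max(np.abs(rescale_to(clean, estimate))):.2f}")
```

Running the same `si_sdr` on clean input versus model output would separate genuine distortion from a pure gain change.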

I'm trying to figure out whether I've configured everything correctly for inference with LambdaOverlapAdd. I mostly used the "Process large audio files" notebook as a reference. Here's my code:

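```python
import torch
from asteroid.dsp.overlap_add import LambdaOverlapAdd

# Encoder kernel_size and stride (in samples), taken from the model config on Hugging Face.
kernel_size = 32
stride = 16

model = torch.hub.load('mpariente/asteroid', 'conv_tasnet', 'JorisCos/ConvTasNet_Libri1Mix_enhsingle_16k')
model.eval()

# Wrap the model so the input is cut into chunks of `window_size` samples,
# each chunk is passed through the model, and the outputs are overlap-added.
continuous_nnet = LambdaOverlapAdd(
    nnet=model,
    n_src=1,
    window_size=kernel_size,
    hop_size=stride,
    window=None,
    reorder_chunks=False
)

# Shape (batch=1, channels=1, time); the model weights are float32.
in_tensor = torch.from_numpy(noisy_audio[None, None, :]).float()
with torch.no_grad():
    out_tensor = continuous_nnet(in_tensor)

out_wav = out_tensor.numpy().squeeze()
```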

Here, noisy_audio is the 1-D noisy speech signal, and window_size and hop_size were taken from the kernel_size and stride in the config on the model's Hugging Face page.

Is there something I'm missing or doing wrong here?
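One detail worth checking against the code above: as far as I understand Asteroid's `LambdaOverlapAdd`, `window_size` and `hop_size` are the length and step, in samples, of the chunks the waveform is cut into before each chunk goes through `nnet`, whereas the config's `kernel_size`/`stride` describe the encoder's much shorter internal frames. The sketch below is not Asteroid's implementation. It is a plain NumPy overlap-add with a periodic Hann window and a hypothetical `overlap_add` helper, plus the arithmetic for an 18.5 s, 16 kHz clip. The 4 s / 2 s chunking is an arbitrary illustrative choice, not a recommended setting.

```python
from typing import Callable

import numpy as np


def overlap_add(x: np.ndarray, fn: Callable[[np.ndarray], np.ndarray],
                window_size: int, hop_size: int) -> np.ndarray:
    """Run `fn` on overlapping chunks of `x` and blend the outputs back together.

    Each processed chunk is weighted by a periodic Hann window, and the sum is
    divided by the accumulated window weight, so hop_size = window_size // 2
    reconstructs `x` exactly when `fn` is the identity. Edges are zero-padded.
    """
    pad = window_size
    padded = np.pad(x, (pad, pad + window_size))
    out = np.zeros_like(padded)
    norm = np.zeros_like(padded)
    window = np.hanning(window_size + 1)[:-1]  # periodic Hann
    for start in range(0, len(padded) - window_size + 1, hop_size):
        chunk = padded[start:start + window_size]
        out[start:start + window_size] += fn(chunk) * window
        norm[start:start + window_size] += window
    out /= np.maximum(norm, 1e-8)
    return out[pad:pad + len(x)]


sr = 16_000
n_samples = int(18.5 * sr)
for window_size, hop_size in [(32, 16), (4 * sr, 2 * sr)]:  # second pair is hypothetical
    n_chunks = 1 + (n_samples - window_size) // hop_size
    print(f"window {window_size} samples = {1000 * window_size / sr:.0f} ms, "
          f"{n_chunks} chunks for an 18.5 s clip")  # 32 samples at 16 kHz is only 2 ms

x = np.random.default_rng(0).standard_normal(n_samples)
y = overlap_add(x, lambda c: c, window_size=4 * sr, hop_size=2 * sr)
print("identity round-trip exact:", np.allclose(x, y))
```

If `window_size` is indeed a chunk length in samples, 32 would hand the network 2 ms snippets, far shorter than the utterances such a model normally sees.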


MonolithFoundation · 2024-12-13T09:52:19Z

The result looks extremely bad.

Rodolfo-S added the "question" (Further information is requested) label on May 8, 2024.
