torchaudio 11.0 yields different results than torchaudio 12.1 when loading MP3 #4889

patrickvonplaten · 2022-08-24T16:54:43Z

Describe the bug

When loading Common Voice with torchaudio 0.11.0, the results differ from those with 0.12.1, which leads to problems in transformers. See: huggingface/transformers#18749

Steps to reproduce the bug

If you run the following code once with torchaudio==0.11.0+cu102 and once with torchaudio==0.12.1+cu102, you can see that the tensors differ. This is a pretty big breaking change and makes some integration tests fail in Transformers.

#!/usr/bin/env python3
from datasets import load_dataset
import datasets
import numpy as np
import torch
import torchaudio

print("torch version", torch.__version__)
print("torchaudio version", torchaudio.__version__)

# Run once per environment with save_audio = True, then switch the flags to compare.
save_audio = True
load_audios = False

if save_audio:
    # Stream the first Common Voice sample and decode it at 16 kHz.
    ds = load_dataset("common_voice", "en", split="train", streaming=True)
    ds = ds.cast_column("audio", datasets.Audio(sampling_rate=16_000))
    ds_iter = iter(ds)
    sample = next(ds_iter)

    # The file name is keyed on the torch version (1.11.0 pairs with torchaudio 0.11.0).
    np.save(f"audio_sample_{torch.__version__}", sample["audio"]["array"])
    print(sample["audio"]["array"])

if load_audios:
    # Compare the arrays saved from the two environments.
    array_torch_11 = np.load("/home/patrick/audio_sample_1.11.0+cu102.npy")
    print("Array 11 Shape", array_torch_11.shape)
    print("Array 11 abs sum", np.sum(np.abs(array_torch_11)))
    array_torch_12 = np.load("/home/patrick/audio_sample_1.12.1+cu102.npy")
    print("Array 12 Shape", array_torch_12.shape)
    print("Array 12 abs sum", np.sum(np.abs(array_torch_12)))

Having saved the tensors, the print output is:

torch version 1.12.1+cu102
torchaudio version 0.12.1+cu102
Array 11 Shape (122880,)
Array 11 abs sum 1396.4988
Array 12 Shape (123264,)
Array 12 abs sum 1396.5193
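
To put the shape difference in perspective, here is a small arithmetic sketch based on the two reported lengths. The 16 kHz rate comes from the script above. The 48 kHz source rate is an assumption about the Common Voice MP3 files and is not stated in this issue; the 1152-sample figure is the frame size of MPEG-1 Layer III.

```python
# Lengths reported above for the same sample, decoded at 16 kHz.
SAMPLING_RATE_HZ = 16_000
len_torchaudio_0_11 = 122_880
len_torchaudio_0_12 = 123_264

# How many more samples does the 0.12.1 decode contain, and how long is that?
extra_samples = len_torchaudio_0_12 - len_torchaudio_0_11
extra_ms = 1000 * extra_samples / SAMPLING_RATE_HZ
print("extra samples at 16 kHz:", extra_samples)
print("extra duration in ms:", extra_ms)

# ASSUMPTION (not confirmed in this issue): the source MP3 is sampled at 48 kHz.
ASSUMED_SOURCE_RATE_HZ = 48_000
extra_source_samples = extra_samples * ASSUMED_SOURCE_RATE_HZ // SAMPLING_RATE_HZ

# One MPEG-1 Layer III frame holds 1152 samples.
MP3_FRAME_SAMPLES = 1152
extra_frames = extra_source_samples / MP3_FRAME_SAMPLES
print("extra MP3 frames under the 48 kHz assumption:", extra_frames)
```

Under that assumption the gap would correspond to a single MP3 frame, which would point to a padding or framing difference rather than different audio content. Note that in this output the 0.12.1 array is the longer one.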

Expected results

torchaudio 0.11.0 and 0.12.1 should yield the same results.

Actual results

See above.

Environment info

datasets version: 2.1.1.dev0
Platform: Linux-5.18.10-76051810-generic-x86_64-with-glibc2.34
Python version: 3.9.7
PyArrow version: 6.0.1
Pandas version: 1.4.2


patrickvonplaten · 2022-08-24T16:55:17Z

Maybe we can just pass this along to torchaudio @lhoestq @albertvillanova? It would be great if you could investigate whether the error lies in datasets or in torchaudio.

lhoestq · 2022-08-24T17:09:15Z

torchaudio changed MP3 decoding in 0.12 (which affects Common Voice):

MP3 decoding is now handled by FFmpeg in sox_io backend. (pytorch/audio#2419, pytorch/audio#2428)

FFmpeg is now used as fallback in sox_io backend, and now MP3 decoding is handled by FFmpeg. To load MP3 audio with torchaudio.load, please install a compatible version of FFmpeg (Version 4 when using an official binary distribution).

Note that, whereas the previous MP3 decoding scheme pads the output audio, the new scheme does not. As a consequence, the new version returns shorter audio tensors.
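
A minimal sketch of how one might check whether two decodes differ only by padding. The helper `compare_decodes` is hypothetical (it is not part of datasets or torchaudio), the arrays are synthetic, and it assumes any extra samples sit at the end of the longer array, which may not hold for real decoder output.

```python
import numpy as np


def compare_decodes(a: np.ndarray, b: np.ndarray) -> dict:
    """Summarize how two 1-D decoded waveforms differ (hypothetical helper)."""
    # Compare only the part both arrays have in common.
    n = min(len(a), len(b))
    if n:
        max_diff = float(np.max(np.abs(a[:n] - b[:n])))
    else:
        max_diff = 0.0
    return {
        "length_diff": len(b) - len(a),
        "max_abs_diff_on_overlap": max_diff,
    }


# Synthetic stand-ins: "padded" has two trailing zero samples, "unpadded" does not.
padded = np.array([0.0, 0.1, -0.2, 0.3, 0.0, 0.0], dtype=np.float32)
unpadded = padded[:4].copy()

report = compare_decodes(padded, unpadded)
print(report)
```

If the overlap matches closely and only the length differs, trimming both arrays to the shorter length before comparing may be enough for a tolerance-based test. If the padding sits at the start instead, the arrays would need to be aligned first.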

patrickvonplaten · 2022-10-05T10:11:57Z

Do we have a solution for this now? Should we just upgrade to torchaudio 0.12.0 then?

lhoestq · 2022-10-05T13:54:04Z

datasets supports torchaudio 0.12 if your environment supports reading MP3 with torchaudio, i.e. if you have ffmpeg>=4.
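
A rough way to see which FFmpeg is on the PATH, as a sketch only. The helpers `parse_ffmpeg_major` and `ffmpeg_cli_major` are hypothetical names, and the regular expression assumes the usual `ffmpeg version X.Y...` banner. As far as I understand, torchaudio loads the FFmpeg shared libraries rather than the command-line tool, so this is only a proxy and not a guarantee that `torchaudio.load` can decode MP3.

```python
import re
import shutil
import subprocess
from typing import Optional


def parse_ffmpeg_major(version_output: str) -> Optional[int]:
    """Extract the major version from `ffmpeg -version` output, if present."""
    # Some builds prefix the number with "n" (e.g. "n4.4"); development builds
    # may not contain a plain number at all, in which case None is returned.
    match = re.search(r"ffmpeg version n?(\d+)", version_output)
    return int(match.group(1)) if match else None


def ffmpeg_cli_major() -> Optional[int]:
    """Return the major version of the ffmpeg CLI on PATH, or None if unavailable."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True, check=False)
    return parse_ffmpeg_major(result.stdout)


print("ffmpeg CLI major version:", ffmpeg_cli_major())
```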

mariosasko · 2023-03-02T15:33:04Z

Closing as we no longer use torchaudio for decoding.

patrickvonplaten added the bug label Aug 24, 2022

patrickvonplaten mentioned this issue Aug 24, 2022

RuntimeError when using torchaudio 0.12.0 to load MP3 audio file #4776

Closed

2 tasks

kradonneoh mentioned this issue Jan 31, 2023

Error loading MP3 files from CommonVoice #5488

Closed

mariosasko closed this as completed Mar 2, 2023
