Backwards breaking change for MP3 loading in 0.12 #2652

patrickvonplaten · 2022-08-26T08:20:04Z

🐛 Describe the bug

It is written pretty clearly in the release notes that there is a breaking change when loading MP3 files:

MP3 decoding is now handled by FFmpeg in sox_io backend. (https://github.com/pytorch/audio/pull/2419, https://github.com/pytorch/audio/pull/2428)
FFmpeg is now used as fallback in sox_io backend, and now MP3 decoding is handled by FFmpeg. To load MP3 audio with torchaudio.load, please install a compatible version of FFmpeg (Version 4 when using an official binary distribution).
Note that, whereas the previous MP3 decoding scheme pads the output audio, the new scheme does not. As a consequence, the new version returns shorter audio tensors.
torchaudio.info now returns num_frames=0 for MP3.

However, I think it's a pretty extreme change that could have been rolled out more gradually, with an option to fall back to the previous backend (or is this somehow possible?).

When running the following experiment, one can see that it's not just the zero-padding that has changed; there are also numerical differences now. E.g., if you run the following code snippet with versions 0.11 and 0.12 respectively:

#!/usr/bin/env python3
# download https://file-examples.com/storage/fe8bd9dfd063066d39cfd5a/2017/11/file_example_MP3_1MG.mp3
import numpy as np
import torch
import torchaudio

print("torch version", torch.__version__)
print("torchaudio version", torchaudio.__version__)

# Decode the MP3; the returned sample rate is not needed here.
array, _ = torchaudio.load("./file_example_MP3_1MG.mp3")

# Label the output with the torchaudio minor version under test.
if "0.11" in torchaudio.__version__:
    print("Array 11 Shape", array.shape)
    print("Array 11 abs sum", np.sum(np.abs(array.numpy())))
if "0.12" in torchaudio.__version__:
    print("Array 12 Shape", array.shape)
    print("Array 12 abs sum", np.sum(np.abs(array.numpy())))

gives:

torch version 1.11.0+cu102
torchaudio version 0.11.0+cu102
Array 11 Shape torch.Size([2, 1198080])
Array 11 abs sum 255783.23

and

torch version 1.12.1+cu102
torchaudio version 0.12.1+cu102
Array 12 Shape torch.Size([2, 1196135])
Array 12 abs sum 255885.97

This is a pretty big numerical difference IMO and too much of a backwards breaking change.
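
The absolute sums only hint at the difference. To separate the effect of padding from changes in the sample values, the two decoded arrays could be compared frame by frame. Below is a minimal sketch, assuming each array was dumped in its own environment with `np.save(...)` on `array.numpy()`. The helper `compare_decodes`, its `offset` parameter, and the file names in the usage note are hypothetical and not part of torchaudio; the sketch also assumes the two decodes are aligned once `offset` leading frames of the older output are skipped.

```python
import numpy as np


def compare_decodes(old: np.ndarray, new: np.ndarray, offset: int = 0) -> dict:
    """Compare two (channels, frames) arrays over their overlapping frames.

    `offset` (hypothetical knob) is the number of leading frames of `old`
    to skip, in case the older decoder padded at the start.
    """
    frames_old = old.shape[1]
    old = old[:, offset:]
    # Only the frames present in both arrays can be compared.
    overlap = min(old.shape[1], new.shape[1])
    diff = np.abs(old[:, :overlap] - new[:, :overlap])
    return {
        "frames_old": frames_old,
        "frames_new": new.shape[1],
        "overlap": overlap,
        "max_abs_diff": float(diff.max()) if overlap else 0.0,
        "mean_abs_diff": float(diff.mean()) if overlap else 0.0,
    }
```

Usage would look like `compare_decodes(np.load("mp3_011.npy"), np.load("mp3_012.npy"))`. If the only change were trailing zero-padding, `max_abs_diff` would be 0 over the overlap; a non-zero value at every tried `offset` would point to differences in the decoded samples themselves.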

It broke a test in Transformers: huggingface/transformers#18749 - we're now getting different values for time steps with Wav2Vec2.
It also led to some problems in Datasets: huggingface/datasets#4776

Sorry for the very long message! Is there any way we could still fall back to the previous backend in 0.12 and adapt more slowly to the new (breaking) MP3 decoding?

Versions

PyTorch version: 1.11.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Pop!_OS 21.10 (x86_64)
GCC version: (Ubuntu 11.2.0-7ubuntu2) 11.2.0
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.34

Python version: 3.9.7 (default, Jun 22 2022, 20:11:26)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.18.10-76051810-generic-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 470.129.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] pytorch-lightning==1.6.1
[pip3] pytorch-nlp==0.5.0
[pip3] pytorch-pretrained-biggan==0.1.1
[pip3] torch==1.11.0
[pip3] torch-scatter==2.0.9
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.12.0+cu113
[conda] Could not collect

(Note that torch and torchaudio were 1.11 and 0.11 for the first run, and 1.12 and 0.12 for the second.)

mthrok · 2023-07-31T16:30:44Z

Hi

Sorry for the inconvenience and sorry for not replying. I missed this issue when it was filed, as I was on leave.
We had to make some unusual changes with libsox and MP3.

Recently, we changed the way torchaudio integrates with libsox (#3497). It now relies on an externally installed libsox, which most likely has MP3 support, so I think this is now resolved.

mthrok closed this as completed Jul 31, 2023
