Backwards breaking change for MP3 loading in 0.12 #2652

Closed
patrickvonplaten opened this issue Aug 26, 2022 · 1 comment

Comments


patrickvonplaten commented Aug 26, 2022

🐛 Describe the bug

It is written pretty clearly in the release notes that there is a breaking change when loading MP3 files:

MP3 decoding is now handled by FFmpeg in sox_io backend. (https://github.com/pytorch/audio/pull/2419, https://github.com/pytorch/audio/pull/2428)
FFmpeg is now used as fallback in sox_io backend, and now MP3 decoding is handled by FFmpeg. To load MP3 audio with torchaudio.load, please install a compatible version of FFmpeg (Version 4 when using an official binary distribution).
Note that, whereas the previous MP3 decoding scheme pads the output audio, the new scheme does not. As a consequence, the new version returns shorter audio tensors.
torchaudio.info now returns num_frames=0 for MP3.
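Since the release notes say `torchaudio.info` now reports `num_frames=0` for MP3, code that needs the length has to decode the file. The following is a minimal sketch of that fallback, not an official recipe. The helper name `count_frames` and its `info_fn` / `load_fn` parameters are hypothetical and exist only so the logic can be exercised without real audio; by default it uses `torchaudio.info` and `torchaudio.load`.

```python
def count_frames(path, info_fn=None, load_fn=None):
    """Return the number of frames in an audio file.

    Hypothetical helper: trusts the metadata when it reports a positive
    frame count, and otherwise decodes the file and measures the result.
    """
    if info_fn is None or load_fn is None:
        # Imported lazily so the helper can be tested with stand-in functions.
        import torchaudio

        info_fn = info_fn or torchaudio.info
        load_fn = load_fn or torchaudio.load

    num_frames = info_fn(path).num_frames
    if num_frames > 0:
        return num_frames

    # Metadata reported 0 (as described for MP3 in 0.12): decode instead.
    # torchaudio.load returns (waveform, sample_rate); with the default
    # channels_first=True the waveform is shaped [channels, frames].
    waveform, _sample_rate = load_fn(path)
    return waveform.shape[-1]
```

Calling `count_frames("./file_example_MP3_1MG.mp3")` costs a full decode whenever the metadata is 0, so it is only worth doing when the length is really needed.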

However, I think it's a pretty extreme change that could have been rolled out more gradually, with an option to fall back to the previous backend (or is this somehow possible?).

When running the following experiment, one can see that it's not just the 0-padding that has changed; there are also numerical differences now. E.g. if you run the following code snippet with versions 0.11 and 0.12 respectively:

#!/usr/bin/env python3
# download https://file-examples.com/storage/fe8bd9dfd063066d39cfd5a/2017/11/file_example_MP3_1MG.mp3
import numpy as np
import torch
import torchaudio

print("torch version", torch.__version__)
print("torchaudio version", torchaudio.__version__)

# Decode the MP3; the sample rate is not needed for this comparison.
array, _ = torchaudio.load("./file_example_MP3_1MG.mp3")

# Report the shape and the sum of absolute sample values for each release.
if "0.11" in torchaudio.__version__:
    print("Array 11 Shape", array.shape)
    print("Array 11 abs sum", np.sum(np.abs(array.numpy())))
if "0.12" in torchaudio.__version__:
    print("Array 12 Shape", array.shape)
    print("Array 12 abs sum", np.sum(np.abs(array.numpy())))

gives:

torch version 1.11.0+cu102
torchaudio version 0.11.0+cu102
Array 11 Shape torch.Size([2, 1198080])
Array 11 abs sum 255783.23

and

torch version 1.12.1+cu102
torchaudio version 0.12.1+cu102
Array 12 Shape torch.Size([2, 1196135])
Array 12 abs sum 255885.97

This is a pretty big numerical difference IMO and too much of a backwards breaking change.
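As a rough sanity check on the numbers reported above, the arithmetic can be sketched as follows. The one assumption not stated in the issue is the MP3 frame size: 1152 samples per frame holds for MPEG-1 Layer III, and the example file may use a different layer or MPEG version. Under that assumption, the 0.11 length is a whole number of MP3 frames, which would be consistent with the padding mentioned in the release notes, while the 0.12 length is not.

```python
# Values copied from the outputs reported in the issue.
frames_011 = 1198080
frames_012 = 1196135
abs_sum_011 = 255783.23
abs_sum_012 = 255885.97

# Assumption: MPEG-1 Layer III, which carries 1152 samples per MP3 frame.
SAMPLES_PER_MP3_FRAME = 1152

# Whole MP3 frames and leftover samples for each decoded length.
whole_011, rest_011 = divmod(frames_011, SAMPLES_PER_MP3_FRAME)
whole_012, rest_012 = divmod(frames_012, SAMPLES_PER_MP3_FRAME)

# How many samples shorter the 0.12 tensor is, per channel.
missing_frames = frames_011 - frames_012

# Relative change in the sum of absolute sample values.
relative_abs_sum_change = (abs_sum_012 - abs_sum_011) / abs_sum_011

print("0.11:", whole_011, "MP3 frames, remainder", rest_011)  # 1040, remainder 0
print("0.12:", whole_012, "MP3 frames, remainder", rest_012)  # 1038, remainder 359
print("samples dropped per channel:", missing_frames)         # 1945
print("relative abs-sum change:", relative_abs_sum_change)
```

Note that the abs sum goes up even though the 0.12 tensor is shorter, so the change cannot be explained by dropped padding alone.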

It broke a test in Transformers: huggingface/transformers#18749 - we're now getting different values for time steps with Wav2Vec2.
It also led to some problems in Datasets: huggingface/datasets#4776

Sorry for the very long message! Is there any way we could still fall back to the previous backend in 0.12 and adapt more slowly to the new (breaking) MP3 decoding?
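Until a fallback exists, one way a downstream test could cope is to key its expected values on the installed torchaudio release. This is only a sketch: the names `release_of`, `EXPECTED_FRAMES` and `expected_frames` are hypothetical, and the table holds just the two lengths measured above for the example file. Parsing the version avoids the substring check used in the snippet, which would also match strings such as `2.0.11`.

```python
import re
from typing import Optional, Tuple

# Frame counts measured in this issue for file_example_MP3_1MG.mp3,
# keyed by (major, minor) torchaudio release. Other releases are unknown.
EXPECTED_FRAMES = {
    (0, 11): 1198080,
    (0, 12): 1196135,
}


def release_of(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '0.12.1+cu102'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def expected_frames(version: str) -> Optional[int]:
    """Return the expected frame count for a release, or None if unmeasured."""
    return EXPECTED_FRAMES.get(release_of(version))
```

In a test, `expected_frames(torchaudio.__version__)` could then pick the reference length, with the test skipped when it returns `None`.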

Versions

PyTorch version: 1.11.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A

OS: Pop!_OS 21.10 (x86_64)
GCC version: (Ubuntu 11.2.0-7ubuntu2) 11.2.0
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.34

Python version: 3.9.7 (default, Jun 22 2022, 20:11:26)  [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.18.10-76051810-generic-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 470.129.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] pytorch-lightning==1.6.1
[pip3] pytorch-nlp==0.5.0
[pip3] pytorch-pretrained-biggan==0.1.1
[pip3] torch==1.11.0
[pip3] torch-scatter==2.0.9
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.12.0+cu113
[conda] Could not collect

(Note that the two runs used torch 1.11 with torchaudio 0.11 and torch 1.12 with torchaudio 0.12, respectively.)

Collaborator

mthrok commented Jul 31, 2023

Hi

Sorry for the inconvenience, and sorry for not replying. I missed this issue when it was filed because I was on leave.
We had to make some unusual changes to how libsox and MP3 are handled.

Recently, we changed the way torchaudio integrates with libsox (#3497). It now relies on an externally installed libsox, which most likely has MP3 support, so I think this is now resolved.

mthrok closed this as completed Jul 31, 2023