It is written pretty clearly in the release notes that there is a breaking change when loading MP3 files:
MP3 decoding is now handled by FFmpeg in sox_io backend. (https://github.com/pytorch/audio/pull/2419, https://github.com/pytorch/audio/pull/2428)
FFmpeg is now used as fallback in sox_io backend, and now MP3 decoding is handled by FFmpeg. To load MP3 audio with torchaudio.load, please install a compatible version of FFmpeg (Version 4 when using an official binary distribution).
Note that, whereas the previous MP3 decoding scheme pads the output audio, the new scheme does not. As a consequence, the new version returns shorter audio tensors.
torchaudio.info now returns num_frames=0 for MP3.
However, I think it's a pretty extreme change that could have been rolled out more gradually, with an option to fall back to the previous backend (or is this somehow possible?).
When running the following experiments, one can see that it's not just the 0-padding that has changed; there are also numerical differences now. E.g. if you run the following code snippet (with the respective versions 0.11 and 0.12):
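The snippet from the original report is not reproduced here. As a rough sketch of that kind of comparison, assuming the decoded samples from each torchaudio version have been dumped to plain Python lists (e.g. via `waveform[0].tolist()`), something like the following would report both the length change and the numerical difference. The function name `compare_waveforms` and the toy data are hypothetical, and the comparison assumes the two decodings are aligned at the first sample, which may not hold if the old decoder padded at the start.

```python
from typing import Sequence


def compare_waveforms(old: Sequence[float], new: Sequence[float]) -> dict:
    """Summarize how two decodings of the same file differ.

    Assumes both sequences start at the same sample; only the
    overlapping prefix is compared numerically.
    """
    overlap = min(len(old), len(new))
    max_abs_diff = max(
        (abs(a - b) for a, b in zip(old[:overlap], new[:overlap])),
        default=0.0,
    )
    return {
        "length_old": len(old),
        "length_new": len(new),
        "extra_samples": len(old) - len(new),  # > 0 if the old output is longer
        "max_abs_diff": max_abs_diff,
    }


# Toy data standing in for samples dumped from each version;
# these are not real measurements from torchaudio 0.11 or 0.12.
old_samples = [0.0, 0.5, -0.25, 0.125, 0.0, 0.0]
new_samples = [0.0, 0.5, -0.25, 0.25]

report = compare_waveforms(old_samples, new_samples)
print(report)
```

A non-zero `extra_samples` on its own would match the padding change described in the release notes; a non-zero `max_abs_diff` over the overlap points to an actual difference in the decoded values.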
Sorry for the very long message! Is there any way we could still fall back to the previous backend in 0.12 and adapt more slowly to the new (breaking) MP3 decoding?
Versions
PyTorch version: 1.11.0+cu102
Is debug build: False
CUDA used to build PyTorch: 10.2
ROCM used to build PyTorch: N/A
OS: Pop!_OS 21.10 (x86_64)
GCC version: (Ubuntu 11.2.0-7ubuntu2) 11.2.0
Clang version: Could not collect
CMake version: version 3.18.4
Libc version: glibc-2.34
Python version: 3.9.7 (default, Jun 22 2022, 20:11:26) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-5.18.10-76051810-generic-x86_64-with-glibc2.34
Is CUDA available: True
CUDA runtime version: 11.3.109
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 3070 Laptop GPU
Nvidia driver version: 470.129.06
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
Versions of relevant libraries:
[pip3] mypy-extensions==0.4.3
[pip3] numpy==1.21.6
[pip3] pytorch-lightning==1.6.1
[pip3] pytorch-nlp==0.5.0
[pip3] pytorch-pretrained-biggan==0.1.1
[pip3] torch==1.11.0
[pip3] torch-scatter==2.0.9
[pip3] torchaudio==0.11.0
[pip3] torchmetrics==0.6.0
[pip3] torchvision==0.12.0+cu113
[conda] Could not collect
(note that torch and torchaudio were tested at both 1.11/1.12 and 0.11/0.12, respectively)
Sorry for the inconvenience, and sorry for not replying sooner. I missed this issue when it was filed because I was on leave.
We had to make some unusual changes to how libsox and MP3 are handled.
Recently, we changed the way torchaudio integrates with libsox (#3497). It now relies on an externally installed libsox, which most likely has MP3 support, so I think this is now resolved.
🐛 Describe the bug
When running the following experiments, one can see that it's not just the 0-padding that has changed; there are also numerical differences now. E.g. if you run the following code snippet (with the respective versions 0.11 and 0.12):
gives:
and
This is a pretty big numerical difference IMO and too much of a backwards breaking change.
It broke a test in Transformers: huggingface/transformers#18749 - we're now getting different values for time steps with Wav2Vec2.
It also led to some problems in Datasets: huggingface/datasets#4776