Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

decode mp3 with librosa if torchaudio is > 0.12 as a temporary workaround #4923

Merged
merged 38 commits into from
Sep 20, 2022

Conversation

polinaeterna
Copy link
Contributor

@polinaeterna polinaeterna commented Aug 31, 2022

torchaudio>0.12 fails with decoding mp3 files if ffmpeg<4. currently we ask users to downgrade torchaudio, but sometimes it's not possible as torchaudio version is binded to torch version. as a temporary workaround we can decode mp3 with librosa (though it 60 times slower, at least it works)

another option would be to ask users to install the required version of ffmpeg, but is non-trivial on colab: it's not in apt packages in ubuntu 18 and conda is not preinstalled (with conda it would be easily installable)

  • decode with torchaudio anyway if the version of ffmpeg is correct? it's 60 times faster
  • tests
  • DO NOT FORGET to get back all the tests

see #4776 and #3663 (comment) (there is a Colab notebook to reproduce the error)

@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Aug 31, 2022

The documentation is not available anymore as the PR was closed or merged.

@lhoestq
Copy link
Member

lhoestq commented Sep 9, 2022

Thanks ! Should we still support torchaudio>0.12 if it works ? And if it doesn't we can explain that downgrading is the right solution, or alternatively use librosa

@polinaeterna
Copy link
Contributor Author

polinaeterna commented Sep 12, 2022

@lhoestq

Should we still support torchaudio>0.12 if it works ? And if it doesn't we can explain that downgrading is the right solution, or alternatively use librosa

I'm not sure here, because from the one hand, if torchaudio works - it works 60 times faster then librosa.
But from the other hand, we will get inconsistent behavior (=different results of decoding) for users of torchaudio>=0.12.
I'd better go for using librosa only to avoid inconsistency then. wdyt?

@lhoestq
Copy link
Member

lhoestq commented Sep 12, 2022

It seems a bit too constraining to not allow users who have a working torchaudio 0.12 setup to not use it.

If the issue is about avoiding silent errors if the decoding changes, maybe we can log which back-end is used ? It can even be a warning with performance suggestions ("you're using librosa but torchaudio 0.xx is recommended").

Note that users can still have a requirements.txt or whatever in their projects if they really want full reproducibility (and it's the bare minimum imo)

There are multiple possible back-ends so it's maybe not reasonable to only allow one back-end, especially since each back-end has installation constrains and there's no "best" back-end.

@polinaeterna polinaeterna marked this pull request as ready for review September 19, 2022 18:19
Copy link
Member

@lhoestq lhoestq left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool thank you !

"`pip install librosa`. Note that decoding will be extremely slow in that case."
) from err
# try to decode with librosa for torchaudio>=0.12.0 as a workaround
logger.warning("Decoding mp3 with `librosa` instead of `torchaudio`, decoding is slow.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we should warn this only once ? You can use warnings.warn to do that

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't figure out how to do this. I changed logger.warning to warnings.warn here and tried setting warnings.filterwarnings("once") / warnings.simplefilter("once") but it didn't work, What am I missing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

@polinaeterna polinaeterna Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah lol I used warnings from logger to check if decoding was done with librosa (as they arrays have the same shapes), and now warnings from .warn are not captured in caplog in pytest, so these tests fail. maybe leave it as is?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does the warning always once by default, you don't need to use an extra filter.

And I think with pytest.warns(UserWarning): should work in the test

Copy link
Contributor Author

@polinaeterna polinaeterna Sep 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! fixed it here ebf77d7

I don't know why but for some reason in librosa case warnings are shown at each decoding anyway, I checked it on Colab, see pic. Might it be because of that librosa.load itself gives warnings on decoding: UserWarning: PySoundFile failed. Trying audioread instead.

Anyway, maybe not that important for now? Users can use warnings.filterwarnings("ignore")

image

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Users can still mute those warnings, I don't think we can do something about it.

tests/utils.py Outdated Show resolved Hide resolved
@lhoestq
Copy link
Member

lhoestq commented Sep 20, 2022

Woohoo all green ! Feel free to merge if it's all good for you :)

@polinaeterna polinaeterna changed the title WIP: decode mp3 with librosa if torchaudio is > 0.12 as a temporary workaround decode mp3 with librosa if torchaudio is > 0.12 as a temporary workaround Sep 20, 2022
@polinaeterna polinaeterna merged commit 142404f into huggingface:main Sep 20, 2022
@polinaeterna polinaeterna deleted the workaround-torchaudio-0.12 branch November 2, 2022 11:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants