This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

replace soundfile with librosa #726

Merged
merged 10 commits into from
Sep 6, 2021

@flozi00 flozi00 commented Sep 3, 2021

What does this PR do?

This PR replaces soundfile with librosa so that more audio formats, such as .mp3, are supported by default.

Fixes #724
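
For illustration, here is a minimal sketch of the kind of loading call this change relies on. `librosa.load` returns the waveform together with its sampling rate and resamples to `sr` when one is given; whether .mp3 actually decodes depends on the audio backends installed alongside librosa. The helper name `load_audio_sample`, the 16 kHz default, and the returned dictionary layout are assumptions made for this sketch, not the code in this PR.

```python
from typing import Any, Dict


def load_audio_sample(path: str, sampling_rate: int = 16000) -> Dict[str, Any]:
    """Load one audio file and return the waveform with its sampling rate.

    Hypothetical helper: the name, default rate, and dict keys are assumptions.
    """
    # Imported lazily so this module still imports when librosa is absent.
    import librosa

    # librosa.load returns (waveform, sampling_rate) and resamples to `sr`.
    waveform, rate = librosa.load(path, sr=sampling_rate)
    return {"input": waveform, "metadata": {"sampling_rate": rate}}
```

Keeping the sampling rate next to the waveform matters here because the collate step later in this thread reads `sampling_rate` from each sample's metadata.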

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

codecov bot commented Sep 3, 2021

Codecov Report

Merging #726 (6ee1fc3) into master (cf86275) will decrease coverage by 0.01%.
The diff coverage is 82.14%.

@@            Coverage Diff             @@
##           master     #726      +/-   ##
==========================================
- Coverage   89.37%   89.36%   -0.02%     
==========================================
  Files         198      198              
  Lines       10552    10565      +13     
==========================================
+ Hits         9431     9441      +10     
- Misses       1121     1124       +3     
Flag        Coverage Δ
unittests   89.36% <82.14%> (-0.02%) ⬇️

Flags with carried forward coverage won't be shown.

Impacted Files                            Coverage Δ
flash/audio/speech_recognition/data.py    76.74% <80.76%> (+0.02%) ⬆️
flash/core/utilities/imports.py           90.47% <100.00%> (ø)

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update cf86275...6ee1fc3.
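
The percentages in the report follow from the hit and line counts. As a quick check that uses only the numbers above (the function name is just for this sketch), the unrounded change comes out at roughly -0.015 percentage points, which sits between the 0.01% in the summary line and the -0.02% in the diff table. How Codecov rounds each figure is not stated in the report, so the sketch leaves the values unrounded.

```python
def coverage_percent(hits: int, lines: int) -> float:
    """Return line coverage as a percentage of covered lines."""
    return 100.0 * hits / lines


# Counts taken from the coverage diff: master first, then this PR.
before = coverage_percent(hits=9431, lines=10552)
after = coverage_percent(hits=9441, lines=10565)

# Change in percentage points; negative means coverage went down.
delta = after - before

# Misses are simply the lines that were not hit.
misses_before = 10552 - 9431
misses_after = 10565 - 9441

print(f"before={before:.4f}% after={after:.4f}% delta={delta:.4f}")
```

The 13 added lines brought 10 new hits and 3 new misses, so the small drop comes from new code being covered slightly less than the existing average.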

flozi00 commented Sep 4, 2021

@ethanwharris

I have a strange error: training on mp3 data works fine, but when I use mp3 files for prediction, librosa is never called to load the audio. With wav files, the print statement executes and everything works fine.

Traceback (most recent call last):
  File "f:\codes\python apps\lightning-flash\flash\core\data\batch.py", line 231, in forward
    samples = self.collate_fn(samples, metadata)
  File "C:\Users\flozi\anaconda3\envs\wav2vec\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "f:\codes\python apps\lightning-flash\flash\core\data\utils.py", line 178, in forward
    return self.func(*args, **kwargs)
  File "f:\codes\python apps\lightning-flash\flash\core\data\process.py", line 383, in collate
    return collate_fn(samples, metadata)
  File "f:\codes\python apps\lightning-flash\flash\audio\speech_recognition\collate.py", line 65, in __call__
    sampling_rates = [sample["sampling_rate"] for sample in metadata]
TypeError: 'NoneType' object is not iterable

During handling of the above exception, another exception occurred: 

Traceback (most recent call last):
  File "test-flash.py", line 15, in <module>
    print(model.predict(sample_path))
  File "f:\codes\python apps\lightning-flash\flash\core\model.py", line 247, in wrapper
    result = func(self, *args, **kwargs)
  File "f:\codes\python apps\lightning-flash\flash\core\model.py", line 418, in predict
    x = data_pipeline.worker_preprocessor(running_stage, collate_fn=dataloader.collate_fn)(x)
  File "C:\Users\flozi\anaconda3\envs\wav2vec\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "f:\codes\python apps\lightning-flash\flash\core\data\batch.py", line 233, in forward
    samples = self.collate_fn(samples)
  File "C:\Users\flozi\anaconda3\envs\wav2vec\lib\site-packages\torch\nn\modules\module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "f:\codes\python apps\lightning-flash\flash\core\data\utils.py", line 178, in forward
    return self.func(*args, **kwargs)
  File "f:\codes\python apps\lightning-flash\flash\core\data\process.py", line 383, in collate
    return collate_fn(samples, metadata)
  File "f:\codes\python apps\lightning-flash\flash\audio\speech_recognition\collate.py", line 65, in __call__
    sampling_rates = [sample["sampling_rate"] for sample in metadata]
TypeError: 'NoneType' object is not iterable

Because the audio never gets loaded, a NoneType error is raised.
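
The failure at the bottom of the traceback can be reproduced in isolation: running that list comprehension over `None` raises the same `TypeError`. The sketch below shows this, plus a guard that would surface a clearer message. `collect_sampling_rates` and `reproduce_error` are hypothetical helpers written for this illustration and are not part of Flash.

```python
from typing import Any, Dict, List, Optional


def collect_sampling_rates(metadata: Optional[List[Dict[str, Any]]]) -> List[int]:
    """Hypothetical guard around the list comprehension from the traceback."""
    if metadata is None:
        # Assumption: missing metadata means the audio file was never loaded.
        raise ValueError("no metadata found; was the audio file actually loaded?")
    return [sample["sampling_rate"] for sample in metadata]


def reproduce_error() -> str:
    """Run the unguarded comprehension on None and return the error message."""
    metadata = None
    try:
        [sample["sampling_rate"] for sample in metadata]
    except TypeError as err:
        return str(err)
    return ""
```

A guard like this does not fix the underlying problem (the file not being loaded), but it points at the cause instead of at the collate step.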

@ethanwharris
Collaborator

Hey @flozi00, great work here! I fixed the MP3 prediction issue; it just needed .mp3 added to the list of valid extensions in the SpeechRecognitionPathsDataSource. Should all be working now 😃
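
To make the fix concrete, here is a rough sketch of how an extension allow-list can silently drop files. The extension tuples and the `filter_valid_files` helper are assumptions for illustration; the actual list and filtering logic in `SpeechRecognitionPathsDataSource` are not shown in this thread.

```python
import os
from typing import Iterable, List, Tuple

# Assumed values for illustration; the real list in Flash is not shown here.
EXTENSIONS_BEFORE: Tuple[str, ...] = (".wav", ".flac")
EXTENSIONS_AFTER: Tuple[str, ...] = EXTENSIONS_BEFORE + (".mp3",)


def filter_valid_files(paths: Iterable[str], extensions: Tuple[str, ...]) -> List[str]:
    """Keep only paths whose (case-insensitive) extension is allowed."""
    return [p for p in paths if os.path.splitext(p)[1].lower() in extensions]


paths = ["speech/a.wav", "speech/b.mp3"]
kept_before = filter_valid_files(paths, EXTENSIONS_BEFORE)
kept_after = filter_valid_files(paths, EXTENSIONS_AFTER)
```

With the assumed "before" list only `speech/a.wav` is kept, so an .mp3 path would never reach the loader; that would be consistent with librosa never being called during prediction.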

@ethanwharris ethanwharris left a comment

Awesome, LGTM 😃

@ethanwharris ethanwharris merged commit 4dd2830 into Lightning-Universe:master Sep 6, 2021
@flozi00 flozi00 deleted the patch-1 branch September 6, 2021 12:43

Successfully merging this pull request may close these issues.

Speech recognition data loading and augmentation
4 participants