This repository has been archived by the owner on Oct 9, 2023. It is now read-only.

Refactor file loading to use fsspec #1387

Merged
merged 18 commits into from
Jul 14, 2022

Conversation

ethanwharris
Collaborator

@ethanwharris ethanwharris commented Jul 13, 2022

What does this PR do?

Refactors file loading to use fsspec and a series of other enhancements:

  • added formats for audio loading (but drops mp3, couldn't get that to work)
  • added support for ".tsv" files in data frame loading
  • standardised all loaded spectrograms to be float32
  • tests every supported format with every loader
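As a rough illustration of the ".tsv" support and the swappable file opener described above, here is a minimal sketch. It is not the code from this PR: `DELIMITERS`, `read_rows`, and the `opener` parameter are hypothetical names, and the sketch uses only the standard library. The assumption is that, with fsspec installed, `fsspec.open` could be passed as `opener` so that remote URLs are read the same way as local paths.

```python
import csv
import os

# Hypothetical extension-to-delimiter table; not the one used in Flash.
DELIMITERS = {".csv": ",", ".tsv": "\t"}


def read_rows(path, opener=open):
    """Read a delimited text file into a list of dicts.

    The delimiter is chosen from the file extension. `opener` is any callable
    with an `open(path, mode)`-style signature that returns a context manager;
    the built-in `open` is the default. Assumption: `fsspec.open` fits this
    shape, which would let the same function read from URLs.
    """
    ext = os.path.splitext(path)[1].lower()
    if ext not in DELIMITERS:
        raise ValueError(f"Unsupported extension: {ext!r}")
    with opener(path, "r") as f:
        # DictReader uses the first row as the column names.
        return list(csv.DictReader(f, delimiter=DELIMITERS[ext]))
```

Calling `read_rows("train.tsv")` on a local file returns one dict per row, keyed by the header names. Keeping the opener as a parameter means the parsing logic does not need to know where the bytes come from.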

Fixes #1298

Before submitting

  • Was this discussed/approved via a GitHub issue? (no need for typos and docs improvements)
  • Did you read the contributor guideline, Pull Request section?
  • Did you make sure your PR does only one thing, instead of bundling different changes together?
  • Did you make sure to update the documentation with your changes?
  • Did you write any new necessary tests? [not needed for typos/docs]
  • Did you verify new and existing tests pass locally with your changes?
  • If you made a notable change (that affects users), did you update the CHANGELOG?

PR review

  • Is this pull request ready for review? (if not, please submit in draft mode)

Anyone in the community is free to review the PR once the tests have passed.
If we didn't discuss your PR in GitHub issues there's a high chance it will not be merged.

Did you have fun?

Make sure you had fun coding 🙃

@codecov

codecov bot commented Jul 13, 2022

Codecov Report

Merging #1387 (8bb6713) into master (86e3781) will increase coverage by 0.10%.
The diff coverage is 98.41%.

@@            Coverage Diff             @@
##           master    #1387      +/-   ##
==========================================
+ Coverage   92.82%   92.92%   +0.10%     
==========================================
  Files         285      285              
  Lines       12763    12778      +15     
==========================================
+ Hits        11847    11874      +27     
+ Misses        916      904      -12     
Flag Coverage Δ
unittests 92.92% <98.41%> (+0.10%) ⬆️


Impacted Files Coverage Δ
flash/audio/classification/data.py 100.00% <ø> (ø)
flash/audio/speech_recognition/data.py 100.00% <ø> (ø)
flash/core/data/utilities/data_frame.py 100.00% <ø> (+16.12%) ⬆️
flash/image/classification/data.py 98.64% <ø> (ø)
flash/tabular/classification/data.py 97.61% <ø> (ø)
flash/tabular/regression/data.py 97.29% <ø> (ø)
flash/text/classification/data.py 100.00% <ø> (ø)
flash/text/question_answering/data.py 100.00% <ø> (ø)
flash/text/seq2seq/summarization/data.py 100.00% <ø> (ø)
flash/text/seq2seq/translation/data.py 100.00% <ø> (ø)
... and 20 more

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

@ethanwharris ethanwharris marked this pull request as ready for review July 13, 2022 20:03
@ethanwharris ethanwharris changed the title [WIP] Update file loading to use fsspec [WIP] Refactor file loading to use fsspec Jul 13, 2022
@ethanwharris ethanwharris added this to the 0.8.0 milestone Jul 13, 2022
@ethanwharris ethanwharris added the enhancement New feature or request label Jul 13, 2022
@ethanwharris ethanwharris changed the title [WIP] Refactor file loading to use fsspec Refactor file loading to use fsspec Jul 14, 2022
@ethanwharris ethanwharris merged commit fd8cc7f into master Jul 14, 2022
@ethanwharris ethanwharris deleted the feature/fsspec branch July 14, 2022 18:22
Contributor

@krshrimali krshrimali left a comment

Hi, @ethanwharris - wow! This looks great. 🎉 🔥 🚀 Thanks for the summary in the description as well. Sorry for the late comments, I don't really have any suggestions but a couple of questions:

  1. Since URLs are supported, apart from the release notes, we need a way to tell the users that they can pass the URLs instead of downloading the CSV files using download_data. What are your opinions on raising a UserWarning or a message for calls to download_data? Something like:
UserWarning: For Lightning Flash v0.8.0+, URLs are now supported to be passed directly. If your use-case supports it, you can now skip the call to `download_data`, and directly pass the URL to your file.
  2. Curious to know what went wrong with the MP3 files? If you have any info on it, we can probably create an issue and come back to it later.

In a separate PR though, we can also update the examples to skip downloading data, whenever possible, and pass the URLs directly. (this can be an issue for the community to come in ⚡)
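For reference, the warning floated in the first question could be sketched roughly as below. This is only an illustration of the idea, not code from Flash: `download_data_stub` is a hypothetical stand-in that performs no download, and the message is the one quoted above.

```python
import warnings

# Message text taken from the suggestion in this comment.
URL_NOTICE = (
    "For Lightning Flash v0.8.0+, URLs are now supported to be passed directly. "
    "If your use-case supports it, you can now skip the call to `download_data`, "
    "and directly pass the URL to your file."
)


def download_data_stub(url, path="data/"):
    """Hypothetical stand-in for a download helper; performs no network I/O."""
    # stacklevel=2 attributes the warning to the caller's line, not this helper.
    warnings.warn(URL_NOTICE, UserWarning, stacklevel=2)
    return path
```

A warning like this fires on every call unless filtered, so it would also reach users who still have a valid reason to download the file first.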

@ethanwharris
Collaborator Author

ethanwharris commented Jul 15, 2022

@krshrimali Thanks for the comments! Here are my thoughts 😃

  1. I don't think we should do that; download_data is still a valid thing to do, just not always the simplest. We should just make sure that the option of passing a URL is clear from our docs (still some work to do there with docstrings, I think).
  2. It looks like MP3 support was only added in the latest version of soundfile and I couldn't get it to work. We should figure out what the requirements are for it and then it could be added back (as long as we can make it run in our CI).

Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expose better API for file I/O in DataModules
2 participants