feat: Support hf:// in read_(csv|ipc|ndjson) functions #17785

Merged: 8 commits into pola-rs:main from the hf-read-csv branch on Jul 23, 2024

@nameexhaustion nameexhaustion commented Jul 22, 2024

For CSV and IPC, only hf:// paths are dispatched to their scan_* equivalents. We can't dispatch all paths, because that would be a breaking change for storage_options: read_csv and read_ipc currently use fsspec, which uses different configuration keys. For CSV there is also the issue of compressed files.

read_ndjson doesn't suffer from the above issues, so I've added the full set of parameters from scan_ndjson and set it to always dispatch to scan_ndjson.
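
The dispatch rule described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `is_hf_path` and `dispatch_target` are hypothetical names, and the real functions handle parameters and other conditions that are ignored here.

```python
def is_hf_path(source: object) -> bool:
    """Return True if `source` is a string path using the hf:// scheme."""
    return isinstance(source, str) and source.startswith("hf://")


def dispatch_target(fmt: str, source: object) -> str:
    """Name the code path a read_<fmt> call would take under the rule above.

    Hypothetical helper: returns "scan_<fmt>" when the read is forwarded to
    the lazy scan function, or "eager" when the existing fsspec-based
    reader is kept.
    """
    if fmt == "ndjson":
        # read_ndjson always dispatches to scan_ndjson.
        return "scan_ndjson"
    if fmt in ("csv", "ipc"):
        # read_csv / read_ipc only dispatch for hf:// paths, so that
        # storage_options keeps its fsspec meaning for every other path.
        return f"scan_{fmt}" if is_hf_path(source) else "eager"
    raise ValueError(f"unsupported format: {fmt!r}")
```

Under this sketch, `dispatch_target("csv", "s3://bucket/data.csv")` stays on the eager path, which is the point of restricting the dispatch: existing callers passing fsspec-style storage_options see no change.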

@github-actions github-actions bot added enhancement New feature or an improvement of an existing feature python Related to Python Polars rust Related to Rust Polars labels Jul 22, 2024

codecov bot commented Jul 22, 2024

Codecov Report

Attention: Patch coverage is 70.49180% with 18 lines in your changes missing coverage. Please review.

Project coverage is 80.48%. Comparing base (66f0026) to head (4054d66).
Report is 4 commits behind head on main.

Files                                              Patch %   Lines
py-polars/polars/io/csv/functions.py               57.89%    5 Missing and 3 partials ⚠️
py-polars/polars/io/ipc/functions.py               61.53%    3 Missing and 2 partials ⚠️
crates/polars-plan/src/plans/conversion/scans.rs   60.00%    2 Missing ⚠️
py-polars/polars/io/ndjson.py                      80.00%    1 Missing and 1 partial ⚠️
crates/polars-utils/src/io.rs                      66.66%    1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17785      +/-   ##
==========================================
+ Coverage   80.47%   80.48%   +0.01%     
==========================================
  Files        1503     1503              
  Lines      197115   196981     -134     
  Branches     2794     2795       +1     
==========================================
- Hits       158628   158541      -87     
+ Misses      37973    37920      -53     
- Partials      514      520       +6     


@nameexhaustion nameexhaustion marked this pull request as ready for review July 22, 2024 12:28
@nameexhaustion nameexhaustion marked this pull request as draft July 22, 2024 14:30
@nameexhaustion nameexhaustion changed the title feat: Support hf:// in read_csv feat: Support hf:// in read_(csv|ipc|ndjson) functions Jul 23, 2024
# Also dispatch on FORCE_ASYNC, so that this codepath gets run
# through by our test suite during CI.
or (
os.getenv("POLARS_FORCE_ASYNC") == "1"
Collaborator Author


Magical test coverage for the hf:// dispatch 😉
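
The condition in the snippet above can be read as: dispatch when the path is an hf:// path, or when POLARS_FORCE_ASYNC is set to "1" so that CI exercises the same code path. A minimal sketch of that check, assuming hypothetical helper names (`force_async_enabled`, `should_dispatch_to_scan`) and an injectable environment mapping for testing, neither of which appears in the PR:

```python
import os
from typing import Mapping, Optional


def force_async_enabled(environ: Optional[Mapping[str, str]] = None) -> bool:
    """Return True when POLARS_FORCE_ASYNC is set to exactly "1"."""
    env = os.environ if environ is None else environ
    return env.get("POLARS_FORCE_ASYNC") == "1"


def should_dispatch_to_scan(
    source: object, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """Hypothetical predicate mirroring the `... or (os.getenv(...) == "1")` shape."""
    is_hf = isinstance(source, str) and source.startswith("hf://")
    # The env-var branch exists so the test suite runs through the
    # hf:// dispatch path without needing a real hf:// file.
    return is_hf or force_async_enabled(environ)
```

Passing the environment as a mapping keeps the sketch testable without mutating the process environment; the PR snippet itself calls `os.getenv` directly.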

@c-peters
Collaborator

LGTM. Could you create an issue about the breaking change for the storage options? It would be great if we could simplify read_xxx in the future by going through the scan path for all paths, not only hf://.

Member

@ritchie46 ritchie46 left a comment


Alright. Then for 2.0 we should see if we can move more from fsspec into our engine.

Thanks!

@ritchie46 ritchie46 merged commit 69b2064 into pola-rs:main Jul 23, 2024
28 checks passed
atigbadr pushed a commit to atigbadr/polars that referenced this pull request Jul 23, 2024
@nameexhaustion
Collaborator Author

> Could you create an issue about the breaking change for the storage options.

Issue created at #17815

@nameexhaustion nameexhaustion deleted the hf-read-csv branch July 28, 2024 10:59
@c-peters c-peters added the accepted Ready for implementation label Jul 28, 2024