feat: Support API token for scanning hf:// #17682

Merged: 1 commit merged into pola-rs:main on Jul 19, 2024

Conversation

nameexhaustion (Collaborator) commented Jul 17, 2024

Adds support for scanning from private repositories with an API token:

pl.scan_csv(
    "hf://...",
    storage_options={"token": "..."},
)

Alternatively, set the token through an environment variable:

export HF_TOKEN=...
python main.py

ref #17625
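
As a rough illustration of how calling code might combine the two options above, here is a minimal sketch. The helper names `hf_storage_options` and `scan_private_csv` are hypothetical and not part of Polars, and the precedence shown (an explicit token wins over `HF_TOKEN`) is an assumption of this sketch rather than something this PR specifies.

```python
from __future__ import annotations

import os


def hf_storage_options(token: str | None = None) -> dict[str, str] | None:
    """Build a storage_options dict for hf:// scans (hypothetical helper)."""
    # Prefer an explicitly passed token, otherwise fall back to HF_TOKEN.
    # This precedence is a choice made for this sketch, not documented behaviour.
    resolved = token or os.environ.get("HF_TOKEN")
    return {"token": resolved} if resolved else None


def scan_private_csv(path: str, token: str | None = None):
    """Lazily scan a CSV at an hf:// path, passing a token when one is available."""
    # Imported here so the helper above stays usable without polars installed.
    import polars as pl

    return pl.scan_csv(path, storage_options=hf_storage_options(token))
```

Calling `scan_private_csv("hf://...")` with `HF_TOKEN` exported would then pass the token through `storage_options`. With no token available, `storage_options` is `None` and the scan proceeds unauthenticated.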

@github-actions github-actions bot added the enhancement, python, and rust labels Jul 17, 2024

codecov bot commented Jul 17, 2024

Codecov Report

Attention: Patch coverage is 10.00000% with 81 lines in your changes missing coverage. Please review.

Project coverage is 80.40%. Comparing base (235cad3) to head (e6d61dd).
Report is 2 commits behind head on main.

Files                                              Patch %   Lines
crates/polars-io/src/cloud/options.rs              9.37%     58 Missing ⚠️
crates/polars-io/src/path_utils/hugging_face.rs    0.00%     19 Missing ⚠️
crates/polars-io/src/cloud/object_store_setup.rs   0.00%     3 Missing ⚠️
crates/polars-io/src/path_utils/mod.rs             0.00%     1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main   #17682      +/-   ##
==========================================
+ Coverage   80.37%   80.40%   +0.02%     
==========================================
  Files        1500     1500              
  Lines      196605   196640      +35     
  Branches     2793     2793              
==========================================
+ Hits       158016   158103      +87     
+ Misses      38076    38024      -52     
  Partials      513      513              

@nameexhaustion nameexhaustion marked this pull request as ready for review July 17, 2024 08:37
@albertvillanova

Thanks for the fast implementation.

Just a nit comment: do you think the parameter could be named just "token"? It seems a bit redundant to call it "hf_token" when it is already passed to the "hf://" URL...

@nameexhaustion nameexhaustion changed the title from "feat: Support hf_token for scanning hf://" to "feat: Support API token for scanning hf://" Jul 17, 2024
@ritchie46
Member

Can we add an entry in the docstring for discoverability? (Aside from the user guide)

@@ -59,12 +59,14 @@ def read_parquet(
Parameters
----------
source
Path to a file or a file-like object (by "file-like object" we refer to objects
Path(s) to a file or directory
When needing to authenticate for scanning cloud locations, see the `storage_options`
Collaborator Author

update docs for source to point to cloud options

@nameexhaustion nameexhaustion marked this pull request as draft July 18, 2024 05:09
@nameexhaustion nameexhaustion marked this pull request as ready for review July 18, 2024 15:28
@ritchie46 ritchie46 merged commit c4738d2 into pola-rs:main Jul 19, 2024
27 checks passed
@nameexhaustion nameexhaustion deleted the hf-token branch July 19, 2024 12:22
@c-peters c-peters added the accepted label Jul 22, 2024
Labels: accepted, enhancement, python, rust

4 participants