Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Always send the cached token when user is logged in #1064

Merged
merged 24 commits into from
Sep 22, 2022

Conversation

Wauplin
Copy link
Contributor

@Wauplin Wauplin commented Sep 19, 2022

TLDR; this works with this PR:

# Using cached token from `huggingface-cli login`
file = hf_hub_download("CompVis/stable-diffusion-v1-4", "README.md")

(cc @julien-c @osanseviero @LysandreJik @apolinario @patrickvonplaten @Pierrci)
Fix #926.

(related to this discussion (internal link), this discussion and this discussion as well (internal link))

The goal is to have a centralized way to handle tokens in huggingface_hub (and downstream libraries) that causes as low friction as possible for the users.

The idea here is to keep the rule "if token is not explicitly set and is not required then do not pass it in header so that the user is cannot be tracked on public repos". This PR aims to solve that by making a HEAD call to the Hub (without auth) to check if a repo is public or private. In case of a private repo, the token is then automatically sent to the Hub in all subsequent calls (the "private" property of a repo is cached).

EDIT: strategy has been revisited

The problem that has been formulated is that if a user wants to download files from a gated/private repos, they currently need to pass use_auth_token=True even though the user is logged in. This PR solves that by always sending the cached token if the user is logged in. This is a big change compared to the previous rule that was "don't send the token if not explicitly needed". Hope is to reduce as much friction as possible.

This PR and the build_hf_headers helper aims also to harmonize how tokens are handled and validated across the platform.

Examples:

# Get auth header
headers = build_hf_headers(use_auth_token="hf_***") # explicit token
headers = build_hf_headers(use_auth_token=True) # explicitly use cached token
headers = build_hf_headers(use_auth_token=False) # explicitly don't use cached token
headers = build_hf_headers() # implicit use of the cached token

DISABLE_IMPLICIT_HF_TOKEN=True # to set as env variable
headers = build_hf_headers() # token is not sent
# Cannot use api_org tokens on write calls
headers = build_hf_headers(use_auth_token="api_org_***")
headers = build_hf_headers(use_auth_token="api_org_***", is_write_action=True) # throws an error !

(note: this PR also gets rid of the datasets dependency in the tests but that kinda unrelated. I got an issue with it that I reported in this PR)

(edit: once that's reviewed and merged, implementing #1018 should be straightforward)

@Wauplin Wauplin changed the title Add a standard token to header helper Add a standard helper to generate headers from token Sep 19, 2022
@HuggingFaceDocBuilderDev
Copy link

HuggingFaceDocBuilderDev commented Sep 19, 2022

The documentation is not available anymore as the PR was closed or merged.

@Wauplin Wauplin marked this pull request as ready for review September 19, 2022 16:49
@Wauplin Wauplin added this to the v0.10 milestone Sep 19, 2022
@codecov
Copy link

codecov bot commented Sep 19, 2022

Codecov Report

Base: 84.20% // Head: 84.16% // Decreases project coverage by -0.03% ⚠️

Coverage data is based on head (8a83312) compared to base (26819c7).
Patch coverage: 98.21% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1064      +/-   ##
==========================================
- Coverage   84.20%   84.16%   -0.04%     
==========================================
  Files          38       40       +2     
  Lines        4121     4087      -34     
==========================================
- Hits         3470     3440      -30     
+ Misses        651      647       -4     
Impacted Files Coverage Δ
src/huggingface_hub/__init__.py 75.75% <ø> (ø)
src/huggingface_hub/commands/user.py 33.75% <75.00%> (+0.62%) ⬆️
src/huggingface_hub/_commit_api.py 92.10% <100.00%> (ø)
src/huggingface_hub/_snapshot_download.py 96.00% <100.00%> (+1.26%) ⬆️
src/huggingface_hub/fastai_utils.py 81.18% <100.00%> (-0.19%) ⬇️
src/huggingface_hub/file_download.py 86.86% <100.00%> (+0.04%) ⬆️
src/huggingface_hub/hf_api.py 86.56% <100.00%> (-0.88%) ⬇️
src/huggingface_hub/hub_mixin.py 86.23% <100.00%> (-0.25%) ⬇️
src/huggingface_hub/keras_mixin.py 90.47% <100.00%> (-0.06%) ⬇️
src/huggingface_hub/repository.py 80.33% <100.00%> (ø)
... and 5 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@Wauplin Wauplin changed the title Add a standard helper to generate headers from token Always send the cached token when user is logged in Sep 21, 2022
@lhoestq
Copy link
Member

lhoestq commented Sep 21, 2022

Cool ! I guess datasets can do the same at one point for consistency

@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 21, 2022

@lhoestq Yes ! Best would be to handle all http calls in huggingface_hub but if datasets has a very specific call, it could reuse "build_hf_headers" (from huggingface_hub.utils import build_hf_headers).

Copy link
Contributor

@osanseviero osanseviero left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Very clean PR, thanks a lot for doing this!

docs/source/quick-start.mdx Show resolved Hide resolved
docs/source/quick-start.mdx Outdated Show resolved Hide resolved
src/huggingface_hub/utils/_headers.py Show resolved Hide resolved
src/huggingface_hub/utils/_hf_folder.py Show resolved Hide resolved
tests/conftest.py Show resolved Hide resolved
Copy link
Member

@LysandreJik LysandreJik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool! The resulting use of build_hf_headers is nice, much cleaner this way.
Thanks for working on it!

docs/source/quick-start.mdx Outdated Show resolved Hide resolved
src/huggingface_hub/utils/_headers.py Outdated Show resolved Hide resolved
src/huggingface_hub/utils/_headers.py Show resolved Hide resolved
src/huggingface_hub/utils/_hf_folder.py Show resolved Hide resolved
tests/conftest.py Show resolved Hide resolved
@Wauplin
Copy link
Contributor Author

Wauplin commented Sep 22, 2022

Thanks @julien-c @LysandreJik @osanseviero for the reviews and comments. I made all the requested changes.
Will merge it once the CI is happy :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

hf_hub_download and cached_download should read the token by default
6 participants