Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use GITHUB_TOKEN in github_fast_path if available. #13563

Closed
wants to merge 1 commit into from

Conversation

ehuss
Copy link
Contributor

@ehuss ehuss commented Mar 8, 2024

This changes the GitHub fast-path to include an Authorization header if the GITHUB_TOKEN environment variable is set. This is intended to help with API rate limits, which is not too hard to hit in CI. If it hits the rate limit, then the fetch falls back to fetching all branches which is significantly slower for some repositories.

This is also a partial solution for #13555, where the user has a rev="<commit-hash>" where that commit hash points to a PR. The default refspec of ["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"] will not fetch commits from PRs (unless they are reachable from some branch).

There is some risk that if the user has a GITHUB_TOKEN that is invalid or expired, then this will cause github_fast_path to fail where it previously wouldn't. I think that is probably unlikely to happen, though.

This also adds redirect support. Apparently GitHub now issues a redirect for repos/{org}/{repo}/commits/{commit} to /repositories/{id}/commits/{commit}.

The test https::github_works exercises this code path, and will use the token on CI.

@rustbot
Copy link
Collaborator

rustbot commented Mar 8, 2024

r? @weihanglo

rustbot has assigned @weihanglo.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

@rustbot rustbot added A-git Area: anything dealing with git S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 8, 2024
src/cargo/sources/git/utils.rs Outdated Show resolved Hide resolved
@@ -1458,6 +1458,10 @@ fn github_fast_path(
if let Some(local_object) = local_object {
headers.append(&format!("If-None-Match: \"{}\"", local_object))?;
}
if let Ok(token) = std::env::var("GITHUB_TOKEN") {
// Avoid API rate limits if possible.
headers.append(&format!("Authorization: token {token}"))?;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

According to the GitHub REST API doc, both token and Bearer scheme work. The example for GitHub Actions uses Bearer. Should we use that instead?

Note: In most cases, you can use Authorization: Bearer or Authorization: token to pass a token. However, if you are passing a JSON web token (JWT), you must use Authorization: Bearer.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it depends on which scenario we want to address. If this is for CI, then bearer seems more reasonable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, went ahead and changed it.

@ehuss
Copy link
Contributor Author

ehuss commented Mar 13, 2024

I decided to go ahead and update the github test to actually verify this behavior, testing the various code paths for different reference kinds. I also added redirect support, since it now seems to be required.

@ehuss
Copy link
Contributor Author

ehuss commented Mar 13, 2024

github fast path bad response code 403

Hm, I don't yet know why that is happening. I had tested tokens with essentially no permissions. I wonder if the Actions token is different somehow.

@weihanglo
Copy link
Member

Hm, I don't yet know why that is happening. I had tested tokens with essentially no permissions. I wonder if the Actions token is different somehow.

But tests passed on Linux and M1 mac. Let's @bors try again

@weihanglo
Copy link
Member

@bors try

@bors
Copy link
Contributor

bors commented Mar 13, 2024

⌛ Trying commit fe3cb3d with merge bcd8e6d...

bors added a commit that referenced this pull request Mar 13, 2024
Use GITHUB_TOKEN in github_fast_path if available.

This changes the GitHub fast-path to include an Authorization header if the GITHUB_TOKEN environment variable is set. This is intended to help with API rate limits, which is not too hard to hit in CI. If it hits the rate limit, then the fetch falls back to fetching all branches which is significantly slower for some repositories.

This is also a partial solution for #13555, where the user has a `rev="<commit-hash>"` where that commit hash points to a PR. The default refspec of `["+refs/heads/*:refs/remotes/origin/*", "+HEAD:refs/remotes/origin/HEAD"]` will not fetch commits from PRs (unless they are reachable from some branch).

There is some risk that if the user has a GITHUB_TOKEN that is invalid or expired, then this will cause `github_fast_path` to fail where it previously wouldn't. I think that is probably unlikely to happen, though.

This also adds redirect support. Apparently GitHub now issues a redirect for `repos/{org}/{repo}/commits/{commit}` to `/repositories/{id}/commits/{commit}`.

The test `https::github_works` exercises this code path, and will use the token on CI.
@bors
Copy link
Contributor

bors commented Mar 13, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 13, 2024
@Mark-Simulacrum
Copy link
Member

There is some risk that if the user has a GITHUB_TOKEN that is invalid or expired, then this will cause github_fast_path to fail where it previously wouldn't. I think that is probably unlikely to happen, though.

How worried are we about exhausting rate limit?

It can be pretty easy to hit the GitHub rate limit, so I wonder if the number of queries we do will potentially make users unhappy if we exhaust the limit on their token... if this is only for CI, that's probably fine, but locally I could see it being pretty painful.

What about enabling "hidden" usage of credentials with this? For example, if someone has a GitHub token that can access private repositories, just a cargo fetch would "reveal" that here, right? (My sense is that the threat model for cargo wouldn't really be affected here and so this isn't concerning).

@bors
Copy link
Contributor

bors commented Mar 20, 2024

☔ The latest upstream changes (presumably #13561) made this pull request unmergeable. Please resolve the merge conflicts.

@ehuss
Copy link
Contributor Author

ehuss commented Apr 6, 2024

I posted #13718 to fix the immediate problem with the redirect.

@ehuss
Copy link
Contributor Author

ehuss commented Apr 6, 2024

The error in CI is:

---- https::github_works stdout ----
test , tag = "1.3.2"
running `/Users/runner/work/cargo/cargo/target/debug/cargo fetch`
thread 'https::github_works' panicked at tests/testsuite/https.rs:189:14:

test failed running `/Users/runner/work/cargo/cargo/target/debug/cargo fetch`
error: expected to find:
[..]github fast path fetch ed185cfb1c447c1b4bd6ac021c9ec3bb02c9e2f2

did not find in output:
    Updating git repository `https://github.com/rust-lang/bitflags.git`
   0.074247515s DEBUG cargo::sources::git::utils: attempting GitHub fast path for https://api.github.com/repos/rust-lang/bitflags/commits/1.3.2
   0.476275986s DEBUG cargo::sources::git::utils: github fast path bad response code 403
   0.476517613s DEBUG cargo::sources::git::utils: skipping gc as there's only 0 pack files
   0.477348019s DEBUG cargo::sources::git::utils: doing a fetch for https://github.com/rust-lang/bitflags.git
   0.477795554s DEBUG cargo::sources::git::utils: initiating fetch of ["+refs/tags/1.3.2:refs/remotes/origin/tags/1.3.2"] from https://github.com/rust-lang/bitflags.git
   1.604590687s  INFO cargo::sources::git::utils: reset /Users/runner/work/cargo/cargo/target/tmp/cit/t1575/home/.cargo/git/checkouts/bitflags-2d5483ada0800237/ed185cf/.git/ to ed185cfb1c447c1b4bd6ac021c9ec3bb02c9e2f2
   1.606428247s DEBUG cargo::sources::git::utils: doing reset
   1.663015905s DEBUG cargo::sources::git::utils: reset done
   1.664236545s DEBUG cargo::sources::git::utils: update submodules for: "/Users/runner/work/cargo/cargo/target/tmp/cit/t1575/home/.cargo/git/checkouts/bitflags-2d5483ada0800237/ed185cf/"

I've been unable to reproduce that in my own account, though. 😦

bors added a commit that referenced this pull request Apr 6, 2024
Fix github fast path redirect.

This fixes the GitHub fast-path check to look up the sha of a git ref. At some point, GitHub changed the API to redirect to a different URL. Currently cargo is failing the fast-path lookup with 301 response code.

This can be tested in a project with a git dependency, and running `CARGO_LOG=cargo::sources::git::utils=debug cargo fetch` to verify it is picking up the fast path. This currently can't be tested in CI due to #13563.
@ehuss
Copy link
Contributor Author

ehuss commented Apr 6, 2024

Oh, I'm a dummy. The GITHUB_TOKEN env var needs to be set for the test to actually work. I'm somewhat reluctant to do that, though, since the exposure surface would be so large. 😦

@ehuss
Copy link
Contributor Author

ehuss commented Apr 6, 2024

How worried are we about exhausting rate limit?

I was originally not concerned, but I'm a little uncertain now. The Actions token has a limit of 1,000 per hour (or 15,000 for GHEC). I was originally thinking that limit is per run. But now I understand it is per repository and the limit is shared across all runs. That could be a problem, for example, if you have a large number of workflows, and a hefty number of git dependencies, then I could imagine that chewing through hundreds of points relatively quickly.

What about enabling "hidden" usage of credentials with this?

I've been trying to think of ways this could be a security issue, but I haven't been able to come up with something concrete. I do have a fair sense of unease with it, though, since this could be a sensitive thing.


Since the user would need to set GITHUB_TOKEN when running cargo, I'm thinking almost nobody will do that, and thus this PR is probably not worth moving forward with. For some reason, I was originally thinking the env var was always set, but it needs to be done manually. It sucks since the fast path can make a fairly substantial difference in performance.

I'm going to go ahead and close since I don't want to put any more time into it. This might be something we could consider doing for people who really want the fast-path performance, but I'm not sure it is worth it.

@ehuss ehuss closed this Apr 6, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-git Area: anything dealing with git S-waiting-on-author Status: The marked PR is awaiting some action (such as code changes) from the PR author.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants