
Github rate limit #8426

Closed
killianmuldoon opened this issue Mar 30, 2023 · 17 comments
Assignees
Labels
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/flake: Categorizes issue or PR as related to a flaky test.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Contributor

Which jobs are flaking?

CAPI e2e main

Which tests are flaking?

e.g. When testing clusterctl upgrades using ClusterClass (v1.2=>current) [ClusterClass] Should create a management cluster and then upgrade all the providers

Since when has it been flaking?

For some months.

Testgrid link

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-main/1641014387735007232

Reason for failure (if possible)

Multiple jobs in the CAPI e2e test run are flaking due to hitting a rate limit on the GitHub API.

The error looks like:

Error: failed to read "cert-manager.yaml" from provider's repository "cert-manager": failed to download files from GitHub release v1.11.0: rate limit for github api has been reached. Please wait one hour or get a personal API token and assign it to the GITHUB_TOKEN environment variable

Anything else we need to know?

One possible solution (from @sbueringer): vendor the cert-manager yaml in most tests to cut down on this flake. We could consider vendoring it for all tests, but we would lose coverage of the cert-manager download in e2e testing which is a trade-off.
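For illustration, pointing clusterctl at a vendored manifest could go through the cert-manager override in a clusterctl configuration file (a minimal sketch; the local path shown is hypothetical, and this is not necessarily how an eventual fix would be implemented):

```yaml
# clusterctl configuration file (e.g. ~/.config/cluster-api/clusterctl.yaml).
# The path below is a hypothetical location for a vendored copy of the
# manifest; with this override, clusterctl init reads cert-manager.yaml
# from disk instead of downloading it from the GitHub release.
cert-manager:
  url: "/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/data/cert-manager/cert-manager.yaml"
  version: "v1.11.0"
```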

Label(s) to be applied

/kind flake
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/flake Categorizes issue or PR as related to a flaky test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 30, 2023
@fabriziopandini
Member

/triage accepted
/help

@k8s-ci-robot
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 4, 2023
@chrischdi
Member

chrischdi commented Apr 13, 2023

Q1: I think lots of stuff in Prow uses the ghproxy, but it does not look like a real option here.

Q2: What about adding the GITHUB_TOKEN, as we do for periodic-cluster-api-verify-book-links-release-0-4?

https://github.com/kubernetes/test-infra/blob/343d1d18b60a33c522935ea30c234ea4b0141d2c/config/jobs/kubernetes-sigs/cluster-api/cluster-api-periodics-release-0-4.yaml#L125

But cluster-api seems to be quite unique in doing this 🤔; there aren't many jobs mounting a GitHub token in test-infra.
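For reference, wiring a token into a job like that boils down to standard Kubernetes pod spec configuration in the job definition (a rough sketch, assuming a cluster secret named github-token with a key token; this is not the exact content of the linked file):

```yaml
# Sketch of the relevant fragment of a periodic job definition in test-infra.
# The secret name ("github-token") and key ("token") are assumptions.
periodics:
  - name: periodic-cluster-api-e2e-main
    spec:
      containers:
        - command:
            - runner.sh
          args:
            - ./scripts/ci-e2e.sh
          env:
            # Exposed to the test process so clusterctl can make
            # authenticated (higher-rate-limit) GitHub API calls.
            - name: GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: github-token
                  key: token
```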

@killianmuldoon
Contributor Author

Q1: I think lots of stuff in Prow uses the ghproxy, but it does not look like a real option here.

I didn't realize this existed! Why wouldn't it work here, in your opinion? I'm not sure how mounting the token works for that job either; I need to do a bit of digging.

@chrischdi
Member

It's heavily used by Prow components themselves to cache requests and avoid hitting rate limits :D.

I think it's currently not an option, because we have no ability to reconfigure the GitHub URL?! I also don't know whether ghproxy is supposed to be used by prowjobs or only by Prow components.

@sbueringer
Member

sbueringer commented Apr 22, 2023

Based on recent discussions, we're trying to set the GitHub token for the jobs, right?

I would like to avoid introducing a dependency on ghproxy. It feels like depending on an implementation detail of Prow.

@chrischdi
Member

xref:

kubernetes/org#4165 (comment)

@killianmuldoon
Contributor Author

It looks like this is an even more substantial problem on the community EKS cluster. The release-1.3 e2e testing jobs are failing ~80 percent of the time due to this error:

Ref:

The second link isn't obvious from the triage page, but clicking through shows it's the same error, hit at an earlier point during clusterctl init.

@sbueringer
Member

Yup. Looks to me like now we actually have to work on providing a proper GITHUB_TOKEN.

@fabriziopandini
Member

/assign

#9236 and #9237 should address this

@killianmuldoon
Contributor Author

#9236 and #9237 should address this

Thanks for working on a solution!

I haven't reviewed those PRs yet, but I think we'll still have the problem with any release we don't backport them to, e.g. the clusterctl upgrade tests for 1.0, 1.2, and 1.3. This should massively reduce the problem, which is great for our test signal, but I think we should still push for getting a GitHub token on the EKS cluster as a more permanent solution here.

@sbueringer
Member

sbueringer commented Aug 18, 2023

+1 I was thinking exactly the same. It's good going forward, but especially because of v1.0 it will take years before we no longer require a GITHUB_TOKEN in CI.

@fabriziopandini
Member

+1
We can also consider backporting if getting a token becomes a problem.

@sbueringer
Member

sbueringer commented Aug 18, 2023

Yup, let's see. But if possible I would like to avoid new v1.0, v1.2, and v1.3 patch releases.

@killianmuldoon
Contributor Author

As of 27 Oct 2023 this issue does not seem to be occurring anymore, including on the release-1.3 tests which are running on the EKS community cluster. Closing this, but we can reopen if this becomes a problem again in future.

/close

@k8s-ci-robot
Contributor

@killianmuldoon: Closing this issue.

In response to this:

As of 27 Oct 2023 this issue does not seem to be occurring anymore, including on the release-1.3 tests which are running on the EKS community cluster. Closing this, but we can reopen if this becomes a problem again in future.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
