
Github rate limit #8426

Closed
killianmuldoon opened this issue Mar 30, 2023 · 17 comments
Assignees
Labels
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/flake: Categorizes issue or PR as related to a flaky test.
triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@killianmuldoon
Contributor

Which jobs are flaking?

CAPI e2e main

Which tests are flaking?

e.g. When testing clusterctl upgrades using ClusterClass (v1.2=>current) [ClusterClass] Should create a management cluster and then upgrade all the providers

Since when has it been flaking?

For some months.

Testgrid link

https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/periodic-cluster-api-e2e-main/1641014387735007232

Reason for failure (if possible)

Multiple jobs in the CAPI e2e test run are flaking due to hitting a rate limit on the GitHub API.

The error looks like:

Error: failed to read "cert-manager.yaml" from provider's repository "cert-manager": failed to download files from GitHub release v1.11.0: rate limit for github api has been reached. Please wait one hour or get a personal API token and assign it to the GITHUB_TOKEN environment variable

Anything else we need to know?

One possible solution (from @sbueringer): vendor the cert-manager yaml in most tests to cut down on this flake. We could consider vendoring it for all tests, but we would lose coverage of the cert-manager download in e2e testing which is a trade-off.
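For illustration, pointing clusterctl at a vendored manifest could go through the cert-manager override in a clusterctl configuration file (a minimal sketch; the local path shown is hypothetical, and this is not necessarily how an eventual fix would be implemented):

```yaml
# clusterctl configuration file (e.g. ~/.config/cluster-api/clusterctl.yaml).
# The path below is a hypothetical location for a vendored copy of the
# manifest; with this override, clusterctl init reads cert-manager.yaml
# from disk instead of downloading it from the GitHub release.
cert-manager:
  url: "/home/prow/go/src/sigs.k8s.io/cluster-api/test/e2e/data/cert-manager/cert-manager.yaml"
  version: "v1.11.0"
```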

Label(s) to be applied

/kind flake
One or more /area label. See https://github.com/kubernetes-sigs/cluster-api/labels?q=area for the list of labels.

@k8s-ci-robot k8s-ci-robot added kind/flake Categorizes issue or PR as related to a flaky test. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Mar 30, 2023
@fabriziopandini
Member

/triage accepted
/help

@k8s-ci-robot
Contributor

@fabriziopandini:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted
/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Apr 4, 2023
@chrischdi
Member

chrischdi commented Apr 13, 2023

Q1: I think lots of stuff in Prow uses the ghproxy, but it does not look like a real option here.

Q2: What about adding the GITHUB_TOKEN, as we do for periodic-cluster-api-verify-book-links-release-0-4?

https://github.com/kubernetes/test-infra/blob/343d1d18b60a33c522935ea30c234ea4b0141d2c/config/jobs/kubernetes-sigs/cluster-api/cluster-api-periodics-release-0-4.yaml#L125

But cluster-api seems to be quite unique in doing this 🤔; there aren't many jobs mounting a GitHub token in test-infra.
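For reference, wiring a token into a job like that boils down to standard Kubernetes pod spec configuration in the job definition (a rough sketch, assuming a cluster secret named github-token with a key token; this is not the exact content of the linked file):

```yaml
# Sketch of the relevant fragment of a periodic job definition in test-infra.
# The secret name ("github-token") and key ("token") are assumptions.
periodics:
  - name: periodic-cluster-api-e2e-main
    spec:
      containers:
        - command:
            - runner.sh
          args:
            - ./scripts/ci-e2e.sh
          env:
            # Exposed to the test process so clusterctl can make
            # authenticated (higher-rate-limit) GitHub API calls.
            - name: GITHUB_TOKEN
              valueFrom:
                secretKeyRef:
                  name: github-token
                  key: token
```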

@killianmuldoon
Contributor Author

Q1: I think lots of stuff in Prow uses the ghproxy, but it does not look like a real option here.

I didn't realize this existed! Why wouldn't it work here, in your opinion? I'm not sure how mounting the token works for that job either; I need to do a bit of digging.

@chrischdi
Member

It's heavily used by Prow components themselves to cache requests and avoid hitting rate limits :D.

I think it's currently not an option, because we have no ability to reconfigure the GitHub URL?! I also don't know whether ghproxy is supposed to be used by prowjobs or only by Prow components.

@sbueringer
Member

sbueringer commented Apr 22, 2023

Based on recent discussions, we're trying to set the GitHub token for the jobs, right?

I would like to avoid introducing a dependency on ghproxy. It feels like depending on an implementation detail of Prow.

@chrischdi
Member

xref:

kubernetes/org#4165 (comment)

@killianmuldoon
Contributor Author

It looks like this is an even more substantial problem on the community EKS cluster. The release-1.3 e2e testing jobs are failing ~80 percent of the time due to this error:

Ref:

The second link isn't obvious from the triage page, but clicking through shows it's the same error, hit at an earlier point during clusterctl init.

@sbueringer
Member

Yup. Looks to me like now we actually have to work on providing a proper GITHUB_TOKEN.

@fabriziopandini
Member

/assign

#9236 and #9237 should address this

@killianmuldoon
Contributor Author

#9236 and #9237 should address this

Thanks for working on a solution!

I haven't reviewed those PRs yet, but I think we'll still have the problem with any release we don't backport them to, e.g. the clusterctl upgrade tests for 1.0, 1.2, and 1.3. This should massively reduce the problem, which is great for our test signal, but I think we should still push for getting a GitHub token on the EKS cluster as a more permanent solution here.

@sbueringer
Member

sbueringer commented Aug 18, 2023

+1 I was thinking exactly the same. It's good going forward, but especially because of v1.0 it will take years before we no longer require a GITHUB_TOKEN in CI.

@fabriziopandini
Member

+1
We can also consider backporting if getting a token becomes a problem.

@sbueringer
Member

sbueringer commented Aug 18, 2023

Yup, let's see. But if possible I would like to avoid new v1.0, v1.2, and v1.3 patch releases.

@killianmuldoon
Contributor Author

As of 27 Oct 2023 this issue does not seem to be occurring anymore, including on the release-1.3 tests which are running on the EKS community cluster. Closing this, but we can reopen if this becomes a problem again in future.

/close

@k8s-ci-robot
Contributor

@killianmuldoon: Closing this issue.

In response to this:

As of 27 Oct 2023 this issue does not seem to be occurring anymore, including on the release-1.3 tests which are running on the EKS community cluster. Closing this, but we can reopen if this becomes a problem again in future.

/close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
