Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Don't Remove Support to --once #1339

Closed
t0rr3sp3dr0 opened this issue Sep 14, 2021 · 13 comments
Closed

Don't Remove Support to --once #1339

t0rr3sp3dr0 opened this issue Sep 14, 2021 · 13 comments
Labels
enhancement New feature or request

Comments

@t0rr3sp3dr0
Copy link

On PR #660, the --ephemeral flag was introduced with the intent of replacing the --once flag. But by deregistering and reregistering the runner every time, the number of API calls to GitHub dramatically increases, and eventually, the rate limit is exceeded.

I experienced before the --ephemeral even existed. Every time I wanted to run a job, I was registering a new runner, starting it with --once, and deregistering it after the job was completed. Quickly the maximum number of API calls allowed to a GitHub App was exceeded. I solve the problem by keeping the credentials of runners and only registering and deregistering runners on upscale and downscale operations.

With the removal of the --once flag, it will not possible to run a single job on the runner without removing it when the job is completed. Effectively limiting the number of ephemeral runners that can be used within an hour. Using a PAT, you are limited to 2500 jobs run by a self-hosted runner, assuming the only operations you do are runner registration and deregistration. With a GitHub App, in the best case, you will be able to run 6250 jobs, or 7500 with GitHub Enterprise Cloud, which is not reasonable for a lot of companies.

@TingluoHuang @pje

@t0rr3sp3dr0 t0rr3sp3dr0 added the enhancement New feature or request label Sep 14, 2021
@TingluoHuang
Copy link
Member

With --ephemeral the deregistering of the runner is done by the service and not counted as part of your API rate limit.

@t0rr3sp3dr0
Copy link
Author

Ok, then we have double the number of jobs that I estimated. But even then you are limiting the number of jobs that can be run by self-hosted runners within an hour.

There's also another problem to consider. Even if you have enough number of operations, registering multiple runners at the same time triggers the secondary rate limit. This was also a problem that I faced.

Unless registration of runners are exempt from rate limits, including secondary ones, I think we shouldn't remove this feature.

Also I'm not sure that's interesting for GitHub, as you're making every user of this feature increase their number of API calls towards your severs.

@thboop
Copy link
Collaborator

thboop commented Sep 15, 2021

Hey @t0rr3sp3dr0

The runner registration token is reusable for up to an hour, are you able to cache and reuse them? If so, it should be much easier to keep within api limits. The runner registration only uses a single api limit per runner (so 1 per job).

@t0rr3sp3dr0
Copy link
Author

Yes, I can definitely cache the registration token. Then I would be spending one API operation per repository, as I register runners on repositories and not on the organization.

The limit now becomes very reasonable. That would only be a problem if I had over 10k repositories, but I don't even have 1k repositories.

@t0rr3sp3dr0
Copy link
Author

Actually, I noticed another problem. Maybe you have a suggestion for that as well.

The registration token cache works on our scenario because we run our Linux runners on Kubernetes and we have an operator that deals with everything.

But we also have a fleet of macOS runners and in this case we need to cleanup the environment between jobs. So we start macOS with the runner using --once and after the job is processed, we restore a snapshot to ensure we always have the same environment for any job run. This is possible because when the snapshot is taken the runner is already registered.

But by deregistering it at the end of the job when --ephemeral is used, the runner will not be able to start again after the snapshot is restored, as its credentials got invalided when the runner was deregistered on the server-side.

A workaround would be to have on the machine credentials with power to register a new runner when the snapshot is taken. But this poses a security risk that I'm not willing to take.

You might have some idea as GitHub manages its on fleet of macOS runners and solves this same problem somehow.

@thboop
Copy link
Collaborator

thboop commented Sep 17, 2021

Hey @t0rr3sp3dr0 ,

Thanks for the feedback here. A few things our team is working on:

  • We aren't going to remove --once flag just yet. Our team views ephemeral as the future, but we want to ensure we don't break your current workflows as we move towards it
  • We are looking into possibly providing a separate rate limit for the runner registration endpoint with a higher value, it wouldn't count against your normal rate limit quota. Rate limits allow us to prevent malicious users from doing denial of service attacks, but we want it high enough to not impact legit usage. I'll follow up with more exact details on numbers here in the near future.
  • We are working on updating our docs with best practices to manage a pool of ephemeral runners, we will touch on caching and how to manage tokens in those docs. Typically this is an operator pattern, as you are already using on linux. Our macOS machines get a token passed in in a similar pattern

Please follow up if you have any questions or concerns, and we appreciate the feedback!

@t0rr3sp3dr0
Copy link
Author

Thanks for the info, @thboop!

I just got worried because there’s a comment on the --once flag suggesting it would get removed by October 2021.

public static readonly string Once = "once"; // TODO: Remove in 10/2021

@MichaelJJ
Copy link

Providing a separate rate limit for the runner registration endpoint would be a great help, we're looking to use this for a large enterprise with many repos, so caching the token may not help that much.

@thboop
Copy link
Collaborator

thboop commented Sep 27, 2021

We won't be deleting --once in the near future, and we just rolled out a new rate limit for just the runner registration endpoint that supports (10,000) registrations an hour. If this limit isn't high enough for your use case, please create a support ticket and we can move forward from there.

You can see the status of this new rate limit at the https://api.github.com/rate_limit endpoint if you are authenticated.

"actions_runner_registration": {
        "limit": 10000,
        "used": 1,
        "remaining": 9999,
        "reset": 1632884000
}

Going to close this out.

@thboop thboop closed this as completed Sep 27, 2021
@MichaelJJ
Copy link

MichaelJJ commented Sep 27, 2021

@thboop Does the endpoint limit count for enterprise, organization and repo registration tokens?

I've tested all 3 and I see 0 used for actions_runner_registration but the core used count increases instead.

@MichaelJJ
Copy link

MichaelJJ commented Sep 28, 2021

@thboop As a heads up, tried again today, the actions_runner_registration is not being used, still seeing API limit being consumed. I'm using an OAuth token generated from an app installed in an enterprise.

I requested both enterprise and repository runner tokens.

"actions_runner_registration": {
            "limit": 10000,
            "used": 0,
            "remaining": 10000,
            "reset": 1632871719
        }

@thboop
Copy link
Collaborator

thboop commented Sep 28, 2021

Hey @MichaelJJ,
the limit only applies to registering a runner when using ./config.sh or the windows equivalent. The actual enterprise and repository tokens are cacheable for up to an hour, so it's not as pressing to have a separate rate limit for them. You aren't able to cache registering an ephemeral runner, so we wanted to give a higher limit for that.

If changing or configuring those rate limits would be helpful for you, I'd be curious what your use case is. We could definitely do it, feel free to open an issue and please provide more information on your motivation for it!

@MichaelJJ
Copy link

MichaelJJ commented Sep 28, 2021

We are using the endpoints: https://api.github.com/enterprises/enterprise/actions/runners/registration-token and https://api.github.com/orgs/org/actions/runners/registration-token

We are looking to use the https://github.com/actions-runner-controller/actions-runner-controller and our own tools to create runner tokens for registering runners, also our enterprise has few enterprise users so we have a proxy API on-premise to allow a normal user to get a registration token.

I'll create an issue to have the enterprise/org endpoints added to the separate rate limit.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants