
Proposal: GitHub self-hosted runners for specific use cases #1162

Closed
dmathieu opened this issue Sep 9, 2022 · 13 comments

Comments

@dmathieu
Member

dmathieu commented Sep 9, 2022

The OpenTelemetry Go SIG has benchmark tests that we currently run manually when needed, but would like to run automatically.
We used to run those benchmark tests with GH Actions, but an action running on GitHub's infrastructure often has noisy neighbours, making the results unreliable. So we disabled them.

After discussing with @TylerHelmuth, he and I got access to the CNCF's Equinix account, which would allow us to boot dedicated virtual machines.
We would like to be able to run our benchmark tests on self-hosted runners.

The easiest way to set up those runners would be to do it manually for the repository. That seems brittle, though, and prone to mistakes.
We could also use Terraform (or another automation system) to auto-provision the runners. AFAIK, there is no such automation within the otel organization at the moment, though.

It also seems like setting up self-hosted runners just for the Go repository would be overkill, as we would under-use them. Other SIGs may have similar needs, though, so having them at the org level may be better.

Hence this issue, to gather feedback and opinions on this need.
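For reference (not from the thread): once a self-hosted runner is registered with matching labels, pointing a workflow at it is mostly a labelling exercise. A minimal sketch, where the workflow name and the `benchmarks` label are hypothetical:

```yaml
# .github/workflows/benchmarks.yml (hypothetical)
name: benchmarks
on:
  workflow_dispatch: {}   # manual trigger; keeps the dedicated machine idle otherwise
jobs:
  bench:
    # Labels must match those assigned when the runner was registered.
    runs-on: [self-hosted, linux, x64, benchmarks]
    steps:
      - uses: actions/checkout@v4
      - run: go test -run='^$' -bench=. ./...
```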

@dmathieu
Member Author

dmathieu commented Sep 9, 2022

cc @jmacd

@jmacd
Contributor

jmacd commented Sep 21, 2022

The technical committee discussed the risks and benefits of adopting self-hosted runners. Here are some of the points discussed:

  • If possible, benchmarking with CPU performance counters can overcome the noise caused by shared-resource contention. (Note that this is not currently built into the Go benchmarking suite in a way that would help OTel-Go.)
  • There is a definite use-case for self-hosted runners that involves correctness testing, particularly for resource detectors that need to be run in actual Cloud-vendor environments (e.g., testing AWS EC2 resource detection on an AWS EC2 machine).
  • The committee worries about security (e.g., misuse of the resource).

Our overall recommendation is to see whether we can avoid the additional configuration and management associated with self-hosted runners. If there is sufficient interest in correctness integration tests, that may be more compelling. Perhaps the OTel-Go team can find a workaround, or perhaps the engineering time would be better spent manually reviewing benchmark results prior to releases.
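As an aside (an illustrative sketch, not the SIG's actual benchmark code): the run-to-run noise that motivates this issue can be quantified without performance counters, by driving Go's `testing.Benchmark` from a plain program and reporting the spread of ns/op across repeated runs.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math"
	"testing"
)

var sink uint64 // prevents the compiler from eliminating the benchmarked work

// measure runs the same micro-benchmark `runs` times and returns the mean
// and standard deviation of ns/op. On a noisy shared machine the relative
// stddev tends to be large, which is exactly what made the hosted-runner
// results unreliable.
func measure(runs int) (mean, sd float64) {
	data := make([]byte, 1024)
	samples := make([]float64, 0, runs)
	for i := 0; i < runs; i++ {
		res := testing.Benchmark(func(b *testing.B) {
			for n := 0; n < b.N; n++ {
				h := fnv.New64a()
				h.Write(data)
				sink = h.Sum64()
			}
		})
		samples = append(samples, float64(res.NsPerOp()))
	}
	for _, x := range samples {
		mean += x
	}
	mean /= float64(len(samples))
	for _, x := range samples {
		sd += (x - mean) * (x - mean)
	}
	sd = math.Sqrt(sd / float64(len(samples)))
	return mean, sd
}

func main() {
	mean, sd := measure(5)
	fmt.Printf("mean=%.0f ns/op, stddev=%.0f (%.1f%%)\n", mean, sd, 100*sd/mean)
}
```

A high relative stddev on hosted runners versus a dedicated machine would be the concrete evidence for (or against) the self-hosted proposal.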

@aktech

aktech commented Oct 10, 2022

By the way, cirun.io does the same without adding a maintenance burden.

@bobstrecansky
Contributor

There are multiple other services that provide value for this:
buildjet.com is another example.

I wanted to resurrect this proposal - it would definitely help with velocity and stability if we had consistent build servers for our projects.

@trask
Member

trask commented Mar 1, 2023

I've opened a CNCF ticket to ask just in case they already have some ARM64 runners available at the CNCF GitHub Enterprise level that we can use.

@trask
Member

trask commented Mar 1, 2023

The CNCF pointed to setting up self-hosted ARM64 runners using https://github.com/cncf/cluster (which, I realized afterwards, @dmathieu mentioned above when opening this issue).

pulling down the TC recommendation from above #1162 (comment):

Our overall recommendation is to see whether we can avoid the additional configuration and management associated with self-hosted runners. If there is sufficient interest in correctness integration tests, that may be more compelling. Perhaps the OTel-Go team can find a workaround, or perhaps the engineering time would be better spent manually reviewing benchmark results prior to releases.

@bobstrecansky is there something in the PHP repo that needs special care around ARM64 testing? Have you seen things not working or breaking on ARM64? Or is the desire for automated ARM64 testing more out of an "abundance of caution"?

@bobstrecansky
Contributor

@trask more of the latter - ARM support for our testing matrix would be a welcome addition. I'm sure other SIGs would like to have that as well.
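For illustration only (the labels are hypothetical): with ARM64 runners registered, extending a testing matrix comes down to adding a runner entry.

```yaml
jobs:
  test:
    strategy:
      matrix:
        # ubuntu-latest is GitHub-hosted x64; the second entry targets a
        # hypothetical registered self-hosted ARM64 runner by its labels.
        runner: [ubuntu-latest, [self-hosted, linux, arm64]]
    runs-on: ${{ matrix.runner }}
    steps:
      - uses: actions/checkout@v4
      - run: uname -m && make test
```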

@Kielek
Contributor

Kielek commented Mar 2, 2023

It would be great to have this for .NET AutoInstrumentation as well. See: open-telemetry/opentelemetry-dotnet-instrumentation#1865

@bobstrecansky
Contributor

@trask - also, I don't think the GitHub runners are open source? That may be a loose requirement:
https://github.com/cncf/cluster#usage-guidelines

@trask
Member

trask commented Jul 6, 2023

Watching and hoping... actions/runner-images#5631

@tylerbenson
Member

We've made some progress and now have a runner that can be used for benchmarks:
#1662

(Note: Permission must be granted for each workflow individually to avoid abuse.)

@trask
Member

trask commented Feb 6, 2024

FYI, we now have access to Arm GitHub runners (see #1821), and you can open a repo maintenance request to get access.

@dmathieu will that resolve this issue?

@dmathieu
Member Author

dmathieu commented Feb 7, 2024

Thank you @trask. Yes, this should be what we need; we'll be looking into it.
In the meantime, I believe this issue can be closed.
