Migrate Homebrew/homebrew-core CI to GitHub Actions CI #6255

Closed
MikeMcQuaid opened this issue Jun 25, 2019 · 34 comments

@MikeMcQuaid
Member

A detailed description of the proposed feature

We should migrate the CI for Homebrew/homebrew-core from our own hosted Linux Jenkins instance with our own ESXi clients to Azure Pipelines. This should be done in several stages:

  1. We configure Azure Pipelines to test and verify our workflow (@ladislas has offered to do/start this)
    1. We will need to write an azure-pipelines.yml for Homebrew/homebrew-core which triggers Microsoft's build agents and uploads bottles from all PRs to Azure Pipelines.
    2. We will need to make our azure-pipelines.yml upload bottles to Bintray through a release pipeline in a form that brew pull --bottle can access/publish for testing.
  2. We migrate our Linux Jenkins server to use the Azure Pipelines hosted service. This will mean our workers remain on our MacStadium ESXi hosted nodes but builds are triggered by Azure Pipelines build agents.
    1. We will adjust the azure-pipelines.yml to use our own build agents. We will test this with some PRs to ensure that things can be built, tested and bottles published with this new system.
    2. We will switch over the default Homebrew/homebrew-core CI to use Azure Pipelines with our own build agents
    3. When we are happy with this setup: we tear down the https://jenkins.brew.sh service and all related machines.
  3. We migrate our own build agents to use Microsoft's build agents
    1. We will use a single Microsoft build agent alongside one of our own build agents to verify that it can handle e.g. our timeouts and workflow
      • This will require Microsoft permitting multi-day timeouts on their build agents for our usage
    2. We will migrate all of our build agents to use Microsoft build agents
      • This will require Microsoft supporting all the macOS versions we currently support (newest and the two prior) as well as figuring out a solution for future public beta versions
    3. We will tear down our build agents
    4. @MikeMcQuaid will cry with joy

The motivation for the feature

The current CI for Homebrew/homebrew-core works well enough but has the following issues:

  • Updating the base images is a manual process that’s required every time there is a macOS/Xcode/CLT/cask dependencies (e.g. Java, XQuartz, OSXFuse) update.
  • There should not be shared state between builds but some edge cases can result in this. This is undesirable both for CI reliability and security.
  • When any aspect of CI breaks or needs to be updated we are reliant on a small subset of maintainers to fix things. When it’s a complicated issue we are reliant on a single maintainer (me, @MikeMcQuaid) to fix things. If I get hit by a bus: this would be bad.
  • No-one on the project particularly enjoys or specialises in managing macOS CI
  • At this point the usability of Jenkins is painful for both administrators (e.g. you cannot auto-update plugins) and users (viewing logs for a build is not trivial)
  • We have to maintain our own Linux server for Jenkins, ESXi VM servers and macOS VMs
  • The environment used for Homebrew/brew is not the same as Homebrew/homebrew-core. This sometimes means PRs merged on Homebrew/brew will break when tested on Homebrew/homebrew-core.

It would be good to avoid all of the above work and outsource all (or: as much as possible) system administration work to third parties who specialise in doing that at scale.

How the feature would be relevant to at least 90% of Homebrew users

  • Maintainers will have more time to do non-CI things
  • Our CI will no longer have a bus factor of one.
  • CI will be more consistent and more reliable

What alternatives to the feature have been considered

  • Staying with Jenkins as-is
    • Hopefully the above sufficiently explains why this status quo is unsustainable
  • Moving Jenkins to build pipelines so we are on the newest Jenkins way of doing things
    • This would solve some problems (us using essentially an unsupported/old Jenkins configuration) while not solving others (Jenkins UI not auto-updating plugins, maintaining our own infrastructure)
  • Moving to something like BuildKite which would involve continuing to use our own servers
    • Worst case we end up doing this with Azure Pipelines but at least this provides us with the future option to not have to do this
  • Moving to another cloud provider e.g. Travis CI/Circle CI rather than Azure
    • Azure currently has by far the best performance as well as having the most positive direct relationship with Homebrew (Mike has previously tried to convince both Travis CI and Circle CI to help with this and they've been extremely unhelpful whereas Microsoft has a direct product management relationship with us)

@Homebrew/maintainers any thoughts on any of the above? Anything I've missed? Any major problems I've not anticipated?

@MikeMcQuaid MikeMcQuaid added features New features in progress Maintainers are working on this discussion Input solicited from others usability Usability of Homebrew/brew labels Jun 25, 2019
@ethomson

This is currently blocked on https://developercommunity.visualstudio.com/content/problem/337049/unable-to-create-pull-request-release-trigger.html?childToView=593066#comment-593066 CC @ethomson for an ETA on a fix for that.

Last I heard was about 3 weeks from now. Let me check in and see if there's an update.

@MikeMcQuaid
Member Author

@ethomson Great, thanks Ed!

@fxcoudert
Member

Several items say “will require Microsoft doing X”… do we have an agreement in principle to those things?

@ethomson

@fxcoudert Yes, this is work that we (Microsoft) are doing.

@Moisan
Member

Moisan commented Jun 25, 2019

The environment used for Homebrew/brew is not the same as Homebrew/homebrew-core.

Why is it that way?

@MikeMcQuaid
Member Author

The environment used for Homebrew/brew is not the same as Homebrew/homebrew-core.

Why is it that way?

@Moisan Homebrew/brew currently already uses Azure Pipelines whereas Homebrew/homebrew-core uses Jenkins and our own agents.

@iMichka
Member

iMichka commented Jun 26, 2019

Special bonus: you can clone the linuxbrew-core release pipeline which is already being used. There will be 2-3 env variables and tokens to change, but that should be it. I can help with that part, but it is pretty straightforward.

@sjackman
Member

which is already being used?

Amazing! Are you using it for builds that time out or exceed memory on CircleCI?

@iMichka
Member

iMichka commented Jun 27, 2019

Are you using it for builds that time out or exceed memory on CircleCI?

Yes. In these cases I manually trigger the release pipeline for that build, and upload the bottles that way. Far better than building the bottles on my computer locally ...

@MikeMcQuaid
Member Author

I chatted with MacStadium today and they pointed me to Orka (https://www.macstadium.com/orka) and their improved Cloud Automation docs (https://docs.macstadium.com/docs/cloud-automation).

If this worked out for us it could be something we use with our dedicated hardware while still migrating our CI frontend from Jenkins to Azure Pipelines.

@zbeekman
Contributor

I would be impressed & surprised if Azure pipelines started supporting beta macOS releases. For this reason I suspect we would want to keep MacStadium infra for dealing with new OS releases.

But, in general, looks great!

@MikeMcQuaid MikeMcQuaid changed the title from "Migrate Homebrew/homebrew-core CI to Azure Pipelines CI" to "Migrate Homebrew/homebrew-core CI to GitHub Actions CI" Aug 9, 2019
@MikeMcQuaid
Member Author

I've updated this to suggest we move towards https://github.blog/2019-08-08-github-actions-now-supports-ci-cd/ instead. It's pretty similar under the hood and the lack of separate access control will make our lives much easier. I will start investigating the move of Homebrew/brew and other repos to that.

@sjackman
Member

My third-party tap https://github.com/brewsci/homebrew-bio has migrated to building and publishing bottles using GitHub Actions.

@jonchang
Contributor

jonchang commented Feb 20, 2020

Self hosted runners seem to work well: Homebrew/homebrew-core#50468

We'd only be able to build bottles on one macOS version until support for custom labels is added.

@jonchang
Contributor

I'm thinking about the maintainer workflow for this. Currently I think this is the space of what is possible based on the GitHub Actions permissions model.

Homebrew/linuxbrew-core workflow:

  1. Maintainer opens pull request from their fork. Runner builds and saves a bottle as a workflow artifact.
  2. Maintainer clicks "Merge" in the GitHub UI. Runner uploads and publishes the bottles to Bintray. Runner pushes the bottle commit to master.

Pros: Everything happens in the GitHub UI.

Cons: Bottles lag updated formulae by about 60-90 seconds, which means users might get errors about missing bottles in that time frame.

brewsci/homebrew-bio workflow:

  1. Maintainer opens pull requests from their fork, against a base develop branch. Runner builds and saves a bottle as a workflow artifact.
  2. Maintainer clicks "Merge" in the GitHub UI. Runner uploads and publishes the bottles to Bintray. Runner pushes the bottle commit to develop, then pushes develop to master.

Pros: Everything happens in the GitHub UI. Users receive bottles and formula updates at the same time.

Cons: Maintainers must remember to base their pull requests on the develop branch. Problems can happen if develop and master go out of sync.

Repository branch workflow:

  1. Maintainer opens pull requests from a branch in the repository. Runner builds and uploads an unpublished bottle to Bintray.
  2. Maintainer clicks "Merge" in the GitHub UI. Runner publishes the bottle on Bintray and pushes a bottle commit.

Pros: Everything happens in the GitHub UI.

Cons: Impossible for non-maintainers to contribute bottles

New brew pull workflow:

  1. Maintainer opens pull request from their fork. Runner builds and uploads a bottle as a workflow artifact.
  2. Maintainer runs brew pull --bottle ###. Script sends a dispatch event to GitHub Actions. Script downloads bottle JSONs from GitHub Actions and generates bottle commits. Meanwhile, runner uploads and publishes bottle on Bintray. Script polls Bintray for published bottles.
  3. Maintainer runs git push

Pros: Maintainer experience identical to current workflow.

Cons: Bottled formulae don't use the GitHub UI. Pull requests show up as "closed" rather than "merged".
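
The polling step in the workflow above could be sketched roughly as follows. This is only an illustration in Python, not the actual brew pull code: the function name `wait_until_published`, the injected `is_published` check, and the timeout and interval values are all hypothetical, and the real query against Bintray is deliberately left out.

```python
import time
from typing import Callable


def wait_until_published(
    is_published: Callable[[], bool],
    timeout_s: float = 300.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll is_published() until it returns True or timeout_s elapses.

    is_published is a hypothetical callable standing in for whatever
    check reports that the bottle is visible on the hosting service.
    """
    deadline = clock() + timeout_s
    while True:
        if is_published():
            return True
        if clock() >= deadline:
            # Give up; the caller decides whether to retry or abort.
            return False
        sleep(interval_s)
```

Passing the check in as a callable keeps the loop independent of any particular hosting API, so the same loop works if the bottle host changes.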

Example brew request-merge workflow:

  1. Maintainer opens pull request from their fork. Runner builds and saves a bottle as a workflow artifact.
  2. Maintainer runs brew request-merge ###. Script sends a dispatch event to GitHub Actions. Runner uploads and publishes bottles to Bintray. Runner merges the pull request and adds a bottle commit, then pushes to master.

Pros: Users receive bottles and formula updates at the same time.

Cons: Bottled formulae don't use the GitHub UI. Need to write and test new request-merge code. Need to check exactly what order of events needs to occur to get pull requests to show up as "merged" in the pull request UI.

@MikeMcQuaid
Member Author

We'd only be able to build bottles on one macOS version until support for custom labels is added.

Interesting. This is a hard blocker for us for now, I'm surprised it's not supported yet.

Homebrew/linuxbrew-core workflow:
Cons: Bottles lag updated formulae by about 60-90 seconds, which means users might get errors about missing bottles in that time frame.

Unfortunately this makes it a non-starter for Homebrew/homebrew-core given our scale.

brewsci/homebrew-bio workflow:
Cons: Maintainers must remember to base their pull requests on the develop branch. Problems can happen if develop and master go out of sync.

I think this could be OK if develop (or whatever it is named) is the default branch and brew update handles checking out the right one.

Repository branch workflow:
Cons: Impossible for non-maintainers to contribute bottles

Non-starter, unfortunately. Too many contributions from non-maintainers.

New brew pull workflow:
Cons: Bottled formulae don't use the GitHub UI. Pull requests show up as "closed" rather than "merged".

I think this would be fine and is preferable to the linuxbrew-core or repository branch workflows. I agree that using a more GitHub-native solution would be preferable if possible.

Example brew request-merge workflow:
Cons: Bottled formulae don't use the GitHub UI. Need to write and test new request-merge code.

Need to check exactly what order of events needs to occur to get pull requests to show up as "merged" in the pull request UI.

You need a merge commit which merges the same SHA1 as the PR branch into the default branch.

As a result, you need to push any commits to the PR before you can "merge" it. With our current brew pull workflow technically they would show up as "merged" if we always pushed the bottle commits to the PR branch and then merged (squashed or rebased) that.
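
The condition described here can be written down as a small check. The following is a minimal sketch in Python that encodes only the rule as stated in this comment (a merge commit with the PR head SHA as one of its parents); the function name `merges_pr_head` is hypothetical, and GitHub's actual detection logic is not verified here.

```python
from typing import Sequence


def merges_pr_head(parent_shas: Sequence[str], pr_head_sha: str) -> bool:
    """Return True if a commit is a merge commit that has the PR head
    SHA as one of its parents (the rule as stated in the comment above)."""
    is_merge_commit = len(parent_shas) >= 2
    return is_merge_commit and pr_head_sha in parent_shas
```

Under this rule, a bottle commit that only exists locally changes the SHA being merged, which is why it would have to be pushed to the PR branch first.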


I was thinking a slight deviation from brew pull or brew request-merge would be nice. Everything happens in a PR on Homebrew/homebrew-core (including forks) but we somehow trigger (with a comment, label, repository dispatch, etc.) a GitHub Actions run (using the master workflow) which uploads and publishes the bottles, pushes them to the PR and then merges it for us (either using the GitHub API if we want squash/rebase or by pushing to master if we can tolerate merge commits).

This wouldn't allow us to use the "merge PR" button (for bottled PRs) but it would allow everything to be done within the GitHub UI on desktop or mobile using normal workflows.

@jonchang
Contributor

Everything happens in a PR on Homebrew/homebrew-core (including forks) but we somehow trigger (with a comment, label, repository dispatch, etc.) a GitHub Actions run

This wouldn't be possible with standard actions because I believe comments and label actions on pull requests from forked repositories do not have access to secrets. However we could implement some kind of web hook like we already do with pinging BrewTestBot which could then do those actions on our behalf.

@MikeMcQuaid
Member Author

However we could implement some kind of web hook like we already do with pinging BrewTestBot which could then do those actions on our behalf.

Yeh, this would be cool 👍

@jonchang
Contributor

However we could implement some kind of web hook like we already do with pinging BrewTestBot which could then do those actions on our behalf.

Yeh, this would be cool 👍

Can you point me to where the current code that does this lives? Is it just a Jenkins config?

@MikeMcQuaid
Member Author

Can you point me to where the current code that does this lives? Is it just a Jenkins config?

Yes, just Jenkins configuration. The TL;DR will be to configure a webhook to hit an endpoint which then calls a script with a GitHub API key. I'd suggest a simple Heroku Ruby app for this.

@jonchang
Contributor

Actually, thinking about it this could be done entirely in GitHub Actions on the schedule trigger, which I believe does have the correct permissions to access secrets. Maintainers could label a formula as bottles requested and BrewTestBot will sweep for formulae with green builds and bottles available and push the bottle commit to the pull request head branch. An automerge label could also be used for BrewTestBot to additionally do the merge.
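
As a rough illustration of the sweep logic, here is a minimal sketch in Python. It is not BrewTestBot code: the `PullRequest` class, the `plan_sweep` function, and the action names are hypothetical, and the two label strings are simply taken from the description above. It only decides what would be done; talking to the GitHub API is left out.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class PullRequest:
    """Hypothetical snapshot of the PR state the sweep would look at."""
    number: int
    labels: Set[str] = field(default_factory=set)
    build_green: bool = False
    bottles_available: bool = False


def plan_sweep(prs: List[PullRequest]) -> List[Tuple[int, str]]:
    """Return (PR number, action) pairs for one scheduled sweep."""
    actions: List[Tuple[int, str]] = []
    for pr in prs:
        if "bottles requested" not in pr.labels:
            continue  # maintainers have not asked for bottles
        if not (pr.build_green and pr.bottles_available):
            continue  # wait for a later sweep
        actions.append((pr.number, "push-bottle-commit"))
        if "automerge" in pr.labels:
            actions.append((pr.number, "merge"))
    return actions
```

Separating the decision from the side effects makes it easy to dry-run a sweep and see what it would touch before letting it push or merge anything.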

@jonchang
Contributor

jonchang commented Feb 22, 2020

To remove Jenkins:

  • Set up self-hosted runners on Orka VM using my modified runner.
  • Set up a pull_request workflow that builds bottles and uploads them as workflow artifacts, using our self-hosted runners. Replaces the "Homebrew Core Pull Requests" Jenkins job. (See "workflows: add self-hosted runners", homebrew-core#50468.)
  • Set up a repository_dispatch workflow based on the Linuxbrew-core upload-bottles workflow that retrieves bottles from workflow artifacts and uploads them to Bintray, and pushes the bottle commit to a tag on BrewTestBot's fork of homebrew-core. Replaces the "Homebrew Bottles" Jenkins job. (Decided against this.)
  • Modify brew pull --bottle to additionally fire the repository_dispatch event for the upload-bottles workflow, then continue to use the current logic to add a bottle commit and publish bottles on Bintray. Maintainers then git push to Homebrew/core. (Not quite as planned; see "Add new pr-pull command", #7236, for our final solution.)
  • Decommission Jenkins.

To remove brew pull --bottle:

To remove tedious VM image building:

  • 🤷‍♂
  • Write a bunch of ops code that interfaces with the Orka JSON API 🤔

@sjackman
Member

sjackman commented Feb 22, 2020

This wouldn't be possible with standard actions because I believe comments and label actions on pull requests from forked repositories do not have access to secrets.

I know that approving a fork pull request does not have access to secrets, I tested that myself, but I haven't tested a comment or label. It'd be worth testing to find out definitively.

@MikeMcQuaid
Member Author

Maintainers could label a formula as bottles requested and BrewTestBot will sweep for formulae with green builds and bottles available and push the bottle commit to the pull request head branch. An automerge label could also be used for BrewTestBot to additionally do the merge.

This sounds great 👏. I'd suggest we have a CI check on this so that we can't accidentally merge PRs that haven't had the job run yet. Otherwise: this flow sounds perfect!

To remove Jenkins:

Sounds good. My suggestion would be to have the CI jobs both running side-by-side for a bit and switching over the Bottles job from Jenkins to GitHub Actions. This means we can switch back if things go wrong and we can compare the results on PRs for a while (like a week or something?). After that: agreed, we can tear down Jenkins.

To remove brew pull --bottle:

I like the idea of splitting these into two "tasks".

To remove tedious VM image building:

I think once we're zeroed in on a single setup we can probably make better use of scripts than we already are. There will probably remain some minor tedium, indeed.

Thanks for all your work on this @jonchang!

@Bo98
Member

Bo98 commented Feb 26, 2020

Do we have an equivalent for "Homebrew Testing"?

There were a couple of use cases for it. Besides the obvious of testing builds (which can perhaps be done instead with a "do not merge" revision bump pull request), it was also used to generate bottles without performing any actual formula changes prior. I believe this workflow (or rather "Homebrew Catalina Testing") was used to generate many Catalina bottles after Catalina's release. "Homebrew Testing" has also been used to generate bottles when someone accidentally merged without brew pull.

@jonchang
Contributor

Do we have an equivalent for "Homebrew Testing"?

There were a couple of use cases for it. Besides the obvious of testing builds (which can perhaps be done instead with a "do not merge" revision bump pull request), it was also used to generate bottles without performing any actual formula changes prior. I believe this workflow (or rather "Homebrew Catalina Testing") was used to generate many Catalina bottles after Catalina's release. "Homebrew Testing" has also been used to generate bottles when someone accidentally merged without brew pull.

I intend to adapt https://github.com/Homebrew/linuxbrew-core/blob/master/.github/workflows/dispatch-build-bottle.yml

@xu-cheng
Member

xu-cheng commented Feb 26, 2020

I noticed that there is ongoing work to use self-hosted GitHub Actions runners for CI. However, for your information, I want to remind you that this may not be secure.

Per the GitHub documentation (https://help.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories):

We recommend that you do not use self-hosted runners with public repositories.

Forks of your public repository can potentially run dangerous code on your self-hosted runner machine by creating a pull request that executes the code in a workflow.

For example, a malicious PR could do the following:

  • Access the token used to connect the runner to GitHub. This would allow an attacker to hijack the entire runner in the following steps:
    • Copy the tokens.
    • Kill the runner process on Homebrew machines.
    • Use the tokens to create a new runner on a machine controlled by the attacker.
    • All future PRs would be run in the environment controlled by the attacker.
  • Overwrite any binaries (e.g. create a malicious /usr/local/bin/git or brew-test-bot) in the VM. As a result, all future PRs would be run in a compromised environment. Note that, for a self-hosted runner, the same VM environment is used across PRs.

@MikeMcQuaid
Member Author

There's nothing more inherently insecure about using these runners than our existing Jenkins setup.

If you find an actual vulnerability with our setup please follow our responsible security disclosure process: https://github.com/Homebrew/brew#security

@zbeekman
Contributor

zbeekman commented Mar 5, 2020

@jonchang (and everyone else in this thread) I've noticed that the GHA runners are timing out for formulae PRs that need a lot of revision bumps for dependencies, e.g., Homebrew/homebrew-core#51116

With self-hosted runners can we set arbitrary timeout limits? If not, is there a plan for a mechanism to break up PRs with tons of revision bumps?

Sincere apologies if this was addressed in a comment above that I missed, but it's hard to keep up with every thread.

@jonchang
Contributor

jonchang commented Mar 5, 2020

With self-hosted runners can we set arbitrary timeout limits? If not, is there a plan for a mechanism to break up PRs with tons of revision bumps?

Yeah there’s a timeout setting. By default it’s six hours but we can raise that on self hosted runners. What would be a good timeout?

@Bo98
Member

Bo98 commented Mar 6, 2020

I'd rather it be "too high" (if that's even possible) and something we're confident we won't hit rather than something that we "probably" won't hit. We are able to identify and kill hanging jobs ourselves. If something has been running for over a day, it's likely intentional.

@MikeMcQuaid
Member Author

Yeh, I'd say setting it to ~3 days would be a decent starting point once we switch over. For now, though, I think a short timeout is sensible as the bottles aren't actually being used.
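
For reference, GitHub Actions expresses job timeouts in minutes (the `timeout-minutes` workflow setting), so the figures mentioned in this exchange convert as below. This is just a small arithmetic sketch in Python; the helper name `to_timeout_minutes` is hypothetical.

```python
from datetime import timedelta


def to_timeout_minutes(limit: timedelta) -> int:
    """Convert a duration into whole minutes for a timeout setting."""
    return int(limit.total_seconds() // 60)


# The six-hour default mentioned above.
default_minutes = to_timeout_minutes(timedelta(hours=6))

# The roughly three-day starting point suggested above.
proposed_minutes = to_timeout_minutes(timedelta(days=3))
```

Here `default_minutes` is 360 and `proposed_minutes` is 4320.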

@sjackman
Member

sjackman commented Mar 6, 2020

With self-hosted runners can we set arbitrary timeout limits? If not, is there a plan for a mechanism to break up PRs with tons of revision bumps?

If the formula being updated is versioned, then the revision bumps can be done in batches.
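
Splitting a long list of dependents into batches could look something like the sketch below. It is a minimal Python illustration only: the `batches` function is hypothetical and not part of brew, and the batch size would have to be chosen so that each PR finishes within the CI timeout.

```python
from typing import List, Sequence


def batches(formulae: Sequence[str], size: int) -> List[List[str]]:
    """Split formulae into consecutive groups of at most `size` items,
    one group per revision-bump pull request."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [list(formulae[i:i + size]) for i in range(0, len(formulae), size)]
```

Each resulting group would then go into its own pull request, so no single CI run has to rebuild every dependent.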

@MikeMcQuaid
Member Author

This was done! Thanks so much to everyone who was involved but a particular shout-out to @jonchang who has pushed this forward more than any other individual.

Well done everyone, pour one out for Jenkins 🍻

@BrewTestBot BrewTestBot added the outdated PR was locked due to age label Dec 11, 2020
@Homebrew Homebrew locked as resolved and limited conversation to collaborators Dec 11, 2020

13 participants