This document details the responsibilities and tasks for each role in the release team.
Notes:
- The examples in this document are based on the v1.6 release cycle.
- This document focuses on tasks that are done for every release. One-time improvement tasks are out of scope.
- If a task is prefixed with `[Track]`, it means it should be ensured that this task is done, but the folks with the corresponding role are not responsible for doing it themselves.
- Release Lead
- Responsibilities
- Tasks
- Finalize release schedule and team
- Add/remove release team members
- Prepare main branch for development of the new release
- Create a new GitHub milestone for the next release
- [Track] Remove previously deprecated code
- [Track] Bump dependencies
- Set a tentative release date for the next minor release
- Assemble next release team
- Update milestone applier and GitHub Actions
- [Continuously] Maintain the GitHub release milestone
- [Continuously] Bump the Go version
- [Repeatedly] Cut a release
- [Optional] Public release session
- [Optional] [Track] Bump the Cluster API apiVersion
- [Optional] [Track] Bump the Kubernetes version
- [Optional] Track Release and Improvement tasks
- Communications/Docs/Release Notes Manager
- Responsibilities
- Tasks
- Add docs to collect release notes for users and migration notes for provider implementers
- Update supported versions
- Ensure the book for the new release is available
- Generate weekly PR updates to post in Slack
- Create PR for release notes
- Change production branch in Netlify to the new release branch
- Update clusterctl links in the quickstart
- [Continuously] Communicate key dates to the community
- Communicate beta to providers
- CI Signal/Bug Triage/Automation Manager
- Coordination:
- Take ultimate accountability for ensuring all release tasks are completed on time
- Coordinate release activities
- Create and maintain the GitHub release milestone
- Track tasks needed to add support for new Kubernetes versions in upcoming releases
- Ensure a retrospective happens
- Ensure one of the maintainers is available when a release needs to be cut.
- Staffing:
- Assemble the release team for the next release cycle
- Ensure a release lead for the next release cycle is selected and trained
- Set a tentative release date for the next release cycle
- Cutting releases:
- Release patch releases for supported previous releases at least monthly or more often if needed
- Create beta, RC and GA releases for the minor release of the current release cycle
- The release lead should keep an eye on what is going on in the project to be able to react if necessary
- Finalize the release schedule and team in docs/release/releases, e.g. release-1.6.md.
- Update the @cluster-api-release-team Slack user group and GitHub team accordingly.
  Prior art: kubernetes/org#4353, kubernetes/community#7423
- Update the @cluster-api-release-lead and @cluster-api-release-team aliases in the root OWNERS_ALIASES file with the Release Team members.
  Prior art: https://github.com/kubernetes-sigs/cluster-api/pull/9111/files#diff-4985b733677adf9dda6b5187397d4700868248ef646d64aecfb66c1ced575499
- Announce the release team and release schedule to the mailing list.
If necessary, the release lead can adjust the release team during the cycle to handle unexpected changes in staffing due to personal/professional issues, no-shows, or unplanned work spikes. Adding/removing members can be done by opening a PR to update the release team members list for the release cycle in question.
The goal of this issue is to bump the versions on the main branch so that the upcoming release version is used for e.g. local development and e2e tests. We also modify tests so that they are testing the previous release.
This comes down to changing occurrences of the old version to the new version, e.g. `v1.5` to `v1.6`:
- Setup E2E tests for the new release:
  - Goal is that we have clusterctl upgrade tests for the latest stable versions of each contract / for each supported branch. For `v1.6` this means:
    - v1beta1: `v1.0`, `v1.4`, `v1.5` (will change with each new release)
  - Update providers in `docker.yaml`:
    - Add a new `v1.6.0` entry.
    - Remove providers that are not used anymore in clusterctl upgrade tests.
    - Change `v1.5.99` to `v1.6.99`.
  - Adjust `metadata.yaml`'s:
    - Create a new `v1.6` `metadata.yaml` (`test/e2e/data/shared/v1.6/metadata.yaml`) by copying `test/e2e/data/shared/main/metadata.yaml`.
    - Add the new release to the main `metadata.yaml` (`test/e2e/data/shared/main/metadata.yaml`).
    - Add the new release to the root-level `metadata.yaml`.
    - Remove old `metadata.yaml`'s that are not used anymore in clusterctl upgrade tests.
  - Adjust cluster templates in `test/e2e/data/infrastructure-docker`:
    - Create a new `v1.6` folder. It should be created based on the `main` folder and only contain the templates we use in the clusterctl upgrade tests (as of today, `cluster-template` and `cluster-template-topology`).
    - Remove old folders that are not used anymore in clusterctl upgrade tests.
  - Modify the test specs in `test/e2e/clusterctl_upgrade_test.go` (according to the versions we want to test, described above). Please note that both `InitWithKubernetesVersion` and `WorkloadKubernetesVersion` should be the highest mgmt cluster version supported by the respective Cluster API version.
- Update `create-local-repository.py` and `tools/internal/tilt-prepare/main.go`: `v1.5.99` => `v1.6.99`.
- Make sure all tests are green (also run `pull-cluster-api-e2e-full-main` and `pull-cluster-api-e2e-workload-upgrade-1-27-latest-main`).
- Remove an unsupported release version of Cluster API from the Makefile target that generates e2e templates. For example, remove `v1.3` while working on `v1.6`.
Prior art:
- 1.5 - https://github.com/kubernetes-sigs/cluster-api/pull/8430/files
- 1.6 - https://github.com/kubernetes-sigs/cluster-api/pull/9097/files
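The mechanical part of the version rotation above can be scripted. A minimal sketch, assuming GNU `sed` and that only the fake `v1.5.99` / `v1.6.99` versions are being rotated; the file patterns are assumptions, and the resulting diff always needs manual review:

```shell
# Hedged sketch: rewrite occurrences of the old fake version with the new
# one across YAML/Go/Python files. Run from the repository root.
for f in $(grep -rl 'v1\.5\.99' --include='*.yaml' --include='*.go' --include='*.py' . 2>/dev/null); do
  sed -i 's/v1\.5\.99/v1\.6\.99/g' "$f"
done
```

Occurrences of plain `v1.5` (e.g. in test specs and folder names) are better handled by hand, since some of them must stay.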
The goal of this task is to create a new GitHub milestone for the next release, so that we can already move tasks out of the current milestone if necessary.
- Create the milestone for the new release via GitHub UI.
The goal of this task is to remove all previously deprecated code that can be now removed.
- Check for deprecated code and remove it.
  - We can't just remove all code flagged with `Deprecated`. In some cases, e.g. in API packages, we have to keep the old code.
Prior art: Remove code deprecated in v1.6
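Finding removal candidates is easy to script. A sketch (the `/api/` exclusion is an assumption; every hit still needs human judgment before removal):

```shell
# List Go code that still carries a "Deprecated:" marker so each occurrence
# can be reviewed; API packages usually have to keep theirs for conversion.
grep -rn --include='*.go' 'Deprecated:' . | grep -v '/api/' || true
```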
The goal of this task is to ensure that we have relatively up-to-date dependencies at the time of the release. This reduces the risk that CVEs are found in outdated dependencies after our release.
We should take a look at the following dependencies:
- Go dependencies in `go.mod` files.
- Tools used in our Makefile (e.g. kustomize).
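A rough sketch of the Go side, assuming the usual multi-module layout (the module paths `.`, `test`, and `hack/tools` are assumptions); review the diff and run the tests before committing:

```shell
# Bump direct Go dependencies in each module that exists, then tidy.
for mod in . test hack/tools; do
  if [ -f "$mod/go.mod" ]; then
    (cd "$mod" && go get -u ./... && go mod tidy)
  fi
done
```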
- Set a tentative release date for the next minor release and document it by creating a `release-X.Y.md` in docs/release/releases.
Prior art: #9635
There is currently no formalized process to assemble the release team. As of now we ask for volunteers in Slack and office hours.
Once the release branch is created by GitHub automation, the goal of this task is to ensure the milestone applier applies milestones accordingly and to update GitHub Actions to work with the new release version. From this point forward, changes that should land in the release have to be cherry-picked into the release branch.
- Update the milestone applier config accordingly (e.g. `release-1.5: v1.5` and `main: v1.6`).
  Prior art: cluster-api: update milestone applier config for v1.5
- Update the GitHub Actions to work with the new release version.
  Prior art: Update actions for v1.6
The goal of this task is to keep an overview over the current release milestone and the implementation progress of issues assigned to the milestone.
This can be done by:
- Regularly checking in with folks implementing an issue in the milestone.
- If nobody is working on an issue in the milestone, drop it from the milestone.
- Ensuring we have a plan to get `release-blocking` issues implemented in time.
The goal of this task is to ensure we are always using the latest Go version for our releases.
- Keep track of new Go versions
- Bump the Go version in supported branches if necessary
Prior art: Bump to Go 1.19.5
Note: If the Go minor version of one of our supported branches goes out of support, we should consider bumping to a newer Go minor version according to our backport policy.
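A sketch for locating the places where a Go version is pinned, so they can be bumped together (the searched patterns and file types are assumptions; the authoritative list of files differs per branch):

```shell
# Surface files that pin a Go version; review each hit before bumping.
grep -rn --include='go.mod' --include='Makefile' --include='*.yaml' \
  -e '^go 1\.' -e 'GO_VERSION' . | head -n 20
```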
- Ensure CI is stable before cutting the release (e.g. by checking with the CI manager).
  Note: Special attention should be given to image scan results, so we can avoid cutting a release with a CVE, or document known CVEs in the release notes.
- Ask the Communications/Docs/Release Notes Manager to create a PR with the release notes for the new desired tag and review the PR. Once the PR merges, it will trigger a GitHub Action to create a release branch, push release tags, and create a draft release. This will also trigger a ProwJob to publish images to the staging repository.
- Promote images from the staging repository to the production registry (`registry.k8s.io/cluster-api`):
  - Wait until images for the tag have been built and pushed to the staging repository by the post push images job.
  - If you don't have a GitHub token, create one by going to your GitHub settings, under Personal access tokens. Make sure you give the token the `repo` scope.
  - Create a PR to promote the images to the production registry:

    ```shell
    # Export the tag of the release to be cut, e.g.:
    export RELEASE_TAG=v1.0.1
    export GITHUB_TOKEN=<your GH token>
    make promote-images
    ```

    Notes:
    - The `make promote-images` target tries to figure out your GitHub user handle in order to find the forked k8s.io repository. If you have not forked the repo, please do so before running the Makefile target.
    - If `make promote-images` fails with an error like `FATAL while checking fork of kubernetes/k8s.io`, you may be able to solve it by manually setting the `USER_FORK` variable, i.e. `export USER_FORK=<personal GitHub handle>`.
    - `kpromo` uses `git@github.com:...` as the remote to push the branch for the PR. If you don't have `ssh` set up, you can configure git to use `https` instead via `git config --global url."https://github.com/".insteadOf git@github.com:`.
    - This will automatically create a PR in k8s.io and assign the CAPI maintainers.
- Merge the PR (/lgtm + /hold cancel) and verify the images are available in the production registry:
  - Wait for the promotion prow job to complete successfully. Then test that the production images are accessible:

    ```shell
    docker pull registry.k8s.io/cluster-api/clusterctl:${RELEASE_TAG}
    docker pull registry.k8s.io/cluster-api/cluster-api-controller:${RELEASE_TAG}
    docker pull registry.k8s.io/cluster-api/kubeadm-bootstrap-controller:${RELEASE_TAG}
    docker pull registry.k8s.io/cluster-api/kubeadm-control-plane-controller:${RELEASE_TAG}
    ```
- Publish the release in GitHub:
  - Reach out to one of the maintainers over Slack to publish the release in GitHub.
    - NOTE: clusterctl will have issues installing providers between the time the release tag is cut and the GitHub release is published. See issue 7889 for more details.
    - The draft release should be automatically created via the Create Release GitHub Action, with release notes previously committed to the repo by the release team. Remind the maintainer that the release must be flagged as `pre-release` for all `beta` and `rc` releases, or as `latest` for a new release in the most recent release branch.
- Publish `clusterctl` to Homebrew by bumping the version in clusterctl.rb.
  Notes:
  - This is only done for new latest stable releases, not for beta / RC releases and not for previous release branches.
  - Check if Homebrew already has a PR to update the version (Homebrew introduced automation that picks it up). Open one if no PR exists.
    - To open a PR, you need two things: the `tag` (i.e. if the v1.5.3 & v1.4.8 releases are being published and release-1.5 is the latest stable release branch, the tag would be v1.5.3) and the `revision` (the commit hash of the tag; i.e. if the tag is v1.5.3, it can be found by looking for the commit id on the v1.5.3 tag page).
    - Once the PR is open, no action should be needed. The Homebrew bot should push a second commit (see an example here) to the same PR to update the binary hashes automatically.
    - For an example, please see PR: clusterctl 1.5.3.
    - Homebrew has conventions for commit messages; usually the commit message for us should look like: `clusterctl 1.5.3`.
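The `revision` can also be looked up locally instead of via the tag page. A self-contained sketch (the throwaway repo and the `v1.5.3` tag below are purely illustrative):

```shell
# Illustration in a throwaway repo: the `revision` Homebrew asks for is the
# commit hash the release tag points at.
cd "$(mktemp -d)"
git init -q .
git -c user.email=release@example.com -c user.name=release \
    commit -q --allow-empty -m "release commit"
git tag v1.5.3
git rev-list -n 1 v1.5.3   # prints the commit hash to use as `revision`
```

In the real repository, `git rev-list -n 1 <tag>` (after fetching tags) prints the same hash shown on the tag page.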
- For minor releases: Set the EOL date for the previous release and update Cluster API support and guarantees in CONTRIBUTING.md (prior art: https://github.com/kubernetes-sigs/cluster-api/pull/9817/files).
- For latest stable releases: Index the most recent CRDs in the release by navigating to `https://doc.crds.dev/github.com/kubernetes-sigs/cluster-api@<CURRENT_RELEASE>`.
Additional information:
- See the versioning documentation for more information.
- Cutting a release as of today requires permissions to:
- Create a release tag on the GitHub repository.
- Create/update/publish GitHub releases.
- Host a release session over a public zoom meeting.
- Record the session for future reference and transparency.
- Use release process-related waiting periods as a forum for discussing issues/questions.
- Publish the recording on the YouTube channel.
Note: This should only be done when we have to bump the apiVersion of our APIs.
- Add a new version of the types:
  - Create new API packages by copying existing packages.
  - Make sure webhooks only exist in the latest apiVersion (same for other subpackages like `index`).
  - Add conversion and conversion tests.
  - Adjust generate targets in the Makefile.
  - Consider dropping fields deprecated in the previous apiVersion.
  - Update import aliases in `.golangci.yml`.
  - Switch other code over to the new version (imports across the code base, e.g. controllers).
  - Add all versions to the scheme in the `main.go` files.
- Add the types to the `PROJECT` files of the respective provider.
- Add test data for the new version in `test/e2e/data/{infrastructure-docker,shared}` (also update the top-level `.gitignore`).
- Update `docker.yaml` and make sure all tests are successful in CI.
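The initial copy step can be sketched as follows. The package names and the in-tree `api/` path are illustrative, and a GNU userland is assumed; generated code, kubebuilder markers, and conversion functions still need manual work afterwards:

```shell
# Seed the new API package by copying the previous one, then rewrite the
# obvious package references; everything else is manual.
if [ -d api/v1beta1 ]; then
  cp -r api/v1beta1 api/v1beta2
  grep -rl v1beta1 api/v1beta2 | xargs -r sed -i 's/v1beta1/v1beta2/g'
fi
```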
- Create an issue for the new Kubernetes version via: New Issue: Kubernetes bump.
- Track the issue to ensure the work is completed in time.
- Create an issue for easier tracking of all the tasks for the release cycle in question.
  Prior art: Tasks for v1.6 release cycle
- Create a release improvement tasks GitHub Project board to track the current status of all improvement tasks planned for the release, their priorities and status (i.e. `Done` / `In Progress`), and to distribute the work among the Release Team members.
  Notes:
  - At the beginning of the cycle, the Release Team Lead should prepare the improvement tasks board for the ongoing release cycle. The following steps can be taken:
    - Edit the improvement tasks board name for the current cycle (e.g. `CAPI vX.Y release improvement tasks`).
    - Add/move all individual missing issues to the board.
- Communication:
- Communicate key dates to the community
- Documentation:
- Improve release process documentation
- Ensure the book and provider upgrade documentation are up-to-date
- Maintain and improve user facing documentation about releases, release policy and release calendar
- Release Notes:
- Create PR with release notes
The goal of this task is to initially create the docs so that we can continuously add notes going forward. The release notes doc will be used to collect release notes during the release cycle and will eventually be used to write the final release notes. The provider migration doc is part of the book and contains instructions for provider authors on how to adapt to the new Cluster API version.
- Add a new migration doc for provider implementers.
  Prior art: Add v1.5 -> v1.6 migration doc (part of #8996) - see changes to SUMMARY.md and the addition of v1.5-to-v1.6.md
- Update supported versions in versions.md.
Prior art: Update supported versions for v1.6
The goal of this task is to make the book for the current release available under e.g. https://release-1-4.cluster-api.sigs.k8s.io.
- Add a DNS entry for the book of the new release (should be available under e.g. https://release-1-4.cluster-api.sigs.k8s.io).
  Prior art: Add DNS for CAPI release-1.2 release branch
- Open https://release-1-4.cluster-api.sigs.k8s.io/ and verify that the certificates are valid. If they are not, talk to someone with access to Netlify; they have to click the `renew certificate` button in the Netlify UI.
  - To add new subdomains to the certificate config, check out the email snippet template for reference.
- Update references in introduction.md, only on the main branch (drop unsupported versions, add the new release version).
  Prior art: Add release 1.2 book link
The goal of this task is to keep the CAPI community updated on recent PRs that have been merged. This is done by using the weekly update tool in `hack/tools/release/weekly/main.go`. Here is how to use it:
- Checkout the latest commit on the release branch, e.g. `release-1.6`, or the main branch if the release branch doesn't exist yet (e.g. for a beta release).
- Build the release weekly update tool binary:

  ```shell
  make release-weekly-update-tool
  ```

- Generate the weekly update with the following command:

  ```shell
  ./bin/weekly --from YYYY-MM-DD --to YYYY-MM-DD --milestone v1.x
  ```

- Paste the output into a new Slack message in the `#cluster-api` channel. Currently, we post separate messages in a thread for `main` and the two most recent release branches (e.g. `release-1.5` and `release-1.4`).
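A small convenience sketch for filling in the date range, assuming GNU `date` (the milestone value is an example; the command is printed rather than executed so the range can be checked first):

```shell
# Compute last week's range and print the weekly-update invocation.
FROM=$(date -d '7 days ago' +%Y-%m-%d)
TO=$(date +%Y-%m-%d)
echo "./bin/weekly --from ${FROM} --to ${TO} --milestone v1.6"
```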
- Checkout the `main` branch.
- Build the release notes tool binary:

  ```shell
  make release-notes-tool
  ```

- Checkout the latest commit on the release branch, e.g. `release-1.6`, or the main branch if the release branch doesn't exist yet (e.g. for a beta release).
- Generate release notes with:

  ```shell
  # PREVIOUS_TAG should be the last patch release of the previous minor release.
  PREVIOUS_TAG=v1.5.x
  # RELEASE_TAG should be the new desired tag (note: at this point the tag does not yet exist).
  RELEASE_TAG=v1.6.x
  # If this is a beta or RC release, add the --pre-release-version flag.
  ./bin/notes --from=$PREVIOUS_TAG > CHANGELOG/${RELEASE_TAG}.md
  ```

- This will generate a new release notes file at `CHANGELOG/<RELEASE_TAG>.md`. Finalize the release notes:
  - Update the `Kubernetes version support section`. If this is a patch release, you can most probably copy the same values from the previous patch release notes, except if this is the release where support for a new Kubernetes version is added.
    Note: Check our Kubernetes support policy in the CAPI book. In case of doubt, reach out to the current release lead.
  - If this is a `vX.X.0` release, fill in the content for the `Highlights` section. Otherwise, remove the section altogether.
  - If there are deprecations in this release (for example, a CAPI API version drop), add them to the `Deprecation Warning` section. Otherwise, remove the section altogether.
  - Look for any `MISSING_AREA` entries. Add the corresponding label to the PR and regenerate the notes.
  - Look for any `MULTIPLE_AREAS` entries. If the PR does indeed warrant multiple areas, just remove the `MULTIPLE_AREAS` prefix and leave the areas. Otherwise, fix the labels in the PR and regenerate the notes.
  - Review that all areas are correctly assigned to each PR. If not, correct the labels and regenerate the notes.
  - Look for area duplication in PR titles. Sometimes authors add a prefix in their PR title that matches the area label. When the notes are generated, the area is added as a prefix to the PR title, which can create redundant information. Remove the duplicate from the PR title and just leave the area. Make sure you capitalize the title after this.
  - Check that all entries are in the right section. Sometimes the wrong emoji prefix is added to the PR title, which drives the section in which the entry is added in the release notes. Manually move any entry as needed. Note that fixing the PR title won't fix this even after regenerating the notes, since the notes tool reads this info from the commit messages and these don't get rewritten.
  - Manually sort all entries if you made any manual edits that might have altered the correct order.
  - For minor releases: Modify `Changes since v1.x.y` to `Changes since v1.x`.
    Note: The release notes tool includes all merges since the previous release branch was branched off.
- Checkout `main`, branch out from it, and add `CHANGELOG/<RELEASE_TAG>.md`.
- Open a pull request against the main branch with all manual edits to `CHANGELOG/<RELEASE_TAG>.md`, which is used for the new release notes. The commit and PR title should be `🚀 Release v1.x.y`.
  Note: Important! The commit should only contain the release notes file, nothing else, otherwise automation will not work.
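A quick sanity-check sketch to run on the notes before opening the PR (the changelog path is an example):

```shell
# Flag unresolved area markers in the generated notes, if the file exists.
NOTES=CHANGELOG/v1.6.0.md
if [ -f "$NOTES" ]; then
  grep -n -e 'MISSING_AREA' -e 'MULTIPLE_AREAS' "$NOTES" || echo "no unresolved area markers"
fi
```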
The goal of this task is to make the book for the current release available under https://cluster-api.sigs.k8s.io.
Someone with access to Netlify should:
- Change the production branch in Netlify to the current release branch (e.g. `release-1.6`) to make the book available under https://cluster-api.sigs.k8s.io. This is done under the production branch settings.
- Trigger a redeploy.
The goal of this task is to ensure the quickstart has links to the latest `clusterctl` binaries.
Update the clusterctl links in the quickstart (on main, and cherry-pick onto release-1.6).
Prior art: Update clusterctl version to v1.6.x in quick start
Note: The PR for this should be merged after the minor release has been published. It is recommended to create it before the release but with `/hold`. This will allow maintainers to review and approve it before the release. When the release is done, just remove the hold to merge it.
The goal of this task is to ensure all stakeholders are informed about the current release cycle, for example by announcing upcoming code freezes based on the release timeline (1.6 example).
Templates for all types of communication can be found in the release-templates page.
Information can be distributed via:
- `sig-cluster-lifecycle` mailing list
  - Note: The person sending out the email should ensure that they are first part of the mailing list. If the email sent out is not received by the community, reach out to the maintainers to unblock and approve the email.
- #cluster-api Slack channel
- Office hours
- Release Team meetings
- Cluster API book
- Github Issue (when communicating beta release to providers)
Relevant information includes:
- Beta, RC, GA and patch release
- Start of code freeze
- Implementation progress
- Release delays and changes if applicable
Stakeholders are:
- End users of Cluster API
- Contributors to core Cluster API
- Provider implementers
The goal of this task is to inform all providers that a new beta.0 version of a release is out and that it should be tested. We want to prevent issues where providers don't have enough time to test before a new version of CAPI is released. This stems from a previous issue we are trying to avoid: #8498
We should inform at least the following providers via a new issue on their respective repos that a new version of CAPI is being released (provide the release date) and that the beta.0 version is ready for them to test.
- Addon provider helm: https://github.com/kubernetes-sigs/cluster-api-addon-provider-helm/issues/new
- AWS: https://github.com/kubernetes-sigs/cluster-api-provider-aws/issues/new
- Azure: https://github.com/kubernetes-sigs/cluster-api-provider-azure/issues/new
- Cloudstack: https://github.com/kubernetes-sigs/cluster-api-provider-cloudstack/issues/new
- Digital Ocean: https://github.com/kubernetes-sigs/cluster-api-provider-digitalocean/issues/new
- GCP: https://github.com/kubernetes-sigs/cluster-api-provider-gcp/issues/new
- Kubemark: https://github.com/kubernetes-sigs/cluster-api-provider-kubemark/issues/new
- Kubevirt: https://github.com/kubernetes-sigs/cluster-api-provider-kubevirt/issues/new
- IBMCloud: https://github.com/kubernetes-sigs/cluster-api-provider-ibmcloud/issues/new
- Metal3: https://github.com/metal3-io/cluster-api-provider-metal3/issues/new
- Nested: https://github.com/kubernetes-sigs/cluster-api-provider-nested/issues/new
- OCI: https://github.com/oracle/cluster-api-provider-oci/issues/new
- Openstack: https://github.com/kubernetes-sigs/cluster-api-provider-openstack/issues/new
- Operator: https://github.com/kubernetes-sigs/cluster-api-operator/issues/new
- Packet: https://github.com/kubernetes-sigs/cluster-api-provider-packet/issues/new
- vSphere: https://github.com/kubernetes-sigs/cluster-api-provider-vsphere/issues/new
TODO: Right now we don't have a template for this message but the Comms Team will provide one later.
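Once a template exists, opening these issues could be scripted with the GitHub CLI. A hedged sketch that only prints the commands rather than running them (the repo list is abbreviated, and the title/body are placeholders, not the eventual template):

```shell
# Print (rather than run) `gh issue create` commands for a couple of the
# provider repos listed above; drop `echo` once the message template exists.
for repo in \
  kubernetes-sigs/cluster-api-provider-aws \
  kubernetes-sigs/cluster-api-provider-azure; do
  echo gh issue create -R "$repo" \
    --title "Cluster API v1.6.0-beta.0 is ready for testing" \
    --body "(template TBD)"
done
```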
- Signal:
- Responsibility for the quality of the release
- Continuously monitor CI signal, so a release can be cut at any time
- Add CI signal for new release branches
- Bug Triage:
- Make sure blocking issues and bugs are triaged and dealt with in a timely fashion
- Automation:
- Maintain and improve release automation, tooling & related developer docs
The goal of this task is to have test coverage for the new release branch and results in testgrid. While we add test coverage for the new release branch we will also drop the tests for old release branches if necessary.
- Create new jobs based on the jobs running against our `main` branch:
  - Copy `test-infra/config/jobs/kubernetes-sigs/cluster-api/cluster-api-periodics-main.yaml` to `test-infra/config/jobs/kubernetes-sigs/cluster-api/cluster-api-periodics-release-1-6.yaml`.
  - Copy `test-infra/config/jobs/kubernetes-sigs/cluster-api/cluster-api-periodics-main-upgrades.yaml` to `test-infra/config/jobs/kubernetes-sigs/cluster-api/cluster-api-periodics-release-1-6-upgrades.yaml`.
  - Copy `test-infra/config/jobs/kubernetes-sigs/cluster-api/cluster-api-presubmits-main.yaml` to `test-infra/config/jobs/kubernetes-sigs/cluster-api/cluster-api-presubmits-release-1-6.yaml`.
  - Modify the following:
    - Rename the jobs, e.g.: `periodic-cluster-api-test-main` => `periodic-cluster-api-test-release-1-6`.
    - Change `annotations.testgrid-dashboards` to `sig-cluster-lifecycle-cluster-api-1.6`.
    - Change `annotations.testgrid-tab-name`, e.g. `capi-test-main` => `capi-test-release-1-6`.
    - For periodics additionally:
      - Change `extra_refs[].base_ref` to `release-1.6` (for repo: `cluster-api`).
      - Change the interval (let's use the same as for `1.5`).
    - For presubmits additionally: Adjust branches: `^main$` => `^release-1.6$`.
- Create a new dashboard for the new branch in `test-infra/config/testgrids/kubernetes/sig-cluster-lifecycle/config.yaml` (`dashboard_groups` and `dashboards`).
- Remove tests from the test-infra repository for old release branches according to our policy documented in Support and guarantees. For example, let's assume we just created tests for v1.6; then we can now drop test coverage for the release-1.3 branch.
- Verify the jobs and dashboards a day later by taking a look at https://testgrid.k8s.io/sig-cluster-lifecycle-cluster-api-1.6
- Update `.github/workflows/weekly-security-scan.yaml` (to set up Trivy and govulncheck scanning), `.github/workflows/weekly-md-link-check.yaml` (to set up link checking in the CAPI book), and `.github/workflows/weekly-test-release.yaml` (to verify the release target is working) for the currently supported branches.
- Update the PR markdown link checker accordingly (e.g. `main` -> `release-1.6`).
  Prior art: Update branch for link checker
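The copy/rename steps can be partially scripted. A hedged sketch, run from a test-infra checkout, assuming GNU `sed`; the rewrites only cover the obvious fields, so intervals, dashboards, and tab names still need manual review:

```shell
# Derive release-branch job configs from the main ones.
jobs_dir=config/jobs/kubernetes-sigs/cluster-api
for f in "$jobs_dir"/cluster-api-periodics-main.yaml \
         "$jobs_dir"/cluster-api-periodics-main-upgrades.yaml \
         "$jobs_dir"/cluster-api-presubmits-main.yaml; do
  [ -f "$f" ] || continue
  out=$(printf '%s' "$f" | sed 's/-main/-release-1-6/')
  sed -e 's/-main/-release-1-6/g' -e 's/\^main\$/^release-1.6$/g' "$f" > "$out"
done
```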
The goal of this task is to keep our tests running in CI stable.
Note: To be very clear, this is not meant to be an on-call role for Cluster API tests.
- Add yourself to the Cluster API alert mailing list.
  Note: An alternative to the alert mailing list is manually monitoring the testgrid dashboards (also dashboards of previous releases). Using the alert mailing list has proven to be a lot less effort, though.
- Subscribe to `CI Activity` notifications for the Cluster API repo.
- Check the existing failing-test and flaking-test issue templates under the `.github/ISSUE_TEMPLATE/` folder of the repo, which are used to create issues for failing or flaking tests respectively. Please make sure they are up-to-date and, if not, send a PR to update or improve them.
- Check if there are any existing jobs that got stuck (have been running for more than 12 hours) in a 'pending' state:
  - If that is the case, notify the maintainers and ask them to manually cancel and re-run the stuck jobs.
- Triage CI failures reported by mail alerts or found by monitoring the testgrid dashboards:
  - Create an issue using the appropriate template (failing-test) in the Cluster API repository to surface the CI failure.
  - Identify if the issue is a known issue, a new issue, or a regression.
  - Mark the issue as `release-blocking` if applicable.
- Triage periodic GitHub Actions failures, paying special attention to image scan results; open issues as described above where needed.
- Run periodic deep-dive sessions with the CI team to investigate failing and flaking tests. Example session recording: https://www.youtube.com/watch?v=YApWftmiDTg
The Cluster API tests are pretty stable, but there are still some flaky tests from time to time.
To reduce the amount of flakes please periodically:
- Take a look at recent CI failures via `k8s-triage`.
- Open issues using the appropriate template (flaking-test) for recurring flakes and ideally fix them or find someone who can.
  Note: Given resource limitations in the Prow cluster, it might not be possible to fix all flakes. Let's just try to pragmatically keep the amount of flakes pretty low.
The goal of bug triage is to triage incoming issues and, if necessary, flag them with `release-blocking` and add them to the milestone of the current release.
We probably have to figure out some details about the overlap between the bug triage task here, release leads, and Cluster API maintainers.