migrate away from test-infra-trusted build cluster #32432

Closed
4 tasks done
ameukam opened this issue Apr 11, 2024 · 37 comments
Labels
kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@ameukam
Member

ameukam commented Apr 11, 2024

There are a few jobs still running on the test-infra-trusted cluster that we should either migrate to k8s-infra-prow-build-trusted or remove:

  • post-test-infra-push-git
  • post-test-infra-push-git-custom-k8s-auth
  • post-test-infra-upload-testgrid-config
  • ci-test-infra-update-slack-oncall
ameukam added the kind/cleanup label Apr 11, 2024
k8s-ci-robot added the needs-sig label Apr 11, 2024
@ameukam
Member Author

ameukam commented Apr 11, 2024

/assign @michelle192837
/sig testing

k8s-ci-robot added the sig/testing label and removed the needs-sig label Apr 11, 2024
@BenTheElder
Member

> ci-test-infra-update-slack-oncall

no point migrating this, we'll just shut it down when prow is migrated and people can post in #testing-ops in slack instead.

we should actually probably proactively stop advertising @test-infra-oncall to the broader project.

> post-test-infra-upload-testgrid-config

.... uhhhh this one I'm not sure, because we have to be able to publish to testgrid's config bucket .... migrating testgrid is another fun topic

The image publishing jobs we should be able to move over.

@michelle192837
Contributor

re: ci-test-infra-update-slack-oncall: Ah, that's easier then.

re: post-test-infra-upload-testgrid-config: I think this should be doable. I have not gone through the full details, but imo thanks to config merger merging configs for TestGrid from multiple locations, we can stand up a new config upload job in community-owned infra, verify the uploaded config in the new location is the same as the old, and swap the config location used in the TestGrid instance overall.

@BenTheElder
Member

On the K8s infra side we're going to need a bucket for this to start then, cc @upodroid @ameukam for thoughts.

> post-test-infra-push-git
> post-test-infra-push-git-custom-k8s-auth

Not sure how these didn't wind up getting migrated yet ... looks like this is part of k8s-testimages kubernetes/k8s.io#1523

I don't see evidence that we're actually using these images in Kubernetes and we should probably just delete them.

Prow has built-in known-hosts handling in clonerefs these days, I don't think we need these anymore.

@michelle192837
Contributor

Sorry for the delay, I'm looking into this and some of the other unmigrated jobs today.

@BenTheElder
Member

in #32808 the list should be clearer now, a lot of these are related to running prow so that's fine, but some are pushing images and that's concerning, we should either eliminate or migrate them.

@BenTheElder
Member

here's one #32812

@BenTheElder
Member

BenTheElder commented Jun 21, 2024

| File Path | Job | Link |
| --- | --- | --- |
| config/jobs/kubernetes/test-infra/test-infra-periodics.yaml | job-migration-todo-report | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-autobump-prow-for-auto-deploy | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-autobump-prow | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-update-slack-oncall | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-branchprotector | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-label-sync | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-gencred-refresh-kubeconfig | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-rotate-legacy-default-build-sa-json-key | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-alpine | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-gcloud-terraform | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git-custom-k8s-auth | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-deploy-prow | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-reconcile-hmacs | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-misc-images | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-kettle | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-bazel | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-gcb-docker-gcloud | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-test-gubernator | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-gencred | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-gencred-refresh-kubeconfig | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-upload-oncall | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-upload-testgrid-config | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-upload-boskos-config | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-cip-prow | Search Results |

SIG Contribex:

| File Path | Job | Link |
| --- | --- | --- |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-community-tempelis-apply | Search Results |

Not trusted cluster, but the other non-migrated jobs with test-infra in the name (there could be more) ...

| File Path | Job | Link |
| --- | --- | --- |
| config/jobs/kubernetes/test-infra/janitors.yaml | maintenance-pull-janitor | Search Results |
| config/jobs/kubernetes/test-infra/janitors.yaml | maintenance-ci-aws-janitor | Search Results |
| config/jobs/kubernetes/test-infra/janitors.yaml | maintenance-ci-janitor | Search Results |
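
For anyone who wants to re-check a list like the ones above against a local checkout, here is a rough sketch. It is not the tool that produced these tables. The function name `jobs_on_cluster` is made up for illustration, and it assumes each job entry starts with `- name:` indented by at most two spaces, with its `cluster:` line appearing after the name. It is a plain text scan, not a YAML parser, so treat the output as a starting point only.

```shell
#!/bin/sh
# Hypothetical helper: print "<file> <job-name>" for every job pinned to CLUSTER.
# Assumption: a job entry begins with "- name:" at an indent of 0-2 spaces and
# its "cluster:" line follows. Deeper "- name:" lines (containers, volumes)
# are ignored because they do not match the shallow-indent pattern.
jobs_on_cluster() {
  cluster="$1"
  dir="$2"
  find "$dir" -type f -name '*.yaml' | sort | while IFS= read -r file; do
    awk -v want="$cluster" -v file="$file" '
      /^ ? ?- name: / { job = $3 }
      $1 == "cluster:" && $2 == want && job != "" { print file, job; job = "" }
    ' "$file"
  done
}

# Example (path taken from the tables above):
# jobs_on_cluster test-infra-trusted config/jobs/kubernetes/test-infra
```

Jobs that omit `cluster:` or put it before `name:` will be missed, so a real audit should parse the YAML properly.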

@BenTheElder
Member

Janitor jobs: won't be migrated, will be turned down.

post-test-infra-upload-oncall, ci-test-infra-update-slack-oncall: no need, this will be obsolete.

job-migration-todo-report: will be obsolete, also this isn't working correctly and we're just manually checking in the tool output, I'll clean this one up.

ci-test-infra-rotate-legacy-default-build-sa-json-key: will be obsolete

post-test-infra-upload-boskos-config: will be obsolete, we have a different boskos config in github.com/kubernetes/k8s.io for community boskos resources

post-test-infra-cip-prow: I deleted this in #32812

post-test-infra-push.* are concerning.
post-test-infra-upload-testgrid-config will need migrating

I'm guessing reconcile-hmacs needs to be considered as part of the control plane migration, and branchprotector definitely does.

@BenTheElder
Member

#32814 will remove the job-migration-todo-report job.

ci-test-infra-label-sync should be able to migrate to k8s-infra-prow-build-trusted without waiting for the rest of prow, but we might not have the right secrets available yet.

@michelle192837
Contributor

> On the K8s infra side we're going to need a bucket for this to start then, cc @upodroid @ameukam for thoughts.
>
> post-test-infra-push-git
> post-test-infra-push-git-custom-k8s-auth
>
> Not sure how these didn't wind up getting migrated yet ... looks like this is part of k8s-testimages kubernetes/k8s.io#1523
>
> I don't see evidence that we're actually using these images in Kubernetes and we should probably just delete them.
>
> Prow has built-in known-hosts handling in clonerefs these days, I don't think we need these anymore.

These are used as the base images for building Prow images (https://cs.k8s.io/?q=gcr.io%2Fk8s-prow%2Fgit&i=nope&files=&excludeFiles=&repos=). I think we can replace the git image with alpine, but git-custom-k8s-auth might need to stay?
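
A local equivalent of that code search, as a minimal sketch: the helper name `image_uses` is hypothetical, and it assumes you have the relevant repos checked out somewhere on disk.

```shell
#!/bin/sh
# Hypothetical helper: list files under DIR that mention IMAGE.
image_uses() {
  image="$1"
  dir="$2"
  # -r: recurse, -l: print matching file names only,
  # -F: treat the image reference as a fixed string, not a regex.
  grep -rlF -- "$image" "$dir" | sort
}

# Example (the checkout path is an assumption):
# image_uses gcr.io/k8s-prow/git ~/src/prow
```

Like the search above, this is a substring match, so a query for `gcr.io/k8s-prow/git` also hits `gcr.io/k8s-prow/git-custom-k8s-auth`.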

@michelle192837
Contributor

michelle192837 commented Jun 21, 2024

| Job | Link | Uses |
| --- | --- | --- |
| post-test-infra-push-alpine | Search Results | Search Results |
| post-test-infra-push-gcloud-terraform | Search Results | Search Results |
| post-test-infra-push-git | Search Results | Search Results |
| post-test-infra-push-git-custom-k8s-auth | Search Results | Search Results |
| post-test-infra-push-misc-images | Search Results | Search Results |
| post-test-infra-push-kettle | Search Results | Search Results |
| post-test-infra-push-bazel | Search Results | Search Results |
| post-test-infra-push-gcb-docker-gcloud | Search Results | Search Results |
| post-test-infra-push-test-gubernator | Search Results | Search Results |
| post-test-infra-push-gencred | Search Results | Search Results |

Several of these push images that aren't used and should be turned down (post-test-infra-push-test-gubernator, post-test-infra-push-bazel, post-test-infra-push-gcloud-terraform, post-test-infra-push-gencred).

  • Note that post-test-infra-push-gencred hasn't succeeded, and it pushes to k8s-testimages, which is not what jobs are using; jobs use the image pushed to k8s-prow by post-test-infra-push-misc-images.

@michelle192837
Contributor

Discussed offline: for post-test-infra-push-git and post-test-infra-push-git-custom-k8s-auth, since we'll need to migrate the latter anyways, we can migrate the former at the same time, then see if we can replace the git image base with alpine instead.

@BenTheElder
Member

BenTheElder commented Jun 25, 2024

> then see if we can replace the git image base with alpine instead.

we should probably use something else, we generally prefer to use e.g. debian/distroless for kubernetes base images, for licensing reasons (alpine/busybox) and alignment on patching etc.

@BenTheElder
Member

BenTheElder commented Jul 8, 2024

BenTheElder self-assigned this Jul 8, 2024
@ameukam
Member Author

ameukam commented Jul 10, 2024

Sorry for the late response. I can confirm that git-custom-k8s-auth is used by prow to authenticate to non-GKE clusters (currently it's only EKS)

@upodroid
Member

https://github.com/kubernetes-sigs/prow/blob/main/.ko.yaml

+1 for building a unified base image for prow that has git and the kubectl auth plugins for our cloud vendors

@upodroid
Member

We can migrate that job to the community cluster and update the .ko.yaml references
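
Updating the references could look roughly like the sketch below. The helper name `swap_base_image` is hypothetical, and the replacement registry shown in the example is a placeholder, not the real destination. It prints the rewritten file to stdout so the result can be reviewed before anything is overwritten.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with every literal occurrence of OLD
# replaced by NEW. Nothing is modified in place.
swap_base_image() {
  file="$1"
  old="$2"
  new="$3"
  awk -v old="$old" -v new="$new" '{
    line = $0
    out = ""
    # index() is a plain substring search, so the dots and slashes in an
    # image reference need no regex escaping.
    while ((i = index(line, old)) > 0) {
      out = out substr(line, 1, i - 1) new
      line = substr(line, i + length(old))
    }
    print out line
  }' "$file"
}

# Example (registry.example/new-location/ is a placeholder, not a real registry):
# swap_base_image .ko.yaml gcr.io/k8s-prow/ registry.example/new-location/ > .ko.yaml.new
```

Passing a registry prefix rather than a full image name rewrites every base image under that prefix in one pass, which may or may not be what you want if only some images have moved.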

@BenTheElder
Member

We can do something similar to the distroless-iptables image in k/release.

@BenTheElder
Member

tempelis will be done after #32946

michelle192837 added a commit to michelle192837/prow that referenced this issue Jul 26, 2024
Switch the image bases to use those built in k8s-staging-test-infra
instead. Ref kubernetes/test-infra#32432.
@michelle192837
Contributor

TestGrid upload progress:

# See https://github.com/GoogleCloudPlatform/testgrid/tree/main/config/print#config-printer for the print utility.
~/go/bin/print gs://k8s-testgrid/configs/k8s/config > k8s-testgrid-config.textproto
~/go/bin/print gs://k8s-testgrid-config/k8s/config > k8s-infra-testgrid-config.textproto

diff k8s-testgrid-config.textproto k8s-infra-testgrid-config.textproto
# This produces no diffs

(And these do have contents):

wc -l k8s-testgrid-config.textproto 
519759 k8s-testgrid-config.textproto

wc -l k8s-infra-testgrid-config.textproto 
519759 k8s-infra-testgrid-config.textproto
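
If this check needs repeating, the two steps above (no diff, and both files non-empty) can be folded into one helper. This is only a sketch; `configs_match` is a made-up name and it works on the already-printed textproto files, not on the buckets.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if both dumps are non-empty and identical.
# Returns 2 for an empty or missing file, 1 for a mismatch, 0 for a match.
configs_match() {
  old="$1"
  new="$2"
  # [ -s FILE ] is true when the file exists and is larger than zero bytes.
  [ -s "$old" ] || { echo "empty or missing: $old" >&2; return 2; }
  [ -s "$new" ] || { echo "empty or missing: $new" >&2; return 2; }
  # cmp -s prints nothing and exits 0 only when the files are byte-identical.
  if cmp -s "$old" "$new"; then
    echo "match: $(wc -l < "$old" | tr -d ' ') lines"
  else
    echo "configs differ" >&2
    return 1
  fi
}

# Usage with the dumps produced above:
# configs_match k8s-testgrid-config.textproto k8s-infra-testgrid-config.textproto
```

The non-empty check matters because two empty dumps would also produce no diff.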

Now following the config merger instructions at https://github.com/kubernetes/test-infra/blob/master/testgrid/merging.md#config-merger. I'll have a few PRs out for those.

@michelle192837
Contributor

Remaining from my list above:

| File Path | Job | Link | Uses |
| --- | --- | --- | --- |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-alpine | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-git | Search Results | Search Results |
| config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-push-misc-images | Search Results | Search Results |

post-test-infra-push-alpine just needs minor cleanup, then it can be deleted.
post-test-infra-push-git can probably be deleted; the remaining use of it is as the base for certain Prow images. I can't switch them over immediately (integration tests fail when switching from the January image to a recent July image), but I believe switching to an image from the old location will have the same problem.
post-test-infra-push-misc-images needs a fix (I think the most recent PR will fix it, but it needs a retrigger to verify that's the case), then the images need to be switched to the new location before the old job is turned down.

(And last bit of cleanup, move all the new image push jobs to the image-pushes dashboard and remove '-canary' from the job name).

@michelle192837
Contributor

post-test-infra-push-misc-images technically passes, but it doesn't seem to be uploading new images? (I think the same is happening for the new prow images push, which does something similar.)

post-test-infra-push-alpine and post-test-infra-push-git I think we can delete for the reasoning above. The minor cleanup isn't blocking removing the old jobs.

@michelle192837
Contributor

lol I lied, the misc-image canary is working fine. I'll switch those uses over today.

I'm still not seeing new Prow images uploaded to the new location though. (https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/post-k8s-infra-prow-images/1818232059856949248, https://storage.googleapis.com/kubernetes-jenkins/logs/post-k8s-infra-prow-images/1818232059856949248/artifacts/build.log for the build log). Since it's doing something similar to the misc-images push job, I might update it to be similar and see if that fixes it.

@michelle192837
Contributor

Sorry about the confusion, the Prow images job has been working the whole time and I was just confused. (More detail in kubernetes-sigs/prow#217 (comment)).

Anyways, remaining updates are:

  • Switch Prow images to use the new location
  • Remove the old misc-images job
  • Remove the old prow images job
  • Remove '-canary' from the new image push jobs

I'll leave submission of those to Monday, but those should handle the last test-infra jobs that I think we're actually handling?

@BenTheElder
Member

| Secrets | Path | Job |
| --- | --- | --- |
| [] | config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-gencred-refresh-kubeconfig |
| [] | config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-rotate-legacy-default-build-sa-json-key |
| [] | config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-deploy-prow |
| [] | config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-gencred-refresh-kubeconfig |
| [kubeconfig-prow-services oauth-token] | config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | post-test-infra-reconcile-hmacs |
| [oauth-token k8s-ci-robot-ssh-keys] | config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-autobump-prow |
| [oauth-token k8s-ci-robot-ssh-keys] | config/jobs/kubernetes/test-infra/test-infra-trusted.yaml | ci-test-infra-autobump-prow-for-auto-deploy |

Of those, I think we might need reconcile-hmacs to move along with the new prow deployment?

Otherwise I think the rest should probably be spun down just ahead of migrating prow, and remain in the meantime to keep the legacy instance humming.

#33129 covers the janitor jobs.
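
A listing like the secrets table above could be approximated with a text scan along these lines. This is a sketch, not how that table was generated. `job_secrets` is a hypothetical name, and it assumes jobs start with `- name:` at an indent of at most two spaces and reference secrets through `secretName:` in pod volumes. Secrets pulled in any other way will not show up.

```shell
#!/bin/sh
# Hypothetical helper: for each job in FILE pinned to CLUSTER, print
# "[secret1 secret2 ...] job-name", using the same bracket style as the table.
# Assumption: secrets appear as "secretName: <name>" inside the job spec.
job_secrets() {
  cluster="$1"
  file="$2"
  awk -v want="$cluster" '
    # Print the job collected so far, but only if it targets the wanted cluster.
    function emit() {
      if (job != "" && on_cluster) print "[" secrets "] " job
    }
    # A shallow "- name:" line starts a new job entry.
    /^ ? ?- name: / { emit(); job = $3; on_cluster = 0; secrets = "" }
    $1 == "cluster:" && $2 == want { on_cluster = 1 }
    $1 == "secretName:" { secrets = (secrets == "" ? $2 : secrets " " $2) }
    END { emit() }
  ' "$file"
}

# Example:
# job_secrets test-infra-trusted config/jobs/kubernetes/test-infra/test-infra-trusted.yaml
```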

@BenTheElder
Member

We only have these six left now:

  • ci-test-infra-gencred-refresh-kubeconfig
    • Decision: keep until we're ready to migrate prow control plane, job will not migrate (we will use workload identity and new kubeconfigs cc @upodroid to confirm)
  • post-test-infra-deploy-prow
    • Decision: keep until we're ready to migrate prow control plane, job will not migrate (we will use something else?? argoCD?)
  • post-test-infra-gencred-refresh-kubeconfig
    • Decision: I think we can turn this down now? it is ~redundant to the periodic, otherwise same decision as ci-test-infra-gencred-refresh-kubeconfig
  • post-test-infra-reconcile-hmacs
    • Decision: keep until we're ready to migrate prow control plane, job will not migrate. (cc @cjwagner to confirm)
  • ci-test-infra-autobump-prow
    • Decision: keep until we're ready to migrate prow control plane, job will not migrate (we'll redo this against k8s.io?)
  • ci-test-infra-autobump-prow-for-auto-deploy
    • Decision: keep until we're ready to migrate prow control plane, job will not migrate (we'll redo this against k8s.io?)

BenTheElder added the lifecycle/active label Aug 13, 2024
@cjwagner
Member

> post-test-infra-reconcile-hmacs
>
>   • Decision: keep until we're ready to migrate prow control plane, job will not migrate. (cc @cjwagner to confirm)

Yes, that does not need to migrate, assuming that the K8s-Infra Prow is using a GitHub App to manage webhooks (rather than manually configuring them per org or repo). IIRC someone confirmed this in the last SIG-Testing meeting.

The other decisions SGTM as well.

@michelle192837
Contributor

Now done thanks to Ben: #33352


6 participants