
Migrate triage dashboard to wg-k8s-infra #1305

Closed
spiffxp opened this issue Oct 6, 2020 · 20 comments
Assignees
Labels
priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Milestone

Comments

@spiffxp
Member

spiffxp commented Oct 6, 2020

Part of migrating away from gcp-project k8s-gubernator: #1308

Triage is made up of a few components

  • periodic jobs run via prow.k8s.io - these should be changed to run on k8s-infra-prow-build
  • data files stored in gs://k8s-gubernator/triage - these should be migrated to a new k8s-infra owned GCS bucket
  • a static site hosted in gs://k8s-gubernator/triage - ditto
  • the http://go.k8s.io/triage redirect that points to the static site - this should be updated to point to the new location

Where should we move things to? My suggestions:

  • gcp project: k8s-gubernator -> kubernetes-public
  • gcs bucket: gs://k8s-gubernator -> gs://k8s-triage

One other wrinkle: visiting https://go.k8s.io/triage redirects to https://storage.googleapis.com/k8s-gubernator/triage/index.html, which exposes the bucket name. If we change to a new bucket, we're probably going to break a lot of existing URIs. Ideally we can serve a 301 redirect pointing to the new location. Bonus points if we could mask out the bucket name (e.g. have triage.k8s.io be our location).
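A 301 here could be sketched as an nginx fragment for the go.k8s.io redirector (hypothetical: this assumes the redirector is nginx-based and that the new bucket ends up named gs://k8s-triage, neither of which is settled at this point in the thread; note that GCS itself can't serve redirects, so links that hit storage.googleapis.com/k8s-gubernator/... directly would need separate handling):

```nginx
# Hypothetical fragment: permanently redirect /triage and any deep link
# under it to the new bucket, preserving the rest of the path.
location = /triage {
    return 301 https://storage.googleapis.com/k8s-triage/index.html;
}
location ~ ^/triage/(.*)$ {
    return 301 https://storage.googleapis.com/k8s-triage/$1;
}
```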

@spiffxp spiffxp added wg/k8s-infra sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Oct 6, 2020
@spiffxp spiffxp added this to Needs Triage in sig-k8s-infra via automation Oct 6, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2021
@ameukam
Member

ameukam commented Jan 8, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2021
@spiffxp spiffxp moved this from Needs Triage to Backlog in sig-k8s-infra Jan 20, 2021
@spiffxp spiffxp moved this from Backlog to Backlog (infra to migrate) in sig-k8s-infra Jan 20, 2021
@spiffxp
Member Author

spiffxp commented Jan 21, 2021

/assign @spiffxp
/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 21, 2021
@spiffxp spiffxp added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 22, 2021
@spiffxp spiffxp added this to To Triage in sig-testing issues Feb 9, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@ameukam
Member

ameukam commented Apr 22, 2021

/remove-lifecycle stale
/milestone clear

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@k8s-ci-robot k8s-ci-robot removed this from the v1.21 milestone Apr 22, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@spiffxp
Member Author

spiffxp commented Jul 21, 2021

/remove lifecycle-stale
/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Jul 21, 2021
@ameukam ameukam removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@spiffxp spiffxp moved this from To Triage to Backlog in sig-testing issues Jul 27, 2021
@spiffxp spiffxp moved this from Backlog (infra to migrate) to In Progress in sig-k8s-infra Aug 3, 2021
@spiffxp spiffxp moved this from Backlog to In Progress in sig-testing issues Aug 3, 2021
@spiffxp
Member Author

spiffxp commented Aug 3, 2021

Ahh right, I had hoped this would be just as simple as gs://k8s-metrics, but it's not.

gs://k8s-gubernator/triage is the path for triage, so gs://k8s-gubernator is the bucket to be migrated

The problem is that this one is much more complicated because it uses ACLs, so there's no guarantee that a delete/recreate like I did last time would restore all of the ACLs (nor is it clear whether there are google-internal things hidden in there).

$ gsutil ubla get gs://k8s-gubernator
'NoneType' object has no attribute 'bucketPolicyOnly'
$ gsutil iam get gs://k8s-gubernator | yq -y
bindings:
  - members:
      - projectEditor:k8s-gubernator
      - projectOwner:k8s-gubernator
      - serviceAccount:p595166511048-012700@gcp-sa-logging.iam.gserviceaccount.com
    role: roles/storage.legacyBucketOwner
  - members:
      - projectViewer:k8s-gubernator
    role: roles/storage.legacyBucketReader
  - members:
      - serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com
    role: roles/storage.objectAdmin
  - members:
      - serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com
    role: roles/storage.objectCreator
  - members:
      - serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com
    role: roles/storage.objectViewer
etag: CAQ=
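Assuming the pusher service account still needs the same object roles on whatever the new bucket ends up being called, replaying the non-legacy bindings from the `gsutil iam get` output above could be sketched like this (the legacy project* grants are project-scoped and come for free from the destination project, so they're deliberately skipped; the bucket name is assumed, and DRY_RUN=1 just prints the commands):

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 prints each gsutil command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "${DRY_RUN}" = "1" ]; then echo "$@"; else "$@"; fi; }

NEW_BUCKET="gs://k8s-triage"  # assumed name, not settled at this point
PUSHER="serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com"

# Replay only the non-legacy roles from the IAM policy above.
for role in objectAdmin objectCreator objectViewer; do
  run gsutil iam ch "${PUSHER}:${role}" "${NEW_BUCKET}"
done
```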

I'm going to at least start with:

  • syncing the contents of gs://k8s-gubernator/triage to gs://k8s-project-triage
  • looking at what k8s.io redirects or reverse-proxy could do for us
  • looking at what a 301 redirect from the old triage link would do for us
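The first of those steps might look like the following sketch (gs://k8s-project-triage is the name mentioned above; flags chosen conservatively, and DRY_RUN=1 just echoes the command):

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "${DRY_RUN}" = "1" ]; then echo "$@"; else "$@"; fi; }

OLD="gs://k8s-gubernator/triage"
NEW="gs://k8s-project-triage"  # name used earlier in this comment

# -m parallelizes, -r recurses; -d (delete extras in the destination)
# is deliberately omitted so a mistake can't destroy data there.
run gsutil -m rsync -r "${OLD}" "${NEW}"
```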

@spiffxp
Member Author

spiffxp commented Aug 4, 2021

I toyed around with trying to create gs://triage.k8s.io so as to serve content directly from GCS instead of having to proxy through nginx. It requires that I verify domain ownership, which I feel like we have already done; the question is who owns it?

@spiffxp
Member Author

spiffxp commented Aug 4, 2021

For now I'm opting to:

@spiffxp
Member Author

spiffxp commented Aug 6, 2021

The canary job is refusing to schedule: https://testgrid.k8s.io/wg-k8s-infra-canaries#triage

So:

@spiffxp
Member Author

spiffxp commented Aug 6, 2021

For some reason the triage image isn't in staging like I would have expected with kubernetes/test-infra#23126

$ gcloud container images list --project=k8s-staging-test-infra
NAME
gcr.io/k8s-staging-test-infra/alpine
gcr.io/k8s-staging-test-infra/docker-buildx
gcr.io/k8s-staging-test-infra/gcb-docker-gcloud
gcr.io/k8s-staging-test-infra/git

@spiffxp
Member Author

spiffxp commented Aug 6, 2021

... that would be because nothing has landed that would trigger the push to staging after the job config was updated

I was originally going to make a dummy change just to push a new image, but while grepping for k8s-gubernator in the triage directory I realized it's still hardcoded in the update_summaries.sh script, so the canary was bound to fail trying to use a project it wasn't authorized to.
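The check that surfaced this can be sketched as a grep over the triage directory (demonstrated here against a scratch file, since the exact offending line is assumed):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate a file like the offending script in a scratch dir, then grep
# the tree for remaining references to the old bucket - the same kind of
# check that caught update_summaries.sh. (The real line is assumed.)
workdir="$(mktemp -d)"
printf 'gsutil cp summaries.json gs://k8s-gubernator/triage/\n' \
  > "${workdir}/update_summaries.sh"

# -r recurses, -l lists matching files, -n would show line numbers.
grep -rl "k8s-gubernator" "${workdir}"
```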

Next steps:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

We are very nearly done now:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

When the final PR merges I will:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

Will close after one last PR:

Arbitrary old link I verified the redirect with:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

I'm a bit hesitant to shout "success!" because kettle being down (kubernetes/test-infra#23135) makes triage look broken, but it's just as broken as it was before migration. We're reading and clustering data on community infra, but there's no new data to cluster right now...

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

/close
Calling this done!

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

/close
Calling this done!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sig-k8s-infra automation moved this from In Progress to Done Aug 16, 2021
sig-testing issues automation moved this from In Progress to Done Aug 16, 2021