
Migrate triage dashboard to wg-k8s-infra #1305

Closed
spiffxp opened this issue Oct 6, 2020 · 20 comments
Assignees
Labels
priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. sig/testing Categorizes an issue or PR as relevant to SIG Testing.
Milestone

Comments

@spiffxp
Member

spiffxp commented Oct 6, 2020

Part of migrating away from gcp-project k8s-gubernator: #1308

Triage is made up of a few components

  • periodic jobs run via prow.k8s.io - these should be changed to run on k8s-infra-prow-build
  • data files stored in gs://k8s-gubernator/triage - these should be migrated to a new k8s-infra owned GCS bucket
  • a static site hosted in gs://k8s-gubernator/triage - ditto
  • the http://go.k8s.io/triage redirect that points to the static site - this should be updated to point to the new location

Where should we move things to? My suggestions:

  • gcp project: k8s-gubernator -> kubernetes-public
  • gcs bucket: gs://k8s-gubernator -> gs://k8s-triage

One other wrinkle: visiting https://go.k8s.io/triage redirects to https://storage.googleapis.com/k8s-gubernator/triage/index.html, which exposes the bucket name. If we change to a new bucket, we're probably going to break a lot of existing URIs. Ideally we can serve a 301 redirect pointing to the new location. Bonus points if we could mask out the bucket name (e.g. have triage.k8s.io be our location).
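A 301 here could be sketched as an nginx fragment for the go.k8s.io redirector (hypothetical: this assumes the redirector is nginx-based and that the new bucket ends up named gs://k8s-triage, neither of which is settled at this point in the thread; note that GCS itself can't serve redirects, so links that hit storage.googleapis.com/k8s-gubernator/... directly would need separate handling):

```nginx
# Hypothetical fragment: permanently redirect /triage and any deep link
# under it to the new bucket, preserving the rest of the path.
location = /triage {
    return 301 https://storage.googleapis.com/k8s-triage/index.html;
}
location ~ ^/triage/(.*)$ {
    return 301 https://storage.googleapis.com/k8s-triage/$1;
}
```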

@spiffxp spiffxp added wg/k8s-infra sig/testing Categorizes an issue or PR as relevant to SIG Testing. labels Oct 6, 2020
@spiffxp spiffxp added this to Needs Triage in sig-k8s-infra via automation Oct 6, 2020
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 4, 2021
@ameukam
Member

ameukam commented Jan 8, 2021

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 8, 2021
@spiffxp spiffxp moved this from Needs Triage to Backlog in sig-k8s-infra Jan 20, 2021
@spiffxp spiffxp moved this from Backlog to Backlog (infra to migrate) in sig-k8s-infra Jan 20, 2021
@spiffxp
Member Author

spiffxp commented Jan 21, 2021

/assign @spiffxp
/milestone v1.21

@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 21, 2021
@spiffxp spiffxp added the priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. label Jan 22, 2021
@spiffxp spiffxp added this to To Triage in sig-testing issues Feb 9, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@ameukam
Member

ameukam commented Apr 22, 2021

/remove-lifecycle stale
/milestone clear

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Apr 22, 2021
@k8s-ci-robot k8s-ci-robot removed this from the v1.21 milestone Apr 22, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@spiffxp
Member Author

spiffxp commented Jul 21, 2021

/remove lifecycle-stale
/milestone v1.23

@k8s-ci-robot k8s-ci-robot added this to the v1.23 milestone Jul 21, 2021
@ameukam ameukam removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 21, 2021
@spiffxp spiffxp moved this from To Triage to Backlog in sig-testing issues Jul 27, 2021
@spiffxp spiffxp moved this from Backlog (infra to migrate) to In Progress in sig-k8s-infra Aug 3, 2021
@spiffxp spiffxp moved this from Backlog to In Progress in sig-testing issues Aug 3, 2021
@spiffxp
Member Author

spiffxp commented Aug 3, 2021

Ahh right, I had hoped this would be just as simple as gs://k8s-metrics, but it's not.

gs://k8s-gubernator/triage is the path for triage, so gs://k8s-gubernator is the bucket to be migrated

The problem is that this one is much more complicated because it uses ACLs, so there's no guarantee that a delete/recreate like I did last time would restore all of the ACLs (nor is it clear whether there are google-internal things hidden in there).

$ gsutil ubla get gs://k8s-gubernator
'NoneType' object has no attribute 'bucketPolicyOnly'
$ gsutil iam get gs://k8s-gubernator | yq -y
bindings:
  - members:
      - projectEditor:k8s-gubernator
      - projectOwner:k8s-gubernator
      - serviceAccount:p595166511048-012700@gcp-sa-logging.iam.gserviceaccount.com
    role: roles/storage.legacyBucketOwner
  - members:
      - projectViewer:k8s-gubernator
    role: roles/storage.legacyBucketReader
  - members:
      - serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com
    role: roles/storage.objectAdmin
  - members:
      - serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com
    role: roles/storage.objectCreator
  - members:
      - serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com
    role: roles/storage.objectViewer
etag: CAQ=
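Assuming the pusher service account still needs the same object roles on whatever the new bucket ends up being called, replaying the non-legacy bindings from the `gsutil iam get` output above could be sketched like this (the legacy project* grants are project-scoped and come for free from the destination project, so they're deliberately skipped; the bucket name is assumed, and DRY_RUN=1 just prints the commands):

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 prints each gsutil command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "${DRY_RUN}" = "1" ]; then echo "$@"; else "$@"; fi; }

NEW_BUCKET="gs://k8s-triage"  # assumed name, not settled at this point
PUSHER="serviceAccount:pusher@k8s-prow.iam.gserviceaccount.com"

# Replay only the non-legacy roles from the IAM policy above.
for role in objectAdmin objectCreator objectViewer; do
  run gsutil iam ch "${PUSHER}:${role}" "${NEW_BUCKET}"
done
```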

I'm going to at least start with:

  • syncing the contents of gs://k8s-gubernator/triage to gs://k8s-project-triage
  • looking at what k8s.io redirects or reverse-proxy could do for us
  • looking at what a 301 redirect from the old triage link would do for us
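The first of those steps might look like the following sketch (gs://k8s-project-triage is the name mentioned above; flags chosen conservatively, and DRY_RUN=1 just echoes the command):

```shell
#!/usr/bin/env bash
set -euo pipefail

# DRY_RUN=1 prints the command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
run() { if [ "${DRY_RUN}" = "1" ]; then echo "$@"; else "$@"; fi; }

OLD="gs://k8s-gubernator/triage"
NEW="gs://k8s-project-triage"  # name used earlier in this comment

# -m parallelizes, -r recurses; -d (delete extras in the destination)
# is deliberately omitted so a mistake can't destroy data there.
run gsutil -m rsync -r "${OLD}" "${NEW}"
```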

@spiffxp
Member Author

spiffxp commented Aug 4, 2021

I toyed around with trying to create gs://triage.k8s.io so as to serve content directly from GCS instead of having to proxy through nginx. It requires that I verify domain ownership, which I feel like we have already done; the question is who owns it?

@spiffxp
Member Author

spiffxp commented Aug 4, 2021

For now I'm opting to:

@spiffxp
Member Author

spiffxp commented Aug 6, 2021

The canary job is refusing to schedule: https://testgrid.k8s.io/wg-k8s-infra-canaries#triage

So:

@spiffxp
Member Author

spiffxp commented Aug 6, 2021

For some reason the triage image isn't in staging like I would have expected with kubernetes/test-infra#23126

$ gcloud container images list --project=k8s-staging-test-infra
NAME
gcr.io/k8s-staging-test-infra/alpine
gcr.io/k8s-staging-test-infra/docker-buildx
gcr.io/k8s-staging-test-infra/gcb-docker-gcloud
gcr.io/k8s-staging-test-infra/git

@spiffxp
Member Author

spiffxp commented Aug 6, 2021

... that would be because nothing has landed that would trigger the push to staging after the job config was updated

I was originally going to make a dummy change just to push a new image, but while grepping for k8s-gubernator in the triage directory I realized it's still hardcoded in the update_summaries.sh script, so the canary was bound to fail trying to use a project it wasn't authorized to.
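The check that surfaced this can be sketched as a grep over the triage directory (demonstrated here against a scratch file, since the exact offending line is assumed):

```shell
#!/usr/bin/env bash
set -euo pipefail

# Recreate a file like the offending script in a scratch dir, then grep
# the tree for remaining references to the old bucket - the same kind of
# check that caught update_summaries.sh. (The real line is assumed.)
workdir="$(mktemp -d)"
printf 'gsutil cp summaries.json gs://k8s-gubernator/triage/\n' \
  > "${workdir}/update_summaries.sh"

# -r recurses, -l lists matching files, -n would show line numbers.
grep -rl "k8s-gubernator" "${workdir}"
```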

Next steps:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

We are very nearly done now:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

When the final PR merges I will:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

Will close after one last PR:

Arbitrary old link I verified the redirect with:

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

I'm a bit hesitant to shout "success!" because kettle being down (kubernetes/test-infra#23135) makes triage look broken, but it's just as broken as it was before migration. We're reading and clustering data on community infra, but there's no new data to cluster right now...

@spiffxp
Member Author

spiffxp commented Aug 16, 2021

/close
Calling this done!

@k8s-ci-robot
Contributor

@spiffxp: Closing this issue.

In response to this:

/close
Calling this done!

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sig-k8s-infra automation moved this from In Progress to Done Aug 16, 2021
sig-testing issues automation moved this from In Progress to Done Aug 16, 2021