
Cleanup usage of kubernetes-release-pull in kubernetes presubmits #18789

Closed
amwat opened this issue Aug 11, 2020 · 35 comments
Assignees
Labels
area/jobs
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
kind/cleanup: Categorizes issue or PR as related to cleaning up code, process, or technical debt.
lifecycle/active: Indicates that an issue or PR is actively being worked on by a contributor.
lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
priority/important-soon: Must be staffed and worked on either currently, or very soon, ideally in time for the next release.
sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.
sig/testing: Categorizes an issue or PR as relevant to SIG Testing.

Comments

@amwat
Contributor

amwat commented Aug 11, 2020

What should be cleaned up or changed:
We stage builds to gs://kubernetes-release-pull in almost every presubmit job, but from what I can tell nothing actually consumes those builds, since the jobs also use --extract=local.
Uploading the release tars in every presubmit is non-trivial overhead, so we should remove all the non-required usages.
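
For concreteness, the combination being flagged looks roughly like this in a presubmit's kubetest args (an illustrative sketch; the exact bucket suffix and surrounding flags vary per job):

    --stage=gs://kubernetes-release-pull/ci/pull-kubernetes-e2e-gce
    --extract=local

--stage uploads the freshly built release tars on every run, while --extract=local makes the job read the build from the local output directory, so the uploaded copy goes unused.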

Provide any links for context:
https://cs.k8s.io/?q=kubernetes-release-pull&i=nope&files=&repos=

case local:
	// kubetest's --extract=local path: read the staged release from the local
	// make output directory (_output/gcs-stage) rather than from GCS.
	url := util.K8s("kubernetes", "_output", "gcs-stage")
	files, err := ioutil.ReadDir(url)
	if err != nil {
		return err
	}
	var release string
	for _, file := range files {
		r := file.Name()
		if strings.HasPrefix(r, "v") {
			release = r
			break
		}
	}
	if len(release) == 0 {
		return fmt.Errorf("No releases found in %v", url)
	}
	return getKube(fmt.Sprintf("file://%s", url), release, extractSrc)

Random GCE provider job: https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-gce/1293275406807339008#1:build-log.txt%3A903

/cc @spiffxp @BenTheElder @MushuEE


EDIT(@spiffxp): I made a list of the offending jobs based on the criteria --extract=local and --stage=gs://kubernetes-release-pull/*

  • if the job triggers for a single branch it's labeled as job@branch
  • if the job triggers for all branches it's labeled as job
  • there are no presubmits that trigger for N branches (where all > N > 1)
  • there are no periodics or postsubmits that touch gs://kubernetes-release-pull
  • this picks up some --provider=aws jobs (kops), it remains to be seen whether they need --stage or not

EDIT(@BenTheElder): I removed the outdated checklist and instead I'm going to provide a search: https://github.com/search?q=repo%3Akubernetes%2Ftest-infra+%22--stage%3Dgs%3A%2F%2Fkubernetes-release-pull%22&type=code

@amwat amwat added the kind/cleanup Categorizes issue or PR as related to cleaning up code, process, or technical debt. label Aug 11, 2020
@BenTheElder
Member

we should test this in a canary just because this stuff is old and brittle and I can't remember why we were doing this anymore 🙃

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Nov 26, 2020
@fejta-bot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Dec 26, 2020
@BenTheElder
Member

still worth doing?

@spiffxp
Member

spiffxp commented Jan 8, 2021

/remove-lifecycle rotten
I think so. The other option is to continue as-is, meaning jobs that use this bucket need to switch to use k8s-release-pull as they migrate to k8s-infra.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jan 8, 2021
@amwat
Contributor Author

amwat commented Jan 8, 2021

Sadly, it looks like those GCS links have been garbage-collected.
It seems that one of the steps of --stage is copying the artifacts from the bazel output path to the make output path (_output/gcs-stage) and then uploading them to GCS.

Our presubmit jobs are configured with --extract=local instead of --extract=bazel while using --build=bazel,
so they were relying on the artifacts being in the make output path.
https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes/sig-cloud-provider/gcp/gcp-gce.yaml#L48
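
Putting that together, the dependency chain for these jobs is roughly as follows (a sketch; flag order is illustrative and the bucket path is elided):

    --build=bazel                               # build with bazel
    --stage=gs://kubernetes-release-pull/...    # copies artifacts into _output/gcs-stage, then uploads them to GCS
    --extract=local                             # reads the release back from _output/gcs-stage

So --stage was doing double duty here: the GCS upload itself is unused, but the local copy into _output/gcs-stage is exactly what --extract=local depends on.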

testing out in the canary job: #20427

@spiffxp
Member

spiffxp commented Jan 21, 2021

/milestone v1.21
/sig testing
/wg-k8s-infra

@k8s-ci-robot k8s-ci-robot added the sig/testing Categorizes an issue or PR as relevant to SIG Testing. label Jan 21, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Jan 21, 2021
@amwat
Contributor Author

amwat commented Jan 21, 2021

We have a successful run at https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-gce-no-stage/1352340847076577280

Not sure why the total test duration is higher compared to
https://prow.k8s.io/view/gcs/kubernetes-jenkins/pr-logs/directory/pull-kubernetes-e2e-gce/1351850610516824064

but we at least saved 154 seconds of stage time (which should be the only delta here)

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/92316/pull-kubernetes-e2e-gce-no-stage/1352340847076577280/artifacts/junit_runner.xml

as compared to

https://storage.googleapis.com/kubernetes-jenkins/pr-logs/pull/97894/pull-kubernetes-e2e-gce/1351850610516824064/artifacts/junit_runner.xml

and 1.84 GiB of unnecessary GCS uploads

$ gsutil du -sh gs://kubernetes-release-pull/ci/pull-kubernetes-e2e-gce/v1.18.16-rc.0.3+9f5c61d324a62b
1.84 GiB     gs://kubernetes-release-pull/ci/pull-kubernetes-e2e-gce/v1.18.16-rc.0.3+9f5c61d324a62b

@spiffxp spiffxp added this to Backlog (infra to migrate) in sig-k8s-infra Jan 22, 2021
@spiffxp
Member

spiffxp commented Jan 22, 2021

/priority important-soon

@k8s-ci-robot k8s-ci-robot added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Jan 22, 2021
@spiffxp
Member

spiffxp commented Jan 26, 2021

/assign @amwat @spiffxp
Assigning to us for now. If we think this is eligible for /help or don't have time to do it ourselves, we can write up how to proceed.

@jbpratt
Contributor

jbpratt commented Oct 23, 2022

Based on #22892 (comment) (and the changes being reverted), how should we proceed with this? I started working through this and realized I was re-doing @spiffxp's changes 😄

@SD-13
Contributor

SD-13 commented Dec 9, 2023

Hi, I am interested in working on this issue, but I have a few questions:

  1. It seems the list of jobs to be fixed is outdated.
  2. Please help me understand the fix we need to follow here.
    I don't think the fix @amwat mentioned here applies, because we are no longer using bazel to build.
    So, as discussed in config/jobs: run no-stage on k8s-infra, drop extract #24238 (comment), can we now remove extract and stage?
    Please feel free to correct me if I am wrong.

cc @spiffxp @BenTheElder @ameukam

@BenTheElder
Member

Sorry, a couple of the people you pinged don't work on this anymore and I'm kinda buried.

I've lost context on this one.

@BenTheElder
Member

I'm not sure we ever got no-stage working? It's hard to follow at this point.

@BenTheElder
Member

#28176 renamed the test job, testing in kubernetes/kubernetes#126563

@BenTheElder
Member

It does; it will stage to a generated bucket under the rented boskos project (which the boskos janitors should clean up, if they don't already), so we can carefully start dropping these I think ... very belatedly.

@BenTheElder
Member

Beginning bulk migration in #33259, starting with a subset of optional, non-blocking, not always_run jobs.

We have to drop both --extract=local and --stage at the same time. We don't need to locally extract what we just built; it's running fine and uploading to a bucket under the boskos project.
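
Concretely, the per-job change is to delete both of these args together (a sketch; the bucket suffix varies per job):

    --extract=local
    --stage=gs://kubernetes-release-pull/...

With both gone, the job still tests the build it just produced, and staging goes to a generated bucket under the boskos project instead of the shared kubernetes-release-pull bucket.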

You can see sample runs in kubernetes/kubernetes#126563

Inspect these logs:
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126563/pull-kubernetes-e2e-gce-cos-no-stage/1820909457454927872
https://prow.k8s.io/view/gs/kubernetes-jenkins/pr-logs/pull/126563/pull-kubernetes-e2e-gce-pull-through-cache/1821254221408768000

@BenTheElder
Member

If anyone wants to help:

  • Break these up into easily reverted commits
  • Make sure to drop both extract=local and stage at the same time
  • Make sure the jobs you're touching are not required for merge, we'll do those last
  • You must agree to follow up to make sure you didn't break anything shortly after merging these, and definitely before doing any more. If you're not confident in or familiar with this part, I'd ask that you pick a different issue: we need to get this sorted out as part of migrating to the community infra in the immediate future, but we don't want to break CI, especially at this point in the release cycle, and I expect to be done before we reach the safer periods of the cycle.

NOTE: spiffxp and amwat don't work on Kubernetes anymore. I'm taking over this problem.

@BenTheElder
Member

#33278 does everything but the one remaining PR-blocking job, for which we'll wait a bit and check a few more things.

@BenTheElder
Member

Once we have test results we can do #33280, and then I'll delete the bucket

@BenTheElder
Member

This is done; I just need to follow up on eliminating that bucket.

@BenTheElder
Member

Done!

sig-k8s-infra automation moved this from Backlog (infra to migrate) to Done Aug 13, 2024
sig-testing issues automation moved this from Backlog to Done Aug 13, 2024