ci-kubernetes-e2e-gce-scale-performance is continuously testing the same, stale k8s version since 10-29 #19838

Closed
mborsz opened this issue Nov 4, 2020 · 9 comments
Assignees
Labels
area/release-eng Issues or PRs related to the Release Engineering subproject kind/bug Categorizes issue or PR as related to a bug. sig/release Categorizes an issue or PR as relevant to SIG Release. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. sig/testing Categorizes an issue or PR as relevant to SIG Testing.

Comments

@mborsz
Member

mborsz commented Nov 4, 2020

What happened:
Starting from
https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1321859659652403200

all ci-kubernetes-e2e-gce-scale-performance runs have been using the stale k8s version v1.20.0-beta.0.54+2729b8e3751434.

Moreover, the commit number on https://k8s-testgrid.appspot.com/sig-scalability-gce#gce-master-scale-performance seems to be changing and it doesn't match the actual version that is being tested:
[image: testgrid screenshot]

What you expected to happen:

How to reproduce it (as minimally and precisely as possible):

Please provide links to example occurrences, if any:
e.g. the latest run https://prow.k8s.io/view/gcs/kubernetes-jenkins/logs/ci-kubernetes-e2e-gce-scale-performance/1323762317799723008
Anything else we need to know?:

#19660 is most likely the culprit: it moves fast builds from kubernetes-release-dev, which this job uses, to k8s-release-dev.

/cc @mm4tt
/cc @wojtek-t
/cc @cpanato
/cc @justaugustus

@mborsz mborsz added the kind/bug Categorizes issue or PR as related to a bug. label Nov 4, 2020
@mborsz
Member Author

mborsz commented Nov 4, 2020

Yes, this is definitely #19660, which migrates the test builds to k8s-release-dev. The question is whether we should switch our jobs to k8s-release-dev or roll back that change. I don't think I have enough context to make that decision, so I asked this question in #19660.

@mborsz
Member Author

mborsz commented Nov 4, 2020

#19839 for showing the right version on testgrid

@mm4tt
Contributor

mm4tt commented Nov 4, 2020

Thanks, Maciek. Great finding!

We discovered this only by sheer luck. If Maciek hadn't been debugging some other issue today, we might have missed that change for weeks, and it would have rendered our scale tests useless.
@justaugustus, can we do something to make sure it doesn't happen again? For example, could you let us know every time you're making this kind of change to the build pipeline?
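
One way to catch this kind of regression automatically is to flag a job whose most recent runs all report the same version. The following is only a minimal sketch of that idea, not an existing test-infra tool: the function name `stale_version`, the `(run_id, version)` input shape, and the default threshold are all hypothetical, and the caller is assumed to have already collected the version each run tested.

```python
from typing import List, Optional, Tuple


def stale_version(runs: List[Tuple[str, str]], threshold: int = 3) -> Optional[str]:
    """Return the version if the last `threshold` runs all tested it, else None.

    `runs` is ordered oldest to newest; each entry is (run_id, version).
    Both the input shape and the threshold are illustrative assumptions.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1")
    if len(runs) < threshold:
        # Not enough history to decide either way.
        return None
    recent = [version for _, version in runs[-threshold:]]
    if all(version == recent[0] for version in recent):
        return recent[0]
    return None


if __name__ == "__main__":
    # Made-up run IDs; the version string is the stale one reported in this issue.
    sample = [
        ("run-1", "v1.20.0-beta.0.54+2729b8e3751434"),
        ("run-2", "v1.20.0-beta.0.54+2729b8e3751434"),
        ("run-3", "v1.20.0-beta.0.54+2729b8e3751434"),
    ]
    print(stale_version(sample))
```

A repeated version is not always a bug (no commits may have landed between runs), so the threshold would need tuning per job before alerting on it.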

@wojtek-t
Member

wojtek-t commented Nov 4, 2020

/assign @justaugustus

Stephen - as you mentioned on Slack - I'm assigning it to you.
However, I suspect this isn't only about scalability tests - it probably affects many more tests.

@justaugustus
Member

PR opened to revert the change: #19841
Slack discussion here: https://kubernetes.slack.com/archives/C2C40FMNF/p1604488277091500

@justaugustus
Member

FYI @kubernetes/ci-signal

@wojtek-t
Member

wojtek-t commented Nov 5, 2020

It's fixed at least for scalability tests - closing. Thanks for fixing Stephen!

/close

@k8s-ci-robot
Contributor

@wojtek-t: Closing this issue.

In response to this:

It's fixed at least for scalability tests - closing. Thanks for fixing Stephen!

/close


@justaugustus
Member

Thanks for reporting back on your side, @wojtek-t!

For completeness, here is a snippet of the PR description from #19841:

As we continue to migrate release-blocking jobs to a dedicated K8s Infra
cluster, jobs that use the latest-fast marker need to extract builds
from gs://k8s-release-dev, which is the K8s Infra equivalent of
gs://kubernetes-release-dev.

A new flag (--extract-ci-bucket=k8s-release-dev) was added to support
this transitional use case, so we employ it here.

This is part of migrating release-blocking jobs to K8s Infra (ref: #19484, #18549).

I plan to send a note out to the broader community early next week (by then, the remaining changes should have settled down).

/sig release testing scalability
/area release-eng

@k8s-ci-robot k8s-ci-robot added sig/release Categorizes an issue or PR as relevant to SIG Release. sig/testing Categorizes an issue or PR as relevant to SIG Testing. sig/scalability Categorizes an issue or PR as relevant to SIG Scalability. area/release-eng Issues or PRs related to the Release Engineering subproject labels Nov 5, 2020