Delete jobs that have failed for at least the last 60 days in a row #2528

fejta · 2017-04-19T00:38:35Z

http://velodrome.k8s.io/dashboard/db/bigquery-metrics

Delete any job which:

Ran this week.
Failed every run for the last 60 days.
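
The two criteria above could be sketched as a filter over a job's run history. This is only a rough illustration, not the query behind the dashboard: the `Run` record, the `should_delete` helper, the reading of "this week" as the last 7 days, and the requirement that the job's history reaches back the full 60 days are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Run:
    """One finished run of a job (hypothetical record shape)."""
    started: datetime
    passed: bool


def should_delete(runs, now, active_days=7, failing_days=60):
    """Return True if the job ran recently and failed every run in the window."""
    # Criterion 1: the job ran this week (assumed to mean the last 7 days).
    if not any(now - r.started <= timedelta(days=active_days) for r in runs):
        return False
    # Criterion 2: no run passed during the last `failing_days` days.
    window = [r for r in runs if now - r.started <= timedelta(days=failing_days)]
    if any(r.passed for r in window):
        return False
    # Assumption: a job younger than the window has not yet failed
    # "for at least the last 60 days in a row", so it is kept.
    oldest = min(r.started for r in runs)
    return now - oldest >= timedelta(days=failing_days)
```

A job with daily failures going back 70 days would be flagged; a single pass inside the window, or no run in the last week, keeps it off the list.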

Example jobs:

 "ci-kubernetes-e2e-gci-gce-examples": {
    "failing_days": 172
  },
  "ci-kubernetes-e2e-gce-examples": {
    "failing_days": 172
  },
  "ci-kubernetes-e2e-gce-latest-upgrade-cluster": {
    "failing_days": 165
  },
  "ci-kubernetes-e2e-gci-gke-pre-release": {
    "failing_days": 163
  },
  "ci-kubernetes-e2e-gke-pre-release": {
    "failing_days": 159
  },
  "ci-kubernetes-e2e-kops-aws-slow": {
    "failing_days": 146
  },
  "ci-kubernetes-e2e-kops-aws-serial": {
    "failing_days": 146
  },
  "ci-kubernetes-e2e-gke-stackdriver": {
    "failing_days": 94
  },
  "ci-kubernetes-e2e-ubuntu-gke-serial": {
    "failing_days": 80
  },
  "ci-kubernetes-e2e-ubuntu-gke-1-6-serial": {
    "failing_days": 80
  },
  "ci-kubernetes-e2e-ubuntu-gke-1-6-flaky": {
    "failing_days": 78
  },
  "ci-kubernetes-node-docker-benchmark": {
    "failing_days": 74
  },
  "ci-kubernetes-node-docker": {
    "failing_days": 74
  },
  "ci-kubernetes-node-kubelet-flaky": {
    "failing_days": 68
  },
  "ci-kubernetes-pull-gce-federation-deploy-canary": {
    "failing_days": 66
  },
  "ci-kubernetes-e2e-gce-gci-qa-serial-master": {
    "failing_days": 62
  },
  "pr:pull-kubernetes-e2e-kubeadm-gce": {
    "failing_days": 62
  },
  "ci-kubernetes-soak-gke-gci-test": {
    "failing_days": 61
  },
  "ci-kubernetes-e2e-gce-etcd3-release-1-5": {
    "failing_days": 60
  },
  "ci-kubernetes-soak-gke-test": {
    "failing_days": 60
  },
  "ci-kubernetes-e2e-kops-aws-canary": {
    "failing_days": 60
  },
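
As an illustration of how a list like this might be consumed, here is a minimal sketch that picks out jobs at or above the 60-day threshold. It assumes the data is a single JSON object mapping each job name to `{"failing_days": N}`, as the excerpt suggests; `jobs_to_delete` is a hypothetical helper, and the sample entries are copied from the excerpt above.

```python
import json


def jobs_to_delete(failures, threshold=60):
    """Return (job, failing_days) pairs at or above threshold, worst first."""
    hits = [(job, info.get("failing_days", 0)) for job, info in failures.items()]
    hits = [pair for pair in hits if pair[1] >= threshold]
    # Sort by failing days (descending), then by job name for stable output.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))


# Sample data in the assumed shape, using three entries from the list above.
sample = json.loads("""
{
  "ci-kubernetes-e2e-gci-gce-examples": {"failing_days": 172},
  "ci-kubernetes-e2e-gke-stackdriver": {"failing_days": 94},
  "ci-kubernetes-soak-gke-test": {"failing_days": 60}
}
""")

for job, days in jobs_to_delete(sample):
    print(f"{days:4d}  {job}")
```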

Previous cleanup work: #2453

Current status: http://storage.googleapis.com/k8s-metrics/failures-latest.json


rmmh · 2017-04-19T19:01:15Z

Some tests are ONLY run on flaky suites. Are we going to just stop running them?

I guess that's a general problem. We should graph the test matrix: show all the tests we have defined and which jobs they have run on in the last week. That way we can find tests that never run and flag them for revival or deletion.
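
A minimal sketch of that matrix idea, assuming we already have the set of defined tests and, for each job, the set of tests it ran in the last week. Both inputs, the helper names `build_matrix` and `never_run`, and the test and job names in the example are hypothetical.

```python
def build_matrix(defined_tests, tests_by_job):
    """Map each defined test to the sorted list of jobs that ran it."""
    matrix = {test: [] for test in defined_tests}
    for job in sorted(tests_by_job):
        for test in tests_by_job[job]:
            # Ignore tests that ran but are not in the defined set.
            if test in matrix:
                matrix[test].append(job)
    return matrix


def never_run(matrix):
    """Return the defined tests that no job ran, sorted by name."""
    return sorted(test for test, jobs in matrix.items() if not jobs)


# Hypothetical inputs: three defined tests, two jobs seen in the last week.
defined = {"TestA", "TestB", "TestC"}
last_week = {
    "job-flaky": {"TestB"},
    "job-main": {"TestA", "TestB"},
}

matrix = build_matrix(defined, last_week)
print(never_run(matrix))
```

Tests that appear only under a flaky job, like `TestB` would if `job-main` did not run it, are the ones that would silently stop running if that job were deleted.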

fejta · 2017-04-19T19:19:08Z

There are a bunch of flaky suites.

fejta · 2017-04-27T02:02:37Z

I want to do the following:

Run the minimum set of testing necessary to give us confidence in our releases.
Run the maximum set of testing we have the ability to maintain.

Right now it seems like we are running more tests than we have the ability to maintain. Therefore I am deleting the tests that seem to be providing the least amount of marginal value (based on the fact that they never pass).

fejta · 2017-05-01T20:09:47Z

Will delete these tests in a couple weeks unless someone signs up to fix them: https://github.com/kubernetes/test-infra/blob/master/experiment/bigquery/failures-latest.json

pipejakob · 2017-05-01T20:26:13Z

Sign me up for pr:pull-kubernetes-e2e-kubeadm-gce. I have a WIP PR (#2509) to fix it.

fejta · 2017-05-01T20:27:39Z

Cool! And that one is a month away from the 60d mark anyway :)

fejta-bot · 2018-01-02T00:47:37Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

Prevent issues from auto-closing with a /lifecycle frozen comment.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or @fejta.
/lifecycle stale

BenTheElder · 2018-01-02T19:13:49Z

Heh, @fejta I think maybe we don't want the stale job issue to go stale :-)

BenTheElder · 2018-01-02T19:14:10Z

/remove-lifecycle stale

fejta-bot · 2018-04-28T08:16:05Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

fejta-bot · 2018-05-28T09:03:02Z

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle rotten
/remove-lifecycle stale

BenTheElder · 2018-05-29T20:40:38Z

/remove-lifecycle stale

/cc @mithrav @spiffxp @AishSundar
We should codify something like this and enact it to clean up jobs that have been failing for a ridiculously long time.

fejta-bot · 2018-06-28T21:23:45Z

Rotten issues close after 30d of inactivity.
Reopen the issue with /reopen.
Mark the issue as fresh with /remove-lifecycle rotten.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/close

fejta self-assigned this Apr 19, 2017

fejta mentioned this issue Apr 20, 2017

Improve the coverage of examples kubernetes/client-go#128

Closed

11 tasks

fejta mentioned this issue May 11, 2017

Delete gce-1.4 to latest kubectl skew tests that never pass #2729

Merged

This was referenced Jul 31, 2017

Delete perma-failing examples e2e jobs #3799

Merged

Delete perma-failing pre-release jobs #3800

Merged

Delete slow, serial kops jobs #3801

Merged

Delete gce-latest-upgrade-cluster and gke-stackdriver #3802

Closed

k8s-ci-robot added the lifecycle/stale label Jan 2, 2018

k8s-ci-robot removed the lifecycle/stale label Jan 2, 2018

fejta mentioned this issue Jan 28, 2018

Delete gke-test gci-gke-test #6503

Merged

k8s-ci-robot added the lifecycle/stale label Apr 28, 2018

k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label May 28, 2018

k8s-ci-robot closed this as completed Jun 28, 2018

spiffxp mentioned this issue Jul 27, 2018

jobs: remove jobs that have been continuously failing for over N days #8861

Closed
