OCPBUGS-15255: Add terminated as a handled state so terminated instances don't get stuck #83

racheljpg · 2023-10-02T14:58:19Z

This is a PR to add terminated as a handled state, so if AWS terminates instances before they are ready the instance should no longer get stuck. Creating it as a draft PR to get input/test my changes.

openshift-ci · 2023-10-02T14:58:48Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

openshift-ci · 2023-10-02T15:00:43Z

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign joelspeed for approval. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

OWNERS

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

openshift-ci-robot · 2023-10-02T15:10:51Z

@racheljpg: This pull request references Jira Issue OCPBUGS-15255, which is invalid:

expected the bug to target the "4.15.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

This is a PR to add terminated as a handled state, so if AWS terminates instances before they are ready the instance should no longer get stuck. Creating it as a draft PR to get input/test my changes.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

JoelSpeed

Are there any unit tests that we can be updating to prove out this behaviour?
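One possible shape for such a test is a table-driven check. This is only a sketch: the `instance` type and the `anyTerminated` helper are hypothetical stand-ins, since the body of `checkIfInstanceTerminated` is not shown in this thread, and the real helper may take a different type or apply different rules.

```go
package main

import "fmt"

// instance is a hypothetical stand-in for the EC2 instance type used by the
// reconciler; only the state name matters for this sketch.
type instance struct {
	ID    string
	State string
}

// anyTerminated is a hypothetical helper in the spirit of
// checkIfInstanceTerminated: it reports whether any instance in the slice
// is in the "terminated" state.
func anyTerminated(instances []instance) bool {
	for _, i := range instances {
		if i.State == "terminated" {
			return true
		}
	}
	return false
}

func main() {
	// Each case names a scenario, its input, and the expected result.
	cases := []struct {
		name      string
		instances []instance
		want      bool
	}{
		{"no instances", nil, false},
		{"running only", []instance{{"i-a", "running"}}, false},
		{"pending only", []instance{{"i-b", "pending"}}, false},
		{"terminated only", []instance{{"i-c", "terminated"}}, true},
		{"mixed", []instance{{"i-d", "running"}, {"i-e", "terminated"}}, true},
	}
	for _, tc := range cases {
		got := anyTerminated(tc.instances)
		fmt.Printf("%s: got=%v want=%v\n", tc.name, got, tc.want)
	}
}
```

In the repository itself the same table would live in a `_test.go` file and call the real method on the reconciler.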

JoelSpeed · 2023-10-03T09:17:51Z

pkg/actuators/machine/reconciler.go

+	if len(existingInstances) == 0 || r.checkIfInstanceTerminated(existingInstances) {
 		if r.machine.Spec.ProviderID != nil && *r.machine.Spec.ProviderID != "" && len(r.machine.Status.Addresses) == 0 && (r.machine.Status.LastUpdated == nil || r.machine.Status.LastUpdated.Add(requeueAfterSeconds*time.Second).After(time.Now())) {
 			klog.Infof("%s: Possible eventual-consistency discrepancy; returning an error to requeue", r.machine.Name)
 			return false, &machinecontroller.RequeueAfterError{RequeueAfter: requeueAfterSeconds * time.Second}


This might not be what we want. I think there's probably a case where the instance is terminated, but the provider ID is set, without the addresses in place, which makes me think we can probably hit this eventual consistency issue.

Given the last updated check this is probably ok since we wait circa 20 seconds before giving up, but, I'd like to see this tested if we can

Thanks for your review, Joel. From our testing, we could see that it works as we hoped. I will look into unit tests next.


racheljpg · 2023-10-03T15:37:44Z

Hello @huali9, you are down as the QA contact for this bug :) Would you mind running this through the premerge tests? Thank you.

racheljpg · 2023-10-05T10:49:43Z

/retest

racheljpg · 2023-10-06T13:58:13Z

/retest

huali9 · 2023-10-08T05:34:16Z

Hello @huali9, you are down as the QA contact for this bug :) Would you mind running this through the premerge tests? Thank you.

Hi @racheljpg, I tried to premerge test this today.
First I tried to reproduce the issue on 4.14.0-0.nightly-2023-10-06-234925.
Steps:

1.Install an AWS Local Zone cluster. We use the flexy template: ipi-on-aws/versioned-installer-local_zone-ovn-ci

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2023-10-06-234925   True        False         17m     Cluster version is 4.14.0-0.nightly-2023-10-06-234925
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-aws08a.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                             PHASE     TYPE          REGION      ZONE               AGE
huliu-aws08a-vmn2r-edge-us-east-1-bos-1a-x6jb2   Running   c5d.2xlarge   us-east-1   us-east-1-bos-1a   44m
huliu-aws08a-vmn2r-edge-us-east-1-mia-1a-wlqss   Running   m5.xlarge     us-east-1   us-east-1-mia-1a   44m
huliu-aws08a-vmn2r-master-0                      Running   m6i.xlarge    us-east-1   us-east-1a         48m
huliu-aws08a-vmn2r-master-1                      Running   m6i.xlarge    us-east-1   us-east-1b         48m
huliu-aws08a-vmn2r-master-2                      Running   m6i.xlarge    us-east-1   us-east-1c         48m
huliu-aws08a-vmn2r-worker-us-east-1a-5hfqj       Running   m6i.xlarge    us-east-1   us-east-1a         44m
huliu-aws08a-vmn2r-worker-us-east-1b-szbdb       Running   m6i.xlarge    us-east-1   us-east-1b         44m
huliu-aws08a-vmn2r-worker-us-east-1c-7wn5v       Running   m6i.xlarge    us-east-1   us-east-1c         44m
liuhuali@Lius-MacBook-Pro huali-test % oc get node
NAME                           STATUS   ROLES                  AGE   VERSION
ip-10-0-10-16.ec2.internal     Ready    control-plane,master   48m   v1.27.6+fd4d1f9
ip-10-0-18-10.ec2.internal     Ready    worker                 34m   v1.27.6+fd4d1f9
ip-10-0-196-5.ec2.internal     Ready    edge,worker            30m   v1.27.6+fd4d1f9
ip-10-0-209-234.ec2.internal   Ready    edge,worker            32m   v1.27.6+fd4d1f9
ip-10-0-28-136.ec2.internal    Ready    control-plane,master   48m   v1.27.6+fd4d1f9
ip-10-0-37-82.ec2.internal     Ready    worker                 34m   v1.27.6+fd4d1f9
ip-10-0-44-59.ec2.internal     Ready    control-plane,master   48m   v1.27.6+fd4d1f9
ip-10-0-7-96.ec2.internal      Ready    worker                 34m   v1.27.6+fd4d1f9

2.Copy a default local zone machineset, change volumeSize to 16384, then create the new machineset. The new machine goes to Provisioned.

liuhuali@Lius-MacBook-Pro huali-test % oc get machineset huliu-aws08a-vmn2r-edge-us-east-1-bos-1a -oyaml > ms2.yaml 
liuhuali@Lius-MacBook-Pro huali-test % vim ms2.yaml 
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms2.yaml 
machineset.machine.openshift.io/huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                              PHASE         TYPE          REGION      ZONE               AGE
huliu-aws08a-vmn2r-edge-us-east-1-bos-1a-x6jb2    Running       c5d.2xlarge   us-east-1   us-east-1-bos-1a   72m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-5qsk2   Provisioned   c5d.2xlarge   us-east-1   us-east-1-bos-1a   10m
huliu-aws08a-vmn2r-edge-us-east-1-mia-1a-wlqss    Running       m5.xlarge     us-east-1   us-east-1-mia-1a   72m
huliu-aws08a-vmn2r-master-0                       Running       m6i.xlarge    us-east-1   us-east-1a         76m
huliu-aws08a-vmn2r-master-1                       Running       m6i.xlarge    us-east-1   us-east-1b         76m
huliu-aws08a-vmn2r-master-2                       Running       m6i.xlarge    us-east-1   us-east-1c         76m
huliu-aws08a-vmn2r-worker-us-east-1a-5hfqj        Running       m6i.xlarge    us-east-1   us-east-1a         72m
huliu-aws08a-vmn2r-worker-us-east-1b-szbdb        Running       m6i.xlarge    us-east-1   us-east-1b         72m
huliu-aws08a-vmn2r-worker-us-east-1c-7wn5v        Running       m6i.xlarge    us-east-1   us-east-1c         72m

3.Scale the machineset to replicas=5. The four new machines all went to Failed.

liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa --replicas=5
machineset.machine.openshift.io/huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                              PHASE         TYPE          REGION      ZONE               AGE
huliu-aws08a-vmn2r-edge-us-east-1-bos-1a-x6jb2    Running       c5d.2xlarge   us-east-1   us-east-1-bos-1a   76m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-27bwt   Failed        c5d.2xlarge   us-east-1   us-east-1-bos-1a   3m55s
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-2mjsq   Failed        c5d.2xlarge   us-east-1   us-east-1-bos-1a   3m55s
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-5qsk2   Provisioned   c5d.2xlarge   us-east-1   us-east-1-bos-1a   14m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-l6vk8   Failed        c5d.2xlarge   us-east-1   us-east-1-bos-1a   3m55s
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-wc24s   Failed        c5d.2xlarge   us-east-1   us-east-1-bos-1a   3m55s
huliu-aws08a-vmn2r-edge-us-east-1-mia-1a-wlqss    Running       m5.xlarge     us-east-1   us-east-1-mia-1a   76m
huliu-aws08a-vmn2r-master-0                       Running       m6i.xlarge    us-east-1   us-east-1a         80m
huliu-aws08a-vmn2r-master-1                       Running       m6i.xlarge    us-east-1   us-east-1b         80m
huliu-aws08a-vmn2r-master-2                       Running       m6i.xlarge    us-east-1   us-east-1c         80m
huliu-aws08a-vmn2r-worker-us-east-1a-5hfqj        Running       m6i.xlarge    us-east-1   us-east-1a         76m
huliu-aws08a-vmn2r-worker-us-east-1b-szbdb        Running       m6i.xlarge    us-east-1   us-east-1b         76m
huliu-aws08a-vmn2r-worker-us-east-1c-7wn5v        Running       m6i.xlarge    us-east-1   us-east-1c         76m

4.Scale the machineset to replicas=15. Some new machines go to Failed and some are stuck in Provisioning. On the AWS console, all the Failed and Provisioning machines show Terminated with this message: "Client.VolumeLimitExceeded: Volume limit exceeded. You have exceeded the maximum gp2 storage limit of 30720 GiB in this location for your account. Please contact AWS Support for more information."

liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa --replicas=15
machineset.machine.openshift.io/huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                              PHASE          TYPE          REGION      ZONE               AGE
huliu-aws08a-vmn2r-edge-us-east-1-bos-1a-x6jb2    Running        c5d.2xlarge   us-east-1   us-east-1-bos-1a   3h38m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-22xnq   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-27bwt   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   145m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-2b9sq   Provisioning                                                137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-2mjsq   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   145m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-5qsk2   Running        c5d.2xlarge   us-east-1   us-east-1-bos-1a   156m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-84nfj   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-dv8ps   Provisioning                                                137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-l6vk8   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   145m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-m8lcd   Provisioning                                                137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-mm79v   Provisioning                                                137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qscls   Provisioning                                                137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-sbt2k   Provisioning                                                137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-vb7fw   Provisioning                                                137m
huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-wc24s   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   145m
huliu-aws08a-vmn2r-edge-us-east-1-mia-1a-wlqss    Running        m5.xlarge     us-east-1   us-east-1-mia-1a   3h38m
huliu-aws08a-vmn2r-master-0                       Running        m6i.xlarge    us-east-1   us-east-1a         3h42m
huliu-aws08a-vmn2r-master-1                       Running        m6i.xlarge    us-east-1   us-east-1b         3h42m
huliu-aws08a-vmn2r-master-2                       Running        m6i.xlarge    us-east-1   us-east-1c         3h42m
huliu-aws08a-vmn2r-worker-us-east-1a-5hfqj        Running        m6i.xlarge    us-east-1   us-east-1a         3h38m
huliu-aws08a-vmn2r-worker-us-east-1b-szbdb        Running        m6i.xlarge    us-east-1   us-east-1b         3h38m
huliu-aws08a-vmn2r-worker-us-east-1c-7wn5v        Running        m6i.xlarge    us-east-1   us-east-1c         3h38m
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-7566555589-qhbrh -c machine-controller |grep huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h |grep "terminated"
W1008 02:47:40.690116       1 reconciler.go:481] huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h: Failed to find existing instance by id i-06f1ef3a18e07c0b1: instance i-06f1ef3a18e07c0b1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E1008 02:47:40.759920       1 utils.go:236] Excluding instance matching huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h: instance i-06f1ef3a18e07c0b1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
W1008 02:47:43.111268       1 reconciler.go:481] huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h: Failed to find existing instance by id i-06f1ef3a18e07c0b1: instance i-06f1ef3a18e07c0b1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E1008 02:47:43.153129       1 utils.go:236] Excluding instance matching huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h: instance i-06f1ef3a18e07c0b1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
W1008 02:47:44.952884       1 reconciler.go:481] huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h: Failed to find existing instance by id i-06f1ef3a18e07c0b1: instance i-06f1ef3a18e07c0b1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E1008 02:47:45.002988       1 utils.go:236] Excluding instance matching huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h: instance i-06f1ef3a18e07c0b1 state "terminated" is not in running, pending, stopped, stopping, shutting-down
...
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-7566555589-qhbrh -c machine-controller |grep huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qscls |grep "terminated"
W1008 02:47:35.801230       1 reconciler.go:481] huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qscls: Failed to find existing instance by id i-0f0b9c47755aba1d2: instance i-0f0b9c47755aba1d2 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E1008 02:47:35.872468       1 utils.go:236] Excluding instance matching huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qscls: instance i-0f0b9c47755aba1d2 state "terminated" is not in running, pending, stopped, stopping, shutting-down
W1008 02:47:35.930628       1 reconciler.go:481] huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qscls: Failed to find existing instance by id i-0f0b9c47755aba1d2: instance i-0f0b9c47755aba1d2 state "terminated" is not in running, pending, stopped, stopping, shutting-down
E1008 02:47:35.981754       1 utils.go:236] Excluding instance matching huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qscls: instance i-0f0b9c47755aba1d2 state "terminated" is not in running, pending, stopped, stopping, shutting-down
...
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qnk6h -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2023-10-08T02:47:15Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2023-10-08T02:48:04Z"
    message: Instance not found on provider
    reason: InstanceMissing
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2023-10-08T02:47:15Z"
    status: "True"
    type: Terminable
  errorMessage: can't find created instance
  lastUpdated: "2023-10-08T02:48:04Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastTransitionTime: "2023-10-08T02:47:33Z"
      message: Machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-06f1ef3a18e07c0b1
    instanceState: Unknown
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws08a-vmn2r-edge-us-east-1-bos-1aa-qscls -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2023-10-08T02:47:14Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2023-10-08T02:47:14Z"
    message: Instance has not been created
    reason: InstanceNotCreated
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2023-10-08T02:47:14Z"
    status: "True"
    type: Terminable
  lastUpdated: "2023-10-08T02:47:14Z"
  phase: Provisioning
  providerStatus:
    conditions:
    - lastTransitionTime: "2023-10-08T02:47:17Z"
      message: Machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-0f0b9c47755aba1d2
    instanceState: pending
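The "is not in running, pending, stopped, stopping, shutting-down" messages in the logs above are consistent with a filter over an allow-list of instance states. A minimal sketch of that idea follows. The names `handledStates` and `checkState` are hypothetical, the list is copied from the log text rather than from the provider's source, and the real lookup may be structured differently.

```go
package main

import (
	"fmt"
	"strings"
)

// handledStates is a hypothetical allow-list that mirrors the states named
// in the log message; it is not copied from the provider's source.
var handledStates = []string{"running", "pending", "stopped", "stopping", "shutting-down"}

// checkState returns nil when state is in allowed, and otherwise an error
// worded like the log lines in this thread.
func checkState(instanceID, state string, allowed []string) error {
	for _, s := range allowed {
		if s == state {
			return nil
		}
	}
	return fmt.Errorf("instance %s state %q is not in %s",
		instanceID, state, strings.Join(allowed, ", "))
}

func main() {
	// Without "terminated" in the list, the instance is excluded and the
	// caller sees it as missing.
	fmt.Println(checkState("i-06f1ef3a18e07c0b1", "terminated", handledStates))

	// With "terminated" added, the instance is returned to the caller,
	// which can then decide how to handle it.
	withTerminated := append(append([]string{}, handledStates...), "terminated")
	fmt.Println(checkState("i-06f1ef3a18e07c0b1", "terminated", withTerminated))
}
```

Under this reading, adding "terminated" to the list only changes whether the instance is visible to the reconciler; what the reconciler then does with a terminated instance is a separate decision.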

Then I tried to premerge test this. Some machines still go to Failed and some are stuck in Provisioning, which is the same result as without this PR. So it seems the PR doesn't work!
Steps:
1.Build image with the PR using cluster-bot
job build openshift/machine-api-provider-aws#83 succeeded
2.Install an AWS Local Zone cluster with the image built in the previous step. We use the flexy template: ipi-on-aws/versioned-installer-local_zone-ovn-ci

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.ci.test-2023-10-08-015527-ci-ln-qij1vrb-latest   True        False         24m     Cluster version is 4.14.0-0.ci.test-2023-10-08-015527-ci-ln-qij1vrb-latest
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-aws08c.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                             PHASE     TYPE          REGION      ZONE               AGE
huliu-aws08c-rdgdx-edge-us-east-1-bos-1a-s8nkk   Running   c5d.2xlarge   us-east-1   us-east-1-bos-1a   38m
huliu-aws08c-rdgdx-edge-us-east-1-mia-1a-wz92t   Running   m5.xlarge     us-east-1   us-east-1-mia-1a   38m
huliu-aws08c-rdgdx-master-0                      Running   m6i.xlarge    us-east-1   us-east-1a         42m
huliu-aws08c-rdgdx-master-1                      Running   m6i.xlarge    us-east-1   us-east-1b         42m
huliu-aws08c-rdgdx-master-2                      Running   m6i.xlarge    us-east-1   us-east-1c         42m
huliu-aws08c-rdgdx-worker-us-east-1a-gvdgk       Running   m6i.xlarge    us-east-1   us-east-1a         38m
huliu-aws08c-rdgdx-worker-us-east-1b-gknsm       Running   m6i.xlarge    us-east-1   us-east-1b         38m
huliu-aws08c-rdgdx-worker-us-east-1c-f5w5h       Running   m6i.xlarge    us-east-1   us-east-1c         38m

3.Copy a default local zone machineset, change volumeSize to 16384, then create the new machineset. The new machine goes to Failed.

liuhuali@Lius-MacBook-Pro huali-test % oc get machineset huliu-aws08c-rdgdx-edge-us-east-1-bos-1a -oyaml>ms3.yaml 
liuhuali@Lius-MacBook-Pro huali-test % vim ms3.yaml 
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms3.yaml 
machineset.machine.openshift.io/huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                              PHASE     TYPE          REGION      ZONE               AGE
huliu-aws08c-rdgdx-edge-us-east-1-bos-1a-s8nkk    Running   c5d.2xlarge   us-east-1   us-east-1-bos-1a   50m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-rddl7   Failed    c5d.2xlarge   us-east-1   us-east-1-bos-1a   11m
huliu-aws08c-rdgdx-edge-us-east-1-mia-1a-wz92t    Running   m5.xlarge     us-east-1   us-east-1-mia-1a   50m
huliu-aws08c-rdgdx-master-0                       Running   m6i.xlarge    us-east-1   us-east-1a         54m
huliu-aws08c-rdgdx-master-1                       Running   m6i.xlarge    us-east-1   us-east-1b         54m
huliu-aws08c-rdgdx-master-2                       Running   m6i.xlarge    us-east-1   us-east-1c         54m
huliu-aws08c-rdgdx-worker-us-east-1a-gvdgk        Running   m6i.xlarge    us-east-1   us-east-1a         50m
huliu-aws08c-rdgdx-worker-us-east-1b-gknsm        Running   m6i.xlarge    us-east-1   us-east-1b         50m
huliu-aws08c-rdgdx-worker-us-east-1c-f5w5h        Running   m6i.xlarge    us-east-1   us-east-1c         50m

4.Scale the machineset to replicas=10. Some new machines go to Failed and some are stuck in Provisioning. On the AWS console, all the Failed and Provisioning machines show Terminated with this message: "Client.VolumeLimitExceeded: Volume limit exceeded. You have exceeded the maximum gp2 storage limit of 30720 GiB in this location for your account. Please contact AWS Support for more information."

liuhuali@Lius-MacBook-Pro huali-test % oc scale machineset huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa --replicas=10
machineset.machine.openshift.io/huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa scaled
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                              PHASE          TYPE          REGION      ZONE               AGE
huliu-aws08c-rdgdx-edge-us-east-1-bos-1a-s8nkk    Running        c5d.2xlarge   us-east-1   us-east-1-bos-1a   146m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-574z4   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-7q9nl   Provisioning                                                94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-82qfl   Provisioning                                                94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-dt7kh   Provisioning                                                94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-nzwss   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-qhjms   Provisioning                                                94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-qmg9j   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-rddl7   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   107m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq   Provisioning                                                94m
huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   94m
huliu-aws08c-rdgdx-edge-us-east-1-mia-1a-wz92t    Running        m5.xlarge     us-east-1   us-east-1-mia-1a   146m
huliu-aws08c-rdgdx-master-0                       Running        m6i.xlarge    us-east-1   us-east-1a         150m
huliu-aws08c-rdgdx-master-1                       Running        m6i.xlarge    us-east-1   us-east-1b         150m
huliu-aws08c-rdgdx-master-2                       Running        m6i.xlarge    us-east-1   us-east-1c         150m
huliu-aws08c-rdgdx-worker-us-east-1a-gvdgk        Running        m6i.xlarge    us-east-1   us-east-1a         146m
huliu-aws08c-rdgdx-worker-us-east-1b-gknsm        Running        m6i.xlarge    us-east-1   us-east-1b         146m
huliu-aws08c-rdgdx-worker-us-east-1c-f5w5h        Running        m6i.xlarge    us-east-1   us-east-1c         146m
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-77596ddf6f-hqnhh -c machine-controller |grep huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt  |grep "terminated"
I1008 03:51:07.975383       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:09.585096       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:10.592851       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:11.457018       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:12.351679       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:13.080674       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:13.755784       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:14.398499       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:15.112847       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:16.478058       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:19.113459       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:24.447736       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
I1008 03:51:34.935470       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt: Instance state terminated
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-77596ddf6f-hqnhh -c machine-controller |grep huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq  |grep "terminated"
I1008 03:51:05.560948       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:07.502215       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:09.044864       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:10.095605       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:11.031422       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:11.907554       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:12.788396       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:13.622738       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:14.602525       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:16.009849       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:18.693487       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:24.056515       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:34.445808       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:51:55.069347       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:52:36.145014       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:53:58.226567       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:56:42.256668       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 03:58:38.754420       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:02:10.117751       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:08:25.395495       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:18:15.801384       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:18:50.277309       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:28:00.465163       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:35:30.507645       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:37:52.350866       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
I1008 04:47:40.169134       1 reconciler.go:488] huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-vjhbq: Instance state terminated
liuhuali@Lius-MacBook-Pro huali-test % oc get machine huliu-aws08c-rdgdx-edge-us-east-1-bos-1aa-wmlnt  -oyaml
...
status:
  conditions:
  - lastTransitionTime: "2023-10-08T03:50:47Z"
    status: "True"
    type: Drainable
  - lastTransitionTime: "2023-10-08T03:51:34Z"
    message: Instance not found on provider
    reason: InstanceMissing
    severity: Warning
    status: "False"
    type: InstanceExists
  - lastTransitionTime: "2023-10-08T03:50:47Z"
    status: "True"
    type: Terminable
  errorMessage: can't find created instance
  lastUpdated: "2023-10-08T03:51:34Z"
  phase: Failed
  providerStatus:
    conditions:
    - lastTransitionTime: "2023-10-08T03:51:00Z"
      message: Machine successfully created
      reason: MachineCreationSucceeded
      status: "True"
      type: MachineCreation
    instanceId: i-07dda2783a18f7f19
    instanceState: Unknown

Screenshot of the AWS console: https://drive.google.com/file/d/1cQktyGOCMRof14VtU59a2-UNxpkOiCx2/view?usp=sharing
must-gather of the premerge test cluster: https://drive.google.com/file/d/1wGmGAaOA_8_JGf81iUINKbWk63OvfG9D/view?usp=sharing
must-gather of the cluster used to reproduce the issue: https://drive.google.com/file/d/12-DMAg_OeETemlbN2Klbd6EBxd25QX3z/view?usp=sharing

racheljpg · 2023-10-10T13:32:56Z

/retest


racheljpg · 2023-10-10T20:20:56Z

/retest

racheljpg · 2023-10-11T08:53:34Z

/retest

racheljpg · 2023-10-11T10:25:13Z

/retest

openshift-ci · 2023-10-11T10:56:36Z

@racheljpg: all tests passed!

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

racheljpg · 2023-10-11T11:06:05Z

Hello @huali9, thank you very much for running the pre-merge tests. I'm afraid that between the time I pinged you and the time you ran the tests, I had broken the code a bit, so I'm hoping they will pass now. I also know the original bug was reported on a local zone cluster, so maybe that is why the change seemed to fix the issue in my testing but not in yours. If you wouldn't mind running the tests again against my new changes, that would be great. Thank you!

huali9 · 2023-10-12T03:00:55Z

Hello @huali9, thank you very much for running the pre-merge tests. I'm afraid that between the time I pinged you and the time you ran the tests, I had broken the code a bit, so I'm hoping they will pass now. I also know the original bug was reported on a local zone cluster, so maybe that is why the change seemed to fix the issue in my testing but not in yours. If you wouldn't mind running the tests again against my new changes, that would be great. Thank you!

Hi @racheljpg, I tried to pre-merge test this today but got the same result: some new machines go to Failed, and some new machines get stuck in Provisioning.
Steps:
1. Build an image with the PR using cluster-bot.
job build openshift/machine-api-provider-aws#83 succeeded
2. Install an AWS Local Zone cluster with the image built in the previous step. We use the flexy template ipi-on-aws/versioned-installer-local_zone-ovn-ci.

liuhuali@Lius-MacBook-Pro huali-test % oc get clusterversion 
NAME      VERSION                                                   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.ci.test-2023-10-12-011706-ci-ln-mbtsdm2-latest   True        False         21m     Cluster version is 4.14.0-0.ci.test-2023-10-12-011706-ci-ln-mbtsdm2-latest
liuhuali@Lius-MacBook-Pro huali-test % oc project openshift-machine-api
Now using project "openshift-machine-api" on server "https://api.huliu-aws12b.qe.devcluster.openshift.com:6443".
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                             PHASE     TYPE          REGION      ZONE               AGE
huliu-aws12b-jkvbt-edge-us-east-1-bos-1a-2js8k   Running   c5d.2xlarge   us-east-1   us-east-1-bos-1a   36m
huliu-aws12b-jkvbt-edge-us-east-1-mia-1a-khbfj   Running   m5.xlarge     us-east-1   us-east-1-mia-1a   36m
huliu-aws12b-jkvbt-master-0                      Running   m6i.xlarge    us-east-1   us-east-1a         40m
huliu-aws12b-jkvbt-master-1                      Running   m6i.xlarge    us-east-1   us-east-1b         40m
huliu-aws12b-jkvbt-master-2                      Running   m6i.xlarge    us-east-1   us-east-1c         40m
huliu-aws12b-jkvbt-worker-us-east-1a-4tcpb       Running   m6i.xlarge    us-east-1   us-east-1a         36m
huliu-aws12b-jkvbt-worker-us-east-1b-25gbv       Running   m6i.xlarge    us-east-1   us-east-1b         36m
huliu-aws12b-jkvbt-worker-us-east-1c-8hkcv       Running   m6i.xlarge    us-east-1   us-east-1c         36m

3. Copy a default local zone machineset, change volumeSize to 16384, then create the new machineset. The new machine goes to Provisioned.

liuhuali@Lius-MacBook-Pro huali-test % oc get machineset huliu-aws12b-jkvbt-edge-us-east-1-bos-1a -oyaml>ms1.yaml 
liuhuali@Lius-MacBook-Pro huali-test % vim ms1.yaml 
liuhuali@Lius-MacBook-Pro huali-test % oc create -f ms1.yaml 
machineset.machine.openshift.io/huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa created
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                              PHASE         TYPE          REGION      ZONE               AGE
huliu-aws12b-jkvbt-edge-us-east-1-bos-1a-2js8k    Running       c5d.2xlarge   us-east-1   us-east-1-bos-1a   47m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-44zb5   Provisioned   c5d.2xlarge   us-east-1   us-east-1-bos-1a   9m49s
huliu-aws12b-jkvbt-edge-us-east-1-mia-1a-khbfj    Running       m5.xlarge     us-east-1   us-east-1-mia-1a   47m
huliu-aws12b-jkvbt-master-0                       Running       m6i.xlarge    us-east-1   us-east-1a         52m
huliu-aws12b-jkvbt-master-1                       Running       m6i.xlarge    us-east-1   us-east-1b         52m
huliu-aws12b-jkvbt-master-2                       Running       m6i.xlarge    us-east-1   us-east-1c         52m
huliu-aws12b-jkvbt-worker-us-east-1a-4tcpb        Running       m6i.xlarge    us-east-1   us-east-1a         47m
huliu-aws12b-jkvbt-worker-us-east-1b-25gbv        Running       m6i.xlarge    us-east-1   us-east-1b         47m
huliu-aws12b-jkvbt-worker-us-east-1c-8hkcv        Running       m6i.xlarge    us-east-1   us-east-1c         47m

4. Scale the machineset to replicas=10. Some new machines go to Failed, and some new machines get stuck in Provisioning. I checked the AWS console: all the Failed and Provisioning machines show as Terminated with this message: "Client.VolumeLimitExceeded: Volume limit exceeded. You have exceeded the maximum gp2 storage limit of 30720 GiB in this location for your account. Please contact AWS Support for more information."

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                              PHASE          TYPE          REGION      ZONE               AGE
huliu-aws12b-jkvbt-edge-us-east-1-bos-1a-2js8k    Running        c5d.2xlarge   us-east-1   us-east-1-bos-1a   76m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-44zb5   Running        c5d.2xlarge   us-east-1   us-east-1-bos-1a   38m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-5p5cw   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-6qp7d   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-7lht8   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-7v5n9   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-8gnp9   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-9qfjg   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-vm2wz   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-vzx67   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-wfnpf   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-mia-1a-khbfj    Running        m5.xlarge     us-east-1   us-east-1-mia-1a   76m
huliu-aws12b-jkvbt-master-0                       Running        m6i.xlarge    us-east-1   us-east-1a         80m
huliu-aws12b-jkvbt-master-1                       Running        m6i.xlarge    us-east-1   us-east-1b         80m
huliu-aws12b-jkvbt-master-2                       Running        m6i.xlarge    us-east-1   us-east-1c         80m
huliu-aws12b-jkvbt-worker-us-east-1a-4tcpb        Running        m6i.xlarge    us-east-1   us-east-1a         76m
huliu-aws12b-jkvbt-worker-us-east-1b-25gbv        Running        m6i.xlarge    us-east-1   us-east-1b         76m
huliu-aws12b-jkvbt-worker-us-east-1c-8hkcv        Running        m6i.xlarge    us-east-1   us-east-1c         76m
liuhuali@Lius-MacBook-Pro huali-test %
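
The split between Failed and Provisioning in the step 4 listing can be tallied mechanically. The following is a minimal sketch, not part of the PR: it assumes the plain-text table layout that `oc get machine` prints above, where the second whitespace-separated field is the phase, and `countPhases` is a hypothetical helper name. The sample rows are the ten machines of the scaled machineset, copied from the listing.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// countPhases tallies the PHASE column of `oc get machine` text output.
// Assumption: every data row has a non-empty phase as its second field.
func countPhases(table string) map[string]int {
	counts := make(map[string]int)
	sc := bufio.NewScanner(strings.NewReader(table))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		// Skip blank lines and the header row.
		if len(fields) < 2 || fields[0] == "NAME" {
			continue
		}
		counts[fields[1]]++
	}
	return counts
}

func main() {
	// Rows for the scaled machineset, taken from the step 4 output.
	sample := `NAME                                              PHASE          TYPE          REGION      ZONE               AGE
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-44zb5   Running        c5d.2xlarge   us-east-1   us-east-1-bos-1a   38m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-5p5cw   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-6qp7d   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-7lht8   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-7v5n9   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-8gnp9   Provisioning                                                26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-9qfjg   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-vm2wz   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-vzx67   Failed         c5d.2xlarge   us-east-1   us-east-1-bos-1a   26m
huliu-aws12b-jkvbt-edge-us-east-1-bos-1aa-wfnpf   Provisioning                                                26m
`
	counts := countPhases(sample)
	// Print in a fixed order so the output is stable.
	for _, phase := range []string{"Running", "Provisioning", "Failed"} {
		fmt.Printf("%s: %d\n", phase, counts[phase])
	}
}
```

For these ten machines the tally is 1 Running, 5 Provisioning, and 4 Failed, so half of the replicas are stuck in Provisioning.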

Screenshot on AWS console: https://drive.google.com/file/d/12NnwnuUq2Mwy8Jw9tctXc6tJBCT2ZJX9/view?usp=sharing
Must gather: https://drive.google.com/file/d/1v--I5ghJvVBVvnW9hwW3DfAxiHo-G33G/view?usp=sharing
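
The numbers in the AWS message are consistent with what the listing shows. This is a rough sketch of the arithmetic, assuming that each new machine requests one gp2 volume of the size set in step 3, and ignoring any gp2 storage already in use in that location. `maxVolumes` is a hypothetical helper, not code from this PR.

```go
package main

import "fmt"

// maxVolumes returns how many volumes of sizeGiB fit under limitGiB.
// Assumption: no other storage counts against the limit.
func maxVolumes(limitGiB, sizeGiB int) int {
	if sizeGiB <= 0 {
		return 0
	}
	return limitGiB / sizeGiB // integer division rounds down
}

func main() {
	const limitGiB = 30720 // gp2 limit quoted in the AWS error message
	const sizeGiB = 16384  // volumeSize set on the copied machineset
	const replicas = 10    // replica count used in step 4

	fit := maxVolumes(limitGiB, sizeGiB)
	fmt.Printf("volumes that fit: %d\n", fit)
	fmt.Printf("replicas left without a volume: %d\n", replicas-fit)
}
```

Under these assumptions only one 16384 GiB volume fits under the 30720 GiB limit, which matches one new machine reaching Running and the other nine being terminated by AWS.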

racheljpg · 2023-10-12T08:59:47Z

Thank you, @huali9. I will look into it further.

openshift-bot · 2024-01-10T09:00:39Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2024-02-10T00:30:30Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

racheljpg · 2024-02-12T10:40:22Z

Hello, I'm adding a comment to remove the stale label, as this is something I still want to get back to, but further investigation is needed into why this problem exists on local zone clusters specifically. Thanks!
/remove-lifecycle rotten

racheljpg · 2024-02-12T10:44:11Z

/remove-lifecycle rotten

openshift-bot · 2024-05-13T01:00:19Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

openshift-bot · 2024-06-12T08:31:04Z

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

JoelSpeed · 2024-06-12T08:38:31Z

/remove-lifecycle rotten

openshift-bot · 2024-09-10T09:00:58Z

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

racheljpg · 2024-09-10T10:46:30Z

/remove-lifecycle stale

racheljpg · 2024-09-10T10:47:54Z

/jira refresh

openshift-ci-robot · 2024-09-10T10:47:58Z

@racheljpg: This pull request references Jira Issue OCPBUGS-15255, which is invalid:

expected the bug to target the "4.18.0" version, but no target version was set

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

racheljpg · 2024-09-10T10:48:20Z

/jira refresh

openshift-ci-robot · 2024-09-10T10:48:24Z

@racheljpg: This pull request references Jira Issue OCPBUGS-15255, which is valid.

3 validation(s) were run on this bug

bug is open, matching expected state (open)
bug target version (4.18.0) matches configured target version for branch (4.18.0)
bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @huali9

In response to this:

/jira refresh

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.

openshift-ci bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 2, 2023

racheljpg changed the title ~~Add terminated as a handled state so terminated instances don't get stuck~~ OCPBUGS-15255: Add terminated as a handled state so terminated instances don't get stuck Oct 2, 2023

openshift-ci-robot added jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. labels Oct 2, 2023

JoelSpeed reviewed Oct 3, 2023

racheljpg force-pushed the addTerminatedState branch from 26dfed8 to 00d0049 Compare October 3, 2023 11:05

racheljpg marked this pull request as ready for review October 3, 2023 15:36

openshift-ci bot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Oct 3, 2023

openshift-ci bot requested review from odvarkadaniel and RadekManak October 3, 2023 15:39

racheljpg force-pushed the addTerminatedState branch 2 times, most recently from 2a7e423 to f571a1c Compare October 10, 2023 12:54

racheljpg added 2 commits October 10, 2023 15:24

Add terminated as a handled state so terminated instances don't get stuck (0fb1a5c)

Edit unit tests for terminated instances (b87739f)

racheljpg force-pushed the addTerminatedState branch from f571a1c to b87739f Compare October 10, 2023 14:25

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 10, 2024

openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Feb 10, 2024

openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Feb 12, 2024

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label May 13, 2024

openshift-ci bot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jun 12, 2024

openshift-ci bot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jun 12, 2024

openshift-ci bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2024

openshift-ci bot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Sep 10, 2024

openshift-ci-robot added the jira/valid-bug Indicates that a referenced Jira bug is valid for the branch this PR is targeting. label Sep 10, 2024

openshift-ci-robot removed the jira/invalid-bug Indicates that a referenced Jira bug is invalid for the branch this PR is targeting. label Sep 10, 2024

openshift-ci bot requested a review from huali9 September 10, 2024 10:48
