
OCPBUGS-434: Absorb PodsScheduled condition into MinAvailable #854

Conversation

Miciah
Contributor

@Miciah Miciah commented Nov 4, 2022

Absorb PodsScheduled condition into MinAvailable

Remove the PodsScheduled status condition on IngressController objects, and report the information that PodsScheduled reported instead in the DeploymentReplicasMinAvailable status condition's message when its status is "False".

Before this change, the PodsScheduled status condition could erroneously signal that there was a problem when some misscheduled pods existed even if there were enough correctly scheduled and available pods.

The PodsScheduled status condition was added in order to provide diagnostic information when there was a problem. That is, PodsScheduled was intended to indicate why there was a problem, not whether there was a problem. The DeploymentReplicasMinAvailable status condition definitively indicates whether there is a problem, and its status message should indicate why there is a problem. Having a separate status condition for the "why" is superfluous and unnecessarily complicated in this instance.

Follow-up to #465.

  • pkg/operator/controller/ingress/controller.go (IngressControllerPodsScheduledConditionType): Delete const.
  • pkg/operator/controller/ingress/metrics_test.go (TestDeleteIngressControllerConditionsMetric): Delete a reference to the "PodsScheduled" status condition in a test case.
  • pkg/operator/controller/ingress/status.go (syncIngressControllerStatus): Delete the call to computeDeploymentPodsScheduledCondition. Pass the list of pods to computeDeploymentReplicasMinAvailableCondition.
    (PruneConditions): Delete pruning of the "DeploymentDegraded" status condition (which was removed in 4.6.0) and instead prune the "PodsScheduled" status condition.
    (computeDeploymentPodsScheduledCondition): Rename...
    (checkPodsScheduledForDeployment): ...to this. Change the return value from an operator status condition to an error value.
    (computeDeploymentReplicasMinAvailableCondition): Call checkPodsScheduledForDeployment and include the error string in the status condition's message if the function returns an error.
    (computeIngressDegradedCondition): Remove IngressControllerPodsScheduledConditionType from the list of conditions to check.
  • pkg/operator/controller/ingress/status_test.go (TestComputePodsScheduledCondition): Rename...
    (Test_checkPodsScheduledForDeployment): ...to this. Update for the function's new name and result value.
    (TestComputeIngressDegradedCondition): Delete test cases for PodsScheduled.
    (TestComputeDeploymentReplicasMinAvailableCondition): Update to set the deployment's label selector and pass in a list of pods (which is empty).

checkPodsScheduledForDeployment: Add nil check

checkPodsScheduledForDeployment takes a pointer to a deployment, which should never be nil, so return an error indicating that there is a bug if the deployment is somehow nil.

  • pkg/operator/controller/ingress/status.go (checkPodsScheduledForDeployment): Check whether deployment is nil and return an error if it is.
  • pkg/operator/controller/ingress/status_test.go (Test_checkPodsScheduledForDeployment): Add a "nil deployment" test case.
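A minimal sketch of the pattern these two commits describe: the scheduling check becomes a plain error-returning helper with a nil guard, and the min-available computation folds that error into its condition message. The reason strings, the simplified replica arithmetic, and the NodeName-based scheduling test are illustrative assumptions, not the operator's actual code.

package ingress

import (
	"errors"
	"fmt"

	operatorv1 "github.com/openshift/api/operator/v1"
	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
)

// checkPodsScheduledForDeployment returns an error describing any pods from
// the given list that are not yet scheduled, or nil if every pod has been
// scheduled.  The nil guard mirrors the follow-up commit described above.
func checkPodsScheduledForDeployment(deployment *appsv1.Deployment, pods []corev1.Pod) error {
	if deployment == nil {
		return errors.New("system error detected: deployment was nil")
	}
	var unscheduled []string
	for _, pod := range pods {
		// Treating an empty NodeName as "not scheduled" is a simplification
		// for this sketch.
		if pod.Spec.NodeName == "" {
			unscheduled = append(unscheduled, pod.Name)
		}
	}
	if len(unscheduled) > 0 {
		return fmt.Errorf("pods not scheduled: %v", unscheduled)
	}
	return nil
}

// computeDeploymentReplicasMinAvailableCondition reports False when fewer
// replicas are available than desired and, in that case, appends whatever
// the scheduling check found to the condition message as a diagnostic hint.
func computeDeploymentReplicasMinAvailableCondition(deployment *appsv1.Deployment, pods []corev1.Pod) operatorv1.OperatorCondition {
	desired := int32(1)
	if deployment.Spec.Replicas != nil {
		desired = *deployment.Spec.Replicas
	}
	if deployment.Status.AvailableReplicas >= desired {
		return operatorv1.OperatorCondition{
			Type:   "DeploymentReplicasMinAvailable",
			Status: operatorv1.ConditionTrue,
			Reason: "DeploymentReplicasAvailable",
		}
	}
	message := fmt.Sprintf("%d/%d of replicas are available", deployment.Status.AvailableReplicas, desired)
	if err := checkPodsScheduledForDeployment(deployment, pods); err != nil {
		message = fmt.Sprintf("%s: %s", message, err)
	}
	return operatorv1.OperatorCondition{
		Type:    "DeploymentReplicasMinAvailable",
		Status:  operatorv1.ConditionFalse,
		Reason:  "DeploymentMinimumReplicasNotMet",
		Message: message,
	}
}

The real functions live in pkg/operator/controller/ingress/status.go, and the operator's minimum-availability arithmetic is more involved than the single desired-replica comparison used here.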

@openshift-ci-robot added the jira/invalid-bug label on Nov 4, 2022
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-434, which is invalid:

  • expected the bug to target the "4.13.0" version, but it targets "4.11.0" instead

Comment /jira refresh to re-evaluate validity if changes to the Jira bug are made, or edit the title of this pull request to link to a different bug.

The bug has been updated to refer to the pull request using the external bug tracker.

In response to this:

Remove the PodsScheduled status condition on IngressController objects, and report the information that PodsScheduled reported instead in the DeploymentReplicasAllAvailable status condition's message when its status is "False".

Before this change, the PodsScheduled status condition could erroneously signal that there was a problem when some misscheduled pods existed even if there were enough correctly scheduled and available pods.

The PodsScheduled status condition was added in order to provide diagnostic information when there was a problem. That is, PodsScheduled was intended to indicate why there was a problem, not whether there was a problem. The DeploymentReplicasAllAvailable status condition definitively indicates whether there is a problem, and its status message should indicate why there is a problem. Having a separate status condition for the "why" is superfluous and unnecessarily complicated in this instance.

Follow-up to #465.

  • pkg/operator/controller/ingress/controller.go (IngressControllerPodsScheduledConditionType): Delete const.
  • pkg/operator/controller/ingress/status.go (syncIngressControllerStatus): Delete the call to computeDeploymentPodsScheduledCondition. Pass the list of pods to computeDeploymentReplicasAllAvailableCondition.
    (computeDeploymentPodsScheduledCondition): Rename...
    (checkPodsScheduledForDeployment): ...to this. Change the return value from an operator status condition to an error value.
    (computeDeploymentReplicasAllAvailableCondition): Call checkPodsScheduledForDeployment and include the error string in the status condition's message if the function returns an error.
    (computeIngressDegradedCondition): Remove IngressControllerPodsScheduledConditionType from the list of conditions to check.
  • pkg/operator/controller/ingress/status_test.go (TestComputePodsScheduledCondition): Rename...
    (Test_checkPodsScheduledForDeployment): ...to this. Update for the function's new name and result value.
    (TestComputeIngressDegradedCondition): Delete test cases for PodsScheduled.
    (TestComputeDeploymentReplicasAllAvailableCondition): Update to set the deployment's label selector and pass in a list of pods (which is empty).


@Miciah
Contributor Author

Miciah commented Nov 4, 2022

/jira refresh

@openshift-ci-robot added the jira/valid-bug and bugzilla/valid-bug labels and removed the jira/invalid-bug label on Nov 4, 2022
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-434, which is valid. The bug has been moved to the POST state.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state New, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan

In response to this:

/jira refresh


@openshift-ci bot requested a review from lihongan on November 4, 2022 15:35
@Miciah force-pushed the OCPBUGS-434-absorb-PodsScheduled-condition-into-AllAvailable branch from 6947f72 to 509baf5 on November 4, 2022 16:45
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-434, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @lihongan


@Miciah changed the title from "OCPBUGS-434: Absorb PodsScheduled condition into AllAvailable" to "OCPBUGS-434: Absorb PodsScheduled condition into MinAvailable" on Nov 4, 2022
@Miciah
Contributor Author

Miciah commented Nov 4, 2022

The latest push changes the reporting of information about unscheduled/unschedulable pods from DeploymentReplicasAllAvailable to DeploymentReplicasMinAvailable: the former is surfaced in Degraded only after a 60-minute grace period, whereas the latter is surfaced after a 60-second grace period, so the cluster admin is more likely to see useful information sooner if it is in DeploymentReplicasMinAvailable.

@Miciah
Contributor Author

Miciah commented Nov 11, 2022

e2e-aws-ovn-upgrade failed because deprovisioning failed. e2e-gcp-ovn-serial failed because etcd had an unhealthy member. e2e-azure-ovn failed because Terraform failed to configure the vnet.
/retest-required

@candita
Contributor

candita commented Nov 30, 2022

/assign @gcs278
/assign

pods []corev1.Pod
expect operatorv1.ConditionStatus
name string
deployment *appsv1.Deployment
Contributor

I always test to ensure functions can handle nil pointers. Do you agree with testing this edge case?

Contributor Author

I'm ambivalent in this case and in the general case. Adding nil checks everywhere can make code safer (less likely to panic), but it can also obscure bugs (maybe it's a bug in the caller if the argument is nil). In this case, the caller of checkPodsScheduledForDeployment itself dereferences deployment, so it cannot be nil. For short helper functions like this that are used in only one or two places, I would be fine with saying something like, "The caller must ensure that the deployment argument is not nil." Would a comment in the godoc for checkPodsScheduledForDeployment suffice, or do you still prefer an explicit nil check in checkPodsScheduledForDeployment?

Contributor Author

Actually, checkPodsScheduledForDeployment returns an error value, so I'll go ahead and add a check that returns an error if deployment is nil.

Contributor Author

Added.

Contributor

So I was a bit of a nil-pointer-check absolutist until you mentioned it can also obscure bugs. So I agree, it's not that cut-and-dried whether every function needs a nil pointer check. The Go community seems torn on it too, based on my web searches.

So I think I don't mind either way.

@candita
Contributor

candita commented Dec 7, 2022

@Miciah I have a small concern with this: what if SD or customers have set up their own monitoring for specific conditions, and we are changing things here? Will there be a good release note to explain the changes to the condition set?

@Miciah
Contributor Author

Miciah commented Dec 7, 2022

@Miciah I have a small concern with this: what if SD or customers have set up their own monitoring for specific conditions, and we are changing things here? Will there be a good release note to explain the changes to the condition set?

We don't document the status condition, and we don't declare it in openshift/api, so we have made no promises to anyone that they can rely on it. We could make a release note if you think that's appropriate, though I would want to state clearly in any release note that no one should be relying on undocumented status conditions anyway.

Your question does spark a thought—we should prune the removed status condition on upgrade to make sure we don't leave a stale status condition around.
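As a rough illustration of that pruning idea (the changelog names the real function PruneConditions in status.go; the helper name and filtering details below are assumptions), an upgrade-safe prune can simply drop condition types the operator no longer manages:

package ingress

import (
	operatorv1 "github.com/openshift/api/operator/v1"
)

// pruneStaleConditions drops status conditions that the current operator
// version no longer manages, so a cluster upgraded from an older version
// does not keep reporting a stale "PodsScheduled" entry forever.
func pruneStaleConditions(conditions []operatorv1.OperatorCondition) []operatorv1.OperatorCondition {
	retired := map[string]bool{
		"PodsScheduled": true, // removed by this PR, so prune it on upgrade
	}
	pruned := make([]operatorv1.OperatorCondition, 0, len(conditions))
	for _, cond := range conditions {
		if retired[cond.Type] {
			continue
		}
		pruned = append(pruned, cond)
	}
	return pruned
}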

Remove the PodsScheduled status condition on IngressController objects, and
report the information that PodsScheduled reported instead in the
DeploymentReplicasMinAvailable status condition's message when its status
is "False".

Before this commit, the PodsScheduled status condition could erroneously
signal that there was a problem when some misscheduled pods existed even
if there were enough correctly scheduled and available pods.

The PodsScheduled status condition was added in order to provide diagnostic
information when there was a problem.  That is, PodsScheduled was intended
to indicate *why* there was a problem, not *whether* there was a problem.
The DeploymentReplicasMinAvailable status condition definitively
indicates *whether* there is a problem, and its status message should
indicate *why* there is a problem.  Having a separate status condition for
the "why" is superfluous and unnecessarily complicated in this instance.

Follow-up to commit c5e0f4a.

This bug fixes OCPBUGS-434.

https://issues.redhat.com/browse/OCPBUGS-434

* pkg/operator/controller/ingress/controller.go
(IngressControllerPodsScheduledConditionType): Delete const.
* pkg/operator/controller/ingress/metrics_test.go
(TestDeleteIngressControllerConditionsMetric): Delete a reference to the
"PodsScheduled" status condition in a test case.
* pkg/operator/controller/ingress/status.go (syncIngressControllerStatus):
Delete the call to computeDeploymentPodsScheduledCondition.  Pass the list
of pods to computeDeploymentReplicasMinAvailableCondition.
(PruneConditions): Delete pruning of the "DeploymentDegraded" status
condition (which was removed in 4.6.0) and instead prune the
"PodsScheduled" status condition.
(computeDeploymentPodsScheduledCondition): Rename...
(checkPodsScheduledForDeployment): ...to this.  Change the return value
from an operator status condition to an error value.
(computeDeploymentReplicasMinAvailableCondition): Call
checkPodsScheduledForDeployment and include the error string in the status
condition's message if the function returns an error.
(computeIngressDegradedCondition): Remove
IngressControllerPodsScheduledConditionType from the list of conditions to
check.
* pkg/operator/controller/ingress/status_test.go
(TestComputePodsScheduledCondition): Rename...
(Test_checkPodsScheduledForDeployment): ...to this.  Update for the
function's new name and result value.
(TestComputeIngressDegradedCondition): Delete test cases for PodsScheduled.
(TestComputeDeploymentReplicasMinAvailableCondition): Update to set the
deployment's label selector and pass in a list of pods (which is empty).
@Miciah force-pushed the OCPBUGS-434-absorb-PodsScheduled-condition-into-AllAvailable branch from f4da6f3 to 436fd31 on December 7, 2022 00:20
@openshift-ci-robot
Contributor

@Miciah: This pull request references Jira Issue OCPBUGS-434, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86

The bug has been updated to refer to the pull request using the external bug tracker.


// result of that check.
func checkPodsScheduledForDeployment(deployment *appsv1.Deployment, pods []corev1.Pod) error {
	if deployment == nil {
		return errors.New("Deployment was nil. This is a bug!")
Contributor

Suggested change
return errors.New("Deployment was nil. This is a bug!")
return errors.New("System or configuration error detected - deployment was nil.")

Contributor Author

Is that better than just saying, "This is a bug"? We don't want to mislead the cluster admin into thinking that something is wrong with the configuration that they can fix, when really the problem is that the operator has a bug that needs to be fixed.

Contributor

Then maybe just "System error detected." and put a link to the bug reporting tool!

Contributor Author

I updated the error message as you suggested. The Jira link is a bit unwieldy though.

Comment on lines -282 to -301
{
	name: "pods not scheduled for <10m",
	conditions: []operatorv1.OperatorCondition{
		cond(IngressControllerPodsScheduledConditionType, operatorv1.ConditionFalse, "", clock.Now().Add(time.Minute*-9)),
	},
	expectIngressDegradedStatus: operatorv1.ConditionFalse,
	expectRequeue: true,
	// Grace period is 10 minutes, subtract the 9 minute spoofed last transition time
	expectAfter: time.Minute,
},
{
	name: "pods not scheduled for >10m",
	conditions: []operatorv1.OperatorCondition{
		cond(IngressControllerPodsScheduledConditionType, operatorv1.ConditionFalse, "", clock.Now().Add(time.Minute*-31)),
	},
	expectIngressDegradedStatus: operatorv1.ConditionTrue,
	expectRequeue: true,
	// Just use the one minute retry duration for this degraded condition
	expectAfter: time.Minute,
},
Contributor

Do we no longer have a grace period for which pods can remain unscheduled and not yet considered unavailable?

Contributor Author

There is no longer a grace period for the PodsScheduled status condition, because the PodsScheduled status condition no longer even exists after this PR.

The point of this PR is to remove the PodsScheduled status condition because it was spuriously indicating a condition (some pods weren't scheduled) as a problem when really it wasn't necessarily a problem at all (the not-scheduled pods might actually be pods that are waiting to be garbage-collected). However, the operator still reports the DeploymentReplicasMinAvailable status condition (which has a 60-second grace period, before and after this PR), which conveys the information that really does matter (whether there are enough pods that are scheduled, running, and ready).

Contributor

So will this automatically continue to reconcile while there are not enough pods scheduled?

Contributor Author

Yeah, if there are not enough available pods, then DeploymentReplicasMinAvailable will have status False, and the grace logic will queue reconciliation for the end of the condition's grace period. This is another way in which PodsScheduled is superfluous: DeploymentReplicasMinAvailable has a shorter grace period than PodsScheduled had, so if DeploymentReplicasMinAvailable is False, its grace period already would have triggered reconciliation (and still does trigger it) before PodsScheduled's grace period would have triggered reconciliation.
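A hedged sketch of the grace-period logic described in this thread, assuming a helper that maps a False condition and a grace period to a Degraded verdict plus a requeue delay; the function name and the fixed one-minute retry are illustrative, not the operator's actual code:

package ingress

import (
	"time"

	operatorv1 "github.com/openshift/api/operator/v1"
)

// evaluateGracePeriod decides whether a condition that is currently False
// should already make the IngressController Degraded, and if not, how long
// to wait before reconciling again so the condition is re-checked when its
// grace period expires.
func evaluateGracePeriod(cond operatorv1.OperatorCondition, gracePeriod time.Duration, now time.Time) (degraded bool, requeueAfter time.Duration) {
	if cond.Status != operatorv1.ConditionFalse {
		return false, 0
	}
	elapsed := now.Sub(cond.LastTransitionTime.Time)
	if elapsed < gracePeriod {
		// Still within the grace period: not Degraded yet, but requeue at
		// the end of the grace period rather than waiting for another event.
		return false, gracePeriod - elapsed
	}
	// The condition has been False longer than its grace period, so report
	// Degraded and keep retrying on a short fixed interval.
	return true, time.Minute
}

With the 60-second grace period that DeploymentReplicasMinAvailable uses and the longer period that PodsScheduled had, the shorter period always expires first, which is the point made in the reply above.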

@gcs278
Contributor

gcs278 commented Dec 7, 2022

I forgot about status pruning, good catch.

/lgtm

@openshift-ci bot added the lgtm label on Dec 7, 2022
checkPodsScheduledForDeployment takes a pointer to a deployment, which
should never be nil, so return an error indicating that there is a bug
if the deployment is somehow nil.

* pkg/operator/controller/ingress/status.go
(checkPodsScheduledForDeployment): Check whether deployment is nil and
return an error if it is.
* pkg/operator/controller/ingress/status_test.go
(Test_checkPodsScheduledForDeployment): Add a "nil deployment" test case.
@Miciah force-pushed the OCPBUGS-434-absorb-PodsScheduled-condition-into-AllAvailable branch from 436fd31 to 67b0e92 on December 8, 2022 00:56
@openshift-ci bot removed the lgtm label on Dec 8, 2022
// result of that check.
func checkPodsScheduledForDeployment(deployment *appsv1.Deployment, pods []corev1.Pod) error {
	if deployment == nil {
		return errors.New("System error detected: deployment was nil. Please report this issue to Red Hat: https://issues.redhat.com/secure/CreateIssueDetails!init.jspa?pid=12332330&issuetype=1&components=12367900&priority=10300&customfield_12316142=26752")
Contributor

I have some concern that this link may break in the future. Would it be better to just use https://issues.redhat.com/?

Contributor Author

I think that that would lead to more misrouted bug reports.

I could say "open a support case", but I don't even know the link for that, and neither would OKD users.

Contributor

🤷‍♂️ this is fine, hopefully we never hit this logic

@gcs278
Contributor

gcs278 commented Dec 15, 2022

/lgtm

@openshift-ci bot added the lgtm label on Dec 15, 2022
@candita
Contributor

candita commented Dec 16, 2022

/retest-required

@candita
Contributor

candita commented Dec 16, 2022

This test output for e2e-gcp-ovn-serial is unusual in that it mentions two bugs:

ERRO[2022-12-08T03:23:26Z] Some steps failed:
...

  • could not run steps: step e2e-gcp-ovn-serial failed: "e2e-gcp-ovn-serial" test steps failed: "e2e-gcp-ovn-serial" pod "e2e-gcp-ovn-serial-openshift-e2e-test" failed: the pod ci-op-jk9ifky1/e2e-gcp-ovn-serial-openshift-e2e-test failed after 1h31m9s (failed containers: test): ContainerFailed one or more containers exited
    Container test exited with code 1, reason Error

57746Z","updated_at":"2022-12-08T02:54:41.100513Z","deleted_at":null,"status":"ON_QA","last_change_time":"2022-11-28T10:13:32Z","summary":"masters repeatedly losing connection to API and going NotReady","affects_versions":["4.12","4.13"],"fix_versions":[],"components":["Machine Config Operator / platform-baremetal"],"labels":["sippy-watchlist","sippy-watchlist","trt","trt"],"url":"https://issues.redhat.com/browse/OCPBUGS-3874"},{"id":14935611,"key":"OCPBUGS-3642","created_at":"2022-11-14T18:07:08.62315Z","updated_at":"2022-12-08T02:54:40.527553Z","deleted_at":null,"status":"New","last_change_time":"2022-11-14T16:25:50Z","summary":"Readiness and Liveliness Probe errors in openshift-oauth-apiserver and test failures","affects_versions":["4.12"],"fix_versions":[],"components":["apiserver-auth"],"labels":[],"url":"https://issues.redhat.com/browse/OCPBUGS-3642"}]},{"Name":"[sig-network-edge] ns/openshift-authentication route/oauth-openshift disruption/ingress-to-oauth-server connection/new should be available throughout the test","Risk":{"Level":{"Name":"Low","Level":1},"Reasons":["This test has passed 10.53% of 19 runs on release 4.13 [amd64 gcp ha ovn serial] in the last week."]},"OpenBugs":[]},{"Name":"[sig-network-edge] ns/openshift-console route/console disruption/ingress-to-console connection/reused should be available throughout the test","Risk":{"Level":{"Name":"Low","Level":1},"Reasons":["This test has passed 21.05% of 19 runs on release 4.13 [amd64 gcp ha ovn serial] in the last week."]},"OpenBugs":[]}],"OverallRisk":{"Level":{"Name":"High","Level":10},"Reasons":["Maximum failed test risk: High"]},"OpenBugs":[]}

@candita
Contributor

candita commented Dec 17, 2022

/retest-required

@candita
Contributor

candita commented Dec 17, 2022

/approve

@openshift-ci
Contributor

openshift-ci bot commented Dec 17, 2022

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: candita

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci bot added the approved label on Dec 17, 2022
@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD d9d1a2b and 2 for PR HEAD 67b0e92 in total

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 560b2d3 and 1 for PR HEAD 67b0e92 in total

@openshift-ci
Contributor

openshift-ci bot commented Dec 19, 2022

@Miciah: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                        Commit   Required  Rerun command
ci/prow/e2e-aws-ovn-single-node  67b0e92  false     /test e2e-aws-ovn-single-node
ci/prow/e2e-gcp-ovn-serial       67b0e92  true      /test e2e-gcp-ovn-serial

Full PR test history. Your PR dashboard.


@openshift-bot
Contributor

/jira refresh

The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target OCP 4.12), recalculating validity.

@openshift-bot
Contributor

/bugzilla refresh

The requirements for Bugzilla bugs have changed (BZs linked to PRs on main branch need to target OCP 4.12), recalculating validity.

@openshift-ci-robot
Contributor

@openshift-bot: This pull request references Jira Issue OCPBUGS-434, which is valid.

3 validation(s) were run on this bug
  • bug is open, matching expected state (open)
  • bug target version (4.13.0) matches configured target version for branch (4.13.0)
  • bug is in the state POST, which is one of the valid states (NEW, ASSIGNED, POST)

Requesting review from QA contact:
/cc @melvinjoseph86

In response to this:

/jira refresh

The requirements for Jira bugs have changed (Jira issues linked to PRs on main branch need to target OCP 4.12), recalculating validity.


@openshift-ci
Contributor

openshift-ci bot commented Jan 1, 2023

@openshift-bot: No Bugzilla bug is referenced in the title of this pull request.
To reference a bug, add 'Bug XXX:' to the title of this pull request and request another bug refresh with /bugzilla refresh.

Retaining the bugzilla/valid-bug label as it was manually added.

In response to this:

/bugzilla refresh

The requirements for Bugzilla bugs have changed (BZs linked to PRs on main branch need to target OCP 4.12), recalculating validity.


@candita
Contributor

candita commented Jan 13, 2023

@Miciah There's a new frequent (but not permanent) failure in TestUnmanagedDNSToManagedDNSInternalIngressController. Could this change have caused that?

@openshift-ci-robot
Contributor

/retest-required

Remaining retests: 0 against base HEAD fc69408 and 0 for PR HEAD 67b0e92 in total

@openshift-merge-robot merged commit b81ee72 into openshift:master on Jan 13, 2023
@openshift-ci-robot
Contributor

@Miciah: All pull requests linked via external trackers have merged:

Jira Issue OCPBUGS-434 has been moved to the MODIFIED state.


Labels
  • approved: Indicates a PR has been approved by an approver from all required OWNERS files.
  • bugzilla/valid-bug: Indicates that a referenced Bugzilla bug is valid for the branch this PR is targeting.
  • jira/valid-bug: Indicates that a referenced Jira bug is valid for the branch this PR is targeting.
  • lgtm: Indicates that a PR is ready to be merged.