-
Notifications
You must be signed in to change notification settings - Fork 192
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
OCPBUGS-13190: Avoid spurious updates for internalTrafficPolicy #927
OCPBUGS-13190: Avoid spurious updates for internalTrafficPolicy #927
Conversation
@Miciah: This pull request references Jira Issue OCPBUGS-13190, which is valid. The bug has been moved to the POST state. 3 validation(s) were run on this bug
Requesting review from QA contact: The bug has been updated to refer to the pull request using the external bug tracker. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/assign |
This appears to be another spurious update in the ingress-operator logs from the last e2e-aws-operator job run:
With this PR, the operator ignores all the fields that the diff shows changed, so my guess is that the problem is with null versus empty annotations. I've pushed a3ec26a to treat null and empty annotations as equal. |
@Miciah: This pull request references Jira Issue OCPBUGS-13190, which is valid. 3 validation(s) were run on this bug
Requesting review from QA contact: In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
a3ec26a
to
9c5669c
Compare
https://github.com/openshift/cluster-ingress-operator/compare/a3ec26aeb6113a607e614d7d3fbcb290ab8177b9..9c5669c2c84175af26090df70488ec247314ec2a rebases to fix a conflict in |
/lgtm |
e2e-aws-operator failed on e2e-aws-ovn failed because
/test e2e-aws-ovn |
/test e2e-gcp-ovn e2e-aws-operator failed because of |
@Miciah: Overrode contexts on behalf of Miciah: ci/prow/e2e-aws-operator In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
I reviewed again, still looks good to me. Thanks for the fix. |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: gcs278 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/hold Revision 9c5669c was retested 3 times: holding |
e2e-hypershift failed because the "e2e-hypershift-hypershift-install" step timed out "Waiting for operator deployment to be observed". I don't know where to start with diagnosing e2e-hypershift failures. The CI errors are not meaningful to me. @enxebre, how can my team diagnose failures with this new job? I did notice that https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/pr-logs/pull/openshift_cluster-ingress-operator/927/pull-ci-openshift-cluster-ingress-operator-master-e2e-hypershift/1661465639627264000/artifacts/e2e-hypershift/dump-management-cluster/artifacts/namespaces/hypershift/core/pods/logs/operator-d98677dc8-dk7ks-operator.log has thousands of error messages, but most of them are "manifest unknown" or "unauthorized: authentication required" errors when attempting to pull images from registry.ci.openshift.org. Hoping this is a transient failure... |
e2e-aws-ovn failed because of OCPBUGS-14321, which should be fixed by openshift/origin#27955. e2e-aws-ovn-upgrade failed because
I saw a couple similar failures in search.ci, so I filed OCPBUGS-14400 for these failures. |
9c5669c
to
6cd3b67
Compare
/lgtm |
/hold |
The e2e-hypershift job for this PR passed earlier, but the same job has been failing on #928. I want to see whether e2e-hypershift is still passing on this PR (indicating that #928 is causing a failure) or whether it fails now (indicating that something else broke e2e-hypershift). |
/hold cancel |
e2e-aws-ovn-serial failed on
Search.ci shows several similar failures for other jobs, so I filed OCPBUGS-14930 to track the issue. However, the failure rate appears to be low. e2e-aws-operator failed on
This error appears to be related to API server flakiness and probably does not represent flakiness on the part of the test itself or any problem with the changes in this PR. In this CI job run, several operators failed to roll out all their pods because the kubelet of one of the of the control-plane nodes stopped responding; see |
e2e-aws-operator failed again because |
/test all |
e2e-aws-operator failed because must-gather failed. e2e-aws-ovn-single-node failed because
Having two pods is unexpected because this job creates a single-node cluster, and the deployment and replicaset have
The ingress operator does use the OnePodPerNodeController controller to delete misscheduled pods, but this controller ignores not-ready pods. The node predicate only excludes remote nodes: "nodeAffinity": {
"requiredDuringSchedulingIgnoredDuringExecution": {
"nodeSelectorTerms": [
{
"matchExpressions": [
{
"key": "node.openshift.io/remote-worker",
"operator": "NotIn",
"values": [
""
]
}
]
}
]
}
} I don't know why the scheduler scheduled two pods, or why one of them failed the node predicate. Maybe this failure was a fluke. Also,
Let's see whether it's an eventual consistency issue with image mirroring. |
e2e-aws-operator failed again because must-gather failed. |
@Miciah: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here. |
@Miciah: Jira Issue OCPBUGS-13190: All pull requests linked via external trackers have merged: Jira Issue OCPBUGS-13190 has been moved to the MODIFIED state. In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. |
/cherrypick release-4.13 release 4.12 |
@candita: #927 failed to apply on top of branch "release-4.13":
In response to this:
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. |
Fix included in accepted release 4.13.0-0.nightly-2024-05-31-202348 |
Avoid spurious updates for internalTrafficPolicy
Specify
spec.internalTrafficPolicy
on NodePort- and ClusterIP-type services that the operator manages. Also, ignore updates to thespec.ipFamilies
andspec.ipFamilyPolicy
fields.Before this PR, the update logic for NodePort- and ClusterIP-type services would try to revert the default values that the API set for these fields.
assets/router/service-cloud.yaml
:assets/router/service-internal.yaml
: SpecifyinternalTrafficPolicy: Cluster
.pkg/manifests/bindata.go
: Regenerate.pkg/operator/controller/ingress/internal_service.go
(internalServiceChanged
): Ignorespec.ipFamilies
andspec.ipFamilyPolicy
.pkg/operator/controller/ingress/internal_service_test.go
(Test_desiredInternalIngressControllerService
): Verify thatspec.internalServiceChanged
is set to "Cluster".(
Test_internalServiceChanged
): Verify that changes tospec.internalTrafficPolicy
are detected and that changes tospec.ipFamilies
andspec.ipFamilyPolicy
are ignored.pkg/operator/controller/ingress/nodeport_service.go
(desiredNodePortService
): Setspec.internalTrafficPolicy
to "Cluster".(
nodePortServiceChanged
): Ignorespec.ipFamilies
andspec.ipFamilyPolicy
.pkg/operator/controller/ingress/nodeport_service_test.go
(TestDesiredNodePortService
): Verify thatspec.internalTrafficPolicy
is set to "Cluster".(
TestNodePortServiceChanged
): Verify that changes tospec.internalTrafficPolicy
are detected and that changes tospec.ipFamilies
andspec.ipFamilyPolicy
are ignored.Ignore updates for null versus empty ennotations
Ignore updates to annotations when the update is from null to empty or vice versa.
pkg/operator/controller/ingress/internal_service.go
(internalServiceChanged
): UseEquateEmpty
when comparing annotations.pkg/operator/controller/ingress/internal_service_test.go
(TestInternalServiceChangedEmptyAnnotations
): New test to verify thatinternalServiceChanged
treats empty and null annotations as equal.pkg/operator/controller/ingress/load_balancer_service_test.go
(TestLoadBalancerServiceChangedEmptyAnnotations
): New test to verify thatloadBalancerServiceChanged
treats empty and null annotations as equal.pkg/operator/controller/ingress/nodeport_service.go
(nodePortServiceChanged
): UseEquateEmpty
when comparing annotations.pkg/operator/controller/ingress/nodeport_service_test.go
(TestNodePortServiceChangedEmptyAnnotations
): New test to verify thatloadBalancerServiceChanged
treats empty and null annotations as equal.