Configure OVS NIC for OVN #1860
@runcom FYI
a24c7ce
to
d221d46
Compare
/test e2e-ovn-step-registry
@dcbw FYI
d221d46
to
00b2d8c
Compare
Is there any documentation/summary about this work? Jira? BZ? Enhancement?
https://issues.redhat.com/browse/SDN-1030 and more specifically https://issues.redhat.com/browse/SDN-1032
path: "/usr/local/bin/configure-ovs.sh"
contents:
  inline: |
    #!/usr/bin/env bash
BTW after #1766 lands, it will become possible to add Go code into the MCD binary itself and have it execute on the host reliably.
Thanks. I had planned on moving to Go after I got the bash working. I didn't know about this option.
#1766 landed. Now, as I understand things, this PR is about controlling the SDN, which we do not need during "firstboot" (which is when we perform OS updates), only after kubelet is ready to start.
I think all of this bash code could become a `machine-config-daemon initialize-ovs` subcommand or so; then the MCD just invokes `systemd-run --unit mco-ovs /run/bin/machine-config-daemon initialize-ovs` when it starts up the first time on a node. (Note that you will need to handle the case of the MCD being restarted without a node restart, e.g. a new daemonset rollout, so your code should be idempotent.)
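One way to picture the idempotency requirement is a per-boot stamp file. The following is only a sketch under assumptions: `run_once`, `STAMP_DIR` and the `/run/mco-ovs` path are hypothetical names invented here, and `initialize-ovs` is the subcommand proposed in this comment, not an existing one. Because `/run` is a tmpfs, the stamp survives an MCD restart but is cleared by a node reboot.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: run an initialization command at most once per boot.
# run_once and STAMP_DIR are illustrative names, not part of the MCD.
set -euo pipefail

# /run is a tmpfs, so stamps here vanish on reboot but survive pod restarts.
STAMP_DIR="${STAMP_DIR:-/run/mco-ovs}"

# run_once NAME CMD [ARGS...]: run CMD unless a stamp for NAME already exists.
run_once() {
  local name="$1"; shift
  local stamp="${STAMP_DIR}/${name}.done"
  if [[ -e "${stamp}" ]]; then
    echo "${name}: already initialized, skipping"
    return 0
  fi
  mkdir -p "${STAMP_DIR}"
  "$@"              # under `set -e`, a failure here aborts before the stamp is written
  touch "${stamp}"
}

# Possible use, assuming the proposed subcommand existed:
#   run_once ovs systemd-run --unit mco-ovs /run/bin/machine-config-daemon initialize-ovs
```

A stamp file is the bluntest option; checking the actual host state (for example, whether the bridge already exists) would be more robust if the configuration can be removed out of band.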
OK, after some real-time discussion, it sounds like we need this before kubelet, which does not fit neatly into the current architecture. Today we have:
- Code shipped as part of the OS that runs before and up to kubelet (podman, crio, kubelet)
- The special MCD code introduced in #1766 ("Use MCD binary from container in /run/bin"), but that only applies on the firstboot; after that the MCD pod takes over (i.e. it's after kubelet)
(We're also implicitly having a larger debate about how much "host management" should happen in a pod.)
One thing we could do is ship this code as part of a container image (actually, it could be the SDN container) and build upon the "host binary" trick we're already doing there, but instead of extracting it from the pod, have our systemd unit pull and extract it directly and run `Before=kubelet.service`. Or, another variant of this is to basically just `podman run --privileged -v /:/host` and have the container perform the same trick the MCD does and extract its binary to the host, where it can then be run via `systemd-run`.
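A rough sketch of what the second variant might look like, written as a shell helper that emits a oneshot unit. Everything named here is an assumption for illustration: `write_unit`, the unit name `host-ovs-setup.service`, and the image reference are hypothetical, and the container is assumed to copy its binary onto the host via the `/host` mount. Only `Before=kubelet.service` and the `podman run --privileged -v /:/host` invocation come from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: generate a unit that runs a privileged container
# before kubelet. Unit name and image are placeholders, not from this PR.
set -euo pipefail

# write_unit DIR IMAGE: write host-ovs-setup.service into DIR.
write_unit() {
  local dir="$1" image="$2"
  cat > "${dir}/host-ovs-setup.service" <<EOF
[Unit]
Description=Run host network setup from a container before kubelet (illustrative)
# Assumption: the image must be pulled, so wait for the network first.
Wants=network-online.target
After=network-online.target
Before=kubelet.service

[Service]
Type=oneshot
# Assumption: the container's entrypoint extracts its binary under /host.
ExecStart=/usr/bin/podman run --rm --privileged -v /:/host ${image}

[Install]
WantedBy=multi-user.target
EOF
}

# Possible use: write_unit /etc/systemd/system example.invalid/sdn-image:tag
```

`Before=` only orders the two units when both are part of the same transaction; it does not by itself make kubelet wait if this unit is never started.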
This is the problem with using a private JIRA instance to track work on public Free/Open Source software...
/retest
I think I'm going to take a different approach to this PR and try to configure all of the NICs before networking comes up.
adaeb21
to
55c97cf
Compare
/test e2e-ovn-step-registry
21fc932
to
39564eb
Compare
Is this PR intended to fix the ovn-step-registry job? Because it has a very spotty pass rate.
No, this PR is not intended to fix anything. If you have some links to other jobs where it is failing, @dcbw or I can take a look.
39564eb
to
41ffbca
Compare
/lgtm Other review comments can be handled later in a follow-up PR.
I have recommendations for simplifications but no blocking objections to the current state. (I.e., all of this could potentially be addressed in follow-ups.)
/retest
/lgtm
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: abhat, cgwalters, danwinship, trozet
@kikisdeliveryservice can you please remove the hold?
👍 /hold cancel
/retest Please review the full test history for this PR and help us cut down flakes.
10 similar comments
@kikisdeliveryservice is there any way we can override e2e-gcp-upgrade? This patch only affects the ovn-kubernetes path (not openshift-sdn). The e2e-gcp-upgrade job is failing due to a known bug in openshift-sdn: https://bugzilla.redhat.com/show_bug.cgi?id=1828858 The only job actually testing this patch is e2e-ovn-step-registry, which passed.
/retest Please review the full test history for this PR and help us cut down flakes.
As explained, freeing up some jobs: this can't affect e2e-gcp-upgrade (maybe some other future upgrade job will). EDIT: to further confirm, see https://prow.ci.openshift.org/pr-history/?org=openshift&repo=machine-config-operator&pr=1860 (this commit passed already). /override ci/prow/e2e-gcp-upgrade
@runcom: Overrode contexts on behalf of runcom: ci/prow/e2e-gcp-upgrade
@@ -5,7 +5,8 @@ contents: |
 Description=Update GCP routes for forwarded IPs.
 ConditionKernelCommandLine=|ignition.platform.id=gce
 ConditionKernelCommandLine=|ignition.platform.id=gcp
-After=network.target
+Wants=network-online.target
+After=network-online.target
Triple-checking on this service: @squeed, you wrote the original https://github.com/openshift/machine-config-operator/blob/release-4.5/templates/master/00-master/gcp/units/openshift-gcp-routes.service#L8 without the `Wants=`, but it seems Tim found an issue requiring it for GCP+OVN. Just triple-checking, as this service caused some weird issues in the past.
As a follow-up: looks like Casey is on PTO. Also talked to Stefan; we're not fully sure, but Tim pointed out this is needed anyway in OVN+GCP.
Yes, from what I found, the GCP routes service relies on network connectivity and on finding an interface to use. With these changes that interface will change sometime after network.target but before network-online.target, meaning there could be a race where GCP routes uses the wrong interface. I see GCP just passed with this patch on, so I think we are OK.
OVN-k8s needs to be able to move the primary NIC of the host into the
OVS bridge, making the physical NIC essentially a Layer 2 port. This
patch uses nmcli to configure OVS with the NIC on it, as well as to bring
up the Layer 3 OVS interface. The configuration can be auto-detected
based on the existing default gateway interface, or NM configuration
keyfiles may be provided during Ignition and placed on the host.
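
For illustration, the auto-detection path described above might look roughly like the following. This is a sketch under assumptions, not the `configure-ovs.sh` from this PR: the function names, the bridge name `br-ex` and the connection names are invented here, and the script only prints the `nmcli` commands (following the pattern documented for NetworkManager's OVS support) rather than running them.

```shell
#!/usr/bin/env bash
# Illustrative sketch only; not the configure-ovs.sh from this PR.
set -euo pipefail

# Print the device named in a default-route line read from stdin,
# e.g. the output of `ip route show default`.
default_iface() {
  awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Print (do not run) nmcli commands that would attach NIC $1 to OVS bridge $2.
# Bridge and connection names are assumptions made up for this sketch.
plan_ovs_bridge() {
  local nic="$1" bridge="${2:-br-ex}"
  cat <<EOF
nmcli conn add type ovs-bridge conn.interface ${bridge} con-name ${bridge}
nmcli conn add type ovs-port conn.interface ${nic} master ${bridge} con-name ovs-port-phys0
nmcli conn add type ethernet conn.interface ${nic} master ovs-port-phys0 con-name ovs-if-phys0
nmcli conn add type ovs-port conn.interface ${bridge} master ${bridge} con-name ovs-port-${bridge}
nmcli conn add type ovs-interface slave-type ovs-port conn.interface ${bridge} master ovs-port-${bridge} con-name ovs-if-${bridge} ipv4.method auto
EOF
}

# Possible use on a host:
#   plan_ovs_bridge "$(ip route show default | default_iface)"
```

The physical NIC becomes an `ethernet` connection enslaved to an OVS port (the Layer 2 side), while the `ovs-interface` connection on the bridge carries the IP configuration (the Layer 3 side).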