Decouple kubelet node-ip from the Antrea default Gateway uplink #2344
After thinking more about this, I suspect it may be related to the default Ethernet device being wrong? This is the vbox interface...
but the network for my kubelets is:

Using 0.0.0.0/0 as the interface... is that required, or should it be configurable? I've updated my virtual machines to make sure they use the right one. @antoninbas I see that you wrote the docs specifying that the interface is picked by looking up the 0.0.0.0/0 interface: https://github.com/antrea-io/antrea/blob/main/docs/design/windows-design.md. Is that 0.0.0.0/0 destination prefix explicitly required for the uplink, or is it just a heuristic?
(Note: I've modified this question since my original post, as I've played with it more to figure some things out.)
@lzhecheng any thoughts on this Windows IP / gateway configuration issue?
BTW, from a working cluster, just as a reference.
So my suggestion here is: we should support a way to override Antrea's automatic attempt to find the default gateway :) What if we check whether some configuration exists (an env var, as suggested by @jayunit100, or whatever) that defines "hey, this is your public interface, which is different from your Kubernetes node IP address because you have two network interfaces: one for management/control-plane comms and another for public traffic", and if that's defined, let Antrea find the right interface/gateway from it. Example:
If no override option exists, the IP 192.168.0.20 would be picked, which is wrong in the sense that this IP address does not have a default gateway. But if we allow it to be overridden, say with "PUBLICNET=10.0.10.0/24", the internal detection logic could conclude: "OK, so my default gateway is 10.0.10.1, which is the gateway of my interface 10.0.10.20/24, which sits inside my CIDR of 10.0.10.0/24." @jayunit100 wdyt?
Yeah, I agree, being able to override the default gateway is the thing that matters.
@jayunit100 @rikatz Thanks for catching the issue and giving suggestions. For Antrea on Windows, we always assume there is only one interface on the Node, configured with the Node IP (used by kubelet) both to communicate within the cluster and to access external addresses. We can then add that interface to the OVS bridge as the uplink, and we expect Pod-to-external traffic to leave the host from it. Requiring the default gateway avoids losing configuration when migrating the interface to the OVS bridge.

If the default gateway is configured on a different interface from the OVS uplink, I don't think Antrea can satisfy the requirement currently. First, the original routing entry via the default gateway should still exist on the Windows host after the OVS uplink is configured, because the configuration on that interface is never changed; if Antrea adds the routing configuration itself, it could break the host networking configuration (management traffic might be forwarded out the wrong interface). Second, Antrea uses the $NODE_IP (configured on kubelet) to perform SNAT on the OVS bridge for Pod-to-external traffic; after the packet leaves OVS, it will ultimately leave the Windows host from the "management" interface according to the routing configuration. But the reply packet, which enters the Windows host via the management interface, might not make it back to OVS for de-SNAT without additional configuration. As a result, the client Pod cannot receive the reply.

We might need more time to design support for the scenario where multiple interfaces exist on the Windows host and the interfaces for Pod-to-Pod and Pod-to-external traffic are different.
Thanks @wenyingd! So... can you clarify what you mean by "losing configurations while migrating the interface"?
To support Windows containers, OVS works as an extension of the HNS Network. Antrea uses a "Transparent" HNS Network and adds the physical interface to the OVS bridge as the uplink. During this operation, we remove the L3 configurations from the physical interface and add them to the OVS bridge (br-int). So there is a short window in which the existing L3 configurations are "lost" on the host, and they come back once they are re-configured on the OVS bridge.
Ah, OK! So you are saying it's totally possible to add this functionality, as long as we allow the L3 configuration to be down for a few moments? That is an acceptable compromise for our purposes... If so, can we make a branch of Antrea that exhibits this behavior? kubernetes-sigs/sig-windows-dev-tools#46 depends on this, and it will allow us to continue using Antrea as the default CNI for Windows development tooling for upstream Kubernetes :)
I think @jayunit100 @rikatz's requirement makes sense.

I don't think we can support using the management interface as the 'only' uplink, because we need the Node IP (used by kubelet) to set up the tunnel in Encap mode. But what about two uplink interfaces? I am not sure about it; let me have a try.
For OVS operations, they should be the same. The key point is how to forward the "public traffic" from the OVS management device to the public network device that is actually connected to the external world. If we can have two uplink interfaces, it should be easy to implement. So the precondition is still whether we can support multiple uplink interfaces on OVS.
Yes, we cannot assume a default route must exist. In that case, Antrea should support programming the defined routing entries on OVS using OpenFlow.
Great, thank you. I will test the patch when it's available!
After an offline discussion with @tnqn: it is valuable to support multiple interfaces on Windows and let the user decide which interface is used as the OVS uplink. In this case, no default gateway configuration is required on the uplink interface. One precondition is that the chosen interface must be the one used by Pod traffic to reach other Pods in the cluster and to reach external addresses, and Antrea will use it to set up the tunnel in Encap mode.

Besides, if the chosen interface is different from the one kubelet is using, there is a risk for ClusterIP/NodePort Services whose backend endpoint is a hostNetwork Pod, because the IP of such a Pod is not managed by OVS. I have run some experiments; it should work when a Pod accesses the Node IP, provided Antrea also performs SNAT on the Windows host (not only in OVS). As for NodePort Services: when Antrea uses kube-proxy to provide them, it should work; with AntreaProxy, additional work is needed for Antrea to redirect NodePort Service packets to OVS.

A code change is needed to let the user choose which interface is the uplink on the Windows host.
OK, to clarify:
Sorry if my questions are redundant :) Not an OVS expert :) Thank you so much for looking at this so quickly!
Correct, I mean 'must'.
In Encap mode, for Pod-to-Pod traffic (whether the target Pod is the direct destination or behind a Service), we encap the packet if the destination Pod is on a different Node. But we don't encap Pod-to-external traffic.
If there is no default gateway on the chosen interface and the packet destination is not known (which should mean external traffic), we plan to let the Windows host perform the SNAT. Some additional configuration on the Windows host is needed, and we will add it.
OK, thanks!
I mean adding special PowerShell commands.
Hm, I tried the binary from your branch here, https://storage.googleapis.com/jayunit100/antrea-agent-if-ps-fixes.exe, but it seems not to run for me. Maybe a Golang Windows cross-compilation issue on OS X? Do you have an agent.exe I can run?
Describe the bug
TL;DR: The Windows node logic for selecting the default gateway seems a little too narrowly specified, with the assumption that "the default gateway is on the same network as the kubelet node IP". Can we put some kind of knob in here?

Concretely: I want our Windows development environments to work on a private network, where I'm not guaranteed a default route attached to the node-ip (Vagrant doesn't make it easy to do this). However, antrea-agent mandates that a default route exists on the kubelet's node-ip.

The prepareHostNetwork function is called before setting an OVS bridge up. The idea here is that an uplink needs to be set up before we can make an OVS bridge. Ultimately the failure happens at this choke point in GetIPNetDeviceFromIP (in pkg/agent/util/net.go):

Can we make it so that Antrea is able to just use ANY default gateway we provide as input, rather than relying on the guess provided by

_, adapter, err := util.GetIPNetDeviceFromIP(i.nodeConfig.NodeIPAddr.IP)

?

Looking at https://docs.projectcalico.org/reference/felix/configuration, there is a DeviceRouteSourceAddress option which you can select. But in the code for Antrea, we seem to guess the route source address based on the node-ip of the kubelet.

For details, this is what you ultimately get in antrea-agent if the node-ip for the Windows kubelet has no default gateway interface...
To Reproduce
Install VirtualBox and then try the Windows development recipes:
git clone https://github.com/kubernetes-sigs/sig-windows-dev-tools/ ; make all
Expected
Antrea agent should give a clear indication that the interface is missing, as opposed to just dumping the query info.
Actual behavior
Antrea agent dumps the Get-NetRoute query output and just dies.
Versions: