
Port forwarding should be through iptables #15

Closed
jbeda opened this issue Jun 7, 2014 · 19 comments · Fixed by #25604 · May be fixed by codefresh-io/kubernetes#1
Assignees
Labels
area/docker priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. sig/network Categorizes an issue or PR as relevant to SIG Network.

Comments

@jbeda
Contributor

jbeda commented Jun 7, 2014

Since we've disabled Docker's management of iptables, we no longer get Docker setting up explicit port-forwarding rules. Instead we are falling back on the user-mode proxying built into Docker to do this port forwarding.

We should either build our networking mode into Docker proper or have the kubelet set up and maintain the iptables forwarding rules we need.
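If the kubelet were to own these rules, a single mapped port could in principle be handled by a DNAT rule pair along these lines. This is purely a sketch; the container IP 10.244.1.2, the port numbers, and the use of the built-in chains are illustrative assumptions, not what Docker or the kubelet actually installs:

```shell
# Forward traffic arriving on host port 8080 to the container's port 80.
# 10.244.1.2 is a hypothetical container IP on this node's bridge.
iptables -t nat -A PREROUTING -p tcp --dport 8080 \
  -j DNAT --to-destination 10.244.1.2:80

# Locally generated traffic bypasses PREROUTING, so cover the
# loopback case via the OUTPUT chain as well.
iptables -t nat -A OUTPUT -p tcp -d 127.0.0.1 --dport 8080 \
  -j DNAT --to-destination 10.244.1.2:80
```

Kernel-level DNAT like this avoids the per-connection byte copy through Docker's userspace proxy.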

@thockin
Member

thockin commented Jun 24, 2014

Why did we disable iptables?

@jbeda
Contributor Author

jbeda commented Jun 24, 2014

We needed to disable the blanket NAT for all traffic out of the bridge and NAT only for `! 10.0.0.0/8`. But the Docker flag to disable that also disables the NAT rules Docker sets up for incoming port maps, so Docker falls back on user-mode proxying of that traffic.

We either need to make Docker itself more flexible or start setting up the port forwarding in the kubelet. If we do it in the kubelet we could do something like:

  1. Set up the netns container for the pod
  2. Set up the port forwards
  3. Launch other containers for the pod

If we want to adapt Docker, it might be enough to split the iptables flag into a flag for configuring the iptables for the bridge in general vs. configuring iptables for the incoming port mapping.

@thockin
Member

thockin commented Jun 24, 2014

I feel like a moron for not understanding the problem after that explanation. Why do we need to disable NAT? I'm still digging into the impl of Kubelet, so maybe it will emerge.

Aren't you on vacation?


@brendandburns
Contributor

Joe is the expert on our config, but I think we needed to disable NAT to be able to assign our own IP addresses.

Brendan

@jbeda
Contributor Author

jbeda commented Jun 24, 2014

Quick clarification and then back to vacation :)

We want traffic between containers to use the container IPs directly across nodes. Say we have Node A with a container IP space of 10.244.1.0/24 and Node B with a container IP space of 10.244.2.0/24. And we have Container A1 at 10.244.1.1 and Container B1 at 10.244.2.1. We want Container A1 to talk to Container B1 directly, with no NAT: B1 should see a "source" address in the IP packets of 10.244.1.1, not the "primary" IP for Node A. That means we want to turn off NAT for traffic between containers (and also between VMs and containers).

But we don't support (yet?) the extra container IPs that we've provisioned talking to the internet directly. We don't map external IPs to the container IPs. So we solve that problem by having traffic that isn't to the internal network (! 10.0.0.0/8) get NATed through the primary host IP address so that it can get 1:1 NATed by the GCE networking when talking to the internet. Similarly, incoming traffic from the internet has to get NATed/proxied through the host IP.

So we end up with 3 cases:

  1. Container -> Container or Container <-> VM. These should use the 10.x addresses directly, with no NAT.
  2. Container -> Internet. These have to get mapped to the primary host IP so that GCE knows how to egress that traffic. There are actually two layers of NAT here: Container IP -> Internal Host IP -> External Host IP. The first layer happens in the guest with iptables and the second happens as part of GCE networking. The first one (Container IP -> Internal Host IP) does dynamic port allocation, while the second maps ports 1:1.
  3. Internet -> Container. This also has to go through the primary host IP and, ideally, also has two layers of NAT. However, the current path is a proxy: (External Host IP -> Internal Host IP -> Docker) -> (Docker -> Container IP). Once this bug is closed, it should be External Host IP -> Internal Host IP -> Container IP. But to get that second arrow we have to set up the port-forwarding iptables rules per mapped port.
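The three cases above map roughly onto iptables NAT rules like these. This is an illustrative sketch only; the 10.244.1.0/24 bridge range, the container IP, and the port numbers are assumptions for the example:

```shell
# Case 1: container <-> container / VM traffic inside 10.0.0.0/8 is left
# alone -- no rule needed, it simply isn't matched by the rule below.

# Case 2: container -> internet. Masquerade anything leaving this node's
# bridge range that is NOT bound for the internal 10.0.0.0/8 network.
iptables -t nat -A POSTROUTING -s 10.244.1.0/24 ! -d 10.0.0.0/8 \
  -j MASQUERADE

# Case 3: internet -> container. Per mapped port, DNAT from the host
# port to the container IP instead of bouncing through a userspace proxy.
iptables -t nat -A PREROUTING -p tcp --dport 80 \
  -j DNAT --to-destination 10.244.1.2:80
```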

If this is still confusing let me know and I can find some time for a hangout to dive into the details.

@bgrant0607
Member

Docker issues that may be relevant to the iptables issue:
moby/moby#6002
moby/moby#6034

ncdc referenced this issue in ncdc/kubernetes Aug 1, 2014
Introduce pluggable strategies for build types
@thockin
Member

thockin commented Aug 18, 2014

I think this is fixed, or at least soon. With my PR for true IP-per-pod, I am pretty sure we circumvent Docker's proxying entirely.

@bgrant0607 bgrant0607 added this to the v0.5 milestone Oct 4, 2014
@thockin thockin closed this as completed Oct 5, 2014
@jbeda
Contributor Author

jbeda commented Oct 6, 2014

Tim -- can you confirm that we use iptables when you specify a hostPort in a pod manifest? Docker was setting that up. Did we find a way to have Docker do it? Or are we doing it in the kubelet now?

@thockin
Member

thockin commented Oct 6, 2014

Ah, right, HostPorts. Yes, in the case of HostPorts we do still get proxied through Docker, it seems.
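For context, Docker's userspace fallback runs a docker-proxy process per published port, copying bytes through user space, whereas iptables-based forwarding is a kernel-level DNAT with no extra process. One way to check which path a host is using (illustrative commands, run as root):

```shell
# If Docker fell back to userspace proxying, a docker-proxy process
# is listening on the published host port:
ps aux | grep docker-proxy

# With iptables-based forwarding there is instead a DNAT rule in the
# nat table and no proxy process:
iptables -t nat -L -n | grep DNAT
```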

@thockin thockin reopened this Oct 6, 2014
@brendandburns brendandburns modified the milestones: v1.0, v0.5 Oct 21, 2014
@brendandburns
Contributor

I'm going to move this out of the 0.5 release, as I think that this is a P2 and thus not a blocker.

@jbeda
Contributor Author

jbeda commented Oct 21, 2014

Your call -- but this means that any hostPort traffic will go through the docker process. It'll be slow and might restrict some types of serving workloads. As we get a better external access story this may or may not become more important.

@bgrant0607 bgrant0607 removed this from the v1.0 milestone Dec 3, 2014
@bgrant0607 bgrant0607 added the priority/awaiting-more-evidence Lowest priority. Possibly useful, but not yet enough support to actually get it done. label Dec 3, 2014
@errordeveloper
Member

errordeveloper commented Dec 8, 2014

@slintes suggests that it works without the --iptables=false --ip-masq=false flags now (weave/issues#260 (comment)).

@bluehallu

@errordeveloper Your link is broken, correct link is weaveworks/weave#260 (comment)
