Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

change source of ipmasq rule from ipn to ip #279

Merged
merged 2 commits into from
May 10, 2019

Conversation

mars1024
Copy link
Member

@mars1024 mars1024 commented Mar 19, 2019

ptp plugin : change source of ipmasq rule from ipn to ip

The old ipmasq rule's source is the whole subnet, this will cause all pods on the host will always hit the first masq chain. I think this may be confused.

The POSTROUTING chain of table nat is as follows,

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 2216  142K KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
 1867  118K CNI-3115be0572d3543649291916  all  --  *      *       172.17.0.64/26       0.0.0.0/0            /* name: "test" id: "a09b2aa6ab03415ee44b4065045437250d03939597514e608701bfd9045fce4c" */
    0     0 CNI-832daee03b4f87f14a3870e7  all  --  *      *       172.17.0.64/26       0.0.0.0/0            /* name: "test" id: "10776f6e4370471836304f7d97c5e4156fe7185f1d252a7a6101a6c96a5d9222" */
    0     0 CNI-724649eec0f592edcec021e8  all  --  *      *       172.17.0.64/26       0.0.0.0/0            /* name: "test" id: "e6ed6b3fd7d4ce69c68d59c5946a8859b794d65606a3dc39dc9d10957c9363ac" */
    0     0 CNI-ce5582b709da19001144c36c  all  --  *      *       172.17.0.64/26       0.0.0.0/0            /* name: "test" id: "3ab516ecae85ca5ca275f9a6b40d613571a7914566410f3febf3661b545baf1e" */

We can see that only the first masq chain is hit, and the other chains have never been hit so their pkts and bytes are 0. So I change the source of ipmasq rule from subnet to ip to make each pod to go through its own masq CHAIN.

@mars1024
Copy link
Member Author

@rosenhouse @squeed @dcbw PTAL, thx ~

@mars1024
Copy link
Member Author

ping @squeed @rosenhouse @dcbw , thx ~

@mars1024
Copy link
Member Author

Sorry to disturb, if you have time, please take a look and help reviews, thanks! @squeed @rosenhouse @dcbw

@squeed
Copy link
Member

squeed commented Apr 24, 2019

oh neat, sorry for missing this
lgtm

@bboreham
Copy link
Contributor

Can you post the equivalent iptables chain after your change? I'm having trouble understanding the exact nature of the code change.

@squeed
Copy link
Member

squeed commented Apr 24, 2019

Hey, I tried this and it errored out:

$ sudo CNI_PATH=./bin ../cni/bin/cnitool del myptp /var/run/netns/testing 
running [/sbin/iptables -t nat -X CNI-99ed75a8861adf76cad607a0 --wait]: exit status 1: iptables: Too many links.

@mars1024 mars1024 closed this Apr 25, 2019
@mars1024 mars1024 reopened this Apr 25, 2019
@mars1024
Copy link
Member Author

Can you post the equivalent iptables chain after your change? I'm having trouble understanding the exact nature of the code change.

Sure, here is an example,

Chain POSTROUTING (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination         
 2216  142K KUBE-POSTROUTING  all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* kubernetes postrouting rules */
 1867  118K CNI-3115be0572d3543649291916  all  --  *      *       172.17.0.65       0.0.0.0/0            /* name: "test" id: "a09b2aa6ab03415ee44b4065045437250d03939597514e608701bfd9045fce4c" */
 123   456K CNI-832daee03b4f87f14a3870e7  all  --  *      *       172.17.0.66       0.0.0.0/0            /* name: "test" id: "10776f6e4370471836304f7d97c5e4156fe7185f1d252a7a6101a6c96a5d9222" */
 234   345K CNI-724649eec0f592edcec021e8  all  --  *      *       172.17.0.67       0.0.0.0/0            /* name: "test" id: "e6ed6b3fd7d4ce69c68d59c5946a8859b794d65606a3dc39dc9d10957c9363ac" */
 10     20K CNI-ce5582b709da19001144c36c  all  --  *      *       172.17.0.68       0.0.0.0/0            /* name: "test" id: "3ab516ecae85ca5ca275f9a6b40d613571a7914566410f3febf3661b545baf1e" */

Just as above, you can see the ip-masquerade rule of each pod will use its own IP as source but not the whole subnet, this will make network packets from echo pod will only go through its own rule and chain but not the first ip-masquerade rule in POSTROUTING chain.
Based on this, we can see the accurate bytes and packets which need to be masqueraded from each pod.
Moreover, if subnet changes, we can also make sure that the origin rules have no conflict with new ones.

@mars1024
Copy link
Member Author

Hey, I tried this and it errored out:

$ sudo CNI_PATH=./bin ../cni/bin/cnitool del myptp /var/run/netns/testing 
running [/sbin/iptables -t nat -X CNI-99ed75a8861adf76cad607a0 --wait]: exit status 1: iptables: Too many links.

I think this maybe because you add with old ptp plugin (-s subnet) and delete with new ptp plugin (-s ip). In this case, the masquerade chain can not be deleted because the rule of POSTROUTING is still using it.
I'll handle the downward compatibility.

@mars1024 mars1024 force-pushed the bugfix/ipmasq_source branch from c1ebd27 to 7efec9e Compare April 25, 2019 07:59
@mars1024
Copy link
Member Author

Updated! PTAL, thanks ~ @squeed @bboreham

@mars1024
Copy link
Member Author

I think CI has been broken by a bug on go-iptables, the IsNotExist function doesn't seem to be taking ip6tables into account.
@squeed If you think it's indeed a bug, i'll pull another request on go-iptables to fix it before approving this.

@squeed
Copy link
Member

squeed commented Apr 26, 2019

Oh, interesting. Could you file a PR (and test) for go-iptables, and I can merge it and do a release (I'm a maintainer there, too).

@mars1024
Copy link
Member Author

Oh, interesting. Could you file a PR (and test) for go-iptables, and I can merge it and do a release (I'm a maintainer there, too).

Here is the PR for fixing CI broken, looking forward to your comments and reviews ~ @squeed

@mars1024
Copy link
Member Author

It seems to work, thanks for your help! @squeed

@bboreham
Copy link
Contributor

I'm still not getting how this change works. I see the code changes from printing ipn.String() to ipn.IP.String(), and in your example the output changes from 172.17.0.64/26 to 172.17.0.66.

I tried to recreate this, and it doesn't do what your example shows: https://play.golang.org/p/zm1Ko5PVIjX

What am I missing?

@mars1024
Copy link
Member Author

mars1024 commented Apr 27, 2019

I'm still not getting how this change works. I see the code changes from printing ipn.String() to ipn.IP.String(), and in your example the output changes from 172.17.0.64/26 to 172.17.0.66.

I tried to recreate this, and it doesn't do what your example shows: https://play.golang.org/p/zm1Ko5PVIjX

What am I missing?

Actually, 172.17.0.66/26 is not very accurate as a CIDR, it's a special struct to store both IP and its mask. And iptables will figure out the true CIDR from it, for example, if we call iptables with 172.17.0.66/26, iptables will recognize it as 172.17.0.64/26.

Here is my test,

$sudo iptables -t filter -A FORWARD -s 172.17.0.66/26 -j ACCEPT

$sudo iptables -t filter -nvL FORWARD
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      *       172.17.0.64/26       0.0.0.0/0

$sudo iptables -t filter -A FORWARD -s 172.17.0.66 -j ACCEPT

$sudo iptables -t filter -nvL FORWARD
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
 pkts bytes target     prot opt in     out     source               destination
    0     0 ACCEPT     all  --  *      *       172.17.0.64/26       0.0.0.0/0
    0     0 ACCEPT     all  --  *      *       172.17.0.66          0.0.0.0/0

@bboreham
Copy link
Contributor

Oh, the thing I was missing is that iptables -L doesn't print out in the same way the rules go in. Thanks.

I generally use iptables-save which doesn't have this issue.

@mars1024
Copy link
Member Author

Yes, iptables-save only show the raw input, but iptables -L can show the rules which is truly working.

Copy link
Contributor

@bboreham bboreham left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ltgm

@mars1024
Copy link
Member Author

mars1024 commented May 3, 2019

hi, @squeed , can you help to merge this?

@mars1024
Copy link
Member Author

mars1024 commented May 9, 2019

I think this PR has been approved for a while, is there something else to do before merging this? Can you do some help for me? @bboreham @squeed

@squeed
Copy link
Member

squeed commented May 10, 2019

lgtm

@squeed squeed merged commit 0950a36 into containernetworking:master May 10, 2019
@mars1024 mars1024 deleted the bugfix/ipmasq_source branch May 10, 2019 13:06
@nickethier
Copy link
Contributor

Hey @mars1024 this PR broke the bridge plugin for me. It looks like the bridge plugin wraps the ipnet in a call to ip.Network which masks out the IP

if err = ip.SetupIPMasq(ip.Network(&ipc.Address), chain, comment); err != nil {

This results in the POSTROUTING rule to jump to the CNI-* chain to always use the masked IP as the source. So for example, if I'm given 172.16.0.2/24 as my container IP, I will end up with a rule like such: -A POSTROUTING -s 172.16.0.0/32 -m comment ... -j CNI-abc123.

@nickethier
Copy link
Contributor

I created a ginkgo spec to reproduce it here: nickethier@7f9b184

The failure looks like: https://dl.nick.sh/screenshots/2019-05-24-23-25-56_1858x375.png

I was able to get that test to pass with the this diff. It doesn't break any of the other tests and works for me when I build and use it. I'm not sure of the reasoning though of why the original author added it so I'm hesitate to call it a fix 😅

diff --git a/plugins/main/bridge/bridge.go b/plugins/main/bridge/bridge.go
index 675ffdd..fb78a5f 100644
--- a/plugins/main/bridge/bridge.go
+++ b/plugins/main/bridge/bridge.go
@@ -517,7 +517,7 @@ func cmdAdd(args *skel.CmdArgs) error {
 			chain := utils.FormatChainName(n.Name, args.ContainerID)
 			comment := utils.FormatComment(n.Name, args.ContainerID)
 			for _, ipc := range result.IPs {
-				if err = ip.SetupIPMasq(ip.Network(&ipc.Address), chain, comment); err != nil {
+				if err = ip.SetupIPMasq(&ipc.Address, chain, comment); err != nil {
 					return err
 				}
 			}

@mars1024
Copy link
Member Author

Thanks for your comments, @nickethier , I've done a local test as below

package main

import (
	"fmt"
	"net"
)

// Network masks off the host portion of the IP
func Network(ipn *net.IPNet) *net.IPNet {
	return &net.IPNet{
		IP:   ipn.IP.Mask(ipn.Mask),
		Mask: ipn.Mask,
	}
}

func main() {
	a := &net.IPNet{
		IP:   []byte{192,168,0,100},
		Mask: []byte{255,255,255,0},
	}
	fmt.Println(a.IP.String(), a.String())
	b := Network(a)
	fmt.Println(b.IP.String(), b.String())
}

The results is

192.168.0.100 192.168.0.100/24
192.168.0.0 192.168.0.0/24

So I think the function Network is used to make a canonical CIDR for iptables, but it lose some information about the origin IP in IPNet at the same time. I think the reason for designing this function is the worry of iptables can only handle canonical CIDR like 192.168.0.0/24, but the truth is iptables can handle CIDR like 192.168.0.100/24 as I mentioned above #279 (comment) .

So I think your fix is right, just pull it ~

tlwr added a commit to tlwr/cni-plugins that referenced this pull request Dec 31, 2020
Closes containernetworking#544

The above issue describes a situation where using the bridge plugin
with IPv6 addresses prevented `DEL` from working correctly.

`DEL` seems to be failing in the body of `TeardownIPMasq`

This arises because:

* twice delete postrouting rules: `ipn.String()` `ipn.IP.String()` containernetworking#279
* we are using a version of go-iptables which is bugged for v6

PR github.com/coreos/go-iptables/pull/74 describes why this does
not work. The error message is not being checked correctly.

Using a later version of go-iptables means that
* when the second `ipt.Delete` fails (this is okay)
* we will correctly interpret this as an non-fatal error
* `TeardownIPMasq` will not prematurely exit the method
* `ipt.ClearChain` now can run
* `ipt.DeleteChain` now can run

This explains why this was working for v4 but not v6

Signed-off-by: toby lorne <toby@toby.codes>
tlwr added a commit to tlwr/cni-plugins that referenced this pull request Jan 5, 2021
Closes containernetworking#544

The above issue describes a situation where using the bridge plugin
with IPv6 addresses prevented `DEL` from working correctly.

`DEL` seems to be failing in the body of `TeardownIPMasq`

This arises because:

* twice delete postrouting rules: `ipn.String()` `ipn.IP.String()` containernetworking#279
* we are using a version of go-iptables which is bugged for v6

PR github.com/coreos/go-iptables/pull/74 describes why this does
not work. The error message is not being checked correctly.

Using a later version of go-iptables means that
* when the second `ipt.Delete` fails (this is okay)
* we will correctly interpret this as an non-fatal error
* `TeardownIPMasq` will not prematurely exit the method
* `ipt.ClearChain` now can run
* `ipt.DeleteChain` now can run

This explains why this was working for v4 but not v6

This commit was amended to include v0.5.0 instead of a pseudo-version
v0.4.6-0.20200318170312-12696f5c9108

Signed-off-by: toby lorne <toby@toby.codes>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants