Skip to content
This repository has been archived by the owner on Jun 20, 2024. It is now read-only.

Fix incorrect xfrm policy being deleted by ipsec.Destroy(). #3669

Merged
merged 1 commit into from
Sep 2, 2019
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 13 additions & 3 deletions net/ipsec/ipsec.go
Original file line number Diff line number Diff line change
Expand Up @@ -223,9 +223,19 @@ func (ipsec *IPSec) Destroy(localPeer, remotePeer mesh.PeerName, connUID uint64,

if outSPIInfo, ok := ipsec.spiInfo[outSPIID]; ok {
ipsec.log.Infof("ipsec: destroy: out %s -> %s 0x%x", localIP, remoteIP, outSPIInfo.spi)

if err := netlink.XfrmPolicyDel(xfrmPolicy(localIP, remoteIP, outSPIInfo.spi)); err != nil {
ipsec.log.Warnf("ipsec: xfrm policy del (%s, %s, 0x%x) failed: %s", localIP, remoteIP, outSPIInfo.spi, err)
policy, err := netlink.XfrmPolicyGet(xfrmPolicy(localIP, remoteIP, outSPIInfo.spi))
if err != nil {
ipsec.log.Warnf("ipsec: xfrm policy get (%s, %s, 0x%x) failed: %s", localIP, remoteIP, outSPIInfo.spi, err)
} else {
if len(policy.Tmpls) == 1 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be right to loop through the policy templates and pick the matching one?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was unsure about that. It looks like weave only creates policies with one template but a range would look a little neater.

Copy link
Contributor

@murali-reddy murali-reddy Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see comment #3666 (comment) From my understanding of the logs shared in #3666 it ended up creating two connections and creating corresponding two policies. Eventually one connections is deleted (connection shutting down due to error: Multiple connections to 56:ff:d7:91:ac:16(192.168.128.10) added to 02:df:bc:77:e3:17(192.168.128.11)) which triggered deleting xfrm policy which as side effect was deleting the unintended policy as well.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see weave does a policy update instead of creating a new policy so there are never two policies for a given src/dst pair. Trying to create two such policies on the terminal fails for me:

[rancher@hostyname0 ~]$ sudo ip xfrm policy list
src 192.168.128.10/32 dst 192.168.128.11/32 proto udp 
        dir out priority 0 ptype main 
        mark 0x20000/0x20000
        tmpl src 192.168.128.10 dst 192.168.128.11
                proto esp spi 0xd3e754be reqid 0 mode transport
[rancher@hostyname0 ~]$ sudo ip xfrm policy add src 192.168.128.10/32 dst 192.168.128.11/32 proto udp dir out mark 0x20000         ptype main         tmpl src 192.168.128.10 dst 192.168.128.11         proto esp spi 0x1234 reqid 0 mode transport
RTNETLINK answers: File exists
[rancher@hostyname0 ~]$ sudo ip xfrm policy update src 192.168.128.10/32 dst 192.168.128.11/32 proto udp dir out mark 0x20000         ptype main         tmpl src 192.168.128.10 dst 192.168.128.11         proto esp spi 0x1234 reqid 0 mode transport
[rancher@hostyname0 ~]$ sudo ip xfrm policy list
src 192.168.128.10/32 dst 192.168.128.11/32 proto udp 
        dir out priority 0 ptype main 
        mark 0x20000/0xffffffff
        tmpl src 192.168.128.10 dst 192.168.128.11
                proto esp spi 0x00001234 reqid 0 mode transport

As far as I can tell XfrmPolicyDel and XfrmPolicyGet can only delete/get a single policy at a time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added some debug statements to make sure of this.

	// Destroy outbound
	policies, err := netlink.XfrmPolicyList(syscall.AF_INET)
	if err != nil {
		return errors.Wrap(err, "xfrm policy list")
	}
	for _, p := range policies {
		ipsec.log.Warnf("ipsec: %#v", p)
	}

	if outSPIInfo, ok := ipsec.spiInfo[outSPIID]; ok {
		ipsec.log.Infof("ipsec: destroy: out %s -> %s 0x%x", localIP, remoteIP, outSPIInfo.spi)
		policy, err := netlink.XfrmPolicyGet(xfrmPolicy(localIP, remoteIP, outSPIInfo.spi))
		if err != nil {
			ipsec.log.Warnf("ipsec: xfrm policy get (%s, %s, 0x%x) failed: %s", localIP, remoteIP, outSPIInfo.spi, err)
		} else {
			if len(policy.Tmpls) == 1 {
				if policy.Tmpls[0].Spi == int(outSPIInfo.spi) {
					if err := netlink.XfrmPolicyDel(xfrmPolicy(localIP, remoteIP, outSPIInfo.spi)); err != nil {
						ipsec.log.Warnf("ipsec: xfrm policy del (%s, %s, 0x%x) failed: %s", localIP, remoteIP, outSPIInfo.spi, err)
					}
				} else {
					ipsec.log.Debugf("ipsec: xfrm not my policy (%s, %s, 0x%x) got 0x%x ", localIP, remoteIP, outSPIInfo.spi, policy.Tmpls[0].Spi)
				}
			}
		}

INFO: 2019/07/30 15:36:25.657140 Setting up IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/30 15:36:25.657207 ipsec: InitSALocal: 192.168.128.10 -> 192.168.128.11 :6784 0x50c4c7a3
[192.168.128.10:6783|56:ff:d7:91:ac:16(192.168.128.10)]: connection shutting down due to error: Multiple connections to 56:ff:d7:91:ac:16(192.168.128.10) added to 02:df:bc:77:e3:17(192.168.128.11)
INFO: 2019/07/30 15:36:25.668534 Destroying IPsec between 02:df:bc:77:e3:17(192.168.128.11) and 56:ff:d7:91:ac:16(192.168.128.10)
INFO: 2019/07/30 15:36:25.684302 fastdp ->[192.168.128.10:6784|56:ff:d7:91:ac:16(192.168.128.10)]: IPSec init SA remote
INFO: 2019/07/30 15:36:25.684318 ipsec: InitSARemote: 192.168.128.11 -> 192.168.128.10 :6784 0x2bc7201
INFO: 2019/07/30 15:36:25.684446 ipsec: destroy: in 192.168.128.10 -> 192.168.128.11 0x189a56b9
WARN: 2019/07/30 15:36:25.693006 ipsec: netlink.XfrmPolicy{Dst:(*net.IPNet)(0xc4217e5e90), Src:(*net.IPNet)(0xc4217e5ec0), Proto:0x11, DstPort:0, SrcPort:0, Dir:0x1, Priority:0, Index:14825, Mark:(*netlink.XfrmMark)(0xc4219e9878), Tmpls:[]netlink.XfrmPolicyTmpl{netlink.XfrmPolicyTmpl{Dst:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xc0, 0xa8, 0x80, 0xa}, Src:net.IP{0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0xff, 0xff, 0xc0, 0xa8, 0x80, 0xb}, Proto:0x32, Mode:0x0, Spi:45904385, Reqid:0}}}
INFO: 2019/07/30 15:36:25.693033 ipsec: destroy: out 192.168.128.11 -> 192.168.128.10 0x4a6e24ae
DEBU: 2019/07/30 15:36:25.693074 ipsec: xfrm not my policy (192.168.128.11, 192.168.128.10, 0x4a6e24ae) got 0x2bc7201 

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can see weave does a policy update instead of creating a new policy so there are never two policies for a given src/dst pair.

Ok. I checked the code. You are right. Its just update.

From what I see root cause is connection shutting down due to error: Multiple connections to 56:ff:d7:91:ac:16(192.168.128.10) added to 02:df:bc:77:e3:17(192.168.128.11) so there are two connections gets established between peers. Resulting in two xfrm policies attempted to created/updated. Eventually one version of policy (corresponding to a connection) overwrites the other. When redundant connection is shutdown corresponding xfrm policy is getting destroyed.

It seems to me its still possible we may end up with no xfrm policy. This can happen if we have xfrm policy in place corresponds to connection that is getting shutdown, in which case policy gets deleted due to matching spi.

Will check in case of multiple connection which of the connection (or order) is dropped.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like XfrmPolicyUpdate calls xfrmPolicyAddOrUpdate internally so I doubt we could end up without a policy. I'm pretty sure the case where the first connection is completely shut down when the second connection is created was the working case before my changes.

if policy.Tmpls[0].Spi == int(outSPIInfo.spi) {
if err := netlink.XfrmPolicyDel(xfrmPolicy(localIP, remoteIP, outSPIInfo.spi)); err != nil {
ipsec.log.Warnf("ipsec: xfrm policy del (%s, %s, 0x%x) failed: %s", localIP, remoteIP, outSPIInfo.spi, err)
}
} else {
ipsec.log.Debugf("ipsec: xfrm not my policy (%s, %s, 0x%x) got 0x%x ", localIP, remoteIP, outSPIInfo.spi, policy.Tmpls[0].Spi)
}
}
}

outSA := &netlink.XfrmState{
Expand Down