Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(network_policy): mask mark reset on FW marks #992

Merged
merged 1 commit into from
Sep 25, 2020

Conversation

aauren
Copy link
Collaborator

@aauren aauren commented Sep 25, 2020

TL;DR;

This PR changes the way we manage marks in the NetworkPolicyController by applying a mask so that we only reset our FW mark as part of the FW chain and leave the other marks that may exist on the packet alone.

Background

Ever since kube-router v1.0.0-rc5 we've been resetting all marks on the packet whenever the pod has a network policy applied to it as part of the new MARK and RETURN flow which allows traffic to traverse all relevant network policies instead of being ACCEPT'ed after hitting the first match rule.

However, there are other systems in kube-router and in the CNI subsystems which also apply marks. When the packet is traversing a firewall chain that is applied by the NPC, it goes through like this (using FORWARD as an example here):

  1. FORWARD -> KUBE-ROUTER-FORWARD
  2. KUBE-ROUTER-FORWARD -> if a pod has a network policy applied to it -> KUBE-POD-FW-<hash>
  3. KUBE-POD-FW-<hash> -> KUBE-NWPLCY-<hash> (there will be one of these for each individual policy applied)
  4. If the pod matches a rule it will be MARKed with 0x10000/0x10000 and then RETURNed to the parent KUBE-POD-FW-<hash> chain
    4.5) If it doesn't match a rule, it will finish traversing the chain and will not receive a MARK and eventually end up back in the parent KUBE-POD-FW-<hash> chain
  5. If the packet doesn't have the mark 0x10000/0x10000 then it will be logged and rejected
  6. If the packet does have the mark, then it will clear the mark with 0x0 and the packet will be returned to the parent KUBE-ROUTER-FORWARD chain so that it can finish processing and eventually be accepted if it makes it all the way through the KUBE-ROUTER-FORWARD chain.

Problem

The problem with this flow is how the mark is cleared. When it clears the mark with 0x0 at the end of the KUBE-POD-FW-<hash> chain, it not only clears the mark it applied, but since it doesn't have a mask on it, it clears all marks which has the potential to disrupt other elements of iptables that have applied a mark (like kube-router's DSR functionality, hostPort CNI nat markings, etc).

This change adds a mask to the mark so that we only unset the specific mark that we set in the FW chain and leave the other marks that may or may not exist on the packet intact.

Testing Procedure

I've tested this by watching the counters on a test host by mocking up some iptables chains:

This was done against a test VM that had no other rules on it:

# reproduce current kube-router state
$ iptables -N TEST_MARK_POLICY
iptables -A TEST_MARK_POLICY -m comment --comment "add hostport mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x2000/0x2000
iptables -A TEST_MARK_POLICY -m comment --comment "add kube-router FW mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x10000/0x10000
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without kube-router FW mark" -m mark ! --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "clear mark the way kube-router currently clears the mark" -j MARK --set-xmark 0x0/0xffffffff
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets with kube-router FW mark (to make sure mark no longer exists)" -m mark --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark (to make sure mark still exists)" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -j ACCEPT
iptables -I OUTPUT -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j TEST_MARK_POLICY

$ curl "10.10.2.102:9090"
{}

$ iptables -nvL TEST_MARK_POLICY
Chain TEST_MARK_POLICY (1 references)
 pkts bytes target     prot opt in     out     source               destination
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add hostport mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x2000
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add kube-router FW mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without kube-router FW mark */ mark match ! 0x10000/0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark */ mark match ! 0x2000/0x2000
    6   416 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* clear mark the way kube-router currently clears the mark */ MARK and 0x0
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets with kube-router FW mark (to make sure mark no longer exists) */ mark match 0x10000/0x10000
    6   416 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark (to make sure mark still exists) */ mark match ! 0x2000/0x2000
    0     0 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0

Above we can see that we:

  1. Applied both a 0x2000 mark representing the current CNI hostPort marking as well as the 0x10000 mark representing the kube-router FW mark
  2. Ensure that the packet matched both of the marks otherwise return it
  3. Clear the mark using the current kube-router logic 0x0
  4. Then return the traffic if it still matches the kube-router FW mark 0x10000
  5. Then return the traffic if it doesn't match the hostPort mark 0x2000
  6. Add a final catch all ACCEPT at the end

From the packet counters we can see that:

  • the traffic goes through and receives both marks
  • both 0x10000 and 0x2000 were successfully applied and matched (as it the packet doesn't get returned)
  • we can see that the mark was successfully cleared
  • that the mark no longer matches 0x10000
  • that the mark also no longer matches 0x2000 and is returned instead of passing through the end of the chain to ACCEPT

If we then change this to use the logic from this PR we do the same thing only this time we add a mask to the mark:

# Attempt to mask mark reset with new rules
$ iptables -F TEST_MARK_POLICY
iptables -A TEST_MARK_POLICY -m comment --comment "add hostport mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x2000/0x2000
iptables -A TEST_MARK_POLICY -m comment --comment "add kube-router FW mark for traffic going to host on port 9090" -d 10.10.2.102/32 -m tcp -p tcp --dport 9090 -j MARK --set-xmark 0x10000/0x10000
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without kube-router FW mark" -m mark ! --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "clear mark the way kube-router should clear the mark" -j MARK --set-xmark 0/0x10000
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets with kube-router FW mark (to make sure mark no longer exists)" -m mark --mark 0x10000/0x10000 -j RETURN
iptables -A TEST_MARK_POLICY -m comment --comment "rule to return packets without hostport mark (to make sure mark still exists)" -m mark ! --mark 0x2000/0x2000 -j RETURN
iptables -A TEST_MARK_POLICY -j ACCEPT

$ curl "10.10.2.102:9090"
{}

$ iptables -nvL TEST_MARK_POLICY
Chain TEST_MARK_POLICY (1 references)
 pkts bytes target     prot opt in     out     source               destination
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add hostport mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x2000
    6   416 MARK       tcp  --  *      *       0.0.0.0/0            10.10.2.102          /* add kube-router FW mark for traffic going to host on port 9090 */ tcp dpt:9090 MARK or 0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without kube-router FW mark */ mark match ! 0x10000/0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark */ mark match ! 0x2000/0x2000
    6   416 MARK       all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* clear mark the way kube-router should clear the mark */ MARK and 0xfffeffff
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets with kube-router FW mark (to make sure mark no longer exists) */ mark match 0x10000/0x10000
    0     0 RETURN     all  --  *      *       0.0.0.0/0            0.0.0.0/0            /* rule to return packets without hostport mark (to make sure mark still exists) */ mark match ! 0x2000/0x2000
    6   416 ACCEPT     all  --  *      *       0.0.0.0/0            0.0.0.0/0

From the packet counters we can see that:

  • the traffic goes through and receives both marks
  • both 0x10000 and 0x2000 were successfully applied and matched (as it the packet doesn't get returned)
  • we can see that the mark was successfully cleared
  • that the mark no longer matches 0x10000
  • that the mark now matches 0x2000 and is not returned
  • now the packet passes through the end of the chain to ACCEPT

kube-router testing

I've also tested this against our kube-router cluster and ensured that hostPort and DSR work correctly after effecting the change.

Don't resent all marks, only the mark that we originally set as part of
the firewall rules so that we don't affect other systems like hostPort
and other elements of the nat chain that may apply their own marks.
@murali-reddy
Copy link
Member

@aauren thanks for the PR and explanation. lgtm.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants