fix(network_policy): mask mark reset on FW marks #992
TL;DR;
This PR changes the way we manage marks in the NetworkPolicyController by applying a mask so that we only reset our FW mark as part of the FW chain and leave the other marks that may exist on the packet alone.
Background
Ever since kube-router v1.0.0-rc5 we've been resetting all marks on the packet whenever the pod has a network policy applied to it as part of the new MARK and RETURN flow which allows traffic to traverse all relevant network policies instead of being ACCEPT'ed after hitting the first match rule.
However, there are other systems in kube-router and in the CNI subsystems which also apply marks. When the packet is traversing a firewall chain that is applied by the NPC, it goes through like this (using FORWARD as an example here):
1) `FORWARD` -> `KUBE-ROUTER-FORWARD`
2) `KUBE-ROUTER-FORWARD` -> if a pod has a network policy applied to it -> `KUBE-POD-FW-<hash>`
3) `KUBE-POD-FW-<hash>` -> `KUBE-NWPLCY-<hash>` (there will be one of these for each individual policy applied)
4) If the traffic matches a rule in a policy chain, it is `MARK`ed with `0x10000/0x10000` and then `RETURN`ed to the parent `KUBE-POD-FW-<hash>` chain
4.5) If it doesn't match a rule, it will finish traversing the chain, will not receive a `MARK`, and will eventually end up back in the parent `KUBE-POD-FW-<hash>` chain
5) If the packet does not carry the `0x10000/0x10000` mark, then it will be logged and rejected
6) If it does, the mark is cleared with `0x0` and the packet will be returned to the parent `KUBE-ROUTER-FORWARD` chain so that it can finish processing and eventually be accepted if it makes it all the way through the `KUBE-ROUTER-FORWARD` chain.
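A rough sketch of this flow in rule form may help; the chain hashes, pod IP, and port match below are invented for illustration, and the rules kube-router actually generates differ in detail:

```sh
# 2) KUBE-ROUTER-FORWARD sends traffic for a policed pod to its per-pod FW chain
iptables -A KUBE-ROUTER-FORWARD -d 10.244.1.5 -j KUBE-POD-FW-ABC123

# 3) the per-pod FW chain traverses each applicable policy chain
iptables -A KUBE-POD-FW-ABC123 -j KUBE-NWPLCY-DEF456

# 4) inside the policy chain, allowed traffic is MARKed and RETURNed
iptables -A KUBE-NWPLCY-DEF456 -p tcp --dport 80 -j MARK --set-xmark 0x10000/0x10000
iptables -A KUBE-NWPLCY-DEF456 -m mark --mark 0x10000/0x10000 -j RETURN

# 5) back in the per-pod chain, packets that never picked up the mark are
# rejected (logging rule omitted here)
iptables -A KUBE-POD-FW-ABC123 -m mark ! --mark 0x10000/0x10000 -j REJECT

# 6) marked packets have the mark cleared (note: no mask, so every mark bit
# is wiped) before returning to KUBE-ROUTER-FORWARD
iptables -A KUBE-POD-FW-ABC123 -j MARK --set-mark 0x0
iptables -A KUBE-POD-FW-ABC123 -j RETURN
```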
Problem

The problem with this flow is how the mark is cleared. When the mark is cleared with `0x0` at the end of the `KUBE-POD-FW-<hash>` chain, it not only clears the mark that kube-router applied: because the reset has no mask, it clears all marks, which can disrupt other parts of iptables that rely on a mark (like kube-router's DSR functionality, the CNI's hostPort NAT markings, etc.).

This change adds a mask to the mark so that we only unset the specific mark that we set in the FW chain and leave any other marks that may exist on the packet intact.
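As a sketch of the before/after difference (the rule shown is illustrative rather than the exact rule kube-router generates):

```sh
# Before: an unmasked reset rewrites the whole fwmark field, destroying
# unrelated marks (DSR, CNI hostPort NAT, ...)
iptables -A KUBE-POD-FW-ABC123 -j MARK --set-mark 0x0

# After: the mask limits the reset to kube-router's own bit; all other
# mark bits on the packet are left untouched
iptables -A KUBE-POD-FW-ABC123 -j MARK --set-xmark 0x0/0x10000
```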
Testing Procedure
I've tested this by mocking up some iptables chains on a test host and watching the packet counters. This was done against a test VM that had no other rules on it:
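The original mock rules and counter output aren't reproduced here, but a minimal sketch of the kind of chain described might look like the following; the chain name `TEST-FW` and the `RETURN` bail-out rule are my own mock-up:

```sh
# Hypothetical recreation of the unmasked test
iptables -N TEST-FW
iptables -A TEST-FW -j MARK --set-xmark 0x2000/0x2000        # CNI hostPort mark
iptables -A TEST-FW -j MARK --set-xmark 0x10000/0x10000      # kube-router FW mark
iptables -A TEST-FW -j MARK --set-mark 0x0                   # unmasked reset (current behavior)
iptables -A TEST-FW -m mark --mark 0x10000/0x10000           # counter-only: FW mark still present?
iptables -A TEST-FW -m mark --mark 0x2000/0x2000             # counter-only: hostPort mark still present?
iptables -A TEST-FW -m mark ! --mark 0x2000/0x2000 -j RETURN # bail out if the hostPort mark was lost
iptables -A TEST-FW -j ACCEPT
iptables -I INPUT -j TEST-FW                                 # feed test traffic through the chain
# inspect the counters with: iptables -nvL TEST-FW -x
```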
Above we can see that we:

- Apply the `0x2000` mark, representing the current CNI hostPort marking, as well as the `0x10000` mark, representing the kube-router FW mark
- Clear the mark with an unmasked `0x0`
- Check for the `0x10000` mark
- Check for the `0x2000` mark
- `ACCEPT` at the end

From the packet counters we can see that:

- `0x10000` and `0x2000` were successfully applied and matched (as the packet doesn't get returned)
- After the unmasked reset, the packet no longer matches `0x10000` or `0x2000` and is returned instead of passing through the end of the chain to `ACCEPT`

If we then change this to use the logic from this PR, we do the same thing, only this time we add a mask to the mark:
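Again as a sketch against the hypothetical `TEST-FW` chain above, the only change is a mask on the reset rule (rule 3 in that sketch):

```sh
# Replace the unmasked reset with a masked one: only the 0x10000 bit is
# cleared, so the 0x2000 hostPort mark survives and the packet reaches ACCEPT
iptables -R TEST-FW 3 -j MARK --set-xmark 0x0/0x10000
```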
From the packet counters we can see that:

- `0x10000` and `0x2000` were successfully applied and matched (as the packet doesn't get returned)
- After the masked reset, the packet no longer matches `0x10000`, but it still matches `0x2000` and is not returned

kube-router testing
I've also tested this against our kube-router cluster and ensured that hostPort and DSR work correctly after effecting the change.