Swap temporary IPSets during ipset restore #1068
Conversation
One small nit, otherwise everything looks good here. I'll try it out in one of our clusters today.
LGTM as well. Aaron, since you plan to test, I will skip the testing part. Thanks @bnu0 for the PR.
Further troubleshooting shows that the increase is only because of how many more temporary sets get created and destroyed during the restore. If you remove, or at least greatly reduce, the number of temporary sets, the increase largely disappears. I would recommend that we only create 1 TMP ipset per ipset type, re-use them throughout the restore, and only destroy them at the end.
	ipSetRestore.WriteString(fmt.Sprintf("create %s %s\n", tmpSetName, setOptions))
	tmpSets[setOptions] = tmpSetName
}
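Presumably the excerpt above is from the updated restore builder: it creates one temporary set per distinct `setOptions` value and caches its name in the `tmpSets` map, so later entries with the same options re-use the same temporary set.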
Can we have a flush here as well, or just do it here as an else branch if the set already exists? Just to ensure we start clean, e.g. to avoid cases like kube-router starting after a crash which may have left stale entries.
sure, makes sense
fixed in e731d80
Thanks @bnu0 for the PR and prompt follow-up. LGTM
* ipset restore: use temporary sets and swap them into the real ones
* move const
* switch to shared tmp ipsets
* preemptively flush tmp set in case it already existed
@bnu0 Thanks for the help! I double-checked this version again after your patch and the performance improvement holds. I back-ported your fix to the v1.2 branch and this is now part of the bug fix release v1.2.2 that was just made.
The Problem
There is a bit of a bug in kube-router, which only seems to exist since 1.2.x, when the implementation was changed to use `iptables-restore` and friends. Essentially, the generated IPSets are observably empty/incomplete for a brief moment between the `flush` and the completion of all the `add`s provided to `ipset restore`. This causes inbound or outbound connection failures for ingress/egress network policies, respectively. After some discussion in Slack, here's a PR.
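Concretely, the stream fed to `ipset restore` before this PR amounts to something like the following (the set name and members are made up for illustration). Anything matched against the set between the `flush` and the final `add` sees it empty or only partially filled:

```
flush KUBE-SRC-EXAMPLE
add KUBE-SRC-EXAMPLE 10.244.1.5
add KUBE-SRC-EXAMPLE 10.244.2.9
add KUBE-SRC-EXAMPLE 10.244.3.17
```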
A Solution?
This PR uses an approach discussed on the netfilter mailing list to avoid this problem by creating new sets and swapping them into place. Each temporary set's name is derived (`base32(sha256(...))`) from the target's name; an example is sketched below.
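This is a minimal, hypothetical Go sketch of that idea, not the exact code merged in this PR: the set name, options, members, and the truncated-digest naming scheme are all assumptions, and it presumes the stream is applied with the `-exist` option so re-creating an existing set is not an error.

```go
// Hypothetical sketch of generating a swap-based `ipset restore` stream.
// Names, options, and members are placeholders, not kube-router's actual values.
package main

import (
	"crypto/sha256"
	"encoding/base32"
	"fmt"
	"strings"
)

// tmpSetName derives a deterministic temporary-set name from the target
// set's name (ipset names are limited to 31 characters, hence the truncation).
func tmpSetName(target string) string {
	sum := sha256.Sum256([]byte(target))
	return "TMP-" + base32.StdEncoding.EncodeToString(sum[:])[:20]
}

// buildRestore emits the commands fed to `ipset restore`: fill a temporary
// set, then swap it into the live one so the live set is never observed empty.
func buildRestore(target, setOptions string, members []string) string {
	var b strings.Builder
	tmp := tmpSetName(target)
	fmt.Fprintf(&b, "create %s %s\n", tmp, setOptions)
	fmt.Fprintf(&b, "flush %s\n", tmp) // start clean in case a crashed run left it behind
	for _, m := range members {
		fmt.Fprintf(&b, "add %s %s\n", tmp, m)
	}
	fmt.Fprintf(&b, "create %s %s\n", target, setOptions) // swap needs both sets to exist
	fmt.Fprintf(&b, "swap %s %s\n", tmp, target)          // contents are exchanged atomically
	fmt.Fprintf(&b, "destroy %s\n", tmp)
	return b.String()
}

func main() {
	fmt.Print(buildRestore(
		"KUBE-SRC-EXAMPLE",
		"hash:ip family inet hashsize 1024 maxelem 65536",
		[]string{"10.244.1.5", "10.244.2.9"},
	))
}
```

The swap is what removes the visible window: the live set's contents are exchanged with the fully populated temporary set in a single operation, rather than being flushed and refilled entry by entry.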
Example
If you have pods with a type=ingress NetworkPolicy, for example, then during the window in which the ipsets are empty, new inbound connections to those pods fail with a `connection refused` error: the traffic enters the pod's `POD-FW` chain, is checked against the `NWPLCY` chain, matches nothing, and is `-j REJECT`-ed.
-ed.Reproduction
You can reproduce this fairly easily by launching a bunch of short-lived pods which have a netpol acting on them, while simultaneously making network requests in a loop to some long-lived pod that also has a netpol (same or different one) acting on it. The short-lived pods only exist to repeatedly trigger the synchronization loop in kube-router, which will momentarily remove all IPSet members from the long-lived pod's ipsets. Your loop will sometimes catch it in this state and get a `connection refused` error. If you are monitoring `nflog:100`, you will see it there too.
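For the request loop, a small Go probe along these lines can be used (the target address, port, and intervals are placeholders); run it against the long-lived pod while the short-lived pods churn:

```go
package main

import (
	"fmt"
	"net"
	"time"
)

func main() {
	// Placeholder address of the long-lived pod protected by a NetworkPolicy.
	target := "10.244.3.17:80"
	for {
		conn, err := net.DialTimeout("tcp", target, 500*time.Millisecond)
		if err != nil {
			// During an ipset resync window this typically surfaces as "connection refused".
			fmt.Println(time.Now().Format(time.RFC3339Nano), "probe failed:", err)
		} else {
			conn.Close()
		}
		time.Sleep(100 * time.Millisecond)
	}
}
```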