-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
KEP 3453 iptables performance alpha feature docs #37649
Conversation
✅ Pull request preview available for checkingBuilt without sensitive environment variables
To edit notification comments on pull requests, go to your Netlify site settings. |
Unknown CLA label state. Rechecking for CLA labels. Send feedback to sig-contributor-experience at kubernetes/community. /check-cla |
The level of documentation we ask for for alpha is “some”. For beta, we want a good attempt at production ready docs; a few snags are OK. |
If we want people to try things out, we should document them well! |
Before setting this PR ready for review, please target the |
Hi @danwinship! The deadline for the PR ready for review was this Tuesday 15th November 2022. Please review it. |
I am not sure where this would be documented... I can't find any other discussion of kube-proxy feature gates (or really even kube-proxy options, other than in the autogenerated "man page"). The only place I can find that really talks about the kube-proxy iptables mode specifically is content/en/docs/concepts/services-networking/service.md, but it only barely mentions it, and it doesn't seem like a comment about an alpha feature gate would really fit there. I'm OK if this doesn't get documented in this release; we don't have any feedback on it yet from testing anyway, so all we could really say is "it's supposed to make things faster but we don't know how much faster and oh by the way it's possible that it totally breaks things". |
Thanks, @danwinship, for taking a look. Since this looks like mostly internal K8s work, I'm ok with dropping the documentation if sig-docs and sig-network are ok with it too (with a Thanks all |
#36675 aims to create that new home (and I think is worth merging, although I haven't looked recently for feedback). |
Hi @danwinship! This PR needs a doc review by Mon Nov 28th to get this into the release. Please reach out to required SIGs to get their review. Thank you! |
@danwinship would you be willing to make a PR based on the work in #36675 to update v1.26 accordingly? You'll need to rebase if / when that lands. Sorry we can't give you a better picture of how the base branch will look. |
FYI @danwinship, #36675 just merged. |
The 1.26 release is in less than a week so I would love to get this documented first |
a6e0d1d
to
5fc166e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I recommend adding this into https://k8s.io/docs/reference/networking/virtual-ips/ once the main branch has merged into dev-1.26
If we miss the release, we can instead add these changes into the live documentation post release.
Thank you! @sftim I created this issue to track your suggestion. #38286 |
@danwinship @sftim -- So, if I understand correctly, we are not merging the docs changes for this before the release? |
#38242 is about to land. If you rebase on upstream's dev-1.26 then we should be able to get this in the release. If not, it's not a huge problem because this repo is continously deployed: update the PR to target main, go through code review, and we can get an update out, pronto. |
5fc166e
to
e117286
Compare
👷 Deploy Preview for kubernetes-io-vnext-staging processing.
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: The full list of commands accepted by this bot can be found here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
To add to what was said earlier in the thread, I think this content should be updated as a fast follow after the release. For example, we should update this line to be more direct: "(and should probably not be enabled in production clusters yet)." Given that #38286 is open to do so, this looks good to me. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @danwinship
What might be nice is a blog article about the 3 modes (iptables
, ipvs
and kernelspace
), and the best way to tune each of them for performance.
We should mention this within the docs too - all features and all feature gates need some documentation - but right now this isn't ready to merge. I've explained more inline.
The latest version of kube-proxy has some performance improvements in | ||
this code, but they are not enabled by default (and should probably | ||
not be enabled in production clusters yet). To try them out, enable | ||
the `MinimizeIPTablesRestore` [feature | ||
gate](/docs/reference/command-line-tools-reference/feature-gates/) for | ||
kube-proxy with `--feature-gates=MinimizeIPTablesRestore=true,…`. If | ||
you were previously overriding the `--iptables-min-sync-period` | ||
option, you should also try removing that override and letting it use | ||
the default value ("`1s`"). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not keen on this. It'll date quickly (“the latest version” will be wrong as soon as v1.27 is out) and is framed as the change more than documenting the behavior.
The heading should describe the optimization (eg: Optimization for iptables-restore
) and let the reader infer that:
- we saw that as an improvement
- it applies to iptables mode
If the feature gate is really experimental, let's mention that right in the heading, but set a manual fragment identifier so that links to this still work for the beta. I can provide an example once we know what the heading text is going to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is framed as the change more than documenting the behavior
Right, because KEP 3453 is just a change, not a new feature. Once it goes to beta, there's really nothing to say about it any more.
(Yes, if we enable the feature gate by default in 1.28 and it breaks some clusters, then those users should file bugs and disable the feature gate. But if that happened, there is no reason they would end up finding this particular section of the documentation; the warning would need to be in the release notes, not here. And then once the feature goes to GA, there's definitely nothing to document here, since at that point, even if there are still bugs, they're just bugs in kube-proxy, not bugs in a particular feature.)
This is an unusual KEP, and it was not even originally supposed to have been a KEP. I had already implemented the change, and then Tim suggested that we put it through the KEP process, mostly because he wanted to make sure that we had thought about all of the things that the KEP template forces you to think about. (And also so we'd have a feature gate to make it opt-in/out-able.)
I guess rather than documenting KEP 3453, we could add an "Optimizing the iptables mode" section, which would document the current behavior of --iptables-sync-period
and --iptables-min-sync-period
, and then mention the new feature gate, and how that changes the recommendations around those options. But when the feature goes GA, the section would mostly become a stub and just say "if you used to be overriding these options, then you can probably stop".
@sftim @reylejano given that tomorrow Dec-8 is the 1.26 release day, what typically happens when there is a remaining open docs PR like this one? cc @krol3 |
I'm not sure we have a good precedent for this. The overall feeling with reviews here seems to be that we should aim to document this after the v1.26 release, ideally quite soon. The approach here isn't quite what we want for docs: I try hard to pick headings (and especially fragment identifiers) that will stay valid even after the feature graduates to stable. So, to me, this is a rare case where the docs are better off - at least temporarily - without the change landing. @danwinship the main thing I want to get right is the heading. People may well bookmark this new section as soon as we publish it, and I'd like those people to be able to see those bookmarks keep working in the next few releases. I hope you'll be willing to revise this in light of https://github.com/kubernetes/website/pull/37649/files#r1041229029 and then get the docs change live soon after v1.26 is out. You should be able to rebase against main once the release has happened, too. |
@mickeyboxell as we discuss in the kubernetes sync of today. This doc pr will be after the release to have time to review it. We have an issue to track it. |
Yes - please do that. We don't mind dropping documentation sections when code improvements mean you need to no longer need to do a thing, and we have a process for helping readers work out what's happened there. |
We might also backport the new section into the v1.25 docs. |
Hi. This now needs to target main /lgtm cancel OK to re-LGTM this once the base branch is changed (and the reviewer is happy with the new commits!) |
WIPfor kubernetes/enhancements#3453TL;DR: it makes iptables kube-proxy a lot faster in big clusters. But it's alpha because it needs more testing (and if it turns out to have bugs, it may seriously break your cluster, so don't test it out in production without testing it out in a dev/testing environment first).
But I'm not sure to what extent we document alpha features?