Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

KEP 3453 iptables performance alpha feature docs #37649

Closed
wants to merge 1 commit into from

Conversation

danwinship
Copy link
Contributor

@danwinship danwinship commented Nov 1, 2022

WIP for kubernetes/enhancements#3453

TL;DR: it makes iptables kube-proxy a lot faster in big clusters. But it's alpha because it needs more testing (and if it turns out to have bugs, it may seriously break your cluster, so don't test it out in production without testing it out in a dev/testing environment first).

But I'm not sure to what extent we document alpha features?

@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Nov 1, 2022
@danwinship danwinship marked this pull request as draft November 1, 2022 14:44
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 1, 2022
@netlify
Copy link

netlify bot commented Nov 1, 2022

Pull request preview available for checking

Built without sensitive environment variables

Name Link
🔨 Latest commit 5fc166e
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-main-staging/deploys/638e148c1e5fee0008502a0f
😎 Deploy Preview https://deploy-preview-37649--kubernetes-io-main-staging.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site settings.

@k8s-triage-robot
Copy link

Unknown CLA label state. Rechecking for CLA labels.

Send feedback to sig-contributor-experience at kubernetes/community.

/check-cla
/easycla

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Nov 1, 2022
@sftim
Copy link
Contributor

sftim commented Nov 1, 2022

I'm not sure to what extent we document alpha features?

The level of documentation we ask for for alpha is “some”. For beta, we want a good attempt at production ready docs; a few snags are OK.

@sftim
Copy link
Contributor

sftim commented Nov 1, 2022

If we want people to try things out, we should document them well!

@reylejano
Copy link
Member

Before setting this PR ready for review, please target the dev-1.26 branch

@krol3
Copy link
Contributor

krol3 commented Nov 18, 2022

Hi @danwinship! The deadline for the PR ready for review was this Tuesday 15th November 2022. Please review it.
cc: @reylejano @cathchu @leonardpahlke

@danwinship
Copy link
Contributor Author

I am not sure where this would be documented... I can't find any other discussion of kube-proxy feature gates (or really even kube-proxy options, other than in the autogenerated "man page").

The only place I can find that really talks about the kube-proxy iptables mode specifically is content/en/docs/concepts/services-networking/service.md, but it only barely mentions it, and it doesn't seem like a comment about an alpha feature gate would really fit there.

I'm OK if this doesn't get documented in this release; we don't have any feedback on it yet from testing anyway, so all we could really say is "it's supposed to make things faster but we don't know how much faster and oh by the way it's possible that it totally breaks things".

@leonardpahlke
Copy link
Member

Thanks, @danwinship, for taking a look. Since this looks like mostly internal K8s work, I'm ok with dropping the documentation if sig-docs and sig-network are ok with it too (with a +1) @reylejano @sftim & @thockin, @aojea. If not (sig-docs / sig-network is -1) and the documentation is still needed, perhaps @reylejano / @sftim know a good place to add the documentation?

Thanks all

@sftim
Copy link
Contributor

sftim commented Nov 22, 2022

I am not sure where this would be documented... I can't find any other discussion of kube-proxy feature gates (or really even kube-proxy options, other than in the autogenerated "man page").

#36675 aims to create that new home (and I think is worth merging, although I haven't looked recently for feedback).

@krol3
Copy link
Contributor

krol3 commented Nov 28, 2022

Hi @danwinship! This PR needs a doc review by Mon Nov 28th to get this into the release. Please reach out to required SIGs to get their review. Thank you!

@sftim
Copy link
Contributor

sftim commented Nov 28, 2022

@danwinship would you be willing to make a PR based on the work in #36675 to update v1.26 accordingly? You'll need to rebase if / when that lands.

Sorry we can't give you a better picture of how the base branch will look.

@sftim
Copy link
Contributor

sftim commented Nov 29, 2022

FYI @danwinship, #36675 just merged.

@leonardpahlke
Copy link
Member

@danwinship ^

The 1.26 release is in less than a week so I would love to get this documented first

@k8s-ci-robot k8s-ci-robot added needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. area/blog Issues or PRs related to the Kubernetes Blog subproject area/release-eng Issues or PRs related to the Release Engineering subproject language/en Issues or PRs related to English language size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/release Categorizes an issue or PR as relevant to SIG Release. and removed size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. labels Dec 5, 2022
@danwinship danwinship changed the base branch from main to dev-1.26 December 5, 2022 15:56
@k8s-ci-robot k8s-ci-robot removed the size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. label Dec 5, 2022
@k8s-ci-robot k8s-ci-robot removed area/blog Issues or PRs related to the Kubernetes Blog subproject area/release-eng Issues or PRs related to the Release Engineering subproject sig/release Categorizes an issue or PR as relevant to SIG Release. needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. labels Dec 5, 2022
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I recommend adding this into https://k8s.io/docs/reference/networking/virtual-ips/ once the main branch has merged into dev-1.26

If we miss the release, we can instead add these changes into the live documentation post release.

@krol3
Copy link
Contributor

krol3 commented Dec 5, 2022

I recommend adding this into https://k8s.io/docs/reference/networking/virtual-ips/ once the main branch has merged into dev-1.26

If we miss the release, we can instead add these changes into the live documentation post release.

Thank you! @sftim I created this issue to track your suggestion. #38286

@palnabarun
Copy link
Member

@danwinship @sftim -- So, if I understand correctly, we are not merging the docs changes for this before the release?

@sftim
Copy link
Contributor

sftim commented Dec 5, 2022

#38242 is about to land. If you rebase on upstream's dev-1.26 then we should be able to get this in the release.

If not, it's not a huge problem because this repo is continously deployed: update the PR to target main, go through code review, and we can get an update out, pronto.

@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 5, 2022
@netlify
Copy link

netlify bot commented Dec 6, 2022

👷 Deploy Preview for kubernetes-io-vnext-staging processing.

Name Link
🔨 Latest commit e117286
🔍 Latest deploy log https://app.netlify.com/sites/kubernetes-io-vnext-staging/deploys/638f4facf35f6d0008350c31

@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Dec 6, 2022
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign onlydole for approval by writing /assign @onlydole in a comment. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@mickeyboxell
Copy link
Contributor

To add to what was said earlier in the thread, I think this content should be updated as a fast follow after the release. For example, we should update this line to be more direct: "(and should probably not be enabled in production clusters yet)." Given that #38286 is open to do so, this looks good to me.
/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 6, 2022
Copy link
Contributor

@sftim sftim left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @danwinship

What might be nice is a blog article about the 3 modes (iptables, ipvs and kernelspace), and the best way to tune each of them for performance.

We should mention this within the docs too - all features and all feature gates need some documentation - but right now this isn't ready to merge. I've explained more inline.

Comment on lines +181 to +189
The latest version of kube-proxy has some performance improvements in
this code, but they are not enabled by default (and should probably
not be enabled in production clusters yet). To try them out, enable
the `MinimizeIPTablesRestore` [feature
gate](/docs/reference/command-line-tools-reference/feature-gates/) for
kube-proxy with `--feature-gates=MinimizeIPTablesRestore=true,…`. If
you were previously overriding the `--iptables-min-sync-period`
option, you should also try removing that override and letting it use
the default value ("`1s`").
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not keen on this. It'll date quickly (“the latest version” will be wrong as soon as v1.27 is out) and is framed as the change more than documenting the behavior.

The heading should describe the optimization (eg: Optimization for iptables-restore) and let the reader infer that:

  • we saw that as an improvement
  • it applies to iptables mode

If the feature gate is really experimental, let's mention that right in the heading, but set a manual fragment identifier so that links to this still work for the beta. I can provide an example once we know what the heading text is going to be.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is framed as the change more than documenting the behavior

Right, because KEP 3453 is just a change, not a new feature. Once it goes to beta, there's really nothing to say about it any more.

(Yes, if we enable the feature gate by default in 1.28 and it breaks some clusters, then those users should file bugs and disable the feature gate. But if that happened, there is no reason they would end up finding this particular section of the documentation; the warning would need to be in the release notes, not here. And then once the feature goes to GA, there's definitely nothing to document here, since at that point, even if there are still bugs, they're just bugs in kube-proxy, not bugs in a particular feature.)

This is an unusual KEP, and it was not even originally supposed to have been a KEP. I had already implemented the change, and then Tim suggested that we put it through the KEP process, mostly because he wanted to make sure that we had thought about all of the things that the KEP template forces you to think about. (And also so we'd have a feature gate to make it opt-in/out-able.)


I guess rather than documenting KEP 3453, we could add an "Optimizing the iptables mode" section, which would document the current behavior of --iptables-sync-period and --iptables-min-sync-period, and then mention the new feature gate, and how that changes the recommendations around those options. But when the feature goes GA, the section would mostly become a stub and just say "if you used to be overriding these options, then you can probably stop".

@mickeyboxell
Copy link
Contributor

@sftim @reylejano given that tomorrow Dec-8 is the 1.26 release day, what typically happens when there is a remaining open docs PR like this one? cc @krol3

@sftim
Copy link
Contributor

sftim commented Dec 7, 2022

I'm not sure we have a good precedent for this. The overall feeling with reviews here seems to be that we should aim to document this after the v1.26 release, ideally quite soon.

The approach here isn't quite what we want for docs: I try hard to pick headings (and especially fragment identifiers) that will stay valid even after the feature graduates to stable. So, to me, this is a rare case where the docs are better off - at least temporarily - without the change landing.

@danwinship the main thing I want to get right is the heading. People may well bookmark this new section as soon as we publish it, and I'd like those people to be able to see those bookmarks keep working in the next few releases. I hope you'll be willing to revise this in light of https://github.com/kubernetes/website/pull/37649/files#r1041229029 and then get the docs change live soon after v1.26 is out. You should be able to rebase against main once the release has happened, too.

@krol3
Copy link
Contributor

krol3 commented Dec 8, 2022

@sftim @reylejano given that tomorrow Dec-8 is the 1.26 release day, what typically happens when there is a remaining open docs PR like this one? cc @krol3

@mickeyboxell as we discuss in the kubernetes sync of today. This doc pr will be after the release to have time to review it. We have an issue to track it.

@sftim
Copy link
Contributor

sftim commented Dec 8, 2022

I guess rather than documenting KEP 3453, we could add an "Optimizing the iptables mode" section, which would document the current behavior of --iptables-sync-period and --iptables-min-sync-period, and then mention the new feature gate, and how that changes the recommendations around those options. But when the feature goes GA, the section would mostly become a stub and just say "if you used to be overriding these options, then you can probably stop".

Yes - please do that. We don't mind dropping documentation sections when code improvements mean you need to no longer need to do a thing, and we have a process for helping readers work out what's happened there.

@sftim
Copy link
Contributor

sftim commented Dec 8, 2022

We might also backport the new section into the v1.25 docs.

@sftim
Copy link
Contributor

sftim commented Dec 10, 2022

Hi. This now needs to target main

/lgtm cancel

OK to re-LGTM this once the base branch is changed (and the reviewer is happy with the new commits!)

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Dec 10, 2022
@tengqm tengqm deleted the branch kubernetes:dev-1.26 December 10, 2022 03:41
@tengqm tengqm closed this Dec 10, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. language/en Issues or PRs related to English language sig/docs Categorizes an issue or PR as relevant to SIG Docs. sig/network Categorizes an issue or PR as relevant to SIG Network. size/S Denotes a PR that changes 10-29 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants