Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Should k0s enable hairpin-mode by default for kube-router? #1953

Closed
twz123 opened this issue Jul 28, 2022 · 13 comments · Fixed by #2417
Closed

Should k0s enable hairpin-mode by default for kube-router? #1953

twz123 opened this issue Jul 28, 2022 · 13 comments · Fixed by #2417
Assignees
Labels
area/network component/kube-router enhancement New feature or request question Further information is requested

Comments

@twz123
Copy link
Member

twz123 commented Jul 28, 2022

Is your feature request related to a problem? Please describe.

When using kube-router, it is not possible for a pod that is part of a service to reach out to its service IP. This setup is called hairpinning and is supported in kube-router via some config flag which defaults to off. The docs say

Communication from a Pod that is behind a Service to its own ClusterIP:Port is not supported by default.

I suspect this is quite surprising for users, as they will probably assume that hairpinning just works.

Describe the solution you would like

There's #1937 which is about to add support for this config option to k0s. Would it be a better UX if k0s would enable hairpinning by default?

It's not entirely clear to me what this entails. There's probably a reason why kube-router has chosen to disable it by default. The doc string on the --hairpin-mode CLI flag says

Add iptables rules for every Service Endpoint to support hairpin traffic.

So this sounds costly, as managing iptables rules will slow doẃn significantly as soon as the number of rules reaches a certain size. This might start to matter on larger setups, with a lot of Services and NetworkPolicies.

Describe alternatives you've considered

Don't change the default. Folks can simply enable it if they need it, either on a service-level via annotations (i.e. kubectl annotate service my-service "kube-router.io/service.hairpin="), or cluster-wide via the new knobs introduced in #1937.

In my experiments, as soon as I add a second Pod to a Service, I am able to connect to the ClusterIP, although connection times vary and feel sluggish (think: seconds instead of milliseconds), so the actual impact of hairpinning being disabled is apparently even weirder.

In any case, this needs some investigation and IMO it's in order to write some docs about this, so that k0s users are aware.

Additional context

No response

@twz123 twz123 added question Further information is requested area/network component/kube-router labels Jul 28, 2022
@hercherf
Copy link
Contributor

Actually, there a two knobs involved:

  • CNI-Config (hairpinMode: true)
  • kube-router parameter (--hairpin-mode=true)

Enabling hairpinning via annotation (kube-router.io/service.hairpin=) per service will only work if the CNI-Config allows hairpinning. I tried to keep it simple and didn't differentiate - maybe we should allow three states:

  1. disable hairpinning globally
  2. allow hairpinning, but don't enable it globally - let the user do this via annotation per service
  3. enable hairpinning globally

@github-actions
Copy link
Contributor

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Aug 28, 2022
@twz123 twz123 removed the Stale label Aug 29, 2022
@github-actions
Copy link
Contributor

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Sep 28, 2022
@twz123 twz123 removed the Stale label Sep 29, 2022
@mtb-xt
Copy link

mtb-xt commented Oct 8, 2022

Just encountered this and was surprised. In my humble opinion, please enable hairpin by default in kube-router with a disclaimer? For small setups, you quite often might have only 1 replica of something (linstor-cs controller in my case) which tries to talk to itself via Service Cluster IP - and that fails without hairpin mode. A solution is to spin up more replicas, but again - for a small, homelab setup, its overkill and resource consumption by iptables is not such a big deal, so maybe for default setups hairpin should be on by default, and if people require tuning - they can disable it or switch to other CNI plugin?

@github-actions
Copy link
Contributor

github-actions bot commented Nov 7, 2022

The issue is marked as stale since no activity has been recorded in 30 days

@github-actions github-actions bot added the Stale label Nov 7, 2022
@jnummelin jnummelin added enhancement New feature or request and removed Stale labels Nov 8, 2022
@juanluisvaladas juanluisvaladas self-assigned this Nov 14, 2022
@juanluisvaladas
Copy link
Contributor

juanluisvaladas commented Nov 14, 2022

I would enable it by default myself. Now the question is that given the k0s' choice of not being very opinionated if it makes sense to enable it by default when kube-router disables it.

Impact on enabling it by default if someone needs to disable it later.

The impact on small clusters is negligible and having it disabled by default isn’t expected at all. In fact I was surprised to find out this as well.
The impact on large clusters is unknown, but the consequences are a big deal. If someone needs to disable it, it’s more serious than having it disabled by default and needing to enable it. Not having this working by default is annoying, but if your applications rely on it, the administrator disables it and all of a sudden random applications are broken the impact is just massive.

If a big cluster has it, it will be very hard for administrators to know what applications are using hairpining. No one will have a list of services that need it and they will need to trace their applications to find out. This process is not fast or easy. Another option that they have is to remove it and wait for random stuff to break but that’s not really suitable for a real production of a real business.

So at this point I think it’s reasonable to say:
1- Removing it from someone who has it configured is a bigger deal than removing it from someone who doesn’t
2- Only large clusters with a lot of endpoint changes will need this removed.

Performance impact

I doubt that this has a big performance impact, I couldn’t find many references in their issues to the tables lock being held, I also see that kube-router uses iptables in a way which is pretty performant and smart in networkpolicy but hairpining is done in a very naive way. This makes me think that they never had issues with this section of the code. I also see the hairpining code is very old, this makes me think performance is not very important here because it doesn’t have a big impact to begin with.

What others are doing

Also I asked in the kube-router slack channel and I if I don't get a reply today I'll open an issue asking for opinions of the developers.

@juanluisvaladas
Copy link
Contributor

I was told in the kube-router slack by one of the main contributors:

We run large-ish clusters and we use global hairpin FWIW. It does increase the number of rules that get written into the NAT table, but this shouldn’t be overburdensome.
Also note that these global rules will only apply to ClusterIP and not ExternalIP or LoadBalancerIP

I think it's fairly safe to add this by default, now it's a philosophical on whether we want to do it or not, but I say enabling it is a safer default than not enabling it.

@juanluisvaladas
Copy link
Contributor

@twz123 @jnummelin any comments on this?

@twz123
Copy link
Member Author

twz123 commented Nov 16, 2022

I'm fine with enabling this. Concerning the migration path from enabling to disabling it: I think that it's not that of a big deal, as, IIUK, the annotations can also be used to opt-out of hairpinning, so cluster admins can scan for Service objects without that annotation and ask the owners of those to evaluate if they need hairpinning or not, and add the annotation. So they can gradually enforce a deliberate decision on a per-service basis.

In order for k0s to support this, k0s probably needs to properly distinguish the two knobs mentioned in #1953 (comment).

@juanluisvaladas
Copy link
Contributor

In order for k0s to support this, k0s probably needs to properly distinguish the two knobs mentioned in #1953 (comment).

I´m 99% certain that this is the case. If I understand corretly the CNI option sets the bridge mode to hairpin 1 2 3. So we need to set this option to allow hairpin in any of its ways (either allowing it explicly with the annotation or to enable it by default. I guess a good option name would be AllowHairpin.

The flag creates the iptables rules to make hairpin enabled to every service by default. I would call it GlobalHairpin. Of course GlobalHairpin would require AllowHairpin

@hercherf
Copy link
Contributor

In my opinion it would be clearer to make it one setting with three values:

hairpinning:

  • enabled, which globally enables and activates hairpinning (kube-router and cni-config)
  • allowed, which just enables hairpinning in the cni-config and leaves it to the user to annotate services which require hairpinning
  • disabled

@juanluisvaladas
Copy link
Contributor

Yes, you are right, it makes more sense that way.

@twz123
Copy link
Member Author

twz123 commented Nov 16, 2022

Having a tri-state setting sounds good. We still need to support the old settings as a fallback for backwards compatibility, though.

juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Nov 16, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn´t reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Nov 16, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn´t reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Nov 16, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Nov 16, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Nov 18, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Co-authored-by: Tom Wieczorek <twz123@users.noreply.github.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Dec 14, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Co-authored-by: Tom Wieczorek <twz123@users.noreply.github.com>
Signed-off-by: Juan-Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Dec 14, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Dec 14, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Dec 15, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Co-authored-by: Tom Wieczorek <twz123@users.noreply.github.com>
Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
juanluisvaladas added a commit to juanluisvaladas/k0s that referenced this issue Dec 15, 2022
We agreed that the most logical decision was to enable hairpin by
default. The old configuration doesn't reflect our real needs, for this
reason we decided to deprecate the old configuration and add a new
configuration. This commit handles all these details.

Fixes k0sproject#1953

Co-authored-by: Tom Wieczorek <twz123@users.noreply.github.com>
Signed-off-by: Juan Luis de Sousa-Valadas Castaño <jvaladas@mirantis.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/network component/kube-router enhancement New feature or request question Further information is requested
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants