
Extend internal GCP Shoot cluster capabilities #56

Open
dkistner opened this issue Mar 27, 2020 · 13 comments

@dkistner
Member

What would you like to be added:
The Gardener GCP provider already allows passing configuration for services of type LoadBalancer which are backed by internal load balancers. The user can specify in the infra config via .internal a CIDR range within the VPC which should be used to pick IP addresses for internal load balancer services.
The extension will create a subnet in the VPC with the .internal CIDR.

I propose to extend this approach to allow users to specify an existing subnet, which can likewise be used to pick IP addresses for internal load balancer services.
In addition, it could also make sense to deploy Shoot clusters with internal load balancer services only, as there could be scenarios which require that for security/isolation reasons (those scenarios would of course require that the control plane hosting seed can access these environments).

The InfrastructureConfig could look like this:

apiVersion: gcp.provider.extensions.gardener.cloud/v1beta1
kind: InfrastructureConfig
networks:
  vpc:
    name: myvpc
  ...
internal:
  cidr: 10.251.0.0/16       
  subnet: subnet-name-in-myvpc   
  internalOnly: true|false
  localAccess: true|false
...

Either .internal.cidr or .internal.subnet can be specified.
The .internal.internalOnly flag specifies that all load balancer services in the cluster need to be internal ones (including vpn-shoot). That can be enforced and/or validated via webhooks.
The .internal.localAccess flag could be used to limit access to the internal load balancers to within the VPC.

The following annotations need to be set on the services:

networking.gke.io/load-balancer-type: Internal
networking.gke.io/internal-load-balancer-allow-global-access: true      # forced to false via webhook when `.internal.localAccess == true`
networking.gke.io/internal-load-balancer-subnet: subnet-name-in-myvpc   # subnet name enforced via webhook when `.internal.subnet` is set

The annotation networking.gke.io/internal-load-balancer-subnet is currently available as an alpha feature.
To enable it, the cloud provider config passed to the GCP cloud-controller-manager needs to contain alpha-features="ILBCustomSubnet".
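
For illustration, a Service of type LoadBalancer carrying these annotations could look like the following sketch (service name, selector, and ports are placeholders, not part of the proposal):

apiVersion: v1
kind: Service
metadata:
  name: my-internal-service        # hypothetical name
  annotations:
    networking.gke.io/load-balancer-type: Internal
    networking.gke.io/internal-load-balancer-allow-global-access: "true"   # would be mutated to "false" when .internal.localAccess == true
    networking.gke.io/internal-load-balancer-subnet: subnet-name-in-myvpc  # enforced when .internal.subnet is set
spec:
  type: LoadBalancer
  selector:
    app: my-app                    # hypothetical selector
  ports:
  - port: 80
    targetPort: 8080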

Why is this needed:
There are scenarios where users need to create a VPC upfront with a subnet inside that is routable in other contexts, e.g. internal networks.

cc @DockToFuture

@dkistner dkistner added the kind/enhancement Enhancement, improvement, extension label Mar 27, 2020
@rfranzke
Member

Thanks @dkistner. A few suggestions:

I propose to extend this approach to allow users to specify an existing subnet, which can likewise be used to pick IP addresses for internal load balancer services.

  1. Sure, makes perfect sense. Shouldn't we then also allow the same for the worker subnets? Also, instead of .internal.subnet I would call it name:
internal:
  cidr: 10.251.0.0/16       
  name: subnet-name-in-myvpc

The .internal.internalOnly flag specifies that all load balancer services in the cluster need to be internal ones (including vpn-shoot). That can be enforced and/or validated via webhooks.

  2. Why do we need such a thing? Isn't this something that is in the end user's responsibility? If they want to enforce that only internal LBs can be created, they can install this webhook on their own.

The .internal.localAccess flag could be used to limit access to the internal load balancers to within the VPC.

  3. Same as 2., or?

@dkistner
Member Author

For 2. and 3.: doesn't that clash with the gardener-resource-manager? There would be a diff in the resource definition, which would then be reverted, mutated again, and so on. I'm thinking of Gardener-managed resources like the vpn-shoot service. If that's not the case, then I'm completely with you on leaving it in the user's responsibility.

@rfranzke
Copy link
Member

The gardener-resource-manager applies its desired state every minute; it doesn't check the actual state and compute a diff, it simply applies. Hence, any webhook can freely mutate the object as desired. This also allows extensions to implement shoot webhooks.

@stiller-leser

Hi @rfranzke,

our setup is the basis for what @dkistner described. Essentially, I agree that this could certainly be put into the hands of the user. I think it is more a question of what benefit Gardener gains from (not) offering it, compared to how much strain it puts on the user. In our team the Kubernetes knowledge is quite high, yet we had to spend multiple weeks figuring out what is necessary and how the final setup should look (especially in regards to the .internal.localAccess flag, which translates to whether or not global access, i.e. cross-zone traffic, is allowed on the LB).

Additionally, from a security point of view, it would give me some peace of mind knowing that there is no way a user can create an external load balancer. Yes, I could implement it with OPA or pure webhooks, but I still feel it would be more straightforward to offer it from an infrastructure point of view.

@rfranzke
Member

I think it is more a question of what benefit Gardener gains from (not) offering it, compared to how much strain it puts on the user.

From my point of view this sounds more like an add-on on top of ANY existing cluster - there is nothing Gardener-specific in it. What do you mean by "strain"? Deploying the webhook into the cluster?

In our team the Kubernetes knowledge is quite high, yet we had to spend multiple weeks figuring out what is necessary and how the final setup should look (especially in regards to the .internal.localAccess flag, which translates to whether or not global access, i.e. cross-zone traffic, is allowed on the LB).

Can you explain more, maybe I haven't understood it completely. From what I got from @dkistner's description above, the only thing that needs to be done is registering a mutating webhook for Services, or? Yes, the alpha feature in the CCM config must somehow be exposed, but there is already this which could be extended from our side.

Additionally, from a security point of view, it would give myself some peace of mind knowing that there is no way a user can create an external load balancer. Yes, I could implement it with OPA or pure webhooks, but I still feel it would be more straight forward to offer it from an infrastructure point.

We/Gardener wouldn't do it any differently. We would also just register a mutating webhook in the shoot.
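
To sketch what such a registration could look like (the webhook name, service name, namespace, and path below are hypothetical placeholders, not an actual Gardener component):

apiVersion: admissionregistration.k8s.io/v1
kind: MutatingWebhookConfiguration
metadata:
  name: internal-load-balancer                       # hypothetical
webhooks:
- name: internal-lb.gcp.extensions.gardener.cloud    # hypothetical
  admissionReviewVersions: ["v1"]
  sideEffects: None
  failurePolicy: Fail
  rules:
  - apiGroups: [""]
    apiVersions: ["v1"]
    operations: ["CREATE", "UPDATE"]
    resources: ["services"]
  clientConfig:
    service:
      name: gcp-lb-webhook                           # hypothetical in-cluster webhook service
      namespace: kube-system
      path: /mutate-service

The handler behind it would then set or overwrite the annotations on every Service of type LoadBalancer it admits; since the gardener-resource-manager re-applies without diffing, the mutation sticks.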

I am not entirely against it, no worries; I'm just afraid that similar special use cases will reach us/our backlog. If the pull is high enough that's totally fine, but I've never heard of such a requirement from anybody else in the past years, so I guess it's very specific to your setup.

@DockToFuture DockToFuture self-assigned this Mar 31, 2020
@zanetworker
Contributor

zanetworker commented Mar 31, 2020

@dkistner how will we solve the API-server-to-VPN connectivity issue? Does this mean we will have to provision a dedicated seed in that same network and have every shoot with internalOnly scheduled on that seed? I guess this issue maybe makes sense to realize once we invert the VPN connection direction?

@rfranzke
Member

I think yes, at least as long as we haven't inverted the direction of the VPN. Are there any plans to work on this?

@zanetworker
Contributor

zanetworker commented Mar 31, 2020

"this" as in the inversion of the VPN, or this feature?

If you mean this feature, I would agree with what you said: if there is enough pull for it, we can start working on it.

If you mean the VPN inversion: yes, but not in the very near future.

@rfranzke
Member

No no, I meant the VPN inversion. Alright, thanks.
Regarding the feature: we discussed it with @dkistner and @stiller-leser and decided that it makes sense to implement it in the GCP extension, as there are things that cannot be influenced by the end user (e.g., the initial creation of a public LB for the VPN shoot). @DockToFuture offered to start looking into it next week or the week after.

@dkistner
Member Author

dkistner commented Apr 6, 2020

@dkistner how will we solve the API-server-to-VPN connectivity issue? Does this mean we will have to provision a dedicated seed in that same network and have every shoot with internalOnly scheduled on that seed? I guess this issue maybe makes sense to realize once we invert the VPN connection direction?

Yes, that is needed until the VPN connection is initialised by the Shoot.

How about structuring the internal section this way?

apiVersion: gcp.provider.extensions.gardener.cloud/v1beta1
kind: InfrastructureConfig
...
internal:
  cidr: 10.251.0.0/16             # cidr of the subnet for the internal lb
  name: subnet-name-in-myvpc      # name of the subnet for the internal lb
  internalAccess:
    enabled: true|false             # only internal load balancer services are allowed in the cluster
    denyExternalAccess: false|true  # internal load balancer services deny external traffic from outside of vpc/region

If the cluster should not be explicitly internal-only, we can simply omit the .internal.internalAccess section.

@zanetworker
Contributor

zanetworker commented Apr 6, 2020

@dkistner but we need some means to identify that this shoot should be an internal shoot and thus scheduled to the internal seed, or?

@stiller-leser

Hi,

Wouldn't it make sense to name internalAccess.enabled a little more explicitly? Something along the lines of internalAccess.allowOnlyInternalLoadBalancers? Also, do we need the cidr part if the name of the subnet is given?

@dkistner
Member Author

dkistner commented Apr 6, 2020

@dkistner but we need some means to identify that this shoot should be an internal shoot and thus scheduled to the internal seed, or?

What could be an indicator for that? I think currently the user needs to know which Seed is reachable network-wise and needs to pin it via .spec.seedName.
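
For example, pinning a Shoot to such a seed could look like this (the seed name is a placeholder):

apiVersion: core.gardener.cloud/v1beta1
kind: Shoot
...
spec:
  seedName: my-internal-seed   # hypothetical seed that can reach the internal network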

Wouldn't it make sense to name internalAccess.enabled a little more explicitly? Something along the lines of internalAccess.allowOnlyInternalLoadBalancers? Also, do we need the cidr part if the name of the subnet is given?

I'm ok with more explicit names. A user can either specify .name or .cidr. If the user specifies a cidr, then a subnet will be created by the gardener-extension (this is what we already have today). Otherwise, the specified existing subnet will be used.
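
To make the two mutually exclusive variants concrete (the subnet name is a placeholder):

# Variant 1: the extension creates the subnet (existing behaviour)
internal:
  cidr: 10.251.0.0/16

# Variant 2: an existing subnet is used as-is
internal:
  name: subnet-name-in-myvpc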

@gardener-robot gardener-robot added the lifecycle/stale Nobody worked on this for 6 months (will further age) label Jun 6, 2020
@gardener-robot gardener-robot added lifecycle/rotten Nobody worked on this for 12 months (final aging stage) and removed lifecycle/stale Nobody worked on this for 6 months (will further age) labels Sep 19, 2020