
Add optional firewall support #452

Closed · Dev25 opened this issue Mar 11, 2020 · 4 comments · Fixed by #470

Comments

Dev25 (Contributor) commented Mar 11, 2020

Consider adding an opt-in way for this module to create common/useful firewall rules related to the cluster. I always end up creating two extra firewall rules myself, either because I'm running in a slightly more restricted VPC or for certain workloads (admission controllers).

1. Explicit egress rule for cluster traffic

GKE already creates an ingress rule for the cluster and then relies on the implicit allow-all-egress rule that VPCs get by default. That assumption doesn't hold when you are running locked-down VPCs that add a default-deny-all egress rule (also with intranode visibility enabled).

resource "google_compute_firewall" "egress" {
  name      = "gke-${var.name}-allow-intra-cluster-egress"
  project   = var.network_project_id
  network   = var.network
  direction = "EGRESS"
  priority  = 2000
  target_tags = ["gke-${var.name}"]  
  destination_ranges = [
    # GKE Master - Only for Private clusters!
    var.master_ipv4_cidr_block,
    # GKE Node Subnet
    data.google_compute_subnetwork.gke_subnetwork.ip_cidr_range,
    # GKE Alias Ranges
    local.alias_ranges_cidr[var.ip_range_pods],
    local.alias_ranges_cidr[var.ip_range_services]
  ]

  allow { protocol = "tcp"}
  allow { protocol = "udp"}
  allow { protocol = "icmp"}
}

2. Allow non-443 ports from the GKE master to nodes for admission controllers

This is needed when cluster workloads include webhooks/admission controllers that listen on non-443 ports. Without this rule you end up seeing timeouts, and pods are blocked from being created.

kubernetes/kubernetes#79739

Examples: 9443 (Elasticsearch Operator elastic/cloud-on-k8s#896), 15017 (Istio 1.5+ istio/istio#19532)

# Allow GKE master to hit non 443 ports for Webhooks/Admission Controllers
# https://github.com/kubernetes/kubernetes/issues/79739
resource "google_compute_firewall" "master_webhooks" {
  name      = "gke-${var.name}-allow-master-webhooks"
  project   = var.network_project_id
  network   = var.network
  direction = "INGRESS"
  priority  = 2000

  allow {
    protocol = "tcp"
    ports    = var.webhook_ports
  }

  source_ranges = [var.master_ipv4_cidr_block] # Public Cluster: "${module.gke.endpoint}/32"
  target_tags   = ["gke-${var.name}"]
}
morgante (Contributor) commented

Your second rule definitely sounds like a good addition (assuming we have variables to control it).

For your first rule, what will break if you don't have it? Will GKE itself break, or just some applications? I'm reluctant to override firewall rules when many organizations have alternative/preferred egress paths.

Dev25 (Contributor, Author) commented Mar 12, 2020

The Terraform examples above were more or less pulled from my current usage; for upstreaming, most of this would be controlled by variables and be opt-in, e.g. the variables below (with a rough sketch of how they'd gate the resources after them):

firewall_create_intra_cluster_egress_rule = false # create rule 1 if true
firewall_allow_ingress_webhook_ports      = []    # create rule 2 if non-empty, using the given ports
firewall_priority                         = 2000
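
Rough sketch of how those variables could gate the two resources (illustrative only; the variable names are just the proposed ones above):

# Rule 1: only created when explicitly enabled
resource "google_compute_firewall" "intra_cluster_egress" {
  count = var.firewall_create_intra_cluster_egress_rule ? 1 : 0

  name        = "gke-${var.name}-allow-intra-cluster-egress"
  project     = var.network_project_id
  network     = var.network
  direction   = "EGRESS"
  priority    = var.firewall_priority
  target_tags = ["gke-${var.name}"]

  # destination_ranges as in rule 1 above (master/node/pod/service CIDRs)

  allow { protocol = "tcp" }
  allow { protocol = "udp" }
  allow { protocol = "icmp" }
}

# Rule 2: only created when webhook ports are given
resource "google_compute_firewall" "master_webhooks" {
  count = length(var.firewall_allow_ingress_webhook_ports) > 0 ? 1 : 0

  name      = "gke-${var.name}-allow-master-webhooks"
  project   = var.network_project_id
  network   = var.network
  direction = "INGRESS"
  priority  = var.firewall_priority

  allow {
    protocol = "tcp"
    ports    = var.firewall_allow_ingress_webhook_ports
  }

  source_ranges = [var.master_ipv4_cidr_block]
  target_tags   = ["gke-${var.name}"]
}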

Regarding 1, my scenario was a VPC with a default-deny-all-egress rule plus rules allowing common outbound ports (80/443), so it's not a completely locked-down VPC.

Spinning up a GKE cluster with enable_intranode_visibility will break the cluster. I've not yet tested without intranode visibility, but I suspect it will still break to a lesser degree, affecting traffic between nodes rather than all pod traffic.

Specifically, I was seeing kube-system workloads going into CrashLoopBackOff; DNS breaks, along with any pod <-> pod traffic.

e.g. DNS breaks since you can't reach the kube-dns pods on UDP/53; the kube-dns pods themselves keep working because VPC firewalls always allow access to the metadata server.

Example of I/O timeouts during DNS resolution:

connection error: desc = "transport: Error while dialing dial tcp: lookup istio-galley.istio-system.svc on 10.250.0.10:53: read udp 10.240.0.45:47715->10.250.0.10:53: i/o timeout"

I did raise a support case last summer regarding the intranode/egress behaviour; unfortunately I don't think that went anywhere. Ideally GKE itself would create an egress rule for internal cluster traffic, just as it manages the ingress rules.

This workaround rule is similar to the ingress rules managed by GKE in that it restricts the destination to the cluster CIDRs plus cluster-specific network tags. In fact, we could probably restrict it to just the pod + master CIDR blocks and it should still work fine (no need for the node/service ranges); rough sketch below.
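
Untested, but the trimmed-down version of rule 1 would look something like this (only the pod and master CIDRs as destinations):

resource "google_compute_firewall" "egress" {
  name        = "gke-${var.name}-allow-intra-cluster-egress"
  project     = var.network_project_id
  network     = var.network
  direction   = "EGRESS"
  priority    = 2000
  target_tags = ["gke-${var.name}"]

  destination_ranges = [
    # GKE Master - Only for Private clusters!
    var.master_ipv4_cidr_block,
    # GKE pod alias range
    local.alias_ranges_cidr[var.ip_range_pods],
  ]

  allow { protocol = "tcp" }
  allow { protocol = "udp" }
  allow { protocol = "icmp" }
}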

morgante (Contributor) commented Mar 12, 2020

Got it. In that case I think both firewall rules make sense (provided they can be disabled). Happy to accept a PR.

haf commented Mar 31, 2020

1000+ for this; the comment at kubernetes/kubernetes#79739 (comment) fairly accurately spells out the problem people (including me) run into any time a new cluster is provisioned.
