GKE LoadBalancer doesn't work with service deployed by Skaffold #887

Closed
thesandlord opened this issue Aug 8, 2018 · 15 comments
Labels: area/deploy, area/labels, deploy-health-check, deploy-labeller, help wanted, kind/bug, priority/p1

Comments

@thesandlord

I have a service with type: LoadBalancer that I deploy with Skaffold. The service is created fine and the load balancer shows as healthy in the GCP console, but when I run kubectl get svc the external IP address never gets resolved and is stuck in <pending>. Everything works if I deploy the same service using kubectl apply.

I actually have this on video as well: https://youtu.be/JUFIF9QMN9M?t=1630

This has happened multiple times with multiple clusters, projects, and services. @ahmetb is experiencing the same issue as well.

Right now, I'm thinking there is something Skaffold does to the service (labels?) which is preventing the service from getting the external IP address.

Information

  • Skaffold version: v0.11.0
  • Operating system: Linux
  • Contents of skaffold.yaml:

Service YAML

apiVersion: v1
kind: Service
metadata:
  name: uptimecheck
  labels:
    app: uptimecheck
spec:
  type: LoadBalancer
  ports:
  - port: 80
    targetPort: 3000
    protocol: TCP
    name: http
  selector:
    app: "uptimecheck"

Skaffold YAML

apiVersion: skaffold/v1alpha2
kind: Config
build:
  artifacts:
  - imageName: gcr.io/xxx/xxx
deploy:
  kubectl:
    manifests:
      - svc.yaml

Steps to reproduce the behavior

skaffold dev
@ahmetb
Contributor

ahmetb commented Aug 8, 2018

I am seeing the same.

Unless I use a static IP, a Service with type=LoadBalancer never gets an IP on a vanilla GKE cluster:

  • if I go to Google Cloud Console, I see an IP for the LB
  • but the IP is actually not associated with the LB on Kubernetes API
  • overall, hitting the IP doesn't work even though it shows up on the UI

I know at least one more person who deployed the https://github.com/GoogleCloudPlatform/microservices-demo/ and reproed it. So that might be the easiest repro available in open source.

YAML:

apiVersion: v1
kind: Service
metadata:
  annotations:
    kubectl.kubernetes.io/last-applied-configuration: |
      {"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"frontend-external","namespace":"default"},"spec":{"ports":[{"name":"http","port":80,"targetPort":8080}],"selector":{"app":"frontend"},"type":"LoadBalancer"}}
  creationTimestamp: 2018-07-17T19:12:13Z
  labels:
    cleanup: "true"
    deployed-with: skaffold
    docker-api-version: "1.38"
    skaffold-builder: local
    skaffold-deployer: kubectl
    skaffold-tag-policy: git-commit
  name: frontend-external
  namespace: default
  resourceVersion: "4524845"
  selfLink: /api/v1/namespaces/default/services/frontend-external
  uid: 50092bab-89f5-11e8-a2bb-42010a80009c
spec:
  clusterIP: 10.19.250.58
  externalTrafficPolicy: Cluster
  ports:
  - name: http
    nodePort: 30751
    port: 80
    protocol: TCP
    targetPort: 8080
  selector:
    app: frontend
  sessionAffinity: None
  type: LoadBalancer
status:
  loadBalancer: {}

describe output:

Name:                     frontend-external
Namespace:                default
Labels:                   cleanup=true
                          deployed-with=skaffold
                          docker-api-version=1.38
                          skaffold-builder=local
                          skaffold-deployer=kubectl
                          skaffold-tag-policy=git-commit
Annotations:              kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"name":"frontend-external","namespace":"default"},"spec":{"ports":[{"name":"http","por...
Selector:                 app=frontend
Type:                     LoadBalancer
IP:                       10.19.250.58
Port:                     http  80/TCP
TargetPort:               8080/TCP
NodePort:                 http  30751/TCP
Endpoints:                10.16.2.99:8080
Session Affinity:         None
External Traffic Policy:  Cluster
Events:                   <none>
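
The symptom in the YAML above is the empty `status.loadBalancer: {}`. As a minimal sketch (not part of Skaffold), a check for it could look like the following, assuming the Service has been loaded into a dict, for example from `kubectl get svc -o json`. The function name `external_addresses` and the sample address are hypothetical; only the `status.loadBalancer.ingress` field layout comes from the Kubernetes Service API.

```python
from typing import Any, Dict, List


def external_addresses(service: Dict[str, Any]) -> List[str]:
    """Return the IPs/hostnames listed under status.loadBalancer.ingress."""
    status = service.get("status") or {}
    load_balancer = status.get("loadBalancer") or {}
    ingress = load_balancer.get("ingress") or []
    addresses = []
    for entry in ingress:
        # A cloud load balancer reports either an IP or a hostname.
        address = entry.get("ip") or entry.get("hostname")
        if address:
            addresses.append(address)
    return addresses


# Same shape as the stuck Service above: status.loadBalancer is empty.
stuck = {"spec": {"type": "LoadBalancer"}, "status": {"loadBalancer": {}}}
# Shape of a healthy Service; the IP is a placeholder documentation address.
healthy = {"status": {"loadBalancer": {"ingress": [{"ip": "203.0.113.10"}]}}}

print(external_addresses(stuck))
print(external_addresses(healthy))
```

An empty list here corresponds to the `<pending>` value that kubectl get svc shows in the EXTERNAL-IP column.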

@balopat balopat added the kind/bug Something isn't working label Aug 8, 2018
@ahmetb
Contributor

ahmetb commented Aug 17, 2018

I got another person to repro this too.

@balopat balopat self-assigned this Aug 17, 2018
@ahmetb
Contributor

ahmetb commented Aug 17, 2018

Progress debugging this: if I run skaffold delete, wait 5 minutes (so the underlying GCE networking resources are deleted), and redeploy with skaffold run, I can repro this 100% of the time.

+Bonus: if I do kubectl get -o=yaml service/frontend-external | kubectl apply -f- which causes a "re-apply", then it gets the EXTERNAL-IP right away.

@balopat
Contributor

balopat commented Aug 17, 2018

I can confirm that this is an issue with how we do the labelling: we update the labels immediately after the service is deployed, which then confuses the load balancer.
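
One way to avoid the second update entirely would be to merge the labels into the manifest before it is applied, so the Service is written only once. The sketch below is hypothetical and is not how Skaffold is implemented; the helper name `with_labels` and the choice to let the manifest's own labels win on conflict are assumptions. The label keys are the ones visible on the Service earlier in this thread.

```python
import copy
from typing import Any, Dict


def with_labels(manifest: Dict[str, Any], labels: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the manifest with extra labels merged into metadata.labels."""
    labeled = copy.deepcopy(manifest)
    metadata = labeled.get("metadata") or {}
    existing = metadata.get("labels") or {}
    # Assumption: labels already in the manifest take precedence over added ones.
    metadata["labels"] = {**labels, **existing}
    labeled["metadata"] = metadata
    return labeled


service = {
    "apiVersion": "v1",
    "kind": "Service",
    "metadata": {"name": "uptimecheck", "labels": {"app": "uptimecheck"}},
    "spec": {"type": "LoadBalancer"},
}
tool_labels = {"deployed-with": "skaffold", "skaffold-deployer": "kubectl"}

labeled_service = with_labels(service, tool_labels)
print(labeled_service["metadata"]["labels"])
```

The labeled manifest would then be applied once, so there is no later label update on a freshly created Service.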

@ahmetb
Contributor

ahmetb commented Aug 18, 2018

AWESOME! Thanks @balopat .

I was seeing the last-applied-configuration even on a clean skaffold run which got me thinking whether skaffold is applying things twice.

Then I thought "I guess this annotation just exists when you deploy things with kubectl-apply". I shouldn't have thought that. Well at least now we know what to fix. 🥇

@balopat
Contributor

balopat commented Aug 22, 2018

an update: we are thinking about how to get around the labelling issue, some of the crappy alternatives that came up are:

  1. don't label services at all (works, but not ideal as it's inconsistent)
  2. label loadbalancer services only after the external ip is assigned (there might be other issues preventing the ip from ever being assigned)
  3. label loadbalancer services after a certain timeout (e.g. 2 minutes is mostly good for GKE)
  4. maybe 2 with a timeout and then 3 combined?
  5. look again deeper into the design of labelling and rethink it (needs more time)
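
To make option 4 concrete, here is a rough sketch of "wait for the external IP, but give up after a timeout, then label either way". Everything in it is hypothetical: `get_ip` and `apply_labels` are placeholder callables standing in for whatever would query and patch the Service, and the 120 second default only mirrors the "2 minutes" guess above.

```python
import time
from typing import Callable, Optional


def wait_for_external_ip(
    get_ip: Callable[[], Optional[str]],
    timeout_s: float = 120.0,
    interval_s: float = 5.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[str]:
    """Poll get_ip until it returns an address or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        ip = get_ip()
        if ip:
            return ip
        if clock() >= deadline:
            return None
        sleep(interval_s)


def label_when_ready(
    get_ip: Callable[[], Optional[str]],
    apply_labels: Callable[[], None],
    **wait_options: float,
) -> bool:
    """Label once the IP is assigned, or after the timeout (options 2 and 3)."""
    ip = wait_for_external_ip(get_ip, **wait_options)
    # Labels are applied in both cases so that the service never stays unlabeled.
    apply_labels()
    return ip is not None
```

The `clock` and `sleep` parameters exist only so the polling loop can be exercised without real waiting. The return value of `label_when_ready` tells the caller whether the IP arrived before the timeout.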

@ahmetb
Contributor

ahmetb commented Aug 22, 2018

I think ideally this should be fixed in Kubernetes core. The service controller should not be easily confused and get stuck. If you have a reliable repro, please open an issue to kubernetes/kubernetes.

@balopat
Contributor

balopat commented Aug 23, 2018

I don't think this is Kubernetes core specific, this looks like a GKE LoadBalancer specific issue. I will open an issue with them though.

@balopat
Contributor

balopat commented Aug 23, 2018

repro is super easy:

export app=mysvc; kubectl run $app --image nginx && kubectl expose deployment/$app --port 80 --type LoadBalancer && kubectl edit svc/$app

add a label in the edit command and you'll get the same issue

@ahmetb
Contributor

ahmetb commented Aug 23, 2018

> Kubernetes core specific

The service controller (+cloudprovider support) is in Kubernetes core (https://github.com/kubernetes/kubernetes/tree/master/pkg/controller/service and https://github.com/kubernetes/kubernetes/blob/master/pkg/cloudprovider/providers/gce/gce_loadbalancer_external.go), therefore I recommend opening a GitHub issue. (:

@jrbaudin

jrbaudin commented Sep 7, 2018

Just wanted to throw in a little +1 on this, experiencing the same issue

@ahmetb
Contributor

ahmetb commented Sep 7, 2018

It's fixed in kubernetes/kubernetes#68087, but the fix is currently not picked into any of the 1.12 releases.

Since this is in GKE master and GKE tends to pick up the new k8s versions through a long vetting process (i.e. today the default gke version is 1.9.7, and k8s just released 1.12.0-beta.1), it's unlikely that this will be fixed in GKE in the next 3 months.

It might be worth patching this somehow in Skaffold for the short term.

@balopat
Contributor

balopat commented Jul 16, 2019

@tejal29 this will be solved by switching over to helm template as well if we reopen/rebase #2105

@balopat balopat added the priority/p1 High impact feature/bug. label Jul 16, 2019
@nkubala
Contributor

nkubala commented Aug 15, 2019

I believe this is fixed with #2568 - I'm not able to reproduce this locally on the latest version (v0.34.0). @thesandlord @ahmetb @balopat could one of you test out and make sure it's working for you as well?

@balopat
Contributor

balopat commented Aug 15, 2019

confirmed, this should work now!

@balopat balopat closed this as completed Aug 15, 2019

6 participants