Proxying doesn't work for services with different port and targetPort #299

Closed
Lookyan opened this issue Mar 23, 2018 · 14 comments
Labels
kind/bug Categorizes issue or PR as related to a bug.

@Lookyan
Contributor

Lookyan commented Mar 23, 2018

I found an interesting problem. When we have a Kubernetes service like this:

apiVersion: v1
kind: Service
metadata:
  name: ...
  labels:
    service: ...
    app: service
spec:
  type: ClusterIP
  ports:
    - port: 8890
      targetPort: 8888
  selector:
    service: ...
    app: service

then it doesn't work because, as far as I understand, Envoy tries to proxy to port 8890, but the endpoint listens on port 8888. So we get an answer from Envoy that this service doesn't have any healthy endpoints.
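
For anyone reproducing this, a quick way to compare the Service's port with the targetPort that the Endpoints actually expose (the service name below is a placeholder):

# Service port (what clients are told) vs. targetPort (what the pods listen on).
kubectl get service my-service -o jsonpath='{.spec.ports[*].port} {.spec.ports[*].targetPort}{"\n"}'

# The Endpoints object records the resolved target port for each pod address.
kubectl get endpoints my-service -o wide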

@davecheney
Contributor

I've had several goes at fixing this problem and I was hopeful I'd nailed it last time. Can you please grab a copy of /clusters (or run the contour cli cds k exec trick) and the Service document from the API, and I'll look into it.

Thanks
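
(For reference, the "contour cli cds k exec trick" looks roughly like the following; the namespace, pod name, and container name are placeholders that depend on how Contour was deployed:)

# Dump the CDS response Contour is serving, straight from the contour container.
kubectl -n heptio-contour exec -it contour-xxxxx -c contour -- contour cli cds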


@davecheney davecheney added the kind/bug Categorizes issue or PR as related to a bug. label Mar 24, 2018
@davecheney davecheney added this to the 0.5.0 milestone Mar 24, 2018
@Lookyan
Contributor Author

Lookyan commented Mar 24, 2018

Maybe I was mistaken about the cause of the problem, because EDS gives me the right endpoint (it works):

resources: <
  [type.googleapis.com/envoy.api.v2.ClusterLoadAssignment]: <
    cluster_name: "kube-system/kubernetes-dashboard"
    endpoints: <
      lb_endpoints: <
        endpoint: <
          address: <
            socket_address: <
              address: "10.7.40.134"
              port_value: 9090
            >
          >
        >
      >
    >
  >
>

Cluster:

resources: <
  [type.googleapis.com/envoy.api.v2.Cluster]: <
    name: "kube-system/kubernetes-dashboard/80"
    type: EDS
    eds_cluster_config: <
      eds_config: <
        api_config_source: <
          api_type: GRPC
          cluster_names: "contour"
        >
      >
      service_name: "kube-system/kubernetes-dashboard"
    >
    connect_timeout: <
      nanos: 250000000
    >
  >
>

And route:

    virtual_hosts: <
      name: "dashboard.host"
      domains: "dashboard.host"
      routes: <
        match: <
          prefix: "/"
        >
        route: <
          cluster: "kube-system/kubernetes-dashboard/80"
        >
      >
    >

describe svc

kubectl describe svc -n kube-system kubernetes-dashboard

Name:              kubernetes-dashboard
Namespace:         kube-system
Labels:            addonmanager.kubernetes.io/mode=Reconcile
                   k8s-app=kubernetes-dashboard
                   kubernetes.io/cluster-service=true
Annotations:       kubectl.kubernetes.io/last-applied-configuration={"apiVersion":"v1","kind":"Service","metadata":{"annotations":{},"labels":{"k8s-app":"kubernetes-dashboard"},"name":"kubernetes-dashboard","namespace":...
Selector:          k8s-app=kubernetes-dashboard
Type:              ClusterIP
IP:                10.100.126.135
Port:              <unset>  80/TCP
TargetPort:        9090/TCP
Endpoints:         10.7.40.134:9090

But I still get this from Envoy in its logs:

source/common/upstream/cluster_manager_impl.cc:755] no healthy host for HTTP connection pool

@davecheney
Contributor

davecheney commented Mar 25, 2018 via email

@Lookyan
Contributor Author

Lookyan commented Mar 26, 2018

It talks HTTP.

@davecheney
Contributor

davecheney commented Mar 27, 2018

Can you port-forward to the [Envoy admin interface][0] and grab the relevant portion of the /clusters output? I'm interested to see whether no endpoints are registered, or whether they are registered but not responding.

[0] https://github.com/heptio/contour/blob/master/docs/troubleshooting.md#access-the-envoy-admin-interface-remotely
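
(A minimal sketch of that port-forward, assuming the Envoy admin listener is on port 9001 as in the stock deployment and using a placeholder pod name:)

# Forward the Envoy admin interface locally, then pull the cluster membership.
kubectl -n heptio-contour port-forward contour-xxxxx 9001:9001 &
curl -s http://127.0.0.1:9001/clusters | grep kubernetes-dashboard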

@Lookyan
Contributor Author

Lookyan commented Mar 27, 2018

This is /clusters grepped for dashboard:

kube-system/kubernetes-dashboard/80::default_priority::max_connections::1024
kube-system/kubernetes-dashboard/80::default_priority::max_pending_requests::1024
kube-system/kubernetes-dashboard/80::default_priority::max_requests::1024
kube-system/kubernetes-dashboard/80::default_priority::max_retries::3
kube-system/kubernetes-dashboard/80::high_priority::max_connections::1024
kube-system/kubernetes-dashboard/80::high_priority::max_pending_requests::1024
kube-system/kubernetes-dashboard/80::high_priority::max_requests::1024
kube-system/kubernetes-dashboard/80::high_priority::max_retries::3
kube-system/kubernetes-dashboard/80::added_via_api::true

And these are the dashboard stats after one failed request:

cluster.kube-system/kubernetes-dashboard/80.bind_errors: 0
cluster.kube-system/kubernetes-dashboard/80.internal.upstream_rq_503: 1
cluster.kube-system/kubernetes-dashboard/80.internal.upstream_rq_5xx: 1
cluster.kube-system/kubernetes-dashboard/80.lb_healthy_panic: 1
cluster.kube-system/kubernetes-dashboard/80.lb_local_cluster_not_ok: 0
cluster.kube-system/kubernetes-dashboard/80.lb_recalculate_zone_structures: 0
cluster.kube-system/kubernetes-dashboard/80.lb_subsets_active: 0
cluster.kube-system/kubernetes-dashboard/80.lb_subsets_created: 0
cluster.kube-system/kubernetes-dashboard/80.lb_subsets_fallback: 0
cluster.kube-system/kubernetes-dashboard/80.lb_subsets_removed: 0
cluster.kube-system/kubernetes-dashboard/80.lb_subsets_selected: 0
cluster.kube-system/kubernetes-dashboard/80.lb_zone_cluster_too_small: 0
cluster.kube-system/kubernetes-dashboard/80.lb_zone_no_capacity_left: 0
cluster.kube-system/kubernetes-dashboard/80.lb_zone_number_differs: 0
cluster.kube-system/kubernetes-dashboard/80.lb_zone_routing_all_directly: 0
cluster.kube-system/kubernetes-dashboard/80.lb_zone_routing_cross_zone: 0
cluster.kube-system/kubernetes-dashboard/80.lb_zone_routing_sampled: 0
cluster.kube-system/kubernetes-dashboard/80.max_host_weight: 0
cluster.kube-system/kubernetes-dashboard/80.membership_change: 0
cluster.kube-system/kubernetes-dashboard/80.membership_healthy: 0
cluster.kube-system/kubernetes-dashboard/80.membership_total: 0
cluster.kube-system/kubernetes-dashboard/80.retry_or_shadow_abandoned: 0
cluster.kube-system/kubernetes-dashboard/80.update_attempt: 1
cluster.kube-system/kubernetes-dashboard/80.update_empty: 0
cluster.kube-system/kubernetes-dashboard/80.update_failure: 0
cluster.kube-system/kubernetes-dashboard/80.update_rejected: 0
cluster.kube-system/kubernetes-dashboard/80.update_success: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_active: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_close_notify: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_connect_attempts_exceeded: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_connect_fail: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_connect_timeout: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_destroy: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_destroy_local: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_destroy_local_with_active_rq: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_destroy_remote: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_destroy_remote_with_active_rq: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_destroy_with_active_rq: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_http1_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_http2_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_max_requests: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_none_healthy: 1
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_overflow: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_protocol_error: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_rx_bytes_buffered: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_rx_bytes_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_tx_bytes_buffered: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_cx_tx_bytes_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_flow_control_backed_up_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_flow_control_drained_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_flow_control_paused_reading_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_flow_control_resumed_reading_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_503: 1
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_5xx: 1
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_active: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_cancelled: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_maintenance_mode: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_pending_active: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_pending_failure_eject: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_pending_overflow: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_pending_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_per_try_timeout: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_retry: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_retry_overflow: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_retry_success: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_rx_reset: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_timeout: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_total: 0
cluster.kube-system/kubernetes-dashboard/80.upstream_rq_tx_reset: 0
cluster.kube-system/kubernetes-dashboard/80.version: 0

And the Envoy trace-level logs at the moment of the request:

[2018-03-27 07:58:20.056][53598][debug][router] source/common/router/router.cc:250] [C18][S10501861201508902460] cluster 'kube-system/kubernetes-dashboard/80' match for URL '/'
[2018-03-27 07:58:20.056][53598][debug][upstream] source/common/upstream/cluster_manager_impl.cc:755] no healthy host for HTTP connection pool
[2018-03-27 07:58:20.056][53598][debug][http] source/common/http/conn_manager_impl.cc:935] [C18][S10501861201508902460] encoding headers via codec (end_stream=false):
[2018-03-27 07:58:20.056][53598][debug][http] source/common/http/conn_manager_impl.cc:940] [C18][S10501861201508902460]   ':status':'503'
[2018-03-27 07:58:20.056][53598][debug][http] source/common/http/conn_manager_impl.cc:940] [C18][S10501861201508902460]   'content-length':'19'
[2018-03-27 07:58:20.056][53598][debug][http] source/common/http/conn_manager_impl.cc:940] [C18][S10501861201508902460]   'content-type':'text/plain'
[2018-03-27 07:58:20.056][53598][debug][http] source/common/http/conn_manager_impl.cc:940] [C18][S10501861201508902460]   'date':'Tue, 27 Mar 2018 04:58:19 GMT'
[2018-03-27 07:58:20.056][53598][debug][http] source/common/http/conn_manager_impl.cc:940] [C18][S10501861201508902460]   'server':'envoy'
[2018-03-27 07:58:20.056][53598][trace][connection] source/common/network/connection_impl.cc:323] [C18] writing 134 bytes, end_stream false
[2018-03-27 07:58:20.056][53598][trace][http] source/common/http/conn_manager_impl.cc:999] [C18][S10501861201508902460] encoding data via codec (size=19 end_stream=true)

@davecheney
Contributor

davecheney commented Mar 27, 2018 via email

@Lookyan
Contributor Author

Lookyan commented Mar 27, 2018

How can I debug this and understand what's wrong with the EDS communication? Only by using tcpdump to see what's going on there?

@davecheney
Contributor

davecheney commented Mar 27, 2018 via email

@Lookyan
Contributor Author

Lookyan commented Mar 28, 2018

It seems like it's related to #291, because when I decreased the number of clusters in Envoy, the same Kubernetes configuration started to work as expected.

@davecheney
Contributor

davecheney commented Mar 28, 2018

Can you please set up port forwarding to the contour admin API [0] and then paste the results of this command:

curl http://127.0.0.1:9001/clusters | grep ^contour

@davecheney
Contributor

Thanks to Alexander Lukyanchenko (@Lookyan), we have increased the general gRPC limits on both the Envoy client and the Contour server well above anything that should be an issue for the immediate future.

The symptoms of hitting the gRPC limits vary, but they are basically "Envoy doesn't see changes in the API server until I restart it". The underlying cause is likely that you have a large number of Service objects in your cluster (more than 100, possibly 200; the exact limit is not precisely known) -- these don't have to be associated with an Ingress. Currently Contour creates a CDS Cluster record for every Service object it learns about through the API, see #298. Each CDS record causes Envoy to open a new EDS stream, one per Cluster, which can blow through the default limits that Envoy, as the gRPC client, and Contour, as the gRPC server, have set.
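
(A quick way to count the Service objects in a cluster, to see whether you are near that range:)

# Every Service, in every namespace, becomes a CDS Cluster record in Contour.
kubectl get services --all-namespaces --no-headers | wc -l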

One of the easiest ways to detect whether this issue is occurring in your cluster is to look for lines about "cluster warming"

[2018-04-03 03:34:16.920][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:388] add/update cluster test2/reverent-noether/80 starting warming
[2018-04-03 03:34:16.922][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:388] add/update cluster test2/serene-bohr/80 starting warming
[2018-04-03 03:34:16.924][1][info][upstream] source/common/upstream/cluster_manager_impl.cc:388] add/update cluster test2/sleepy-hugle/80 starting warming

Without a matching "warming complete" message.
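
(One rough way to check, assuming a two-container Contour pod with the Envoy container named "envoy"; the pod name is a placeholder:)

# Compare the number of clusters that started warming with the number that finished warming.
kubectl -n heptio-contour logs contour-xxxxx -c envoy | grep -c 'starting warming'
kubectl -n heptio-contour logs contour-xxxxx -c envoy | grep -c 'warming complete'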

We believe we have addressed this issue, along with #291, and the fixes are now available to test.

These changes are in master now, and available in the gcr.io/heptio-images/contour:master image for you to try.

These changes have also been backported to the release-0.4 branch and are available in a short-lived image, gcr.io/heptio-images/contour:release-0.4 (it's not going to be deleted, but don't expect it to continue to be updated beyond the 0.4.1 release).
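
(One way to try it, assuming the stock deployment name and namespace; adjust to your install:)

# Point the existing Contour deployment at the fixed image and watch it roll out.
kubectl -n heptio-contour set image deployment/contour contour=gcr.io/heptio-images/contour:master
kubectl -n heptio-contour rollout status deployment/contour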

@davecheney
Contributor

Not sure why GitHub won't auto-close this, but it's fixed.
