Linkerd stable-2.13.4 - linkerd_reconnect: Failed to connect error=endpoint 127.0.0.1:8090: Connection refused #11156
The message suggests that the destination controller is not able to connect to the policy controller for some reason. I'd recommend looking at the logs of the policy container in the linkerd-destination pod to see if there are any errors there.
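For reference, a kubectl invocation along these lines should surface those logs; this is a sketch assuming the control plane is installed in the default `linkerd` namespace:

```sh
# Tail the policy container inside the destination deployment
# (namespace and deployment name assume a default install)
kubectl logs -n linkerd deploy/linkerd-destination -c policy --tail=200

# The destination container in the same pod can be checked the same way
kubectl logs -n linkerd deploy/linkerd-destination -c destination --tail=200
```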
@adleong it's straightforward to replicate this using the load test here: #11055 (comment). These issues might all be related? Destination proxy logs:
Policy logs:
Hi @adleong, thanks a lot for your quick response. Please find below the logs of the policy controller:
Any idea what could be the issue?
I have seen exactly what @valentinwidmer has described. This only seems to happen when we deploy a new version of an application, and it also seems to only affect a specific application.
@valentinwidmer Seeing some warnings for a few seconds after starting a Linkerd controller isn't unexpected, since it can take a few seconds for Linkerd to sync its caches. Is this impacting your traffic, or is it just a logs issue?
@adleong I am getting a similar issue, but in the policy controller pod I get errors:
Getting a similar issue when a new version of an app is deployed. Workload error:
Policy container error:
I am facing a similar issue while installing Linkerd version 2.13.6.
I am installing Linkerd with kubectl.
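For reference, a typical kubectl-based install of a 2.13.x release looks roughly like the sketch below; the exact flags the reporter used were not included:

```sh
# Install the Linkerd CRDs first, then the control plane, using the CLI to render manifests
linkerd install --crds | kubectl apply -f -
linkerd install | kubectl apply -f -

# Verify the installation
linkerd check
```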
Same here ☝🏻
@valentinwidmer thanks for providing those logs!
This suggests that the problem is that the policy controller is failing to connect to the Kubernetes API. When it attempts to establish a connection, it receives an EOF from the Kubernetes server, causing the connection to fail. It's not clear why this is happening. The next thing to look at would be the controller metrics. These should give us some information about how many watches the control plane is maintaining, how many connections it has to the k8s API, etc. That way, we can determine whether these failed connections are due to, for example, a connection limit being reached. You can fetch the controller metrics by running the command below.
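The exact command was not captured in this thread; the usual way to dump these metrics with the Linkerd CLI is something like:

```sh
# Dump Prometheus-format metrics from the control plane components to a file
linkerd diagnostics controller-metrics > controller-metrics.txt
```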
It seems I have a similar issue.
@omidraha did you find anything? Because it is affecting production.
No, I just added more info.
@adleong could you look into this? It is impacting production. I need to be sure whether this issue is related to the compatibility of Linkerd with the Kubernetes version, because v1.10 of Linkerd works fine with GKE v1.24, but when I upgraded to v1.27 it throws an error.
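As a side note, capturing the exact versions on both sides makes this easier to triage; for example:

```sh
# Report the Linkerd CLI and control plane versions
linkerd version

# Report the Kubernetes client and server (GKE) versions
kubectl version
```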
@adleong Is there a way to exempt the destination-controller from failfast? Right now it seems like as soon as a couple of pods have problems reaching the destination-controller, it's deemed to be in failfast, and that exacerbates the problem. If it is indeed an issue with the number of watches etc., then it would be nice if the health check failed and the pod were replaced. When it goes into failfast in the middle of a canary rollout, it just causes chaos.
@omidraha It seems Linkerd is now working fine. Initially, the linkerd-proxy container tries to hit the policy container in the same pod, and the policy container needs some time to spin up, so we got the error. Once the policy container is up, the service recovers automatically, as seen in the log.
Any update on this? We are also facing a similar issue.
We are also facing a similar issue just after installing Linkerd.
@ThomasCardin @JesseAhh @bjoernw @omidraha Could I repeat the plea for the controller metrics from anyone who's seeing this? If you run the controller-metrics command above and share the output, that would help us narrow this down.
The complete log is provided here:
@omidraha, are you on the Linkerd Slack? If so, can you ping me (@flynn) there? Thanks!
Hey guys, we are also experiencing the same issue. We upgraded linkerd-control-plane-chart to v1.16.3. When this upgrade was applied to one of our Kubernetes clusters, we observed the destination service being OOMKilled. We increased resources to accommodate this; however, we still observed the same errors in the logs. We have rolled back linkerd-control-plane-chart to v1.16.2 to stabilise the cluster. Thanks
@danny-devops That sounds like a separate issue from anything mentioned here, given you are seeing OOM kills and are not using 2.13.x. Do you mind creating a new ticket so we can triage it there?
Any way to resolve this? I am facing similar issues when trying to install Linkerd on EKS. #11697
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.
What is the issue?
I have installed Linkerd via Helm (chart version 1.12.5) on an EKS cluster and am observing the errors listed below.
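For context, a Helm install of that release typically looks something like the sketch below; the chart version and certificate file names are illustrative, and the exact values used in this report were not included:

```sh
# Add the stable Helm repo, then install the CRDs and the control plane
helm repo add linkerd https://helm.linkerd.io/stable
helm repo update

helm install linkerd-crds linkerd/linkerd-crds \
  -n linkerd --create-namespace

helm install linkerd-control-plane linkerd/linkerd-control-plane \
  -n linkerd --version 1.12.5 \
  --set-file identityTrustAnchorsPEM=ca.crt \
  --set-file identity.issuer.tls.crtPEM=issuer.crt \
  --set-file identity.issuer.tls.keyPEM=issuer.key
```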
How can it be reproduced?
Installing Linkerd stable-2.13.4
Logs, error output, etc
linkerd-destination
Workload which has proxy injected
Output of `linkerd check -o short`:
Status check results are √
Environment
Possible solution
No response
Additional context
No response
Would you like to work on fixing this bug?
None