Random readiness/liveness probe failures on linkerd-destination pod #9602

Closed
AlexGoris-KasparSolutions opened this issue Oct 12, 2022 · 3 comments

@AlexGoris-KasparSolutions

What is the issue?

I upgraded our Linkerd installation from 2.11 to 2.12.1 two days ago. Since then I've been seeing random readiness and/or liveness probe failures from the linkerd-destination pod.
The pod always seems to recover in time, but the events show up in our monitoring dashboard, and I'd like to find out whether we can safely ignore them or whether something is wrong.

How can it be reproduced?

I don't know how to reproduce it. These events happen at random times throughout the day, including at night when nothing is running on our cluster.

Logs, error output, etc

These are the various events we're seeing:

Oct 12, 2022 @ 04:16:35.751	Readiness probe failed: Get "http://172.16.36.138:9990/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Oct 12, 2022 @ 04:02:25.753	Readiness probe failed: Get "http://172.16.36.138:9990/ready": dial tcp 172.16.36.138:9990: i/o timeout (Client.Timeout exceeded while awaiting headers)
Oct 12, 2022 @ 04:02:25.744	Liveness probe failed: Get "http://172.16.36.138:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Oct 12, 2022 @ 02:34:15.744	Liveness probe failed: Get "http://172.16.36.138:9990/live": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
Oct 12, 2022 @ 00:21:45.744	Readiness probe failed: Get "http://172.16.36.138:9990/ready": context deadline exceeded (Client.Timeout exceeded while awaiting headers)

I've looked at the logs of the affected pod and only found errors in the policy container. I've pasted the entire log, since it also contains 'info' entries whose messages sound like errors:
(add 2 hours to the timestamps below to match the event timestamps)

2022-10-10T13:31:18.651751Z  INFO grpc{port=8090}: linkerd_policy_controller: gRPC server listening addr=0.0.0.0:8090
2022-10-10T13:35:23.608556Z  INFO meshtlsauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:35:23.976584Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:35:24.076786Z  INFO authorizationpolicies: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120218487 (120221055): Expired
2022-10-10T13:35:24.560669Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:35:24.728935Z  INFO httproutes: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120218487 (120221058): Expired
2022-10-10T13:35:24.949501Z  INFO networkauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:35:25.566212Z  INFO servers: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:35:25.840335Z  INFO servers: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120218487 (120221063): Expired
2022-10-10T13:35:38.219469Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:39:29.366869Z  INFO meshtlsauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:39:29.735517Z  INFO networkauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:39:34.074919Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:39:34.875888Z  INFO networkauthentications: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120218487 (120222748): Expired
2022-10-10T13:39:35.303290Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:43:41.022288Z  INFO meshtlsauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T13:43:44.877451Z  INFO networkauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T14:26:30.355741Z  INFO meshtlsauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T14:47:23.025370Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T18:57:58.177987Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T23:05:45.769316Z  INFO pods: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-10T23:09:51.423789Z  INFO pods: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T01:48:46.123539Z  INFO meshtlsauthentications: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120484163 (120486907): Expired
2022-10-11T02:55:48.803715Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T02:59:55.241793Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T05:16:31.956125Z  INFO pods: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T05:26:21.995063Z  INFO pods: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: timed out
2022-10-11T05:30:31.730465Z  INFO pods: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T07:46:22.178523Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T07:50:33.695771Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T15:38:03.061691Z  INFO server{port=9443}:conn{client.ip=172.16.36.138 client.port=46290}: kubert::server: Connection lost error=connection error: unexpected end of file
2022-10-11T21:47:22.726596Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:01:39.622617Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:04:19.318943Z  WARN meshtlsauthentications: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:19.332348Z  WARN authorizationpolicies: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.827919Z  WARN pods: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.882406Z  WARN meshtlsauthentications: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.892552Z  WARN serverauthorizations: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.892625Z  WARN authorizationpolicies: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.894460Z  WARN networkauthentications: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.894648Z  WARN httproutes: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.894713Z  WARN servers: kube_client::client: eof in poll: error reading a body from connection: error reading a body from connection: unexpected EOF during chunk size line
2022-10-11T22:04:34.901507Z ERROR authorizationpolicies: kube_client::client::builder: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:34.901526Z  INFO authorizationpolicies: kubert::errors: stream failed error=failed to start watching object: HyperError: error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:34.901842Z ERROR networkauthentications: kube_client::client::builder: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:34.901938Z  INFO networkauthentications: kubert::errors: stream failed error=failed to start watching object: HyperError: error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:34.902569Z ERROR serverauthorizations: kube_client::client::builder: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:34.902847Z  INFO serverauthorizations: kubert::errors: stream failed error=failed to start watching object: HyperError: error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:34.903652Z ERROR servers: kube_client::client::builder: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:34.903667Z  INFO servers: kubert::errors: stream failed error=failed to start watching object: HyperError: error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:36.748702Z  INFO meshtlsauthentications: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120928647 (120930940): Expired
2022-10-11T22:04:36.756656Z ERROR meshtlsauthentications: kube_client::client::builder: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:36.756837Z  INFO meshtlsauthentications: kubert::errors: stream failed error=failed to perform initial object list: HyperError: error trying to connect: tcp connect error: Connection refused (os error 111)
2022-10-11T22:04:40.050739Z  INFO networkauthentications: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120929532 (120930967): Expired
2022-10-11T22:04:40.155520Z  INFO servers: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120929405 (120930854): Expired
2022-10-11T22:04:41.172888Z  INFO httproutes: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120929533 (120930898): Expired
2022-10-11T22:04:41.250555Z  INFO serverauthorizations: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120929406 (120930969): Expired
2022-10-11T22:04:41.372005Z  INFO authorizationpolicies: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120927410 (120930873): Expired
2022-10-11T22:09:13.428716Z  INFO networkauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:09:13.542826Z  INFO servers: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:09:13.857861Z  INFO servers: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120931016 (120932649): Expired
2022-10-11T22:09:14.549683Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:09:14.743844Z  INFO httproutes: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120931023 (120932662): Expired
2022-10-11T22:09:14.770412Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:09:14.982397Z  INFO serverauthorizations: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120931025 (120932665): Expired
2022-10-11T22:13:40.120252Z  INFO networkauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:13:41.688142Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:13:46.533429Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:13:46.646938Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T22:17:53.027175Z  INFO serverauthorizations: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T23:21:00.884975Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-11T23:21:06.032865Z  INFO authorizationpolicies: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120956659 (120959083): Expired
2022-10-12T00:09:23.556347Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-12T00:09:59.689070Z  INFO meshtlsauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-12T00:14:35.778036Z  INFO meshtlsauthentications: kubert::errors: stream failed error=error returned by apiserver during watch: too old resource version: 120975018 (120978685): Expired
2022-10-12T01:21:31.361181Z  INFO networkauthentications: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-12T07:14:28.606800Z  INFO httproutes: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-12T08:36:40.356749Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
2022-10-12T08:41:08.717814Z  INFO authorizationpolicies: kubert::errors: stream failed error=watch stream failed: Error reading events stream: error reading a body from connection: error reading a body from connection: Connection reset by peer (os error 104)
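
To check whether any of these log lines coincide with the probe failures, the +2h offset mentioned above can be applied programmatically. The following is a minimal Python sketch, not part of the original report. It assumes the log timestamps and dashboard timestamps have the formats pasted above, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

# Offset between the policy container log and the dashboard events,
# taken from the "+2h" note in the report (an assumption about time zones).
LOG_TO_EVENT_OFFSET = timedelta(hours=2)


def parse_log_time(line: str) -> datetime:
    """Parse the leading timestamp of a log line, e.g. 2022-10-11T22:04:34.901842Z."""
    stamp = line.split()[0].rstrip("Z")
    return datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%S.%f")


def parse_event_time(text: str) -> datetime:
    """Parse a dashboard event timestamp, e.g. 'Oct 12, 2022 @ 04:02:25.753'."""
    return datetime.strptime(text, "%b %d, %Y @ %H:%M:%S.%f")


def nearest_log_line(event_time: datetime, log_lines: list[str]) -> tuple[timedelta, str]:
    """Return (gap, line) for the log line closest in time to the event."""
    gaps = [
        (abs(parse_log_time(line) + LOG_TO_EVENT_OFFSET - event_time), line)
        for line in log_lines
    ]
    return min(gaps, key=lambda pair: pair[0])


if __name__ == "__main__":
    logs = [
        "2022-10-11T22:17:53.027175Z  INFO serverauthorizations: kubert::errors: stream failed",
        "2022-10-12T00:09:23.556347Z  INFO httproutes: kubert::errors: stream failed",
    ]
    event = parse_event_time("Oct 12, 2022 @ 00:21:45.744")
    gap, line = nearest_log_line(event, logs)
    print(f"closest log line is {gap} away: {line}")
```

A large gap between an event and its nearest log line would suggest that the probe failure is not directly tied to a logged watch error.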

output of linkerd check -o short

Linkerd core checks
===================

linkerd-version
---------------
‼ cli is up-to-date
    is running version 2.11.1 but the latest stable version is 2.12.1
    see https://linkerd.io/2.11/checks/#l5d-version-cli for hints

control-plane-version
---------------------
‼ viz extension proxies are up-to-date
    some proxies are not running the current version:
        * metrics-api-595c7b564-7ls6t (stable-2.11.4)
        * prometheus-77b9558b4b-4nqjm (stable-2.11.4)
        * tap-7f8f67546f-x624j (stable-2.11.4)
        * tap-injector-6b6c5c86d4-cqsv5 (stable-2.11.4)
        * web-6756f5956c-z4kdl (stable-2.11.4)
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cp-version for hints
‼ viz extension proxies and cli versions match
    grafana-db56d7cb4-qm44p running  but cli running stable-2.11.1
    see https://linkerd.io/2.11/checks/#l5d-viz-proxy-cli-version for hints

Status check results are √

Environment

  • Kubernetes version: 1.24.3
  • Cluster environment: AKS
  • Host OS: Linux (Ubuntu 18.04)
  • Linkerd version: 2.12.1

Possible solution

No response

Additional context

No response

Would you like to work on fixing this bug?

No response

@adleong
Member

adleong commented Oct 13, 2022

Hi @AlexGoris-KasparSolutions

Based on those logs, it seems like the Kubernetes API is refusing connections from the policy controller:

2022-10-11T22:04:34.901842Z ERROR networkauthentications: kube_client::client::builder: failed with error error trying to connect: tcp connect error: Connection refused (os error 111)

It's unclear why this would happen and we haven't seen this in our testing. If you have concrete steps to reliably reproduce this, we can investigate. Otherwise, I would recommend increasing the log level to see if there are any more clues or using tools such as tcpdump to try to understand why the connection is being refused.
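
Before raising the log level, it may help to summarise the existing log by level and cause, so that the "Connection refused" burst stands out from the routine watch resets. The following is a rough Python sketch under stated assumptions: the line format is the one pasted in the report, and the cause labels and function names are illustrative, not anything Linkerd defines.

```python
import re
from collections import Counter

# Matches "<timestamp>  LEVEL rest-of-line", as in the pasted policy container log.
LINE_RE = re.compile(r"^(\S+)\s+(INFO|WARN|ERROR)\s+(.*)$")

# Substrings copied from the pasted log, mapped to short labels (labels are illustrative).
CAUSES = [
    ("Connection refused", "refused"),
    ("Connection reset by peer", "reset"),
    ("too old resource version", "expired"),
    ("unexpected EOF", "eof"),
    ("timed out", "timeout"),
]


def classify(line: str):
    """Return (level, cause) for a log line, or None if it does not match the format."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    _, level, rest = match.groups()
    cause = next((label for needle, label in CAUSES if needle in rest), "other")
    return level, cause


def summarize(lines):
    """Count log lines per (level, cause) pair, skipping lines that do not parse."""
    return Counter(item for item in map(classify, lines) if item is not None)


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file name): python summarize_log.py < policy.log
    for (level, cause), count in sorted(summarize(sys.stdin).items()):
        print(f"{level:5} {cause:8} {count}")
```

Grouping by level and cause keeps the sketch independent of the resource names, so it should work for any of the watchers shown in the log.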

@stale

stale bot commented Jan 11, 2023

This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 14 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jan 11, 2023
@stale stale bot closed this as completed Jan 26, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Feb 26, 2023
@alpeb
Member

alpeb commented Mar 6, 2023

@AlexGoris-KasparSolutions any luck with this on the latest stable versions? We added an improvement in 2.12.4 that makes probes behave better, particularly in clusters with many resources deployed. If you still experience the issue, it would also be very helpful to get more detailed information, as suggested by @adleong.
