[question] Is there any way to disable TCP Sidecar Listening Check? #9773

Closed
leonardobsjr opened this issue Jan 11, 2021 · 14 comments · Fixed by #10531
Labels
hcc/cst Admin - internal stage/accepted Confirmed, and intend to work on. No timeline committment though. theme/consul/connect Consul Connect integration type/enhancement

Comments

@leonardobsjr
Contributor

Nomad version

Nomad 1.0.1
[Consul 1.8.6]

Operating system and Environment details

Linux Ubuntu 24.04

Issue

Nomad and Consul are running in -dev-connect and -dev mode, respectively. Is there any way for Nomad to disable the creation of the Sidecar Listening check? It's possible through Consul (just by passing a check of your own, since the default is only created when none are given), but neither sidecar_task nor sidecar_service accepts checks. I have an old service that keeps logging it as a failed connection.

I know that this is not exactly Nomad-related, but if there's no way to do it in Nomad, can you explain how the check occurs? Is it controlled by Consul, or is it delegated to Envoy as an Envoy health check? I was thinking about changing the Envoy image to somehow block the TCP pings, but I don't really know how...

@shoenig
Member

shoenig commented Jan 11, 2021

The Connect Sidecar Listening check is injected by Consul when registering a service with a Connect sidecar (as Nomad does when using the connect stanza).

https://github.com/hashicorp/consul/blob/master/agent/sidecar_service.go#L172

I have an old service that keeps logging it as a failed connection.

Are you saying you're setting sidecar_task with something that is unable to respond to the check?

The check serves a purpose: combined with the Connect Sidecar Aliasing check, it is how Consul determines whether the service is unhealthy due to sidecar problems.

@shoenig shoenig added theme/consul/connect Consul Connect integration type/question labels Jan 11, 2021
@leonardobsjr
Contributor Author

leonardobsjr commented Jan 12, 2021

Are you saying you're setting sidecar_task with something that is unable to respond to the check?

Hey @shoenig thanks a lot for the fast answer! Really appreciate it.

Nope, the check works fine. However, this particular application keeps logging the sidecar ping connection somehow, reporting it as an error (failed attempt to connect). Maybe due to the sudden connection closing, I don't really know. On the Consul side, everything is fine tho.

So, since I have no control whatsoever regarding the application, I want to remove (or replace) the sidecar check with something else. So, from Consul's side, as said in the line that you provided, that is possible by passing any check to the sidecar service because the sidecar ping is created only if no checks are given. However, since there's no way to pass checks to the sidecar service from Nomad (neither sidecar_task nor sidecar_service accepts checks), I don't see a straightforward way to do it.

I was even considering changing the Envoy image to somehow block those health checks, but after some consideration, I think that these checks aren't registered as Envoy health checks, but they're somehow managed by Consul (might be wrong tho).

@shoenig
Member

shoenig commented Jan 12, 2021

The check is executed by Consul, and should only be establishing a TCP connection with envoy (or whatever your sidecar_task is). If your application is detecting the pings, that's quite perplexing! And suggests there's probably some misconfiguration or very weird iptables rules in play.
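For readers unfamiliar with what "only establishing a TCP connection" means in practice, here is a minimal sketch of that kind of probe. It is not Consul's implementation; the function name `tcp_check` and the status strings are illustrative assumptions. It connects, sends nothing, and closes immediately.

```python
import socket


def tcp_check(host: str, port: int, timeout: float = 1.0) -> str:
    """Hypothetical stand-in for a TCP health check.

    Returns "passing" if a TCP connection can be established,
    otherwise "critical". No payload is ever sent.
    """
    try:
        # Connect, send nothing, and close right away.
        with socket.create_connection((host, port), timeout=timeout):
            return "passing"
    except OSError:
        # Refused, unreachable, or timed out.
        return "critical"


if __name__ == "__main__":
    # A local listener stands in for the sidecar's listening port.
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen()
    port = listener.getsockname()[1]
    print(tcp_check("127.0.0.1", port))
    listener.close()
    print(tcp_check("127.0.0.1", port))
```

From the listener's point of view, each probe is a connection that opens and closes without carrying any data.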

@leonardobsjr
Contributor Author

leonardobsjr commented Jan 13, 2021

The check is executed by Consul, and should only be establishing a TCP connection with envoy (or whatever your sidecar_task is). If your application is detecting the pings, that's quite perplexing! And suggests there's probably some misconfiguration or very weird iptables rules in play.

@shoenig I don't think so... Like I said before, I don't have the details of the application, but as far as I know it keeps listening for TCP connections carrying some specific data. The TCP check from Consul is proxied through Envoy, and since it's not a valid request, just an empty TCP ping, the application thinks it's an error and logs it. That's why I want to disable the check and pass a custom one to the sidecar service, or at least change the sidecar check port, so that Consul pings a different port instead of the service port.

I had some hope that it could be disabled from Nomad, since it's possible to disable the TCP ping check for the native built-in proxy with the disable_tcp_check configuration, but I have found nothing. I don't think it's even possible to use Consul's built-in proxy from Nomad (is it?).

Before trying the Ingress Gateway, we had a setup with an F5 load balancer that also had a health check, and I looked into why F5's health check doesn't trigger the error. I think it's because, instead of fully opening a connection, it uses a TCP half-open connection. Since the connection is never fully established, nothing gets logged.

@tgross tgross added the hcc/cst Admin - internal label Jan 26, 2021
@AndrewChubatiuk
Contributor

AndrewChubatiuk commented Feb 4, 2021

@shoenig
I have a problem with sidecar TCP checks when using host_network.
The problem is that Consul registers the health checks on localhost, but the sidecar is not reachable on localhost; it is only reachable on the host_network interface.
I tried setting local_service_address to an interface address. In that case the health checks pass, but the sidecar also uses the interface address to connect to the service it fronts, and is unable to connect.
I have no idea how to solve it, but I see only a few options:

  • allow setting another address for sidecar health checks, which would need to be supported by Consul as well
  • allow overriding sidecar health checks in Nomad
  • expose a task that is running behind a sidecar proxy to the host

@tgross tgross added this to Needs Triage in Nomad - Community Issues Triage Feb 12, 2021
@tgross
Member

tgross commented Feb 16, 2021

Hi @leonardobsjr! It's probably not feasible to provide that particular knob, but what you're describing also shouldn't be happening either so that suggests there might be a bug at play here somewhere (maybe Nomad, maybe Consul, maybe Envoy!). Can you provide a jobspec that shows what your Connect configuration looks like? That would help us figure out what's going on there.

@AndrewChubatiuk can you please open a new issue for that? That doesn't seem to be directly related to @leonardobsjr's issue and I want to make sure we don't lose track of things.

@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Feb 16, 2021
@tgross tgross self-assigned this Feb 16, 2021
@AndrewChubatiuk
Contributor

@tgross my PR with a fix for this issue was just merged

@tgross
Member

tgross commented Feb 16, 2021

Ha! Shows what I know! 😊

That was #9975 and it'll land in Nomad 1.0.4. Going to keep this issue open for @leonardobsjr unless @shoenig thinks that should cover this as well?

@tgross tgross closed this as completed Feb 16, 2021
Nomad - Community Issues Triage automation moved this from In Progress to Done Feb 16, 2021
@tgross tgross added this to the 1.0.4 milestone Feb 16, 2021
@tgross tgross reopened this Feb 16, 2021
Nomad - Community Issues Triage automation moved this from Done to Needs Triage Feb 16, 2021
@tgross tgross removed this from the 1.0.4 milestone Feb 16, 2021
@tgross tgross moved this from Needs Triage to In Progress in Nomad - Community Issues Triage Feb 16, 2021
@leonardobsjr
Contributor Author

leonardobsjr commented Feb 16, 2021

@tgross Does @AndrewChubatiuk's PR enable overriding Consul checks from Nomad?

@shoenig
Member

shoenig commented Feb 16, 2021

Yeah, the original issue is legitimate; there is currently no way to disable this check when using Connect from Nomad. Consul injects this TCP check automagically if there are no checks defined on the sidecar service. Nomad doesn't let you specify checks one way or the other for sidecar_service, which means Consul will always set it. We should probably add sidecar_service.checks so the default of no checks can be overridden, causing Consul to not inject the default check.
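The rule described here (defaults are injected only when the sidecar service defines no checks of its own) can be sketched as follows. This is a simplified model, not Consul's code: the function names are hypothetical, the dictionary keys loosely follow Consul's check definition fields, and the 10s interval is taken from the behavior reported later in this thread.

```python
from typing import Dict, List

Check = Dict[str, str]


def default_sidecar_checks(service_id: str, address: str, port: int) -> List[Check]:
    """Hypothetical model of the two default sidecar checks."""
    return [
        # TCP probe against the sidecar's listener.
        {
            "Name": "Connect Sidecar Listening",
            "TCP": f"{address}:{port}",
            "Interval": "10s",
        },
        # Alias check mirroring the health of the parent service.
        {
            "Name": f"Connect Sidecar Aliasing {service_id}",
            "AliasService": service_id,
        },
    ]


def effective_sidecar_checks(
    service_id: str, address: str, port: int, user_checks: List[Check]
) -> List[Check]:
    """Return user-defined checks if any exist, else the defaults."""
    if user_checks:
        return list(user_checks)
    return default_sidecar_checks(service_id, address, port)


if __name__ == "__main__":
    # No checks given: both defaults are injected.
    for check in effective_sidecar_checks("go-echo", "127.0.0.1", 21000, []):
        print(check["Name"])
    # One custom check given: the defaults are skipped.
    custom = [{"Name": "custom", "AliasService": "go-echo"}]
    print(effective_sidecar_checks("go-echo", "127.0.0.1", 21000, custom))
```

Under this model, the only way to avoid the TCP probe is to supply at least one check, which is exactly what Nomad did not allow at the time.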

@shoenig shoenig added stage/accepted Confirmed, and intend to work on. No timeline committment though. type/enhancement and removed stage/waiting-reply type/question labels Feb 16, 2021
@tgross tgross moved this from In Progress to Needs Roadmapping in Nomad - Community Issues Triage Feb 16, 2021
@tgross tgross removed their assignment Feb 16, 2021
@leonardobsjr
Contributor Author

leonardobsjr commented Feb 19, 2021

Hi @leonardobsjr! It's probably not feasible to provide that particular knob, but what you're describing also shouldn't be happening either so that suggests there might be a bug at play here somewhere (maybe Nomad, maybe Consul, maybe Envoy!). Can you provide a jobspec that shows what your Connect configuration looks like? That would help us figure out what's going on there.

Sure @tgross, it's actually very easy to replicate. The attached deployment file deploys go-echo, a very simple service that listens on a port, waits for a connection, and echoes whatever you send after connecting. After you deploy it, check the Docker logs and you will see that Consul is opening and closing a connection every 10s. In my case, the application treats every connection as a business request and expects some data to be sent before proceeding. Since it is just a check rather than a valid request, nothing is sent, and the app logs it as an error. Pretty annoying.

go-echo-svc.hcl.txt
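The attachment itself is not reproduced in this thread. As a rough illustration of the symptom, the sketch below models an application that treats every connection as a request and logs an error when the peer closes without sending data. It assumes, as described above, that the probe's connection reaches the application; all names and log strings are hypothetical.

```python
import socket
import threading
from typing import List


def serve_once(listener: socket.socket, log: List[str]) -> None:
    """Accept one connection; echo its data, or log an error if none arrives."""
    conn, _ = listener.accept()
    with conn:
        conn.settimeout(5.0)
        data = conn.recv(1024)
        if not data:
            # The peer connected and closed without sending anything.
            log.append("error: connection closed before any request was received")
        else:
            conn.sendall(data)
            log.append("ok: echoed %d bytes" % len(data))


def probe(port: int) -> None:
    """Mimic a bare TCP check: connect and close, sending nothing."""
    socket.create_connection(("127.0.0.1", port), timeout=1.0).close()


def run_demo() -> List[str]:
    """Run one probe against the toy server and return what it logged."""
    log: List[str] = []
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))
        listener.listen()
        port = listener.getsockname()[1]
        worker = threading.Thread(target=serve_once, args=(listener, log))
        worker.start()
        probe(port)
        worker.join(timeout=5)
    return log


if __name__ == "__main__":
    print(run_demo())
```

With a probe every 10 seconds, an application written this way would record one such error line per interval.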

@tgross tgross removed this from Needs Roadmapping in Nomad - Community Issues Triage Mar 3, 2021
@Legogris

Legogris commented Mar 17, 2021

Can confirm this becomes an issue for many services that I wouldn't even call legacy - each check yields up to 4 log lines for some things, and that's not even on debug level.

@shoenig's suggestion to add a sidecar_service.checks stanza sounds right on point.

@tgross
Member

tgross commented May 7, 2021

Fix merged in #10531 and will ship in the upcoming Nomad 1.1.0-rc1

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Oct 20, 2022