Sidecar proxy with TLSV1_ALERT_PROTOCOL_VERSION error #17362
I realise I missed the port configs, as http and grpc are disabled in Consul with the following port numbers:
I've spent a little more time this evening looking into the issue. From a few checks and tests, if I drop the minimum supported TLS version down to
Going into the sidecar using exec and checking out the contents of
Checking the paths defined for -grpc-ca-file and -ca-file within the container shows that no file is present, which might explain why it's not able to verify the certificate authority. Should these CA files be mounted into the sidecar containers when running the Nomad Consul integration with
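Inspecting the sidecar this way can be done with `nomad alloc exec`; a sketch (the alloc ID and service name below are hypothetical, not taken from the reporter's cluster):

```shell
# Sidecar tasks are typically named connect-proxy-<service>;
# both the alloc ID and service name here are hypothetical.
nomad alloc exec -task connect-proxy-myservice <alloc-id> \
  ls -l secrets/

# Nomad writes the generated Envoy bootstrap into the task's secrets dir:
nomad alloc exec -task connect-proxy-myservice <alloc-id> \
  cat secrets/envoy_bootstrap.json
```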
Hi @CarbonCollins, did the CA and certs come from the
I'm not sure the CA path is relevant here; the envoy bootstrap command turns the CA into a string directly embedded in the Envoy bootstrap config file.
The CA and cert for the gRPC endpoint should have come from the consul tls command; the CA and certs for HTTPS were created separately / self-signed.
So just to follow up on the previous message: I re-generated my Consul agent CA and server certs and ended up with the same error. For generating the certs I used:
server:
So I can confirm that the CA and certs for the gRPC endpoint are from the consul tls command.
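For reference, a typical `consul tls` invocation for a CA plus server cert looks something like the following (the datacenter name `dc1` is an assumption; adjust to your cluster):

```shell
# Create a Consul CA
# (writes consul-agent-ca.pem / consul-agent-ca-key.pem)
consul tls ca create

# Create a server certificate for datacenter "dc1" (assumed name)
# (writes dc1-server-consul-0.pem / dc1-server-consul-0-key.pem)
consul tls cert create -server -dc dc1
```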
I've been looking into this further myself to see if I missed anything in the logs, checking the Consul server and client logs this time, and found: Consul Server Logs
Consul Client Logs where sidecar is running
From this info, does it seem like the problem stems more from Consul rather than Nomad?
Just to update this issue with where I am currently: I have upgraded from Nomad
I have also updated how Consul is getting Connect CA certs. It's using the Vault provider now, with the following config:
Consul server connect block
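For context, a Consul server `connect` block using the Vault CA provider generally takes the following shape (the Vault address, token, and PKI mount paths here are placeholders, not the reporter's actual values):

```hcl
connect {
  enabled     = true
  ca_provider = "vault"

  ca_config {
    address               = "https://vault.example.com:8200" # placeholder
    token                 = "<vault-token>"                  # placeholder
    root_pki_path         = "connect_root"                   # assumed mount
    intermediate_pki_path = "connect_inter"                  # assumed mount
  }
}
```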
After this change I am no longer seeing any of the leaf cert issues that I mentioned in my previous message. Looking in Vault, I see the certificates being generated, as I am letting Consul be the one to manage these certs.

Within the sidecar I'm seeing logs similar to what was reported in hashicorp/nomad#15360, where the sidecar starts but reports connection refused errors, and I have checked again to ensure the steps in https://developer.hashicorp.com/consul/docs/upgrading/upgrade-specific#service-mesh-compatibility-1 were done. Checking within the sidecar in

I did also stumble across #15913, which does not have anything I need to actively do to fix afaik, but checking the file changes in there and comparing my envoy_bootstrap.json to the golden one defined for tests, mine seems to at least have the correct format bar a few id changes, service names, and ports. The only one that maybe looks a bit odd (but I have no idea if it's a problem or not) is the path to the console.sock file, which in my config is

I'm going to continue attempting to figure out what is going on whenever I have a bit of time, but it's not been a fun time getting my cluster back up and running after it being offline for a good 3/4 of a year 😅
Ok, so some more info: I moved on to Nomad and getting its RPC set up with TLS. After deploying all of my Nomad servers and clients with TLS enabled on RPC (using the nomad tls CLI commands), my sidecars are now showing a different behaviour. They are no longer complaining about TLSV1_ALERT_PROTOCOL_VERSION and instead now show the following:
The sidecar is also throwing an ENVOY_SIGTERM after failing to validate the certificate and restarting periodically now. Did I miss a requirement for this when setting up Consul Connect with Nomad, where if Consul has TLS enabled on its RPC then Nomad also needs its own RPC enabled with TLS?
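The `nomad tls` workflow mentioned above is, for reference, roughly the following sketch (the region name `global` is an assumption):

```shell
# Create a Nomad CA
# (writes nomad-agent-ca.pem / nomad-agent-ca-key.pem)
nomad tls ca create

# Server certificate for region "global" (assumed region name)
nomad tls cert create -server -region global

# Client certificate
nomad tls cert create -client
```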
Nomad never talks to Envoy. It only spins up the sidecar task and plumbs through configuration values that end up in its CLI args. So any TLS issues you're seeing can only be between Consul and Envoy. Consul is the
I'm sorry to do this to you, but at this point I feel fairly confident this isn't an issue with Nomad. I'm going to move the issue into the Consul repository so that it can get some attention from folks who have more expertise in Consul/Envoy TLS configuration than I do.
No need to be sorry. Coincidentally, today I did figure out and confirm that it was a Consul issue... specifically with my client nodes. I updated the Nomad configuration to point directly to the Consul server instead of the local agent and everything is as happy as can be... so it currently seems that my local Consul agents are not handling TLS correctly right now...
FYI, I'm also running into this, and with trace logging on the Consul agent that sits on the Nomad client I see a bunch of errors relating to the TLS version.
Dropping
and in Envoy
Edit: ahh, this seems to be related to #13124, where the current workaround is to turn off
Nomad version
Output from nomad version
Output from consul version
Operating system and Environment details
Ubuntu 22.04.2
Issue
When attempting to use Consul Connect on a job, the sidecar never reaches a healthy state and prints out the following every few seconds:
Reproduction steps
Have Consul Connect enabled and configured, deploy a job which uses an Envoy sidecar for Consul Connect, and start the job. After an init period the sidecar will start printing out the warning messages every few seconds. Both Consul and Nomad have ACLs enabled.
Consul TLS, address, and ports configs (these are templated from Ansible):
The Consul server additionally has:
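As a rough sketch of the shape such a config takes (the paths and port choices here are illustrative assumptions, not the reporter's actual values; this uses the Consul 1.12+ `tls` stanza):

```hcl
ports {
  http     = -1    # plain HTTP disabled
  https    = 8501
  grpc     = -1    # plain gRPC disabled
  grpc_tls = 8503
}

tls {
  defaults {
    ca_file         = "/etc/consul.d/tls/consul-agent-ca.pem"       # assumed path
    cert_file       = "/etc/consul.d/tls/dc1-server-consul-0.pem"   # assumed path
    key_file        = "/etc/consul.d/tls/dc1-server-consul-0-key.pem"
    verify_incoming = true
    verify_outgoing = true
    # tls_min_version = "TLSv1_2"  # relevant to the protocol-version alert discussed here
  }

  internal_rpc {
    verify_server_hostname = true
  }
}
```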
Nomad Consul config:
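For comparison, the Nomad agent's `consul` block in a TLS setup typically looks something like this sketch (addresses and file paths are placeholders):

```hcl
consul {
  address = "127.0.0.1:8501"  # Consul HTTPS port (placeholder)
  ssl     = true
  ca_file = "/etc/nomad.d/tls/consul-agent-ca.pem"  # assumed path

  # gRPC endpoint used for Envoy bootstrap (newer Nomad versions)
  grpc_address = "127.0.0.1:8503"
  grpc_ca_file = "/etc/nomad.d/tls/consul-agent-ca.pem"

  share_ssl = true
}
```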
Expected Result
Sidecar registers the service with Consul and becomes healthy, ready to serve traffic.
Actual Result
Sidecar registers the service with Consul but then does not come to a healthy state; instead it prints out warnings every few seconds like:
Job file (if appropriate)
I'm currently attempting to run this job file:
https://gitlab.com/carboncollins-cloud/monitoring/log-management/-/blob/main/job.template.nomad.hcl
Sidecar Logs in full:
Nomad Server logs (if appropriate)
Nomad Client logs (if appropriate)
I noticed that the client seems to be refusing the RPC connection, but the port looks correct...