Provide clearer error message when enrollment fails with TLS/SSL enabled #4505

Open
romain-chanu opened this issue Apr 2, 2024 · 3 comments
Labels
enhancement New feature or request Team:Elastic-Agent Label for the Agent team

Comments

@romain-chanu

Describe the enhancement:

Provide a clearer error message when enrollment fails with TLS/SSL enabled.

It has been observed that Elastic Agent could fail to enroll with the following error message:

C:\Users\<Domain Admin Account>\Downloads\elastic-agent-8.12.2-windows-x86_64>.\elastic-agent.exe install --url=https://<Fleet Server IP>:8220 --enrollment-token=<Token Value>
Elastic Agent will be installed at C:\Program Files\Elastic\Agent and will run as a service. Do you want to continue? [Y/n]:Y
[=== ] Service Started  [40s] Elastic Agent successfully installed, starting enrollment.
[=   ] Waiting For Enroll...  [41s] {"log.level":"info","@timestamp":"2024-03-22T10:09:37.927+0800","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":496},"message":"Starting enrollment to URL: https://<Fleet Server IP>:8220/","ecs.version":"1.6.0"}
[=== ] Waiting For Enroll...  [57s] {"log.level":"info","@timestamp":"2024-03-22T10:09:53.677+0800","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":505},"message":"1st enrollment attempt failed, retrying for 10m0s, every 1m0s enrolling to URL: https://<Fleet Server IP>:8220/","ecs.version":"1.6.0"}
Error: fail to enroll: fail to execute request to fleet-server: EOF
For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.12/fleet-troubleshooting.html

The message "Error: fail to enroll: fail to execute request to fleet-server: EOF" gives no clue about what problem was encountered or what caused it.

In the reported situation, the problem seemed to be related to TLS/SSL, and it was worked around by running the install/enroll command with --certificate-authorities=<path_to_your_ca.pem>:

.\elastic-agent.exe install --url=https://<fleet_server_ip:port> --enrollment-token=<token> --certificate-authorities=<path_to_your_ca.pem>

Note that the problem was observed on Windows Server 2016 and 2019 but not on Windows Server 2022. An attempt to reproduce the issue was unsuccessful.

It has not been determined whether the issue was related to:

  • networking/firewall issues?
  • Windows Group policies (e.g., policies related to TLS/SSL, certificates, ciphers, etc.)?
  • other?

Describe a specific use case for the enhancement or feature: To help users determine root cause of enrollment failures and guide them to resolution.

What is the definition of done? Error message should clearly state why the enrollment failed. If it is due to TLS/SSL connection failing, provide a clear root cause and guide users to resolution.

@pierrehilbert pierrehilbert added the Team:Elastic-Agent Label for the Agent team label Apr 2, 2024
@elasticmachine
Collaborator

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pierrehilbert pierrehilbert added the enhancement New feature or request label Apr 2, 2024
@cmacknz
Member

cmacknz commented Apr 2, 2024

We would be getting the EOF error directly from the HTTP client we use to make the request:

resp, err := e.client.Send(ctx, "POST", p, nil, headers, bytes.NewBuffer(b))

The EOF alone doesn't give our code any hints about what the problem might be. It could be TLS, it could be something else.

The TLS handshake failing with a non-descriptive EOF error is linked to several upstream Go issues; golang/go#19874 is one example.

It looks like newer versions of Go have tried to improve that but I don't see the additional error context here. https://go-review.googlesource.com/c/go/+/299449/4/src/crypto/tls/conn.go

So I 100% agree we should improve our error messaging, but the Go TLS package isn't giving us much help here in providing more error context.

Without error context from the underlying TLS implementation, the best we could do is something like "fail to execute request to fleet-server: EOF. Please verify your TLS settings." to at least point people at the TLS configuration as a possible cause.
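
As a rough illustration of the message suggested above, here is a minimal sketch of appending a TLS hint when the request error wraps io.EOF. The names annotateEOF, hintedError, and fakeSend are hypothetical and are not taken from the agent code; fakeSend only stands in for the e.client.Send call and assumes the client error wraps io.EOF in a way that errors.Is can detect.

```go
package main

import (
	"errors"
	"fmt"
	"io"
)

// tlsHint is the suffix proposed in the comment above.
const tlsHint = "Please verify your TLS settings."

// hintedError (hypothetical) appends a hint to an error message while
// keeping the original error reachable through errors.Is / errors.As.
type hintedError struct {
	err  error
	hint string
}

func (e *hintedError) Error() string { return e.err.Error() + ". " + e.hint }
func (e *hintedError) Unwrap() error { return e.err }

// annotateEOF (hypothetical) adds the TLS hint only when err wraps an EOF.
// Other errors, and nil, are returned unchanged.
func annotateEOF(err error) error {
	if err == nil {
		return nil
	}
	if errors.Is(err, io.EOF) || errors.Is(err, io.ErrUnexpectedEOF) {
		return &hintedError{err: err, hint: tlsHint}
	}
	return err
}

// fakeSend is a stand-in for the real request; it always fails with a
// wrapped io.EOF so the example can run without a network.
func fakeSend() error {
	return fmt.Errorf("fail to execute request to fleet-server: %w", io.EOF)
}

func main() {
	err := annotateEOF(fakeSend())
	fmt.Println(err)
	// prints: fail to execute request to fleet-server: EOF. Please verify your TLS settings.
	fmt.Println(errors.Is(err, io.EOF)) // prints: true
}
```

Because the hint is only a guess at the cause, the sketch keeps the original error wrapped rather than replacing it, so callers can still match on io.EOF.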

@FlorianHeigl

FlorianHeigl commented Apr 8, 2024

@cmacknz How about a test connection that helps distinguish a few error classes? The same failure should also occur for an arbitrary request such as GET /foobar/random, which would return a 404 in a working setup. Currently the problem seems to be that a failed request (server failure), an impossible request (handshake failure), and possibly other cases all look the same. So it might help to first check whether the path is clear.
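
One way to picture such a pre-flight check is a staged probe that reports whether the TCP connection or the TLS handshake failed. This is only a sketch under assumptions: probe, stage, and startClosingListener are hypothetical names, the local listener merely imitates a peer that drops the connection during the handshake, and a real check could add an HTTP request as a third stage.

```go
package main

import (
	"context"
	"crypto/tls"
	"fmt"
	"net"
	"time"
)

// stage names the step at which the probe stopped (hypothetical type).
type stage string

const (
	stageTCP stage = "tcp connect"
	stageTLS stage = "tls handshake"
	stageOK  stage = "ok"
)

// probe (hypothetical helper) opens a TCP connection to addr and then
// attempts a TLS handshake, reporting which step failed.
func probe(addr string, timeout time.Duration, cfg *tls.Config) (stage, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	var d net.Dialer
	conn, err := d.DialContext(ctx, "tcp", addr)
	if err != nil {
		return stageTCP, err
	}
	defer conn.Close()

	// Copy the config so the caller's value is not modified.
	if cfg == nil {
		cfg = &tls.Config{}
	} else {
		cfg = cfg.Clone()
	}
	if cfg.ServerName == "" && !cfg.InsecureSkipVerify {
		if host, _, err := net.SplitHostPort(addr); err == nil {
			cfg.ServerName = host
		}
	}

	if err := tls.Client(conn, cfg).HandshakeContext(ctx); err != nil {
		return stageTLS, err
	}
	return stageOK, nil
}

// startClosingListener starts a local TCP server that closes every
// connection immediately, so the handshake fails after TCP succeeds.
func startClosingListener() (addr string, stop func(), err error) {
	ln, err := net.Listen("tcp", "127.0.0.1:0")
	if err != nil {
		return "", nil, err
	}
	go func() {
		for {
			c, err := ln.Accept()
			if err != nil {
				return // listener closed
			}
			c.Close()
		}
	}()
	return ln.Addr().String(), func() { ln.Close() }, nil
}

func main() {
	addr, stop, err := startClosingListener()
	if err != nil {
		fmt.Println("listen:", err)
		return
	}
	defer stop()

	st, err := probe(addr, 2*time.Second, nil)
	fmt.Printf("stopped at %q: %v\n", st, err)
}
```

With a result like this, the enrollment error could at least say whether the server was unreachable or the handshake was cut off, even when the underlying error is a bare EOF.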
