Reset gateway error on successful request #2194

Closed
michalpristas opened this issue Jan 26, 2023 · 2 comments · Fixed by #2203
Labels: bug, Team:Elastic-Agent, Team:Elastic-Agent-Control-Plane, v8.6.0

Comments

@michalpristas
Contributor

When the gateway encounters an error, e.g. a DNS name resolution failure:

{"log.level":"warn","@timestamp":"2023-01-26T08:46:31.607Z","log.origin":{"file.name":"fleet/fleet_gateway.go","file.line":190},"message":"Possible transient error during checkin with fleet-server, retrying","error":{"message":"fail to checkin to fleet-server: all hosts failed: 1 error occurred:\n\t* requester 0/1 to host https://<redacted>:443/ errored: Post \"https://<redacted>/api/fleet/agents/7c2067ff-30ff-4628-b8cf-af36fa56a524/checkin?\": lookup <redacted>: Temporary failure in name resolution\n\n"},"request_duration_ns":846590,"failed_checkins":2,"retry_after_ns":150942692753,"ecs.version":"1.6.0"}

it reports the error to the coordinator.
This error is never reset, even after a later check-in succeeds, so the agent stays in a failed state until it is restarted.
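
For illustration, a minimal Go sketch of the behavior the title asks for: publish the outcome of every check-in, so that a success (nil) overwrites a previously reported error. The names `errorReporter`, `checkinOnce`, and `errNameResolution` are hypothetical; they are not taken from fleet_gateway.go or from the actual fix in #2203.

```go
package main

import (
	"errors"
	"sync"
)

// errNameResolution is a hypothetical example error standing in for the
// DNS failure shown in the log above.
var errNameResolution = errors.New("temporary failure in name resolution")

// errorReporter is a hypothetical stand-in for whatever the coordinator
// reads the gateway's state from. It is not the real elastic-agent type.
type errorReporter struct {
	mu  sync.Mutex
	err error
}

// set stores the latest check-in outcome; nil means "no error".
func (r *errorReporter) set(err error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.err = err
}

// current returns the most recently reported outcome.
func (r *errorReporter) current() error {
	r.mu.Lock()
	defer r.mu.Unlock()
	return r.err
}

// checkinOnce runs one check-in attempt and always publishes its result.
// Publishing nil on success is what clears an earlier reported error.
func checkinOnce(r *errorReporter, checkin func() error) {
	r.set(checkin())
}
```

With this shape, a failed check-in followed by a successful one leaves `current()` returning nil, so the reported state does not stay stuck on the old error.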

michalpristas added the bug, Team:Elastic-Agent-Control-Plane, and Team:Elastic-Agent labels Jan 26, 2023
michalpristas self-assigned this Jan 26, 2023
@michalpristas
Contributor Author

Looks like this one was not properly migrated to v2: #1152
We should not report unhealthy due to Fleet Server connectivity issues, as per #1148.

That still stands, right @cmacknz?

@cmacknz
Member

cmacknz commented Jan 26, 2023

Yes, a single or a handful of network errors should not cause us to go unhealthy. There is logic in Fleet to detect that the agent is offline; we can rely on that to surface this problem.
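
One way to read "a single or a handful" is as a count of consecutive failures that resets on success. A rough Go sketch under that assumption follows. `checkinTracker`, `errCheckin`, and the threshold value are hypothetical and do not describe how the agent or #2203 actually handles this.

```go
package main

import "errors"

// errCheckin is a hypothetical placeholder for any check-in failure.
var errCheckin = errors.New("checkin failed")

// checkinTracker counts consecutive check-in failures.
// The threshold is an assumed tuning knob, not a real agent setting.
type checkinTracker struct {
	failed    int
	threshold int
}

// record notes the outcome of one check-in and reports whether the
// number of consecutive failures now exceeds the threshold.
// Any success resets the count to zero.
func (t *checkinTracker) record(err error) bool {
	if err == nil {
		t.failed = 0
		return false
	}
	t.failed++
	return t.failed > t.threshold
}
```

With `threshold: 2`, the first two failures are tolerated, a third consecutive failure trips the check, and any success in between starts the count again.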
