Current Behavior
When the Agent is started with Fleet Server via the setup command and the Fleet Server cannot be set up correctly, the Elastic Agent does not expose the health check endpoint and eventually shuts down. This happens, for example, when the package registry is (temporarily) unavailable and Fleet setup fails, or when the agent tries to enroll into an Agent Policy without a Fleet Server package.
The Cloud setup uses the Fleet preconfiguration API to set up the Cloud agent policy. When the package registry is not reachable, the agent policy is still created, but it does not contain a Fleet Server package policy.
Expected Behavior
The Elastic Agent should always expose the health check endpoint /processes and listen on the configured port. It should not shut down because of issues in one of its subprocesses, even when that subprocess is the Fleet Server. The health check is designed to always return a 200 for the Elastic Agent itself, along with a list of the subprocesses that are expected to be running. For every expected subprocess a pid is returned, indicating whether it is up.
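To make the expected contract concrete, here is a minimal sketch of how a response for /processes could be assembled. The function name, the JSON field names (`name`, `pid`, `up`), and the use of a null pid for a subprocess that is down are assumptions for illustration only; they are not the Elastic Agent's actual response format.

```python
import json
from typing import Dict, List, Optional, Tuple


def build_processes_response(
    expected: List[str], running: Dict[str, int]
) -> Tuple[int, str]:
    """Build (status_code, json_body) for a hypothetical /processes handler.

    `expected` lists the subprocesses that should be running and `running`
    maps the names of live subprocesses to their pids. Both are assumed
    inputs, not part of any real Elastic Agent API.
    """
    processes = []
    for name in expected:
        # A missing pid means the expected subprocess is not up.
        pid: Optional[int] = running.get(name)
        processes.append({"name": name, "pid": pid, "up": pid is not None})
    # The agent itself answered the request, so the status is 200 even
    # when one or more subprocesses are down.
    return 200, json.dumps({"processes": processes})
```

With this shape, a failed Fleet Server would show up as an entry without a pid while the endpoint itself keeps answering with 200.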
Although the agent is not usable while the Fleet Server is down, the failure might be temporary and therefore should not shut down the agent. The orchestrator should decide whether to shut down the whole agent based on the health check response.
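As an illustration of leaving the decision to the orchestrator, the following sketch shows one possible policy: tolerate a missing subprocess for a grace period before restarting the agent. The function name, the `ProcessStatus` shape, and the 300-second default are hypothetical and are not taken from any real orchestrator or from the Elastic Agent.

```python
from typing import List, Optional, TypedDict


class ProcessStatus(TypedDict):
    """Assumed shape of one entry in the health check response."""
    name: str
    pid: Optional[int]  # None means the expected subprocess is not up


def should_restart_agent(
    processes: List[ProcessStatus],
    seconds_unhealthy: float,
    grace_seconds: float = 300.0,
) -> bool:
    """Decide, on the orchestrator side, whether to restart the agent.

    A restart is requested only when at least one expected subprocess has
    no pid and that state has lasted longer than the grace period, so a
    temporary failure does not take the whole agent down.
    """
    any_down = any(p["pid"] is None for p in processes)
    return any_down and seconds_unhealthy > grace_seconds
```

A different orchestrator might instead choose never to restart while a standalone APM Server is still up; the point is only that the policy lives outside the agent.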
Why is this a problem
This behavior causes problems on Cloud when the initial setup fails and the agent shuts down, for example on all ECE air-gapped deployments >= 7.14. An unhealthy Fleet Server should not impact a standalone APM Server, but with the behavior described above it does.