Windows Agent Left Unhealthy After Removing Endpoint Integration #1262
Comments
Hi Team
Build Details: We will revalidate this issue once it is fixed. |
I talked to @ferullo today, and he mentioned that we need to give Endpoint at least 90 seconds to stop. I am looking into how to customize this setting with the endpoint spec. We might need to add some code that keeps checking the Endpoint status for longer and recovers the agent state to healthy. |
I experimented with a few approaches to improve the situation here, and I think it is worth noting some caveats. The initial idea was to increase the timeout and, if the stop still times out, use the platform service APIs to stop the service and monitor its status. The service would eventually be marked as stopped, either based on the reported status or after some grace period, which allows the agent to recover to the healthy state.
So the plan is:
If there are any objections let me know. |
@bjmcnic do you know the answer to @aleksmaus's question? |
@aleksmaus ...
|
@aleksmaus ... Regarding the Disabled Endpoint state requiring a reboot: that is likely the result of a failed install. It most commonly occurs when we try to install a test-signed Endpoint in protect mode on a host without test signing enabled. The installer is unable to start the service as PPL and attempts to clean up the failed install. But because the service has been marked as PPL with the SCM, and we weren't able to execute as PPL, we aren't able to uninstall the service at run time and require a reboot to clean that up. The service being left behind does indeed interfere with subsequent install attempts prior to reboot, but those attempts were likely going to fail for the same reasons anyway. If you're aware of another way to cause this state, please let me know and we can likely find a way to prevent it. |
Hi Team, we have revalidated this issue on the 8.5.1 BC1 Kibana cloud-production environment and found it fixed. Observations:
Build details: Hence, we are marking this issue as QA:Validated. |
Version: 8.4.1
Operating System: Windows Server 2012 R2 Datacenter
Steps to Reproduce:
Error messages in the logs from Agent:
On the host, the Endpoint is no longer running and is no longer on disk (three empty nested directories are left behind: Endpoint->State->Logs).
Running agent status produces:
I have waited up to 24 hours and the Unhealthy status does not resolve. It does get resolved by restarting the Agent from the host.