[Fleet] Agent health, integration update availability alerts #124240

Closed
Tracked by #78188
philippkahr opened this issue Feb 1, 2022 · 12 comments
Labels
enhancement New value added to drive a business result Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@philippkahr
Contributor

Describe the feature:

TL;DR

  1. Create default rules for Fleet
  2. Give us the option to access the agent health
  3. Give us the option to access the integration policies, updates and information

Within the Fleet UI you can sort, select, and search for agents, and also see how many are healthy, how many are unhealthy, and how many have or have not responded within x minutes.

In Stack Monitoring for Beats, it is possible to create a custom alerting rule that uses the Elasticsearch query rule type in Kibana to query the .monitoring-beats... indices and check whether a certain beat is alive and sending data.
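For readers unfamiliar with that workaround, here is a minimal sketch of the kind of query body such a rule might use. The field names `beats_stats.beat.name` and `timestamp` are assumptions about the monitoring documents, and `build_beat_alive_query` is a hypothetical helper, not part of Kibana or any Elastic client.

```python
from typing import Any, Dict


def build_beat_alive_query(beat_name: str, window_minutes: int = 5) -> Dict[str, Any]:
    """Build a query body matching monitoring documents from one beat
    that arrived within the last `window_minutes` minutes.

    Assumed field names: `beats_stats.beat.name` and `timestamp`.
    """
    return {
        "query": {
            "bool": {
                "filter": [
                    # Restrict to the beat we care about.
                    {"term": {"beats_stats.beat.name": beat_name}},
                    # Only documents newer than the look-back window.
                    {"range": {"timestamp": {"gte": f"now-{window_minutes}m"}}},
                ]
            }
        }
    }
```

A rule built on this would fire when the number of matching documents is zero, meaning the beat has not reported within the window.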

Sometimes the infrastructure involved is not very dynamic, and an alert for specific Elastic Agents would be useful. Currently there is no way to alert based on the health of an agent. For example, I would like to know whether my on-premise Fleet Server is healthy. If it breaks, I want to be alerted immediately, since this can cause cascading errors such as policies not updating and all agents becoming unhealthy.

A good approach would be to provide some default rules, like the ones in Stack Monitoring, where I can select something like "send an alert to Slack every 12 hours listing all unhealthy agents". That way I would learn whether my infrastructure has issues or has changed, and whether I should clean up and remove the unhealthy agents.

Default rules in Fleet UI

  1. Give me a rule that would alert me if an integration is available for update

As of now, I have no idea when an integration has a new version. I need to look into the agent policy and check whether there is an update, or even go one step deeper into the integration itself and update there first. This was commented here after I created an issue that I could not see the update on the agent.

An alert that runs once a day and sends me an email saying "integration xyz is ready to update, no breaking changes" would be good.

  2. Give me a rule where I can select the agents for which I want to be alerted if they become unhealthy.

This is needed to alert me if my Fleet Server goes down. Currently I am using a Heartbeat monitor that sends an HTTP request to the Fleet Server, and I have a status alert set on it. However, that requires running additional software on different hardware, while the data is already available within Elasticsearch.
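The external check described above can be sketched roughly as follows. This assumes the Fleet Server exposes a status endpoint at `/api/status` that returns JSON with a `status` field whose healthy value is `HEALTHY`; both the path and the payload shape are assumptions to verify against your version, and the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Dict


def is_fleet_server_healthy(payload: Dict[str, Any]) -> bool:
    """Interpret a status payload. Assumes a `status` field whose
    healthy value is "HEALTHY" (case-insensitive)."""
    return str(payload.get("status", "")).upper() == "HEALTHY"


def fetch_status(base_url: str, timeout: float = 5.0) -> Dict[str, Any]:
    """Fetch the status document from an assumed `/api/status` endpoint.
    Raises on connection errors, which should also count as unhealthy."""
    with urllib.request.urlopen(f"{base_url}/api/status", timeout=timeout) as resp:
        return json.load(resp)
```

Keeping the interpretation separate from the HTTP call makes the health logic easy to test, but it is still the extra moving part this request would like to avoid.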

So an immediate alert when a selected agent becomes unhealthy would be good. Furthermore, a second rule that sends a status report once a day, such as "10 agents unhealthy" followed by a list of agent names, would help me clean up.
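To illustrate the daily report idea, here is a minimal sketch that turns a list of agent records into a summary line. The record shape (`hostname` and `status` keys) and the assumption that `online` is the only healthy status are illustrative, not taken from the Fleet data model.

```python
from typing import Dict, Iterable, List

# Assumption: any status other than "online" counts as unhealthy.
HEALTHY_STATUSES = {"online"}


def unhealthy_report(agents: Iterable[Dict[str, str]]) -> str:
    """Summarise unhealthy agents as a one-line report with sorted names."""
    names: List[str] = sorted(
        agent.get("hostname", "<unknown>")
        for agent in agents
        if agent.get("status") not in HEALTHY_STATUSES
    )
    if not names:
        return "All agents healthy"
    noun = "agent" if len(names) == 1 else "agents"
    return f"{len(names)} {noun} unhealthy: {', '.join(names)}"


agents = [
    {"hostname": "web-2", "status": "offline"},
    {"hostname": "db-1", "status": "error"},
    {"hostname": "web-1", "status": "online"},
]
print(unhealthy_report(agents))  # 2 agents unhealthy: db-1, web-2
```

The resulting string could be the body of a Slack or email action in whatever rule eventually produces the agent list.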

@botelastic botelastic bot added the needs-team Issues missing a team label label Feb 1, 2022
@joshdover joshdover added the Team:Fleet Team label for Observability Data Collection Fleet team label Feb 1, 2022
@elasticmachine
Contributor

Pinging @elastic/fleet (Team:Fleet)

@botelastic botelastic bot removed the needs-team Issues missing a team label label Feb 1, 2022
@joshdover joshdover added the enhancement New value added to drive a business result label Feb 1, 2022
@elastic elastic deleted a comment from pozezanc Mar 30, 2022
@philippkahr
Contributor Author

The scaling recommendations described here should also be available as an alert: https://www.elastic.co/guide/en/fleet/7.17/fleet-server-scalability.html#scaling-recommendations. For example, when reaching 2000 agents, there should be an alert that the current sizing might no longer be adequate and that a change to xyz should be performed.
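A minimal sketch of such a sizing check, using the 2000-agent figure from the example above as the default threshold. The function name and message wording are hypothetical, and the real limits depend on the sizing table in the linked guide.

```python
from typing import Optional


def sizing_warning(agent_count: int, limit: int = 2000) -> Optional[str]:
    """Return a warning message once the agent count reaches `limit`,
    or None while the deployment is still below it."""
    if agent_count < limit:
        return None
    return (
        f"{agent_count} agents enrolled (threshold {limit}): "
        "review Fleet Server sizing"
    )
```

A rule could evaluate this on a schedule against the current number of enrolled agents and notify only when the result is not `None`.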

@nimarezainia
Contributor

The team is working on infrastructure that would allow status (be it agent status or the status of inputs/integrations) to be propagated up to the Fleet UI. As part of this, status changes will be stored in a specific data stream. Users will then have the flexibility to build these alerts based on the documents we store. I'm sure we can also create pre-baked alerts out of the box.

@ghost

ghost commented Sep 12, 2022

I will +1 this as a feature request.

My clusters are all on-prem. Having a way to easily send notifications when agents/fleet are unhealthy is something that is sorely needed.

@mukeshelastic mukeshelastic changed the title [Fleet] Default Rules and Alerting [Fleet] Agent health, integration update availability alerts Jan 4, 2023
@jeffvestal

@mukeshelastic I'm guessing this is no longer an "8.6" candidate since that has been released?

@defutek-tj

@nimarezainia - is this on the product roadmap for the near future? I have a customer requesting this feature.

@nimarezainia
Contributor

> @nimarezainia - is this on the product roadmap for the near future? I have a customer requesting this feature.

@defutek-tj it is one of our higher priority items for users of the platform; however, it is currently not slated for delivery because other items on that list have higher priority.

@jpsep-elastic

Hi @joshdover who should I speak with to get more info on this? - fellow Elastician here!

JP

@nimarezainia
Contributor

@jpsep-elastic happy to discuss.

@defutek-tj

@nimarezainia - any updates as to when this feature might be available?

@nimarezainia
Contributor

@defutek-tj our 8.9 release brings agent health, including reporting on the health of inputs/integrations (see the agent details page). However, we don't yet have alerts built on the status changes.
