[Fleet] Agent health, integration update availability alerts #124240

philippkahr · 2022-02-01T14:28:57Z

Describe the feature:

TL;DR

Create default rules for Fleet
Give us the option to access the agent health
Give us the option to access the integration policies, updates and information

Within the Fleet UI you can sort, select and search for agents and also see how many are healthy, unhealthy, or have not responded within x minutes.

Within Stack Monitoring for the Beats agents, it is possible to create a custom alerting rule that uses the Elasticsearch Query inside Kibana to query the .monitoring-beats... indices and check whether a certain Beat is alive and sending data.
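For illustration, the query body behind such a custom rule might look roughly like the sketch below. The field names (`beats_stats.beat.name`, `timestamp`), the Beat name and the five-minute window are assumptions made for this example and are not taken from this issue; check the mapping of your own monitoring indices before relying on them.

```python
from typing import Any, Dict


def build_liveness_query(beat_name: str,
                         name_field: str = "beats_stats.beat.name",  # assumed field name
                         time_field: str = "timestamp",              # assumed field name
                         window: str = "5m") -> Dict[str, Any]:
    """Build a query matching documents sent by one Beat within the last `window`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    # Only documents reported by the Beat we care about.
                    {"term": {name_field: beat_name}},
                    # Only documents newer than now minus the window (date math).
                    {"range": {time_field: {"gte": f"now-{window}"}}},
                ]
            }
        }
    }


def is_alive(hit_count: int) -> bool:
    """Treat the Beat as alive if at least one recent document matched."""
    return hit_count > 0
```

The rule would then fire when the number of matching documents drops to zero, which is what `is_alive` expresses.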

Sometimes the infrastructure involved is not very dynamic, and an alert for certain Elastic Agents would be of interest. Currently it is not possible to alert based on the health of an agent. For example, I want to know whether my on-premise Fleet Server is healthy; if it breaks I want to be alerted immediately, since this can cause cascading errors, such as policies not updating and all agents becoming unhealthy.

A good way would be to provide some default rules, like we have in Stack Monitoring, where I can select, for example, "send me an alert to Slack every 12 hours with all unhealthy agents". This way I would learn that my infrastructure has some issues or that something has changed, and I might want to perform cleanups and remove the unhealthy agents.

Default rules in Fleet UI

Give me a rule that would alert me if an integration is available for update

As of now, I have no way of knowing when an integration has a new version. I need to look into the agent policy, then check if there is an update, or even go one step deeper into the integration itself and update there first. This was commented here after I created an issue that I could not see the update on the agent.

An alert that runs once a day and sends me a mail saying "integration xyz is ready to update, no breaking changes" would be good.
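As a rough sketch of the check such a daily rule could perform, the example below compares installed and latest integration versions and builds the mail lines. The input dictionaries, the integration versions and the "major bump means breaking changes" heuristic are all assumptions for illustration; nothing here reflects how Fleet actually tracks package versions.

```python
from typing import Dict, List, Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a plain 'major.minor.patch' string; pre-release suffixes are not handled."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def update_messages(installed: Dict[str, str], latest: Dict[str, str]) -> List[str]:
    """Return one message per integration that has a newer version available."""
    messages = []
    for name in sorted(installed):
        if name not in latest:
            continue
        current = parse_version(installed[name])
        newest = parse_version(latest[name])
        if newest > current:
            # Assumption: a major version bump signals possible breaking changes.
            note = ("possible breaking changes" if newest[0] > current[0]
                    else "no breaking changes")
            messages.append(f"integration {name} is ready to update "
                            f"({installed[name]} -> {latest[name]}), {note}")
    return messages
```

Integrations that are already up to date produce no message, so an empty list means no mail needs to be sent that day.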

Give me a rule where I can select the agents that I want to be alerted if they go unhealthy.

This is needed to give me an alert if my Fleet Server goes down. Currently I am using a heartbeat that does an HTTP request against the Fleet Server and I have a status alert set. However, that involves running additional software on different hardware, whilst the data is already available within Elasticsearch.

So the possibility to alert immediately on an agent that goes unhealthy would be good. Furthermore, a second rule that gives me a status report once a day, such as "10 agents unhealthy" followed by a list of agent names, would be useful for cleaning up.
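The daily status report could be assembled along these lines. This is only a sketch: the agent records, the `name` and `status` keys and the `"healthy"` status value are hypothetical and would need to be mapped to whatever the agent documents in Elasticsearch actually contain.

```python
from typing import Dict, List


def unhealthy_report(agents: List[Dict[str, str]]) -> str:
    """Summarise all agents whose status is anything other than 'healthy'."""
    # Sort the names so the report is stable from one day to the next.
    unhealthy = sorted(a["name"] for a in agents if a["status"] != "healthy")
    if not unhealthy:
        return "All agents healthy"
    noun = "agent" if len(unhealthy) == 1 else "agents"
    return f"{len(unhealthy)} {noun} unhealthy: " + ", ".join(unhealthy)
```

The returned string can be used as the body of the mail or Slack message, and the list of names doubles as the clean-up list.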


elasticmachine · 2022-02-01T15:57:56Z

Pinging @elastic/fleet (Team:Fleet)

philippkahr · 2022-04-11T11:27:42Z

Also, the scaling described here: https://www.elastic.co/guide/en/fleet/7.17/fleet-server-scalability.html#scaling-recommendations should be available as an alert. When reaching 2000 agents, there should be an alert that the current sizing might not be adequate and that a change to xyz should be performed.
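A minimal sketch of such a sizing check is shown below. The threshold of 2000 comes from the comment above; the function name and the message wording are hypothetical, and how the enrolled agent count is obtained is left out.

```python
from typing import Optional


def sizing_alert(agent_count: int, threshold: int = 2000) -> Optional[str]:
    """Return an alert message once the agent count reaches the threshold, else None."""
    if agent_count < threshold:
        return None
    return (f"{agent_count} agents enrolled, which reaches the threshold of {threshold}; "
            "the current Fleet Server sizing might not be adequate")
```

Returning `None` below the threshold keeps the rule silent until the sizing recommendation actually becomes relevant.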

nimarezainia · 2022-05-25T14:55:50Z

The team is working on infrastructure that would allow status (be it agent status or the status of the inputs/integrations) to be propagated up to the Fleet UI. In this process the status changes will be stored in a specific data stream. Users will then have the flexibility to build these alerts based on the documents we store. I'm sure we can also create pre-baked alerts out of the box.

ghost · 2022-09-12T14:37:14Z

I will +1 this as a feature request.

My clusters are all on-prem. Having a way to easily send notifications when agents/fleet are unhealthy is something that is sorely needed.

jeffvestal · 2023-01-12T20:47:56Z

@mukeshelastic I'm guessing this is no longer an "8.6" candidate since that has been released?

defutek-tj · 2023-04-12T15:07:36Z

@nimarezainia - is this on the product roadmap for the near future? I have a customer requesting this feature.

nimarezainia · 2023-04-18T04:02:14Z

> @nimarezainia - is this on the product roadmap for the near future? I have a customer requesting this feature.

@defutek-tj it is one of our higher-priority items for users of the platform; however, it is currently not slated for delivery due to other higher-priority items on that list.

jpsep-elastic · 2023-05-02T17:00:21Z

Hi @joshdover who should I speak with to get more info on this? - fellow Elastician here!

JP

nimarezainia · 2023-05-03T07:13:02Z

@jpsep-elastic happy to discuss.

defutek-tj · 2023-07-28T13:52:37Z

@nimarezainia - any updates as to when this feature might be available?

nimarezainia · 2023-07-31T05:44:03Z

@defutek-tj our 8.9 release brings agent health, including reporting on the health of inputs/integrations (see the agent details page). We don't yet have alerts built on the status changes, however.

nimarezainia · 2024-03-28T04:14:05Z

https://www.elastic.co/guide/en/fleet/8.13/monitor-elastic-agent.html#fleet-alerting

botelastic bot added the needs-team Issues missing a team label label Feb 1, 2022

joshdover added the Team:Fleet Team label for Observability Data Collection Fleet team label Feb 1, 2022

botelastic bot removed the needs-team Issues missing a team label label Feb 1, 2022

joshdover added the enhancement New value added to drive a business result label Feb 1, 2022

elastic deleted a comment from pozezanc Mar 30, 2022

joshdover mentioned this issue Mar 31, 2022

[Fleet] Improve agent observability #78188

Open

13 tasks

nimarezainia added the 8.6 candidate label Jun 8, 2022

mukeshelastic changed the title ~~[Fleet] Default Rules and Alerting~~ [Fleet] Agent health, integration update availability alerts Jan 4, 2023

nimarezainia removed the 8.6 candidate label Jan 17, 2023

jen-huang mentioned this issue Apr 19, 2023

Trigger based action using Elastic Agent #153417

Open

nimarezainia closed this as completed Mar 28, 2024
