[Metrics Alerts][discuss] Alert History #58295

Closed
Zacqary opened this issue Feb 21, 2020 · 9 comments
Labels: discuss, Feature:Alerting, Feature:Metrics UI, Team:Infra Monitoring UI - DEPRECATED, Team:ResponseOps

Comments

@Zacqary
Contributor

Zacqary commented Feb 21, 2020

Being able to look at the history of a metric alert would be very useful. A user should be able to go back in time and see:

  • When the alert was previously in an OK or Alert state
  • What the value of the monitored metric was at these times

There are two things we need to implement to make this happen:

  • A UI to view this data
  • A backend to store this data. The Alerting plugin natively supports storing only the current alert state, not its state history over time. (Currently tracked as the event log project.)

Maybe it would be useful for all alerts, not just metric alerts, to have this historical data available, which is why I'm pinging the Alerting team on this. We can definitely just build it out for Metrics. But if there's an easy way to incorporate storing state history directly into the Alerting plugin, that would save us a lot of work having to maintain our own SavedObjects.
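
To make the two requirements above concrete, here is a minimal TypeScript sketch of what a stored history entry and a point-in-time lookup could look like. The names `AlertHistoryEntry`, `stateAt`, and all field names are hypothetical and chosen for illustration only; nothing here reflects an existing Alerting plugin API or SavedObject schema.

```typescript
// Hypothetical shape of one stored history entry (not an existing Kibana type).
type AlertState = 'OK' | 'ALERT';

interface AlertHistoryEntry {
  alertId: string;
  timestamp: number; // epoch milliseconds when the state was recorded
  state: AlertState;
  metricValue: number; // value of the monitored metric at that time
}

// Return the most recent entry at or before `time`, or undefined if the
// alert has no recorded history that early.
function stateAt(
  history: AlertHistoryEntry[],
  alertId: string,
  time: number
): AlertHistoryEntry | undefined {
  let latest: AlertHistoryEntry | undefined;
  for (const entry of history) {
    if (entry.alertId !== alertId || entry.timestamp > time) continue;
    if (latest === undefined || entry.timestamp > latest.timestamp) latest = entry;
  }
  return latest;
}

// Made-up sample data for illustration.
const sampleHistory: AlertHistoryEntry[] = [
  { alertId: 'cpu-high', timestamp: 1000, state: 'OK', metricValue: 0.42 },
  { alertId: 'cpu-high', timestamp: 2000, state: 'ALERT', metricValue: 0.97 },
  { alertId: 'cpu-high', timestamp: 3000, state: 'OK', metricValue: 0.51 },
];
```

A history UI could call something like `stateAt(sampleHistory, 'cpu-high', 2500)` to show both the state and the metric value for a moment the user selects.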

@Zacqary added the discuss, Feature:Alerting, Feature:Metrics UI, Team:Infra Monitoring UI - DEPRECATED, and Team:ResponseOps labels Feb 21, 2020
@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@elasticmachine
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Member

pmuellr commented Feb 24, 2020

Great timing, as we haven't added eventLog support to alerting yet, so it's a perfect time to get more requirements.

The basic story is that the eventLog is used to write interesting historic events, for later querying as you've mentioned. For alerting, the plan is to write the following entries out, per alert:

  • alert executor function ran (alert)
  • alert scheduled actions to run (alert, instanceId, action)
  • alert / alert instance state changes (mute, in the future snooze/ack/etc)

What other data we provide in these records is TBD. My current thinking is that we won't actually write out any of the alert-specific info (e.g., CPU usage value) into the record, since it's an arbitrary shape. If we were to write something, we'd have to JSON.stringify() it and store it as a non-searchable string. We'll see. I'm hoping that even if we don't have any alert-specific info in there, an app should be able to query the info it needs based on the date/instanceId in the event records. I'm imagining the app would display its usual view given the date/instanceIds, and could then add annotation points for the associated event records found.

From a query point of view, we'll be tying the search to the associated SO (in this case, the alert SO), so you'll pass in the alert SO type/id, and then will be able to specify the date range, event type, etc, to get the relevant records back.
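
The query model described above (look up records by saved object type/id, then narrow by date range and event type) can be sketched in TypeScript as follows. This is an in-memory illustration only: `EventRecord`, `EventQuery`, `queryEvents`, and the action names are hypothetical, and the real event log schema and API may look different.

```typescript
// Hypothetical, simplified event record; the real event log schema may differ.
type EventAction = 'execute' | 'schedule-action' | 'state-change';

interface EventRecord {
  timestamp: string; // ISO 8601
  action: EventAction;
  savedObject: { type: string; id: string };
  instanceId?: string; // only set for instance-level events
}

interface EventQuery {
  start?: string; // inclusive lower bound, ISO 8601
  end?: string; // inclusive upper bound, ISO 8601
  action?: EventAction;
}

// Return the records tied to one saved object, optionally narrowed by
// date range and event type.
function queryEvents(
  records: EventRecord[],
  soType: string,
  soId: string,
  query: EventQuery = {}
): EventRecord[] {
  const from = query.start === undefined ? -Infinity : Date.parse(query.start);
  const to = query.end === undefined ? Infinity : Date.parse(query.end);
  return records.filter((record) => {
    const t = Date.parse(record.timestamp);
    return (
      record.savedObject.type === soType &&
      record.savedObject.id === soId &&
      t >= from &&
      t <= to &&
      (query.action === undefined || record.action === query.action)
    );
  });
}

// Made-up sample data for illustration.
const sampleEvents: EventRecord[] = [
  { timestamp: '2020-02-24T10:00:00.000Z', action: 'execute', savedObject: { type: 'alert', id: 'a1' } },
  { timestamp: '2020-02-24T10:00:01.000Z', action: 'schedule-action', savedObject: { type: 'alert', id: 'a1' }, instanceId: 'host-1' },
  { timestamp: '2020-02-25T10:00:00.000Z', action: 'execute', savedObject: { type: 'alert', id: 'a1' } },
  { timestamp: '2020-02-24T10:00:00.000Z', action: 'execute', savedObject: { type: 'alert', id: 'a2' } },
];
```

An app could use the timestamps and `instanceId` values returned by such a query to place annotation points on its own metric charts, without the metric values themselves being stored in the event records.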

@peterschretlen
Contributor

Linking related kibana alerting issues: #55636, #55633

@jasonrhodes
Member

@peterschretlen @pmuellr are there plans for the Alerting Core team to build some kind of central history UI? Is that on any kind of roadmap? I think all of observability will want something like this (I imagine most alerting users will) so we want to make sure we understand how to avoid duplication as much as possible. Thanks for any info you've got!

@chrisronline
Contributor

It's not UI-specific, but there is another ticket discussing what should live in this event log: #63257 (comment)

@pmuellr
Member

pmuellr commented Apr 22, 2020

@jasonrhodes asked: "are there plans for the Alerting Core team to build some kind of central history UI? Is that on any kind of roadmap?"

We've mainly been focusing on getting the data into the event log right now. I thought I had opened an issue with some progressive enhancements to the alert details page, to pull data from the event log instead of just the current state, but I can't find it right now; maybe I never created it. I'll open one up.

One of the issues around viewing the history is that we require access to the alert ids (saved object type/id, to be precise) to get the history, and currently the API only takes a single alert id. We probably need to change that to take multiple alert ids, to cut down on ES requests. Or somehow get the list of all the alerts the current user can see (in the current space). TBD.
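
Until the API accepts multiple ids, a caller would have to fan out one lookup per alert. The TypeScript sketch below shows that client-side workaround. `FindEvents`, `AlertEvent`, and `findEventsForAlerts` are hypothetical names, and the lookup function is passed in rather than taken from any real plugin contract. Note that this still issues one request per unique id, which is exactly the cost a multi-id API would avoid.

```typescript
// Hypothetical event shape and single-id lookup signature (assumptions).
interface AlertEvent {
  alertId: string;
  message: string;
}

type FindEvents = (type: string, id: string) => Promise<AlertEvent[]>;

// Look up history for several alerts by calling the single-id lookup once
// per unique id, in parallel, and grouping the results by alert id.
async function findEventsForAlerts(
  findEvents: FindEvents,
  alertIds: string[]
): Promise<Map<string, AlertEvent[]>> {
  const uniqueIds = Array.from(new Set(alertIds)); // avoid duplicate requests
  const results = await Promise.all(uniqueIds.map((id) => findEvents('alert', id)));
  const byId = new Map<string, AlertEvent[]>();
  uniqueIds.forEach((id, index) => byId.set(id, results[index]));
  return byId;
}
```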

You can access this history using the event_log plugin: https://github.com/elastic/kibana/tree/master/x-pack/plugins/event_log

It's not documented in the README, but we also provide an eventLogClient as part of the plugin start contract:

this.eventLogClientService = new EventLogClientService({
  esContext: this.esContext,
  savedObjectsService: core.savedObjects,
});
return this.eventLogClientService;

It provides an interface to search for event log documents:

export interface IEventLogClientService {
  getClient(request: KibanaRequest): IEventLogClient;
}
export interface IEventLogClient {
  findEventsBySavedObject(
    type: string,
    id: string,
    options?: Partial<FindOptionsType>
  ): Promise<QueryEventsBySavedObjectResult>;
}
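
As a rough illustration of how a caller might use `findEventsBySavedObject`, here is a self-contained TypeScript sketch. It does not import the real plugin: `FindOptions`, `EventDoc`, `QueryResult`, `EventLogClientLike`, `getAlertHistory`, and `InMemoryEventLogClient` are simplified, hypothetical stand-ins, and the `start`/`end` options and the `data` field on the result are assumptions about what the real `FindOptionsType` and `QueryEventsBySavedObjectResult` contain.

```typescript
// Simplified stand-ins for the plugin's types. These are assumptions made for
// illustration; the real FindOptionsType and result types may differ.
interface FindOptions {
  start?: string; // ISO 8601 lower bound (inclusive)
  end?: string; // ISO 8601 upper bound (inclusive)
}

interface EventDoc {
  timestamp: string;
  message: string;
}

interface QueryResult {
  total: number;
  data: EventDoc[];
}

interface EventLogClientLike {
  findEventsBySavedObject(
    type: string,
    id: string,
    options?: Partial<FindOptions>
  ): Promise<QueryResult>;
}

// Fetch the history of one alert saved object within a date range.
async function getAlertHistory(
  client: EventLogClientLike,
  alertId: string,
  start: string,
  end: string
): Promise<EventDoc[]> {
  const result = await client.findEventsBySavedObject('alert', alertId, { start, end });
  return result.data;
}

// In-memory fake so the sketch runs without Kibana or Elasticsearch.
// Documents are keyed by "<type>/<id>".
class InMemoryEventLogClient implements EventLogClientLike {
  constructor(private readonly docs: Record<string, EventDoc[]>) {}

  async findEventsBySavedObject(
    type: string,
    id: string,
    options: Partial<FindOptions> = {}
  ): Promise<QueryResult> {
    const from = options.start === undefined ? -Infinity : Date.parse(options.start);
    const to = options.end === undefined ? Infinity : Date.parse(options.end);
    const data = (this.docs[`${type}/${id}`] || []).filter((doc) => {
      const t = Date.parse(doc.timestamp);
      return t >= from && t <= to;
    });
    return { total: data.length, data };
  }
}
```

In a real plugin, the client would presumably come from `getClient(request)` on the start contract shown above rather than from an in-memory fake.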

@pmuellr
Member

pmuellr commented Apr 22, 2020

The issue tracking enhancing the alert details view is here: #57446

@sgrodzicki self-assigned this Apr 27, 2020
@sorantis

sorantis commented May 5, 2020

After a discussion with @arisonl, closing this in favor of #62221 as it's already on the alerting team's radar.

Alert History would benefit all solutions, not just metrics, so it's best that the initiative is handled in a more general way, with solutions being key stakeholders.

@sorantis closed this as completed May 5, 2020
@zube bot removed the [zube]: Done label Oct 13, 2020
@kobelb added the needs-team label Jan 31, 2022
@botelastic bot removed the needs-team label Jan 31, 2022