
Alert instances view to display data from event log instead of current instances #57446

Closed
mikecote opened this issue Feb 12, 2020 · 8 comments · Fixed by #68437
Labels: Feature:Alerting, ReleaseStatus, Team:ResponseOps

Comments

@mikecote (Contributor) commented Feb 12, 2020

Now that #56842 is merged, we display the current alert instances. This issue covers the next step: displaying instances from history, with start and stop columns showing the duration of each instance.

@mikecote added the Feature:Alerting and Team:ResponseOps labels Feb 12, 2020
@elasticmachine (Contributor)

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@mikecote changed the title from "Alert instances view to display data from history instead of current instances" to "Alert instances view to display data from event log instead of current instances" on Apr 16, 2020
@pmuellr (Member) commented Apr 16, 2020

It seems like a nice "extension" to the current UI would be to show all the alert instances seen over some interval, like a day (fixed for now, customizable later), along with their most recent start/stop/durations.

Here's the current view:

[screenshot: current alert instances view]

The change would be that you'd see more instances - every instance that scheduled actions within the last 24 hours - and we'd start seeing inactive instances, not just active ones.

This will likely end up making us add some filtering/sorting - you might want to sort by instance/status/start/duration, and filter by instance and status - as a separate issue/PR.

We'll have a semantic issue if we find an instance that was resolved within the time period we query over but whose start falls outside that period. For these we're back in an "unknown" state, though we do know the minimum amount of time it was active (the resolved time minus the start time of the query window) - the duration could be shown as something like "at least two hours" or "over two hours".
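As a rough illustration of that display rule (the names here are made up for the sketch, not existing code), assuming we track whether an instance's start predates the query window:

```ts
// Sketch only: `startUnknown` would be set when we saw the instance resolve
// inside the query window but never saw it start.
function formatInstanceDuration(durationMs: number, startUnknown: boolean): string {
  const hours = Math.floor(durationMs / 3_600_000);
  const minutes = Math.floor((durationMs % 3_600_000) / 60_000);
  const text = hours > 0 ? `${hours}h ${minutes}m` : `${minutes}m`;
  // When the start is unknown, durationMs is only a lower bound
  // (resolved time minus the start of the query window).
  return startUnknown ? `at least ${text}` : text;
}
```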

@mikecote (Contributor, Author) commented May 9, 2020

@mdefazio to provide the latest mockup of what the alert instances look like on the alert details page once data is pulled from the event log.

@pmuellr (Member) commented May 11, 2020

I'd be happy to pair with someone on this; I'm very familiar with the event log :-)

@gmmorris (Contributor) commented May 13, 2020

I took a quick look at the current code to get a mental model of the pieces we need to put together:

  1. We currently have a component called AlertInstancesRoute that wraps AlertInstances and has one job: load the state of the alert and pass it into AlertInstances when it's mounted. I think it would make sense to start by adding an event-log API to with_bulk_alert_api_operations.
  2. Once we have a new API in with_bulk_alert_api_operations, we can use it in AlertInstancesRoute to fetch the default events and pass a function down to AlertInstances that lets us refresh the events whenever the filtering props change.

Hopefully by then we'll have a fresh design and we can change the table in AlertInstances to render the events rather than the state.
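A rough sketch of the shape such an API helper could take (the function name, route, and event shape are assumptions for illustration, not the actual with_bulk_alert_api_operations code):

```ts
// Hypothetical helper that AlertInstancesRoute could call on mount and pass
// down to AlertInstances as a refresh callback when filtering props change.
interface InstanceEvent {
  instanceId: string;
  action: 'new-instance' | 'active-instance' | 'resolved-instance';
  timestamp: string; // ISO 8601
}

interface HttpGet {
  get<T>(path: string, options?: { query?: Record<string, string> }): Promise<T>;
}

export async function loadAlertInstanceEvents(
  http: HttpGet,
  alertId: string,
  start: string,
  end: string
): Promise<InstanceEvent[]> {
  // Placeholder route; the real server-side endpoint would be added as part of this work.
  return await http.get<InstanceEvent[]>(`/api/alerts/alert/${alertId}/events`, {
    query: { start, end },
  });
}
```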

One problem we need to address is that we would, presumably, calculate the duration by looking back through the API response at how long an instance persists across execution cycles. The problem is that if the time window specified by the user dictates how many events we fetch from the API, then we can only ever evaluate duration as far back as that window. This means that if, for example, the user selects a time frame of "last 15 minutes", the longest duration any single instance could show is 15 minutes, even if it has actually been going off for an hour.

@mdefazio We need to think about how we might want to express that a specific instance runs all the way back to the edge of the time frame... possibly exceeding it. 🤔
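To make the window-clipping problem concrete, here is a minimal sketch (illustrative types and names only): durations derived purely from events inside the fetched window can never exceed the window length, so a `truncated` flag could mark instances whose earliest event sits at the window edge:

```ts
interface EventEntry {
  instanceId: string;
  timestamp: string; // ISO 8601
}

interface InstanceDuration {
  instanceId: string;
  durationMs: number;
  truncated: boolean; // true if the instance may have started before the window
}

function instanceDurations(events: EventEntry[], windowStart: Date, now: Date): InstanceDuration[] {
  // Earliest event per instance within the fetched window.
  const firstSeen = new Map<string, Date>();
  for (const e of events) {
    const t = new Date(e.timestamp);
    const prev = firstSeen.get(e.instanceId);
    if (!prev || t < prev) firstSeen.set(e.instanceId, t);
  }
  return Array.from(firstSeen, ([instanceId, start]) => ({
    instanceId,
    // Bounded above by (now - windowStart), even if the instance is older.
    durationMs: now.getTime() - start.getTime(),
    // If the first event we saw is at (or very near) the window edge,
    // the true start is unknown and the duration is only a lower bound.
    truncated: start.getTime() <= windowStart.getTime() + 60_000,
  }));
}
```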

@mdefazio (Contributor)

Sorry I'm a bit late on this. Here's the current-ish mockup. (Not showing Andrea's updates to the top section). Let me know what makes sense to show in the table and we can update the mockup.
[mockup: Alert-Detail-Instance]

@arisonl (Contributor) commented Jun 23, 2020

  • Should the status values correspond to the states offered by the alert (e.g. ok, warning, minor, major, etc.) rather than active/inactive?
  • What does Duration show if the selected period from the time picker includes multiple occurrences (start-end) of the same instance (e.g. a long selected period)? Are we only showing the last one? Conversely, what does Duration show if it includes a partial occurrence of an instance (e.g. a short selected period)? Should we have an End field next to Start as well?

@arisonl (Contributor) commented Jun 24, 2020

Notes on the chart: #56280 (comment)

pmuellr added a commit to pmuellr/kibana that referenced this issue Aug 10, 2020
resolves elastic#57446

Adds a new API (AlertClient and HTTP endpoint) `getAlertStatus()` which returns
data calculated from the event log.

The data returned in this PR is fairly minimal - just enough to replace the
current instance details view data.  In the future, we can add the following
sorts of things:

- alert execution errors
- counts of alert execution
- sequences of active instance time spans
- if we can get the alert SO into the action execution, we could also
  provide action execution errors
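As a rough illustration of how a client might consume such an endpoint (the route path and response fields below are guesses for illustration; PR #68437 is the source of truth for the real API):

```ts
// Illustrative only: path and response shape are assumptions, not the merged API.
interface AlertInstanceStatus {
  status: 'OK' | 'Active';
  activeStartDate?: string; // present when the event log recorded the instance start
}

interface AlertStatus {
  id: string;
  name: string;
  instances: Record<string, AlertInstanceStatus>;
}

async function fetchAlertStatus(
  http: { get<T>(path: string): Promise<T> },
  alertId: string
): Promise<AlertStatus> {
  // Hypothetical route; the actual endpoint is defined in PR #68437.
  return await http.get<AlertStatus>(`/api/alerts/alert/${alertId}/status`);
}
```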
@mikecote mentioned this issue Aug 11, 2020
pmuellr added a commit that referenced this issue Aug 14, 2020
resolves #57446

Adds a new API (AlertClient and HTTP endpoint) `getAlertStatus()` which returns
alert data calculated from the event log.
pmuellr added a commit to pmuellr/kibana that referenced this issue Aug 14, 2020
…#68437)

resolves elastic#57446

Adds a new API (AlertClient and HTTP endpoint) `getAlertStatus()` which returns
alert data calculated from the event log.
mikecote pushed a commit that referenced this issue Aug 14, 2020
…#75036)

resolves #57446

Adds a new API (AlertClient and HTTP endpoint) `getAlertStatus()` which returns
alert data calculated from the event log.
@stacey-gammon added the ReleaseStatus label Sep 17, 2020
@kobelb added the needs-team label Jan 31, 2022
@botelastic (bot) removed the needs-team label Jan 31, 2022