Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rename alert status OK to Recovered and fix some UX issues around disabling a rule while being in an error state #98135

Merged
merged 11 commits into from
May 13, 2021

Conversation

mikecote
Copy link
Contributor

@mikecote mikecote commented Apr 23, 2021

Resolves #87055
Resolves #98073

In this fix-it PR, I'm fixing a couple of UX issues around the rule details page.

  1. When a rule is in an error state (based on lastExecutionStatus), the error banner will re-display after the user disables then enables the rule if the banner was previously dismissed. This way, if ever the rule does encounter a different error after being enabled, the error gets displayed regardless if they chose to dismiss the previous error or not.
  2. The rule details page no longer displays the error message when the rule is disabled when in an error state (based on lastExecutionStatus). A banner is already displayed to indicate the rule is disabled.
  3. When enabling a rule, the execution status gets updated to pending just like when a rule is created. This removes the confusion of seeing an error message in the rule details page when the rule didn't finish running yet (after being re-enabled).
  4. Renames alert instance status "OK" to "Recovered".

@mikecote mikecote added Feature:Alerting v8.0.0 release_note:skip Skip the PR/issue when compiling release notes UX Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) v7.14.0 labels Apr 23, 2021
@mikecote mikecote self-assigned this Apr 23, 2021
@mikecote mikecote marked this pull request as ready for review April 23, 2021 13:24
@mikecote mikecote requested a review from a team as a code owner April 23, 2021 13:24
@elasticmachine
Copy link
Contributor

Pinging @elastic/kibana-alerting-services (Team:Alerting Services)

@pmuellr
Copy link
Member

pmuellr commented Apr 23, 2021

I can't find an issue for this, but I've been thinking it would be more useful for us to show instances that have recovered with a visual status of "recovered", instead of "ok" as it does today. Makes it more clear that it WAS active, previously. And set expectations that you aren't seeing all the potential range of alert instances, only those that have "alerted".

Haven't looked at the size of this issue yet, but if we're doing a couple clean ups of this kind of stuff, this PR could be a place to put this. I think it involves just a change in the text we display in the instance table.

Copy link
Contributor

@YulNaumenko YulNaumenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mikecote
Copy link
Contributor Author

mikecote commented Apr 23, 2021

@pmuellr it was pretty easy to change, I did the change within c7b2485.

Screen Shot 2021-04-23 at 2 40 41 PM

@pmuellr
Copy link
Member

pmuellr commented Apr 23, 2021

re: ok -> recovered. I happened to think - "do we have doc to fix? maybe a screen shot?". Took a brief look - nope! Did I miss it? It ain't right that we don't have doc on the alert details UI! Let's open a new doc issue if we don't.

I think this is probably release-note worthy though. It's possible some customers may be confused by the change of "OK" to "Recovered". We should note that the semantics have not changed; these have always been alert instances which were previously active but are no longer active, we just changed the label to better reflect that state.

@@ -953,6 +953,11 @@ export class AlertsClient {
),
updatedBy: username,
updatedAt: new Date().toISOString(),
executionStatus: {
status: 'pending',
lastExecutionDate: new Date().toISOString(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Does it make sense to reset the lastExecutionDate here? I guess it would quickly be overwritten the next time the rule runs, but technically, this date is not the last time the rule was executed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we do set that field to the same "current date" for other cases of status: 'pending'. So, it's consistent. I think we picked the name lastExecutionDate before starting to use pending, and so ... the name kinda doesn't make sense any more. Would be better described (I think) as "lastStateUpdateDate" or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be better described (I think) as "lastStateUpdateDate" or something.

I think that makes sense. Should I create an issue for 8.0 release? This way we can make it a breaking change.

I think copying how it's done on create is ok for now but definitely strange.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DateDateDate

I'm not against changing the name to better reflect reality, especially as an 8.0 thing. Though I wonder if lastStateUpdate makes sense - the date value is supposed to reflect the last execution, except we knew we didn't have that date in some previous 7.x release, and obviously don't have one for newly created rules. So it does kind of make sense to continue with the current name, with the caveat that you need to check to see if the status is 'pending' which means "it hasn't run yet". Basically, if we call it lastStateUpdate, would folks expect other changes might also change this date? What happens to these fields when disabling a rule?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I read this as "the execution status is pending, and it's been pending since lastExecutionDate". So any name that can reflect that and last time the rule executed I'm +1. Seems like a separate issue to be created?

@@ -463,6 +465,74 @@ describe('disable button', () => {
handler!({} as React.FormEvent);
expect(enableAlert).toHaveBeenCalledTimes(1);
});

it('should bring back error banner after re-enable if previously dismissed', async () => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it expected that I will see the error banner without refreshing the page after reenabling? I see the error banner disappear after disabling but after enabling, I have to refresh the page to see the error banner again

Apr-26-2021 08-23-42

Copy link
Contributor Author

@mikecote mikecote Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After re-enabling the rule, it would have a pending status which I believe explains why the banner doesn't re-appear until you refresh the page (or click the refresh button). Because by then, the rule had a chance to run once and update the execution status. Does this experience seem strange?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I see. I think that's fine. The test name made me think the error banner should reappear right away but I think it's ok if it doesn't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean. I fixed the test name in this commit: 624ae75.

@ymao1
Copy link
Contributor

ymao1 commented Apr 26, 2021

re: ok -> recovered. I happened to think - "do we have doc to fix? maybe a screen shot?". Took a brief look - nope! Did I miss it? It ain't right that we don't have doc on the alert details UI! Let's open a new doc issue if we don't.

https://www.elastic.co/guide/en/kibana/master/rule-details.html We do have a screenshot for this

@mikecote
Copy link
Contributor Author

re: ok -> recovered. I happened to think - "do we have doc to fix? maybe a screen shot?". Took a brief look - nope! Did I miss it? It ain't right that we don't have doc on the alert details UI! Let's open a new doc issue if we don't.

https://www.elastic.co/guide/en/kibana/master/rule-details.html We do have a screenshot for this

I will update those for sure!

@mikecote mikecote added release_note:fix and removed release_note:skip Skip the PR/issue when compiling release notes labels Apr 27, 2021
@mikecote mikecote changed the title Fix some UX issues around rules in error state and being disabled Rename alert status OK to Recovered and fix some UX issues around disabling a rule while being in an error state Apr 27, 2021
@mikecote
Copy link
Contributor Author

I think this is probably release-note worthy though. It's possible some customers may be confused by the change of "OK" to "Recovered". We should note that the semantics have not changed; these have always been alert instances which were previously active but are no longer active, we just changed the label to better reflect that state.

I went ahead and updated the PR title and release note label for this 🙏

@@ -119,6 +119,7 @@ export default function createEnableAlertTests({ getService }: FtrProviderContex
.auth(user.username, user.password)
.expect(200);
expect(typeof updatedAlert.scheduled_task_id).to.eql('string');
expect(updatedAlert.execution_status.status).to.eql('pending');
Copy link
Contributor Author

@mikecote mikecote Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this is flaky in the scenario the rule finished running before the API call is made.

Error: expected 'ok' to sort of equal 'pending'
    at Assertion.assert (/dev/shm/workspace/parallel/16/kibana/node_modules/@kbn/expect/expect.js:100:11)
    at Assertion.eql (/dev/shm/workspace/parallel/16/kibana/node_modules/@kbn/expect/expect.js:244:8)
    at Context.<anonymous> (test/alerting_api_integration/spaces_only/tests/alerting/enable.ts:52:55)
    at Object.apply (/dev/shm/workspace/parallel/16/kibana/node_modules/@kbn/test/src/functional_test_runner/lib/mocha/wrap_function.js:73:16) {
  actual: 'ok',
  expected: 'pending',
  showDiff: true
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to remove the integration test: cfb1c25. It will always be a race condition between enabling the rule and checking if the execution status is pending. So we'll have to rely on unit tests.

@@ -49,6 +49,7 @@ export default function createEnableAlertTests({ getService }: FtrProviderContex
.set('kbn-xsrf', 'foo')
.expect(200);
expect(typeof updatedAlert.scheduled_task_id).to.eql('string');
expect(updatedAlert.execution_status.status).to.eql('pending');
Copy link
Contributor Author

@mikecote mikecote Apr 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note to self: this is flaky in the scenario the rule finished running before the API call is made.

Error: expected 'ok' to sort of equal 'pending'
    at Assertion.assert (/dev/shm/workspace/parallel/16/kibana/node_modules/@kbn/expect/expect.js:100:11)
    at Assertion.eql (/dev/shm/workspace/parallel/16/kibana/node_modules/@kbn/expect/expect.js:244:8)
    at Context.<anonymous> (test/alerting_api_integration/spaces_only/tests/alerting/enable.ts:52:55)
    at Object.apply (/dev/shm/workspace/parallel/16/kibana/node_modules/@kbn/test/src/functional_test_runner/lib/mocha/wrap_function.js:73:16) {
  actual: 'ok',
  expected: 'pending',
  showDiff: true
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had to remove the integration test: cfb1c25. It will always be a race condition between enabling the rule and checking if the execution status is pending. So we'll have to rely on unit tests.

@mikecote
Copy link
Contributor Author

re: ok -> recovered. I happened to think - "do we have doc to fix? maybe a screen shot?". Took a brief look - nope! Did I miss it? It ain't right that we don't have doc on the alert details UI! Let's open a new doc issue if we don't.

https://www.elastic.co/guide/en/kibana/master/rule-details.html We do have a screenshot for this

I will update those for sure!

This is updated in 8b41644.

@mikecote mikecote requested review from pmuellr and ymao1 April 29, 2021 17:05
@mikecote
Copy link
Contributor Author

@pmuellr @ymao1 this PR is good for another review 🙂

Copy link
Contributor

@ymao1 ymao1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@mikecote
Copy link
Contributor Author

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Metrics [docs]

Async chunks

Total size of all lazy-loaded chunks that will be downloaded as the user navigates the app

id before after diff
triggersActionsUi 1.5MB 1.5MB +93.0B
Unknown metric groups

References to deprecated APIs

id before after diff
canvas 29 25 -4
crossClusterReplication 8 6 -2
fleet 4 2 -2
globalSearch 4 2 -2
indexManagement 12 7 -5
infra 5 3 -2
licensing 18 15 -3
monitoring 109 56 -53
total -73

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

cc @mikecote

@mikecote mikecote added the auto-backport Deprecated - use backport:version if exact versions are needed label May 13, 2021
@mikecote mikecote merged commit bb7057c into elastic:master May 13, 2021
kibanamachine added a commit to kibanamachine/kibana that referenced this pull request May 13, 2021
…abling a rule while being in an error state (elastic#98135)

* Fix UX when alert is disabled and in an error state

* Reset executionStatus to pending after enabling an alert

* Renames alert instance status OK to Recovered

* Fix end to end test

* Update doc screenshot

* Fix confusing test name

* Remove flakiness in integration test

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
@kibanamachine
Copy link
Contributor

💚 Backport successful

Status Branch Result
7.x

This backport PR will be merged automatically after passing CI.

kibanamachine added a commit that referenced this pull request May 13, 2021
…abling a rule while being in an error state (#98135) (#100063)

* Fix UX when alert is disabled and in an error state

* Reset executionStatus to pending after enabling an alert

* Renames alert instance status OK to Recovered

* Fix end to end test

* Update doc screenshot

* Fix confusing test name

* Remove flakiness in integration test

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>

Co-authored-by: Mike Côté <mikecote@users.noreply.github.com>
gmmorris added a commit to chrisronline/kibana that referenced this pull request May 14, 2021
* master: (27 commits)
  Disable contextMenu when event is not event.kind=event (elastic#100027)
  Updates the monorepo-packages list (elastic#100096)
  Removes circular deps for lists in tooling and bumps down byte limit for lists (elastic#100082)
  [Security Solutions] Breaks down the io-ts packages to decrease plugin size (elastic#100058)
  fix-typo: Use of `than` instead of `then` (elastic#100030)
  [Fleet] Fix error when searching for keys whose names have spaces (elastic#100056)
  [Workplace Search] Fix bug when transitioning to personal dashboard (elastic#100061)
  [index pattern field editor] Update runtime field painless docs url (elastic#100014)
  [QA] Switch tests to use importExport - visualize (elastic#98063)
  [Canvas] Remove unused legacy autocomplete component (elastic#99215)
  Re-enable formerly flaky shareable test (elastic#98826)
  [Uptime] [Synthetics Integration] ensure that proxy url is not overwritten (elastic#99944)
  [Security Solutions][Lists] Trims down list plugin size by breaking out the exception builder into chunks by using react lazy loading (elastic#99989)
  [Uptime] Increase debounce and add immediate submit to `useQueryBar` (elastic#99675)
  chore(NA): moving @kbn/docs-utils into bazel (elastic#100051)
  [Enterprise Search] Fix SchemaFieldTypeSelect axe issues (elastic#100035)
  Remove outdated comment about schema validation not working (it does work now). (elastic#100055)
  Rename alert status OK to Recovered and fix some UX issues around disabling a rule while being in an error state (elastic#98135)
  [Fleet] Do not use async method in plugin setup|start (elastic#100033)
  Skip flaky functional test suite
  ...
yctercero pushed a commit to yctercero/kibana that referenced this pull request May 25, 2021
…abling a rule while being in an error state (elastic#98135)

* Fix UX when alert is disabled and in an error state

* Reset executionStatus to pending after enabling an alert

* Renames alert instance status OK to Recovered

* Fix end to end test

* Update doc screenshot

* Fix confusing test name

* Remove flakiness in integration test

Co-authored-by: Kibana Machine <42973632+kibanamachine@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
auto-backport Deprecated - use backport:version if exact versions are needed Feature:Alerting release_note:enhancement Team:ResponseOps Label for the ResponseOps team (formerly the Cases and Alerting teams) UX v7.14.0 v8.0.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Alerting] UX error after disabling Issues with alert statuses UI
6 participants