[Stack Monitoring] Change out of the box alerts to be opt-in, rather than auto-created #100133

Closed
jasonrhodes opened this issue May 14, 2021 · 35 comments · Fixed by #101565
Labels: Epic: Stack Monitoring Alerting Alignment, Feature:Alerting, Feature:Stack Monitoring, Team:Infra Monitoring UI - DEPRECATED, Team:Monitoring

Comments

@jasonrhodes
Member

jasonrhodes commented May 14, 2021

There have been a number of issues where users have been surprised or frustrated by the auto-creation of alerts in Stack Monitoring.

Some examples:

We should make these alerts opt-in per space and remove the space configuration setting from #99128, so that each user can choose to create these alerts in their space, if they want them (but no alerts would be auto-created).

AC:

  • Stack monitoring alerts are no longer auto-created, but instead can be created by opting in via the UI.
    • To start out, this should be all or nothing, i.e. "create all SM alerts". We can improve this so that users can choose specific alerts later.
  • If alerts are already created, we don't need to show the opt-in decision (alerts can be removed from the central management, I think -- need to confirm)

Related:

@jasonrhodes jasonrhodes added Team:Monitoring Stack Monitoring team Feature:Alerting Team:Infra Monitoring UI - DEPRECATED DEPRECATED - Label for the Infra Monitoring UI team. Use Team:obs-ux-infra_services Feature:Stack Monitoring labels May 14, 2021
@elasticmachine
Contributor

Pinging @elastic/stack-monitoring (Team:Monitoring)

@elasticmachine
Contributor

Pinging @elastic/logs-metrics-ui (Team:logs-metrics-ui)

@jasonrhodes
Member Author

My first thought was just an opt-in button, but if we want to give the user more guidance, we can additionally show a modal on first visit to the UI that would explain the alerts that are about to be created and give the user the option to choose from options such as "Yes", "Not now", or "Don't show this again". We may also want to give the users a way to access this modal again (with just the "yes" / "not now" options at that point) in case they decide they would like to enable the out of the box alerts later, after choosing "Don't show this again".

@ravikesarwani
Contributor

We should discuss the user experience when we are ready to implement the change keeping in mind the 80-20 rule.
80% of the folks want the rules/alerts to be deployed by default. So we should optimize the experience for those users.

A starter flow I envision could be something like this:
When a user visits the SM UI and the out-of-the-box rules aren't deployed, we bring up a large pop-up dialog:
Stack monitoring comes with many out-of-the box rules to notify you of common issues around cluster health, resource utilization and errors or exceptions. Learn more...
Create these out-of-the box rules? Yes, No
The default selection is "yes". Maybe this is also a modal kind of dialog where user needs to "ok" before they can browse SM UI pages.

If the user selects No, we save the user preference.
In this case, a button in the corner (Create out-of-the-box rules) will be available on subsequent visits so that users can enable the out-of-the-box rules at any time later.

@jasonrhodes
Member Author

jasonrhodes commented May 24, 2021

80% of the folks want the rules/alerts to be deployed by default.

Yeah, that's the part I want to make sure we're extra clear about, agreed. I can see the possibility that "80% of users want to have rules and alerts set up for stack monitoring" would be true, but "on by default, without any interaction from me, the user" feels much less likely, to me. Especially since actions need to be set up so that the notifications go to the right place, and other threshold tweaks that are likely to be needed.

Personally, I think the goal should be "Stack Monitoring alerts should be incredibly simple to turn on, edit, adjust, and/or turn off." So that users who want them can get them with almost no trouble at all, and users who don't want them or who want them but not with the default settings are never surprised by them being on. And once a user has indicated somehow that they don't want them, we have to make sure we don't continue to re-create them.

I think all of these can be solved with a one-time-per-space modal interaction when the user visits the SM UI. @katefarrar do we have instances of a full-page modal meant to require input from the user before they move on to viewing the UI? I think the modal would need 3 options:

  1. Yes (turn on alerts, perhaps with some lightweight configuration available if they want it),
  2. No, hide this message
  3. Not right now (in this case we will bring the modal up again next time)

Then we'll also need to know what to do for the following scenarios:

  4. In case (2) above, how does a user change their mind later and turn the alerts on?
  5. In cases (1), (3), and (4), how does a user turn the alerts off?
  6. In case (5), how does a user turn alerts back on? This is probably the same as (4).

Unfortunately, I don't think this can wait for a full design cycle, so we'll need to see if we can use EUI to do something simple and iterate on better design as we go.

@katefarrar
Contributor

@jasonrhodes we should just be able to use the EUI modal (which will grow to fit the contents).

Since we don't have a specific Settings page for Stack Monitoring, for now it might work to have an alerts on / off toggle when a user enters Setup Mode.

@jasonrhodes
Member Author

Perfect, I think the only other thing to figure out will be how to store state about when to not show the modal anymore, and whether we just want to use localStorage (per user/browser) or a saved object (slower, but per space).

@estermv estermv self-assigned this May 31, 2021
@estermv
Contributor

estermv commented Jun 1, 2021

It seems that alerts are created when the user visits the Cluster Overview page and the Listing page. Do we want to show the modal on both pages also? (the first one the user enters on)

From what I see in the code, alerts are not created per cluster, so I was thinking that it could be confusing that you can enable/disable them from the Setup Mode in one cluster (although if it's something temporary it could be ok)

@ravikesarwani
Contributor

I think the code skips the cluster listing page (and goes directly to cluster overview) when only 1 cluster is being monitored.

@jasonrhodes
Member Author

jasonrhodes commented Jun 1, 2021

@estermv good questions.

It seems that alerts are created when the user visits the Cluster Overview page and the Listing page. Do we want to show the modal on both pages also? (the first one the user enters on)

I think the easiest path right now would be to do it in both places (i.e. anywhere that we would be checking for/creating the alerts). If we can move that up to a wrapper that handles it in one place and wraps both pages, that'd be best?

From what I see in the code, alerts are not created per cluster, so I was thinking that it could be confusing that you can enable/disable them from the Setup Mode in one cluster (although if it's something temporary it could be ok)

I think we should maybe just put an "Alerts" link in the top bar like every other observability app, and then in the popover that shows up on click, we can put the choices in there?

[Image: Mouse_Highlight_Overlay]

It would be the only item in that menu for us, in Stack Monitoring, but possibly not for long if we move the set up mode toggle up there as well, possibly.

@katefarrar are you okay with that idea for now? Setup Mode is a bit tricky so if we can avoid adding to it for the moment, and try to keep smoothing the differences between apps, this might be best. I am not 100% sure how flexible the inside of that popover is going to be, is the only concern I have here...

@estermv
Contributor

estermv commented Jun 2, 2021

I think the code skips the cluster listing page (and goes directly to cluster overview) when only 1 cluster is being monitored.

Yes, that's correct, but when there is more than one cluster, alerts are created on the listing page instead of the cluster overview (if the listing page is the first one the user visits)

I think the easiest path right now would be to do it in both places (i.e. anywhere that we would be checking for/creating the alerts). If we can move that up to a wrapper that handles it in one place and wraps both pages, that'd be best?

I don't feel that the place where the alerts are currently created is the right place to check for alerts and open the modal. Instead of a wrapper like you suggest, I was thinking of encapsulating all this logic in a component and then just including the component in both pages. When I have it more defined, I'll sync with @igoristic to check if that would be a good approach or if there is something I'm missing.

I think we should maybe just put an "Alerts" link in the top bar like every other observability app, and then in the popover that shows up on click, we can put the choices in there?

I like the idea! I can see how flexible it is and then we can decide. Just to confirm, that would be in addition to the modal, right?

@jasonrhodes
Member Author

In addition, yes. And you can sync with me on the wrapper vs component idea. Whichever you think is better will probably be good. Thanks!

@ravikesarwani
Contributor

I like the idea of the "Alerts" link at the top. This can serve multiple purposes.
An option under that can provide a way for the users to create/deploy out-of-the-box rules (Create default rules) if they selected "No" in the initial modal.

This can also add another option for "View/Edit rules" instead of what we have currently with the "setup mode".

@katefarrar
Contributor

@estermv @jasonrhodes @ravikesarwani I also like the idea of adding the link to the header bar. Wanted to point out this PR for the Obs-wide effort to unify the language used around Alerts / Rules: #100918

It would be great if we could match what @katrin-freihofner is doing there.

@estermv
Contributor

estermv commented Jun 9, 2021

I've been able to have the popover in the top bar. What options does it need to have?
From the comments above I think the options will be:

  • Create default rules (it will appear if the user selected "No/remind later" options in the initial modal)
  • Disable alerts (it will appear if the user selected "Yes" in the initial modal) and the behavior if the user selects this option should be the same as if it checks the disable option for all the alerts:

Screenshot 2021-06-09 at 12 16 44

I would keep the View/Edit rules as a separate issue as it would involve many changes in the UI.

Does this make sense?

Also, does it make sense that this dropdown appears on all pages?

@estermv
Contributor

estermv commented Jun 9, 2021

I just realized that on the Listing page we show an "Alert Status":

Screenshot 2021-06-09 at 15 28 03

What should we show in this case, if the user didn't enable the alerts?

@ravikesarwani
Contributor

Ester, for me it may help if you can (in short) describe the solution you are trying to build?
I do not fully understand the previous popover question and will need more context.
For the Alert status question, we can show (N.A.). We have that in a few other places already.

@estermv
Contributor

estermv commented Jun 9, 2021

@ravikesarwani that's a good point. I'll try to summarize what I've been building until now.

When users first visit Stack Monitoring, they are going to see a modal similar to this one (still need a design review):
Screenshot 2021-06-09 at 17 02 27

  • If they select "Yes" -> alerts are created, the modal doesn't appear anymore
  • If they select "No" -> alerts are not created, the modal doesn't appear anymore
  • If they select "Remind me later" -> alerts are not created, the modal will appear next time they visit Stack Monitoring
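
For illustration, the three outcomes listed above could be sketched as a small lookup. This is only a sketch under assumptions: `ModalChoice`, `ModalOutcome`, and `resolveModalChoice` are hypothetical names and are not taken from the Kibana codebase.

```typescript
// Hypothetical types and names, for illustration only.
type ModalChoice = 'yes' | 'no' | 'remindLater';

interface ModalOutcome {
  createRules: boolean; // should the out-of-the-box rules be created now?
  showAgain: boolean; // should the modal reappear on the next visit?
}

// One entry per button in the modal, mirroring the three bullets above.
const OUTCOMES: Record<ModalChoice, ModalOutcome> = {
  yes: { createRules: true, showAgain: false },
  no: { createRules: false, showAgain: false },
  remindLater: { createRules: false, showAgain: true },
};

function resolveModalChoice(choice: ModalChoice): ModalOutcome {
  return OUTCOMES[choice];
}
```

Keeping the mapping in one table makes it easy to see that only "Remind me later" brings the modal back, and only "Yes" creates rules.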

Then, as suggested in #100133 (comment) we talked about the Alerts link in the top bar (that opens a popover on click), mainly to allow users that selected "No" in the alerts to be able to create the out-of-the-box rules (as an alternative to having an on / off toggle somewhere in the page).
So my question was around what options should we display in the alert dropdown.

Now I'm thinking that another option is to show only the Alerts link when users select "No"/"Remind me later", and think about adding more options later.

@ravikesarwani
Contributor

Thanks @estermv. This is very helpful and looks great. Good work here!

Some minor feedback to think about on the modal dialog
I was wondering whether "Remind me later" implemented as a link is a pattern we use in other places. To me it felt a little less intuitive since it's really tied to the "No" option. One thought I was exploring was:
[Image: createrules]

Thoughts?
@katrin-freihofner any quick feedback from the design side (Kate is on vacation)?
cc: @jasonrhodes

For "Alerts and rules" link I was thinking we should have 2 options:

  • Create default rules
  • View/Edit rules

Create default rules: This will create rules or update them. If user deleted rules in "Rules and connectors" this option can then be used by users to create them again (something that we are doing right now on SM page load and need to exist somewhere).
View/Edit rules: This will replace the current functionality of "Enter setup mode" tied to View/Edit of these rules.
We can show a message that says "No rules exist" if we detect that is the case.
Using the "Enter setup mode" currently to view/modify rules is not working and no one can really find it. And since we don't allow view/edit from "Rules and connectors", users (and internal Elastic folks) are lost as to how to see the details or edit default rules. We can potentially do this in a separate issue, but I think creating the "Alerts and rules" link enables us to tag along and make this change.

@jasonrhodes
Member Author

What should we show in this case, if the user didn't enable the alerts?

Here I'd explore a grey circle with "Not enabled"?

For "Remind me later", I like @ravikesarwani's modal idea but I would make "Remind me later" a button sibling to "OK" because the idea is either "Decide yes/no" or "Remind me later" because you don't want to decide.

@jasonrhodes
Member Author

jasonrhodes commented Jun 9, 2021

For "Alerts and rules" link I was thinking we should have 2 options:

  • Create default rules
  • View/Edit rules

Create default rules: This will create rules or update them. If user deleted rules in "Rules and connectors" this option can then be used by users to create them again (something that we are doing right now on SM page load and need to exist somewhere).
View/Edit rules: This will replace the current functionality of "Enter setup mode" tied to View/Edit of these rules.
We can show a message that says "No rules exist" if we detect that is the case.
Using the "Enter setup mode" currently to view/modify rules is not working and no one can really find it. And since we don't allow view/edit from "Rules and connectors", users (and internal Elastic folks) are lost as to how to see the details or edit default rules. We can potentially do this in a separate issue, but I think creating the "Alerts and rules" link enables us to tag along and make this change.

I was just talking to @simianhacker and @neptunian about this more broadly, and I think it might be a good idea for now to make this menu provide the following conditional options:

  1. If we detect that you are missing any of the default rules, "Create default rules" or something to that effect
  2. If you have at least 1 rule, "Manage rules" which takes you to the Stack Management UI

For (2) this won't let them edit them yet, but this functionality is coming soon so I'd hate to do too much work here swapping out the Setup Mode stuff and then undo it all when we can just send them to the Stack Management UI.

This doesn't make the edit flow better but it also doesn't make it worse, and allows us to stay streamlined in getting the big things fixed so that the whole experience gets better. Once the edit/create option is turned back on, I think we can have another set of tickets for rethinking setup mode on a deeper level.
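
As a rough sketch of the two conditional menu options proposed above (not an actual implementation), the menu contents could be derived from which rule types already exist. The function name `buildAlertsMenu` and the idea of comparing lists of rule type IDs are assumptions made for this example.

```typescript
// Hypothetical helper: decides which entries the "Alerts" menu would show.
// existingRuleTypes: type IDs of rules that currently exist (assumed input).
// defaultRuleTypes: type IDs of the default rule set (assumed input).
function buildAlertsMenu(existingRuleTypes: string[], defaultRuleTypes: string[]): string[] {
  const options: string[] = [];

  // (1) Offer creation when at least one default rule type is missing.
  const missingAny = defaultRuleTypes.some((t) => existingRuleTypes.indexOf(t) === -1);
  if (missingAny) {
    options.push('Create default rules');
  }

  // (2) Offer management as soon as at least one rule exists.
  if (existingRuleTypes.length > 0) {
    options.push('Manage rules');
  }

  return options;
}
```

With this shape, a space with some but not all default rules would see both options, while a space with every default rule would see only "Manage rules".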

@jasonrhodes
Member Author

Also, we should centralize on wording for this set of rules. I like "Default rules" better than "out-of-the-box rules", but I think I like "Recommended Stack Monitoring Rules" best.

"Create recommended stack monitoring rules..." etc.

I'd also leave off the "(If you are using different Kibana spaces for monitoring)" from next to the "No" option — I like providing context for the "Yes" option as to why we recommend it, but the "No" could be useful for other reasons we don't anticipate, so I think it'd be best to leave it off and just let it be implicit.

@ravikesarwani
Contributor

Thanks Jason for your thoughtful comments. They are really great feedback and I am good with most of those.
Few things I would want us to think and discuss more:

If you have at least 1 rule, "Manage rules" which takes you to the Stack Management UI

For this one, I am not keen on users being sent to the Stack Management UI because they can't view or edit SM rules there (right now). It's basically a dead link for them, and I am unsure that I would like to implement this even if it's for 1 or 2 releases.
This also begs the question: Are we planning to allow viewing/editing rules only from stack management UI in the longer run? I know most apps don't allow viewing/editing in place but I feel that they should (not sure if they have any plans). It's a superior user experience (where SM rules are mapped to the object types visually) that we already have in SM that I would like to see if we can continue with.

Having "Manage rules" execute the current flow of showing available rules and allow viewing/editing in-place is something that I think we should try to keep.

Also, we should centralize on the wording for this set of rules. I like "Default rules" better than "out-of-the-box rules", but I think I like "Recommended Stack Monitoring Rules" best.

I agree we should centralize on the wording. Out-of-the-box wording is something that we have used so far, including in docs etc. The only reason I suggested "Create default rules" as the menu option was because it's shorter. One option may be to use "Default rules" and update the docs for that as well. I am good with either.

I am unsure about "recommended" because in literal sense it can be construed as incorrect. Different clusters have different performance characteristics and what we are giving them is a starting point that they are free to modify based on their cluster use cases & load/performance/error acceptability.

@estermv
Contributor

estermv commented Jun 10, 2021

Thanks, @ravikesarwani and @jasonrhodes for your answers!
It seems that the "Alerts and rules" dropdown is still a bit far from being ready to implement and needs a bit more thinking and discussion around it.

So I would like to suggest decoupling the two streams of work that I see here:

1- Focus on solving the current problem that is described in this issue

There have been a number of issues where users have been surprised or frustrated by the auto-creation of alerts in Stack Monitoring

This would be solved by adding the modal.
The option of selecting "No" in the modal creates another problem:

users should be able to enable out-of-the-box rules if they want at any time later.

For this, I think that we can just go back to a very simple solution and show a "Create default rules" button in the navigation bar that simply appears when any of the default rules is missing (Instead of the dropdown).
Even though it is far from perfect, it's easy and simple.

2- Discuss the options for the alerts dropdown
I think this appeared as a side effect of trying to find a solution for the "Create default rules" button and I feel that needs more thinking. I see both pros and cons in both solutions but I wouldn't rush on making a decision.

If you have at least 1 rule, "Manage rules" which takes you to the Stack Management UI

For (2) this won't let them edit them yet, but this functionality is coming soon so I'd hate to do too much work here swapping out the Setup Mode stuff and then undo it all when we can just send them to the Stack Management UI.

If alerts can't be edited right now in Stack Management, I don't see the point of having a link in the dropdown that takes you there, as it could be frustrating for users.

Having "Manage rules" execute the current flow of showing available rules and allow viewing/editing in-place is something that I think we should try to keep.

Reworking "Enter setup mode" seems like a big refactor that I don't think is worth it if we want to have users going to Stack Management to edit alerts.
After finishing the implementation of the modal, I can investigate how much effort it would take to implement this. It seems like a big refactor, but I'm not familiar enough with the codebase to be sure about it. It would help us make a more informed decision.

So, we can focus on (1) and solve the original problem and then think about how to add the "Alerts and rules" dropdown.

@ravikesarwani
Contributor

I am good with fast incremental progress and focusing on (1) and getting that delivered.
I do however feel that it may be a better user experience if we solve that via a menu option (Alerts and rules->Create default rules) and not via a button that appears sometimes. I am a little bit uneasy about a user experience where things appear and disappear based on certain conditions.
I was thinking that we may want that option to be present all the time. Executing that option is a no-op when all rules exist; otherwise it creates the missing rules. It can show a toast message/popup like:

  • Created default rules. Use Enter setup mode to review or update these rules and add additional actions.
  • All default rules exist. Use Enter setup mode to review or update these rules.
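
A minimal sketch of that always-present option, assuming rules can be compared by type ID: the action creates only what is missing, is a no-op otherwise, and picks one of the two toast messages above. `planDefaultRuleCreation` and `CreationPlan` are hypothetical names, and the actual rule creation call is deliberately left out.

```typescript
// Hypothetical result shape, for illustration only.
interface CreationPlan {
  toCreate: string[]; // default rule types that still need to be created
  toast: string; // message to show the user afterwards
}

function planDefaultRuleCreation(existingRuleTypes: string[], defaultRuleTypes: string[]): CreationPlan {
  // Only the default types that do not exist yet need to be created.
  const toCreate = defaultRuleTypes.filter((t) => existingRuleTypes.indexOf(t) === -1);

  const toast =
    toCreate.length > 0
      ? 'Created default rules. Use Enter setup mode to review or update these rules and add additional actions.'
      : 'All default rules exist. Use Enter setup mode to review or update these rules.';

  return { toCreate, toast };
}
```

Because the function only reports what is missing, calling it repeatedly is safe: once everything exists, `toCreate` is empty and the second message is shown.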

Another thing I remember that we should make sure continues to work in this new flow is the API call we make to ES for deleting corresponding default Watches. See elastic/elasticsearch#64373 & #81020. An API was added by ES team and we created corresponding kibana alerts and call this API to delete related Watches. Feels like create default rules will need to execute this API flow as well.

@estermv
Contributor

estermv commented Jun 10, 2021

I do however feel that it may be a better user experience if we solve that via a menu option (Alerts and rules->Create default rules) and not via a button that appears sometimes. I am a little bit uneasy about a user experience where things appear and disappear based on certain conditions.
I was thinking that we may want that option to be present all the time. Executing that option is a no-op when all rules exist; otherwise it creates the missing rules. It can show a toast message/popup like:

* Created default rules. Use Enter setup mode to review or update these rules and add additional actions.

* All default rules exist. Use Enter setup mode to review or update these rules.

Yes, that's true. And it's also easy and simple 😊

Another thing I remember that we should make sure continues to work in this new flow is the API call we make to ES for deleting corresponding default Watches. See elastic/elasticsearch#64373 & #81020. An API was added by ES team and we created corresponding kibana alerts and call this API to delete related Watches. Feels like create default rules will need to execute this API flow as well.

I'm calling the same API endpoint used for the auto-creation and I'm planning to use the same one for the "Create default rules", so this should work, but I'll double-check it.

@jasonrhodes
Member Author

If alerts can't be edited right now in Stack Management, I don't see the point of having a link in the dropdown that takes you there, as it could be frustrating for users.

The management UI is where they delete rules for now. Hopefully, we will enable create/edit in that management flow soon, as well, but for now, it's where they can remove them.

I don't think we should allow users to do something we know they can't do, so we can leave the option there "greyed out" if we don't want it to disappear in certain states.

@estermv
Contributor

estermv commented Jun 15, 2021

After a quick chat with @jasonrhodes in zoom, we decided that we are going to make the modal space agnostic as we feel that showing it per space could be more confusing for users.
As we have the "Create default rules" option in the navigation bar, they can always create them if they want.

The modal will be shown if the user doesn't have alerts created and they didn't decide anything in the modal (when users decide something we will save it to the localStorage)
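
The condition described here could look roughly like the sketch below. It is written against a tiny storage interface rather than the browser `localStorage` directly so it can run anywhere; the key name `DECISION_KEY` and all function names are assumptions for this example.

```typescript
// Minimal subset of the Web Storage API used by this sketch.
interface KeyValueStore {
  getItem(key: string): string | null;
  setItem(key: string, value: string): void;
}

// Assumed key name, for illustration only.
const DECISION_KEY = 'stackMonitoring.defaultRulesModal.decision';

type Decision = 'yes' | 'no' | 'remindLater';

// In-memory stand-in for localStorage, useful outside a browser.
function createMemoryStore(): KeyValueStore {
  const data: Record<string, string> = Object.create(null);
  return {
    getItem: (key: string) => {
      const value = data[key];
      return value === undefined ? null : value;
    },
    setItem: (key: string, value: string) => {
      data[key] = value;
    },
  };
}

// "Remind me later" is not a decision, so nothing is persisted for it.
function recordDecision(store: KeyValueStore, decision: Decision): void {
  if (decision !== 'remindLater') {
    store.setItem(DECISION_KEY, decision);
  }
}

// Show the modal only when no alerts exist and no decision has been saved.
function shouldShowModal(store: KeyValueStore, hasExistingAlerts: boolean): boolean {
  if (hasExistingAlerts) {
    return false;
  }
  return store.getItem(DECISION_KEY) === null;
}
```

In a browser, `window.localStorage` already has `getItem` and `setItem` with these signatures, so it could be passed in place of the in-memory store.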

@neptunian
Contributor

neptunian commented Jun 16, 2021

Not sure if this was taken into consideration, but we will soon enable the possibility for multiple alerts of the same type to exist. If a user already created, for example, a monitoring_alert_cpu_usage alert, the app would see that as the alert already having been created. The same applies if they delete their default alert but leave the custom alert. Currently, since we only allow one rule per type, that rule is the default rule. Once many rules of the same type are allowed, there will be no concept of a "default" rule, or a way to tell whether the "default" exists, unless we introduce one and check for it. Perhaps it's okay and we don't care so long as some rule exists for that type.
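
To make the concern concrete, here is a small sketch of a per-type existence check, assuming each rule carries an `id` and a `ruleTypeId` (both field names are assumptions, as is the helper `uncoveredDefaultTypes`). A custom rule of a given type satisfies the check just as the default one would, which is exactly the ambiguity described above.

```typescript
// Hypothetical rule shape, for illustration only.
interface RuleSummary {
  id: string;
  ruleTypeId: string;
}

// Returns the default rule types for which no rule of that type exists.
// Any rule of the type counts, whether it was created by default or by the user.
function uncoveredDefaultTypes(rules: RuleSummary[], defaultTypeIds: string[]): string[] {
  return defaultTypeIds.filter((typeId) => !rules.some((rule) => rule.ruleTypeId === typeId));
}
```

For example, if the only existing rule is a user-created one of type `monitoring_alert_cpu_usage`, that type is reported as covered even though the default rule for it was never created or has since been deleted.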

@jasonrhodes
Member Author

Good point, @neptunian -- thanks for calling it out. In my opinion, I think if they are creating their own Stack Monitoring alerts, we can feel confident that they don't need the "default" ones, especially per type. So I'm comfortable with this working the way it is. @ravikesarwani do you have any other concerns about this?

@jasonrhodes
Member Author

jasonrhodes commented Jun 17, 2021

One more thing to consider: we currently have a config option that @simianhacker implemented a few weeks ago that restricts the auto-creation of SM alerts to a single space (default), but can be configured to auto-create those alerts in as many spaces as the user likes, listed out by ID.

When we no longer create them by default, I imagine we should just remove this config entirely? As it is, I don't think we'll be using it anywhere at all, since it was only used when deciding whether to create the alerts on behalf of the user...

@ravikesarwani
Contributor

I think if they are creating their own Stack Monitoring alerts, we can feel confident that they don't need the "default" ones, especially per type. So I'm comfortable with this working the way it is.

I don't have concerns regarding the topic on this PR since this PR won't enable users to create new custom SM rules. When we fix 91145, I think we will need to think through the use case some more to make sure we are covered.

@estermv
Contributor

estermv commented Jun 28, 2021

I opened a PR for the first version of this: #101565.

It seems that alerts are created when the user visits the Cluster Overview page and the Listing page. Do we want to show the modal on both pages also? (the first one the user enters on)

I added the modal only on the cluster page. After looking a bit into the code and playing with it, I think I need a little bit more time to further investigate it since if I add it in the same way I added it on the cluster overview page it seems that there is something that is not working as I would expect. I added more technical details in a follow-up issue I just created: #103456

@estermv
Contributor

estermv commented Jun 28, 2021

I was also checking the docs, @ravikesarwani, I guess we need to update this page https://www.elastic.co/guide/en/kibana/current/kibana-alerts.html, this sentence in particular:

When you open Stack Monitoring, the preconfigured rules are created automatically. They are initially configured to detect and notify on various conditions across your monitored clusters. You can view notifications for Cluster health, Resource utilization, and Errors and exceptions for Elasticsearch in real-time.

How much detail do we have to add?

@ravikesarwani
Contributor

@estermv I would really like the UI to be self-sufficient, and I feel the flow you have added shouldn't require too much extra documentation. The only clarification I was thinking about is whether, if they selected "No", it is clear right now that they can create these rules easily later on, and how.

I was wondering if we should add an extra line in the modal itself to make this clear. Something like:
"If you selected No, default rules can be created later on from the Alerts and rules menu."

In the document, I think we should add the Alerts and rules menu and the Create default rules option at the end of the document, describing in a few lines what this option is and what it does.

Alerts and rules
Create default rules
This option can be used to create default rules in this Kibana space. This is useful when you didn't choose to create these default rules initially, or anytime later if the rules were accidentally deleted.

6 participants