[APM]: Replace error occurrence watchers with Kibana Alerting #46547
Whoops, this was supposed to be a draft PR. I'll update the description in a bit.
💔 Build Failed
export const ERROR_OCCURRENCE_ALERT_TYPE_ID = 'apm.error_occurrence';
export const NOTIFICATION_EMAIL_ACTION_ID = 'apm.notificationEmail';
I don't think we need this right now. I was trying to figure out whether we could create the email action we need when registering the alert type, but since I never got around to implementing email, it's not being used.
'elasticsearch',
'xpack_main',
'apm_oss',
'alerting',
I'm not sure whether these should be required or optional. If they're optional, I'm not sure how we can get access to these plugins on startup.
Really good question for the Platform team: how should we handle optional dependencies? I think it has come up before, but I can't remember the answer.
The new platform allows optional dependencies, but the legacy platform does not. If you depend on something in a legacy plugin that could be disabled, you need to implement `isEnabled` and check that the dependency is enabled; otherwise Kibana will crash.
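A minimal sketch of that `isEnabled` check, assuming a legacy config object that exposes `get(key)`. The `LegacyConfig` interface, the `createConfig` test helper, and the config keys (`xpack.apm.enabled`, `xpack.alerting.enabled`, `xpack.actions.enabled`) are illustrative assumptions, not confirmed Kibana names.

```typescript
// Hypothetical minimal stand-in for the legacy platform's config object.
interface LegacyConfig {
  get(key: string): unknown;
}

// Helper for this sketch: a config backed by a plain map.
function createConfig(values: Record<string, unknown>): LegacyConfig {
  return { get: (key: string) => values[key] };
}

// Assumed config keys; the real keys may differ.
const REQUIRED_ENABLED_FLAGS = [
  'xpack.apm.enabled',
  'xpack.alerting.enabled',
  'xpack.actions.enabled',
];

// Report the plugin as enabled only when it and every plugin it depends on
// are explicitly enabled, so a disabled dependency disables APM instead of
// leaving APM to call into a plugin that never started.
function isEnabled(config: LegacyConfig): boolean {
  return REQUIRED_ENABLED_FLAGS.every(key => config.get(key) === true);
}
```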
.map(email => email.trim())
.filter(email => !!email)
: [];
const email = this.state.actions.email ? this.state.emails : '';
I've simplified this: we can just pass a comma-separated string to the action, since it uses nodemailer under the hood. I never tested it, though.
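For comparison, a sketch of both options: normalizing the comma-separated input into an array (as the removed `.map`/`.filter` lines did; the leading `.split(',')` is an assumption, since that part of the diff isn't shown) or re-joining the cleaned list into a CSV string for the action. nodemailer's `to` field accepts either a comma-separated string or an array of addresses. The function names are hypothetical.

```typescript
// Split user input on commas, trim whitespace, and drop empty entries.
function toRecipientList(input: string): string[] {
  return input
    .split(',')
    .map(email => email.trim())
    .filter(email => !!email);
}

// Re-join the cleaned list so stray commas and spaces never reach the mailer.
function toRecipientCsv(input: string): string {
  return toRecipientList(input).join(', ');
}
```

Keeping the trim/filter step is cheap and prevents empty entries from input such as `a@example.com,,` being passed along untested.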
})
);

return alertsClient.create({
Not sure whether we're allowed to call the alerts client directly like this, or whether we have to go through the API.
This is OK to do: all the business logic currently lives within the client, and you're getting the client from the request (which is good). You'll just have to handle validation yourself until the client does some (if we decide to add it).
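Since validation is up to the caller for now, the route could check the user's input before handing it to `alertsClient.create`. A minimal sketch, assuming hypothetical parameter names (`serviceName`, `threshold`, `interval`) and an interval format of a positive integer followed by `s`, `m`, `h`, or `d`; neither is confirmed by the alerting plugin.

```typescript
// Hypothetical shape of the parameters the error occurrence alert needs.
interface ErrorOccurrenceParams {
  serviceName: string;
  threshold: number;
  interval: string;
}

// Assumed interval format, e.g. "30s", "5m", "1h", "1d".
const INTERVAL_PATTERN = /^[1-9]\d*[smhd]$/;

// Returns typed params, or throws with a message suitable for a 400 response.
function validateParams(input: unknown): ErrorOccurrenceParams {
  if (typeof input !== 'object' || input === null) {
    throw new Error('params must be an object');
  }
  const { serviceName, threshold, interval } = input as Record<string, unknown>;
  if (typeof serviceName !== 'string' || serviceName.length === 0) {
    throw new Error('serviceName must be a non-empty string');
  }
  if (typeof threshold !== 'number' || !Number.isInteger(threshold) || threshold < 1) {
    throw new Error('threshold must be a positive integer');
  }
  if (typeof interval !== 'string' || !INTERVAL_PATTERN.test(interval)) {
    throw new Error('interval must look like "5m"');
  }
  return { serviceName, threshold, interval };
}
```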
import { createRoute } from './create_route';
import { createAlert } from '../lib/alerting/error_occurrence/create_alert';

export const errorOccurrenceAlertRoute = createRoute(core => ({
I've created a route for this because we have to execute several concurrent and sequential requests, and the server is a more robust environment for handling those kinds of dependencies.
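The mix of concurrent and sequential requests might look like the sketch below: the Slack and email actions don't depend on each other, so they can be created in parallel, while the alert has to wait for their ids. `Clients`, `createAction`, `createAlert`, and the record shapes are hypothetical stand-ins, and the `.slack`/`.email` action type ids are assumptions.

```typescript
interface ActionRecord {
  id: string;
  actionTypeId: string;
}

interface AlertRecord {
  id: string;
  actionIds: string[];
}

// Hypothetical stand-ins for the actions and alerts clients.
interface Clients {
  createAction(actionTypeId: string): Promise<ActionRecord>;
  createAlert(actionIds: string[]): Promise<AlertRecord>;
}

async function createErrorOccurrenceAlert(
  clients: Clients,
  channels: { slack: boolean; email: boolean }
): Promise<AlertRecord> {
  // Step 1 (concurrent): independent actions are created in parallel.
  const actionTypeIds = [
    ...(channels.slack ? ['.slack'] : []),
    ...(channels.email ? ['.email'] : []),
  ];
  const actions = await Promise.all(actionTypeIds.map(id => clients.createAction(id)));

  // Step 2 (sequential): the alert references the action ids from step 1.
  return clients.createAlert(actions.map(action => action.id));
}
```

A production route would probably also need to clean up the created actions if creating the alert fails; that is left out here.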
}
};

const { callWithInternalUser } = elasticsearch.getCluster('admin');
You can also use `services.callCluster(...)`. It runs in the context of the user who created the alert (security-wise). The approach you have is fine as well if you don't want that.
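A simplified sketch of an executor that queries through `services.callCluster`, so the search runs as the user who created the alert. The `Services` and `Params` shapes, the `apm-*-error-*` index pattern, the use of the `count` endpoint, and the `alertInstanceFactory(...).scheduleActions('default', ...)` call are assumptions for illustration; the alerting plugin's real types may differ.

```typescript
// Assumed shapes for what the alerting plugin hands to an executor.
type CallCluster = (endpoint: string, params: Record<string, unknown>) => Promise<any>;

interface Services {
  callCluster: CallCluster;
  alertInstanceFactory(id: string): {
    scheduleActions(group: string, context: Record<string, unknown>): void;
  };
}

interface Params {
  serviceName: string;
  threshold: number;
  interval: string; // e.g. "5m"; passed as a param because the executor doesn't receive it
}

async function executor({ services, params }: { services: Services; params: Params }): Promise<void> {
  // Count error documents for the service within the last `interval`.
  const response = await services.callCluster('count', {
    index: 'apm-*-error-*', // assumed index pattern
    body: {
      query: {
        bool: {
          filter: [
            { term: { 'service.name': params.serviceName } },
            { range: { '@timestamp': { gte: `now-${params.interval}` } } },
          ],
        },
      },
    },
  });

  const docCount: number = response.count;
  if (docCount > params.threshold) {
    // Fire the actions configured in the alert's default group.
    services.alertInstanceFactory(params.serviceName).scheduleActions('default', { docCount });
  }
}
```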
Force-pushed from f01d96d to 087d6ec.
💔 Build Failed
Force-pushed from 087d6ec to c6b3361.
💔 Build Failed
this.props.onClose();
const id = 'id' in savedObject ? savedObject.id : NOT_AVAILABLE_LABEL;
Maybe I'm outmoded, but isn't the `in` operator generally discouraged, since it's at risk of prototype hijacking?
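The concern can be demonstrated directly: `in` also looks up the prototype chain, whereas `Object.prototype.hasOwnProperty.call` only checks the object's own properties. The sketch keeps TypeScript's narrowing by using a type guard; `hasOwnId`, `getId`, and the `'N/A'` placeholder for `NOT_AVAILABLE_LABEL` are hypothetical.

```typescript
const NOT_AVAILABLE_LABEL = 'N/A'; // placeholder value for this sketch

// Type guard: true only if `id` is an own property holding a string.
function hasOwnId(value: object): value is { id: string } {
  return (
    Object.prototype.hasOwnProperty.call(value, 'id') &&
    typeof (value as { id?: unknown }).id === 'string'
  );
}

function getId(savedObject: object): string {
  return hasOwnId(savedObject) ? savedObject.id : NOT_AVAILABLE_LABEL;
}

// An object with no own `id` whose prototype has been given one.
const polluted: object = Object.create({ id: 'from-prototype' });

const foundWithIn = 'id' in polluted; // true: `in` walks the prototype chain
const foundAsOwn = Object.prototype.hasOwnProperty.call(polluted, 'id'); // false
```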
'<br/>' +
'{errorLogMessage}<br/>' +
'{errorCulprit}<br/>' +
'{docCount} occurrences<br/>',
Do we want to preserve newline (`\n`) characters in the email template, like in the Slack template?
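One way to keep both templates in sync is to build them from the same list of lines, joining with `\n` for Slack and with `<br/>` plus `\n` for email, so the email source keeps its line breaks as well. The placeholder names come from the template above; the `fillTemplate` helper and the sample values are hypothetical.

```typescript
type TemplateValues = Record<string, string | number>;

// Replace `{name}` placeholders with values; unknown placeholders are left untouched.
function fillTemplate(template: string, values: TemplateValues): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) =>
    Object.prototype.hasOwnProperty.call(values, name) ? String(values[name]) : match
  );
}

// Shared body lines, so both channels describe the error the same way.
const BODY_LINES = ['{errorLogMessage}', '{errorCulprit}', '{docCount} occurrences'];

// Slack: plain newlines.
const slackTemplate = BODY_LINES.join('\n');

// Email: <br/> for HTML rendering, plus \n so the raw message keeps its line breaks.
const emailTemplate = BODY_LINES.map(line => `${line}<br/>`).join('\n');
```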
Closing in favor of #59566.
This is a proof of concept that replaces the APM error occurrence Watcher (an Elasticsearch feature) with the new Kibana alerting and actions plugins. The Slack action successfully fires, but I haven't bothered with the Email action because the approach is pretty similar, and we decided to timebox this to two days.
Here's the gist:
- We register an `Alert Type` for error occurrences. An `Alert Type` is essentially a function (called an executor) that, given a set of parameters, decides whether (and which) actions need to be triggered. Actions are triggered via an `Alert Instance`, which captures the state of previous executions (for the purpose of this POC, I don't think we need that state).
- When the user configures an alert, we create an `Alert` object on the server. An `Alert` is a configuration object for an `Alert Type` that tells it at what interval and with what parameters it should execute. An `Alert` also configures the action groups that can be triggered from the `Alert Type` executor. For this POC, we create Slack and Email actions based on the user's input when configuring an alert, and add them to the default group.
- In the `Alert Type` executor, which now runs at the configured interval for the `Alert` that was created, we run a query for the number of error occurrences and determine whether the threshold was exceeded. If so, we fire the default actions for the `Alert`, which can be the Slack action, the Email action, or both.

To test this, make sure to explicitly enable both required plugins (alerting and actions) in your Kibana config file:
Notes:
- An `Alert` only supports `interval` as a scheduling option. That means we cannot run the executor at a given time each day, which our current implementation does support. I've been told expanding the scheduling options is on the roadmap.
- … `interval` parameter. However, it doesn't seem like that's automatically available in the executor, so we pass it as a parameter instead. Maybe there's a nicer way to solve this.

Some questions/suggestions for the Alerting/Actions team:
- … `secrets` param, but this is not reflected in the documentation (https://github.com/elastic/kibana/blob/master/x-pack/legacy/plugins/actions/README.md).
- … `interval` available in the executor as well (if it's not already there and I missed it). It seems like a common use case.
- … what `actionGroups` is for in the `Alert Type`. It's seemingly not documented, but it is required.
- `registerType` could be improved by having typed `params` (instead of `Record<string, any>`), similar to what we did for APM in [APM] migrate to io-ts #42961 (happy to open a PR if welcome).
- The `kbn-action` CLI tool has been super useful; thanks for taking the time to build it.
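A sketch of the typed-`params` suggestion: if an alert type were generic over its params, the executor would receive `P` instead of `Record<string, any>`, and fields like `interval` and `threshold` would be checked at compile time. Apart from the `apm.error_occurrence` id, everything here (`AlertType`, `ExecutorServices`, `countErrors`, `createRegistry`, `execute`, and returning the action groups to fire) is a hypothetical model, not the alerting plugin's API. The io-ts approach from #42961 would add runtime validation, which this sketch leaves out.

```typescript
// Hypothetical services an executor might receive.
interface ExecutorServices {
  countErrors(interval: string): Promise<number>;
}

// Hypothetical alert type that is generic over its params type P.
interface AlertType<P> {
  id: string;
  actionGroups: string[];
  // Resolves to the action groups that should fire.
  executor(options: { params: P; services: ExecutorServices }): Promise<string[]>;
}

function createRegistry() {
  // Params types are erased once stored; typing only helps at registration time.
  const types = new Map<string, AlertType<any>>();
  return {
    registerType<P>(type: AlertType<P>): void {
      types.set(type.id, type);
    },
    execute(id: string, params: unknown, services: ExecutorServices): Promise<string[]> {
      const type = types.get(id);
      if (!type) {
        return Promise.reject(new Error(`Unknown alert type: ${id}`));
      }
      // Runtime validation (e.g. an io-ts codec) would belong here.
      return type.executor({ params, services });
    },
  };
}

interface ErrorOccurrenceParams {
  threshold: number;
  interval: string; // passed as a param because the executor doesn't receive it
}

const registry = createRegistry();

registry.registerType<ErrorOccurrenceParams>({
  id: 'apm.error_occurrence',
  actionGroups: ['default'],
  async executor({ params, services }) {
    // params.threshold is a number and params.interval a string, checked at compile time.
    const docCount = await services.countErrors(params.interval);
    return docCount > params.threshold ? ['default'] : [];
  },
});
```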