A Serverless plugin to easily add CloudWatch alarms to functions
npm i serverless-plugin-aws-alerts
New versions of this fork should be published from the master
branch
using npm publish
. Remember to bump the package version as appropriate before publishing.
service: your-service
provider:
name: aws
runtime: nodejs4.3
custom:
alerts:
stages: # Optionally - select which stages to deploy alarms to
- production
- staging
dashboards: true
nameTemplate: $[functionName]-$[metricName]-Alarm # Optionally - naming template for alarms, can be overwritten in definitions
topics:
ok: ${self:service}-${opt:stage}-alerts-ok
alarm: ${self:service}-${opt:stage}-alerts-alarm
insufficientData: ${self:service}-${opt:stage}-alerts-insufficientData
definitions: # these defaults are merged with your definitions
functionErrors:
period: 300 # override period
customAlarm:
description: 'My custom alarm'
namespace: 'AWS/Lambda'
nameTemplate: $[functionName]-Duration-IMPORTANT-Alarm # Optionally - naming template for the alarms, overwrites globally defined one
metric: duration
threshold: 200
statistic: Average
period: 300
evaluationPeriods: 1
datapointsToAlarm: 1
comparisonOperator: GreaterThanOrEqualToThreshold
alarms:
- functionThrottles
- functionErrors
- functionInvocations
- functionDuration
plugins:
- serverless-plugin-aws-alerts
functions:
foo:
handler: foo.handler
alarms: # merged with function alarms
- customAlarm
- name: fooAlarm # creates new alarm or overwrites some properties of the alarm (with the same name) from definitions
namespace: 'AWS/Lambda'
metric: errors # define custom metrics here
threshold: 1
statistic: Minimum
period: 60
evaluationPeriods: 1
datapointsToAlarm: 1
comparisonOperator: GreaterThanOrEqualToThreshold
You can define several topics for alarms. For example you want to have topics for critical alarms reaching your pagerduty, and different topics for noncritical alarms, which just send you emails.
In each alarm definition you have to specify which topics you want to use. In following example you get an email for each function error, pagerduty gets alarm only if there are more than 20 errors in 60s
custom:
alerts:
topics:
critical:
ok:
topic: ${self:service}-${opt:stage}-critical-alerts-ok
notifications:
- protocol: https
endpoint: https://events.pagerduty.com/integration/.../enqueue
alarm:
topic: ${self:service}-${opt:stage}-critical-alerts-alarm
notifications:
- protocol: https
endpoint: https://events.pagerduty.com/integration/.../enqueue
nonCritical:
alarm:
topic: ${self:service}-${opt:stage}-nonCritical-alerts-alarm
notifications:
- protocol: email
endpoint: alarms@email.com
definitions: # these defaults are merged with your definitions
criticalFunctionErrors:
namespace: 'AWS/Lambda'
metric: Errors
threshold: 20
statistic: Sum
period: 60
evaluationPeriods: 10
comparisonOperator: GreaterThanOrEqualToThreshold
okActions:
- critical
alarmActions:
- critical
nonCriticalFunctionErrors:
namespace: 'AWS/Lambda'
metric: Errors
threshold: 1
statistic: Sum
period: 60
evaluationPeriods: 10
comparisonOperator: GreaterThanOrEqualToThreshold
alarmActions:
- nonCritical
alarms:
- criticalFunctionErrors
- nonCriticalFunctionErrors
If topic name is specified, plugin assumes that topic does not exist and will create it. To use existing topics, specify ARNs or use Fn::ImportValue to use a topic exported with CloudFormation.
custom:
alerts:
topics:
alarm:
topic: arn:aws:sns:${self:region}:${self::accountId}:monitoring-${opt:stage}
custom:
alerts:
topics:
alarm:
topic:
Fn::ImportValue: ServiceMonitoring:monitoring-${opt:stage, 'dev'}
You can configure subscriptions to your SNS topics within your serverless.yml
. For each subscription, you'll need to specify a protocol
and an endpoint
.
The following example will send email notifications to me@example.com
for all messages to the Alarm topic:
custom:
alerts:
topics:
alarm:
topic: ${self:service}-${opt:stage}-alerts-alarm
notifications:
- protocol: email
endpoint: me@example.com
You can configure notifications to send to webhook URLs, to SMS devices, to other Lambda functions, and more. Check out the AWS docs here for configuration options.
You can monitor a log group for a function for a specific pattern. Do this by adding the pattern key. You can learn about custom patterns at: http://docs.aws.amazon.com/AmazonCloudWatch/latest/logs/FilterAndPatternSyntax.html
The following would create a custom metric log filter based alarm named barExceptions
. Any function that included this alarm would have its logs scanned for the pattern exception Bar
and if found would trigger an alarm.
custom:
alerts:
definitions:
barExceptions:
metric: barExceptions
threshold: 0
statistic: Minimum
period: 60
evaluationPeriods: 1
comparisonOperator: GreaterThanThreshold
pattern: 'exception Bar'
bunyanErrors:
metric: bunyanErrors
threshold: 0
statistic: Sum
period: 60
evaluationPeriods: 1
datapointsToAlarm: 1
comparisonOperator: GreaterThanThreshold
pattern: '{$.level > 40}'
Note: For custom log metrics, namespace property will automatically be set to stack name (e.g.
fooservice-dev
).
You can define custom naming template for the alarms. nameTemplate
property under alerts
configures naming template for all the alarms, while placing nameTemplate
under alarm definition configures (overwrites) it for that specific alarm only. Naming template provides interpolation capabilities, where supported placeholders are:
$[functionName]
- function name (e.g.helloWorld
)$[functionId]
- function logical id (e.g.HelloWorldLambdaFunction
)$[metricName]
- metric name (e.g.Duration
)$[metricId]
- metric id (e.g.BunyanErrorsHelloWorldLambdaFunction
for the log based alarms,$[metricName]
otherwise)
Note: All the alarm names are prefixed with stack name (e.g.
fooservice-dev
).
The plugin provides some default definitions that you can simply drop into your application. For example:
alerts:
alarms:
- functionErrors
- functionThrottles
- functionInvocations
- functionDuration
If these definitions do not quite suit i.e. the threshold is too high, you can override a setting without creating a completely new definition.
alerts:
definitions: # these defaults are merged with your definitions
functionErrors:
period: 300 # override period
treatMissingData: notBreaching # override treatMissingData
The default definitions are below.
definitions:
functionInvocations:
namespace: 'AWS/Lambda'
metric: Invocations
threshold: 100
statistic: Sum
period: 60
evaluationPeriods: 1
datapointsToAlarm: 1
comparisonOperator: GreaterThanOrEqualToThreshold
treatMissingData: missing
functionErrors:
namespace: 'AWS/Lambda'
metric: Errors
threshold: 1
statistic: Sum
period: 60
evaluationPeriods: 1
datapointsToAlarm: 1
comparisonOperator: GreaterThanOrEqualToThreshold
treatMissingData: missing
functionDuration:
namespace: 'AWS/Lambda'
metric: Duration
threshold: 500
statistic: Average
period: 60
evaluationPeriods: 1
comparisonOperator: GreaterThanOrEqualToThreshold
treatMissingData: missing
functionThrottles:
namespace: 'AWS/Lambda'
metric: Throttles
threshold: 1
statistic: Sum
period: 60
evaluationPeriods: 1
datapointsToAlarm: 1
comparisonOperator: GreaterThanOrEqualToThreshold
treatMissingData: missing
The plugin allows users to provide custom dimensions for the alarm. Dimensions are provided in a list of key/value pairs {Name: foo, Value:bar}
The plugin will always apply dimension of {Name: FunctionName, Value: ((FunctionName))}, except if the parameter omitDefaultDimension: true
is passed. For example:
alarms: # merged with function alarms
- name: fooAlarm
namespace: 'AWS/Lambda'
metric: errors # define custom metrics here
threshold: 1
statistic: Minimum
period: 60
evaluationPeriods: 1
comparisonOperator: GreaterThanThreshold
omitDefaultDimension: true
dimensions:
- Name: foo
Value: bar
'Dimensions': [
{
'Name': 'foo',
'Value': 'bar'
},
]
Statistic not only supports SampleCount, Average, Sum, Minimum or Maximum as defined in CloudFormation here, but also percentiles. This is possible by leveraging ExtendedStatistic under the hood. This plugin will automatically choose the correct key for you. See an example below:
definitions:
functionDuration:
namespace: 'AWS/Lambda'
metric: Duration
threshold: 100
statistic: 'p95'
period: 60
evaluationPeriods: 1
datapointsToAlarm: 1
comparisonOperator: GreaterThanThreshold
treatMissingData: missing
MIT © A Cloud Guru