Haystack: Anomaly detection and alerting
The diagram below shows a high-level overview of Haystack's integration with Expedia's open-source adaptive-alerting and alert-management systems.
The idea behind the above design is to decouple Haystack from the alerting system, so that any alerting system can be integrated with Haystack as needed. Only the integration sub-system within Haystack needs to be replaced.
To support this idea, the integration sub-system should be based on the following principles:
- The sub-system should have a mapper that maps haystack-trends output to a format understood by the alerting system.
- The sub-system should implement an interface that integrates alerts and anomalies with Haystack UI.
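As a rough illustration of these two principles, the sketch below models the integration sub-system as an abstract contract with one toy implementation. Every name here (`Trend`, `AlertingIntegration`, `InMemoryIntegration`, and their methods) is hypothetical and is not taken from the Haystack codebase. It only shows how a different alerting backend could be plugged in by replacing one class.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Trend:
    """Hypothetical stand-in for one data point emitted by haystack-trends."""
    name: str
    tags: Dict[str, str]
    value: float
    timestamp: int  # epoch seconds


class AlertingIntegration(ABC):
    """Hypothetical contract for the replaceable integration sub-system."""

    @abstractmethod
    def map_trend(self, trend: Trend) -> Any:
        """Convert a trend into whatever the alerting system consumes."""

    @abstractmethod
    def fetch_alerts(self, service: str) -> List[dict]:
        """Return alerts for a service so that Haystack UI can display them."""


class InMemoryIntegration(AlertingIntegration):
    """Toy backend used only to show how an implementation plugs in."""

    def __init__(self) -> None:
        self._alerts: List[dict] = []

    def map_trend(self, trend: Trend) -> dict:
        # Assumed target format: a flat dictionary per data point.
        return {"metric": trend.name, "tags": dict(trend.tags),
                "value": trend.value, "ts": trend.timestamp}

    def record_alert(self, service: str, message: str) -> None:
        self._alerts.append({"service": service, "message": message})

    def fetch_alerts(self, service: str) -> List[dict]:
        return [a for a in self._alerts if a["service"] == service]
```

Code that depends only on `AlertingIntegration` would not need to change when the backend is swapped, which is the decoupling the design aims for.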
There will be a sub-system called Haystack-AA. Its responsibilities are to:
- Map the trends produced by haystack-trends into a format understood by the adaptive-alerting system. Haystack-trends produces data in Metrics 2.0 format, and the adaptive-alerting system consumes data in the same format. Hence, the mapper is not strictly required, and adaptive-alerting could consume trends directly from the mdm topic. However, keeping this bridge allows Haystack to replace the adaptive-alerting system with another one.
- Implement the interface through which Haystack UI interacts with the subscription management feature of the alert management system. This enables Haystack consumers to subscribe to alerts.
- Implement an interface through which Haystack UI can query alerts so that they can be shown in the UI.
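The value of keeping the mapper as a bridge can be sketched as follows. This is a minimal sketch under stated assumptions: `MetricPoint` is a simplified, hypothetical stand-in for a Metrics 2.0 style data point (identity carried in tags), and `identity_mapper`, `flat_key_mapper`, and `run_bridge` are illustrative names rather than Haystack-AA APIs. Reading from the mdm topic is replaced by a plain list.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass
class MetricPoint:
    """Simplified, hypothetical Metrics 2.0 style point: identity lives in tags."""
    tags: Dict[str, str]
    value: float
    timestamp: int


def identity_mapper(point: MetricPoint) -> MetricPoint:
    """Pass-through: sufficient when producer and consumer share a format."""
    return point


def flat_key_mapper(point: MetricPoint) -> dict:
    """Hypothetical mapper for a system that expects one dotted metric key."""
    key = ".".join(f"{k}={v}" for k, v in sorted(point.tags.items()))
    return {"key": key, "value": point.value, "timestamp": point.timestamp}


def run_bridge(points: Iterable[MetricPoint],
               mapper: Callable[[MetricPoint], object]) -> List[object]:
    """Apply the configured mapper to each trend (here an in-memory list)."""
    return [mapper(p) for p in points]
```

With adaptive-alerting, `run_bridge(points, identity_mapper)` forwards the data unchanged. Switching to a system with a different input format would mean passing a different mapper, with no change to the code around it.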
Haystack-trends publishes trends to the mdm topic, and the Haystack-AA mapper maps them to a format understood by the adaptive-alerting system. The adaptive-alerting system consumes the mapped trends and produces anomalies. Expedia's alert management system then consumes the anomalies and converts them into alerts.
An anomaly is a deviation from normal or expected behaviour. However, not all anomalies are alerts. An alert is an anomaly on which some action needs to be taken. For example, we might notify the owners of the data about anomalous behaviour in a service only when it results in an alert. An alert can also be defined as the root cause of the anomalous behaviour of several services.
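To make the distinction concrete, the sketch below promotes an anomaly to an alert only after a metric has been anomalous for several consecutive observations. This promotion rule is an assumption chosen purely for illustration. It is not the policy used by Expedia's alert management system, and `Observation`, `alerts_from`, and `min_consecutive` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Observation:
    """Hypothetical detector output for one metric at one point in time."""
    metric: str
    timestamp: int
    anomalous: bool


def alerts_from(observations: Iterable[Observation],
                min_consecutive: int = 3) -> List[Observation]:
    """Return the observations at which an anomaly is promoted to an alert.

    Illustrative policy only: alert once when a metric has been anomalous
    for `min_consecutive` observations in a row.
    """
    streaks: Dict[str, int] = {}
    alerts: List[Observation] = []
    for obs in observations:
        # Extend the streak on an anomaly, reset it on a normal point.
        streak = streaks.get(obs.metric, 0) + 1 if obs.anomalous else 0
        streaks[obs.metric] = streak
        if streak == min_consecutive:  # fire once per streak
            alerts.append(obs)
    return alerts
```

Under this rule an isolated anomalous point never notifies anyone, while a sustained deviation produces exactly one alert per streak.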