[Response Ops][Alerting] Research best practices for bootstrapping alerts as data indices #141146
Pinging @elastic/response-ops (Team:ResponseOps)
Linking with #111152.
Resources that need to be installed for framework alerts-as-data (FAAD??)
## Summary of existing resource installation behavior

### ILM Policy

Both the Event Log and Rule Registry plugins use a pretty standard ILM policy with a hot phase that rolls over at 50GB or 30 days. The Event Log policy dictates a delete after 90 days and the Rule Registry has no deletion phase. For framework alerts-as-data, we will likely want to keep the data around indefinitely as well. We should ensure we add

### Component templates

The Event Log does not use component templates. The Rule Registry installs 2 common component templates at plugin setup (ECS fields and the "technical field map") and then installs solution-specific component templates as needed (on write). Framework alerts-as-data should use component templates, and it makes sense to use two: one for ECS (auto-generated from the latest ECS fieldset) and one for the default alerts-as-data schema.

### Index templates

- Event Log
- Rule Registry

### Concrete write indices

- Event Log
- Rule Registry

## Suggested Framework Alerts-as-Data resource installation behavior

### ILM Policy

ILM policy should follow the rule registry with a hot phase and no delete phase. Ensure that
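Here's a minimal sketch of what that policy could look like (assuming the `@elastic/elasticsearch` client; the policy name is just a placeholder, not a decided name):

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Placeholder policy name; the actual name used by the framework may differ.
const ILM_POLICY_NAME = '.alerts-default-ilm-policy';

// Hot phase that rolls over at 50GB or 30 days, with no delete phase,
// mirroring the rule registry behavior described above.
await client.ilm.putLifecycle({
  name: ILM_POLICY_NAME,
  policy: {
    phases: {
      hot: {
        actions: {
          rollover: {
            max_size: '50gb',
            max_age: '30d',
          },
        },
      },
    },
  },
});
```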
### Component templates

Framework alerts-as-data should only need two component templates: an ECS component template and an alerts-as-data component template. The ECS component template should be auto-generated from ECS with a script, similar to what the event log does, except we need to pull over all ECS fields. When creating an index template composed of these two component templates, we should ensure the ECS component template is last so that we are using the official ECS mappings, just in case we define a field in the alerts-as-data component template with the same name.

### Index template and concrete write index

We should ensure these settings are in the index template:
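As a rough sketch (template names, index patterns, and settings here are placeholders, not decided values), the composition could look like this — note the ECS component template is last in `composed_of`, since later entries take precedence on mapping conflicts:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Placeholder template names. The ECS component template is assumed to already
// be installed, auto-generated from the ECS fieldset as described above.
const ALERTS_COMPONENT_TEMPLATE = '.alerts-default-mappings';
const ECS_COMPONENT_TEMPLATE = '.alerts-ecs-mappings';

// Component template with the default alerts-as-data fields.
await client.cluster.putComponentTemplate({
  name: ALERTS_COMPONENT_TEMPLATE,
  template: {
    mappings: {
      dynamic: 'strict',
      properties: {
        '@timestamp': { type: 'date' },
        // ...default alerts-as-data fields
      },
    },
  },
});

// Index template composed of both component templates. The ECS component
// template is intentionally listed last so that, on any field name collision,
// the official ECS mappings win.
await client.indices.putIndexTemplate({
  name: '.alerts-default-template',
  index_patterns: ['.alerts-default-*'],
  composed_of: [ALERTS_COMPONENT_TEMPLATE, ECS_COMPONENT_TEMPLATE],
  template: {
    settings: {
      'index.lifecycle.name': '.alerts-default-ilm-policy',
      'index.lifecycle.rollover_alias': '.alerts-default',
    },
  },
});
```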
Our index strategy will depend on whether we want to allow non-additive schema changes between versions:

- Only additive changes
- Non-additive changes allowed

### Retry

We have seen via various SDHs that issues crop up when the expected resources are not installed as expected due to external (usually ES-related) errors. The rule registry currently doesn't contain much retry logic, but we recently added retry logic to event log initialization that we should look into reusing. We can also look at the retry logic used by the Fleet plugin, which specifically looks for transient ES errors.
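A rough sketch of that kind of retry wrapper is below; the error classification and backoff values are assumptions for illustration, not what Fleet or the event log actually use:

```ts
import { errors } from '@elastic/elasticsearch';

// Status codes treated as transient for the purposes of this sketch.
const TRANSIENT_STATUS_CODES = [408, 410, 429, 503, 504];

function isTransientEsError(err: unknown): boolean {
  if (err instanceof errors.ConnectionError || err instanceof errors.TimeoutError) {
    return true;
  }
  if (err instanceof errors.ResponseError && err.statusCode != null) {
    return TRANSIENT_STATUS_CODES.includes(err.statusCode);
  }
  return false;
}

// Retry an async resource-installation step with exponential backoff,
// giving up immediately on errors that don't look transient.
async function retryTransientEsErrors<T>(
  operation: () => Promise<T>,
  { maxAttempts = 5, initialDelayMs = 1000 }: { maxAttempts?: number; initialDelayMs?: number } = {}
): Promise<T> {
  let attempt = 0;
  for (;;) {
    try {
      return await operation();
    } catch (err) {
      attempt++;
      if (attempt >= maxAttempts || !isTransientEsError(err)) {
        throw err;
      }
      const delayMs = initialDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}
```

Usage would then wrap each installation call, e.g. `await retryTransientEsErrors(() => client.ilm.putLifecycle({ ... }))`.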
Great research! There's a lot learned here that we'll be able to re-use for the FAAD indices!
Yeah, I could see this approach being beneficial for the scenarios where we mutate alerts-as-data documents. For example, if we add new workflow timestamps (#141464), the update operation could fail if a document gets updated and the corresponding index doesn't have the latest mappings (assuming we're in strict mode).
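Purely as a hypothetical illustration (field and index names are made up, not taken from the issue), this is roughly how such a failure shows up with strict mappings:

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// With `dynamic: 'strict'` in the index mappings, an update that introduces a
// field not present in the mappings is rejected by Elasticsearch with a
// strict_dynamic_mapping_exception. So if a new workflow timestamp field is
// written before the index has picked up the latest mappings, the update fails.
await client.update({
  index: '.alerts-default-000001', // hypothetical concrete write index name
  id: 'some-alert-id', // hypothetical alert document id
  doc: {
    // hypothetical new workflow timestamp field added in a later version
    'kibana.alert.workflow_status_updated_at': new Date().toISOString(),
  },
});
```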
Closing as research is complete and can be referenced when we implement index bootstrapping for framework alerts-as-data.
The alerting framework plans to start persisting alerts-as-data at a framework level using a single `.alerts-default` index. This index will need to be created on startup with an ILM policy, component templates and aliases. We will also need the ability to update the index mappings with each version. We are currently doing similar things with the event log index and the rule registry alert indices, both of which have run into various issues, large and small. We should consolidate some of our learnings from installing those indices into best practices in order to apply them to the new `.alerts-default` index.

Related issues and PRs:

- `require_alias` when indexing event log documents #93971
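As context for that linked PR, a sketch of the bootstrap-and-write pattern it relates to (index and alias names are placeholders): the concrete write index is created with the alias marked as its write index, and subsequent writes set `require_alias` so they fail fast if the alias is missing rather than silently auto-creating an unmanaged index.

```ts
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

// Create the initial concrete write index with the alias pointing at it as the
// write index (placeholder names).
await client.indices.create({
  index: '.alerts-default-000001',
  aliases: {
    '.alerts-default': { is_write_index: true },
  },
});

// `require_alias: true` makes the write fail if the alias doesn't exist,
// instead of auto-creating a plain index without the managed mappings/ILM.
await client.index({
  index: '.alerts-default',
  require_alias: true,
  document: { '@timestamp': new Date().toISOString() },
});
```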