[Meta] Logging Projects #60391
Comments
I've been thinking about this more, and I think there is quite a bit of value in having separate indices for different use cases. Beyond the security issue, we can also have different retention policies per use case. I think it's very likely that we'll want shorter retention for Alerting history than we would want for Audit logs. If all indices use the same prefix, the only real downside I can think of is the operational complexity added by needing to migrate or reindex multiple indices for major version upgrades. However, if all of these indices are using ECS, I believe schema migrations will be quite rare, which reduces the risk of failed upgrades quite a bit. One outstanding question is whether or not these indices need to be "system indices" that are hidden from the user. If so, how much work is needed to support them in the Elasticsearch plugin? @tylersmalley are there any other concerns that I am not thinking of?
Pinging @elastic/kibana-alerting-services (Team:Alerting Services)
Pinging @elastic/kibana-operations (Team:Operations)
Pinging @elastic/kibana-security (Team:Security)
Pinging @elastic/kibana-platform (Team:Platform)
I think we should be using filebeat for normal logs and audit logs, but not for populating the "alert history" that will show up integrated within the Alerting application. From my understanding, users should be granted access to the history for an alert if they have access to the alert itself, so these should be stored in "system indices". Is there some benefit that I'm missing from using Filebeat to create these entries as opposed to having Alerting itself insert these documents?
For the normal logs and the audit logs, I think these should be stored either in "hidden indices" or "data indices". Users should be granted access to them using normal Elasticsearch index privileges, and they should be available for use within applications like Discover, Dashboard, Visualize, Logging, etc. As far as I'm aware, data stored in "system indices" won't ever be accessible directly to end-users using the normal Elasticsearch document APIs.

If we want to ship Filebeat with Kibana and automatically start shipping the logs to Elasticsearch without any user intervention, we should use "hidden indices", as we can make the assumption that the end-user isn't using them for something different. If we optionally want to start shipping the logs to Elasticsearch with some user intervention, we can use "hidden indices" as well. However, we could also potentially use "data indices" and allow the end-user to specify the indices this data should be ingested into.
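For illustration, granting read access to such a hidden log index pattern with a normal Elasticsearch role might look like the sketch below; the role name, index pattern, and use of the 7.x JS client are assumptions, not decided names.

```ts
// Illustrative sketch only: granting read access to a hypothetical hidden log
// index pattern with a normal Elasticsearch role. Role name and index pattern
// are placeholders; the call shape assumes the 7.x @elastic/elasticsearch client.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function createLogReaderRole() {
  await client.security.putRole({
    name: 'kibana_log_reader', // hypothetical role name
    body: {
      indices: [
        {
          names: ['.kibana-logs-*'], // hypothetical hidden index pattern
          privileges: ['read', 'view_index_metadata'],
        },
      ],
    },
  });
}
```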
cc @pmuellr for the alerting question above
The only benefits I'm aware of are the ones I listed above in the ingestion section:
We get some benefits out of the box; however, there are some drawbacks. For instance, it may be tricky to present feedback to the alerting plugin on the status of history ingestion. That said, I'm supportive of treating alerting history differently. It seems to have enough different requirements that it shouldn't use the same mechanics, at least for the time being. Maybe later down the line it makes sense to consolidate these systems, but I don't think we're there yet.
I think what is key here is that we don't need to start ingesting logs right away for the Audit logging MVP. That can be an add-on feature in the future. There are also some UI features we've talked about using log data for, but I think we should evaluate those from first principles. For example, SO history should probably be part of a larger versioning feature rather than just showing edit log events in the UI. I think separating these efforts from the start, and then identifying overlap later, may be the quicker path to delivering on all of these fronts. There are some obvious things that should share infrastructure, and we should leverage those when we can, but I don't want to serialize everything artificially if it does not provide significant value.
I think this is a good path forward. You brought up some good points regarding the buffering and exponential backoffs which Filebeat has implemented, and I don't want to entirely gloss over them. @elastic/kibana-alerting-services do you know how many documents we're talking about being created every time that an alert runs?
Is the plan to have filebeat authenticate as the
@elastic/kibana-security are you going to use a custom appender?
I'm not sure it's a blocker for Audit Logging. Does the Audit Logger depend on the built-in ES logging format? They overlap, but they have different requirements and output formats. @joshdover
To clarify, an appender is used to direct logs to a specific output (such as a file), right? Are we allowed to create our own instances of the built-in appenders? If so, that might be sufficient. We could direct audit logs to a specific file appender (separate from the "main" file appender the rest of Kibana is using).
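For illustration only, routing audit logs to a dedicated file appender could end up looking something like the sketch below, written as a TS object mirroring kibana.yml keys. The exact key names, appender/layout types, and the logger context are assumptions, since the config schema isn't final.

```ts
// A sketch of the rough shape such a config might take, expressed as a TS object
// that mirrors kibana.yml keys. Key names, appender/layout types, and the
// logger context ('plugins.security.audit') are assumptions, not a final schema.
const loggingConfigSketch = {
  logging: {
    appenders: {
      audit_file: {
        type: 'file',                        // a second instance of the built-in file appender
        fileName: './logs/kibana-audit.log', // separate from the "main" Kibana log file
        layout: { type: 'json' },            // structured (ECS-style) output
      },
    },
    loggers: [
      {
        name: 'plugins.security.audit',      // route only this context to the audit appender
        appenders: ['audit_file'],
        level: 'info',
      },
    ],
  },
};
```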
Not that I'm aware of. @jportner what do you think?
I don't think that's a blocker for security audit logging. We could certainly augment security audit logs if additional performance data was available, but that's not part of our MVP.
I was assuming that an extended version of the ECS layout may be necessary for adding the extra ECS fields. It's possible we can make the regular ECS layout support everything, with the extra data only populated in the Audit log records.
Good point, but audit logging will need the audit events to be emitted from the ES service. We should have separate issues for those.
👍
I don't think this is a hard requirement for the first phase of Audit Logging, but it will be needed in the second phase.
I thought we decided to configure an existing
Yes, Layout defines the output format, not the content. So
Some thoughts on the alerting event log, based on conversations above.
Ya, probably. I was thinking the shape of the logging service might be like saved objects, where I can create a separate ES index to serve as the storage but reuse high-level APIs in the logging service. Even if the alerting event log doesn't end up needing a separate index, it's not hard to imagine some other solution wanting one in the future. Something to keep in mind, anyway.
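Purely as a hypothetical sketch of that "saved-objects-like" shape (none of these names exist in Kibana today):

```ts
// Entirely hypothetical interfaces illustrating the idea above: a consumer
// registers its own backing index but reuses the service's high-level write API.
interface EventLogServiceSetup {
  // Register a provider with an optional dedicated index (or alias) for its events.
  registerProvider(options: { provider: string; index?: string }): void;
}

interface EventLogServiceStart {
  // Write an ECS-shaped event; the service owns buffering and bulk writes.
  logEvent(provider: string, event: Record<string, unknown>): void;
}
```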
One of the design alternatives for the event log was to write to a file and ingest w/filebeat, and I think this could be made to work. I ruled it out as the first version of our log as I didn't want to be the plugin that added a 10MB binary to Kibana :-) The main reason to use filebeat is to get our logger out of the business of buffering log events, in case ES goes down. But that's about it. Our planned story is to buffer a small number (e.g., 100) of events in memory, throwing out the oldest events when the buffer gets full. So, kinda lossy. OTOH, if ES is actually down, then alerts and actions aren't going to be running either, so it's not even really clear we'd need this buffer for issues with ES being down. I'm expecting the primary benefit of buffering write events internally is that we can bulk write them (every couple of seconds) rather than doing a write per event. I think alerting would be perfectly fine having a nice filebeat ingestion story, once we get there, and can get by with our current "write log entries with JS calls" till then.
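A minimal sketch of that buffering approach, assuming the ES JS client and an illustrative index name; this is not the actual event_log implementation:

```ts
// Keep at most MAX_EVENTS in memory, drop the oldest when full, and bulk-write
// the buffer every couple of seconds instead of doing a write per event.
import { Client } from '@elastic/elasticsearch';

const MAX_EVENTS = 100;
const FLUSH_INTERVAL_MS = 2000;
const INDEX = '.kibana-event-log'; // illustrative index/alias name

const client = new Client({ node: 'http://localhost:9200' });
const buffer: object[] = [];

export function queueEvent(event: object) {
  if (buffer.length >= MAX_EVENTS) {
    buffer.shift(); // lossy: discard the oldest event
  }
  buffer.push(event);
}

async function flush() {
  if (buffer.length === 0) return;
  const events = buffer.splice(0, buffer.length);
  // One bulk request for the whole buffer.
  const body = events.flatMap((doc) => [{ index: { _index: INDEX } }, doc]);
  await client.bulk({ body });
}

setInterval(() => {
  flush().catch(() => {
    // If ES is down, events are simply dropped; alerts wouldn't be running anyway.
  });
}, FLUSH_INTERVAL_MS);
```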
I think the idea of migrating logging indices, or even re-indexing, is something we want to avoid. We are currently going with a story where we create per-stack-version indices, eg,
Probably true that they should be in system indices, however we're currently going to be in crunch mode for 7.8, where we REALLY need to have some amount of the event log operational, and it doesn't feel like we could ship this as system indices for 7.8. System indices seem like the right way to go, long-term. A purpose-built API would be nice. We'd need to figure out some kind of "ILM"-ish story - could be pretty simple, like a map of time durations and states - "warm storage after 1 day, cold storage after 1 week, delete after 2 weeks" kinda thing. We'd manage an actual ILM policy from a constrained API surface we exposed to users.
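A rough sketch of what that constrained surface could look like: a simple duration map translated into a standard ILM policy body. The wrapper API is hypothetical; only the policy shape follows real ILM conventions.

```ts
// Hypothetical constrained retention config, mapped to a real ILM policy body.
interface RetentionConfig {
  warmAfter: string;   // e.g. '1d'
  coldAfter: string;   // e.g. '7d'
  deleteAfter: string; // e.g. '14d'
}

function toIlmPolicy({ warmAfter, coldAfter, deleteAfter }: RetentionConfig) {
  return {
    policy: {
      phases: {
        warm:   { min_age: warmAfter,   actions: { set_priority: { priority: 50 } } },
        cold:   { min_age: coldAfter,   actions: { set_priority: { priority: 0 } } },
        delete: { min_age: deleteAfter, actions: { delete: {} } },
      },
    },
  };
}

// e.g. sent as PUT _ilm/policy/<policy-name> with toIlmPolicy({...}) as the body
```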
Concur. We're on a tight schedule for 7.8 anyway, so it seems unlikely we'd have a generic logger fully operational by then that would also be suitable for alerting. We should start thinking about what it would take to converge these separate efforts (or maybe just alerting and everything else) into a single story. I think the "hardest" part will be converging on an extended ECS schema for Kibana. We probably want to start thinking about what our extension story is, so we can make sure it will work for alerting, etc. We already have a few Kibana extensions in our small ECS subset: see kibana/x-pack/plugins/event_log/generated/mappings.json, lines 64 to 94 (at 452193f).
Note that the
How long is a piece of string? Users could have thousands of alerts that go off once per second, each scheduling thousands of actions to run. Or just a handful. No idea, really. There are some known customers who use a lot of ES watches, and we've been keeping those in mind in terms of needing to support that kind of scale. In theory, alerting will be "simpler" for customers to use than Watcher, so you'd think we'll probably have customers creating more alerts than they created watches.
Hey, we're changing the
You can see this change in the PR here:
We wanted to make sure this is visible here in case our top-level kibana key potentially clashes with the work being done for the Kibana log ECS usage.
One of the things I want to look into for the event log used by alerting is the new data streams support. I'm guessing the other logging uses referenced here aren't at the point of needing to think about this yet, but figured I'd mention it and see if anyone else is looking into this. The driver for using data streams for the event log is to make it easier to describe the relationship between the indices, aliases, templates, and ILM policies. It turns out to be tricky to get these all to work together completely reliably, so it would be nice to get that additional reliability. Presumably, it also has some performance benefits for queries and maybe writes. We'd target supporting this at a minor version level, as we currently have version-specific ES resources, so a new minor version can completely change the underlying implementation of these sorts of bits.
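For illustration, opting into data streams could look roughly like the composable index template below; the template name, pattern, and ILM policy reference are assumptions.

```ts
// Sketch of a composable index template with a `data_stream` block, so writes to
// a matching name create and roll the backing indices automatically.
const eventLogIndexTemplate = {
  index_patterns: ['kibana-event-log-*'], // illustrative pattern
  data_stream: {},                        // marks matching names as data streams
  template: {
    settings: {
      'index.lifecycle.name': 'kibana-event-log-policy', // hypothetical ILM policy
    },
    // mappings would come from the plugin's ECS-subset mappings.json
  },
};

// e.g. sent as PUT _index_template/kibana-event-log with the object above as the body
```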
Pinging @elastic/response-ops (Team:ResponseOps)
This issue hasn't been active in 2 years, and most of the items are completed, so I'll go ahead and close it. If anyone feels we still need it, feel free to reopen. ❤️
This issue is intended to be the source-of-truth for all things logging currently being planned or worked on in Kibana.
Some issues belong to multiple categories or tracks of work here, so there is some duplication.
Logging Projects
Kibana Platform Logger
The Kibana Platform (aka "New Platform") has a new logger and configuration that more closely matches Elasticsearch's usage of log4j. You can read more about its design here.
Related issues:
Audit Logging
Audit logging is a security feature. While audit logging already exists in Kibana, it is currently quite limited. This is a high priority project for Security across the Stack. The current plan is to build new audit logging features on top of the Kibana Platform logger.
Kibana Audit Logging Proposal
Related issues:
Blockers for security to begin building audit logging:
Pipe X-Opaque-Id header to AuditTrail logs and Elasticsearch API calls #62018
Future needs:
Alert History Log (Event Log)
The Alerting team has built an event_log plugin for recording specific events into a separate Elasticsearch index. Alerts do not yet integrate with this plugin, but it is planned for the near term.
Related issues:
Log Ingestion
Monitoring is moving to metricbeat for ingestion of monitoring data to Elasticsearch. Operations would like to do the same with filebeat for ingesting normal logs into Elasticsearch.
It may also make sense to use filebeat for alerting's history log / event log. It solves many of the problems that the event log does not currently handle (buffering, exponential backoff, etc.).
If using filebeat fulfills requirements for both use cases, it may make sense to actually combine these into a single log and/or single index in Elasticsearch and filter for each use case at query-time.
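As a sketch of that query-time filtering idea, assuming a shared index pattern and illustrative ECS-ish field names (none of these are decided):

```ts
// Query one shared log index and filter for a single use case at query time.
// The index pattern and field names below are assumptions for illustration.
import { Client } from '@elastic/elasticsearch';

const client = new Client({ node: 'http://localhost:9200' });

async function getAlertHistory(alertId: string) {
  return client.search({
    index: '.kibana-logs-*', // hypothetical shared index pattern
    body: {
      query: {
        bool: {
          filter: [
            { term: { 'event.provider': 'alerting' } },        // select the alerting use case
            { term: { 'kibana.alerting.alert_id': alertId } }, // hypothetical extension field
          ],
        },
      },
      sort: [{ '@timestamp': 'desc' }],
    },
  });
}
```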
Pending Decisions
There are a number of decisions that affect more than one of these projects and need to be made in order to unblock them: