Project Proposal: Audit Logging SIG #2409

mlenkeit · 2024-10-24T10:47:51Z

This PR contains a project proposal for an Audit Logging SIG as discussed on Slack.

We are aware that the project proposal still has several TBDs, especially with regard to staffing and timeline, that need to be defined before the SIG can start working.

We will approach other vendors directly with this proposal to identify additional contributors. Of course, anyone who comes across this proposal here on GitHub is invited to contribute.

While we do have some ideas about a potential timeline for semantic conventions, OTEL SDK/API and collector adjustments respectively, we would like to align this with other contributors first before publishing.

Any feedback from the community on the proposed scope of the SIG is highly appreciated!

Open topics

The following items reference topics from the PR discussion that are still open:

immutability, tamper-proof logs, signing - in scope?

linux-foundation-easycla · 2024-10-24T10:47:56Z

The committers listed above are authorized under a signed CLA.

✅ login: mlenkeit / name: Maximilian Lenkeit (f81c2f4, 9337b7f, 5094fb1, 65ae32e, 405ddb5, 0adb8e5, 75f2c57, a5ef343, d7e265f, 776b821, 6dd519d, 3876a31, 2ec002d, 087865c, 711dc46, a6b34f1, 066501b, 70cbac4, 8b38626)

mtwo · 2024-10-25T17:58:22Z

projects/audit-logging.md

+
+Audit Logging is currently not within the scope of OpenTelemetry
+
+- no semantic conventions for audit logs in OTEL


Suggested change

- no semantic conventions for audit logs in OTEL

- There aren't currently any semantic conventions designed specifically for audit logs in OTEL

mtwo · 2024-10-25T18:02:03Z

Are there any requirements around signing logs / detecting tampering? I've heard that mentioned before in the context of audit logs, but I don't know how common of a requirement it is

reyang · 2024-10-31T04:16:00Z

projects/audit-logging.md

+
+Audit logging describes the capability of capturing audit-trail relevant events of a system to meet compliance requirements. Such events may originate from the infrastructure (e.g. a Kubernetes cluster) up to the application-level. It is a capability that is particularly relevant for providers of enterprise software.
+
+Unlike regular application logs, audit logs are usually subject to long retention periods and software providers must guarantee their completeness (i.e. guarantee of delivery).


Good points! In addition, these are something we might want to consider:

Audit logs might be considered a critical part of the business, which could result in a different API design strategy. For example, audit logging might require different API behavior: if the information provided by the caller is invalid, the API might throw an exception instead of failing silently and moving on.

Audit logs might require some sensitive information without redaction due to regulatory requirements (e.g. user identity and client IP address).

The data path could require a higher level of access control or privilege.

@reyang thanks for mentioning these points.

Especially the API behavior is something that we had thought about initially. However, when we first pitched audit logging on Slack, we received the following comment from Ted Young:

As a rule, the OpenTelemetry API never throws an exception. I understand why you might want this, though it is not present in many audit logging systems, which use regular loggers. So a strong case would have to be made on this particular point.

Based on this initial feedback, we decided to file this SIG proposal without proposing such API changes.
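For illustration only, the two API behaviors discussed above could be contrasted as in the sketch below. Nothing here is an OpenTelemetry API: `emit_lenient`, `emit_strict`, `AuditValidationError` and the required fields are hypothetical names chosen for this example, and the proposal itself does not include such an API change.

```python
class AuditValidationError(ValueError):
    """Raised by the hypothetical strict API when a record is invalid."""


# Assumed for illustration; the proposal does not define required fields.
REQUIRED_FIELDS = ("event_name", "user_id")


def _missing_fields(record: dict) -> list:
    """Return the required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def emit_lenient(record: dict, sink: list) -> None:
    """Never raises: an invalid record is dropped silently."""
    if _missing_fields(record):
        return
    sink.append(record)


def emit_strict(record: dict, sink: list) -> None:
    """Raises on invalid input so the caller cannot miss the failure."""
    missing = _missing_fields(record)
    if missing:
        raise AuditValidationError(f"missing required fields: {missing}")
    sink.append(record)
```

With the lenient variant the caller cannot tell that an audit record was lost; with the strict variant the caller has to handle the exception, which is the trade-off raised in the Slack comment quoted above.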

projects/audit-logging.md

reyang · 2024-10-31T04:25:09Z

projects/audit-logging.md

+* Sponsors: tbd
+* GC liaison: tbd
+* Engineers:
+  * SAP will provide a prototype in two languages (tbd; likely two of Java, JavaScript, Go)


I think we need a prototype in two parts:

API/SDK - this is where we need three programming languages IIRC.

OTel Collector - higher guarantee on data delivery (completeness, integrity, latency, etc.), data path security.

Thanks for pointing this out! It's clear to us, but I'll work on making this clearer in the doc...

projects/audit-logging.md

Co-authored-by: Reiley Yang <reyang@microsoft.com>

mlenkeit · 2024-11-19T14:08:00Z

Are there any requirements around signing logs / detecting tampering? I've heard that mentioned before in the context of audit logs, but I don't know how common of a requirement it is

@mtwo as far as I know, immutability of audit logs is a common requirement, although not all audit logging systems/use cases that I've seen address it with technical measures; some rely on organizational measures instead. However, given the flexibility of OTel processing queues (i.e. different topologies of collectors), having a technical solution in OTel would be favorable.

@reyang what is your opinion on this?

svrnm · 2024-11-19T17:57:52Z

projects/audit-logging.md

+
+Audit Logging is currently not within the scope of OpenTelemetry
+
+- no semantic conventions for audit logs in OTel


Suggested change

- no semantic conventions for audit logs in OTel

- no semantic conventions for audit logs in OTel

Can you provide some examples of what would be part of such semantic conventions? My knowledge on audit logs is very limited, so it would help to understand the problem much better.

@svrnm our experience has shown that in order to analyze audit logs at scale, it is important to define an (extensible) event catalog. The event catalog standardizes audit log events across workloads/produces. For example, our internal event catalog currently consists of 50+ such events. Ideally, such a catalog would be part of semantic conventions.

To make this more tangible, I've added some examples to the appendix of the document:
https://github.com/open-telemetry/community/pull/2409/files#diff-736e6b0ae9ae655b78d9ba007d08592071abb6cc1ef64d7893ff81642c8ec734R115-R192

Another example from the security world is https://github.com/ocsf.

thanks @mlenkeit. Makes it much clearer

The metadata looks like attributes that would be covered by other semantic conventions (e.g. there is a log.record.id for the metadata.id, the timestamp of course, and for k8sCluster we have k8s.cluster.name). So I would assume this is more about re-using and extending certain other domains that are not unique to "audit logs".

For the event and data examples you gave, I would argue that they are not "semantic conventions for audit logs" but "semantic conventions for log types that typically require the strict requirements of auditing". What do I mean by that? If we talk about "semantic conventions for audit logs", I think about a namespace called audit. that holds attributes that are specific to the business logic of audit logging, like a signature that helps to tamper-proof the log line, or maybe even meta information about the regulation under which this log is required to be an "audit log".
In contrast, "semantic conventions for log types that typically require the strict requirements of auditing" are their own namespaces: the "UserLoginFailure" example would fall into an "authentication" or "auth" namespace, with "auth.login.method" or "auth.login.failureReason" as potential attributes and event.name set to auth.login.failure or something.
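To make the distinction concrete, here is a small sketch that uses the made-up names from this comment. None of the `auth.*` or `audit.*` attributes is an agreed convention, `audit.regulation` and all values are placeholders invented for this example, and `group_by_namespace` is a hypothetical helper.

```python
from collections import defaultdict

# All attribute names and values below are illustrative only.
login_failure = {
    "event.name": "auth.login.failure",
    "attributes": {
        # Domain-specific namespace for the event type itself.
        "auth.login.method": "password",
        "auth.login.failureReason": "invalid_credentials",
        # Reused from an existing convention mentioned in the thread.
        "k8s.cluster.name": "prod-1",
        # Hypothetical "audit" namespace for the business logic of auditing.
        "audit.regulation": "example-regulation",
    },
}


def group_by_namespace(attributes: dict) -> dict:
    """Group attribute keys by the part before the first dot."""
    grouped = defaultdict(list)
    for key in attributes:
        grouped[key.split(".", 1)[0]].append(key)
    return dict(grouped)
```

Calling `group_by_namespace(login_failure["attributes"])` separates the `auth` and `k8s` keys, which describe the event, from the `audit` key, which describes its auditing requirements.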

I am just making those things up to exemplify the difference, they will probably take a different form or shape eventually, so to make a long story short, here is a suggestion to rephrase:

Suggested change

- no semantic conventions for audit logs in OTel

- no semantic conventions for audit logs in OTel

- no semantic conventions for log types that typically require the strict requirements of auditing, like authentication, authorization and data changes

@renewelches thanks for calling out OCSF, if I remember correctly there were conversations in the past between OTel and OCSF, cc @lmolkova

Regarding metadata, I fully agree: Most of these attributes are already covered by semconv. We may identify additional attributes in SIG meetings though, depending on the experience/requirement of other contributors/companies.

I understand how "semantic conventions for audit logs" can be misleading. To me, your suggestion primarily describes logs that are "already there" (e.g. events emitted by a K8s cluster) and can be considered relevant for audit purposes. Especially in enterprise software, it's common for applications to produce logs that are specifically meant to be audit logs (and nothing else). To me, it's important that we find wording that covers both of these types.

How about the following?

Suggested change

- no semantic conventions for audit logs in OTel

- no semantic conventions for representing and identifying audit trail-relevant events in OTel (like authentication, authorization or modification of

As mentioned in another comment, this all depends on which attributes are changeable and which must be immutable. As I understand it, an attribute could be altered by a processor in the collector, which is something we would want to prevent in the case of audit logs. If we conclude that we can or should only guarantee immutability for the log itself, then we must live with replication/duplication. Otherwise, we might have to add the constraint that certain attributes must also be immutable.

+1 to looking into OCSF for security events and borrowing relevant semantic conventions from there.

projects/audit-logging.md

projects/audit-logging.md

reyang · 2024-11-21T14:06:11Z

projects/audit-logging.md

+
+- no semantic conventions for audit logs in OTel
+- OTel APIs/SDKs do not provide feedback to the application level whether data (in particular logs) have been successfully delivered to a remote endpoint. To guarantee delivery, either the SDK has to give those guarantees, or provide feedback to the application so that it can take care of guaranteed delivery itself.
+- OTel collectors may lose audit logs in transit (i.e. no guarantee of delivery)


Does "OTel collectors" mean the "OpenTelemetry Collector" (https://github.com/open-telemetry/opentelemetry-specification/tree/main/specification#project-naming) or "any collectors that can handle OpenTelemetry data (whether OTLP or something else)"?

@reyang This is indeed supposed to refer to the OpenTelemetry Collector. I've rephrased this particular sentence to "OTel Collector instances may lose audit logs in transit" and adjusted other occurrences of OTel collectors as well. 066501b

reyang · 2024-11-21T14:08:42Z

projects/audit-logging.md

+
+- OTel collector receives the event:
+
+  To ensure that the event is not lost even if the collector process is terminated or crashes, the collector may need to persist the event before acknowledging receipt to the workload or SDK. If the event cannot be persisted, receipt must be rejected.


What is the expectation if the collector instance disappeared (e.g., the machine running the collector exploded / was stolen)?

I think this is the most tricky part, or to put it in a question: do we need guarantee of delivery between 2 components (workload->collector,collector->S3) or end-to-end (workload->S3)?

I would assume "end-to-end" except the collector can guarantee that data is persisted according to the auditing requirements

(e.g., the machine running the collector exploded / was stolen)

If the solution for audit logging with OTel meant that the OTel Collector had its own persistence, I would argue that theft/explosion/etc. are rather the responsibility of Operations, in terms of configuring said persistence such that it is resilient "enough".

Or to make this more concrete: if for example something such as the storage extension was used, Operations would need to make sure that the database/file/redis storage runs in an HA mode.

I'm stressing the if here, because I think it is a detail that the SIG should work out. Or do you think that's something that should rather be clarified upfront?

I'm stressing the if here, because I think it is a detail that the SIG should work out. Or do you think that's something that should rather be clarified upfront?

I suggest that we leave this for the SIG to figure out. In the OTEP, I suggest that we avoid "guaranteed delivery" and use something like "certain degree/level of data delivery guarantee". Not a blocker for this PR though (I'm good with the current version).

+1 for what @reyang wrote. I think it is good to have this in the appendix with some wording around it, since there are many people (including myself) who have only a superficial knowledge of audit logs, so it helps to contextualize and understand what this is all about. No more details are needed in this doc; that would be for the SIG to figure out.
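As a rough sketch of the "persist before acknowledging" step from the quoted appendix text: the receiver below acknowledges an event only after it has been appended to a local journal file and flushed to disk, and rejects it otherwise. This is not how the OpenTelemetry Collector or its storage extension works; `receive` and the journal file are assumptions made for this example, and a single local file does not cover the "machine exploded" case discussed above.

```python
import json
import os


def receive(event: dict, journal_path: str) -> bool:
    """Return True (acknowledge) only after the event is durably appended.

    Returns False (reject) if the event could not be persisted.
    """
    try:
        with open(journal_path, "a", encoding="utf-8") as journal:
            journal.write(json.dumps(event) + "\n")
            journal.flush()
            # Ask the OS to write its buffers to the storage device.
            os.fsync(journal.fileno())
    except OSError:
        return False  # could not persist: receipt must be rejected
    return True
```

Whether such a guarantee has to hold per hop or end-to-end is exactly the question left to the SIG.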

projects/audit-logging.md

renewelches · 2024-11-21T20:40:03Z

Are there any requirements around signing logs / detecting tampering? I've heard that mentioned before in the context of audit logs, but I don't know how common of a requirement it is

@mtwo as far as I know, immutability of audit logs is a common requirement, although not all audit logging systems/use cases that I've seen address it with technical measures; some rely on organizational measures instead. However, given the flexibility of OTel processing queues (i.e. different topologies of collectors), having a technical solution in OTel would be favorable.
@reyang what is your opinion on this?

@mlenkeit I think this can be achieved as long as OpenTelemetry is designed to allow additive changes, doesn't have to be there in the first place. I personally haven't seen people signing logs, and I've seen lots of cases where immutable data path is used.

I think the key parts are encryption at rest (e.g. when a buffer writes to disk) and encryption in transit.
Where it gets tricky is when we have to separate audit log data and meta/transport data. In some cases this might lead to duplication. E.g. the K8s cluster name that triggered the audit event could be part of the audit log message, in which case it must be immutable. But it is also an OTLP attribute and as such could be changed. Or to phrase it differently: is immutability required for the whole signal or just the message?
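One way to picture the "whole signal or just the message" question is with a keyed hash over either the body alone or the complete record. The sketch below uses HMAC-SHA256 from the Python standard library purely as an assumption; nobody in this thread proposed a specific signing scheme, and the record layout, key and values are placeholders.

```python
import hashlib
import hmac
import json


def sign(payload, key: bytes) -> str:
    """HMAC-SHA256 over a canonical JSON encoding of the payload."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hmac.new(key, canonical.encode("utf-8"), hashlib.sha256).hexdigest()


def verify(payload, key: bytes, signature: str) -> bool:
    """Constant-time comparison of the expected and the given signature."""
    return hmac.compare_digest(sign(payload, key), signature)


key = b"demo-key"  # placeholder; real key management is out of scope here
record = {
    "body": {"event": "UserLoginFailure", "k8sCluster": "prod-1"},
    "attributes": {"k8s.cluster.name": "prod-1"},
}
body_signature = sign(record["body"], key)
record_signature = sign(record, key)

# A processor in the pipeline rewrites an attribute in transit.
record["attributes"]["k8s.cluster.name"] = "renamed"

body_ok = verify(record["body"], key, body_signature)  # body is untouched
record_ok = verify(record, key, record_signature)      # whole record changed
```

Signing only the body tolerates attribute changes but requires the cluster name to be duplicated inside the body; signing the whole record avoids the duplication but forbids any processor from touching the attributes.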

mlenkeit · 2024-11-25T17:14:21Z

Releasing for review as per @reyang's (offline) suggestion. I'm aware that there are open TBDs that we still need to fill.

tigrannajaryan · 2024-12-03T16:25:54Z

projects/audit-logging.md

+
+  The workload emits the event via the OTel API/SDK. It may wait for acknowledgement of receipt from the collector before proceeding. If the event is rejected or receipt is not acknowledged in time, the workload or SDK may act accordingly, e.g. retry, rollback a database transaction, inform the user, etc.
+
+- OTel Collector receives the event:


Take a look at the current requirements: https://github.com/open-telemetry/opentelemetry-collector/blob/b9ff1bc54c992bc76cc9ecb0a7ee1f0f591f6d23/receiver/doc.go#L31

This open issue tracks compliance with requirements: open-telemetry/opentelemetry-collector#7460
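The workload-side behavior in the quoted hunk (wait for acknowledgement, then retry or roll back) might look roughly like the sketch below. `send` stands in for whatever transport would report acknowledgement; it is a hypothetical callable, not an existing OTel SDK function, and the in-memory `db` dict stands in for a real transaction.

```python
from typing import Callable


def emit_with_ack(event: dict, send: Callable[[dict], bool], max_attempts: int = 3) -> bool:
    """Try to deliver the event; True means receipt was acknowledged."""
    for _ in range(max_attempts):
        try:
            if send(event):
                return True
        except TimeoutError:
            pass  # not acknowledged in time: treat like a rejection and retry
    return False


def change_record(db: dict, key: str, value: str, send: Callable[[dict], bool]) -> bool:
    """Apply a change only if the matching audit event is acknowledged."""
    previous = dict(db)
    db[key] = value
    if emit_with_ack({"event": "data.modified", "key": key}, send):
        return True
    # Roll back: the change must not survive without its audit event.
    db.clear()
    db.update(previous)
    return False
```

Whether the SDK or the application should own this retry and rollback logic is one of the open design questions for the SIG.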

mlenkeit added 3 commits October 24, 2024 12:20

docs(auditlogs): add audit logging sig proposal

5094fb1

docs(auditlogs): re-number requirements

9337b7f

docs(auditlogs): remove template instructions

75f2c57

mtwo reviewed Oct 25, 2024

trask mentioned this pull request Oct 25, 2024

REQUEST: Repository maintenance on the community repo #2415

Closed

reyang reviewed Oct 31, 2024

mlenkeit commented Nov 19, 2024

mlenkeit and others added 2 commits November 19, 2024 14:24

docs(auditlogs): use OTel over OTEL

f81c2f4

Co-authored-by: Reiley Yang <reyang@microsoft.com>

docs(auditlogs): list @reyang as first sponsor

65ae32e

mlenkeit force-pushed the audit-logging-sig-project-proposal branch from c1aca6e to 65ae32e Compare November 19, 2024 13:44

mlenkeit added 3 commits November 19, 2024 14:45

docs(auditlogs): add Microsoft to interested vendors

776b821

docs(auditlogs): add contacts to vendor list

6dd519d

docs(auditlogs): use consistent punctuation for requirement list

2ec002d

svrnm reviewed Nov 19, 2024

mlenkeit and others added 8 commits November 20, 2024 16:00

docs(auditlogs): minor word change in Challenges chapter

d7e265f

docs(auditlogs): describe guarantee of delivery in appendix

405ddb5

docs(auditlogs): add sample audit logs to appendix

0adb8e5

docs(auditlogs): add links to sample audit logs

a5ef343

docs(auditlogs): add links to appendix A

711dc46

docs(auditlogs): use GitHub handle only in staffing list

087865c

docs(auditlogs): add svrnm as GC liaison

3876a31

Merge branch 'audit-logging-sig-project-proposal' of github.com:apeir…

8b38626

…ora/open-telemetry-community into audit-logging-sig-project-proposal

reyang reviewed Nov 21, 2024

mlenkeit added 2 commits November 22, 2024 12:13

docs(auditlogs): minor changes in wording

066501b

docs(auditlogs): shorten requirement ids to pass spell check

70cbac4

svrnm added the area/project-proposal Submitting a filled out project template label Nov 25, 2024

mlenkeit marked this pull request as ready for review November 25, 2024 17:14

mlenkeit requested review from alolita, austinlparker, danielgblanco, jpkrohling, mx-psi, tedsuo and trask as code owners November 25, 2024 17:14

Merge branch 'main' into audit-logging-sig-project-proposal

a6b34f1

tigrannajaryan reviewed Dec 3, 2024

timojohlo mentioned this pull request Dec 10, 2024

[opentelemetry] Create Plugin for Audit-Logs cloudoperators/greenhouse-extensions#544

Open

5 tasks

jack-berg mentioned this pull request Dec 12, 2024

Support persistent batch processor to prevent telemetry data loss open-telemetry/opentelemetry-java#6940

Open
