Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change map to key-value pair collection in Logs Data Model #3938

Closed
wants to merge 4 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,9 @@ release.

### Logs

- Change map to key-value pair collection in Logs Data Model.
([#3938](https://github.com/open-telemetry/opentelemetry-specification/pull/3938))

### Events

### Resource
Expand Down
28 changes: 16 additions & 12 deletions specification/logs/data-model.md
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
* [Requirements](#requirements)
* [Definitions Used in this Document](#definitions-used-in-this-document)
+ [Type `any`](#type-any)
+ [Type `map`](#type-mapstring-any)
+ [Type `[]keyval`](#type-keyvalstring-any)
* [Field Kinds](#field-kinds)
- [Log and Event Record Definition](#log-and-event-record-definition)
* [Field: `Timestamp`](#field-timestamp)
Expand Down Expand Up @@ -99,7 +99,7 @@ The Data Model aims to successfully represent 3 sorts of logs and events:

### Definitions Used in this Document

In this document we refer to types `any` and `map<string, any>`, defined as
In this document we refer to types `any` and `[]keyval<string, any>`, defined as
follows.

#### Type `any`
Expand All @@ -112,18 +112,22 @@ Value of type `any` can be one of the following:

- An array (a list) of `any` values,

- A `map<string, any>`,
- A `[]keyval<string, any>`,

- [since 1.31.0] An empty value (e.g. `null`).

#### Type `map<string, any>`
#### Type `[]keyval<string, any>`

Value of type `map<string, any>` is a map of string keys to `any` values. The
keys in the map are unique (duplicate keys are not allowed). The representation
of the map is language-dependent.
Value of type `[]keyval<string, any>` is a collection of key-value pairs with
string keys and [`any`](#type-any) values.

Arbitrary deep nesting of values for arrays and maps is allowed (essentially
allows to represent an equivalent of a JSON object).
Arbitrary deep nesting of values for arrays and key-value collections MUST be
allowed (essentially allows to represent an equivalent of a JSON object).

The type representation is language-dependent.
It is implementation-specific whether the collection can contain duplicated keys.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a breaking change. We can't allow this. It expands the possible shape of the data that recipients should be able to deal with. This is also not possible to represent in OTLP, so I think this is a no go.

Copy link
Member Author

@pellared pellared Mar 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm. All possibe receipents (processors, exporters)? Would it also be seen as breaking if in SDKs the OTLP exporters would handle the deduplication?

I have an idea to propose adding an option like WithoutKeyValueDeduplication to Go Logs SDK for users who favor performance over the danger of potentially losing some log records. By default the SDK would get rid of duplicates.

EDIT: I am leaning towards closing this PR and issue if the idea above is reasonable and acceptable.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to confirm. All possibe receipents (processors, exporters)?

This document is the logical data model and applies to anyone who creates the data and who reads the data, so yes it applies to processors, exporters, SDKs, Collector, backends.

Would it also be seen as breaking if in SDKs the OTLP exporters would handle the deduplication?

I think it is fine to do the deduplication anywhere you want as long as externally observable data complies with this document.

I have an idea to propose adding an option like WithoutKeyValueDeduplication to Go Logs SDK for users who favor performance over the danger of potentially losing some log records. By default the SDK would get rid of duplicates.

I think it is fair to provide non-default options like this, with a clear warning that the caller is responsible for ensuring there are no duplicates, assuming there is a need for such high performance.

Provided that such option option is possible to add later in non-breaking manner to Go SDK I would not do it right now and will only add it after we gather evidence from real-world usage that it is really needed by the users.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is fair to provide non-default options like this, with a clear warning that the caller is responsible for ensuring there are no duplicates, assuming there is a need for such high performance.

Provided that such option option is possible to add later in non-breaking manner to Go SDK I would not do it right now and will only add it after we gather evidence from real-world usage that it is really needed by the users.

@tigrannajaryan, sounds like a good plan 👍 Do you think that the specification should be updated before we introduce such option?

CC @open-telemetry/cpp-maintainers @open-telemetry/rust-maintainers @open-telemetry/dotnet-maintainers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you think that the specification should be updated before we introduce such option?

Probably, no, if it is only for Go. Likely, yes, if multiple languages will use it.

If the implementation allows having duplicates, then some exporters (e.g. OTLP)
may require deduplication (removing pairs so that none of them have the same key).
Comment on lines +129 to +130
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My assumption was that the OTLP proto was meant to reflect the Data Model and I assumed the semantics of the OTLP proto were documented in the data model. Is that not the case? Are the semantics of each completely independent?

Specially the sentence "The type representation is language-dependent" makes it seem like this document is exclusively referring to the in-memory representation of logs in the SDK.

Copy link
Member Author

@pellared pellared Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specially the sentence "The type representation is language-dependent" makes it seem like this document is exclusively referring to the in-memory representation of logs in the SDK.

This is a good comment.

I thought that is describing the model for the SDK. I guess I was wrong. Maybe I should change the Logs SDK specification instead? I would need guidance from specification approvers/maintainers.

Copy link
Member Author

@pellared pellared Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are the semantics of each completely independent?

For the SDK it makes sense to support other log exporters than OTLP which can have better performance when there is no de-duplication involved. I believe that there are even backends which are able to support key-values with duplicated keys (without losing the duplicates). There is a lot of software using OTel APIs and SDK for instrumenting application and NOT exporting these via OTLP. I would say that tightly coupling the model to OTLP is against OpenTelemetry "promise" (from https://opentelemetry.io/docs/what-is-opentelemetry/):

OpenTelemetry is vendor- and tool-agnostic, meaning that it can be used with a broad variety of Observability backends

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear: I am not against the 'spirit' of this change, I am just not sure what is the best way to word it :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also understand why we may want to be against this change as if SDKs would offer such functionalities then users may want to expect similar things from the Collector (from similar reasons). I have a feeling that this change would make more sense if it would be acceptable for the Collector and we would also update the comments OTLP proto: open-telemetry/opentelemetry-proto#533. At last, the value of this change may not be worth its cost. But at the same time, maybe we should design SDKs to make this change possible in future (so that the deduplication is an implementation detail and we would not need to make any public API changes to support duplicated key-values).


### Field Kinds

Expand All @@ -133,7 +137,7 @@ fields:

- Named top-level fields of specific type and meaning.

- Fields stored as `map<string, any>`, which can contain arbitrary values of
- Fields stored as `[]keyval<string, any>`, which can contain arbitrary values of
different types. The keys and values for well-known fields follow semantic
conventions for key names and possible values that allow all parties that work
with the field to have the same interpretation of the data. See references to
Expand All @@ -149,7 +153,7 @@ The reasons for having these 2 kinds of fields are:
- Ability to enforce types of named fields, which is very useful for compiled
languages with type checks.

- Flexibility to represent less frequent data as `map<string, any>`. This
- Flexibility to represent less frequent data as `[]keyval<string, any>`. This
includes well-known data that has standardized semantics as well as arbitrary
custom data that the application may want to include in the logs.

Expand Down Expand Up @@ -444,7 +448,7 @@ is optional.

### Field: `Attributes`

Type: [`map<string, any>`](#type-mapstring-any).
Type: [`[]keyval<string, any>`](#type-keyvalstring-any).
Copy link
Member

@TylerHelmuth TylerHelmuth Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've not spent much time in the spec, but changing the data type of a field in a stabilized data format feels like a breaking change to me. The data model is more explicit than the proto that Attributes are a map and since the model is stable I feel like we should assume implementations are depending on this fact (even though we know some languages are ignoring this fact).

Copy link
Contributor

@MrAlias MrAlias Mar 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure this is changing the data-type as much as renaming it to account for implementations that don't implement this as a map.

Any implementation that was compatible prior to this change, will still be compatible after it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

renaming it to account for implementations that don't implement this as a map.

But not implementing as a map was technically against specification right?

Any implementation that was compatible prior to this change, will still be compatible after it.

Agreed, but it allows 2 different implementation to generate incompatible data right? If 1 data source generates attributes as a map and another generates data as a list, when the data comes together, either in the collector or some backend, I anticipate problems, or at least unexpected/undefined behavior.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unresolving per @TylerHelmuth request

Copy link
Member Author

@pellared pellared Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But not implementing as a map was technically against specification right?

Hard to say as the spec was not using normative language.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a instrumentation-only concern and does not change how the exporters emit the data using OTLP please let me know.

It does feel like a breaking change to me to change a data type. I'd like to stick as closely as possible to semver as possible so that any implementers of our spec are not unexpectedly affected.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If 1 data source generates attributes as a map and another generates data as a list, when the data comes together, either in the collector or some backend, I anticipate problems, or at least unexpected/undefined behavior.

The exporter should deduplicate the key-values if it is required.

Additionally, the user can always use a processor which deduplicates key-values.

Copy link
Member Author

@pellared pellared Mar 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a instrumentation-only concern and does not change how the exporters emit the data using OTLP please let me know

Yes it is. I explicitly write in the PR description that OTLP exporters still have to send deduplicated key-values.


Description: Additional information about the specific event occurrence. Unlike
the `Resource` field, which is fixed for a particular source, `Attributes` can
Expand Down
Loading