Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JSON serialization format #2235

Merged
merged 18 commits into from
Mar 11, 2022
Merged

Conversation

atoulme
Copy link
Contributor

@atoulme atoulme commented Dec 21, 2021

Fixes #1443

Changes

Adds a new experimental JSON serialization minimal format to store traces, metrics and logs as JSON on disk.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Jan 11, 2022
experimental/serialization/json.md Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like a lot about this design, in particular that it would provide a very accessible mechanism for implementing asynchronous workflows. It's probably never going to be the most elegant way to manage telemetry, but it's a great catchall.

There are a few minor points I would like to reconsider.

  • The shorthand keys ("v", "rs", "rm", "rl") clash with the preference for clarity demonstrated in telemetry objects themselves.
  • Allowing only one of "rs", "rm", "rl" makes sense as a rule, but it would be nice if the data structure enforced this in a natural way.
  • I would support a rule that each file should only contain one version and type of telemetry. Mixing versions or telemetry types could be problematic for a variety of reasons.
  • The file is actually a json lines file, so we might consider indicating this by suggesting the .jsonl file extension

I'm primarily interested in addressing these points. Probably they can be addressed with incremental changes, or not at all if that's the consensus.


I will also offer a less conventional solution to the above, which we could consider:

  • Establish a new file extension, perhaps .jsont meaning "json telemetry"
  • The .jsont file format has the following expectations:
    • Is a valid .jsonl file
    • The first line is a special metadata object that specifies the telemetry type and OTLP Protobuf version
    • All subsequent lines are of the format specified in the first (i.e. json objects at the ResourceSpans | ResourceMetrics | ResourceLogs level)
  • Example .jsont file:
    {"otlp_protobuf_version": "0.0.0", "signal_type": "metrics"}
    {"resourceMetrics":[{"resource":{"attributes":[{"key":"resource-attr","value":{"stringValue":"resource-attr-val-1"}}]},"instrumentationLibraryMetrics":[{"instrumentationLibrary":{},"metrics":[...]}]}
    {"resourceMetrics":[{"resource":{"attributes":[{"key":"resource-attr","value":{"stringValue":"resource-attr-val-2"}}]},"instrumentationLibraryMetrics":[{"instrumentationLibrary":{},"metrics":[...]}]}

experimental/serialization/json.md Outdated Show resolved Hide resolved
@atoulme
Copy link
Contributor Author

atoulme commented Jan 13, 2022

Excellent suggestions! I'll get to work incorporating them. Thank you so much!

@github-actions github-actions bot removed the Stale label Jan 14, 2022
@atoulme atoulme force-pushed the json_serialization branch 3 times, most recently from a217dc8 to c7ea094 Compare January 20, 2022 01:17
@atoulme atoulme force-pushed the json_serialization branch 2 times, most recently from ca94521 to b16f85d Compare January 21, 2022 00:02
experimental/serialization/json.md Outdated Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
atoulme and others added 3 commits January 26, 2022 13:06
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
Co-authored-by: Anthony Mirabella <a9@aneurysm9.com>
@github-actions
Copy link

github-actions bot commented Feb 3, 2022

This PR was marked stale due to lack of activity. It will be closed in 7 days.

experimental/serialization/json.md Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
experimental/serialization/json.md Outdated Show resolved Hide resolved
Co-authored-by: Tigran Najaryan <4194920+tigrannajaryan@users.noreply.github.com>
@pmm-sumo
Copy link
Contributor

pmm-sumo commented Feb 16, 2022

@aabmass what about oneof + TracesData/MetricsData/LogsData?

if you did it with oneofs, the fields can't be repeated

message OpenTelemetryData {
    oneof data {
        ResourceMetrics resourceMetrics = 1;
        ResourceLogs resourceLogs = 2;
        ResourceSpans resourceSpans= 3;
    }
}

so then the json would have to look like {"resourceSpans": {...}} instead of {"resourceSpans":[{...}, ...]} which is incompatible with this proposal and the current fileexporter IIUC.

@aabmass I meant something slightly different, i.e.:

message OpenTelemetryData {
    oneof data {
        MetricsData metricsData = 1;
        LogsData logsData = 2;
        TracesData tracesData = 3;
    }
}

Would it still not work (since MetricsData/LogsData/TracesData is just a repeated field underneath)? I think this would effect in something like {"tracesData": [{"resource": ..., "instrumentationLibrarySpans": ...}, ...]}

I might be missing something but my understanding is that TracesData/etc. were introduced for usecases like here and are a better fit than ExportTraceServiceRequest/etc.

@aabmass
Copy link
Member

aabmass commented Feb 17, 2022

Would it still not work (since MetricsData/LogsData/TracesData is just a repeated field underneath)? I think this would effect in something like {"tracesData": [{"resource": ..., "instrumentationLibrarySpans": ...}, ...]}

It would work but I think the json would look like this: {"tracesData": {"resourceSpans: [{"resource": ..., "instrumentationLibrarySpans": ...}]}}. Maybe we should table the oneof discussion for now and open a follow up issue for parsing jsonlines as a single protobuf message

I might be missing something but my understanding is that TracesData/etc. were introduced for usecases like here and are a better fit than ExportTraceServiceRequest/etc.

I agree those might be a better fit. This PR currently references OTLP/HTTP standard which is specific to use the the Export*ServiceRequest messages. @pmm-sumo do you want this PR be updated to say TracesData etc. instead?

@pmm-sumo
Copy link
Contributor

pmm-sumo commented Feb 17, 2022

I agree those might be a better fit. This PR currently references OTLP/HTTP standard which is specific to use the the Export*ServiceRequest messages. @pmm-sumo do you want this PR be updated to say TracesData etc. instead?

I guess the reason for referencing OTLP/HTTP is because it is the only place in specification where JSON encoding is mentioned in context of OTLP. However, this PR is actually about proposing how to serialize OTLP using JSON on disk, so I believe it does not have to stick to the notion of being strictly based on OTLP/HTTP, including ExportTracesServiceRequest and such

As for TracesData/etc. - I think it would be good to hear from @bogdandrutu who introduced it into the proto. That would be perhaps the first mention of it in the specification, though I believe it would be inline with the original idea behind it

@tigrannajaryan
Copy link
Member

I guess the reason for referencing OTLP/HTTP is because it is the only place in specification where JSON encoding is mentioned in context of OTLP. However, this PR is actually about proposing how to serialize OTLP using JSON on disk, so I believe it does not have to stick to the notion of being strictly based on OTLP/HTTP, including ExportTracesServiceRequest and such

If we don't reference OTLP/HTTP spec then at least we need to duplicate the wording we have there about JSON encoding. It has specifics which are important:

JSON-encoded Protobuf payloads use proto3 standard defined JSON Mapping for mapping between Protobuf and JSON, with one deviation from that mapping: the trace_id and span_id byte arrays are represented as case-insensitive hex-encoded strings, they are not base64-encoded like it is defined in the standard JSON Mapping. The hex encoding is used for trace_id and span_id fields in all OTLP Protobuf messages, e.g. the Span, Link, LogRecord, etc. messages.

Note that according to Protobuf specs 64-bit integer numbers in JSON-encoded payloads are encoded as decimal strings, and either numbers or strings are accepted when decoding.

@github-actions
Copy link

This PR was marked stale due to lack of activity. It will be closed in 7 days.

@github-actions github-actions bot added the Stale label Feb 25, 2022
@atoulme
Copy link
Contributor Author

atoulme commented Feb 26, 2022

Not much movement here in the last week. I am going to add this to the agenda of the next spec meeting. Please make sure to review and bring up your points if you can?

Here is where I stand myself - we have a very minimal spec that works right now, and we're now figuring out how to map it to the protobuf model, which is super cool. My current thinking is that we should merge this PR and open a new one for the protobuf mapping.

@github-actions github-actions bot removed the Stale label Feb 26, 2022
@atoulme
Copy link
Contributor Author

atoulme commented Mar 1, 2022

Great discussion on the call this morning. I am going to file issues to capture discussion on this issue and make sure we continue the work after this PR, after which it should be good to go.

@atoulme
Copy link
Contributor Author

atoulme commented Mar 1, 2022

@tigrannajaryan we do reference OTLP/HTTP:
The data must be encoded according to the format specified in the [OTLP/HTTP standard](https://github.com/open-telemetry/opentelemetry-specification/blob/main/specification/protocol/otlp.md#otlphttp).
Indeed, that section of the document is what you refer to in your comment. We should be good then, but if you like I can file an issue to ask that we carve out OTLP/HTTP even more to make it super clear what we refer to.

@atoulme
Copy link
Contributor Author

atoulme commented Mar 1, 2022

@pmm-sumo I have filed #2390 to follow up on your comment.

@tigrannajaryan
Copy link
Member

tigrannajaryan commented Mar 2, 2022

We should be good then, but if you like I can file an issue to ask that we carve out OTLP/HTTP even more to make it super clear what we refer to.

One possible approach is to move the paragraph that describes how JSON encoding works from OTLP/HTTP document in this repo to https://github.com/open-telemetry/opentelemetry-proto repo which is responsible for encodings. Then we can link to that from OTLP/HTTP document and from the document that this PR introduces.

@atoulme
Copy link
Contributor Author

atoulme commented Mar 8, 2022

Nice! Can we merge this too?

@carlosalberto
Copy link
Contributor

@tigrannajaryan Can you confirm the remaining items can be done as part of #2390? Sounds like a YES 😄

@tigrannajaryan
Copy link
Member

@tigrannajaryan Can you confirm the remaining items can be done as part of #2390? Sounds like a YES 😄

Yes, I think it is fine.

However, if possible I would like to see more approvals on this issue before we merge it.

@open-telemetry/specs-approvers please review.

Copy link
Member

@djaglowski djaglowski left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@carlosalberto
Copy link
Contributor

We got one more review, and this has been open for a while now. Merging.

@carlosalberto carlosalberto merged commit 44c1acd into open-telemetry:main Mar 11, 2022
@atoulme atoulme deleted the json_serialization branch July 27, 2022 21:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Define how OTLP data can be stored in a file
10 participants