Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Add fingerprints array to span schema #392

Closed
wants to merge 4 commits into from

Conversation

Mmarzex
Copy link
Contributor

@Mmarzex Mmarzex commented Jan 16, 2023

We need to be able to filter spans/traces by the errors/warnings that they generate. This PR Adds a repeated field on the Span schema that we can populate on ingest when we have the spans and events together.

Danwakeem
Danwakeem previously approved these changes Jan 16, 2023
Copy link
Contributor

@Danwakeem Danwakeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good just had a question about the implementation details we had in mind 👍

Copy link
Contributor

@medikoo medikoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As I see in the comments, it’s only needed for ingest internals. Do we really have to add it to schema?
Other issue is that it adds a field for information that's already provided with other fields (fingerprints are already present in trace.events[]). So design-wise this addition doesn't look compelling

@Mmarzex
Copy link
Contributor Author

Mmarzex commented Jan 16, 2023

@medikoo It is also used in the user facing portions of the UI. This is a field that is need by ingest and then by the user facing APIs for retrieving/filtering data.

I understand the concern but we made a decision early on to use the same fields across all three areas, so since this is needed in two of them we have to add it. It would require a change to the entire ingest pipeline to change it from sending just span typed data to sending a span & { fingerprints: [] } type, which I do not want to do. It is a slippery slope to add fields in one part that do not exist everywhere.

@Mmarzex Mmarzex requested a review from medikoo January 16, 2023 22:13
@Mmarzex
Copy link
Contributor Author

Mmarzex commented Jan 17, 2023

@medikoo Also we don't currently expose a fingerprint property on the event schema itself, just the custom_fingerprint which is not expected to be set all the time. I think this property provides the best solution to minimize query overhead in the whole system.

Copy link
Contributor

@Danwakeem Danwakeem left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we need this when we pass the protobuf around at some point I am cool with adding this. We may wanna wait for @medikoo to reply before sending it though.

Copy link
Contributor

@medikoo medikoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand the concern but we made a decision early on to use the same fields across all three areas, so since this is needed in two of them we have to add it. It would require a change to the entire ingest pipeline to change it from sending just span typed data to sending a span & { fingerprints: [] } type, which I do not want to do. It is a slippery slope to add fields in one part that do not exist everywhere.

This signals to me that we're using same schema for different requests, where the same data is expected to be passed. in a different form.

This doesn't look as the right direction to me (ideally those requests should be defined separately). It seems we're introducing a technical debt by building a schema that is hard to reason about.

Is there really no better way to do that?

@Mmarzex
Copy link
Contributor Author

Mmarzex commented Jan 17, 2023

This doesn't look as the right direction to me (ideally those requests should be defined separately). It seems we're introducing a technical debt by building a schema that is hard to reason about.

How could they be defined separately. If we pull say all events from Opensearch that have fingerprint abc123, that may return to us 1,000,000 traceIds, we can't actually then query for all traces like that. We need to be able to query the spans index in opensearch for fingerprints. This is what this achieves.

@medikoo
Copy link
Contributor

medikoo commented Jan 17, 2023

How could they be defined separately. If we pull say all events from Opensearch that have fingerprint abc123, that may return to us 1,000,000 traceIds, we can't actually then query for all traces like that. We need to be able to query the spans index in opensearch for fingerprints. This is what this achieves.

@Mmarzex will moving events from TracePayload to Span improve that? Having that we would have fingerprints at span level, and across TracePayload no data will be duplicated

@Mmarzex Mmarzex closed this Jan 23, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants