# rfc(informational): Combined Replay Envelope Item (#82)

- Start Date: 2023-03-24
- RFC Type: informational
- RFC PR: https://github.com/getsentry/rfcs/pull/82
- RFC Status: draft

# Summary

Right now, Session Replay data is sent from the SDK via two envelope item types:
`ReplayEvent` and `ReplayRecording`.

- `ReplayEvent` contains indexable metadata that is stored in our ClickHouse table.
- `ReplayRecording` contains a large, usually compressed blob that is written to our file store (GCS as of now).

These are always sent in the same envelope by the SDK. We'd like to combine these two envelope item types into one.

We'd first write a compatibility layer in Relay that takes the old format and converts the two items into a single item type.
We'd then modify the Sentry SDKs to send the combined item type going forward.

> **Review note:** We need to consider compatibility with Self-Hosted (and Single Tenant) before we change the SDK.

# Motivation

## The Current Flow

```mermaid
graph
A([SDK Generated Event])-->|HTTP Request| Z{Relay}
Z-->|Published to Kafka topic|B[Replays Kafka Cluster]
B-->|ingest-replay-events|J[Snuba Consumer]
J-->|Buffered Bulk Commit|K[(Clickhouse)]
B-->|ingest-replay-recordings|E[Recording Consumer]
E-->C[Recording Chunk Processor]
C-->|Chunk Stashed|D[(Redis)]
E-->Q[Recording Event Processor]
Q-->F[Chunk Assembly]
I[(Redis)]-->|Chunks Fetched|F
F-->|Assembled Chunks Stored|G[(Blob Storage)]
F-->|File Metadata Saved|H[(PostgreSQL)]
```

We'd like to combine these envelope payloads into one for the following reasons:

- Now that we're decompressing, parsing, and indexing data in the recording, the delineation between the two events no longer makes sense.
- Right now there exists a race condition between ReplayEvent and ReplayRecording: if a ReplayEvent makes it to ClickHouse and is stored before the ReplayRecording, a user can navigate to the replay before the replay's blobs have been stored, which is a bad user experience. We'd like to change it so the Snuba writes happen _downstream_ of the recording consumer, as we don't want to make a replay available for search until its corresponding recording blob has been stored.
- Right now our rate limits are separated per `ItemType`. It would be less confusing if the rate limit applied to a single event type.
- It is currently very hard to collect telemetry in the recordings consumer, because the SDK version is not available there. Combining the envelopes gives the recordings consumer easy access to that metadata.

# Options Considered

New Proposed Flow:

```mermaid
graph
A([SDK Generated Combined Replay Envelope Item])--> Z[Relay, Combines Envelope Items if needed]
AA([Legacy SDK Generated Separate Items]) --> Z
Z-->|Published to Kafka topic, using random partition key|B[Replays Kafka Cluster]
T-->|ingest-replay-events|J[Snuba Consumer]
J-->|Buffered Bulk Commit|K[(Clickhouse)]
B-->|ingest-replay-recordings|E[Recording Consumer]
E-->Q[Parse Replay for Click Events, etc.]
Q-->R[Emit Outcome if segment = 0]
Q-->S[Store Replay Blob in Storage, GCS in SaaS]
S-->T[Emit Kafka Message for Snuba Consumer]
```

### Relay Changes

1. Create a new `ItemType`, `CombinedReplayRecordingEvent`.
2. In `processor.rs` (https://github.com/getsentry/relay/blob/606166fca57a84ca1b9240253013871d13827de3/relay-server/src/actors/processor.rs#L1134), replace the `process_replays` function with one that combines the envelope items (using the `take_item_by` and `add_item` Envelope functions) if they arrive in the old format, or leaves them alone if they arrive in the new format. This function would then process the two items and set the combined item payload.
3. We can begin rejecting replay segments over 10 megabytes, which removes the need for chunking in our recording consumer.
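
The size gate in step 3 could be a simple length check on the (still compressed) segment payload before any further processing. A minimal sketch, assuming a 10 MiB cutoff; the constant and function name are illustrative, not Relay's actual implementation:

```python
# Hypothetical sketch of the segment size gate from step 3.
# The exact limit and where it is enforced are assumptions.
MAX_SEGMENT_BYTES = 10 * 1024 * 1024  # 10 MiB

def accept_segment(payload: bytes) -> bool:
    """Return True when the compressed segment payload is small enough
    to process without chunking."""
    return len(payload) <= MAX_SEGMENT_BYTES
```

Rejecting oversized segments up front keeps every downstream stage working with bounded, single-message payloads.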
The combined payload will have the following format:

```
ReplayEventJSON\nCompressedReplayRecording
```

> **Review discussion:**
> - Instead of making this a newline-separated string, could we send it as JSON? Not sure about the implications for parsing, but from an SDK perspective that seems a bit nicer/safer than string-concatenating. Not a hard feeling, just a thought.
> - Actually, I guess we can't put the compressed data into JSON. And now that I think about it, I'm not sure we can concatenate the `UInt8Array` that is the compressed payload onto a string either.
> - If this merged type is generated on the client, then you will need to serialize with a binary format.
> - I see. I guess we can convert the payload into a `UInt8Array` if it isn't compressed, then combine the two into a single `UInt8Array`. One downside to note: it may make debugging harder. I currently tend to disable compression when debugging in order to see what data is sent; that would no longer be possible, as the payload would then always be a `UInt8Array`. Not a blocker, just a downside to be aware of.

The downstream consumer can easily parse this by splitting on the first newline.
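
As a sketch of that split (the event fields and the use of zlib are illustrative stand-ins; only the "JSON, newline, binary blob" layout is the point here):

```python
import json
import zlib

def parse_combined_payload(payload: bytes):
    """Split a combined item into its replay-event dict and recording blob.

    Because the compact event JSON contains no raw newline byte, the first
    newline unambiguously separates the two parts; any newline bytes inside
    the compressed blob are untouched.
    """
    event_json, _, recording = payload.partition(b"\n")
    return json.loads(event_json), recording

# Build a payload the way an SDK might (zlib stands in for whatever
# compression the SDK actually uses).
event = {"replay_id": "abc123", "segment_id": 0}
recording = zlib.compress(b'[{"type": 4}]')
combined = json.dumps(event).encode("utf-8") + b"\n" + recording

parsed_event, parsed_blob = parse_combined_payload(combined)
```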
### SDK Changes

Emit a new item type `replay_recording_event` with the format:

```
ReplayEventJSON\nCompressedReplayRecording
```

This replaces the previous transport calls.
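
On the SDK side, the combined item would be serialized as an ordinary Sentry envelope item: an item-header JSON line followed by the payload bytes. A minimal sketch, assuming the proposed `replay_recording_event` type; the function name and the exact header fields are illustrative:

```python
import json

def serialize_replay_recording_event(replay_event: dict,
                                     compressed_recording: bytes) -> bytes:
    """Serialize the proposed combined item as an envelope item:
    a header line carrying the item type and payload length, then the
    `ReplayEventJSON\\nCompressedReplayRecording` payload."""
    payload = json.dumps(replay_event).encode("utf-8") + b"\n" + compressed_recording
    item_header = json.dumps(
        {"type": "replay_recording_event", "length": len(payload)}
    ).encode("utf-8")
    return item_header + b"\n" + payload
```

Because the header carries an explicit `length`, the payload's own newline bytes never confuse envelope-level parsing.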
### Replay Recording Consumer Changes

1. Create a new ReplayRecording consumer that can run alongside the existing consumer, as there will be a change-over period.

> **Review note:** I don't think we need a new consumer. We can switch on the […].

2. This consumer will do all the same things as the previous recordings consumer, with the addition of:
   - emitting a JSON event to the snuba-replay-events Kafka topic
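
The ordering matters: the blob write must complete before the Snuba message is emitted, which is what removes the race condition described in the Motivation. A hedged sketch of that step; `storage` and `producer` are hypothetical stand-in interfaces, and the key layout is illustrative:

```python
import json

def process_combined_message(payload: bytes, storage, producer) -> None:
    """Handle one combined replay message: store the recording blob first,
    then emit the event to the Snuba topic, so a replay is never searchable
    before its blob exists."""
    event_json, _, recording = payload.partition(b"\n")
    event = json.loads(event_json)

    # Write the blob before anything else. If this raises, no Snuba row
    # is produced and the event/recording race cannot occur.
    key = f"replays/{event['replay_id']}/{event['segment_id']}"
    storage.put(key, recording)

    # Only now does the replay become eligible for the ClickHouse write.
    producer.produce("snuba-replay-events", json.dumps(event).encode("utf-8"))
```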
# Drawbacks

This is a decent chunk of engineering work.

# Unresolved questions

- Is there a better format for the combined envelope item type? If we split on `\n`, the replay_event JSON must never contain a raw newline. I believe this is acceptable, but is there a better format for sending a combined JSON / binary piece of data?

> **Review note:** Tags could contain a newline character.
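
The newline concern may be manageable: a compact JSON serialization escapes newline characters inside strings as the two-character sequence `\n`, so tag values containing newlines produce no raw newline bytes in the serialized event. A quick illustration, assuming the serializer is not configured to pretty-print:

```python
import json

# A tag value containing a literal newline character.
event = {"tags": {"note": "line one\nline two"}}

# Compact serialization (no indent) escapes the newline as the
# two characters backslash-n, so the output has no raw newline.
serialized = json.dumps(event)
assert "\n" not in serialized

# Deserializing restores the original newline character.
assert json.loads(serialized)["tags"]["note"] == "line one\nline two"
```

This only holds if every SDK serializes compactly; a pretty-printed event would break the split.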
- If rate limits are applied before processing, it seems like we'll need to add a new rate limit for this combined item. This should be okay; anything else to think about here?

> **Review discussion:** "I'm not sure I understand. If rate limits are applied before processing, then this merged type doesn't exist yet." Reply: "This would only be the case when people are on new SDKs that send the proposed combined envelope format."
> **Review note:** I think there are some very old SDKs from alpha still out there that sent two separate requests. As far as I'm aware, none of these SDKs belong to paying customers, and they may have stopped ingesting entirely. It is worthwhile documenting what happens in the event of a missing envelope item; I assume we should return a 400 or otherwise reject in some meaningful way, something that tells the client to stop sending.