Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Encode LogMsg using protobuf #8347

Merged
merged 53 commits into from
Dec 12, 2024
Merged

Encode LogMsg using protobuf #8347

merged 53 commits into from
Dec 12, 2024

Conversation

jprochazk
Copy link
Member

@jprochazk jprochazk commented Dec 6, 2024

Related

What

This PR introduces Serializer::Protobuf to re_log_encoding, and inverts the dependency graph of re_protos, which no longer depends on other re_* crates. This meant the conversion impls from protobuf types to rerun types and back had to be moved into their respective crates. For example, From<StoreId> for re_protos::common::v0::StoreIdis now inre_log_types`.

When encoding a file using this serializer, the data is encoded using a combination of:

  • A custom stream-level protocol
  • Protocol buffers
  • Arrow IPC

The stream-level protocol has changed only a bit, because compression is no longer done for all messages in the stream, and only the contents of ArrowMsg are ever compressed at all. This means the uncompressed_len and compressed_len could be unified to just len.

The actual layout of the messages has not changed, LogMsg is preserved and so are its semantics.

The stream of data stored in an example RRD file using this new encoding looks like:

FileHeader { b"RRIO", version, compression, serializer }     ;; 10 bytes
MessageHeader { kind, len }                                  ;; 8 bytes
SetStoreInfo { application_id, store_id, store_source, ... } ;; len bytes
MessageHeader { kind, len }                                  ;; 8 bytes
ArrowMsg { store_id, arrow_msg }                             ;; len bytes
MessageHeader { kind: End, len: 0 }                          ;; 8 bytes

Note that this stream-level protocol is only used for .rrd files. On the wire, we will use gRPC, which has its own protocol.

In the case of ArrowMsg, the schema+chunk is encoded using Arrow IPC into a byte payload, which may additionally be compressed. The compression setting is stored separately for every ArrowMsg, but per-message compression functionality is not yet exposed through re_log_encoding.

@jprochazk jprochazk added include in changelog 🪵 Log & send APIs Affects the user-facing API for all languages dataplatform Rerun Data Platform integration labels Dec 6, 2024
Copy link

github-actions bot commented Dec 6, 2024

Latest documentation preview deployed successfully.

Result Commit Link
0d3ec40 https://landing-hx1w29tq8-rerun.vercel.app/docs

Note: This comment is updated whenever you push a commit.

Copy link

github-actions bot commented Dec 6, 2024

Web viewer built successfully. If applicable, you should also test it:

  • I have tested the web viewer
Result Commit Link Manifest
cd6ab2e https://rerun.io/viewer/pr/8347 +nightly +main

Note: This comment is updated whenever you push a commit.

@jprochazk jprochazk marked this pull request as ready for review December 6, 2024 18:40
Copy link
Contributor

@zehiko zehiko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for untangling these dependency issues!

generally looks ok to me, few comments and a question about more clear re-use of common arrow seriailzation logic to make re_log_encoding crate clearer.

crates/store/re_chunk_store/src/lib.rs Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/mod.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/wire.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/protobuf/decoder.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/mod.rs Outdated Show resolved Hide resolved
crates/store/re_chunk_store/src/protobuf_conversions.rs Outdated Show resolved Hide resolved
@jprochazk jprochazk marked this pull request as draft December 9, 2024 10:31
@jprochazk
Copy link
Member Author

jprochazk commented Dec 11, 2024

could add a unit/integration test that uses new codec to write and read rrd stream using Protobuf? something similar-ish to test_loading_with_retryable_reader

This is now tested as part of the tests in re_log_encoding/src/decoder/mod.rs. I updated the fake log messages there to actually encode a more realistic scenario (blueprint that sets the background color, though I'm sure it's wrong as I haven't actually tested if it does that, it's just meant to be some data to encode/decode), to exercise encoding for all 3 LogMsg variants.

suggested adding a few basic unit tests in a few

Done for re_log_types and re_tuid. It doesn't exercise every possible path but all the major ones are hit in some way (e.g. the StoreSourceExtra).

Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I only had time for a partial review today - will take a closer look tomorrow!

crates/store/re_log_encoding/src/codec/arrow.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/arrow.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/arrow.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/file/decoder.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/file/decoder.rs Outdated Show resolved Hide resolved
crates/store/re_log_encoding/src/codec/file/decoder.rs Outdated Show resolved Hide resolved
@jprochazk
Copy link
Member Author

jprochazk commented Dec 11, 2024

I updated the benchmark to include protobuf variants of all the encode/decode parts. There are way too many benchmarks now, so I tried using ChatGPT to make sense of the results:

image

Note that it messed up units, and produced total gibberish when asked to fix it. They aren't all in milliseconds.

Results seem fairly mixed, some wins and some losses. The mono-points non-batched vs batched decode is interesting, as msgpack seems to be better at non-batched data? Encode seems to be a win across the board for protobuf, but only by a little.

Copy link
Member

@emilk emilk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

crates/store/re_log_types/src/lib.rs Show resolved Hide resolved
];

for options in options {
println!("{options:?}");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😬

Tip: add a // TODO suffix on code you plan to remove (the linter will stop you from merging it).

Or use dbg!(options); (again, this will not pass CI)

You can also just use re_log::trace and run with RUST_LOG=re_log_encoding=trace

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't plan to remove this, what's wrong with keeping a println in a test? It doesn't get printed unless the test fails, in which case it's useful information about which part of the test failed

Comment on lines +93 to +99
#[allow(clippy::match_same_arms)]
match value.name.as_str() {
"log_time" => Self::new_temporal(value.name),
"log_tick" => Self::new_sequence(value.name),
"frame" => Self::new_sequence(value.name),
"frame_nr" => Self::new_sequence(value.name),
_ => Self::new_temporal(value.name),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, what the hack is this @teh-cmc ??

@jprochazk jprochazk merged commit 0786fda into main Dec 12, 2024
31 checks passed
@jprochazk jprochazk deleted the jan/recording-protobuf branch December 12, 2024 12:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
dataplatform Rerun Data Platform integration include in changelog 🪵 Log & send APIs Affects the user-facing API for all languages
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Replace our serde/MsgPack encoding of enum LogMsg with protobuf
3 participants