Encode `LogMsg` using protobuf #8347

jprochazk · 2024-12-06T17:44:47Z

What

This PR introduces `Serializer::Protobuf` to `re_log_encoding`, and inverts the dependency graph of `re_protos`, which no longer depends on other `re_*` crates. This meant the conversion impls from protobuf types to rerun types and back had to be moved into their respective crates. For example, `From<StoreId> for re_protos::common::v0::StoreId` is now in `re_log_types`.
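
As a rough illustration of what the inverted dependency looks like, here is a minimal sketch in which two modules stand in for the two crates. The field layouts, the `StoreKind` numbering, and the `String` error type are assumptions made for the example, not the actual definitions in `re_protos` or `re_log_types`.

```rust
/// Stand-in for the generated `re_protos` crate: plain data only,
/// with no knowledge of any other `re_*` crate.
mod protos {
    #[derive(Debug, Clone, PartialEq)]
    pub struct StoreId {
        pub kind: i32, // hypothetical encoding: 0 = recording, 1 = blueprint
        pub id: String,
    }
}

/// Stand-in for `re_log_types`: owns the domain type *and* the conversions,
/// so the dependency points from here to `protos`, never the other way.
mod log_types {
    use super::protos;

    #[derive(Debug, Clone, PartialEq)]
    pub enum StoreKind {
        Recording,
        Blueprint,
    }

    #[derive(Debug, Clone, PartialEq)]
    pub struct StoreId {
        pub kind: StoreKind,
        pub id: String,
    }

    // Domain type -> protobuf type cannot fail.
    impl From<StoreId> for protos::StoreId {
        fn from(value: StoreId) -> Self {
            let kind = match value.kind {
                StoreKind::Recording => 0,
                StoreKind::Blueprint => 1,
            };
            Self { kind, id: value.id }
        }
    }

    // Protobuf type -> domain type may fail on unknown enum values.
    impl TryFrom<protos::StoreId> for StoreId {
        type Error = String;

        fn try_from(value: protos::StoreId) -> Result<Self, Self::Error> {
            let kind = match value.kind {
                0 => StoreKind::Recording,
                1 => StoreKind::Blueprint,
                other => return Err(format!("unknown store kind {other}")),
            };
            Ok(Self { kind, id: value.id })
        }
    }
}
```

With real crates, Rust's orphan rule requires each impl to live in a crate that owns one of the types involved, which is why the impls can only live in the domain crate once `re_protos` stops depending on it.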

When encoding a file using this serializer, the data is encoded using a combination of:

A custom stream-level protocol
Protocol buffers
Arrow IPC

The stream-level protocol has changed only slightly: compression is no longer applied to every message in the stream, and only the contents of ArrowMsg are ever compressed. As a result, uncompressed_len and compressed_len could be unified into a single len.

The actual layout of the messages has not changed: LogMsg is preserved, and so are its semantics.

The stream of data stored in an example RRD file using this new encoding looks like:

FileHeader { b"RRIO", version, compression, serializer }     ;; 10 bytes
MessageHeader { kind, len }                                  ;; 8 bytes
SetStoreInfo { application_id, store_id, store_source, ... } ;; len bytes
MessageHeader { kind, len }                                  ;; 8 bytes
ArrowMsg { store_id, arrow_msg }                             ;; len bytes
MessageHeader { kind: End, len: 0 }                          ;; 8 bytes
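
The header-then-payload framing above can be sketched roughly as follows. This is not the `re_log_encoding` implementation: the numeric values of `MessageKind`, the little-endian byte order, and the split of the 8-byte header into two `u32` fields are all assumptions made for illustration (a later commit in this PR is titled "64-bit length", so the merged header may well differ). `write_frame` and `read_frame` are hypothetical helper names.

```rust
use std::io::{self, Read, Write};

/// Hypothetical message kinds; the numeric values are assumptions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum MessageKind {
    End = 0,
    SetStoreInfo = 1,
    ArrowMsg = 2,
}

impl MessageKind {
    fn from_u32(value: u32) -> io::Result<Self> {
        match value {
            0 => Ok(Self::End),
            1 => Ok(Self::SetStoreInfo),
            2 => Ok(Self::ArrowMsg),
            other => Err(io::Error::new(
                io::ErrorKind::InvalidData,
                format!("unknown message kind {other}"),
            )),
        }
    }
}

/// Writes one frame: an 8-byte header (kind, len) followed by `len` payload bytes.
pub fn write_frame(out: &mut impl Write, kind: MessageKind, payload: &[u8]) -> io::Result<()> {
    let len = u32::try_from(payload.len())
        .map_err(|_| io::Error::new(io::ErrorKind::InvalidInput, "payload too large"))?;
    out.write_all(&(kind as u32).to_le_bytes())?;
    out.write_all(&len.to_le_bytes())?;
    out.write_all(payload)
}

/// Reads one frame; the caller stops when it sees `MessageKind::End`.
pub fn read_frame(input: &mut impl Read) -> io::Result<(MessageKind, Vec<u8>)> {
    let mut header = [0u8; 8];
    input.read_exact(&mut header)?;
    let kind = u32::from_le_bytes([header[0], header[1], header[2], header[3]]);
    let len = u32::from_le_bytes([header[4], header[5], header[6], header[7]]) as usize;
    let kind = MessageKind::from_u32(kind)?;
    let mut payload = vec![0u8; len];
    input.read_exact(&mut payload)?;
    Ok((kind, payload))
}
```

In this sketch the payload bytes are opaque; in the PR they would be the protobuf-encoded message, with the Arrow IPC data nested inside ArrowMsg.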

Note that this stream-level protocol is only used for .rrd files. On the wire, we will use gRPC, which has its own protocol.

In the case of ArrowMsg, the schema+chunk is encoded using Arrow IPC into a byte payload, which may additionally be compressed. The compression setting is stored separately for every ArrowMsg, but per-message compression functionality is not yet exposed through re_log_encoding.

zehiko

thanks for untangling these dependency issues!

generally looks ok to me, a few comments and a question about clearer re-use of the common arrow serialization logic to make the re_log_encoding crate clearer.

crates/build/re_protos_builder/src/bin/build_re_remote_store_types.rs

crates/store/re_chunk_store/src/lib.rs

crates/store/re_log_encoding/src/codec/mod.rs

crates/store/re_log_encoding/src/codec/wire.rs

crates/store/re_log_encoding/src/protobuf/decoder.rs

crates/store/re_log_encoding/src/codec/mod.rs

crates/store/re_protos/proto/rerun/v0/log_msg.proto

crates/store/re_chunk_store/src/protobuf_conversions.rs

crates/store/re_log_encoding/src/decoder/mod.rs

jprochazk · 2024-12-11T14:05:46Z

> could add a unit/integration test that uses new codec to write and read rrd stream using Protobuf? something similar-ish to test_loading_with_retryable_reader

This is now tested as part of the tests in re_log_encoding/src/decoder/mod.rs. I updated the fake log messages there to actually encode a more realistic scenario (blueprint that sets the background color, though I'm sure it's wrong as I haven't actually tested if it does that, it's just meant to be some data to encode/decode), to exercise encoding for all 3 LogMsg variants.

> suggested adding a few basic unit tests in a few

Done for re_log_types and re_tuid. It doesn't exercise every possible path but all the major ones are hit in some way (e.g. the StoreSourceExtra).

emilk

I only had time for a partial review today - will take a closer look tomorrow!

crates/build/re_protos_builder/src/bin/build_re_remote_store_types.rs

crates/store/re_log_encoding/benches/msg_encode_benchmark.rs

crates/store/re_log_encoding/src/codec/arrow.rs

crates/store/re_log_encoding/src/codec/file/decoder.rs

jprochazk · 2024-12-11T21:26:53Z

I updated the benchmark to include protobuf variants of all the encode/decode parts. There are way too many benchmarks now, so I tried using ChatGPT to make sense of the results:

Note that it messed up units, and produced total gibberish when asked to fix it. They aren't all in milliseconds.

Results seem fairly mixed, some wins and some losses. The mono-points non-batched vs batched decode is interesting, as msgpack seems to be better at non-batched data? Encode seems to be a win across the board for protobuf, but only by a little.

emilk

👍

crates/store/re_protos/src/v0/rerun.log_msg.v0.rs

crates/store/re_log_types/src/lib.rs

emilk · 2024-12-12T09:36:39Z

crates/store/re_log_encoding/src/decoder/mod.rs

        ];

        for options in options {
+            println!("{options:?}");


😬

Tip: add a // TODO suffix on code you plan to remove (the linter will stop you from merging it).

Or use dbg!(options); (again, this will not pass CI)

You can also just use re_log::trace and run with RUST_LOG=re_log_encoding=trace

I didn't plan to remove this, what's wrong with keeping a println in a test? It doesn't get printed unless the test fails, in which case it's useful information about which part of the test failed

crates/store/re_protos/src/v0/rerun.common.v0.rs

emilk · 2024-12-12T09:39:19Z

crates/store/re_log_types/src/protobuf_conversions.rs

+        #[allow(clippy::match_same_arms)]
+        match value.name.as_str() {
+            "log_time" => Self::new_temporal(value.name),
+            "log_tick" => Self::new_sequence(value.name),
+            "frame" => Self::new_sequence(value.name),
+            "frame_nr" => Self::new_sequence(value.name),
+            _ => Self::new_temporal(value.name),


Wait, what the hack is this @teh-cmc ??

jprochazk added 3 commits December 5, 2024 12:42

fix typo

bdb22ed

temp

8b51ba8

wip

9b4a9a3

jprochazk added the labels `include in changelog`, `🪵 Log & send APIs` (Affects the user-facing API for all languages), and `dataplatform` (Rerun Data Platform integration) Dec 6, 2024

jprochazk added 2 commits December 6, 2024 18:48

Merge branch 'main' into jan/recording-protobuf

317e9c0

fix after merge

8209db4

jprochazk added 3 commits December 6, 2024 18:57

remove unused dep

34dcd4c

exclude re_grpc_client/address links

ba4cfbb

cargo fmt

691fb46

jprochazk mentioned this pull request Dec 6, 2024

Use gRPC everywhere (over the wire) #8349

Open

jprochazk added 3 commits December 6, 2024 19:36

fix lints

a2e58fb

rm dead comment

f22cb03

add todo

b76cf9c

jprochazk marked this pull request as ready for review December 6, 2024 18:40

jprochazk added 3 commits December 6, 2024 19:51

gate behind feature

f91737b

fix check

7a720f8

fix more lints

a2d7ea4

zehiko reviewed Dec 9, 2024

jprochazk marked this pull request as draft December 9, 2024 10:31

jprochazk added 7 commits December 10, 2024 11:41

Merge branch 'main' into jan/recording-protobuf

4c18bda

update lockfile

0d3ec40

Merge branch 'main' into jan/recording-protobuf

3d405ff

rename OUTPUT_V0_RUST

5d9051b

remove comments

8497713

rm dead code

187bcb2

docs

8b8be71

jprochazk added 5 commits December 11, 2024 13:02

add todo for arrow ipc compression

77f4142

rename EncodingOptions constants

d79f317

more thorough testing

0b56c95

add conversion unit tests

b1d8ffa

fix compile error

3c9a682

jprochazk and others added 2 commits December 11, 2024 15:07

remove temp

21258a0

Merge branch 'main' into jan/recording-protobuf

04efcf5

emilk reviewed Dec 11, 2024

jprochazk added 10 commits December 11, 2024 20:45

use types instead of strings

d1dc3f8

move MessageKind/MessageHeader impls back into declaration site

90360cf

64-bit length

c880b6a

fix

5ce1531

less hardcoding, more type-safety

abbd4c9

type aliases

764d3d6

remove dead comment

922ef0e

remove spaces

a51ffbd

add full bench

d3183f0

fix warning

fed700f

jprochazk added 2 commits December 11, 2024 22:37

remove unused dep

9683405

fix

ec1c6b6

emilk approved these changes Dec 12, 2024

jprochazk and others added 5 commits December 12, 2024 12:01

add test for python version parsing

297ab19

mark src/v0 as generated

1d7aecb

fix pattern

df46547

Merge branch 'main' into jan/recording-protobuf

0605295

fix lint

cd6ab2e

jprochazk merged commit 0786fda into main Dec 12, 2024
31 checks passed

jprochazk deleted the jan/recording-protobuf branch December 12, 2024 12:01
