Replace Session with RecordingContext #1983

Merged: 29 commits, May 4, 2023

Member

@teh-cmc teh-cmc commented Apr 26, 2023

Session has been replaced by RecordingStream, and SessionBuilder with RecordingStreamBuilder.


This introduces RecordingStream, which we've all talked about in leeeeeeength this afternoon so I won't dwell on it too much.

A RecordingStream is in charge of the entire data pipeline for a single recording, and comes with well-defined ordering, durability and multithreading guarantees/semantics.

RecordingStream replaces the previous Session stuff in the Rust SDK, and will become the basis of the Python SDK too in the next PR.

Try it!

api_demo one row at a time:

RERUN_FLUSH_TICK_SECS=9999999999 RERUN_FLUSH_NUM_BYTES=9999999999 RERUN_FLUSH_NUM_ROWS=0 cargo r -p api_demo

api_demo in one big single packet:

RERUN_FLUSH_TICK_SECS=9999999999 RERUN_FLUSH_NUM_BYTES=9999999999 RERUN_FLUSH_NUM_ROWS=9999999999 cargo r -p api_demo
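
The two invocations above differ only in RERUN_FLUSH_NUM_ROWS. As a rough illustration of why that flips the behavior from "one row per packet" to "one big packet", here is a minimal sketch of threshold-based flushing. `FlushThresholds`, its default values, the whole-second parsing and the "flush when any threshold is reached" rule are all assumptions made for this example; this is not the actual batcher code from #1980.

```rust
use std::time::Duration;

/// Hypothetical stand-in for the batcher's flush thresholds (not the real SDK type).
#[derive(Debug, Clone, Copy, PartialEq)]
struct FlushThresholds {
    tick: Duration,
    num_bytes: u64,
    num_rows: u64,
}

impl FlushThresholds {
    /// Builds the thresholds from a key/value lookup, so the logic can be
    /// exercised without touching the process environment.
    /// The fallback values are arbitrary placeholders for this sketch.
    fn from_lookup(lookup: impl Fn(&str) -> Option<String>) -> Self {
        let parse = |key: &str, default: u64| -> u64 {
            lookup(key)
                .and_then(|value| value.parse::<u64>().ok())
                .unwrap_or(default)
        };
        Self {
            tick: Duration::from_secs(parse("RERUN_FLUSH_TICK_SECS", 1)),
            num_bytes: parse("RERUN_FLUSH_NUM_BYTES", 1024 * 1024),
            num_rows: parse("RERUN_FLUSH_NUM_ROWS", u64::MAX),
        }
    }

    /// Same as `from_lookup`, reading the actual environment variables.
    fn from_env() -> Self {
        Self::from_lookup(|key| std::env::var(key).ok())
    }

    /// Assumed rule: flush as soon as any single threshold is reached.
    /// Meant to be called right after a row has been appended.
    fn should_flush(&self, pending_rows: u64, pending_bytes: u64, since_last_flush: Duration) -> bool {
        pending_rows >= self.num_rows
            || pending_bytes >= self.num_bytes
            || since_last_flush >= self.tick
    }
}
```

Under these assumptions, a row threshold of 0 is reached by every appended row, while three huge thresholds are never reached and everything is sent in a single final flush.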

TODO:


  1. SDK batching/revamp 1: impl DataTableBatcher #1980
  2. Replace Session with RecordingContext #1983
  3. SDK batching/revamp 2.1: clock example for Rust #2000
  4. SDK batching/revamp 2.2: homegrown arrow size estimation routines #2002
  5. SDK batching/revamp 3: sunset PythonSession #1985

On top of #1980
Part of #1619

Future work:

@teh-cmc teh-cmc changed the title SDK batching/revamp 1: introduce RecordingContext & revamp Rust SDK SDK batching/revamp 2: introduce RecordingContext & revamp Rust SDK Apr 26, 2023
@teh-cmc teh-cmc added 🦀 Rust API Rust logging API do-not-merge Do not merge this PR labels Apr 26, 2023
@teh-cmc teh-cmc changed the base branch from main to cmc/sdk_revamp/1_batcher April 26, 2023 22:11
@teh-cmc teh-cmc force-pushed the cmc/sdk_revamp/2_rust_revamp branch 3 times, most recently from fb1ea44 to 36dc901 Compare April 26, 2023 22:17
@teh-cmc teh-cmc changed the title SDK batching/revamp 2: introduce RecordingContext & revamp Rust SDK SDK batching/revamp 2: introduce RecordingContext / sunset Rust's Session Apr 26, 2023
@teh-cmc teh-cmc force-pushed the cmc/sdk_revamp/2_rust_revamp branch 3 times, most recently from ac0ce2e to a3ed552 Compare April 27, 2023 09:07
@teh-cmc teh-cmc force-pushed the cmc/sdk_revamp/1_batcher branch from eba29b7 to 67dc616 Compare April 27, 2023 09:11
@teh-cmc teh-cmc force-pushed the cmc/sdk_revamp/2_rust_revamp branch from a3ed552 to ac1fac2 Compare April 27, 2023 09:12
@teh-cmc teh-cmc changed the title SDK batching/revamp 2: introduce RecordingContext / sunset Rust's Session SDK batching/revamp 2: introduce RecordingContext / sunset Session Apr 27, 2023
@teh-cmc teh-cmc force-pushed the cmc/sdk_revamp/2_rust_revamp branch from ac1fac2 to 14130c5 Compare April 27, 2023 16:30
@teh-cmc teh-cmc removed the do-not-merge Do not merge this PR label Apr 28, 2023
@teh-cmc teh-cmc marked this pull request as ready for review April 28, 2023 08:13
@@ -61,7 +61,7 @@ impl EntityDb {
let mut table = DataTable::from_arrow_msg(msg)?;
table.compute_all_size_bytes();

// TODO(#1619): batch all of this
// TODO(cmc): batch all of this
Member

Oooh.... even more performance gains to be had?

Member Author

Yeah, we only batch during transport at the moment, everything else happens on a row-by-row basis.

Though I'm not sure it's ever going to be worth it in this instance: implementing the write path in terms of individual rows is so flexible, simple and easy to extend... and things are already so fast even with just the transport-level batching...

Comment on lines +21 to +22
/// Only applies to sinks that maintain a backlog.
#[inline]
Member

The fact that all sinks implement this but don't support it strikes me as awkward. Is it possible to be explicit about tracking when our Sink is a BufferedSink and only implement drain_backlog there?

At a minimum, maybe the signature should be Option<Vec<LogMsg>> to disambiguate between "The sink doesn't support a backlog" and "The backlog was empty."

Member Author

@teh-cmc teh-cmc May 3, 2023

I mean sure we could return an option but I don't see how that'd be an improvement? We can always add it later on if we ever need to disambiguate between no backlog vs. empty backlog, but there's no such need at the moment.

Member

> if we ever need to disambiguate between no backlog vs. empty backlog

Yeah, we can talk about this more concretely in a future context. I'm thinking about cases like: "I tried to use a Tcp sink, but the tcp end-point wasn't there, so now instead I try to switch my sink to a FileSink, expecting it to drain the backlog of buffered tcp messages, but even though the tcp sink told me there was no backlog, I actually just lost my data"
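
For reference, the Option-returning signature discussed in this thread could look roughly like the sketch below. Everything here is a simplified stand-in written for illustration: `LogMsg` is just a `String`, and the `Sink` trait, `BufferedSink`, `FireAndForgetSink` and `switch_sink` are hypothetical, not the SDK's actual definitions.

```rust
/// Placeholder message type for this sketch (the real LogMsg is richer).
type LogMsg = String;

/// Hypothetical sink trait with an Option-returning `drain_backlog`.
trait Sink {
    fn send(&mut self, msg: LogMsg);

    /// `None`: this sink keeps no backlog at all.
    /// `Some(vec![])`: this sink keeps a backlog, and it is currently empty.
    fn drain_backlog(&mut self) -> Option<Vec<LogMsg>> {
        None
    }
}

/// Keeps every message in memory until it is drained.
#[derive(Default)]
struct BufferedSink {
    backlog: Vec<LogMsg>,
}

impl Sink for BufferedSink {
    fn send(&mut self, msg: LogMsg) {
        self.backlog.push(msg);
    }

    fn drain_backlog(&mut self) -> Option<Vec<LogMsg>> {
        Some(std::mem::take(&mut self.backlog))
    }
}

/// Forwards messages somewhere else and remembers nothing.
#[derive(Default)]
struct FireAndForgetSink {
    sent: usize,
}

impl Sink for FireAndForgetSink {
    fn send(&mut self, _msg: LogMsg) {
        self.sent += 1;
    }
}

/// Replays the old sink's backlog into the new one.
/// Returns the number of replayed messages, or `None` when the old sink
/// cannot say what (if anything) was left behind.
fn switch_sink(old: &mut dyn Sink, new: &mut dyn Sink) -> Option<usize> {
    let backlog = old.drain_backlog()?;
    let count = backlog.len();
    for msg in backlog {
        new.send(msg);
    }
    Some(count)
}
```

With this shape, the caller switching sinks can tell "nothing was pending" (`Some(0)`) apart from "this sink never tracked pending messages" (`None`), which is the data-loss scenario described above.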

///
/// This can be used to then construct a [`RecordingStream`] manually using
/// [`RecordingStream::new`].
pub fn finalize(self) -> (bool, RecordingInfo, DataTableBatcherConfig) {
Member

Logically, returning enabled from the RecordingStreamBuilder feels a bit out of place to me.

Member

Actually the fact that this doesn't return a RecordingStream at all is even more unexpected... maybe this struct should be named something like RecordingStreamConstructionArgs (but shorter)?

Member Author

I don't know, I'm not sure adding yet another type will help... The doc already makes it pretty clear what's going on here.

Also this code is just a copy-paste of the old SessionBuilder::finalize code which never really gave us any problems 🤷
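
To make the naming suggestion from this thread concrete, here is a small sketch of a `finalize` that returns a named struct instead of a tuple. All types (`StreamArgs`, `StreamBuilderSketch`, `RecordingInfoSketch`, `BatcherConfigSketch`) are hypothetical, heavily simplified stand-ins; the PR keeps the tuple-returning version.

```rust
/// Hypothetical, simplified stand-in for the recording metadata.
#[derive(Debug, Clone, PartialEq)]
struct RecordingInfoSketch {
    application_id: String,
}

/// Hypothetical, simplified stand-in for the batcher configuration.
#[derive(Debug, Clone, PartialEq, Default)]
struct BatcherConfigSketch {
    flush_num_rows: u64,
}

/// Named replacement for the `(bool, RecordingInfo, DataTableBatcherConfig)` tuple.
#[derive(Debug, Clone, PartialEq)]
struct StreamArgs {
    enabled: bool,
    info: RecordingInfoSketch,
    batcher_config: BatcherConfigSketch,
}

/// Hypothetical builder mirroring the shape of the one under review.
struct StreamBuilderSketch {
    application_id: String,
    enabled: bool,
    batcher_config: BatcherConfigSketch,
}

impl StreamBuilderSketch {
    fn new(application_id: impl Into<String>) -> Self {
        Self {
            application_id: application_id.into(),
            enabled: true,
            batcher_config: BatcherConfigSketch::default(),
        }
    }

    fn enabled(mut self, enabled: bool) -> Self {
        self.enabled = enabled;
        self
    }

    /// Returns everything needed to construct a stream manually,
    /// with each piece accessible by name rather than by tuple position.
    fn finalize(self) -> StreamArgs {
        StreamArgs {
            enabled: self.enabled,
            info: RecordingInfoSketch {
                application_id: self.application_id,
            },
            batcher_config: self.batcher_config,
        }
    }
}
```

The trade-off is the one raised above: one more public type, in exchange for call sites that read `args.enabled` instead of destructuring a tuple.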

pub enum RecordingStreamError {
/// Error within the underlying file sink.
#[error("Failed to create the underlying file sink: {0}")]
FileSink(#[from] re_log_encoding::FileSinkError),
Member

No need to fix this now, but I'm curious how we would make this Sink-agnostic for extensibility reasons in the future? Maybe we don't need extensibility anyway, so I'm over-thinking it.

Member Author

Something like

    /// Error within the underlying file sink.
    #[error("Failed to create the underlying file sink: {0}")]
    Sink(#[from] Box<dyn std::error::Error + Send + Sync>),

would do the job, wouldn't it?

Member

Yeah, something along those lines.
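
A std-only sketch of the boxed-error idea from this thread, written without the derive macro so that it stands alone. `StreamError`, `BoxedSinkError` and the `sink` helper are hypothetical names for this example and not what the PR ships.

```rust
use std::error::Error;
use std::fmt;

/// Any sink's error, type-erased so the enum does not depend on a concrete sink.
type BoxedSinkError = Box<dyn Error + Send + Sync>;

/// Hypothetical sink-agnostic counterpart of the error enum under review.
#[derive(Debug)]
enum StreamError {
    /// Error within the underlying sink, whatever its concrete type.
    Sink(BoxedSinkError),
}

impl StreamError {
    /// Wraps anything convertible into a boxed error (io::Error, &str, ...).
    fn sink(err: impl Into<BoxedSinkError>) -> Self {
        Self::Sink(err.into())
    }
}

impl From<BoxedSinkError> for StreamError {
    fn from(err: BoxedSinkError) -> Self {
        Self::Sink(err)
    }
}

impl fmt::Display for StreamError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            Self::Sink(err) => write!(f, "Failed to create the underlying sink: {err}"),
        }
    }
}

impl Error for StreamError {
    /// Exposes the wrapped error so callers can still inspect or downcast it.
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        match self {
            Self::Sink(err) => Some(&**err),
        }
    }
}
```

The cost of erasing the type is that callers who care about a specific sink's failure have to downcast the source instead of matching on a dedicated variant.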

Base automatically changed from cmc/sdk_revamp/1_batcher to main May 3, 2023 20:11
@teh-cmc
Member Author

teh-cmc commented May 3, 2023

Updated with some (hopefully) clarifications

Member

@jleibs jleibs left a comment

Looks good!

Co-authored-by: Jeremy Leibs <jeremy@rerun.io>
@teh-cmc teh-cmc merged commit b2b3064 into main May 4, 2023
@teh-cmc teh-cmc deleted the cmc/sdk_revamp/2_rust_revamp branch May 4, 2023 09:16
@emilk emilk changed the title SDK batching/revamp 2: introduce RecordingContext / sunset Session Replace Session with RecordingContext May 25, 2023
Labels
🦀 Rust API Rust logging API
2 participants