Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rethinking stateful time APIs to avoid surprising behaviors #3743

Open
teh-cmc opened this issue Oct 9, 2023 · 2 comments
Open

Rethinking stateful time APIs to avoid surprising behaviors #3743

teh-cmc opened this issue Oct 9, 2023 · 2 comments
Labels
🌊 C++ API C/C++ API specific 💬 discussion 🐍 Python API Python logging API 🦀 Rust API Rust logging API

Comments

@teh-cmc
Copy link
Member

teh-cmc commented Oct 9, 2023

TL;DR: Introduce per-recordingstream clocks, as opposed to per-thread clocks.

Status quo

Logging to a recording is controlled by RecordingStreams.

RecordingStream are Send & Sync, i.e. they can be shallow-cloned (refcounted) and sent to other threads where they can then be used concurrently.
re_sdk provides facilities to help keep track of per-process and per-thread RecordingStreams.
Those facilities are mostly used by the Python bindings, though nothing prevents using them in Rust & C++ too.

Independently of recording streams, we also have per-thread clocks.
These clocks keep track of one time per-Timeline per-RecordingStreamId per-ThreadId: i.e. something along the lines of HashMap<(ThreadId, RecordingStreamId, Timeline), Time>.

These times can be set in a stateful manner using the RecordingStream::set_time_* family of methods.
When it's time to log something, a recording stream will query the clock in thread-local storage and ask for the time on all timelines for the current (ThreadId, RecordingStreamId) pair.

Problems

This implicit relationship between threads and RecordingStreams can lead to confusion and/or surprising behavior.

Example:

  1. You call rec.set_time_sequence("frame", 42) on your main thread.
  2. You send a bunch of data as well as a copy of rec to another thread where the data gets pre-processed then logged to rec.
  3. You get confused: the data doesn't have a "frame" timestamp associated with it (the thread-local clock on your post-processing thread doesn't know about it!).

You can imagine all kinds of scenarios like the above. The root cause is always the same though: thread-local clocks vs. recording streams vs. the interaction between those.

In Rust & C++, you generally always work directly with RecordingStream handles of some kind, so at least the confusion stops there.

But in Python, it can actually gets worse:

  • You generally use the global recording-stream, which is implicitly set when calling rr.init().
  • Although sometimes you may be using a thread-local recording-stream (via the with syntax).
  • And nothing prevents you to just use a recording stream handle, like you'd do in Rust/C++!

Finally, each RecordingStream keeps track of its internal, process-global tick: i.e. the log_tick timeline is always global no matter what!

Proposal

My proposal would be to get rid of thread-local clocks altogether, and instead put a clock in each and every RecordingStream.

Furthermore, clocks would not be refcounted:

#[derive(Clone)]
pub struct RecordingStream {
    clock: Clock,
    inner: Arc<Option<RecordingStreamInner>>,
}

That way, cloning a RecordingStream would still be relatively cheap (and more importantly, wouldn't clone the underlying batching and I/O pipelines) but would now behave more like a fork: the clone recording-stream just starts where its parent state left off.

@teh-cmc teh-cmc added 🐍 Python API Python logging API 💬 discussion 🦀 Rust API Rust logging API 🌊 C++ API C/C++ API specific labels Oct 9, 2023
@jleibs
Copy link
Member

jleibs commented Oct 9, 2023

This proposal makes sense to me. Something else worth considering: we eventually want to provide an API in which users can explicitly control the timepoint associated with their data instead of grabbing it from the stream context.

Pretty sure this proposal moves us in a direction that helps with that goal. (Probably a stetch, but you could even imagine implementing it as a special way of creating a RecordingStream with a static clock). Either way, just figured it would be helpful to keep in mind while implementing this.

@teh-cmc
Copy link
Member Author

teh-cmc commented Oct 12, 2023

This proposal makes sense to me. Something else worth considering: we eventually want to provide an API in which users can explicitly control the timepoint associated with their data instead of grabbing it from the stream context.

Pretty sure this proposal moves us in a direction that helps with that goal. (Probably a stetch, but you could even imagine implementing it as a special way of creating a RecordingStream with a static clock). Either way, just figured it would be helpful to keep in mind while implementing this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🌊 C++ API C/C++ API specific 💬 discussion 🐍 Python API Python logging API 🦀 Rust API Rust logging API
Projects
None yet
Development

No branches or pull requests

2 participants