Proper viewer support for dataframes #8443

Open
jleibs opened this issue Dec 12, 2024 · 0 comments
Labels
feat-dataframe-view Everything related to the dataframe view 📺 re_viewer affects re_viewer itself

Comments

jleibs commented Dec 12, 2024

Context

We now have two main types of data:

Recording-like

  • Recording-like data is our historical Rerun "logged" concept.
  • Data in a single recording is divided into "Rerun Chunks".
  • Unlike an Arrow-IPC Stream, each Chunk in a recording stream is allowed to have a different schema.
  • Rerun chunks depend on the existence of certain required columns related to rows and indexes:
    • -> Every Rerun Chunk is a valid RecordBatch, but not every RecordBatch is a valid Rerun Chunk
  • A Rerun ChunkStore indexes these chunks and allows for flexible querying operations that return chunks.

Table-like

  • Table-like data has started showing up in our new APIs where we want to map things to a single dataframe.
    • Query Results
    • Catalog
  • This exactly matches the traditional Arrow table concept: a single schema shared by every batch
  • Any Table-like data exposed as a user-facing python API should map to a pa.RecordBatchReader

Improved Viewer Support

In principle, any dataframe can be converted to an equivalent set of Rerun chunks by:

  • Injecting a row-id based timeline if one doesn't exist already
  • Splitting apart columns that belong to separate entities
  • Wrapping any non-list types as arrow list arrays.

However, the question is where to apply this transformation. Doing it on the viewer-ingest side (rather than the send side) would simplify both the logging code and the data-platform implementation.

Proposal

For incoming client streams (e.g. TCP, notebook, etc.)

  • Some header in the stream (maybe part of StoreInfo) should determine whether a stream is a "RerunChunk" stream or a "Dataframe" stream.

For gRPC responses, we have the ability to type these more directly.

In the short term, if the stream is a dataframe stream, each DataframePart read from the stream should be converted to 1 or more Rerun chunks, which are injected into a Store.

Longer-term we might introduce an alternative to the ChunkStore for working with these Dataframe stores more directly.

@jleibs jleibs added feat-dataframe-view Everything related to the dataframe view 📺 re_viewer affects re_viewer itself labels Dec 12, 2024