Context
We now have two main types of data: Recording-like and Table-like.
Recording-like
Recording-like data is our historical Rerun "logged" concept.
Data in a single recording is divided into "Rerun Chunks".
Unlike the record batches in an Arrow IPC stream, each Chunk in a recording stream is allowed to have a different schema.
Rerun Chunks additionally require certain columns to exist, related to rows and indexes:
-> Every Rerun Chunk is a valid RecordBatch, but not every RecordBatch is a valid Rerun Chunk.
A Rerun ChunkStore indexes these chunks and supports flexible query operations that return Chunks.
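To make the required structure concrete, here is a minimal sketch of a Rerun Chunk modeled as an Arrow RecordBatch in Python. The column names (rerun.row_id, log_time) and the Position3D layout are illustrative assumptions, not Rerun's actual naming scheme:

```python
import pyarrow as pa

# Required control/index columns (names here are illustrative):
row_id = pa.array([0, 1, 2], type=pa.uint64())         # one unique id per row
log_time = pa.array([100, 200, 300], type=pa.int64())  # an index/timeline

# Component columns are list arrays: each row holds a batch of component
# instances for one entity.
positions = pa.array(
    [[{"x": 0.0, "y": 0.0, "z": 0.0}],
     [{"x": 1.0, "y": 0.0, "z": 0.0}],
     [{"x": 2.0, "y": 0.0, "z": 0.0}]],
    type=pa.list_(pa.struct([("x", pa.float32()),
                             ("y", pa.float32()),
                             ("z", pa.float32())])),
)

chunk = pa.RecordBatch.from_arrays(
    [row_id, log_time, positions],
    names=["rerun.row_id", "log_time", "Position3D"],
)

# Every Rerun Chunk is a valid RecordBatch; the converse does not hold.
assert isinstance(chunk, pa.RecordBatch)
```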
Table-like
Table-like data has started showing up in our new APIs where we want to map things to a single dataframe:
Query Results
Catalog
This exactly matches the traditional Arrow concept.
Any Table-like data exposed as a user-facing Python API should map to a pa.RecordBatchReader.
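For example, a minimal sketch of what that contract looks like, using a hypothetical query() function on the Python side:

```python
import pyarrow as pa

def query() -> pa.RecordBatchReader:
    # Table-like data: every batch shares a single, fixed schema.
    schema = pa.schema([("entity_path", pa.string()),
                        ("num_rows", pa.int64())])
    batches = [
        pa.RecordBatch.from_arrays(
            [pa.array(["/points"]), pa.array([128])], schema=schema),
        pa.RecordBatch.from_arrays(
            [pa.array(["/camera"]), pa.array([42])], schema=schema),
    ]
    return pa.RecordBatchReader.from_batches(schema, batches)

table = query().read_all()  # or iterate batch-by-batch for streaming
```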
Improved Viewer Support
In principle, any Dataframe can be converted to an equivalent set of Rerun chunks (see the sketch after this list) by:
Injecting a row-id based timeline if one doesn't exist already
Splitting apart columns that belong to separate entities
Wrapping any non-list types as Arrow list arrays.
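Below is a hedged sketch of that transformation using pyarrow. The "row_id" timeline name and the "entity_path/component" column-naming convention are assumptions made for illustration, not Rerun's actual conventions:

```python
import pyarrow as pa

def _as_array(col: pa.ChunkedArray) -> pa.Array:
    # Flatten a (non-empty) ChunkedArray into one contiguous Array.
    return pa.concat_arrays(col.chunks)

def dataframe_to_chunks(df: pa.Table) -> list[pa.RecordBatch]:
    n = df.num_rows

    # 1. Inject a row-id based timeline if one doesn't exist already.
    if "row_id" not in df.column_names:
        df = df.append_column("row_id", pa.array(range(n), type=pa.int64()))

    # 2. Split apart columns that belong to separate entities, assuming
    #    columns are named "entity_path/component".
    by_entity: dict[str, list[str]] = {}
    for name in df.column_names:
        if name != "row_id":
            entity = name.rpartition("/")[0] or "/"
            by_entity.setdefault(entity, []).append(name)

    chunks = []
    for entity, cols in by_entity.items():
        arrays, names = [_as_array(df["row_id"])], ["row_id"]
        for name in cols:
            arr = _as_array(df[name])
            # 3. Wrap any non-list types as Arrow list arrays:
            #    one single-element list per row.
            if not pa.types.is_list(arr.type):
                offsets = pa.array(range(n + 1), type=pa.int32())
                arr = pa.ListArray.from_arrays(offsets, arr)
            arrays.append(arr)
            names.append(name.rpartition("/")[2])
        batch = pa.RecordBatch.from_arrays(arrays, names=names)
        # The entity path itself would travel as chunk-level metadata.
        chunks.append(batch.replace_schema_metadata({"entity_path": entity}))
    return chunks
```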
However, the question is where we should apply this transformation. Doing it on the viewer-ingest side (rather than the send side) would simplify both the logging code and the data-platform implementation.
Proposal
For incoming client streams (e.g. TCP, notebook):
Some header in the stream (maybe part of StoreInfo) should determine whether a stream is a "RerunChunk" stream or a "Dataframe" stream.
For gRPC responses we have the ability to type these more directly.
In the short term, if the stream is a dataframe stream, each DataframePart read from the stream should be converted to one or more Rerun chunks, which are injected into a Store.
Longer-term, we might introduce an alternative to the ChunkStore for working with these Dataframe stores more directly.
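For illustration, a sketch of what that short-term ingest-side dispatch could look like; StreamKind, insert_chunk, and the reuse of the dataframe_to_chunks sketch above are all hypothetical names:

```python
import pyarrow as pa
from enum import Enum, auto

class StreamKind(Enum):
    RERUN_CHUNK = auto()  # payload batches are already valid Rerun Chunks
    DATAFRAME = auto()    # payload batches are plain dataframe parts

def ingest(store, kind: StreamKind, part: pa.RecordBatch) -> None:
    """Route one incoming record batch based on the stream's declared kind."""
    if kind is StreamKind.RERUN_CHUNK:
        # Already a valid Rerun Chunk: store it directly.
        store.insert_chunk(part)
    else:
        # Dataframe streams are converted on the viewer-ingest side:
        # each part becomes one or more Rerun Chunks.
        for chunk in dataframe_to_chunks(pa.Table.from_batches([part])):
            store.insert_chunk(chunk)
```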