These are the core concepts in Alluvial.
Streams are the starting point for all data flows using Alluvial. They are an abstraction over your data, independent of how it's stored.
In order for the work of querying a stream to be restartable, readers need to know how far along a stream they've traversed. Cursors represent this position. In other stream-processing systems, this concept is sometimes called an offset or a checkpoint.
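As a rough illustration (these are not Alluvial's actual types), a cursor can be thought of as little more than a stored position that a reader advances after each batch and persists so that a restart resumes from the last saved point:

```csharp
// Illustrative sketch only: a cursor as a stored position.
// The record and the AdvanceBy method are hypothetical, not Alluvial's API.
public record Cursor(long Position)
{
    // Advance the position past the items that were just read.
    public Cursor AdvanceBy(int itemsRead) => new(Position + itemsRead);
}
```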
Streams have a very constrained query interface in order to facilitate replayability and distribution. It consists of two parameters:
- How much of the stream should be skipped because it has already been read?
- How many items should be returned?
These are referred to, respectively, as the cursor position and the batch size. In LINQ and some other query APIs, they correspond to `Skip` and `Take`.
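Using plain LINQ over an in-memory sequence, the two parameters look like this (the data and values here are purely illustrative):

```csharp
using System.Linq;

var source = Enumerable.Range(1, 100);   // stand-in for the underlying data

var cursorPosition = 40;                 // how much has already been read
var batchSize = 10;                      // how many items one batch should hold

var batch = source
    .Skip(cursorPosition)                // resume past everything already read
    .Take(batchSize)                     // limit the result to one batch
    .ToList();                           // items 41 through 50
```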
Once you have a batch of data from a stream, you want to do something with it, whether that's taking some action or projecting it into some shape. This is the role of aggregators. They receive batches of data along with a prior aggregated state, update the state using the data, and return a new state. They are pure functional constructs. They can be assembled into pipelines to add concerns such as storage and telemetry.
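As a minimal sketch, using a hand-rolled delegate rather than Alluvial's own aggregator types, an aggregator is just a pure function from a prior state and a batch to a new state:

```csharp
using System.Collections.Generic;

// A pure aggregator: (prior state, batch) -> new state.
// The delegate and the example below are illustrative, not Alluvial's API.
public delegate TState Aggregate<TState, TData>(TState priorState, IReadOnlyList<TData> batch);

public static class Aggregators
{
    // Counts how many items have been seen; the state is just an integer.
    public static readonly Aggregate<int, string> CountItems =
        (count, batch) => count + batch.Count;
}
```

Because the function is pure, replaying the same batches from the same initial state always produces the same result, which is what makes the work restartable.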
A catchup is a worker that continually polls or awaits data from one or more streams and, when new data arrives, passes it to the aggregators that are subscribed.
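The following sketch shows the general shape of such a polling loop. The `fetchBatchAsync` and `aggregate` parameters are hypothetical stand-ins, not Alluvial's catchup API:

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public static class CatchupSketch
{
    public static async Task RunAsync<TState, TData>(
        Func<int, int, Task<IReadOnlyList<TData>>> fetchBatchAsync, // (cursor, batchSize) -> batch
        Func<TState, IReadOnlyList<TData>, TState> aggregate,       // pure aggregator
        TState state,
        int batchSize,
        TimeSpan pollInterval,
        CancellationToken cancellation)
    {
        var cursor = 0;

        while (!cancellation.IsCancellationRequested)
        {
            var batch = await fetchBatchAsync(cursor, batchSize);

            if (batch.Count > 0)
            {
                state = aggregate(state, batch);   // hand the new data to the aggregator
                cursor += batch.Count;             // advance the cursor past what was read
            }
            else
            {
                await Task.Delay(pollInterval, cancellation); // nothing new: wait, then poll again
            }
        }
    }
}
```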
Streams can be partitioned so that the work of catching up a partitioned stream can be done in parallel, whether in-process or across machines. A distributor is a worker that waits for availability of a given partition and leases it out to be worked on.
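A rough in-memory sketch of the lease idea follows. The partition keys and the semaphore-based lock are illustrative; a real distributor would coordinate leases across processes or machines:

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading;
using System.Threading.Tasks;

// Illustrative only: one worker at a time gets to catch up a given partition.
public class InMemoryDistributorSketch
{
    private readonly ConcurrentDictionary<string, SemaphoreSlim> leases = new();

    public async Task Distribute(string partition, Func<Task> catchUpPartition)
    {
        var gate = leases.GetOrAdd(partition, _ => new SemaphoreSlim(1, 1));

        await gate.WaitAsync();          // acquire the lease for this partition
        try
        {
            await catchUpPartition();    // do the catchup work for this partition
        }
        finally
        {
            gate.Release();              // release the lease so another worker can take it
        }
    }
}
```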