# Architecture Proposal: Output Topic #1312
coltmcnealy-lh started this conversation in Ideas
## Motivation
The Output Topic will allow users of LittleHorse to export data in real time from their LittleHorse Workflows into external systems.
On a personal note, when I started the LittleHorse Server project over three years ago, I did it with the intention of bridging the gap between Workflows, Streams, and Tables.
### Workflows as Tables
I believe that Workflows are Data. For example, consider an `orders` workflow. If you wanted to "export" this workflow into a database such as Postgres or Snowflake, you might create a corresponding database table and then insert a new row for every single `WfRun`. This would allow you to do analytics based on your orders.

This can be accomplished with an Output Topic that publishes updates to `WfRun` data in real time to Apache Kafka.

### Workflows as Streams
Another motivation for the Output Topic is that updates to your `WfRun`s can be treated as streams of events. For example, the following use-cases have come up:

- A `TaskDef` fails five times in a minute.
- A `WfSpec` where 10 `WfRun`s of a specific type reach a certain failure scenario in a given time window.
- `UserTaskRun`s assigned to the same group.
- A `WfRun` triggers another `WfRun` in a loosely-coupled manner.

The above can also be accomplished through Kafka.
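To make the first use-case above concrete, here is a minimal sketch of a sliding-window failure counter a consumer of the Output Topic could run. The record shape (a task name plus a timestamp) is a hypothetical stand-in, not the actual LittleHorse schema, which this proposal has not yet finalized:

```python
import time
from collections import defaultdict, deque

def make_alerter(threshold=5, window_seconds=60):
    """Return a callback that flags a TaskDef once it has failed
    `threshold` times within the last `window_seconds` seconds."""
    failures = defaultdict(deque)  # task_def_name -> failure timestamps

    def on_failure(task_def_name, timestamp):
        window = failures[task_def_name]
        window.append(timestamp)
        # Drop failures that fell out of the sliding window.
        while window and timestamp - window[0] > window_seconds:
            window.popleft()
        return len(window) >= threshold

    return on_failure

alert = make_alerter()
base = time.time()
# Four failures in quick succession: no alert yet.
for i in range(4):
    assert not alert("charge-card", base + i)
# The fifth failure within the same minute trips the alert.
assert alert("charge-card", base + 4)
```

In a real deployment this logic would more likely live in a Kafka Streams or Flink job consuming the execution-output topic, but the windowing idea is the same.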
## Topic Structure

### Metadata and Execution Data
There are two types of data in LittleHorse:

- **Metadata**: `WfSpec`, `TaskDef`, etc.
- **Execution Data**: `WfRun`, `TaskRun`, etc.

Metadata is small, relatively static, and global to a cluster. Execution data is large, partitioned, and constantly changing. Consumers doing stream processing on Execution Data will often need access to Metadata in order to properly make sense of the Execution Data.
Therefore, we will separate metadata and execution data into two topics:

- The `metadata-output` topic, which is a single-partition, compacted topic containing metadata updates.
- The `execution-output` topic, which is a multi-partition, non-compacted topic containing execution data updates.

This will allow stream processors to load the current metadata snapshot through the compacted topic (think of a Kafka Streams Global Store), and then join the Execution Data against that snapshot in real time.
Note that most metadata in LittleHorse is immutable—when you want to change it, you end up creating a new version, which is a separate LittleHorse API Object with its own ID—so historical version mismatching shouldn't be a problem if the consumer is up-to-date on metadata but way behind on execution data.
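The snapshot-then-join pattern described above can be sketched in plain Python. Everything here is a stand-in: the record shapes, the `kind`/`id`/`version` key, and the `wf_spec_found` output field are assumptions for illustration, since the actual proto schemas are still TODO:

```python
def build_metadata_snapshot(metadata_records):
    """Replay the compacted metadata-output topic: the last value seen
    for each (kind, id, version) key wins, as with Kafka log compaction."""
    snapshot = {}
    for rec in metadata_records:
        key = (rec["kind"], rec["id"], rec["version"])
        snapshot[key] = rec
    return snapshot

def enrich(execution_record, snapshot):
    """Join one execution record against the metadata snapshot. Because
    WfSpec versions are immutable, the (id, version) lookup stays valid
    even when the consumer is far behind on execution data."""
    key = ("WfSpec",
           execution_record["wf_spec_id"],
           execution_record["wf_spec_version"])
    spec = snapshot.get(key)
    return {**execution_record, "wf_spec_found": spec is not None}

metadata = [
    {"kind": "WfSpec", "id": "orders", "version": 0},
    {"kind": "WfSpec", "id": "orders", "version": 1},
]
snapshot = build_metadata_snapshot(metadata)
run = {"wf_run_id": "abc123", "wf_spec_id": "orders", "wf_spec_version": 0}
assert enrich(run, snapshot)["wf_spec_found"]
```

A Kafka Streams consumer would get the same effect by materializing the compacted topic as a `GlobalKTable` and joining the execution stream against it.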
### Multi-Tenancy
There are a few considerations regarding topic structure, ownership, and multi-tenancy. For example, a `Principal` might be able to do something in `Tenant` `A` but not in `B`.

Due to the above reasons, I propose that:

- Every `Tenant` gets its own Output Topics (one for `metadata` and one for `execution` data).
- We make use of `oneof`s to allow putting all data into the two topics above, and clients can filter it out as needed.

This prevents an expensive proliferation of Kafka topics and partitions as much as possible while still allowing different LittleHorse `Tenant`s to have isolated data.

## Proto Schemas
Naturally, LittleHorse is a protobuf-first system. The output topic will inherit this characteristic.
Users should be able to enable or disable the Output Topic on a per-`Tenant` basis.
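Since the proposal puts all record types into two topics per `Tenant` via `oneof`s, clients need to dispatch on whichever `oneof` field is set and ignore the rest. A minimal sketch of that filtering, using dicts in place of protobuf messages; the field names here (`wf_run`, `user_task_run`, etc.) are assumptions, not the final schema:

```python
# Hypothetical oneof field names a client might care about.
HANDLED_TYPES = {"wf_run", "user_task_run", "workflow_event", "variable"}

def payload_type(record):
    """Return which oneof-style field is set on this record, mirroring
    how a client filters a mixed-type topic down to what it needs."""
    for field in HANDLED_TYPES:
        if field in record:
            return field
    return None  # unknown type: skip, so schemas can grow over time

records = [
    {"wf_run": {"id": "r1"}},
    {"user_task_run": {"id": "u1"}},
    {"variable": {"name": "customer-id"}},
]
types = [payload_type(r) for r in records]
assert types == ["wf_run", "user_task_run", "variable"]
```

Returning `None` for unrecognized fields is what lets the server add new record types later without breaking existing consumers, matching the "we can always extend with more" goal below.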
### Output Topic Schemas
Every message in the `execution-output` topic will be an `OutputTopicRecord`, and we will make heavy use of `oneof` to allow multiple data types. The initial implementation will allow five types of records to be pushed into the Output Topic. However, we can always extend with more:

- `WorkflowEvent`s thrown by a `WfRun`.
- The `WfRun` itself, which is treated as a data entity.
- `UserTaskRun`s.
- `Variable`s.

### Metadata Output Topic
// TODO
## Configuring What's Sent

### `WfRun` Entities

### Background: Public Variables

### Configuring Recording Levels

#### `WorkflowEvent` records

#### `TaskRun` records

#### `UserTaskRun` records

## Implementation
## Future Work