You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Build out the “stdlib” of importable distributed protocol implementations in Hydroflow(+). Some things are simple (heartbeats, retries/acks), some are very complex (Paxos variants, BFT variants, etc). See Hackathon: Examples of common distributed design patterns #346.
Mostly-monotonic (lattice-oriented) variants of standard distributed protocols. For example, @jhellerstein is working on an implementation of retry/ack that uses a mutable “outstanding messages” buffer. It could instead use a tombstone set with efficient tombstone compression a la Neil Conway’s Edelweiss (something @zzlk has on his plate).
Compiler analysis to do (b) automagically!
Identify monotonic use of mutable state, auto-convert to use lattice-based state.
Identify “pure local” uses of mutable state that will remain local even under replication (after replication to two nodes, each node keeps that state private from the other). Identify such non-monotonicity as “safe for replication”.
Example: TCP-style use of sequence numbering on point-to-point channels is non-monotonic because the protocol for message sequencing “peeks” at the increasing counters. Still, it seems OK to replicate otherwise-monotonic programs that use TCP. What properties assure that is safe for replication, and how do we check for those properties in general?
Egglog + Lattices in Hydroflow
Can we match egglog performance? What roadblocks exist?
Would the use of Hydroflow's UnionFind lattice help in some way (with perf, analysis, incremental view maintenance)?
High-performance Datalog (and Dedalus) over Hydroflow. c.f. Souffle, Egglog and other leading implementations.
Differential/Incremental execution on Hydroflow
In the style of DBSP, and then perhaps Differential Dataflow.
Do we have the operators and types required to support manually-constructed incremental flows?
Can we elegantly translate from “batch” execution in some language (Datalog? Hydroflow+?) to such incremental implementations?
Supporting additional algebraic types in Hydroflow
Abelian Groups like the Z-relations used in DBSP
Semirings: general-purpose provenance semirings, lattice semirings (with an idempotent +)?
Could a semiring for multisets replace and improve existing multiset support in Hydroflow?
Integration: what does it mean to have a dataflow library with multiple algebras at play? Composition? Optimization?
Architecture: library of example data structures (like the lattice library) and Hydroflow dataflow operators to suit, and optimizer opportunities.
Impact on distributed systems folks, e.g. CRDT fans.
Build a distributed transactional storage system in Hydroflow
Perhaps starting with @zzlk's Anna implementation?
Work with commodity cloud storage
Many design points in this space. Is there a "toolkit + optimizer" approach here? Discuss with @tiemobang.
Verifying / optimizing for "local-first" applications that are interactive without needing a server
With annotations on the original program to input/output paths that must not depend on a network, can we identify rewrites that use replication to disentangle the local state from the global one?
Static analysis over transactions in web apps: can we automatically find pairs of transactions that commute (and do not require coordination?)
As a first step, can we use SMT to verify whether a pair of transactions are commutative?
Extrapolating local Hydroflow performance to cloud deployments
By profiling each UDF in isolation on a laptop, how accurately can we predict real-world performance?
Ergonomic cycles in Hydroflow+
Functional programming is great for writing dataflow, except when there are fixpoint cycles involved. Hydroflow+ has an API for this, but can we do better? See Flix for inspiration
+
)?The text was updated successfully, but these errors were encountered: