Skip to content

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Graduate class project ideas #972

Closed
9 tasks
jhellerstein opened this issue Nov 29, 2023 · 2 comments
Closed
9 tasks

Graduate class project ideas #972

jhellerstein opened this issue Nov 29, 2023 · 2 comments
Assignees
Labels
good first issue Good for newcomers

Comments

@jhellerstein
Copy link
Contributor

jhellerstein commented Nov 29, 2023

  • Build out the “stdlib” of importable distributed protocol implementations in Hydroflow(+). Some things are simple (heartbeats, retries/acks), some are very complex (Paxos variants, BFT variants, etc). See Hackathon: Examples of common distributed design patterns #346.
  • Mostly-monotonic (lattice-oriented) variants of standard distributed protocols. For example, @jhellerstein is working on an implementation of retry/ack that uses a mutable “outstanding messages” buffer. It could instead use a tombstone set with efficient tombstone compression a la Neil Conway’s Edelweiss (something @zzlk has on his plate).
  • Compiler analysis to do (b) automagically!
    • Identify monotonic use of mutable state, auto-convert to use lattice-based state.
    • Identify “pure local” uses of mutable state that will remain local even under replication (after replication to two nodes, each node keeps that state private from the other). Identify such non-monotonicity as “safe for replication”.
      • Example: TCP-style use of sequence numbering on point-to-point channels is non-monotonic because the protocol for message sequencing “peeks” at the increasing counters. Still, it seems OK to replicate otherwise-monotonic programs that use TCP. What properties assure that is safe for replication, and how do we check for those properties in general?
  • Egglog + Lattices in Hydroflow
    • Can we match egglog performance? What roadblocks exist?
    • Would the use of Hydroflow's UnionFind lattice help in some way (with perf, analysis, incremental view maintenance)?
  • High-performance Datalog (and Dedalus) over Hydroflow. c.f. Souffle, Egglog and other leading implementations.
  • Differential/Incremental execution on Hydroflow
    • In the style of DBSP, and then perhaps Differential Dataflow.
    • Do we have the operators and types required to support manually-constructed incremental flows?
    • Can we elegantly translate from “batch” execution in some language (Datalog? Hydroflow+?) to such incremental implementations?
  • Supporting additional algebraic types in Hydroflow
    • Abelian Groups like the Z-relations used in DBSP
    • Semirings: general-purpose provenance semirings, lattice semirings (with an idempotent +)?
      • Could a semiring for multisets replace and improve existing multiset support in Hydroflow?
    • Integration: what does it mean to have a dataflow library with multiple algebras at play? Composition? Optimization?
    • Architecture: library of example data structures (like the lattice library) and Hydroflow dataflow operators to suit, and optimizer opportunities.
    • Impact on distributed systems folks, e.g. CRDT fans.
  • Build a distributed transactional storage system in Hydroflow
    • Perhaps starting with @zzlk's Anna implementation?
    • Work with commodity cloud storage
    • Many design points in this space. Is there a "toolkit + optimizer" approach here? Discuss with @tiemobang.
  • Practical worst-case optimal joins in Hydroflow
    • Have a look at Free Join
    • Many design points here. Is there a compositional "toolkit + optimizer" approach?
    • What does WCOJ mean for streams and other dynamics? Examine relationship of WCOJ with Eddies/STAIRS.
@shadaj
Copy link
Member

shadaj commented Nov 29, 2023

  • Verifying / optimizing for "local-first" applications that are interactive without needing a server
    • With annotations on the original program to input/output paths that must not depend on a network, can we identify rewrites that use replication to disentangle the local state from the global one?
  • Static analysis over transactions in web apps: can we automatically find pairs of transactions that commute (and do not require coordination?)
    • As a first step, can we use SMT to verify whether a pair of transactions are commutative?
  • Extrapolating local Hydroflow performance to cloud deployments
    • By profiling each UDF in isolation on a laptop, how accurately can we predict real-world performance?
  • Ergonomic cycles in Hydroflow+
    • Functional programming is great for writing dataflow, except when there are fixpoint cycles involved. Hydroflow+ has an API for this, but can we do better? See Flix for inspiration

@davidchuyaya
Copy link
Contributor

davidchuyaya commented Nov 29, 2023

  • Build a programming assistant that finds non-monotonicity in programs and suggests conversions to monotonic versions Copilot-style
    • Even better: automatically identifying when this conversion is safe and only providing correct rewrites
  • Convert TLA to Dedalus or Hydroflow

@jhellerstein jhellerstein self-assigned this Jan 8, 2024
@MingweiSamuel MingweiSamuel added the good first issue Good for newcomers label Jan 8, 2024
@hydro-project hydro-project locked and limited conversation to collaborators Aug 12, 2024
@MingweiSamuel MingweiSamuel converted this issue into discussion #1379 Aug 12, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →

Labels
good first issue Good for newcomers
Projects
None yet
Development

No branches or pull requests

4 participants