Skip to content

Commit

Permalink
Refactor GraphTraversal to avoid non-determinism (vercel/turborepo#4098)
Browse files Browse the repository at this point in the history
### Description

The current design of the `Visit` trait makes it easy to introduce
non-determinism by having `map_children` return different results
depending on the order in which it's called.

For instance, if `map_children` tries to de-duplicate children (as is
the case in `ChunkContentVisit`):

```
Order 1:
1. (from parent Node1) map_children([NodeA, NodeB]) -> GraphTraversalControlFlow::Continue([NodeA, NodeB])
2. (from parent Node2) map_children([NodeB, NodeC]) -> GraphTraversalControlFlow::Continue([NodeC])

Order 2:
1. (from parent Node2) map_children([NodeB, NodeC]) -> GraphTraversalControlFlow::Continue([NodeB, NodeC])
2. (from parent Node1) map_children([NodeA, NodeB]) -> GraphTraversalControlFlow::Continue([NodeA])
```

These two calling orders will result in different generated graphs, as
the first order will forget about the edge Node2 -> NodeB, while the
second will forget about the edge Node1 -> NodeB.

Instead, this PR makes it so `map_children` is called *after* inserting
all nodes into the graph. This ensures that a different ordering can't
affect the final shape of the graph.

It also refactors the `GraphTraversal::visit` method and the `Visit`
trait to make it more consistent with graph terminology and (hopefully)
easier to understand.

### Testing Instructions

This solves an issue on the branch sokra/repro-for-alex.
  • Loading branch information
alexkirsz authored Mar 7, 2023
1 parent 1170ad0 commit 8bf0d97
Show file tree
Hide file tree
Showing 10 changed files with 342 additions and 439 deletions.
35 changes: 35 additions & 0 deletions crates/turbo-tasks/src/graph/control_flow.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,35 @@
/// The control flow of visiting an edge during a graph traversal.
pub enum VisitControlFlow<Node, Abort = !> {
/// The traversal should continue on the outgoing edges of the given node.
Continue(Node),
/// The traversal should skip visiting the edges the given node.
Skip(Node),
/// The traversal should abort and return immediately.
Abort(Abort),
}

impl<Node, Abort> VisitControlFlow<Node, Abort> {
/// Map the continue and skip values of this control flow.
pub fn map_node<Map, Mapped>(self, mut map: Map) -> VisitControlFlow<Mapped, Abort>
where
Map: FnMut(Node) -> Mapped,
{
match self {
VisitControlFlow::Continue(node) => VisitControlFlow::Continue(map(node)),
VisitControlFlow::Skip(node) => VisitControlFlow::Skip(map(node)),
VisitControlFlow::Abort(abort) => VisitControlFlow::Abort(abort),
}
}

/// Map the abort value of this control flow.
pub fn map_abort<Map, Mapped>(self, mut map: Map) -> VisitControlFlow<Node, Mapped>
where
Map: FnMut(Abort) -> Mapped,
{
match self {
VisitControlFlow::Continue(node) => VisitControlFlow::Continue(node),
VisitControlFlow::Skip(node) => VisitControlFlow::Skip(node),
VisitControlFlow::Abort(abort) => VisitControlFlow::Abort(map(abort)),
}
}
}
226 changes: 0 additions & 226 deletions crates/turbo-tasks/src/graph/get_children.rs

This file was deleted.

73 changes: 70 additions & 3 deletions crates/turbo-tasks/src/graph/graph_store.rs
Original file line number Diff line number Diff line change
@@ -1,10 +1,77 @@
use std::collections::HashSet;

/// A graph store is a data structure that will be built up during a graph
/// traversal. It is used to store the results of the traversal.
pub trait GraphStore<T>: Default {
pub trait GraphStore<Node>: Default {
type Handle: Clone;

// TODO(alexkirsz) An `entry(parent_handle) -> Entry` API would be more
// TODO(alexkirsz) An `entry(from_handle) -> Entry` API would be more
// efficient, as right now we're getting the same key multiple times.
/// Inserts a node into the graph store, and returns a handle to it.
fn insert(&mut self, parent_handle: Option<Self::Handle>, node: T) -> (Self::Handle, &T);
///
/// If this method returns `None`, the node edges will not be visited.
fn insert(
&mut self,
from_handle: Option<Self::Handle>,
node: Node,
) -> Option<(Self::Handle, &Node)>;
}

/// A [`GraphStore`] wrapper that skips nodes that have already been
/// visited. This is necessary to avoid repeated work when traversing non-tree
/// graphs (i.e. where a node can have more than one incoming edge).
#[derive(Debug)]
pub struct SkipDuplicates<StoreImpl, Node>
where
StoreImpl: GraphStore<Node>,
{
store: StoreImpl,
visited: HashSet<Node>,
}

impl<StoreImpl, Node> Default for SkipDuplicates<StoreImpl, Node>
where
StoreImpl: GraphStore<Node>,
{
fn default() -> Self {
Self {
store: Default::default(),
visited: Default::default(),
}
}
}

impl<StoreImpl, Node> GraphStore<Node> for SkipDuplicates<StoreImpl, Node>
where
StoreImpl: GraphStore<Node>,
Node: Eq + std::hash::Hash + Clone,
{
type Handle = StoreImpl::Handle;

fn insert(
&mut self,
from_handle: Option<Self::Handle>,
node: Node,
) -> Option<(Self::Handle, &Node)> {
if !self.visited.contains(&node) {
self.visited.insert(node.clone());
self.store.insert(from_handle, node)
} else {
// Always insert the node into the store, even if we've already
// visited it. This is necessary to ensure that the store sees all
// edges.
self.store.insert(from_handle, node);
None
}
}
}

impl<StoreImpl, Node> SkipDuplicates<StoreImpl, Node>
where
StoreImpl: GraphStore<Node>,
{
/// Consumes the wrapper and returns the underlying store.
pub fn into_inner(self) -> StoreImpl {
self.store
}
}
Loading

0 comments on commit 8bf0d97

Please sign in to comment.