Skip to content

Commit

Permalink
Add docs for reachability.h.
Browse files Browse the repository at this point in the history
PiperOrigin-RevId: 340330922
  • Loading branch information
Solumin authored and rchen152 committed Nov 3, 2020
1 parent 9dbc716 commit a51a99f
Showing 1 changed file with 94 additions and 7 deletions.
101 changes: 94 additions & 7 deletions docs/developers/typegraph.md
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# The Typegraph

<!--*
freshness: { owner: 'tsudol' reviewed: '2020-09-11' }
freshness: { owner: 'tsudol' reviewed: '2020-11-02' }
*-->

<!--ts-->
Expand All @@ -12,8 +12,10 @@ freshness: { owner: 'tsudol' reviewed: '2020-09-11' }
* [Default Data](#default-data)
* [Sets in the Typegraph](#sets-in-the-typegraph)
* [std::set or std::unordered_set?](#stdset-or-stdunordered_set)
* [Reachability](#reachability)
* [Implementation](#implementation)

<!-- Added by: rechen, at: 2020-09-14T12:22-07:00 -->
<!-- Added by: tsudol, at: 2020-11-02T11:52-08:00 -->

<!--te-->

Expand Down Expand Up @@ -150,12 +152,97 @@ cannot guarantee that the items in a set won't be modified, which violates the
`std::unordered_set` invariants. Finally, iterating over and comparing sets is
more efficient for `std::set`.

<!--
Reachability (reachable.h and .cc)
## Reachability

A basic operation when analyzing the CFG is checking if there is a path between
two nodes. This is used when checking if a binding's origin is visible at a
given node, for example. Queries always progress from a child node to the
parent, in the opposite direction of the edges of the CFG. To make these lookups
faster, the `Program` class uses a **backwards reachability** cache that's
handled by the `ReachabilityAnalyzer` class. Backwards reachability means that
the reachability analysis proceeds from child to parent, in the opposite
direction of the directed edges of the graph.

The `ReachabilityAnalyzer` tracks the list of adjacent nodes for each CFG node.
For node `i`, `reacahble[i][j]` indicates whether `j` is reachable from `i`.
When an edge is added to the reachability cache, the cache updates every node to
see if connections are possible.

- we know it's to make parts of the solver faster.
- But how
- and does it actually?
For example, consider three nodes `A -> B -> C`. The cache would be initialised
as:

```
reachable[A] = [True, False, False]
reachable[B] = [False, True, False]
reacahble[C] = [False, False, True]
```

(A node can always be reached from itself.) Since there's a connection from `B`
to `C`, the backwards edge `C -> B` is added to the cache:
```
reachable[A] = [True, False, False]
reachable[B] = [False, True, False]
reacahble[C] = [False, True, True]
```
Then the backwards edge `B -> A` is added:
```
reachable[A] = [True, False, False]
reachable[B] = [True, True, False]
reacahble[C] = [True, True, True]
```
The cache now shows that `A` can only reach itself, `B` can reach itself and
`A`, and `C` can reach all three nodes.
### Implementation
Every CFG node has an ID number of type `size_t`, which these docs will assume
is 64 bits. A naive implementation would make `reachable[n]` a list of the IDs
of the nodes reachable from node `n`, but at 8 bytes per node and potentially
thousands of nodes, that will get too expensive quickly. Instead, nodes are
split into _buckets_ based on their ID. Each bucket tracks 64 nodes, so the top
58 bits of the ID determine the node's bucket and the bottom 6 bits determine
the node's index within the bucket.
These buckets are implemented using bit vectors. Since a bucket covers 64 nodes,
it's represented by an `int64_t`. A node's reachable list is then a
`std::vector<int64_t>`, and the reachability cache that tracks every node's
reachable list is a `std::vector<std::vector<int64_t>>`. All together, node `j`
is reachable from node `i` if `reachable[i][bucket(j)] & bit(j) == 1`.

For example, consider a CFG with 100 nodes, which have IDs from 0 to 99. There
will be 100 entries in the reachability cache, one for each node, such that
`reachable[n]` corresponds to the nodes that are backwards reachable from node
`n`. `reachable[n]` is a `std::vector<int64_t>` with two elements, the first
tracking nodes 0 - 63 and the second tracking nodes 64 - 99, with room to track
another 28 nodes.

Let's check if node 75 is backwards reachable from node 30: `is_reachable(30,
75)`.

1. Find node 75's bucket: `bucket = 75 >> 6 = 1`. (This is equivalent to `75 /
64`.)
1. Find node 75's bit: `bit = 1 << (75 & 0x3F) = 1 << 11`.
1. Check node 30's reachability: `reachable[30][bucket] & bit`.

Adding a new node to the reachability cache is accomplished by adding another
entry to `reachable`. There is a catch: the cache must check if a new bucket is
needed to track the new node. If one is, then every node's reachable list is
extended with one more bucket. Finally, `reachable[n][bucket(n)] = bit[n]` is
set, indicating that node `n` is reachable from itself.

Adding an edge between two nodes is only slightly more complex. Because the
cache tracks reachability, adding an edge may update every node. Remember the `A
-> B -> C` example previously: `add_edge(B, A)` updated both `B` and `C`,
because `B` is reachable from `C`. For `add_edge(src, dst)`, the cache checks if
`src` is reachable from each node `i`, and if so, bitwise-ORs `reachable[i]` and
`reachable[dst]` together. Because `src` is reachable from itself, this will
also update `reachable[src]` when `i == src`.

<!--
Hashing and Sets
Expand Down

0 comments on commit a51a99f

Please sign in to comment.