btree: Reduce opportunities for branch mispredictions in binary search
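We currently use a textbook binary search algorithm. Its branch on each
comparison result is hard to predict, so it is known to suffer from
branch misprediction penalties. Speculative execution down a
mispredicted path can also pull unneeded lines into the cache and evict
useful ones, which is detrimental to performance.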

I recently read about a branchless binary search algorithm published by
Knuth called Shar's Algorithm (not to be confused with Shor's
algorithm). It is well known to outperform the textbook binary search
algorithm. It does an extra comparison. It is typically presented for
power of 2 array sizes, and adapting it to support non-power of 2 array
sizes is difficult to do in a way that is convincingly correct. Adapting
it to fill out zfs_btree_index_t is even more complex.

Therefore, I invented my own algorithm by refactoring the textbook
algorithm using a few tricks:

	1. x = (y < z) ? a : b is equivalent to
	   x = a * (y < z) + b * (y >= z)

	2. x = (y > z) ? a : b is equivalent to
	   x = a * (y > z) + b * (y <= z)

	3. The maximum number of iterations will be highbit(size), so we
	   can iterate on that.

	4. Ensuring that we get the same results means that we need to
	   handle early matches. This means we must avoid changes to the
	   values of comp and idx once comp is 0, which we can do by
	   computing idx = !!comp * (min + max) / 2 + !comp * idx.
	   This will make us repeat the previous comparison.

	5. If we delete the equal to case from the equivalencies used in
	   calculating min and max, we can cause them to be 0 when we
	   have an early match. This allows us to drop !!comp, since
	   (0 + 0) / 2 is 0.

	6. There is still the matter of maintaining behavior when min >=
	   max, where the original algorithm will exit the loop. We
	   achieve this by modifying idx assignment to avoid changes to
	   the value whenever (min >= max). We multiply the first term
	   by (min < max) and replace !comp with (!comp | (min >= max))
	   so that the idx will remain unchanged whenever the original
	   algorithm will terminate early. min will be allowed to
	   increment under these conditions, but max will remain the
	   same, such that the function will always return as if it were
	   the original.
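
The steps above can be sketched outside the btree code as follows. This
is a minimal illustration, assuming a sorted array of int rather than
the btree buffer and comparator. The names find_textbook,
find_branchless, cmp_int and highbit are hypothetical; highbit stands in
for highbit64 and is assumed to return the 1-based position of the
highest set bit, or 0 when the input is 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for highbit64(): 1-based highest set bit, 0 for 0. */
static uint32_t
highbit(uint64_t v)
{
	uint32_t h = 0;

	while (v != 0) {
		h++;
		v >>= 1;
	}
	return (h);
}

/* Three-way comparison: negative, zero or positive, like bt_compar. */
static int
cmp_int(int a, int b)
{
	return ((a > b) - (a < b));
}

/* Textbook search: *where is the match index or the insertion point. */
static bool
find_textbook(const int *buf, uint32_t nelems, int value, uint32_t *where)
{
	uint32_t min = 0, max = nelems;

	while (max > min) {
		uint32_t idx = (min + max) / 2;
		int comp = cmp_int(buf[idx], value);

		if (comp < 0) {
			min = idx + 1;
		} else if (comp > 0) {
			max = idx;
		} else {
			*where = idx;
			return (true);
		}
	}
	*where = max;
	return (false);
}

/* Same search with the loop body written using tricks 1 through 6. */
static bool
find_branchless(const int *buf, uint32_t nelems, int value, uint32_t *where)
{
	uint32_t min = 0, max = nelems, idx = 0;
	uint32_t i = highbit(nelems);	/* trick 3: fixed iteration count */
	int comp = 1;

	while (i--) {
		/* tricks 4 to 6: keep idx on a match or once min >= max */
		idx = (min < max) * (min + max) / 2 +
		    (!comp | (min >= max)) * idx;
		comp = cmp_int(buf[idx], value);
		/* tricks 1, 2 and 5: both become 0 when comp == 0 */
		min = (idx + 1) * (comp < 0) + min * (comp > 0);
		max = idx * (comp > 0) + max * (comp < 0);
	}
	*where = (comp == 0) ? idx : max;
	return (comp == 0);
}
```

Calling both functions on the same sorted array and key should yield the
same found flag and the same index. Whether the compiler actually emits
branch-free code for the loop body depends on the compiler and target.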

The result is that we avoid both branch misprediction penalties in the
loop and cache pollution by only accessing the memory locations that we
need to access to perform a binary search. This comes at the expense of
some additional computations, but we are likely to stall waiting on
memory accesses otherwise, so the additional computations should be
effectively free.

Signed-off-by: Richard Yao <richard.yao@alumni.stonybrook.edu>
ryao committed May 14, 2023
1 parent 7381ddf commit feeb3f2
Showing 1 changed file with 21 additions and 14 deletions.
35 changes: 21 additions & 14 deletions module/zfs/btree.c
@@ -216,27 +216,34 @@ zfs_btree_create_custom(zfs_btree_t *tree,
 }
 
 /*
- * Find value in the array of elements provided. Uses a simple binary search.
+ * Find value in the array of elements provided. Uses a "branchless" binary
+ * search derived by refactoring a simple binary search to avoid branch
+ * misprediction penalties by not branching within the loop.
  */
 static void *
 zfs_btree_find_in_buf(zfs_btree_t *tree, uint8_t *buf, uint32_t nelems,
     const void *value, zfs_btree_index_t *where)
 {
 	uint32_t max = nelems;
 	uint32_t min = 0;
-	while (max > min) {
-		uint32_t idx = (min + max) / 2;
-		uint8_t *cur = buf + idx * tree->bt_elem_size;
-		int comp = tree->bt_compar(cur, value);
-		if (comp < 0) {
-			min = idx + 1;
-		} else if (comp > 0) {
-			max = idx;
-		} else {
-			where->bti_offset = idx;
-			where->bti_before = B_FALSE;
-			return (cur);
-		}
+	uint32_t idx = 0;
+	uint8_t *cur;
+	uint32_t i = highbit64(nelems);
+	int comp = 1;
+
+	while (i--) {
+		idx = (min < max) * (min + max) / 2 +
+		    (!comp | (min >= max)) * idx;
+		cur = buf + idx * tree->bt_elem_size;
+		comp = tree->bt_compar(cur, value);
+		min = (idx + 1) * (comp < 0) + min * (comp > 0);
+		max = idx * (comp > 0) + max * (comp < 0);
 	}
 
+	if (comp == 0) {
+		where->bti_offset = idx;
+		where->bti_before = B_FALSE;
+		return (cur);
+	}
+
 	where->bti_offset = max;