judofyr · Aug 15, 2024
diff --git a/README.md b/README.md
@@ -13,7 +13,12 @@
 The benchmark in the figure above (summing over the nodes in a binary tree) is typically one of the worst cases for parallelism frameworks:
 The actual operation is extremely fast so any sort of overhead will have a measurable impact.
 
-Here's the exact same benchmark in [Rayon][rayon], an excellent library in Rust which uses work-stealing fork/join:
+Here's the _exact_ same benchmark in [Rayon][rayon], an excellent library in Rust for doing parallelism.
+Both implementations follow the same fork/join API, which gives code that is very easy to read and reason about.
+None of the findings here would surprise anyone who knows Rayon deeply, and there are ways of getting better performance out of Rayon by using different techniques.
+This comes at the cost of the code becoming more complicated and/or performing subpar on other types of input.
+The purpose of this benchmark is not to discourage use of Rayon (on the contrary!), but rather to demonstrate that it _is_ possible to have both simple code and good parallelism.
+See [issue #5](https://github.com/judofyr/spice/issues/5) for a longer discussion.
 
 ![Time to calculate sum of binary tree of 100M nodes with Rayon](bench/rayon-tree-sum-100M.svg)
 

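As a rough illustration of the fork/join shape discussed in the README hunk above, here is a minimal sketch of a binary-tree sum in Rust. It is not the benchmark code from the repository. To stay dependency-free it uses `std::thread::scope` as a stand-in for `rayon::join`, which means it spawns real OS threads and is far more expensive per fork than either Rayon or Spice. The `Node` type, the `build` helper, and the `depth` cutoff are hypothetical names introduced only for this example.

```rust
use std::thread;

// Hypothetical tree type for this sketch; the real benchmark may differ.
struct Node {
    value: i64,
    left: Option<Box<Node>>,
    right: Option<Box<Node>>,
}

// Builds a perfect binary tree with `levels` levels below the root,
// every node holding the value 1.
fn build(levels: u32) -> Node {
    let child = |l: u32| if l == 0 { None } else { Some(Box::new(build(l - 1))) };
    Node { value: 1, left: child(levels), right: child(levels) }
}

// Plain sequential baseline.
fn sum_seq(node: &Node) -> i64 {
    let l = node.left.as_deref().map_or(0, sum_seq);
    let r = node.right.as_deref().map_or(0, sum_seq);
    node.value + l + r
}

// Fork/join: fork the right subtree onto another thread, compute the left
// subtree on the current thread, then join. `depth` limits how many levels
// fork (an assumption of this sketch, needed because OS threads are costly).
fn sum_fork_join(node: &Node, depth: u32) -> i64 {
    if depth == 0 {
        return sum_seq(node);
    }
    let (l, r) = thread::scope(|s| {
        let right = s.spawn(|| {
            node.right.as_deref().map_or(0, |n| sum_fork_join(n, depth - 1))
        });
        let l = node.left.as_deref().map_or(0, |n| sum_fork_join(n, depth - 1));
        (l, right.join().expect("right branch panicked"))
    });
    node.value + l + r
}
```

With Rayon, the body of `sum_fork_join` would instead pass the two closures to `rayon::join`, and the depth cutoff would typically be unnecessary because forking is cheap there.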
diff --git a/bench/README.md b/bench/README.md
@@ -7,6 +7,8 @@ Date: August 2024.
 [Rayon][rayon] is a high-quality data-parallelism library written in Rust based on the well-known technique of _work-stealing fork/join_.
 [Spice](..), written in Zig, is an experimental implementation of _heartbeat scheduling_ which claims to have a much smaller overhead.
 We'd like to understand how these two techniques compare against each other.
+Rayon also provides a set of APIs built around `ParallelIterator`.
+We're not focusing on these since they're not comparable to the API that Spice provides.
 
 Evaluations of parallel frameworks are often summarized along the lines of "we implemented X algorithms, ran it on a machine with 48 cores and saw a (geometric) mean improvement of 34x".
 This is a fine way of validating that it works for a wide range of problems, but it's hard to draw conclusions from the final result.
@@ -17,7 +19,7 @@ Further benchmarks are recommended to validate the findings.
 
 ## Key findings and recommendations
 
-- Rayon adds roughly **15 nanoseconds** overhead for a single invocation of `fork/join`.
+- Rayon adds roughly **15 nanoseconds** overhead for a single invocation of `rayon::join`.
   This means the smallest unit of work should take roughly **1 microsecond** for the overhead to be negligible (<1%).
 - Rayon shows **good linear scalability**: ~14x performance improvement when going from 1 to 16 threads.
   This was when the total duration of the program was in the scale of **seconds**.
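The relationship between per-call overhead and minimum task size in the first finding can be sketched with a little arithmetic. The helper names below are hypothetical, and the sketch assumes the overhead fraction is defined as overhead divided by overhead plus useful work; the report may define it differently.

```rust
/// Fraction of a task's total time that is scheduling overhead,
/// assuming total time = overhead + useful work (both in nanoseconds).
fn overhead_fraction(overhead_ns: f64, work_ns: f64) -> f64 {
    overhead_ns / (overhead_ns + work_ns)
}

/// Smallest amount of useful work (in nanoseconds) for which the overhead
/// fraction is exactly `max_fraction`; more work pushes it lower.
fn min_work_ns(overhead_ns: f64, max_fraction: f64) -> f64 {
    overhead_ns * (1.0 - max_fraction) / max_fraction
}
```

Under that assumption, 15 ns of overhead on 1 microsecond of work comes to about 1.5%, and roughly 1.5 microseconds of work is needed to get to 1%. Both are in line with the order of magnitude quoted in the finding.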