
Cost analysis: Remove "Unacceptable" hack #6782

Merged 15 commits into WebAssembly:main on Jul 25, 2024
Conversation

@kripken (Member) commented on Jul 23, 2024:

We marked various expressions as having cost "Unacceptable", fixed at 100, to
ensure we never moved them out from an If arm, etc. Giving them such a high
cost avoids that problem - the cost is higher than the limit we have for moving
code from conditional to unconditional execution - but it also means the total
cost is unrealistic. For example, a function with one such instruction + an add
(cost 1) would end up with cost 101, and removing the add would look
insignificant, which causes issues for things that want to compare costs
(like Monomorphization).

To fix this, adjust some costs. The main change here is to give casts a cost of 5.
I measured this in depth (see the attached benchmark scripts), and it looks
clear that in both V8 and SpiderMonkey the cost of a cast is high enough that it
is not worth turning an if with a ref.test arm into a select (which would
always execute the test).

Other costs adjusted here matter a lot less, because they are on operations
that have side effects and so the optimizer will anyhow not move them from
conditional to unconditional execution, but I tried to make them a bit more
realistic while I was removing "Unacceptable":

  • Give most atomic operations the 10 cost we've been using for atomic loads/
    stores. Perhaps wait and notify should cost more, but assuming fast thread
    switching seems more relevant.
  • Give growth operations a cost of 20, and throw operations a cost of 10. These
    numbers are entirely made up as I am not even sure how to measure them in
    a useful way (but, again, this should not matter much as they have side
    effects).

I verified that building a large Java program with this PR causes 0 changes
to code, so this should not risk regressions, though in principle this is not NFC
(and it unlocks Monomorphization work because of that).
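
For concreteness, here is a minimal sketch of the kind of constants described
above. Only ThrowCost (and its comment) is quoted from the diff discussed later
in this thread; the other names, types, and exact values are illustrative, not
a verbatim copy of the patch.

// Sketch only, not the actual Binaryen source.
#include <cstdint>

using CostType = uint32_t;

// Casts such as ref.test/ref.cast: measured as costly enough in V8 and
// SpiderMonkey that we should not hoist them out of an if arm into a select.
static const CostType CastCost = 5;

// Most atomic operations, matching the existing cost for atomic loads/stores.
static const CostType AtomicCost = 10;

// Growth operations (memory.grow, table.grow): a rough guess; these have side
// effects, so the optimizer will not make them unconditional anyway.
static const CostType GrowthCost = 20;

// The cost of throwing a wasm exception. This does not include the cost of
// catching it (which might be in another function than the one we are
// considering).
static const CostType ThrowCost = 10;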

@kripken requested review from tlively and aheejin on July 23, 2024 20:05
@aheejin (Member) commented on Jul 23, 2024:

Would considering something like https://github.com/dfinity/ic/blob/master/rs/execution_environment/benches/wasm_instructions/WASM_BENCHMARKS.md be more informative than some made-up numbers?

@kripken (Member, Author) commented on Jul 23, 2024:

Interesting, thanks. Hmm, e.g. memory.fill having a cost of "98" there surely depends on the size of the fill they are doing... Though if they had numbers on casts, those could be very useful, but I don't see any, unfortunately. I also don't see atomics. (Edit: or throw.)

But separately it might make sense to update all our basic math costs at some point, maybe using their numbers in part. Another source of info could be the benchmark framework begun in this PR - the cost of 5 for casts is from there.

@tlively (Member) commented on Jul 23, 2024:

The dfinity people did warn us to take their numbers with a grain of salt. In particular, their runtime does NaN canonicalization, so their floating point operations are much more expensive than we would expect.

Creating our own benchmarks to get better informed numbers makes sense to me.

@kripken (Member, Author) commented on Jul 23, 2024:

Btw, here is the output of the script:

len time:          2489.531000003106

and time:          2947.7890000008583
iff-both time:     3103.6149999974114

or time:           2935.7989999977317
iff-either time:   3241.3820000004575

select time:       2698.045000000748
iff-nextor time:   2607.062000000701

select-three time: 4385.273999998848
iff-three time:    3377.1729999980234

Details of what those are can be found in the benchmark file, but overall: the first just computes the length of a linked list (a baseline to see the overhead of memory and the export call), and the rest are pairs of a non-if and an if, that is, patterns that either execute a ref.test unconditionally or only conditionally. And/Or are perhaps a bit faster than an If, but a Select might be slower even in the best case (the pair before last), while a worst-case Select is a lot worse (the last pair).

Overall these justify not executing a ref.test unconditionally even if that would shrink code size, hence the cost of 5 (the cost at and above which we don't do that).

(Numbers are on V8, but the pattern is similar in SpiderMonkey too.)

Comment on lines 67 to 68
// We'll call the benchmark functions in random orders.
function makeOrders(prefix) {
A reviewer (Member):

Is it standard benchmarking practice to run the benchmarks in random orders? Is this meant to defeat unwanted optimizations? Generating every possible order up front is not going to scale to a larger number of benchmarks. Is there something simpler and more scalable we can do?

kripken (Member, Author):

> Is it standard benchmarking practice to run the benchmarks in random orders?

There are other ways to deal with benchmark interactions, like running all the tests for A first, then all the tests for B, etc., rather than interleaving. But interleaving actually makes it more realistic since real-world code is mixed in with other stuff, and it's simple enough to handle here, so it seems appropriate to me.

> Is this meant to defeat unwanted optimizations?

Mainly to avoid order being an issue. Imagine that running A, B, C happens to have A warm up the cache for B, or B reset the branch predictor for C. Random orders avoid that.

> Generating every possible order up front is not going to scale to a larger number of benchmarks. Is there something simpler and more scalable we can do?

Yeah, past some point it can't work, but so long as we don't hit that limit it is faster to do it this way. The other way would be to generate an unbiased random order on the fly each time, which is not hard, but just takes more work.
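
For reference, the on-the-fly alternative mentioned above is just an unbiased
(Fisher-Yates) shuffle of the benchmark indices, so no list of all permutations
is ever materialized. A sketch in C++ (the actual script is JavaScript, and the
names here are illustrative):

#include <algorithm>
#include <numeric>
#include <random>
#include <vector>

// Return a uniformly random order of the benchmarks 0..numBenchmarks-1,
// generated fresh each time instead of enumerating every permutation.
std::vector<size_t> makeOrder(size_t numBenchmarks, std::mt19937& rng) {
  std::vector<size_t> order(numBenchmarks);
  std::iota(order.begin(), order.end(), size_t(0)); // 0, 1, 2, ...
  std::shuffle(order.begin(), order.end(), rng);    // unbiased Fisher-Yates
  return order;
}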

Comment on $makeC:

(func $makeC (export "makeC") (param $next (ref null $A)) (result anyref)
A reviewer (Member):

This is never called from the benchmarking script. Intentional?

kripken (Member, Author):

Yeah, having $C prevents the optimizer from thinking $B could be final. And so far there isn't a benchmark that uses $C. I'll add a comment.

Comment on lines +37 to +40
// The cost of throwing a wasm exception. This does not include the cost of
// catching it (which might be in another function than the one we are
// considering).
static const CostType ThrowCost = 10;
A reviewer (Member):
Even when pulling numbers out of nowhere, how do you divide the total cost between the catch and the throw? Neither executes without the other (assuming throws are caught at all).

kripken (Member, Author):

Yeah, I'm not sure about Throw. My intuition is this cost would be the total cost (including the catch), since we don't add any cost in Try.

Comment on lines -91 to -92
static_assert(TooCostlyToRunUnconditionally < CostAnalyzer::Unacceptable,
"We never run code unconditionally if it has unacceptable cost");
A reviewer (Member):

There are still instructions we know should not be run unconditionally if it can be avoided; is there something else we can replace this assertion with?

kripken (Member, Author):

Good idea, done.
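
For illustration, a replacement along these lines could tie the threshold to a
cost we still never want to run unconditionally, such as casts. This is a
hypothetical sketch that reuses the names from the removed assertion above; it
is not necessarily the exact assertion that landed in the PR.

// Hypothetical: ensure casts still clear the threshold for moving code from
// conditional to unconditional execution, mirroring the intent of the removed
// "Unacceptable" assertion.
static_assert(TooCostlyToRunUnconditionally <= CostAnalyzer::CastCost,
              "We never run casts unconditionally if we can avoid it");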

@kripken (Member, Author) commented on Jul 25, 2024:

All comments should be addressed - @tlively did you have anything else?

@tlively (Member) commented on Jul 25, 2024:

Nope, LGTM

@kripken merged commit 9cc1cb1 into WebAssembly:main on Jul 25, 2024
13 checks passed
@kripken deleted the newcost branch on July 25, 2024 18:16
@gkdn mentioned this pull request on Aug 31, 2024