benchmark.cc: Default to inverted mode, add small_digits mode. #74

StephanTLavavej · 2018-08-15T02:45:55Z

Note: If the hexfloat change is undesirable, I can restore the original behavior with a tiny bit of work.

benchmark.cc: Reject unrecognized options.

benchmark.cc: Print hexfloats in verbose mode.

First, this extracts generate_float() and generate_double().

That eliminates the r integers, so we need another way to print the exact data in verbose mode. C99's hexfloat conversion specifiers are easy to use. "%.6a" and "%.13a" print enough hexits for round-tripping floats and doubles.

Finally, we can also simplify %lf to %f; the arguments are doubles (and C11 says that the 'l' length modifier "has no effect on a following a, A, e, E, f, F, g, or G conversion specifier").
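To make the round-tripping claim concrete, here is a minimal sketch (not the code in benchmark.cc). The helper names `float_to_hex`, `double_to_hex`, `float_round_trips`, `double_round_trips`, and `lf_matches_f` are hypothetical. A float has 23 explicit significand bits, which fit in 6 hexits; a double has 52, which fit in exactly 13.

```cpp
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical helper: print a float with 6 hexits after the point.
// The float is promoted to double when passed through "...", which is exact.
std::string float_to_hex(float f) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.6a", f);
  return buf;
}

// Hypothetical helper: print a double with 13 hexits after the point.
std::string double_to_hex(double d) {
  char buf[64];
  std::snprintf(buf, sizeof(buf), "%.13a", d);
  return buf;
}

// Parse the hexfloat text back and compare against the original value.
bool float_round_trips(float f) {
  return std::strtof(float_to_hex(f).c_str(), nullptr) == f;
}

bool double_round_trips(double d) {
  return std::strtod(double_to_hex(d).c_str(), nullptr) == d;
}

// "%lf" and "%f" both consume a double, so they should format identically.
bool lf_matches_f(double d) {
  char with_l[64];
  char without_l[64];
  std::snprintf(with_l, sizeof(with_l), "%lf", d);
  std::snprintf(without_l, sizeof(without_l), "%f", d);
  return std::strcmp(with_l, without_l) == 0;
}
```

Calling `float_round_trips(0.1f)` or `double_round_trips(0.1)` should return true on a C99-conforming library, since the hexfloat text carries every significand bit.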

benchmark.cc: Default to inverted mode, add "-classic".

benchmark.cc: Extract benchmark_options.

This makes it easier to pass options to bench32() and bench64().
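As a rough illustration of the idea (not the actual struct in benchmark.cc), the options can be gathered into one object and filled by a parser that rejects anything it does not recognize. The field names, the default values, the meaning given to `-ryu`, and the `-samples=%i` / `-iterations=%i` spellings are all assumptions made for this sketch. Only `-ryu`, `-32`, `-64`, `-classic`, and `-small_digits=%i` appear in this pull request.

```cpp
#include <cstdio>
#include <cstring>

// Hypothetical options bundle; names and defaults are illustrative only.
struct benchmark_options {
  bool run32 = true;
  bool run64 = true;
  bool ryu_only = false;  // assumed meaning of "-ryu"
  bool classic = false;   // "-classic" opts out of the inverted default
  int samples = 10000;    // assumed default
  int iterations = 1000;  // assumed default
  int small_digits = 0;   // 0 means "unlimited" digits
};

// Returns true if every argument was recognized and in range.
bool parse_options(int argc, const char* const* argv, benchmark_options& out) {
  for (int i = 1; i < argc; ++i) {
    const char* arg = argv[i];
    if (std::strcmp(arg, "-32") == 0) {
      out.run32 = true;
      out.run64 = false;
    } else if (std::strcmp(arg, "-64") == 0) {
      out.run32 = false;
      out.run64 = true;
    } else if (std::strcmp(arg, "-ryu") == 0) {
      out.ryu_only = true;
    } else if (std::strcmp(arg, "-classic") == 0) {
      out.classic = true;
    } else if (std::sscanf(arg, "-samples=%i", &out.samples) == 1) {
      if (out.samples < 1) {
        std::fprintf(stderr, "samples must be at least 1.\n");
        return false;
      }
    } else if (std::sscanf(arg, "-iterations=%i", &out.iterations) == 1) {
      if (out.iterations < 1) {
        std::fprintf(stderr, "iterations must be at least 1.\n");
        return false;
      }
    } else if (std::sscanf(arg, "-small_digits=%i", &out.small_digits) == 1) {
      if (out.small_digits < 1 || out.small_digits > 7) {
        std::fprintf(stderr, "small_digits must be in [1, 7].\n");
        return false;
      }
    } else {
      // Anything that matched none of the patterns above is an error.
      std::fprintf(stderr, "Unrecognized option '%s'.\n", arg);
      return false;
    }
  }
  return true;
}
```

With this shape, functions such as bench32() and bench64() would only need a `const benchmark_options&` parameter instead of a growing list of individual flags.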

benchmark.cc: Validate samples and iterations options.

benchmark.cc: Add "-small_digits=%i".

This option stresses Ryu's codepaths for small integers. It accepts values in the range [1, 7]. (32-bit floats have insufficient precision for larger values. With a little work, this range could be extended for 64-bit doubles, if benchmarking moderate-length integers is interesting.)

This also modifies verbose mode to print ryu_output, so we can see what Ryu is emitting (and verify that small_digits mode is actually testing small integers).

As the example in the comment explains, "-small_digits=3" tests values in the range [1.00, 9.99]. These will be printed as:

1E0, 1.01E0, ..., 1.09E0, 1.1E0, 1.11E0, ..., 9.98E0, 9.99E0

That is, there are a few 1-digit and 2-digit values, although most are 3-digit (and none are longer).
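One way such values could be produced is sketched below, under the assumption that an integer with exactly small_digits decimal digits is drawn and then scaled into [1, 10). The names `pow10_u32` and `generate_small_float` are hypothetical, and this is not necessarily how generate_float() in benchmark.cc is written.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: 10^e for small non-negative e.
uint32_t pow10_u32(int e) {
  uint32_t p = 1;
  for (int i = 0; i < e; ++i) {
    p *= 10;
  }
  return p;
}

// Sketch: for small_digits == 3, lower is 100 and upper is 999, so the
// integer n lies in [100, 999] and the result lies in [1.00, 9.99].
// small_digits is assumed to be in [1, 7]; 9999999 is below 2^24, so every
// such integer is exactly representable as a float.
float generate_small_float(int small_digits, std::mt19937& rng) {
  const uint32_t lower = pow10_u32(small_digits - 1);
  const uint32_t upper = pow10_u32(small_digits) - 1;
  // Modulo bias is ignored here; it is negligible for these ranges.
  const uint32_t n =
      static_cast<uint32_t>(rng() % (upper - lower + 1)) + lower;
  return static_cast<float>(n) / static_cast<float>(lower);
}
```

Integers ending in zeros, such as 110 or 100, are what yield the shorter outputs like 1.1E0 and 1E0.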

Currently, shorter output appears to be more stressful for doubles:

64:  118.619    1.991 (x86 benchmark_clang -ryu -64)
64:  277.499    3.048 (x86 benchmark_clang -ryu -64 -small_digits=7)
64:  306.753    2.787 (x86 benchmark_clang -ryu -64 -small_digits=6)
64:  327.964    3.427 (x86 benchmark_clang -ryu -64 -small_digits=5)
64:  347.708    2.876 (x86 benchmark_clang -ryu -64 -small_digits=4)
64:  369.915    2.371 (x86 benchmark_clang -ryu -64 -small_digits=3)
64:  403.309    9.321 (x86 benchmark_clang -ryu -64 -small_digits=2)
64:  477.200    3.409 (x86 benchmark_clang -ryu -64 -small_digits=1)

64:   42.266    1.270 (x64 benchmark_clang -ryu -64)
64:   45.798    1.356 (x64 benchmark_clang -ryu -64 -small_digits=7)
64:   47.418    1.454 (x64 benchmark_clang -ryu -64 -small_digits=6)
64:   49.004    1.464 (x64 benchmark_clang -ryu -64 -small_digits=5)
64:   50.620    1.209 (x64 benchmark_clang -ryu -64 -small_digits=4)
64:   52.759    1.275 (x64 benchmark_clang -ryu -64 -small_digits=3)
64:   55.585    1.402 (x64 benchmark_clang -ryu -64 -small_digits=2)
64:   66.844    1.378 (x64 benchmark_clang -ryu -64 -small_digits=1)

Interestingly, floats behave similarly except that "unlimited" digits are slower than -small_digits=7. I'm not sure why this is the case.

32:   42.478    1.558 (x86 benchmark_clang -ryu -32)
32:   33.758    1.145 (x86 benchmark_clang -ryu -32 -small_digits=7)
32:   35.518    1.048 (x86 benchmark_clang -ryu -32 -small_digits=6)
32:   36.035    1.113 (x86 benchmark_clang -ryu -32 -small_digits=5)
32:   37.629    0.999 (x86 benchmark_clang -ryu -32 -small_digits=4)
32:   39.157    1.061 (x86 benchmark_clang -ryu -32 -small_digits=3)
32:   45.113    1.027 (x86 benchmark_clang -ryu -32 -small_digits=2)
32:   55.080    1.227 (x86 benchmark_clang -ryu -32 -small_digits=1)

32:   30.599    1.528 (x64 benchmark_clang -ryu -32)
32:   23.771    0.907 (x64 benchmark_clang -ryu -32 -small_digits=7)
32:   24.571    1.140 (x64 benchmark_clang -ryu -32 -small_digits=6)
32:   25.138    0.864 (x64 benchmark_clang -ryu -32 -small_digits=5)
32:   26.579    1.020 (x64 benchmark_clang -ryu -32 -small_digits=4)
32:   27.664    1.095 (x64 benchmark_clang -ryu -32 -small_digits=3)
32:   30.341    1.405 (x64 benchmark_clang -ryu -32 -small_digits=2)
32:   32.580    1.129 (x64 benchmark_clang -ryu -32 -small_digits=1)

ulfjack · 2018-08-15T14:42:55Z

I was using the int output to generate the graphs in the paper (with gnuplot). I'd prefer to keep that; I'm not sure this can easily be changed in bash or gnuplot.

StephanTLavavej · 2018-08-15T18:48:53Z

Restored! Also, I looked at gnuplot.template but couldn't figure out how to adapt it to the addition of ryu_output; is there an easy way to do that, or would it tolerate the string field being moved to the end? I think it's useful but of course I don't want to break your graphs. If necessary, I could add yet another option to emit ryu_output.

ulfjack · 2018-08-16T12:23:39Z

I'll take a look.

StephanTLavavej added 6 commits August 13, 2018 21:59

benchmark.cc: Reject unrecognized options.

ebf9f44

benchmark.cc: Default to inverted mode, add "-classic".

e389a51

benchmark.cc: Extract benchmark_options.

6a1f227

This makes it easier to pass options to bench32() and bench64().

benchmark.cc: Validate samples and iterations options.

19161a6

benchmark.cc: Restore float_bits_as_int.

27d7712

StephanTLavavej mentioned this pull request Aug 16, 2018

Optimize 64-bit division-by-constant for x86 platforms #73

Merged

ulfjack merged commit 2dbe0a1 into ulfjack:master Aug 16, 2018

StephanTLavavej deleted the more_benchmarking branch August 16, 2018 18:13
