benchmark.cc: Default to inverted mode, add small_digits mode. #74
First, this extracts generate_float() and generate_double(). That eliminates the `r` integers, so we need another way to print the exact data in verbose mode. C99's hexfloat conversion specifiers are easy to use. "%.6a" and "%.13a" print enough hexits for round-tripping floats and doubles. Finally, we can also simplify %lf to %f; the arguments are doubles (and C11 says that the 'l' length modifier "has no effect on a following a, A, e, E, f, F, g, or G conversion specifier").
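As a rough illustration of the hexfloat point above, the sketch below formats a value with "%.6a" or "%.13a" and parses it back. The helper names (hexfloat32, hexfloat64, round_trips32, round_trips64) are hypothetical and do not come from benchmark.cc; the only assumption is a C99-conforming printf and strtof/strtod.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helpers, not taken from benchmark.cc.
// A float has 23 explicit significand bits, which fit in 6 hexits (24 bits).
std::string hexfloat32(float f) {
  char buf[64];
  // The float is promoted to double when passed through varargs.
  const int n = std::snprintf(buf, sizeof buf, "%.6a", static_cast<double>(f));
  assert(n > 0 && n < static_cast<int>(sizeof buf));
  return buf;
}

// A double has 52 explicit significand bits, which fit in 13 hexits exactly.
std::string hexfloat64(double d) {
  char buf[64];
  const int n = std::snprintf(buf, sizeof buf, "%.13a", d);
  assert(n > 0 && n < static_cast<int>(sizeof buf));
  return buf;
}

// Parse the hexfloat text back and compare with the original value.
bool round_trips32(float f) {
  return std::strtof(hexfloat32(f).c_str(), nullptr) == f;
}

bool round_trips64(double d) {
  return std::strtod(hexfloat64(d).c_str(), nullptr) == d;
}
```

Calling round_trips32(0.1f) or round_trips64(0.1) should return true, since no significand bits are dropped at these precisions.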
This makes it easier to pass options to bench32() and bench64().
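For readers who have not opened the diff, an options bundle of this kind might look roughly like the sketch below. Everything here is an assumption made for illustration: the field names, the default values, the "-samples=" and "-iterations=" spellings, and the meaning given to "-32" and "-64" are not taken from benchmark.cc. Only "-classic", "-small_digits=%i", and its [1, 7] range come from this PR.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical option bundle; field names and defaults are placeholders.
struct benchmark_options {
  bool run32 = true;
  bool run64 = true;
  bool classic = false;
  int samples = 10000;    // placeholder default
  int iterations = 1000;  // placeholder default
  int small_digits = 0;   // 0 stands for "unlimited" digits
};

// Returns true on success, false for an unrecognized or out-of-range option.
// Note: sscanf tolerates trailing characters after the number.
bool parse_options(int argc, const char* const* argv, benchmark_options& opts) {
  assert(argv != nullptr);
  for (int i = 1; i < argc; ++i) {
    const char* arg = argv[i];
    if (std::strcmp(arg, "-32") == 0) {
      opts.run32 = true;   // assumed meaning: run only the 32-bit benchmark
      opts.run64 = false;
    } else if (std::strcmp(arg, "-64") == 0) {
      opts.run32 = false;  // assumed meaning: run only the 64-bit benchmark
      opts.run64 = true;
    } else if (std::strcmp(arg, "-classic") == 0) {
      opts.classic = true;
    } else if (std::sscanf(arg, "-samples=%i", &opts.samples) == 1) {
      if (opts.samples < 1) return false;
    } else if (std::sscanf(arg, "-iterations=%i", &opts.iterations) == 1) {
      if (opts.iterations < 1) return false;
    } else if (std::sscanf(arg, "-small_digits=%i", &opts.small_digits) == 1) {
      if (opts.small_digits < 1 || opts.small_digits > 7) return false;
    } else {
      return false;  // reject anything unrecognized
    }
  }
  return true;
}
```

With a single struct, bench32() and bench64() can each take one `const benchmark_options&` parameter instead of a growing list of flags.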
This option stresses Ryu's codepaths for small integers. It accepts values in the range [1, 7]. (32-bit floats have insufficient precision for larger values. With a little work, this range could be extended for 64-bit doubles, if benchmarking moderate-length integers is interesting.)
This also modifies verbose mode to print ryu_output, so we can see what Ryu is emitting (and verify that small_digits mode is actually testing small integers).
As the example in the comment explains, "-small_digits=3" tests values in the range [1.00, 9.99]. These will be printed as:
1E0, 1.01E0, ..., 1.09E0, 1.1E0, 1.11E0, ..., 9.98E0, 9.99E0
That is, there are a few 1-digit and 2-digit values, although most are 3-digit (and none are longer).
Currently, shorter output appears to be more stressful for doubles:
```
64: 118.619 1.991 (x86 benchmark_clang -ryu -64)
64: 277.499 3.048 (x86 benchmark_clang -ryu -64 -small_digits=7)
64: 306.753 2.787 (x86 benchmark_clang -ryu -64 -small_digits=6)
64: 327.964 3.427 (x86 benchmark_clang -ryu -64 -small_digits=5)
64: 347.708 2.876 (x86 benchmark_clang -ryu -64 -small_digits=4)
64: 369.915 2.371 (x86 benchmark_clang -ryu -64 -small_digits=3)
64: 403.309 9.321 (x86 benchmark_clang -ryu -64 -small_digits=2)
64: 477.200 3.409 (x86 benchmark_clang -ryu -64 -small_digits=1)
64: 42.266 1.270 (x64 benchmark_clang -ryu -64)
64: 45.798 1.356 (x64 benchmark_clang -ryu -64 -small_digits=7)
64: 47.418 1.454 (x64 benchmark_clang -ryu -64 -small_digits=6)
64: 49.004 1.464 (x64 benchmark_clang -ryu -64 -small_digits=5)
64: 50.620 1.209 (x64 benchmark_clang -ryu -64 -small_digits=4)
64: 52.759 1.275 (x64 benchmark_clang -ryu -64 -small_digits=3)
64: 55.585 1.402 (x64 benchmark_clang -ryu -64 -small_digits=2)
64: 66.844 1.378 (x64 benchmark_clang -ryu -64 -small_digits=1)
```
Interestingly, floats behave similarly except that "unlimited" digits are slower than -small_digits=7. I'm not sure why this is the case.
```
32: 42.478 1.558 (x86 benchmark_clang -ryu -32)
32: 33.758 1.145 (x86 benchmark_clang -ryu -32 -small_digits=7)
32: 35.518 1.048 (x86 benchmark_clang -ryu -32 -small_digits=6)
32: 36.035 1.113 (x86 benchmark_clang -ryu -32 -small_digits=5)
32: 37.629 0.999 (x86 benchmark_clang -ryu -32 -small_digits=4)
32: 39.157 1.061 (x86 benchmark_clang -ryu -32 -small_digits=3)
32: 45.113 1.027 (x86 benchmark_clang -ryu -32 -small_digits=2)
32: 55.080 1.227 (x86 benchmark_clang -ryu -32 -small_digits=1)
32: 30.599 1.528 (x64 benchmark_clang -ryu -32)
32: 23.771 0.907 (x64 benchmark_clang -ryu -32 -small_digits=7)
32: 24.571 1.140 (x64 benchmark_clang -ryu -32 -small_digits=6)
32: 25.138 0.864 (x64 benchmark_clang -ryu -32 -small_digits=5)
32: 26.579 1.020 (x64 benchmark_clang -ryu -32 -small_digits=4)
32: 27.664 1.095 (x64 benchmark_clang -ryu -32 -small_digits=3)
32: 30.341 1.405 (x64 benchmark_clang -ryu -32 -small_digits=2)
32: 32.580 1.129 (x64 benchmark_clang -ryu -32 -small_digits=1)
```
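One plausible way to produce the "-small_digits=N" inputs described above is to draw an N-digit integer and scale it into [1, 10). This is only a sketch under that assumption. The function names and the use of std::mt19937_64 are illustrative, and the actual generate_float() and generate_double() in benchmark.cc may work differently.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: 10^(small_digits - 1), e.g. 100 for small_digits == 3.
static std::uint32_t small_digits_scale(int small_digits) {
  assert(small_digits >= 1 && small_digits <= 7);
  std::uint32_t scale = 1;
  for (int i = 1; i < small_digits; ++i) scale *= 10;
  return scale;
}

// Draws an integer in [scale, 10 * scale - 1] and divides by scale, so that
// small_digits == 3 yields values in [1.00, 9.99].
double generate_small_double(std::mt19937_64& rng, int small_digits) {
  const std::uint32_t scale = small_digits_scale(small_digits);
  std::uniform_int_distribution<std::uint32_t> dist(scale, scale * 10 - 1);
  return static_cast<double>(dist(rng)) / static_cast<double>(scale);
}

// Same idea for floats. Integers below 10^7 are exactly representable in a
// float's 24-bit significand, so numerator and denominator are both exact.
float generate_small_float(std::mt19937_64& rng, int small_digits) {
  const std::uint32_t scale = small_digits_scale(small_digits);
  std::uniform_int_distribution<std::uint32_t> dist(scale, scale * 10 - 1);
  return static_cast<float>(dist(rng)) / static_cast<float>(scale);
}
```

Because the division is correctly rounded, each result is the nearest float or double to a decimal with at most N significant digits, which is why the shortest output never needs more than N digits.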
I was using the int output to generate the graphs in the paper (with gnuplot). I'd prefer to keep that; I'm not sure this can easily be changed in bash or gnuplot.
Restored! Also, I looked at gnuplot.template but couldn't figure out how to adapt it to the addition of ryu_output; is there an easy way to do that, or would it tolerate the string field being moved to the end? I think it's useful but of course I don't want to break your graphs. If necessary, I could add yet another option to emit ryu_output.
I'll take a look.
Note: If the hexfloat change is undesirable, I can restore the original behavior with a tiny bit of work.
benchmark.cc: Reject unrecognized options.
benchmark.cc: Print hexfloats in verbose mode.
benchmark.cc: Default to inverted mode, add "-classic".
benchmark.cc: Extract benchmark_options.
benchmark.cc: Validate samples and iterations options.
benchmark.cc: Add "-small_digits=%i".