benchmark.cc: Default to inverted mode, add small_digits mode. #74

Merged: 7 commits, Aug 16, 2018

Conversation

StephanTLavavej
Contributor

Note: If the hexfloat change is undesirable, I can restore the original behavior with a tiny bit of work.


benchmark.cc: Reject unrecognized options.


benchmark.cc: Print hexfloats in verbose mode.

First, this extracts generate_float() and generate_double().

That eliminates the r integers, so we need another way to print the exact data in verbose mode. C99's hexfloat conversion specifiers are easy to use. "%.6a" and "%.13a" print enough hexits for round-tripping floats and doubles.
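As a rough illustration of the round-tripping claim, here is a minimal sketch that formats values with "%.6a" and "%.13a" and parses them back. The helper names (`hex_float`, `hex_double`, `round_trips`) are made up for this example and are not taken from benchmark.cc; it also assumes IEEE 754 floats and doubles and a C99-conforming `printf`/`strtod` family.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>

// Hypothetical helper (not from benchmark.cc): format a float as a hexfloat.
// The float is promoted to double by the variadic call; 6 hexits after the
// point carry 24 bits, enough for a float's 23 explicit fraction bits.
std::string hex_float(float f) {
  char buf[64];
  const int n = std::snprintf(buf, sizeof(buf), "%.6a", f);
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, static_cast<std::size_t>(n));
}

// Hypothetical helper: 13 hexits carry the 52 explicit fraction bits of a double.
std::string hex_double(double d) {
  char buf[64];
  const int n = std::snprintf(buf, sizeof(buf), "%.13a", d);
  assert(n > 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, static_cast<std::size_t>(n));
}

// Parse the hexfloat text back and compare with the original value.
// (NaN would compare unequal to itself, so this check is for finite values.)
bool round_trips(float f) {
  return std::strtof(hex_float(f).c_str(), nullptr) == f;
}

bool round_trips(double d) {
  return std::strtod(hex_double(d).c_str(), nullptr) == d;
}
```

For example, `round_trips(0.1f)` and `round_trips(0.1)` should both hold under the assumptions above, because no fraction bits are dropped when printing.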

Finally, we can also simplify %lf to %f; the arguments are doubles (and C11 says that the 'l' length modifier "has no effect on a following a, A, e, E, f, F, g, or G conversion specifier").
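A small sketch of the %lf versus %f point: both specifiers take a double in `printf`-family calls and should produce identical text. The function names `format_f` and `format_lf` are invented for this illustration.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Illustrative only: format a double with "%f".
std::string format_f(double value) {
  char buf[512];  // large enough for any finite double printed with "%f"
  const int n = std::snprintf(buf, sizeof(buf), "%f", value);
  assert(n >= 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, static_cast<std::size_t>(n));
}

// Illustrative only: the 'l' length modifier has no effect on 'f',
// so this is expected to return the same text as format_f().
std::string format_lf(double value) {
  char buf[512];
  const int n = std::snprintf(buf, sizeof(buf), "%lf", value);
  assert(n >= 0 && n < static_cast<int>(sizeof(buf)));
  return std::string(buf, static_cast<std::size_t>(n));
}
```

Note that this applies to the `printf` family only; in `scanf`, "%f" and "%lf" differ (float* versus double*).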


benchmark.cc: Default to inverted mode, add "-classic".


benchmark.cc: Extract benchmark_options.

This makes it easier to pass options to bench32() and bench64().


benchmark.cc: Validate samples and iterations options.


benchmark.cc: Add "-small_digits=%i".

This option stresses Ryu's codepaths for small integers. It accepts values in the range [1, 7]. (32-bit floats have insufficient precision for larger values. With a little work, this range could be extended for 64-bit doubles, if benchmarking moderate-length integers is interesting.)
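To make the option handling concrete, here is a minimal sketch of parsing that rejects unrecognized options and validates the [1, 7] range. It is not the code in benchmark.cc: the struct `benchmark_options_sketch` and both functions are hypothetical, and only the "-classic" and "-small_digits=%i" spellings are taken from this PR.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical stand-in for the real benchmark_options struct.
struct benchmark_options_sketch {
  bool classic = false;
  int small_digits = 0;  // 0 means the option was not given
};

// Parses "-small_digits=N" and accepts only N in [1, 7] with no trailing junk.
bool parse_small_digits(const char* arg, int* out) {
  int value = 0;
  int consumed = 0;
  // %n records how many characters were consumed; it does not count
  // toward sscanf's return value.
  if (std::sscanf(arg, "-small_digits=%i%n", &value, &consumed) != 1) {
    return false;
  }
  if (arg[consumed] != '\0') return false;       // e.g. "-small_digits=3x"
  if (value < 1 || value > 7) return false;      // outside the accepted range
  *out = value;
  return true;
}

// Returns false on any unrecognized or invalid option.
bool parse_options(int argc, const char* const* argv,
                   benchmark_options_sketch* options) {
  assert(options != nullptr);
  for (int i = 1; i < argc; ++i) {
    const char* arg = argv[i];
    if (std::strcmp(arg, "-classic") == 0) {
      options->classic = true;
    } else if (std::strncmp(arg, "-small_digits=", 14) == 0) {
      if (!parse_small_digits(arg, &options->small_digits)) return false;
    } else {
      return false;  // reject anything we do not recognize
    }
  }
  return true;
}
```

A caller would typically print a usage message and exit with a nonzero status when `parse_options` returns false.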

This also modifies verbose mode to print ryu_output, so we can see what Ryu is emitting (and verify that small_digits mode is actually testing small integers).

As the example in the comment explains, "-small_digits=3" tests values in the range [1.00, 9.99]. These will be printed as:

1E0, 1.01E0, ..., 1.09E0, 1.1E0, 1.11E0, ..., 9.98E0, 9.99E0

That is, there are a few 1-digit and 2-digit values, although most are 3-digit (and none are longer).
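The digit-count breakdown can be checked with a short sketch. This is one plausible way to produce such values (draw an integer in [10^(N-1), 10^N - 1] and divide by 10^(N-1)); whether benchmark.cc generates them this way is an assumption, and every name below is hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Returns 10^exponent for 0 <= exponent <= 9.
uint32_t pow10_u32(int exponent) {
  assert(exponent >= 0 && exponent <= 9);
  uint32_t result = 1;
  for (int i = 0; i < exponent; ++i) result *= 10;
  return result;
}

// Assumed generation scheme: a value in [1, 10) with at most `digits`
// significant decimal digits, e.g. 1.00 .. 9.99 for digits == 3.
double generate_small_digits(int digits, std::mt19937& rng) {
  assert(digits >= 1 && digits <= 7);
  const uint32_t lo = pow10_u32(digits - 1);   // 100 for digits == 3
  const uint32_t hi = pow10_u32(digits) - 1;   // 999 for digits == 3
  std::uniform_int_distribution<uint32_t> dist(lo, hi);
  return static_cast<double>(dist(rng)) / lo;
}

// Number of significant digits of n > 0 once trailing zeros are dropped,
// which matches the length of the shortest output (900 -> "9E0" -> 1).
int significant_digits(uint32_t n) {
  assert(n > 0);
  while (n % 10 == 0) n /= 10;
  int count = 0;
  while (n != 0) { ++count; n /= 10; }
  return count;
}

// Counts the candidates for a given `digits` that print with exactly
// `wanted` significant digits.
int count_with_significant_digits(int digits, int wanted) {
  int total = 0;
  for (uint32_t n = pow10_u32(digits - 1); n < pow10_u32(digits); ++n) {
    if (significant_digits(n) == wanted) ++total;
  }
  return total;
}
```

Under this scheme, the 900 candidates for "-small_digits=3" split into 9 one-digit, 81 two-digit, and 810 three-digit values, consistent with "a few" short outputs and a large majority of 3-digit ones.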

Currently, shorter output appears to be more stressful for doubles:

64:  118.619    1.991 (x86 benchmark_clang -ryu -64)
64:  277.499    3.048 (x86 benchmark_clang -ryu -64 -small_digits=7)
64:  306.753    2.787 (x86 benchmark_clang -ryu -64 -small_digits=6)
64:  327.964    3.427 (x86 benchmark_clang -ryu -64 -small_digits=5)
64:  347.708    2.876 (x86 benchmark_clang -ryu -64 -small_digits=4)
64:  369.915    2.371 (x86 benchmark_clang -ryu -64 -small_digits=3)
64:  403.309    9.321 (x86 benchmark_clang -ryu -64 -small_digits=2)
64:  477.200    3.409 (x86 benchmark_clang -ryu -64 -small_digits=1)

64:   42.266    1.270 (x64 benchmark_clang -ryu -64)
64:   45.798    1.356 (x64 benchmark_clang -ryu -64 -small_digits=7)
64:   47.418    1.454 (x64 benchmark_clang -ryu -64 -small_digits=6)
64:   49.004    1.464 (x64 benchmark_clang -ryu -64 -small_digits=5)
64:   50.620    1.209 (x64 benchmark_clang -ryu -64 -small_digits=4)
64:   52.759    1.275 (x64 benchmark_clang -ryu -64 -small_digits=3)
64:   55.585    1.402 (x64 benchmark_clang -ryu -64 -small_digits=2)
64:   66.844    1.378 (x64 benchmark_clang -ryu -64 -small_digits=1)

Interestingly, floats behave similarly except that "unlimited" digits are slower than -small_digits=7. I'm not sure why this is the case.

32:   42.478    1.558 (x86 benchmark_clang -ryu -32)
32:   33.758    1.145 (x86 benchmark_clang -ryu -32 -small_digits=7)
32:   35.518    1.048 (x86 benchmark_clang -ryu -32 -small_digits=6)
32:   36.035    1.113 (x86 benchmark_clang -ryu -32 -small_digits=5)
32:   37.629    0.999 (x86 benchmark_clang -ryu -32 -small_digits=4)
32:   39.157    1.061 (x86 benchmark_clang -ryu -32 -small_digits=3)
32:   45.113    1.027 (x86 benchmark_clang -ryu -32 -small_digits=2)
32:   55.080    1.227 (x86 benchmark_clang -ryu -32 -small_digits=1)

32:   30.599    1.528 (x64 benchmark_clang -ryu -32)
32:   23.771    0.907 (x64 benchmark_clang -ryu -32 -small_digits=7)
32:   24.571    1.140 (x64 benchmark_clang -ryu -32 -small_digits=6)
32:   25.138    0.864 (x64 benchmark_clang -ryu -32 -small_digits=5)
32:   26.579    1.020 (x64 benchmark_clang -ryu -32 -small_digits=4)
32:   27.664    1.095 (x64 benchmark_clang -ryu -32 -small_digits=3)
32:   30.341    1.405 (x64 benchmark_clang -ryu -32 -small_digits=2)
32:   32.580    1.129 (x64 benchmark_clang -ryu -32 -small_digits=1)

@ulfjack
Owner

ulfjack commented Aug 15, 2018

I was using the int output to generate the graphs in the paper (with gnuplot). I'd prefer to keep that; I'm not sure this can easily be changed in bash or gnuplot.

@StephanTLavavej
Contributor Author

Restored! Also, I looked at gnuplot.template but couldn't figure out how to adapt it to the addition of ryu_output; is there an easy way to do that, or would it tolerate the string field being moved to the end? I think it's useful but of course I don't want to break your graphs. If necessary, I could add yet another option to emit ryu_output.

@ulfjack
Owner

ulfjack commented Aug 16, 2018

I'll take a look.

@StephanTLavavej StephanTLavavej deleted the more_benchmarking branch August 16, 2018 18:13