Use specific builder functions for encoding. #187

qnikst · 2020-07-05T18:46:10Z

Instead of keeping internal functions for encoding, we can
reuse the much more efficient builders from the bytestring library.
This increases speed, improves memory usage, and allows more
code sharing between the libraries.
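To illustrate the idea, each fixed-width numeric type can be encoded by delegating to its dedicated decimal builder in `Data.ByteString.Builder`. This is only a sketch and not the code from this PR. The names `renderField`, `int8Field`, and so on are hypothetical, and the sketch assumes that a field is a strict `ByteString`, as in cassava.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L
import Data.Int (Int8, Int16, Int32, Int64)
import Data.Word (Word8, Word16, Word32, Word64)

-- Run a Builder and copy the result into one strict ByteString
-- (assumption: fields are strict ByteStrings, as in cassava).
renderField :: B.Builder -> S.ByteString
renderField = L.toStrict . B.toLazyByteString

-- Hypothetical per-type encoders: each one delegates to the matching
-- decimal builder instead of a generic hand-written loop.
int8Field :: Int8 -> S.ByteString
int8Field = renderField . B.int8Dec

int16Field :: Int16 -> S.ByteString
int16Field = renderField . B.int16Dec

int32Field :: Int32 -> S.ByteString
int32Field = renderField . B.int32Dec

int64Field :: Int64 -> S.ByteString
int64Field = renderField . B.int64Dec

word8Field :: Word8 -> S.ByteString
word8Field = renderField . B.word8Dec

word16Field :: Word16 -> S.ByteString
word16Field = renderField . B.word16Dec

word32Field :: Word32 -> S.ByteString
word32Field = renderField . B.word32Dec

word64Field :: Word64 -> S.ByteString
word64Field = renderField . B.word64Dec
```

In cassava, encoders like these would presumably sit behind the `ToField` instances that are benchmarked below as `toField/int8`, `toField/word64`, and so on.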

Specialization pragmas were removed; they didn't provide much
benefit anyway, as the code is recursive and the optimizer does not
inline it. However, the internal functions were not removed, so
cassava users can still use them for semi-efficient encoding of
numeric types when no efficient builders are available
for those types.
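For types without a dedicated builder, a generic fallback of the kind described above might look like the following. `decimalFallback` is a hypothetical name, and this is not cassava's internal code; it is a simple recursive base-10 renderer over any `Integral` type, which is why such a function is only "semi-efficient".

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as L

-- Hypothetical generic fallback (not cassava's actual code): renders any
-- Integral value in base 10 by recursing on quot/rem.
decimalFallback :: Integral a => a -> B.Builder
decimalFallback n
  | i < 0     = B.char7 '-' <> digits (negate i)
  | otherwise = digits i
  where
    -- Widen to Integer first so that negating minBound cannot overflow.
    i = toInteger n
    -- Emit the high-order digits first, then the last digit.
    digits k
      | k < 10    = digit k
      | otherwise = digits (k `quot` 10) <> digit (k `rem` 10)
    -- ASCII '0' is 48, so digit d is the byte 48 + d.
    digit d = B.word8 (48 + fromIntegral d)
```

Because this sketch converts its input to `Integer` before looping, a pragma such as `{-# SPECIALIZE decimalFallback :: Int -> B.Builder #-}` would change little here. The dedicated builders, such as `B.int64Dec`, avoid the generic loop entirely.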

tibbe · 2020-07-05T20:33:16Z

Please add some performance numbers to the pull request.


qnikst · 2020-07-05T21:03:40Z

Before the patch:

./Primitives
benchmarking toField/int8
time                 160.4 ns   (159.7 ns .. 161.6 ns)
                     0.999 R²   (0.996 R² .. 1.000 R²)
mean                 162.5 ns   (160.9 ns .. 167.4 ns)
std dev              8.235 ns   (2.792 ns .. 16.53 ns)
variance introduced by outliers: 71% (severely inflated)

benchmarking toField/int16
time                 216.2 ns   (213.2 ns .. 219.5 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 214.2 ns   (213.2 ns .. 215.6 ns)
std dev              3.857 ns   (2.628 ns .. 6.190 ns)
variance introduced by outliers: 22% (moderately inflated)

benchmarking toField/int32
time                 356.3 ns   (354.8 ns .. 358.8 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 363.8 ns   (359.5 ns .. 371.9 ns)
std dev              19.08 ns   (10.28 ns .. 34.14 ns)
variance introduced by outliers: 70% (severely inflated)

benchmarking toField/int64
time                 587.2 ns   (584.2 ns .. 591.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 585.3 ns   (582.4 ns .. 589.1 ns)
std dev              10.65 ns   (8.311 ns .. 13.65 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking toField/word8
time                 160.7 ns   (160.0 ns .. 161.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 158.7 ns   (157.6 ns .. 159.8 ns)
std dev              3.762 ns   (3.226 ns .. 4.635 ns)
variance introduced by outliers: 34% (moderately inflated)

benchmarking toField/word16
time                 203.6 ns   (202.5 ns .. 204.8 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 202.8 ns   (202.0 ns .. 204.1 ns)
std dev              3.682 ns   (2.742 ns .. 5.898 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking toField/word32
time                 341.9 ns   (337.1 ns .. 348.3 ns)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 337.4 ns   (334.7 ns .. 342.0 ns)
std dev              11.82 ns   (7.062 ns .. 20.90 ns)
variance introduced by outliers: 51% (severely inflated)

benchmarking toField/word64
time                 577.0 ns   (570.7 ns .. 584.1 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 573.5 ns   (570.2 ns .. 577.6 ns)
std dev              12.44 ns   (8.991 ns .. 17.87 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/float
time                 1.000 μs   (994.8 ns .. 1.008 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 999.1 ns   (995.7 ns .. 1.003 μs)
std dev              12.16 ns   (9.843 ns .. 16.90 ns)
variance introduced by outliers: 10% (moderately inflated)

benchmarking toField/double
time                 1.072 μs   (1.068 μs .. 1.078 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.076 μs   (1.072 μs .. 1.081 μs)
std dev              16.04 ns   (12.37 ns .. 24.26 ns)
variance introduced by outliers: 14% (moderately inflated)

After the patch:

benchmarking toField/int8
time                 110.4 ns   (109.4 ns .. 111.3 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 110.0 ns   (109.5 ns .. 110.7 ns)
std dev              2.123 ns   (1.675 ns .. 2.956 ns)
variance introduced by outliers: 26% (moderately inflated)

benchmarking toField/int16
time                 119.3 ns   (116.6 ns .. 124.0 ns)
                     0.992 R²   (0.987 R² .. 0.996 R²)
mean                 119.7 ns   (117.0 ns .. 123.1 ns)
std dev              10.61 ns   (7.894 ns .. 13.13 ns)
variance introduced by outliers: 88% (severely inflated)

benchmarking toField/int32
time                 120.5 ns   (119.9 ns .. 121.1 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 120.3 ns   (119.7 ns .. 121.0 ns)
std dev              2.147 ns   (1.741 ns .. 2.653 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking toField/int64
time                 139.2 ns   (137.4 ns .. 142.1 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 138.6 ns   (137.7 ns .. 140.2 ns)
std dev              4.053 ns   (2.336 ns .. 6.299 ns)
variance introduced by outliers: 44% (moderately inflated)

benchmarking toField/word8
time                 109.2 ns   (108.7 ns .. 109.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 108.9 ns   (108.5 ns .. 110.0 ns)
std dev              2.172 ns   (1.238 ns .. 3.950 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/word16
time                 117.2 ns   (115.5 ns .. 119.8 ns)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 120.7 ns   (118.3 ns .. 124.3 ns)
std dev              9.617 ns   (7.068 ns .. 13.47 ns)
variance introduced by outliers: 86% (severely inflated)

benchmarking toField/word32
time                 117.8 ns   (116.9 ns .. 118.9 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 117.8 ns   (117.2 ns .. 118.6 ns)
std dev              2.343 ns   (1.847 ns .. 3.179 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/word64
time                 135.8 ns   (135.1 ns .. 136.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 137.0 ns   (136.3 ns .. 137.7 ns)
std dev              2.329 ns   (1.909 ns .. 2.920 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking toField/float
time                 1.143 μs   (1.135 μs .. 1.155 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.152 μs   (1.145 μs .. 1.160 μs)
std dev              24.15 ns   (18.47 ns .. 31.17 ns)
variance introduced by outliers: 25% (moderately inflated)

benchmarking toField/double
time                 1.221 μs   (1.208 μs .. 1.242 μs)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 1.213 μs   (1.206 μs .. 1.224 μs)
std dev              29.44 ns   (18.03 ns .. 51.01 ns)
variance introduced by outliers: 31% (moderately inflated)

Should I include these numbers in the patch itself?

Actually, for Float and Double it gives worse results (mean 1.152 μs vs 999.1 ns for float, 1.213 μs vs 1.076 μs for double), so I've excluded them from the patch for now.
I'll investigate whether the builders in bytestring should be improved.
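The mean times reported above can be turned into speedup ratios to see the size of the change at a glance. The sketch below only does arithmetic on those reported means (in nanoseconds); the names `results`, `speedups`, and `regressions` are illustrative.

```haskell
-- Mean times in nanoseconds, copied from the criterion output above:
-- (benchmark, before the patch, after the patch).
results :: [(String, Double, Double)]
results =
  [ ("int8",   162.5, 110.0)
  , ("int16",  214.2, 119.7)
  , ("int32",  363.8, 120.3)
  , ("int64",  585.3, 138.6)
  , ("word8",  158.7, 108.9)
  , ("word16", 202.8, 120.7)
  , ("word32", 337.4, 117.8)
  , ("word64", 573.5, 137.0)
  , ("float",  999.1, 1152)
  , ("double", 1076,  1213)
  ]

-- Speedup = before / after; a value below 1.0 means the patch is slower.
speedups :: [(String, Double)]
speedups = [ (name, before / after) | (name, before, after) <- results ]

-- Benchmarks that got slower with the patch.
regressions :: [String]
regressions = [ name | (name, s) <- speedups, s < 1 ]
```

Evaluating `speedups` gives roughly 4.2x for Int64 and Word64, 3.0x and 2.9x for Int32 and Word32, and 1.5x to 1.8x for the 8- and 16-bit types. Float and Double come out at about 0.87 and 0.89, which are the two regressions.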

tibbe · 2020-07-05T21:36:03Z

I'll let @hvr decide. For me, just having the numbers somewhere is enough (in particular, to make sure someone looked at them).

andreasabel · 2021-11-10T08:40:40Z

@qnikst

> I'll investigate and see if builders in the bytestring should be improved.

What was the conclusion of this investigation?

I think it would make sense to add QuickCheck tests demonstrating that the new implementation matches the behavior of the old one. Do you think you could add such tests?
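Such a test could be written as a plain predicate over an arbitrary input. In the sketch below, `oldEncode` is a hypothetical stand-in for the pre-patch encoder; it assumes the old code produced plain base-10 text, which is what `show` gives for integral types. A real test would compare against cassava's retained internal functions instead.

```haskell
import qualified Data.ByteString as S
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Char8 as C
import qualified Data.ByteString.Lazy as L
import Data.Int (Int64)
import Data.Word (Word64)

-- Hypothetical stand-in for the pre-patch encoder (assumption: the old
-- code rendered integral values as plain base-10 text, like 'show').
oldEncode :: Show a => a -> S.ByteString
oldEncode = C.pack . show

-- Run a Builder into a strict ByteString.
renderField :: B.Builder -> S.ByteString
renderField = L.toStrict . B.toLazyByteString

-- QuickCheck-style properties: the new builder-based output must equal
-- the old output for every input.
prop_int64Dec :: Int64 -> Bool
prop_int64Dec x = renderField (B.int64Dec x) == oldEncode x

prop_word64Dec :: Word64 -> Bool
prop_word64Dec x = renderField (B.word64Dec x) == oldEncode x
```

With QuickCheck available, `quickCheck prop_int64Dec` from `Test.QuickCheck` would check the property on random inputs. Bounds such as `minBound` and `maxBound` are worth listing explicitly as well.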

qnikst · 2021-11-11T10:06:26Z

> What was the conclusion of this investigation?

I don't think I came up with a concrete solution, but I can't recall; a long time has passed.

> I think it would make sense to add QuickCheck tests that demonstrate that the new implementation matches the behavior of the old. Do you think you could add such tests?

Yes! I'll write them on the weekend at the latest.

andreasabel · 2021-11-16T20:30:45Z

@qnikst: QuickCheck tests would be great here!

qnikst force-pushed the master branch from b8b2222 to a2c416a on July 5, 2020 21:03

andreasabel added the performance label Nov 16, 2021

andreasabel added this to the 0.5.2.1 milestone Nov 16, 2021

andreasabel removed this from the 0.5.2.1 milestone Jan 9, 2024
