Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use specific builder functions for encoding. #187

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

qnikst
Copy link

@qnikst qnikst commented Jul 5, 2020

Instead of keeping internal function for decoding we can
reuse much more efficient builders from the bytestring library.
It increases speed and improves memory usage and allows more
code sharing between the libraries.

Specialization pragmas were removed, they didn't provide much
benefit anyway as the code is recursive and optimizer does not
inline it. However internal functions were not removed so the
user of cassava can use them for semi-efficient numeric types
encoding in case if there are no efficient builders available
directly for those types.

@tibbe
Copy link
Collaborator

tibbe commented Jul 5, 2020

Please add some performance numbers to the pull request.

Instead of keeping internal function for decoding we can
reuse much more efficient builders from the bytestring library.
It increases speed and improves memory usage and allows more
code sharing between the libraries.

Specialization pragmas were removed, they didn't provide much
benefit anyway as the code is recursive and optimizer does not
inline it. However internal functions were not removed so the
user of cassava can use them for semi-efficient numeric types
encoding in case if there are no efficient builders available
directly for those types.
@qnikst
Copy link
Author

qnikst commented Jul 5, 2020

before the patch:

./Primitives
benchmarking toField/int8
time                 160.4 ns   (159.7 ns .. 161.6 ns)
                     0.999 R²   (0.996 R² .. 1.000 R²)
mean                 162.5 ns   (160.9 ns .. 167.4 ns)
std dev              8.235 ns   (2.792 ns .. 16.53 ns)
variance introduced by outliers: 71% (severely inflated)

benchmarking toField/int16
time                 216.2 ns   (213.2 ns .. 219.5 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 214.2 ns   (213.2 ns .. 215.6 ns)
std dev              3.857 ns   (2.628 ns .. 6.190 ns)
variance introduced by outliers: 22% (moderately inflated)

benchmarking toField/int32
time                 356.3 ns   (354.8 ns .. 358.8 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 363.8 ns   (359.5 ns .. 371.9 ns)
std dev              19.08 ns   (10.28 ns .. 34.14 ns)
variance introduced by outliers: 70% (severely inflated)

benchmarking toField/int64
time                 587.2 ns   (584.2 ns .. 591.0 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 585.3 ns   (582.4 ns .. 589.1 ns)
std dev              10.65 ns   (8.311 ns .. 13.65 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking toField/word8
time                 160.7 ns   (160.0 ns .. 161.1 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 158.7 ns   (157.6 ns .. 159.8 ns)
std dev              3.762 ns   (3.226 ns .. 4.635 ns)
variance introduced by outliers: 34% (moderately inflated)

benchmarking toField/word16
time                 203.6 ns   (202.5 ns .. 204.8 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 202.8 ns   (202.0 ns .. 204.1 ns)
std dev              3.682 ns   (2.742 ns .. 5.898 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking toField/word32
time                 341.9 ns   (337.1 ns .. 348.3 ns)
                     0.998 R²   (0.996 R² .. 1.000 R²)
mean                 337.4 ns   (334.7 ns .. 342.0 ns)
std dev              11.82 ns   (7.062 ns .. 20.90 ns)
variance introduced by outliers: 51% (severely inflated)

benchmarking toField/word64
time                 577.0 ns   (570.7 ns .. 584.1 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 573.5 ns   (570.2 ns .. 577.6 ns)
std dev              12.44 ns   (8.991 ns .. 17.87 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/float
time                 1.000 μs   (994.8 ns .. 1.008 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 999.1 ns   (995.7 ns .. 1.003 μs)
std dev              12.16 ns   (9.843 ns .. 16.90 ns)
variance introduced by outliers: 10% (moderately inflated)

benchmarking toField/double
time                 1.072 μs   (1.068 μs .. 1.078 μs)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 1.076 μs   (1.072 μs .. 1.081 μs)
std dev              16.04 ns   (12.37 ns .. 24.26 ns)
variance introduced by outliers: 14% (moderately inflated)

After the patch:

benchmarking toField/int8
time                 110.4 ns   (109.4 ns .. 111.3 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 110.0 ns   (109.5 ns .. 110.7 ns)
std dev              2.123 ns   (1.675 ns .. 2.956 ns)
variance introduced by outliers: 26% (moderately inflated)

benchmarking toField/int16
time                 119.3 ns   (116.6 ns .. 124.0 ns)
                     0.992 R²   (0.987 R² .. 0.996 R²)
mean                 119.7 ns   (117.0 ns .. 123.1 ns)
std dev              10.61 ns   (7.894 ns .. 13.13 ns)
variance introduced by outliers: 88% (severely inflated)

benchmarking toField/int32
time                 120.5 ns   (119.9 ns .. 121.1 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 120.3 ns   (119.7 ns .. 121.0 ns)
std dev              2.147 ns   (1.741 ns .. 2.653 ns)
variance introduced by outliers: 23% (moderately inflated)

benchmarking toField/int64
time                 139.2 ns   (137.4 ns .. 142.1 ns)
                     0.999 R²   (0.998 R² .. 1.000 R²)
mean                 138.6 ns   (137.7 ns .. 140.2 ns)
std dev              4.053 ns   (2.336 ns .. 6.299 ns)
variance introduced by outliers: 44% (moderately inflated)

benchmarking toField/word8
time                 109.2 ns   (108.7 ns .. 109.6 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 108.9 ns   (108.5 ns .. 110.0 ns)
std dev              2.172 ns   (1.238 ns .. 3.950 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/word16
time                 117.2 ns   (115.5 ns .. 119.8 ns)
                     0.996 R²   (0.993 R² .. 0.998 R²)
mean                 120.7 ns   (118.3 ns .. 124.3 ns)
std dev              9.617 ns   (7.068 ns .. 13.47 ns)
variance introduced by outliers: 86% (severely inflated)

benchmarking toField/word32
time                 117.8 ns   (116.9 ns .. 118.9 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 117.8 ns   (117.2 ns .. 118.6 ns)
std dev              2.343 ns   (1.847 ns .. 3.179 ns)
variance introduced by outliers: 27% (moderately inflated)

benchmarking toField/word64
time                 135.8 ns   (135.1 ns .. 136.8 ns)
                     1.000 R²   (1.000 R² .. 1.000 R²)
mean                 137.0 ns   (136.3 ns .. 137.7 ns)
std dev              2.329 ns   (1.909 ns .. 2.920 ns)
variance introduced by outliers: 21% (moderately inflated)

benchmarking toField/float
time                 1.143 μs   (1.135 μs .. 1.155 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.152 μs   (1.145 μs .. 1.160 μs)
std dev              24.15 ns   (18.47 ns .. 31.17 ns)
variance introduced by outliers: 25% (moderately inflated)

benchmarking toField/double
time                 1.221 μs   (1.208 μs .. 1.242 μs)
                     0.999 R²   (0.997 R² .. 1.000 R²)
mean                 1.213 μs   (1.206 μs .. 1.224 μs)
std dev              29.44 ns   (18.03 ns .. 51.01 ns)
variance introduced by outliers: 31% (moderately inflated)

should I include that in the patch content itself?

Actually for float, and double it gives worse result. So I've exclude it from the patch for now.
I'll investigate and see if builders in the bytestring should be improved.

@tibbe
Copy link
Collaborator

tibbe commented Jul 5, 2020

I'll let @hvr decide. For me just having the numbers somewhere is good (in particular to make sure someone looked at them).

@andreasabel
Copy link
Member

@qnikst

I'll investigate and see if builders in the bytestring should be improved.

What was the conclusion of this investigation?

I think it would make sense to add QuickCheck tests that demonstrate that the new implementation matches the behavior of the old. Do you think you could add such tests?

@qnikst
Copy link
Author

qnikst commented Nov 11, 2021

What was the conclusion of this investigation?

I think I didn't come with a concrete solution, but I can't recall since long time passed.

I think it would make sense to add QuickCheck tests that demonstrate that the new implementation matches the behavior of the old. Do you think you could add such tests?

Yes! I'll do will make them on the weekend in the worst case.

@andreasabel
Copy link
Member

@qnikst : QuickCheck tests would be great here!

@andreasabel andreasabel added this to the 0.5.2.1 milestone Nov 16, 2021
@andreasabel andreasabel removed this from the 0.5.2.1 milestone Jan 9, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants