Encode structs directly to output buffer. #519

Merged: 1 commit into fxamacker:master on May 4, 2024

@benluddy (Contributor) commented Apr 12, 2024

Description

For variable-length structs (structs with omitempty fields), encoding to the unused capacity at the
end of the output buffer while counting nonempty items is cheaper than using a separate temporary
buffer (no pool interactions and better spatial locality). Copying the items can be avoided entirely
by reserving space in the output buffer for the head if the encoded length of the head can be
predicted before checking optional fields.
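
As a rough illustration of the reserved-head case described above, here is a minimal sketch. It is not the library's code: the `field` type and `appendMap` function are hypothetical, and the sketch assumes fewer than 24 fields, keys shorter than 24 bytes, and values below 24, so every CBOR head fits in one byte.

```go
package main

import "fmt"

// field is a hypothetical stand-in for a struct field tagged omitempty:
// a short text key and a small unsigned value that is empty when zero.
type field struct {
	key string // assumed shorter than 24 bytes
	val uint8  // assumed below 24, so it encodes as a single byte
}

// appendMap appends a CBOR map holding the nonempty fields to dst.
// It assumes fewer than 24 fields, so the map head is always one byte
// and can be reserved before the number of nonempty fields is known.
func appendMap(dst []byte, fields []field) []byte {
	headPos := len(dst)
	dst = append(dst, 0) // reserve one byte for the map head
	count := 0
	for _, f := range fields {
		if f.val == 0 {
			continue // omitempty: skip empty fields
		}
		dst = append(dst, 0x60|byte(len(f.key))) // text string head
		dst = append(dst, f.key...)
		dst = append(dst, f.val) // unsigned integers 0..23 encode as themselves
		count++
	}
	dst[headPos] = 0xa0 | byte(count) // major type 5 (map) with the real count
	return dst
}

func main() {
	out := appendMap(nil, []field{{"a", 1}, {"b", 0}, {"c", 3}})
	fmt.Printf("% x\n", out) // a2 61 61 01 61 63 03
}
```

The items are written once, directly into the output buffer, and only the single head byte is patched afterwards.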

                                                                     │ before.txt  │              after.txt              │
                                                                     │   sec/op    │   sec/op     vs base                │
Marshal/Go_struct_to_CBOR_map                                          1.404µ ± 0%   1.408µ ± 1%        ~ (p=0.170 n=10)
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map      443.8n ± 0%   430.6n ± 0%   -2.99% (p=0.000 n=10)
Marshal/Go_struct_some_fields_all_omitempty_all_empty_to_CBOR_map      181.7n ± 0%   163.5n ± 0%  -10.04% (p=0.000 n=10)
Marshal/Go_struct_many_fields_all_omitempty_all_nonempty_to_CBOR_map   813.5n ± 0%   784.8n ± 0%   -3.53% (p=0.000 n=10)
Marshal/Go_struct_some_fields_all_omitempty_all_nonempty_to_CBOR_map   300.8n ± 0%   275.4n ± 0%   -8.43% (p=0.000 n=10)
Marshal/Go_struct_many_fields_one_omitempty_to_CBOR_map                763.8n ± 0%   727.7n ± 0%   -4.73% (p=0.000 n=10)
Marshal/Go_struct_some_fields_one_omitempty_to_CBOR_map                284.2n ± 0%   257.6n ± 0%   -9.36% (p=0.000 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                                 1.422µ ± 0%   1.414µ ± 1%   -0.56% (p=0.029 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                                1.341µ ± 1%   1.338µ ± 1%        ~ (p=0.340 n=10)
MarshalCanonical/Go_struct_to_CBOR_map                                 386.4n ± 0%   392.4n ± 0%   +1.57% (p=0.000 n=10)
MarshalCanonical/Go_struct_to_CBOR_map_canonical                       386.9n ± 0%   384.8n ± 0%   -0.52% (p=0.001 n=10)
geomean                                                                560.5n        540.4n        -3.59%

                                                                     │ before.txt │              after.txt              │
                                                                     │    B/op    │    B/op     vs base                 │
Marshal/Go_struct_to_CBOR_map                                          208.0 ± 0%   208.0 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map      1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_empty_to_CBOR_map      1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_nonempty_to_CBOR_map   176.0 ± 0%   176.0 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_nonempty_to_CBOR_map   48.00 ± 0%   48.00 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_one_omitempty_to_CBOR_map                160.0 ± 0%   160.0 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_one_omitempty_to_CBOR_map                48.00 ± 0%   48.00 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_keyasint_to_CBOR_map                                 192.0 ± 0%   192.0 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_toarray_to_CBOR_array                                192.0 ± 0%   192.0 ± 0%       ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map                                 64.00 ± 0%   64.00 ± 0%       ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map_canonical                       64.00 ± 0%   64.00 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                                46.18        46.18       +0.00%
¹ all samples are equal

                                                                     │ before.txt │              after.txt              │
                                                                     │ allocs/op  │ allocs/op   vs base                 │
Marshal/Go_struct_to_CBOR_map                                          1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map      1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_empty_to_CBOR_map      1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_nonempty_to_CBOR_map   1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_nonempty_to_CBOR_map   1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_one_omitempty_to_CBOR_map                1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_one_omitempty_to_CBOR_map                1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_keyasint_to_CBOR_map                                 1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
Marshal/Go_struct_toarray_to_CBOR_array                                1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map                                 1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map_canonical                       1.000 ± 0%   1.000 ± 0%       ~ (p=1.000 n=10) ¹
geomean                                                                1.000        1.000       +0.00%
¹ all samples are equal

PR Was Proposed and Welcomed in Currently Open Issue

  • This PR was proposed and welcomed by maintainer(s) in issue #___
  • Closes or Updates Issue #___

Checklist (for code PR only, ignore for docs PR)

  • Include unit tests that cover the new code
  • Pass all unit tests
  • Pass all lint checks in CI (goimports, gosec, staticcheck, etc.)
  • Sign each commit with your real name and email.
    Last line of each commit message should be in this format:
    Signed-off-by: Firstname Lastname firstname.lastname@example.com
  • Certify the Developer's Certificate of Origin 1.1
    (see next section).

Certify the Developer's Certificate of Origin 1.1

  • By marking this item as completed, I certify
    the Developer Certificate of Origin 1.1.
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@fxamacker (Owner) left a comment

@benluddy Thanks for opening this PR!

In addition to getting the non-empty field count in the first pass, what do you think about also collecting the non-empty fields ([]*reflect.Value) in the same pass? That way we don't need to perform the same operations in both passes.

For example:

  • in the first pass, get the non-empty field count kvcount, and also create and populate fvs []*reflect.Value, a slice of the non-empty field values, on the stack.
  • in the second pass, encode field i only if fvs[i] != nil.

Benchmarks show more improvement when updated with these changes.

Thoughts?
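
The two-pass idea in this comment could look roughly like the sketch below. It illustrates the suggestion only, not the code that was eventually merged; the `example` struct, `isEmpty`, `nonEmptyFields`, and `describe` are hypothetical names, and the second pass builds a string instead of emitting CBOR to keep the example short.

```go
package main

import (
	"fmt"
	"reflect"
)

// example is a hypothetical struct; every field is treated as omitempty.
type example struct {
	Name  string
	Count int
	Tags  []string
}

// isEmpty is a simplified emptiness check, assumed for this sketch.
func isEmpty(v reflect.Value) bool {
	switch v.Kind() {
	case reflect.Slice, reflect.Map, reflect.String, reflect.Array:
		return v.Len() == 0
	}
	return v.IsZero()
}

// nonEmptyFields is the first pass: it records a pointer to each nonempty
// field value (nil for empty fields) and counts the nonempty ones.
func nonEmptyFields(v reflect.Value) (fvs []*reflect.Value, kvcount int) {
	fvs = make([]*reflect.Value, v.NumField())
	for i := 0; i < v.NumField(); i++ {
		fv := v.Field(i)
		if isEmpty(fv) {
			continue
		}
		fvs[i] = &fv
		kvcount++
	}
	return fvs, kvcount
}

// describe is the second pass: it emits the count first, then only the
// fields recorded as nonempty, without checking emptiness again.
func describe(x interface{}) string {
	v := reflect.ValueOf(x)
	fvs, kvcount := nonEmptyFields(v)
	s := fmt.Sprintf("map(%d)", kvcount)
	for i, fv := range fvs {
		if fv == nil {
			continue
		}
		s += fmt.Sprintf(" %s=%v", v.Type().Field(i).Name, fv.Interface())
	}
	return s
}

func main() {
	fmt.Println(describe(example{Name: "x"})) // map(1) Name=x
}
```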

@fxamacker added this to the v2.7.0 milestone Apr 22, 2024
@benluddy force-pushed the struct-encode-directly branch 2 times, most recently from 2a344fa to 020398e, April 23, 2024 01:52

@benluddy (Contributor, Author) commented

Instead of making two passes, it now encodes the items to the output buffer while counting, encodes the head at the end, and uses excess capacity in the output buffer to swap the positions of the encoded head and the encoded items. This turned out to be faster. Then I realized that you'll usually have a variable-length struct whose head encodes to the same number of bytes regardless of the number of items (e.g. any struct with fewer than 24 fields). In that case, it reserves space in the output buffer for the head, encodes the items, then overwrites the head bytes in the output buffer at the end, once it knows the actual map size.
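
The observation about structs with fewer than 24 fields follows from how CBOR sizes a head: an argument below 24 is packed into the initial byte. A small sketch of that check, where `headLen` and `fixedHeadLen` are hypothetical helper names rather than functions from the library:

```go
package main

import "fmt"

// headLen returns the number of bytes in a CBOR head whose argument is n
// (RFC 8949, section 3). Arguments below 24 fit in the initial byte.
func headLen(n uint64) int {
	switch {
	case n < 24:
		return 1
	case n <= 0xff:
		return 2
	case n <= 0xffff:
		return 3
	case n <= 0xffffffff:
		return 5
	default:
		return 9
	}
}

// fixedHeadLen reports whether every item count from minItems to maxItems
// yields a head of the same length. headLen never decreases as n grows,
// so comparing the two endpoints is enough.
func fixedHeadLen(minItems, maxItems uint64) bool {
	return headLen(minItems) == headLen(maxItems)
}

func main() {
	fmt.Println(fixedHeadLen(0, 23)) // true: the head is always 1 byte
	fmt.Println(fixedHeadLen(0, 24)) // false: 24 items need a 2-byte head
}
```

When the check holds, the head's position and size are known up front, so only its contents need to be filled in after the items are encoded.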

@fxamacker (Owner) left a comment

@benluddy Thanks for updating this PR! The one pass approach sounds good! 👍

I have some suggestions to simplify the code for your consideration:

  • have a utility function to return length of encoded map head with element count
  • only cache maxHeadLen in encodingStructType
  • in encodeStruct(), we can:
    • reserve bytes of maxHeadLen before encoding elements
    • overwrite the reserved bytes with the real head after elements are encoded
    • if real head len < max head len, shift encoded elements to the left and truncate underlying buffer

Thoughts?
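
The reserve, overwrite, and shift steps suggested above might be sketched as follows. This is an assumption-laden illustration, not the merged implementation: `appendMapHead` and `encodeStruct` here are hypothetical, each item is passed in as an already-encoded key/value pair (nil when the field is empty), and the maximum head length is recomputed on every call rather than cached.

```go
package main

import "fmt"

// appendMapHead appends the CBOR head for a map with n entries.
func appendMapHead(dst []byte, n uint64) []byte {
	const mapType = 0xa0 // major type 5
	switch {
	case n < 24:
		return append(dst, mapType|byte(n))
	case n <= 0xff:
		return append(dst, mapType|24, byte(n))
	case n <= 0xffff:
		return append(dst, mapType|25, byte(n>>8), byte(n))
	case n <= 0xffffffff:
		return append(dst, mapType|26, byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	default:
		return append(dst, mapType|27, byte(n>>56), byte(n>>48), byte(n>>40),
			byte(n>>32), byte(n>>24), byte(n>>16), byte(n>>8), byte(n))
	}
}

// encodeStruct appends a map of the non-nil items to dst.
func encodeStruct(dst []byte, items [][]byte) []byte {
	var scratch [9]byte // large enough for any head
	maxHeadLen := len(appendMapHead(scratch[:0], uint64(len(items))))

	start := len(dst)
	dst = append(dst, scratch[:maxHeadLen]...) // reserve maxHeadLen bytes
	count := 0
	for _, it := range items {
		if it == nil {
			continue
		}
		dst = append(dst, it...)
		count++
	}

	// Overwrite the reserved bytes with the real head.
	head := appendMapHead(scratch[:0], uint64(count))
	copy(dst[start:], head)
	if len(head) < maxHeadLen {
		// Shift the encoded items left over the unused reserved bytes and
		// truncate. copy is safe for overlapping slices.
		n := copy(dst[start+len(head):], dst[start+maxHeadLen:])
		dst = dst[:start+len(head)+n]
	}
	return dst
}

func main() {
	items := make([][]byte, 30) // 30 fields: the largest possible head is 2 bytes
	items[0] = []byte{0x01, 0x02}
	fmt.Printf("% x\n", encodeStruct(nil, items)) // a1 01 02
}
```

With this shape the scratch space is bounded by the largest head (9 bytes), independent of how large the encoded items are.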

@benluddy (Contributor, Author) commented

> @benluddy Thanks for updating this PR! The one pass approach sounds good! 👍
>
> I have some suggestions to simplify the code for your consideration:
>
> * have a utility function to return length of encoded map head with element count
> * only cache `maxHeadLen` in `encodingStructType`
> * in `encodeStruct()`, we can:
>   * reserve bytes of `maxHeadLen` before encoding elements
>   * overwrite the reserved bytes with the real head after elements are encoded
>   * if real head len < max head len, shift encoded elements to the left and truncate underlying buffer
>
> Thoughts?

Absolutely! For some reason I was afraid of overlapping copies, but they are clearly safe according to the spec (https://go.dev/ref/spec#Appending_and_copying_slices). I'll implement your suggestions and rerun the benchmarks. Thanks!
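
For reference, the overlapping-copy behavior mentioned here can be seen in a tiny sketch. The `shiftLeft` helper is a hypothetical name used only for illustration:

```go
package main

import "fmt"

// shiftLeft moves buf[from:] so that it begins at index to (with to <= from)
// and returns the shortened slice. The source and destination ranges overlap,
// which the built-in copy is specified to handle.
func shiftLeft(buf []byte, to, from int) []byte {
	n := copy(buf[to:], buf[from:])
	return buf[:to+n]
}

func main() {
	buf := []byte("h__abc") // "__" stands for two unused reserved bytes
	fmt.Println(string(shiftLeft(buf, 1, 3))) // habc
}
```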

Signed-off-by: Ben Luddy <bluddy@redhat.com>
@benluddy force-pushed the struct-encode-directly branch from 020398e to d981dec, April 29, 2024 15:50

@benluddy (Contributor, Author) commented

@fxamacker I just pushed those changes. Benchmarks look good! As expected, it's a bit faster on the interesting cases by avoiding the extra copies, and the worst-case scratch buffer space needed is only a few bytes instead of being proportional to the encoded size of the entire map:

                                                                     │  prev.txt   │              next.txt              │
                                                                     │   sec/op    │   sec/op     vs base               │
Marshal/Go_struct_to_CBOR_map                                          1.403µ ± 1%   1.408µ ± 1%       ~ (p=0.672 n=10)
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map      425.5n ± 0%   430.6n ± 0%  +1.20% (p=0.000 n=10)
Marshal/Go_struct_some_fields_all_omitempty_all_empty_to_CBOR_map      163.5n ± 0%   163.5n ± 0%       ~ (p=0.283 n=10)
Marshal/Go_struct_many_fields_all_omitempty_all_nonempty_to_CBOR_map   800.4n ± 0%   784.8n ± 0%  -1.94% (p=0.000 n=10)
Marshal/Go_struct_some_fields_all_omitempty_all_nonempty_to_CBOR_map   278.7n ± 0%   275.4n ± 0%  -1.15% (p=0.000 n=10)
Marshal/Go_struct_many_fields_one_omitempty_to_CBOR_map                730.9n ± 0%   727.7n ± 0%  -0.43% (p=0.000 n=10)
Marshal/Go_struct_some_fields_one_omitempty_to_CBOR_map                259.3n ± 0%   257.6n ± 0%  -0.64% (p=0.000 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                                 1.414µ ± 1%   1.414µ ± 1%       ~ (p=0.445 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                                1.352µ ± 1%   1.338µ ± 1%  -1.07% (p=0.007 n=10)
MarshalCanonical/Go_struct_to_CBOR_map                                 392.5n ± 0%   392.4n ± 0%       ~ (p=0.514 n=10)
MarshalCanonical/Go_struct_to_CBOR_map_canonical                       393.7n ± 0%   384.8n ± 0%  -2.25% (p=0.001 n=10)
geomean                                                                543.3n        540.4n       -0.54%

One benchmark case appeared to regress, but I'm convinced I missed an interfering background process during that run. Re-running that case gave results on par with the previous implementation:

                                                                     │    0.txt    │                1.txt                 │
                                                                     │   sec/op    │   sec/op     vs base                 │
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map      425.5n ± 0%   425.9n ± 0%  +0.09% (p=0.015 n=10)

@benluddy requested a review from fxamacker April 29, 2024 16:01

@fxamacker (Owner) left a comment

Thanks @benluddy for updating this PR and sharing benchmarks! 👍 LGTM!

@fxamacker merged commit 28a8572 into fxamacker:master May 4, 2024
17 checks passed
fxamacker added a commit that referenced this pull request May 5, 2024
This commit removes encodeFixedLengthStruct() and reuses
encodeStruct() to simplify code.

Previously, encodeStruct() used an extra buffer to encode elements
in order to get the actual encoded element count.  To avoid this
overhead, encodeFixedLengthStruct() was created to encode fixed-length
structs (structs without any "omitempty" fields), since the
encoded element count is always known in that case.

With PR #519, encodeStruct() no longer uses an extra buffer, and
encodeFixedLengthStruct() is no longer necessary.