Encode structs directly to output buffer. #519
@benluddy Thanks for opening this PR!
In addition to getting the non-empty field count in the first pass, what do you think about also getting the non-empty fields ([]*reflect.Value) in the same pass? That way we don't need to perform the same operations in both passes.
For example:
- in the first pass, get the non-empty field count kvcount, and also create and populate a slice of non-empty field reflect values, fvs []*reflect.Value, on the stack.
- in the second pass, encode a field only if fvs[i] != nil.
Benchmarks show more improvement when updated with these changes.
Thoughts?
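The two-pass idea above can be sketched roughly as follows. This is only an illustration, not the library's code: the collectNonEmpty helper and the sample type are hypothetical, reflect.Value.IsZero stands in for whatever emptiness rule the encoder really applies to omitempty fields, and the slice here is allocated with make rather than kept on the stack.

```go
package main

import (
	"fmt"
	"reflect"
)

// sample is a hypothetical struct used only for this illustration.
type sample struct {
	X     int
	Y     int
	Label string
}

// collectNonEmpty visits every field of struct s once. It returns the number
// of non-empty fields (kvcount) and a slice with one entry per field, where
// a nil entry marks a field that the second pass should skip.
// Assumption: "empty" means reflect.Value.IsZero, which is a simplification.
func collectNonEmpty(s any) (int, []*reflect.Value) {
	v := reflect.ValueOf(s)
	fvs := make([]*reflect.Value, v.NumField())
	kvcount := 0
	for i := 0; i < v.NumField(); i++ {
		fv := v.Field(i) // new variable per iteration, so &fv is distinct
		if fv.IsZero() {
			continue
		}
		fvs[i] = &fv
		kvcount++
	}
	return kvcount, fvs
}

func main() {
	kvcount, fvs := collectNonEmpty(sample{X: 1, Label: "a"})
	fmt.Println("non-empty fields:", kvcount)
	// Second pass: only touch the fields recorded as non-empty.
	for i, fv := range fvs {
		if fv != nil {
			fmt.Println(i, fv.Interface())
		}
	}
}
```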
2a344fa to 020398e
Instead of making two passes, it now encodes the items to the output buffer while counting, encodes the head at the end, and uses excess capacity in the output buffer to swap the positions of the encoded head and the encoded items. This turned out to be faster. Then I realized that you'll usually have a variable-length struct whose head encodes to the same number of bytes regardless of the number of items (e.g. any struct with fewer than 24 fields). In that case, it reserves space in the output buffer for the head, encodes the items, then overwrites the head bytes in the output buffer at the end, once it knows the actual map size.
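To make the "fewer than 24 fields" remark concrete, here is a small sketch of how the encoded length of a CBOR map head depends on the pair count, following the head encoding in RFC 8949. The function names mapHeadLen and headLenIsFixed are made up for this example and are not taken from the PR.

```go
package main

import "fmt"

// mapHeadLen returns how many bytes the CBOR head of a map with n
// key/value pairs occupies: counts below 24 fit in the initial byte,
// larger counts need a 1, 2, 4, or 8 byte argument after it.
func mapHeadLen(n uint64) int {
	switch {
	case n < 24:
		return 1
	case n <= 0xff:
		return 2
	case n <= 0xffff:
		return 3
	case n <= 0xffffffff:
		return 5
	default:
		return 9
	}
}

// headLenIsFixed reports whether every pair count between minPairs and
// maxPairs encodes to a head of the same length. Because the head length
// never shrinks as the count grows, comparing the two ends is enough.
func headLenIsFixed(minPairs, maxPairs uint64) bool {
	return mapHeadLen(minPairs) == mapHeadLen(maxPairs)
}

func main() {
	fmt.Println(mapHeadLen(0), mapHeadLen(23), mapHeadLen(24), mapHeadLen(256))
	// A struct with 23 fields, all optional: the head is always 1 byte.
	fmt.Println(headLenIsFixed(0, 23))
	// Between 20 and 30 encoded fields the head may be 1 or 2 bytes.
	fmt.Println(headLenIsFixed(20, 30))
}
```

When headLenIsFixed holds for a struct type, the head can be written into reserved space after the items without moving any item bytes.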
@benluddy Thanks for updating this PR! The one pass approach sounds good! 👍
I have some suggestions to simplify the code for your consideration:
- have a utility function to return length of encoded map head with element count
- only cache maxHeadLen in encodingStructType
- in encodeStruct(), we can:
  - reserve bytes of maxHeadLen before encoding elements
  - overwrite the reserved bytes with the real head after elements are encoded
  - if real head len < max head len, shift encoded elements to the left and truncate underlying buffer
Thoughts?
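The reserve, overwrite, and shift steps suggested above might look roughly like this. It is a simplified sketch under stated assumptions, not the PR's implementation: the field type holds pre-encoded key/value bytes instead of reflect values, appendMapHead is a hypothetical helper, and the maximum head length is computed on each call rather than cached per struct type.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// field is a hypothetical stand-in for a struct field: its key/value pair
// is already encoded, and empty marks an omitempty field with no value.
type field struct {
	encoded []byte
	empty   bool
}

// appendMapHead appends the CBOR head of a map with n pairs (major type 5).
func appendMapHead(b []byte, n uint64) []byte {
	const mapType = 0xa0
	switch {
	case n < 24:
		return append(b, mapType|byte(n))
	case n <= 0xff:
		return append(b, mapType|24, byte(n))
	case n <= 0xffff:
		return binary.BigEndian.AppendUint16(append(b, mapType|25), uint16(n))
	case n <= 0xffffffff:
		return binary.BigEndian.AppendUint32(append(b, mapType|26), uint32(n))
	default:
		return binary.BigEndian.AppendUint64(append(b, mapType|27), n)
	}
}

// encodeStruct appends a CBOR map holding the non-empty fields to buf.
func encodeStruct(buf []byte, fields []field) []byte {
	// Longest head this struct could need (every field present).
	maxHeadLen := len(appendMapHead(nil, uint64(len(fields))))

	// 1. Reserve maxHeadLen bytes before encoding any element.
	start := len(buf)
	buf = append(buf, make([]byte, maxHeadLen)...)

	// 2. Encode elements directly into the output buffer while counting.
	var count uint64
	for _, f := range fields {
		if f.empty {
			continue
		}
		buf = append(buf, f.encoded...)
		count++
	}

	// 3. Overwrite the reserved bytes with the real head. The three-index
	// slice caps the write at maxHeadLen, so append stays in place.
	head := appendMapHead(buf[start:start:start+maxHeadLen], count)

	// 4. If the real head is shorter, shift the elements left and truncate.
	if n := len(head); n < maxHeadLen {
		copy(buf[start+n:], buf[start+maxHeadLen:])
		buf = buf[:len(buf)-(maxHeadLen-n)]
	}
	return buf
}

func main() {
	// 25 optional fields, of which only two are set: the reserved head is
	// 2 bytes, the real head is 1 byte, so the elements shift left by one.
	fields := make([]field, 25)
	for i := range fields {
		fields[i].empty = true
	}
	fields[3] = field{encoded: []byte{0x61, 0x61, 0x01}} // "a": 1
	fields[7] = field{encoded: []byte{0x61, 0x62, 0x02}} // "b": 2
	fmt.Printf("% x\n", encodeStruct(nil, fields))
}
```

The only scratch space this needs is the reserved head, at most 9 bytes, regardless of how large the encoded items are.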
Absolutely! For some reason I was afraid of overlapping copies, but they are clearly safe according to the spec (https://go.dev/ref/spec#Appending_and_copying_slices). I'll implement your suggestions and rerun the benchmarks. Thanks!
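As a quick illustration of the overlapping-copy point, the built-in copy handles a source and destination that share the same backing array, which is what a left shift inside one buffer needs. The shiftLeft helper below is a hypothetical name used only for this example.

```go
package main

import "fmt"

// shiftLeft moves the bytes b[from:] left by the given number of positions
// within the same backing array and returns the shortened slice. Source and
// destination overlap, which the built-in copy is specified to allow.
func shiftLeft(b []byte, from, by int) []byte {
	copy(b[from-by:], b[from:])
	return b[:len(b)-by]
}

func main() {
	// "H" is a 1-byte head, "__" is unused reserved space, "abcdef" are items.
	b := []byte("H__abcdef")
	b = shiftLeft(b, 3, 2)
	fmt.Println(string(b)) // prints "Habcdef"
}
```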
For variable-length structs (structs with omitempty fields), encoding to the unused capacity at the end of the output buffer while counting nonempty items is cheaper than using a separate temporary buffer (no pool interactions and better spatial locality). Copying the items can be avoided entirely by reserving space in the output buffer for the head if the encoded length of the head can be predicted before checking optional fields.

                                                                      │ before.txt  │             after.txt              │
                                                                      │   sec/op    │   sec/op     vs base               │
Marshal/Go_struct_to_CBOR_map                                           1.404µ ± 0%   1.408µ ± 1%         ~ (p=0.170 n=10)
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map       443.8n ± 0%   430.6n ± 0%    -2.99% (p=0.000 n=10)
Marshal/Go_struct_some_fields_all_omitempty_all_empty_to_CBOR_map       181.7n ± 0%   163.5n ± 0%   -10.04% (p=0.000 n=10)
Marshal/Go_struct_many_fields_all_omitempty_all_nonempty_to_CBOR_map    813.5n ± 0%   784.8n ± 0%    -3.53% (p=0.000 n=10)
Marshal/Go_struct_some_fields_all_omitempty_all_nonempty_to_CBOR_map    300.8n ± 0%   275.4n ± 0%    -8.43% (p=0.000 n=10)
Marshal/Go_struct_many_fields_one_omitempty_to_CBOR_map                 763.8n ± 0%   727.7n ± 0%    -4.73% (p=0.000 n=10)
Marshal/Go_struct_some_fields_one_omitempty_to_CBOR_map                 284.2n ± 0%   257.6n ± 0%    -9.36% (p=0.000 n=10)
Marshal/Go_struct_keyasint_to_CBOR_map                                  1.422µ ± 0%   1.414µ ± 1%    -0.56% (p=0.029 n=10)
Marshal/Go_struct_toarray_to_CBOR_array                                 1.341µ ± 1%   1.338µ ± 1%         ~ (p=0.340 n=10)
MarshalCanonical/Go_struct_to_CBOR_map                                  386.4n ± 0%   392.4n ± 0%    +1.57% (p=0.000 n=10)
MarshalCanonical/Go_struct_to_CBOR_map_canonical                        386.9n ± 0%   384.8n ± 0%    -0.52% (p=0.001 n=10)
geomean                                                                 560.5n        540.4n         -3.59%

                                                                      │ before.txt │             after.txt             │
                                                                      │    B/op    │    B/op     vs base               │
Marshal/Go_struct_to_CBOR_map                                           208.0 ± 0%   208.0 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_empty_to_CBOR_map       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_nonempty_to_CBOR_map    176.0 ± 0%   176.0 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_nonempty_to_CBOR_map    48.00 ± 0%   48.00 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_one_omitempty_to_CBOR_map                 160.0 ± 0%   160.0 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_one_omitempty_to_CBOR_map                 48.00 ± 0%   48.00 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_keyasint_to_CBOR_map                                  192.0 ± 0%   192.0 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_toarray_to_CBOR_array                                 192.0 ± 0%   192.0 ± 0%         ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map                                  64.00 ± 0%   64.00 ± 0%         ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map_canonical                        64.00 ± 0%   64.00 ± 0%         ~ (p=1.000 n=10) ¹
geomean                                                                 46.18        46.18         +0.00%
¹ all samples are equal

                                                                      │ before.txt │             after.txt             │
                                                                      │ allocs/op  │ allocs/op   vs base               │
Marshal/Go_struct_to_CBOR_map                                           1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_empty_to_CBOR_map       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_empty_to_CBOR_map       1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_all_omitempty_all_nonempty_to_CBOR_map    1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_all_omitempty_all_nonempty_to_CBOR_map    1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_many_fields_one_omitempty_to_CBOR_map                 1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_some_fields_one_omitempty_to_CBOR_map                 1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_keyasint_to_CBOR_map                                  1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
Marshal/Go_struct_toarray_to_CBOR_array                                 1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map                                  1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
MarshalCanonical/Go_struct_to_CBOR_map_canonical                        1.000 ± 0%   1.000 ± 0%         ~ (p=1.000 n=10) ¹
geomean                                                                 1.000        1.000         +0.00%
¹ all samples are equal

Signed-off-by: Ben Luddy <bluddy@redhat.com>
020398e to d981dec
@fxamacker I just pushed those changes. Benchmarks look good! As expected, it's a bit faster on the interesting cases by avoiding the extra copies, and the worst-case scratch buffer space needed is only a few bytes instead of being proportional to the encoded size of the entire map:
One benchmark case appeared to regress, but I'm convinced I missed an interfering background process during that run. Re-running that case gave results on par with the previous implementation:
Thanks @benluddy for updating this PR and sharing benchmarks! 👍 LGTM!
This commit removes encodeFixedLengthStruct() and reuses encodeStruct() to simplify the code. Previously, encodeStruct() encoded elements into an extra buffer to learn the actual encoded element count. To avoid that overhead, encodeFixedLengthStruct() was created to encode fixed-length structs (structs without any "omitempty" fields), since the encoded element count is always known in that case. With PR #519, encodeStruct() no longer uses an extra buffer, so encodeFixedLengthStruct() isn't necessary.
Description
For variable-length structs (structs with omitempty fields), encoding to the unused capacity at the
end of the output buffer while counting nonempty items is cheaper than using a separate temporary
buffer (no pool interactions and better spatial locality). Copying the items can be avoided entirely
by reserving space in the output buffer for the head if the encoded length of the head can be
predicted before checking optional fields.
PR Was Proposed and Welcomed in Currently Open Issue
Checklist (for code PR only, ignore for docs PR)
Last line of each commit message should be in this format: Signed-off-by: Firstname Lastname <firstname.lastname@example.com> (see next section).
Certify the Developer's Certificate of Origin 1.1