Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add UnmarshalFirst #398

Merged
merged 1 commit into from
May 6, 2023
Merged

Conversation

immesys
Copy link
Contributor

@immesys immesys commented Apr 28, 2023

Description

In a similar vein to #397, there are some places in our code where we have a CBOR header in front of non-CBOR data. As we started integrating the decoder-based approach, we found there were places where we already had the data as a slice (not a reader), and the data items were sufficiently small that the overhead of constructing a reader over the slice so as to use the decoder became significant [1].

As a separate, but related issue, after we upgraded to 2.5.0, we found there were places in our code that relied upon the old behavior of Unmarshal being OK with the data slice having extraneous data, and it took us a little bit to find the issue as the comment on Unmarshal doesn't explicitly mention the new behavior.

This PR hopes to solve both of those issues by:

  • Adding an UnmarshalFirst (neé UnmarshalPrefix) function that behaves similarly to the ASN.1 and PEM functions I referenced in Add RemainingBytes utility to streaming decoder #397. Namely, it is exactly like Unmarshal except that it is ok with having extraneous data and will return the remaining data to you.
  • Adding a comment to Unmarshal that makes it clear that an ExtraneousDataError is returned if there is data appearing after a valid item. I think this makes it clearer to those observing errors after upgrading to 2.5.0, what the cause is and what they may want to do about it (change data, or use UnmarshalFirst)

[1] This PR also adds two benchmarks that show what I mean about performance difference between Unmarshal and Decode for smaller items. An excerpt of that from my Macbook Pro M1


BenchmarkUnmarshalPrefix/CBOR_bool_to_Go_bool-10                 	27587844	        43.44 ns/op
BenchmarkUnmarshalPrefix/CBOR_positive_int_to_Go_uint64-10       	23871630	        50.57 ns/op
BenchmarkUnmarshalPrefix/CBOR_float_to_Go_float64-10             	23281004	        52.00 ns/op
BenchmarkUnmarshalPrefix/CBOR_bytes_to_Go_[]uint8-10             	12625482	        93.86 ns/op
BenchmarkUnmarshalPrefix/CBOR_map_to_Go_map[string]string-10                	 1000000	      1197 ns/op
BenchmarkUnmarshalPrefixViaDecoder/CBOR_bool_to_Go_bool-10                           	 7343330	       168.1 ns/op
BenchmarkUnmarshalPrefixViaDecoder/CBOR_positive_int_to_Go_uint64-10                 	 7209103	       154.5 ns/op
BenchmarkUnmarshalPrefixViaDecoder/CBOR_float_to_Go_float64-10                       	 6824787	       159.2 ns/op
BenchmarkUnmarshalPrefixViaDecoder/CBOR_bytes_to_Go_[]uint8-10                       	 6131350	       211.1 ns/op
BenchmarkUnmarshalPrefixViaDecoder/CBOR_map_to_Go_map[string]string-10               	  908347	      1273 ns/op

PR Was Proposed and Welcomed in Currently Open Issue

I skipped the "propose in issue" step again, but as before, please feel free to propose a completely different solution and I will change the code (or to reject the idea entirely). No hard feelings.

Checklist (for code PR only, ignore for docs PR)

  • Include unit tests that cover the new code
  • Pass all unit tests
  • Pass all 18 ci linters (golint, gosec, staticcheck, etc.)
  • Sign each commit with your real name and email.
    Last line of each commit message should be in this format:
    Signed-off-by: Firstname Lastname firstname.lastname@example.com
  • Certify the Developer's Certificate of Origin 1.1
    (see next section).

Certify the Developer's Certificate of Origin 1.1

  • By marking this item as completed, I certify
    the Developer Certificate of Origin 1.1.
Developer Certificate of Origin
Version 1.1

Copyright (C) 2004, 2006 The Linux Foundation and its contributors.
660 York Street, Suite 102,
San Francisco, CA 94110 USA

Everyone is permitted to copy and distribute verbatim copies of this
license document, but changing it is not allowed.

Developer's Certificate of Origin 1.1

By making a contribution to this project, I certify that:

(a) The contribution was created in whole or in part by me and I
    have the right to submit it under the open source license
    indicated in the file; or

(b) The contribution is based upon previous work that, to the best
    of my knowledge, is covered under an appropriate open source
    license and I have the right under that license to submit that
    work with modifications, whether created in whole or in part
    by me, under the same open source license (unless I am
    permitted to submit under a different license), as indicated
    in the file; or

(c) The contribution was provided directly to me by some other
    person who certified (a), (b) or (c) and I have not modified
    it.

(d) I understand and agree that this project and the contribution
    are public and that a record of the contribution (including all
    personal information I submit with it, including my sign-off) is
    maintained indefinitely and may be redistributed consistent with
    this project or the open source license(s) involved.

@immesys immesys force-pushed the immesys/unmarshal-prefix branch 2 times, most recently from 5f79917 to d833b05 Compare April 28, 2023 18:35
@immesys
Copy link
Contributor Author

immesys commented Apr 28, 2023

A few rough spots with lint checks and accidentally using go1.20 features, but I think it's good to review now.

Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for contributing this PR! I really appreciate you explaining your use case and including benchmarks!

BTW, UnmarshalPrefix can also be used for decoding CBOR Sequences (RFC 8742) without creating Decoder! 👍

Some minor suggestions:

  • Rename UnmarshalPrefix to UnmarshalFirst.
  • Add UnmarshalFirst function in DecMode interface.
  • A few minor tweaks to comments (e.g. used "CBOR data item").

decode.go Outdated Show resolved Hide resolved
decode.go Outdated Show resolved Hide resolved
decode.go Outdated Show resolved Hide resolved
decode.go Outdated Show resolved Hide resolved
@immesys immesys changed the title Add UnmarshalPrefix Add UnmarshalFirst May 1, 2023
Signed-off-by: Michael Andersen <michael.andersen@antimatter.io>
@immesys
Copy link
Contributor Author

immesys commented May 1, 2023

Thank you for your timely review! I have made the suggested changes, and rebased against master to incorporate the valid-to-wellformed change

Copy link
Owner

@fxamacker fxamacker left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Thanks @immesys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants