feature: Transparent encoding and decoding of Go strings to and from CBOR byte strings #446

benluddy · 2023-12-07T20:47:15Z

Is your feature request related to a problem? Please describe.

I am attempting to add a CBOR encoding to an existing project that serves many API resources and supports JSON and Protobuf. API object types are primarily defined as Go struct types. Clients of the API that are written in Go can import and use these type definitions directly. Some clients can't use them because they don't know until runtime exactly what type they'll be handling, but are still able to perform useful work by relying on strict API conventions. Those clients are written to operate on "encoding/json"-compatible map[string]interface{} objects.

The existing encoders don't require that Go strings contain valid UTF-8 sequences, and I would like the proposed CBOR encoding to reject CBOR inputs that contain validity errors. This presents a few challenges:

Encoding a Go string that doesn't contain a valid UTF-8 sequence produces CBOR with validity errors (i.e. we can potentially encode something that we refuse to decode).
We can't change existing stable API types to use cbor.ByteString in place of string, since it would be a breaking change for consumers of the API types.
In dynamic clients, an object may have been encoded to CBOR from a Go struct but is decoded into map[string]interface{}. For compatibility with JSON, untagged CBOR byte strings need to decode into interface{} as string and not []byte.

Describe the solution you'd like

I thought it would be most effective to group this proposal under one issue since it comes from a single use case. I hope that's OK.

Add a new encode option controlling the CBOR string type of Go strings. The default would be "text string" and preserve the current behavior, and a second value would be "byte string".
Add a new decode option controlling the Go type when decoding a CBOR byte string into interface{}. The default would be "[]byte", preserving the current behavior, but with the ability to select "string".
Add a new decode option controlling whether or not to allow a CBOR byte string to be decoded into a destination value of type string. By default this can preserve the existing behavior.

Describe alternatives you've considered

Pre-validating and rejecting input strings containing invalid UTF-8 would make the CBOR encoder refuse to encode certain objects that the other supported encoders can encode.
Sanitizing invalid input strings in advance (as encoding/json does) incurs runtime overhead and isn't required by the existing Protobuf encoding.

Additional context

I'm happy to contribute all changes necessary to support this use case.

The text was updated successfully, but these errors were encountered:

fxamacker · 2023-12-11T03:25:05Z

Hi @benluddy, thanks for opening this issue! Please feel free to open a PR to add the proposed 3 non-default options! 👍

If my understanding is correct, you want to:

avoid overhead of checking/sanitizing UTF-8 string before encoding it, and
prevent encoding string containing invalid UTF-8 to CBOR text string

To achieve this, you want to add 3 options for Go string <--> CBOR byte string:

encode Go string to CBOR byte string
decode CBOR byte string as a Go string when decoding to interface{}
allow decoding CBOR byte string directly to Go string

Currently, the UTF8DecodeInvalid option allows decoding CBOR text strings containing invalid UTF-8 to Go string (invalid UTF-8 strings are well-formed but not valid).

Thanks again!

dinhxuanvu · 2024-02-02T15:20:54Z

This feature has been completed.
/close

fxamacker · 2024-02-18T20:00:59Z

Closed by #465

benluddy mentioned this issue Dec 8, 2023

feature: Expected later base64 encoding tag #449

Closed

fxamacker assigned benluddy Dec 11, 2023

fxamacker added the enhancement New feature or request label Dec 11, 2023

benluddy mentioned this issue Jan 5, 2024

New options for encoding Go strings to and from CBOR byte strings #465

Merged

8 tasks

fxamacker closed this as completed Feb 18, 2024

shanebishop mentioned this issue Feb 28, 2024

question: how to correctly decode CBOR-encoded data that was encoded with cbor.StringToByteString #500

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feature: Transparent encoding and decoding of Go strings to and from CBOR byte strings #446

feature: Transparent encoding and decoding of Go strings to and from CBOR byte strings #446

benluddy commented Dec 7, 2023

fxamacker commented Dec 11, 2023 •

edited

Loading

dinhxuanvu commented Feb 2, 2024

fxamacker commented Feb 18, 2024

feature: Transparent encoding and decoding of Go strings to and from CBOR byte strings #446

feature: Transparent encoding and decoding of Go strings to and from CBOR byte strings #446

Comments

benluddy commented Dec 7, 2023

fxamacker commented Dec 11, 2023 • edited Loading

dinhxuanvu commented Feb 2, 2024

fxamacker commented Feb 18, 2024

fxamacker commented Dec 11, 2023 •

edited

Loading