Is your feature request related to a problem? Please describe.
I am attempting to add a CBOR encoding to an existing project that serves many API resources and supports JSON and Protobuf. API object types are primarily defined as Go struct types. Clients of the API that are written in Go can import and use these type definitions directly. Some clients can't use them because they don't know until runtime exactly what type they'll be handling, but are still able to perform useful work by relying on strict API conventions. Those clients are written to operate on "encoding/json"-compatible map[string]interface{} objects.
The existing encoders don't require that Go strings contain valid UTF-8 sequences, and I would like the proposed CBOR encoding to reject CBOR inputs that contain validity errors. This presents a few challenges:
Encoding a Go string that doesn't contain a valid UTF-8 sequence produces CBOR with validity errors (i.e. we can potentially encode something that we refuse to decode).
We can't change existing stable API types to use cbor.ByteString in place of string, since it would be a breaking change for consumers of the API types.
In dynamic clients, an object may have been encoded to CBOR from a Go struct but is decoded into map[string]interface{}. For compatibility with JSON, untagged CBOR byte strings need to decode into interface{} as string and not []byte.
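The first challenge can be reproduced without any CBOR library. The following is a minimal sketch using only the standard library; `encodeTextString` and `validTextString` are hypothetical helpers written for illustration (they are not part of any CBOR package) and handle only strings shorter than 24 bytes, whose length fits in the initial byte.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// encodeTextString is a hypothetical helper that writes s as a CBOR text
// string (major type 3) without validating it. Only lengths below 24 are
// supported, because those fit in the low 5 bits of the initial byte.
func encodeTextString(s string) []byte {
	if len(s) >= 24 {
		panic("sketch only supports strings shorter than 24 bytes")
	}
	out := []byte{0x60 | byte(len(s))} // 0x60 = major type 3 in the high 3 bits
	return append(out, s...)
}

// validTextString is a hypothetical strict check: it reports whether b is a
// single short CBOR text string whose payload is valid UTF-8.
func validTextString(b []byte) bool {
	if len(b) == 0 || b[0]>>5 != 3 {
		return false
	}
	n := int(b[0] & 0x1f)
	if n >= 24 || len(b) != 1+n {
		return false
	}
	return utf8.Valid(b[1:])
}

func main() {
	good := "héllo"
	bad := "h\xffllo" // 0xff never appears in valid UTF-8

	fmt.Println(validTextString(encodeTextString(good))) // accepted by the strict check
	fmt.Println(validTextString(encodeTextString(bad)))  // encoded, then rejected
}
```

The second string is a perfectly legal Go value, yet a naive text-string encoding of it produces output that a validating decoder refuses.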
Describe the solution you'd like
I thought it would be most effective to group this proposal under one issue since it comes from a single use case. I hope that's OK.
Add a new encode option controlling the CBOR string type of Go strings. The default would be "text string" and preserve the current behavior, and a second value would be "byte string".
Add a new decode option controlling the Go type when decoding a CBOR byte string into interface{}. The default would be "[]byte", preserving the current behavior, but with the ability to select "string".
Add a new decode option controlling whether a CBOR byte string may be decoded into a destination value of type string. The default would preserve the existing behavior.
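To make the three proposed options concrete, here is a rough sketch of how they could interact, written against the standard library only. Every name in it (`encodeString`, `decodeAny`, `decodeString`, and the boolean parameters standing in for options) is hypothetical and chosen for illustration; it is not the library's API, and it handles only short string items (length below 24).

```go
package main

import (
	"errors"
	"fmt"
	"unicode/utf8"
)

// encodeString writes s as a CBOR byte string (major type 2) when asBytes is
// set, otherwise as a text string (major type 3). Hypothetical helper.
func encodeString(s string, asBytes bool) ([]byte, error) {
	if len(s) >= 24 {
		return nil, errors.New("sketch only supports strings shorter than 24 bytes")
	}
	major := byte(3)
	if asBytes {
		major = 2
	}
	return append([]byte{major<<5 | byte(len(s))}, s...), nil
}

// split returns the major type and payload of one short string item and
// rejects text strings that contain invalid UTF-8.
func split(b []byte) (byte, []byte, error) {
	if len(b) == 0 {
		return 0, nil, errors.New("empty input")
	}
	major, n := b[0]>>5, int(b[0]&0x1f)
	if (major != 2 && major != 3) || n >= 24 || len(b) != 1+n {
		return 0, nil, errors.New("unsupported or malformed item")
	}
	if major == 3 && !utf8.Valid(b[1:]) {
		return 0, nil, errors.New("text string contains invalid UTF-8")
	}
	return major, b[1:], nil
}

// decodeAny decodes into interface{}. Byte strings become []byte unless
// byteStringAsString is set, in which case they become string.
func decodeAny(b []byte, byteStringAsString bool) (interface{}, error) {
	major, payload, err := split(b)
	if err != nil {
		return nil, err
	}
	if major == 2 && !byteStringAsString {
		return append([]byte(nil), payload...), nil
	}
	return string(payload), nil
}

// decodeString decodes into a Go string destination. Byte strings are only
// accepted when allowByteString is set.
func decodeString(b []byte, allowByteString bool) (string, error) {
	major, payload, err := split(b)
	if err != nil {
		return "", err
	}
	if major == 2 && !allowByteString {
		return "", errors.New("cannot decode byte string into string")
	}
	return string(payload), nil
}

func main() {
	in := "h\xffllo" // not valid UTF-8
	data, _ := encodeString(in, true)

	v, err := decodeAny(data, true)
	fmt.Printf("%T %v\n", v, err)

	s, err := decodeString(data, true)
	fmt.Println(s == in, err)
}
```

With all three options enabled, a Go string round-trips byte for byte regardless of its UTF-8 validity, while text strings remain subject to strict validation on decode.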
Describe alternatives you've considered
Pre-validating and rejecting input strings containing invalid UTF-8 would make the CBOR encoder refuse to encode certain objects that the other supported encoders can encode.
Sanitizing invalid input strings in advance (as encoding/json does) incurs runtime overhead and isn't required by the existing Protobuf encoding.
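For reference, the sanitizing behavior of encoding/json mentioned above can be observed with a short standard-library program. `roundTripJSON` is a hypothetical helper name used only for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"unicode/utf8"
)

// roundTripJSON marshals s to JSON and unmarshals it back into a string.
func roundTripJSON(s string) (string, error) {
	b, err := json.Marshal(s)
	if err != nil {
		return "", err
	}
	var out string
	if err := json.Unmarshal(b, &out); err != nil {
		return "", err
	}
	return out, nil
}

func main() {
	in := "h\xffllo" // contains a byte that is never valid in UTF-8
	out, err := roundTripJSON(in)
	if err != nil {
		panic(err)
	}
	// encoding/json replaces invalid bytes with U+FFFD when marshaling, so
	// the round trip is lossy: out is valid UTF-8 but differs from in.
	fmt.Println(utf8.ValidString(in), utf8.ValidString(out), in == out)
}
```

The round trip succeeds, but the original bytes are not recoverable, which is the trade-off this alternative would carry over to CBOR.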
Additional context
I'm happy to contribute all changes necessary to support this use case.
Hi @benluddy, thanks for opening this issue! Please feel free to open a PR to add the proposed 3 non-default options! 👍
If my understanding is correct, you want to:
avoid the overhead of checking or sanitizing UTF-8 in a string before encoding it, and
prevent encoding a string containing invalid UTF-8 as a CBOR text string
To achieve this, you want to add 3 options for Go string <--> CBOR byte string:
encode Go string to CBOR byte string
decode CBOR byte string as a Go string when decoding to interface{}
allow decoding CBOR byte string directly to Go string
Currently, the UTF8DecodeInvalid option allows decoding CBOR text strings containing invalid UTF-8 into a Go string (a CBOR text string with invalid UTF-8 is well-formed but not valid).