Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

bug: Two different keys that decode into the same field are not considered duplicate keys when decoding into struct #488

Closed
benluddy opened this issue Feb 8, 2024 · 3 comments
Assignees
Labels
bug Something isn't working

Comments

@benluddy
Copy link
Contributor

benluddy commented Feb 8, 2024

What version of fxamacker/cbor are you using?

Current HEAD of master, 2da2d67.

Does this issue reproduce with the latest release?

Yes.

What OS and CPU architecture are you using (go env)?

go env Output
$ go env

What did you do?

  1. Create a DecMode with DupMapKey set to DupMapKeyEnforcedAPF.
  2. Unmarshal to struct a CBOR map containing two keys, both of which match the same struct field.

https://go.dev/play/p/jHzyzyfWCVD

What did you expect to see?

The caller of Unmarshal should have an indication that a map key in the input was effectively ignored. More specifically, if two map keys match the same struct field, they should be treated as equal for the purposes of detecting duplicate keys. This is not compatible with the current behavior of DupMapKeyEnforcedAPF, but since there is no way to tell that this has happened (other than doing a parallel decode into map), it seems reasonable to me to modify DupMapKeyEnforcedAPF rather than adding a new DupMapKeyMode.

I mentioned this in #432 in the context of case-insensitivity, but there may be other ways to reproduce it (possibly a separate issue, but I included in the playground link an example that relies on a keyasint field tag matching either an integer key or a string key containing the int encoded to base-ten).

What did you see instead?

No DupMapKeyError is returned from Unmarshal.

I understand that this raises interesting questions about consistency on the meaning of DupMapKeyError depending on whether the destination object has a struct type or a map type. I think some behavior differences are expected there. Today, for example, decoding into struct already forbids CBOR keys that are neither strings nor integers, even though the same keys may be permitted when decoding a map.

I'm especially interested in this because my use case is always configured to reject inputs with duplicate map keys. Populating the map used to detect duplicates (

cbor/decode.go

Line 1902 in 2da2d67

mapKeys = make(map[interface{}]struct{}, len(structType.fields))
) makes many short-lived heap allocations. For the common case where all map keys uniquely match one struct field, all of those allocations can be avoided if we can rely on the existing foundFldIdx []bool (or similar) to recognize duplicates. I've explored that optimization, and my (application-specific) benchmarks suggest that allocations during decode can be reduced by more than 50% along with a more than 1.6x speedup.

@benluddy benluddy changed the title bug: Two keys that decode into the same field are not considered duplicate keys when decoding into struct bug: Two different keys that decode into the same field are not considered duplicate keys when decoding into struct Feb 8, 2024
@fxamacker
Copy link
Owner

@benluddy Thanks for reporting this bug and sharing your insights!

The caller of Unmarshal should have an indication that a map key in the input was effectively ignored. More specifically, if two map keys match the same struct field, they should be treated as equal for the purposes of detecting duplicate keys.

I agree.

I understand that this raises interesting questions about consistency on the meaning of DupMapKeyError depending on whether the destination object has a struct type or a map type.

Yeah, this is about key equivalency for specific data models in section 2.2 of RFC 8949.

User can:

  • tweak destination type when decoding to any, or
  • specify destination type to decode to

So destination type is specific data model, and currently duplicate map key detection evaluates key based on key's destination type.

For struct destination, if different map keys can be decoded to the same field, those keys should be considered duplicate.

Also, potential json/v2 handles duplicate names depending on destination object.

it seems reasonable to me to modify DupMapKeyEnforcedAPF rather than adding a new DupMapKeyMode.

I agree! Please feel free to open PR to modify DupMapKeyEnforceAPF to resolve this issue.

I'm especially interested in this because my use case is always configured to reject inputs with duplicate map keys.
For the common case where all map keys uniquely match one struct field, all of those allocations can be avoided if we can rely on the existing foundFldIdx []bool (or similar) to recognize duplicates.

mapKeys is also used to detect duplicate map keys for fields not specified in struct destination. So maybe it is still needed even if decoder rejects keys that can be decoded to the same field.

Do you also reject unknown struct fields ExtraDecErrorUnknownField? If the use case is to reject unknown fields and no duplicate keys, the input data should be a subset for struct destination type, and I think mapKeys can be removed. Maybe we can special case this combination and take advantage of the optimization.

Thoughts?

@fxamacker fxamacker added the bug Something isn't working label Feb 12, 2024
@benluddy
Copy link
Contributor Author

I agree! Please feel free to open PR to modify DupMapKeyEnforceAPF to resolve this issue.

Great! I'll get a PR open soon.

Do you also reject unknown struct fields ExtraDecErrorUnknownField? If the use case is to reject unknown fields and no duplicate keys, the input data should be a subset for struct destination type, and I think mapKeys can be removed. Maybe we can special case this combination and take advantage of the optimization.

We sometimes reject and sometimes allow with a warning, but either way unknown fields must be detected. I think the optimization would be safe in general if mapKeys is used only for keys that don't match a struct field. You'd get a duplicate key error either when decoding into a field for the second time, or when two unknown fields are equal. If most or all keys map to fields in common inputs, most or all of the allocations related to mapKeys can be avoided. The existing limitation on which CBOR key types may be mapped to a struct field helps to rule out surprising cases, like having two identical strings with one enclosed in a tag.

@fxamacker
Copy link
Owner

Closed by #492

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

2 participants