Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[postcard-schema] Key hash calculation could suffer collision if path and schema alias #217

Open
jamesmunns opened this issue Feb 11, 2025 · 7 comments
Labels
postcard-schema Related to the postcard-schema crate

Comments

@jamesmunns
Copy link
Owner

Described in jamesmunns/postcard-spec-ng#1 (comment),

Since some of the schema primes fall in the ascii range (and really, many other values could fall in the utf-8 range), it could be possible to get a collision between two Keys, for example since b'e' is the prime used for bytearray, the following two would alias:

("car", (bytearray, bytearray))
("care", (bytearray))

We should consider this to the next breaking change in postcard-schema and postcard-rpc.

Right now in postcard-rpc, this will lead to a compilation failure, as it would detect the two keys collide.

CC @DavidBuchanan314 who discovered this, and @max-heller for thoughts.

@jamesmunns
Copy link
Owner Author

One proposed fix for this is adding a value to be hashed between the two, ideally something that is NEITHER valid utf-8 NOR a valid schema item. We could potentially hash a fragment that is invalid utf-8, and also not a valid schema prime identifier.

@DavidBuchanan314
Copy link

Another mitigation could be to length-prefix the path

@jamesmunns
Copy link
Owner Author

Oh that's a good call. I also meant to make a note that we accept an empty path string, which probably isn't great for some reason I can't think of right now.

@max-heller max-heller added the postcard-schema Related to the postcard-schema crate label Feb 11, 2025
@max-heller
Copy link
Collaborator

Length-prefixing seems like a good solution.

we accept an empty path string, which probably isn't great for some reason I can't think of right now.

I'll note that I use empty paths

@jamesmunns
Copy link
Owner Author

One thing to note about length prefixing: we might need to pick a max length (e.g. u16, u32, u64), or commit to using varint encoding (varint(usize)) for the length. Basically, this is a note to not just blindly use usize as the length and hash the to_le_bytes form.

@thaodt
Copy link

thaodt commented Feb 13, 2025

One thing to note about length prefixing: we might need to pick a max length (e.g. u16, u32, u64), or commit to using varint encoding (varint(usize)) for the length. Basically, this is a note to not just blindly use usize as the length and hash the to_le_bytes form.

I'd like to go with using varint encoding for length prefixing. Fixed length can waste space for small values (e.g., u32 uses 4 bytes to store "testing" despite its 7-byte length).

@jamesmunns
Copy link
Owner Author

jamesmunns commented Feb 13, 2025

@thaodt this is for the key hashing, so we wouldn't store this anywhere. It would just be fed into the hasher. TYPICALLY this would only be done at a compile time anyway. The issue is: we need all systems to agree on the input to the hashing process, so for the length of "testing", that could either be [0x07] (varint), [0x07, 0x00, 0x00, 0x00] (u32), or [0x07, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00] (u64).

In all cases, the output hash will be the same size, and that is what is stored. It's just about how many bytes we feed into the hasher.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
postcard-schema Related to the postcard-schema crate
Projects
None yet
Development

No branches or pull requests

4 participants