-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Spec Notes #1
Comments
on the hashing front, maybe also consider crc - should be a higher quality hash than fnv and maybe also faster on systems without a hardware multiplier (but maybe I'm over thinking this - is the hashing expected to be done during compilation rather than at runtime?) |
I think you're off by a commit (the pdf has the shortcode of the commit it was built from). I think the only changes were to the readme and license tho.
Yep, that's a good call. It's a comparison of the "rust type" vs the "wire type". Edit: Fixed
Yep, the current pdf version only has one of the pages from https://postcard.jamesmunns.com, which does discuss and cite the Serde Data Model. I'm publishing this before it's really "done", and started as just messing around with Typst :)
Yep, that's a typo. Feel free to PR, or I'll fix. Edit: fixed
That's a good idea. Right now the current version of the spec serves as both, could make sense to extract normative vs context items for the next (this) revision of the spec.
Yes, generally! Size on the wire, and convenience for MCU targets is a pretty big priority. I did find that the "protobuf style varints" optimize pretty damn well, and beat the other techniques that I tried when benched on desktop systems in perf. My guess is that smaller data was enough to beat the "branchiness" introduced in decoding varints.
That's a good way of putting it: the table shows that postcard WILL accept it.
Hmm, there's not a way to have an "empty" schema, it's "one of N variants" basically. However we could have an empty path. I'm not sure if I check for that anywhere. Adding a fixed character in the middle (potentially an unused prime) could be a neat idea. In GENERAL, this is mostly just a way to have pseudorandom, deterministic, message IDs. Like I said, I don't generally worry too much about malicious constructions, anyone with enough knowledge to create a key collision could just as easily create a valid payload anyway. Messages are still checked for successful deserialization (though I don't ensure that the whole message is consumed, and there are chances for misinterpretation if a byte sequence can be successfully decoded as multiple types), but my "threat model" is: "the user updated the client or the server, but not the other, and we don't want to accidentally misparse messages".
The hashing is generally done exclusively at compile time. I honestly chose fnv1a64 because it was easy to implement as a postcard-rpc also checks at compile time the set of all keys, and picks the smallest possible version without a collision, so on the device side, these could be shrunk from 8 byte blobs to 4, 2, or 1 byte blobs, "perfect hash function" style. |
You'd run into the same problem with, e.g. ("car", (bytearray, bytearray)) vs ("care", (bytearray)). Not hugely likely to happen by accident but I could still see it happening, and it's cheap to fix. |
Yes, in software, though it does defintely make it a breaking wire change for any exisiting postcard-rpc users, which I don't love, but I definitely will include it in the next breaking change. |
Opened jamesmunns/postcard#217 to track the schema hash collision issue. |
Address comments from @DavidBuchanan314 on #1
This is very much a nitpick/bikeshed/whatever, but would you consider an alternate notation for the hashing?
I find these a little counter-intuitive, because if you add the numerical hash values you won't get the right result. I could imagine someone doing a from-spec implementation getting confused by this. I'd rather see something like:
or
(where Another possibility is to think of hashing as a function that updates a state, like so:
Thus,
|
Yeah, I'm likely to change the notation. The feedback has been universally negative on that point :D |
Some notes on this version of the spec: https://github.com/jamesmunns/postcard-spec-ng/blob/7aa6c310bee2a24912aa4974c7910449d2c9372c/spec.typ (which I think corresponds to the pdf at https://postcard.rs/spec-6529e24.pdf )
These are just my "first impressions" as I read through.
"The Serde Data Model" could probably do with some kind of link/citation (I'm unfamiliar with it, as a non-rustacean)
Typo of unsigned?
The tables with two "Type" column headings are a little confusing, and I'm unsure what this one in particular is trying to convey (I'm sure I'll figure it out as I read more of the spec, this is just my first impression) (future edit: "wire format" might be a better heading for the second column?)
Section 2.2 explains why Zigzag encoding was chosen. This is nice context to have but I feel it gets in the way of the core spec itself. Maybe consider a dedicated appendix (or even a blog post?) that talks about design choices and goals? I'm getting the vibe that compactness is a design priority over e.g. ser/des perf on superscalar CPUs but this is never explicitly stated.
re: Canonicalization - why bother defining canonical-ness if postcard doesn't require it? Perhaps there is some unspoken "you SHOULD emit canonical encodings but MUST be prepared to accept non-canonical encodings?".
Schema Key Hashing
With "we don’t claim resistance to malicious events" in mind, I can still see a source of accidental hash collisions. The problem is, some of your primes also correspond to common ascii characters, e.g.
bytearray
-> 0x65 ->e
. So path "care" with empty schema would hash to the same as path "car" with schema of a single bytearray. A simple solution could be to hash in a byte that's not valid in either path or schema, between the two, i.e..:Also, why not fnv32? Simpler to implement on tiny systems, and should still give you plenty of resistance to non-malicious collisions.
Postcard-RPC
Not looked at yet! (but as a general thought, it might be useful to have some way to have each end communicate their max supported isize/usize, up-front)
The text was updated successfully, but these errors were encountered: