Spec Notes #1

DavidBuchanan314 · 2025-02-11T11:39:40Z

Some notes on this version of the spec: https://github.com/jamesmunns/postcard-spec-ng/blob/7aa6c310bee2a24912aa4974c7910449d2c9372c/spec.typ (which I think corresponds to the pdf at https://postcard.rs/spec-6529e24.pdf )

These are just my "first impressions" as I read through.

"The Serde Data Model" could probably do with some kind of link/citation (I'm unfamiliar with it, as a non-rustacean)

Unsized Integer Encoding

Typo of unsigned?

The tables with two "Type" column headings are a little confusing, and I'm unsure what this one in particular is trying to convey (I'm sure I'll figure it out as I read more of the spec, this is just my first impression) (future edit: "wire format" might be a better heading for the second column?)

Section 2.2 explains why Zigzag encoding was chosen. This is nice context to have but I feel it gets in the way of the core spec itself. Maybe consider a dedicated appendix (or even a blog post?) that talks about design choices and goals? I'm getting the vibe that compactness is a design priority over e.g. ser/des perf on superscalar CPUs but this is never explicitly stated.
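For readers new to the idea, here is a minimal sketch of zigzag encoding in Rust. It assumes the usual protobuf-style mapping (0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...). It is an illustration, not code taken from postcard, and the function names are hypothetical. Small negative numbers become small unsigned numbers, which then fit in short varints.

```rust
/// Hypothetical helper: zigzag-encode an i32 so that small magnitudes
/// (positive or negative) map to small unsigned values.
fn zigzag_encode_i32(n: i32) -> u32 {
    // `n << 1` moves the magnitude up one bit; `n >> 31` is an arithmetic
    // shift that gives all-ones for negative n and all-zeros otherwise.
    ((n << 1) ^ (n >> 31)) as u32
}

/// Hypothetical helper: invert the mapping. The low bit carries the sign.
fn zigzag_decode_i32(z: u32) -> i32 {
    ((z >> 1) as i32) ^ -((z & 1) as i32)
}
```

The sign moves into the lowest bit, so the magnitude alone decides how many varint bytes are needed.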

re: Canonicalization - why bother defining canonical-ness if postcard doesn't require it? Perhaps there is some unspoken "you SHOULD emit canonical encodings but MUST be prepared to accept non-canonical encodings?".

Schema Key Hashing

With "we don’t claim resistance to malicious events" in mind, I can still see a source of accidental hash collisions. The problem is that some of your primes also correspond to common ASCII characters, e.g. bytearray -> 0x65 -> e. So path "care" with an empty schema would hash the same as path "car" with a schema of a single bytearray. A simple solution could be to hash in a byte that's not valid in either path or schema between the two, i.e.:

key = hash(PATH) + hash([0x00]) + hash(T::SCHEMA)
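The sketch below makes the collision concrete. It models the key as FNV-1a 64 over the path bytes followed by the schema bytes. The model rests on two assumptions that do not match the actual postcard-schema encoding. First, the schema is flattened to a sequence of one-byte tags. Second, 0x65 stands in for a bytearray tag, as in the example above. The pair used is ("car", (bytearray, bytearray)) vs ("care", (bytearray)), so no empty schema is needed.

```rust
/// FNV-1a 64 with the standard offset basis and prime.
fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325;
    for &b in bytes {
        h ^= b as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3);
    }
    h
}

/// ASSUMPTION: hypothetical one-byte tag for a bytearray schema element
/// (0x65, which is also ASCII 'e'). Real schemas are not encoded this simply.
const BYTEARRAY_TAG: u8 = 0x65;

/// Hypothetical key: path bytes immediately followed by schema bytes.
fn key_plain(path: &str, schema: &[u8]) -> u64 {
    let mut buf = path.as_bytes().to_vec();
    buf.extend_from_slice(schema);
    fnv1a64(&buf)
}

/// Hypothetical key: same, but with a 0x00 separator between the two parts.
fn key_separated(path: &str, schema: &[u8]) -> u64 {
    let mut buf = path.as_bytes().to_vec();
    buf.push(0x00); // a byte that cannot appear in a path
    buf.extend_from_slice(schema);
    fnv1a64(&buf)
}
```

Without the separator, both pairs hash the byte string "caree", so the keys are identical by construction. With a 0x00 byte between path and schema, the inputs become "car\0ee" and "care\0e", which no longer alias.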

Also, why not fnv32? Simpler to implement on tiny systems, and should still give you plenty of resistance to non-malicious collisions.
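For comparison, here is a sketch of the 32-bit FNV-1a variant, with offset basis 0x811c9dc5 and prime 0x01000193. It needs only a 32-bit XOR and a 32-bit wrapping multiply per byte, and it yields 4-byte keys instead of 8. The function name is illustrative.

```rust
/// FNV-1a, 32-bit variant (illustrative helper name).
fn fnv1a32(bytes: &[u8]) -> u32 {
    let mut h: u32 = 0x811c_9dc5; // 32-bit offset basis
    for &b in bytes {
        h ^= b as u32;
        h = h.wrapping_mul(0x0100_0193); // 32-bit FNV prime
    }
    h
}
```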

Postcard-RPC

Not looked at yet! (but as a general thought, it might be useful to have some way to have each end communicate their max supported isize/usize, up-front)
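Here is one possible shape for that up-front exchange, purely as a hypothetical sketch. The Limits type and its methods are invented for illustration and are not part of postcard-rpc. Each side would send the bit width of its usize. A sender could then check whether a length or index it is about to encode would overflow the peer's usize, for example a value above u32::MAX sent to a 32-bit MCU.

```rust
/// HYPOTHETICAL handshake payload: the bit width of the sender's usize.
/// Not part of postcard-rpc; shown only to illustrate the idea.
#[derive(Clone, Copy, Debug)]
struct Limits {
    usize_bits: u32,
}

impl Limits {
    /// Limits of the platform this code runs on.
    fn local() -> Self {
        Limits { usize_bits: usize::BITS }
    }

    /// Largest usize value the peer could represent.
    fn max_usize(self) -> u64 {
        if self.usize_bits >= 64 {
            u64::MAX
        } else {
            (1u64 << self.usize_bits) - 1
        }
    }

    /// Whether `value` would fit in the peer's usize.
    fn usize_fits(self, value: u64) -> bool {
        value <= self.max_usize()
    }
}
```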


DavidBuchanan314 · 2025-02-11T11:52:44Z

On the hashing front, maybe also consider CRC: it should be a higher-quality hash than FNV, and possibly faster on systems without a hardware multiplier. (But maybe I'm overthinking this: is the hashing expected to be done during compilation rather than at runtime?)
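Here is a sketch of what the CRC option could look like. It uses the common CRC-32 with IEEE/zlib parameters: reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF. The checksum is computed bit by bit using only shifts and XORs, so no hardware multiplier is needed. It is written as a const fn so it could also run at compile time. This is a generic CRC-32, not something postcard currently uses.

```rust
/// Bitwise (table-free) CRC-32, IEEE/zlib parameters.
/// Only shifts and XORs; `const fn` so it can also be evaluated at compile time.
const fn crc32(bytes: &[u8]) -> u32 {
    let mut crc: u32 = 0xFFFF_FFFF; // initial value
    let mut i = 0;
    while i < bytes.len() {
        crc ^= bytes[i] as u32;
        let mut bit = 0;
        while bit < 8 {
            // If the low bit is set, shift and fold in the reflected polynomial.
            if crc & 1 != 0 {
                crc = (crc >> 1) ^ 0xEDB8_8320;
            } else {
                crc >>= 1;
            }
            bit += 1;
        }
        i += 1;
    }
    !crc // final XOR with 0xFFFFFFFF
}
```

A table-driven version trades a 256-entry u32 table (1 KiB) for speed. The bitwise form keeps code size minimal.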

jamesmunns · 2025-02-11T12:16:15Z

which I think corresponds to the pdf

I think you're off by a commit (the pdf has the shortcode of the commit it was built from). I think the only changes were to the readme and license tho.

The tables with two "Type" column headings are a little confusing, and I'm unsure what this one in particular is trying to convey (I'm sure I'll figure it out as I read more of the spec, this is just my first impression) (future edit: "wire format" might be a better heading for the second column?)

Yep, that's a good call. It's a comparison of the "rust type" vs the "wire type".

Edit: Fixed

"The Serde Data Model" could probably do with some kind of link/citation (I'm unfamiliar with it, as a non-rustacean)

Yep, the current pdf version only has one of the pages from https://postcard.jamesmunns.com, which does discuss and cite the Serde Data Model. I'm publishing this before it's really "done", and started as just messing around with Typst :)

Unsized Integer Encoding

Yep, that's a typo. Feel free to PR, or I'll fix.

Edit: fixed

Section 2.2 explains why Zigzag encoding was chosen. This is nice context to have but I feel it gets in the way of the core spec itself. Maybe consider a dedicated appendix (or even a blog post?) that talks about design choices and goals?

That's a good idea. Right now the current version of the spec serves as both, could make sense to extract normative vs context items for the next (this) revision of the spec.

I'm getting the vibe that compactness is a design priority over e.g. ser/des perf on superscalar CPUs but this is never explicitly stated.

Yes, generally! Size on the wire, and convenience for MCU targets is a pretty big priority. I did find that the "protobuf style varints" optimize pretty damn well, and beat the other techniques that I tried when benched on desktop systems in perf. My guess is that smaller data was enough to beat the "branchiness" introduced in decoding varints.
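For reference, here is a minimal sketch of a protobuf-style varint (unsigned LEB128) for u64. Each byte carries seven data bits, least-significant group first, and the high bit is set on every byte except the last. Values below 128 take a single byte, which is where the size savings come from. The decoder loop, with its data-dependent exit, is the "branchiness" mentioned above. Function names are illustrative, and this is not postcard's implementation.

```rust
/// Encode `n` as an unsigned LEB128 varint, appending to `out`.
fn varint_encode_u64(mut n: u64, out: &mut Vec<u8>) {
    loop {
        let byte = (n & 0x7F) as u8;
        n >>= 7;
        if n == 0 {
            out.push(byte); // last byte: continuation bit clear
            return;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Decode a varint from the front of `buf`, returning (value, bytes_used).
/// Returns None on truncated input or if the value would overflow u64.
fn varint_decode_u64(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &byte) in buf.iter().enumerate().take(10) {
        let chunk = (byte & 0x7F) as u64;
        // The 10th byte may only contribute bit 63.
        if i == 9 && chunk > 1 {
            return None;
        }
        value |= chunk << (7 * i as u32);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}
```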

re: Canonicalization - why bother defining canonical-ness if postcard doesn't require it? Perhaps there is some unspoken "you SHOULD emit canonical encodings but MUST be prepared to accept non-canonical encodings?".

That's a good way of putting it: the table shows that postcard WILL accept it.
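Here is a sketch of "SHOULD emit canonical, MUST accept non-canonical" for varints, assuming LEB128-style encoding. The lenient decoder accepts [0x80, 0x00] as 0, even though the canonical (shortest) encoding is [0x00]. A complete encoding is canonical exactly when it is the single byte 0x00 or its last byte is non-zero, because a trailing 0x00 group adds no value bits. The names are hypothetical, and the decoder omits overflow checks for brevity.

```rust
/// Hypothetical lenient decoder: accepts any well-formed varint, including
/// non-canonical ones with redundant zero groups. NOTE: no overflow check.
fn decode_varint_lenient(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        value |= ((b & 0x7F) as u64) << (7 * i as u32);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None
}

/// Hypothetical check: is a complete varint encoding in shortest form?
fn is_canonical_varint(encoding: &[u8]) -> bool {
    match encoding {
        [] => false,
        [0x00] => true,
        [.., last] => *last != 0x00, // a trailing 0x00 group is redundant
    }
}
```

An encoder following the SHOULD would only ever produce encodings for which is_canonical_varint returns true, while the decoder still accepts the rest.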

With "we don’t claim resistance to malicious events" in mind, I can still see a source of accidental hash collisions. The problem is that some of your primes also correspond to common ASCII characters, e.g. bytearray -> 0x65 -> e. So path "care" with an empty schema would hash the same as path "car" with a schema of a single bytearray. A simple solution could be to hash in a byte that's not valid in either path or schema between the two, i.e.:

Hmm, there's not a way to have an "empty" schema, it's "one of N variants" basically. However we could have an empty path. I'm not sure if I check for that anywhere. Adding a fixed character in the middle (potentially an unused prime) could be a neat idea.

In GENERAL, this is mostly just a way to have pseudorandom, deterministic, message IDs. Like I said, I don't generally worry too much about malicious constructions, anyone with enough knowledge to create a key collision could just as easily create a valid payload anyway. Messages are still checked for successful deserialization (though I don't ensure that the whole message is consumed, and there are chances for misinterpretation if a byte sequence can be successfully decoded as multiple types), but my "threat model" is: "the user updated the client or the server, but not the other, and we don't want to accidentally misparse messages".
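The sketch below illustrates the "decoded as multiple types" risk. It uses simplified postcard-like rules, which are assumptions made here: a u8 is one raw byte, and a byte string is a one-byte length prefix followed by that many bytes. The same three bytes decode successfully as either type. Only a check that the whole message was consumed tells them apart. The helpers take_u8, take_bytes and exact are hypothetical.

```rust
/// Hypothetical: decode one raw byte as a u8, returning (value, rest).
fn take_u8(buf: &[u8]) -> Option<(u8, &[u8])> {
    let (&first, rest) = buf.split_first()?;
    Some((first, rest))
}

/// Hypothetical: decode a length-prefixed byte string, returning (bytes, rest).
/// ASSUMPTION: the length fits in a single varint byte (< 128).
fn take_bytes(buf: &[u8]) -> Option<(&[u8], &[u8])> {
    let (&len, rest) = buf.split_first()?;
    if len & 0x80 != 0 {
        return None; // multi-byte lengths are not handled in this sketch
    }
    let len = len as usize;
    if rest.len() < len {
        return None;
    }
    Some(rest.split_at(len))
}

/// Hypothetical strict wrapper: succeed only if no bytes are left over.
fn exact<T>(r: Option<(T, &[u8])>) -> Option<T> {
    match r {
        Some((v, [])) => Some(v),
        _ => None,
    }
}
```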

Also, why not fnv32? ... maybe also consider crc

The hashing is generally done exclusively at compile time. I honestly chose fnv1a64 because it was easy to implement as a const fn in Rust. In a typical setup, this hashing will never be done at runtime, and the keys are something like a compile-time generated blob. I actually originally tried to use blake2s, but ended up bailing because I didn't want to write a const-fn version of it. I'm not against changing the hash algo, but it's mostly "good enough" for me. I'd definitely accept a PR in the future that changed it for the better, if better could be quantified :D
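Here is a sketch of FNV-1a 64 as a Rust const fn, showing why it is convenient. The whole algorithm is a loop of XOR and wrapping multiply. The loop is written with while because for loops are not allowed in const fn. EXAMPLE_KEY is a hypothetical key computed over a path string only; the real key also covers the schema.

```rust
/// FNV-1a 64 as a `const fn`, so keys can be computed at compile time.
const fn fnv1a64(bytes: &[u8]) -> u64 {
    let mut h: u64 = 0xcbf2_9ce4_8422_2325; // offset basis
    let mut i = 0;
    while i < bytes.len() {
        h ^= bytes[i] as u64;
        h = h.wrapping_mul(0x0000_0100_0000_01b3); // FNV prime
        i += 1;
    }
    h
}

/// HYPOTHETICAL key over a made-up path, evaluated entirely at compile time.
const EXAMPLE_KEY: u64 = fnv1a64(b"example/path");
```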

postcard-rpc also checks at compile time the set of all keys, and picks the smallest possible version without a collision, so on the device side, these could be shrunk from 8 byte blobs to 4, 2, or 1 byte blobs, "perfect hash function" style.
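The "smallest collision-free width" selection could look roughly like the following sketch. The function is hypothetical and is not postcard-rpc's code. It tries widths of 1, 2, 4 and 8 bytes and returns the first width at which the truncated keys are still distinct. Keeping the low-order bytes is an assumption made for illustration.

```rust
/// HYPOTHETICAL: smallest width (in bytes) at which all keys stay distinct.
/// ASSUMPTION: truncation keeps the low-order bytes of each key.
fn smallest_unique_width(keys: &[u64]) -> Option<usize> {
    for width in [1usize, 2, 4, 8] {
        let mask: u64 = if width == 8 {
            u64::MAX
        } else {
            (1u64 << (width * 8)) - 1
        };
        let mut truncated: Vec<u64> = keys.iter().map(|k| k & mask).collect();
        truncated.sort_unstable();
        truncated.dedup();
        if truncated.len() == keys.len() {
            return Some(width);
        }
    }
    None // identical full 8-byte keys: a genuine collision
}
```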

DavidBuchanan314 · 2025-02-11T12:25:56Z

there's not a way to have an "empty" schema

You'd run into the same problem with, e.g. ("car", (bytearray, bytearray)) vs ("care", (bytearray)). Not hugely likely to happen by accident but I could still see it happening, and it's cheap to fix.

jamesmunns · 2025-02-11T12:28:15Z

it's cheap to fix

Yes, in software, though it does definitely make it a breaking wire change for any existing postcard-rpc users, which I don't love, but I will definitely include it in the next breaking change.

jamesmunns · 2025-02-11T12:38:53Z

Opened jamesmunns/postcard#217 to track the schema hash collision issue.

@DavidBuchanan314

Address comments from @DavidBuchanan314 on #1

DavidBuchanan314 · 2025-02-11T13:25:46Z

This is very much a nitpick/bikeshed/whatever, but would you consider an alternate notation for the hashing?

hash([0x01, 0x02]) + hash([0x03, 0x04])

key = hash(PATH) + hash(T::SCHEMA)

I find these a little counter-intuitive, because if you add the numerical hash values you won't get the right result. I could imagine someone doing a from-spec implementation getting confused by this.

I'd rather see something like:

key = hash(PATH + T::SCHEMA)

or

key = hash(PATH || T::SCHEMA)

(where || is a common concatenation symbol - but perhaps not overly well known to general audiences)

Another possibility is to think of hashing as a function that updates a state, like so:

state1 = hash(state0, string)

Thus,

hash(S0, [0x01, 0x02, 0x03, 0x04]) == hash(hash(S0, [0x01, 0x02]), [0x03, 0x04])
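The state-update view maps directly onto FNV-1a, which is already a fold over bytes. In this sketch (names hypothetical), update(state, data) absorbs data into state. As a result, hash(S0, a || b) equals update(update(S0, a), b) exactly. Adding two independently computed hashes gives a different value, which is the confusion described above.

```rust
/// FNV-1a 64 starting state (offset basis), i.e. S0.
const FNV_OFFSET: u64 = 0xcbf2_9ce4_8422_2325;
/// FNV-1a 64 prime.
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Hypothetical state-update form of FNV-1a 64: fold `data` into `state`.
fn update(mut state: u64, data: &[u8]) -> u64 {
    for &b in data {
        state ^= b as u64;
        state = state.wrapping_mul(FNV_PRIME);
    }
    state
}
```

With this framing, key = hash(PATH || T::SCHEMA) reads as update(update(FNV_OFFSET, PATH), T::SCHEMA).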

jamesmunns · 2025-02-11T13:30:38Z

Yeah, I'm likely to change the notation. The feedback has been universally negative on that point :D

jamesmunns mentioned this issue Feb 11, 2025

[postcard-schema] Key hash calculation could suffer collision if path and schema alias jamesmunns/postcard#217


jamesmunns added a commit that referenced this issue Feb 11, 2025

Address review comments

6763142

