Use Cases for a "self describing postcard" #92
Comments
Also: If you think you might want this or want to try this out before it releases, feel free to sound off here as well. I'll keep you in the loop whenever I have something ready to try.
Here's some prior art that was shown to me (a different way of generating the schema): https://docs.rs/serde-reflection/latest/serde_reflection/
It's good to see their Serde Data Model types match fairly 1:1 with what we came up with.
Pros: It doesn't require a second derive.
This issue is more about what to do with that schema, but I should probably review their "Features and Limitations", as we will likely have similar constraints.
I'm not sure if the following thoughts are related, but here's what I faced recently:
We accept client input as JSON, but we store in postcard's format for space efficiency and performance.
What surprised me at first was that, even though This also means that we can only support the lowest common set of features for the formats we support (that's just json and postcard right now). Offering an
We can't make our schema evolve unless we use enums for everything
Given there's no support for That's because we store our data in a persistent store and retrieving it (deserializing it in the process) is not possible if the types have changed in any way.
Can't append new fields to a struct
This is the big reason why we need to version things: we can't add new fields, even if they have default values. I'm looking at alternative formats that would at least allow that. For example
For context, we're accepting client data as JSON, we're exchanging data between nodes and storing data as
Hey @jeromegn - thanks for the input, and particularly reminding me about the limitations around serde tunables like I don't know if I have any answers yet, but these are really good data points, so I appreciate it!
At the moment with the "schema on the side" approach, I do expect "deserializing with schema" to be slower than "deserializing without schema", just because it has to do more. I have no idea the order of magnitude increase tho.
For use cases in a database, it might be possible to do an "upgrade" approach, either as part of a migration or "update on access" to switch to the newest schema when you run into old schemas, but I don't have a great story around that yet.
Mostly I don't want to make postcard "worse" for existing users who are fine with the current limitations, which means I'm sorta limited to either doing something "on the side", or to make a different library "inspired" by postcard which is more flexible, at some perf/size cost. This would bring it more in line with things like cbor or protobuf.
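To make the "update on access" idea a bit more concrete, here is a minimal sketch that combines it with the "enums for everything" versioning mentioned earlier in the thread. All names (`ReadingV1`, `ReadingV2`, `StoredReading`, `read_and_upgrade`) are hypothetical and not part of postcard; a real store would also derive the serde traits and write the upgraded value back to disk.

```rust
/// Hypothetical old shape of a stored record.
#[derive(Debug, Clone, PartialEq)]
pub struct ReadingV1 {
    pub temp: f32,
}

/// Hypothetical new shape: one field was added.
#[derive(Debug, Clone, PartialEq)]
pub struct ReadingV2 {
    pub temp: f32,
    pub humidity: Option<f32>,
}

/// Version wrapper (an assumption for this sketch): every stored value
/// records which shape it was written with, so old data stays readable.
#[derive(Debug, Clone, PartialEq)]
pub enum StoredReading {
    V1(ReadingV1),
    V2(ReadingV2),
}

impl StoredReading {
    /// True if the value is already in the newest shape.
    pub fn is_latest(&self) -> bool {
        matches!(self, StoredReading::V2(_))
    }

    /// Converts any known version into the newest shape.
    pub fn into_latest(self) -> ReadingV2 {
        match self {
            // The field that did not exist in V1 gets an explicit default.
            StoredReading::V1(old) => ReadingV2 { temp: old.temp, humidity: None },
            StoredReading::V2(current) => current,
        }
    }
}

/// "Update on access": return the newest shape and, if the slot still held
/// an old version, overwrite it so later reads skip the conversion.
pub fn read_and_upgrade(slot: &mut StoredReading) -> ReadingV2 {
    let latest = slot.clone().into_latest();
    if !slot.is_latest() {
        *slot = StoredReading::V2(latest.clone());
    }
    latest
}
```

The cost of this approach is that every stored type needs its wrapper from day one, which is exactly the burden described above; a schema "on the side" would aim to make the wrapper unnecessary.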
LoRaWAN
If you're not familiar with LoRaWAN and want to know more about it, TTN has great docs. For this discussion all you really need to know is that data rates are low (~1-22kb/s), and devices communicate directly and exclusively with a gateway.
Currently, packets either comply with the CayenneLPP spec, or are hand crafted on the device and hand parsed by a "codec" on the gateway. The codec is, per that spec, written in JS. Cayenne is pretty nice, but if your application doesn't fit into its mold then the fallback of hand crafting packets is pretty dire.
I see two ways to make this situation better. The first would be to extend the gateway to allow WebAssembly binaries in addition to JS. This would be similar to how JS is used now, except you could use postcard as it currently exists and get around hand-crafting packets. But that's an aside as far as this issue is concerned.
The other solution would be something like a self describing postcard. The gateway would still need to be extended. That's not a big problem though, because the de-facto[1] standard gateway, ChirpStack, was recently rewritten in Rust, which makes adding this feature (once it exists in postcard) nearly trivial. This is similar to Cayenne, but much more flexible. The big downside is that, at least currently, the codec stores no state. The gateway does store state for each device though, so it might be possible to store the description there.
For this application, additional size cost is a concern, but the perf cost is negligible. LoRaWAN devices don't typically uplink data more often than once a minute.
[1] The two biggest public networks are TTN and Helium. TTN uses ChirpStack, and Helium is planning on moving to ChirpStack. AFAIK, most ISPs that deploy a LoRaWAN network also use ChirpStack, but I don't know if that's always true.
A use case we have in practice that was not mentioned here is that we are also looking for a way to just "hash" the schema in a cryptographic manner, so we don't necessarily want to understand the schema. In this way we can give postcard data stronger typing and assert that postcard data has the semantics we expect. JSON is more advantageous in this sense because named fields give a few more guarantees about the semantics of the data.
Thanks for the hint about the I believe that this ultimately requires general support (not wanting to say "serde support"). But I believe serde should provide a way to walk across the AST of a serde structure. Protocol implementations can then provide a schema generator that infers a postcard schema, json schema or (for our use case) a schema "hash". I reckon that in combination with I understand that such a thing has not been accepted into serde because it is hard to get right. It could be a strategy to align this crates
@therealfrauholle for reference, the experimental schema capabilities of postcard here: https://docs.rs/postcard/latest/postcard/experimental/schema/index.html, DOES support
edit: you could send this hash as part of the "header" or "ID" of a message type to ensure coherence.
The largest reason this hasn't stabilized yet is that I haven't decided whether the schema should hash for JUST "structural" typing or "structural AND nominal" typing. As an example:
// A - base case
struct Example {
temp: f32,
humidity: f32,
}
// B - Type name changed
struct ExaMple {
temp: f32,
humidity: f32,
}
// C - fields reordered, but type sequence still the same
struct Example {
humidity: f32,
temp: f32,
}
// D - one field renamed, no semantic or structural change
struct Example {
temperature: f32,
humidity: f32,
}
Which of these structs should be "the same schema"?
If we JUST use structural typing, they are ALL the same (basically:
If we only look at nominal typing of the FIELDS, A + B would be equivalent, but none of the others are.
If we look at ALL nominal typing, NONE would be equivalent.
Chances are, the best option is to pick "nominal and structural of types and fields" as the default, but document how someone could implement something different.
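The three options can be compared side by side with a small sketch. This is not how postcard's experimental schema module works; `StructSchema`, `HashMode`, `schema_hash`, and `examples` are hypothetical names made up for illustration, and `DefaultHasher` is only a stand-in (it is not cryptographic and its output is not guaranteed stable across Rust releases, so a real schema ID would need a fixed, specified hash).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical, simplified schema: a type name plus (field name, field type) pairs.
pub struct StructSchema {
    pub name: &'static str,
    pub fields: Vec<(&'static str, &'static str)>,
}

/// Which parts of the schema feed into the hash.
#[derive(Clone, Copy)]
pub enum HashMode {
    /// Only the sequence of field types.
    Structural,
    /// Field types and field names, but not the type name.
    FieldNominal,
    /// Field types, field names, and the type name.
    FullNominal,
}

pub fn schema_hash(schema: &StructSchema, mode: HashMode) -> u64 {
    // Stand-in hasher, see the caveat in the text above.
    let mut hasher = DefaultHasher::new();
    if matches!(mode, HashMode::FullNominal) {
        schema.name.hash(&mut hasher);
    }
    for (field_name, field_type) in &schema.fields {
        if !matches!(mode, HashMode::Structural) {
            field_name.hash(&mut hasher);
        }
        field_type.hash(&mut hasher);
    }
    hasher.finish()
}

/// The four structs A, B, C, D from the comment above, in that order.
pub fn examples() -> [StructSchema; 4] {
    [
        StructSchema { name: "Example", fields: vec![("temp", "f32"), ("humidity", "f32")] },
        StructSchema { name: "ExaMple", fields: vec![("temp", "f32"), ("humidity", "f32")] },
        StructSchema { name: "Example", fields: vec![("humidity", "f32"), ("temp", "f32")] },
        StructSchema { name: "Example", fields: vec![("temperature", "f32"), ("humidity", "f32")] },
    ]
}
```

With `HashMode::Structural` all four hashes agree, with `HashMode::FieldNominal` only A and B agree, and with `HashMode::FullNominal` all four differ, which matches the three outcomes described above.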
TL;DR: I want to hear from you if you have ever needed postcard to do something ("something" is defined below) that it doesn't today.
Background
Postcard is generally very efficient on the wire, partly because it is not "self describing" - the messages themselves give no hint or expectation on how they are to be deserialized.
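As a rough illustration of what "not self describing" means in practice, the sketch below hand-rolls a tiny encoding that writes only field values, back to back, with no names or type tags. It is a stand-in for the idea, not postcard's API: `encode_pair`, `decode_pair`, `Example`, and `Reordered` are hypothetical names, and only the general principle (the bytes carry no description of themselves) is taken from the text above.

```rust
/// Writes two little-endian f32 values back to back: no names, no type tags.
pub fn encode_pair(first: f32, second: f32) -> [u8; 8] {
    let mut out = [0u8; 8];
    out[..4].copy_from_slice(&first.to_le_bytes());
    out[4..].copy_from_slice(&second.to_le_bytes());
    out
}

/// Reads two little-endian f32 values; it cannot know what they mean.
pub fn decode_pair(bytes: &[u8; 8]) -> (f32, f32) {
    let first = f32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]);
    let second = f32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]);
    (first, second)
}

/// Sender's view of the message.
pub struct Example {
    pub temp: f32,
    pub humidity: f32,
}

/// Receiver's view, built from an out-of-date or mismatched definition.
pub struct Reordered {
    pub humidity: f32,
    pub temp: f32,
}

impl Example {
    pub fn to_bytes(&self) -> [u8; 8] {
        // Fields are written in declaration order.
        encode_pair(self.temp, self.humidity)
    }
}

impl Reordered {
    pub fn from_bytes(bytes: &[u8; 8]) -> Self {
        // Fields are read in declaration order, which differs from the sender's.
        let (humidity, temp) = decode_pair(bytes);
        Reordered { humidity, temp }
    }
}
```

Decoding the bytes of `Example { temp: 21.5, humidity: 40.0 }` through `Reordered::from_bytes` succeeds without any error and yields a humidity of 21.5 and a temp of 40.0. The message is only 8 bytes, but nothing in it lets the receiver notice the mismatch.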
In optimal cases, where both sides of the communication are Rust, and use the same serde representation/type definition (e.g. - they share a common "types" crate that defines the wire types), this is great, and both sides understand each other.
However there are some sub-optimal cases:
Today
I'm currently looking into ways it would be possible to augment postcard data with schema information, so the "sub-optimal" cases listed above could be handled.
To be clear - postcard's core format will not change.
Ideally this would be an "optional add-on" - something you can use contextually, sometimes even after the fact, to enable those suboptimal cases, or as "extra metadata" you could send either with every message, or "on first connection", or "on request", or whatever makes sense for your link budget.
If that isn't possible, I'd probably look into making this a second crate "inspired by postcard", which can be used when a little more overhead is worth the flexibility.
That being said - I'm trying not to focus as much on "how" to make this possible yet, and instead looking at "what is needed". Discussions of "how to do this" are out of scope for this issue's comments.
What I need
Instead of blindly implementing what I THINK would be useful (to me, at least), I'd like to hear from folks who have run into the sub-optimal cases above, or even ones that I didn't list above. This will help me make sure whatever I end up researching/implementing covers the actual needs/gaps in today's postcard.
Ideally, I'd like to keep this discussion public, but I am also willing to have a private chat via email or matrix (contact info on my profile, or ask here), and I am willing to sign/provide an MNDA to discuss any proprietary usage that might benefit from changes like the ones proposed.
Thank you!