Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow struct deserialization to ignore unknown fields #44

Closed
erickt opened this issue Mar 20, 2015 · 18 comments
Closed

Allow struct deserialization to ignore unknown fields #44

erickt opened this issue Mar 20, 2015 · 18 comments
Milestone

Comments

@erickt
Copy link
Member

erickt commented Mar 20, 2015

Right now #[derive(Deserialize)] on structs and struct variants raises an error if it's passed an unknown field. We should add an annotation that allows users to just skip these fields.

@Byron
Copy link
Contributor

Byron commented Apr 26, 2015

To me it is highly desirable to allow the serde-client to learn about deserialization/serialization issues. An example I have is a client communicating to a server. The server may at some point start returning fields in its json-encoded stream that are now unknown to the client. The latter should be able to choose how to handle this. Some application might want to fail outright, others might just want to warn about the data-loss. In any case, this kind of information should not be withheld from the serde-client, as it would prevent truly safe client implementations. After all, Rust is safe and fast, in that order, so are Rust programs.

Possible Solutions

Options for the Deserialization/Serialization

It feels like a clean solution to provide an options structure to all functions that perform serialization or deserialization. That way, one can configure the operation as desired. Doing so would help other issues as well, like #65 to help configure 'pretty printing'.

Special Struct Fields

Maybe one could add a struct field which is annotated with something like [serde(catch_unknown)], which will contain all field names that were skipped. That way, a client who receives values from a server may at least report that some fields were added, to possibly trigger a code-update to contain these fields natively in future.

The obvious downside of this is that structures are artificially enlarged, and it feels like orthogonal concepts are mixed here (i.e. the struct has nothing to do with the 'serde' operation, and should not be forced to keep its results).

Example

#[derive(Deserialize)]
struct Foo {
    x: i64,
    #[serde(catch_unknown)]
    __unknown_fields__: Vec<String>,
}

Possible Issues

A field with the proposed custom annotation must not deserialize, nor serialize. Also it must not clash with an existing field within the 'to-be-deserialized' json data. Possibly there are other intricacies that I don't see right now.

@oli-obk
Copy link
Member

oli-obk commented May 8, 2015

toml-rs does this by filling a toml structure with the left-over data. The same could be done for json and xml, since both have a value.

@kirillkh
Copy link

Is there any kind of workaround at this time? For example,
a) deseralize JSON as a map of maps
b) filter out the keys that are not present in the structs
c) then serialize back to string
d) deseralize as struct

I have no clue how to do (b). If this is a viable approach (ignoring the performance, of course), can anyone write the code to do this?

Or is there some other way to hack into the mechanism without complicating things too much?

@oli-obk
Copy link
Member

oli-obk commented Aug 21, 2015

@kirillkh it's probably not that hard to implement directly in serde. You need to modify all the unknown_field functions to not error when some flag is set in the deserializer.

@kirillkh
Copy link

@oli-obk it's not immediately clear how to change this in serde, e.g. line 832 in serde_codegen/src/de.rs

@oli-obk
Copy link
Member

oli-obk commented Aug 22, 2015

the question is whether to solve this for json or or for specific types... If it should be for the types, then you need to add a new serde-attribute and not call unknown_field at the location you showed

@kirillkh
Copy link

This sounds more like a patch than a work around.

@softprops
Copy link
Contributor

+1 for making it a default to have deserialization not fail for unmapped fields for a few reasons.

  1. It's the default behavior of rustc serialize as well as many (de)serializing libraries. This is a behavior that users expect.

  2. A client of a service may intentionally chose to not capture fields during deserialization the client application doesn't care to allocate the extra memory to pass around. without an alternative, it's an unavoidable tax clients have to pay.

  3. It's surprising user experience to have working code break with semver(sioned) services honoring contracts. It shouldn't be a breaking change for a remote service to add fields without changing fields clients have mapped.

@teburd
Copy link

teburd commented Oct 7, 2015

I'm choosing not to capture one of the bajillion fields with my http json client I'm building out at the moment and ran in to this, kind of an annoying issue for me as I'm not really interested in filling out hundreds of possible fields for what I'm doing. For now I can probably hack serde to ignore unknown, I think in the future though this probably should be a deserialization option

@little-arhat
Copy link

^ Same here, is there currently any low boilerplate way to achieve this?

@norcalli
Copy link

I would really like this feature.

@alexanderdean
Copy link

+1 this would be a great add.

@JohnHeitmann
Copy link
Contributor

I have a prototype solution to this working. It needs a few days of work before it's ready for submission. I'll hijack this issue for a mini design review:

#serde[skip_deserializing_unknown]
When on a container, causes any unrecognized fields encountered during deserialization to be silently ignored. This is only meaningful for deserializing from self-describing serialization formats like JSON. Non-self-describing formats like bincode will continue to error if there are extra serialized fields.

Implementation:

  1. Add attrs::ContainerAttrs to support attributes on containers. FieldAttrs could be hijacked to work (it's what my prototype does), but it looks super tacky.
  2. Plumb ContainerAttrs through serde_codegen's Deserialize functions
  3. Add skip_deserializing_unknown attribute (I'm not married to the name; plus, see below).
  4. In addition to the __fieldN fields on the codegen'ed __Field enum, also synthesize __Field::__ignore when skip_deserializing_unknown is true. In all matches on __Field where it errors in the fallthrough case, change it to silently eat __ignore.
  5. ... in particular, visit_map's value visiting should handle __ignore value visiting by coercing the visit() result into one that does the least possible amount of work. Currently my prototype coerces to serde_json::Value and then discards it. I'll make a lighter-weight generic consumer before submission if I can't find one pre-built.

Some formats might want to know we're skipping data (for example a format with length prefixes for entire maps/objects), but I'm keeping it simple for now and adding no new public API.

On the one hand, I don't want to muddy up this submission by advocating that skipping unknown should be the default. On the other hand, I plan on advocating for that afterwards, and there is a small way that will affect this first round: how do I allow for potential skip_deserializing_unknown negation in the future? As far as I can see there are two alternatives:

  1. Make it take a boolean: #[serde(skip_deserializing_unknown=false)]. Simple, but discordant with the other attrs.
  2. Introduce an error_deserializing_unknown attr, and allow at most one of {error_deserializing_unknown, skip_deserializing_unknown} to be specified.

I think I lean towards 1, and adding support for that syntax to the other boolean attrs.

@softprops
Copy link
Contributor

+1 @JohnHeitmann I still can think of a case where we'd want this on as the default. I realized that making it a default is kind of a breaking changing but I can't think of who would want to depend on on this behavior as a default.They'd currently be a in a very fragile state anyway.

I'd rather flip the use of your utility to turn on deserialization errors when new fields are added to structures existing clients are unaware of.

@anguslees
Copy link

Re the name: I'd suggest simply "skip_unknown" (or similar for inverted
sense) since afaics the "deserialisation" part is implied (and lengthy).

On Wed, 6 Jan 2016, 12:11 AM doug tangren notifications@github.com wrote:

+1 @JohnHeitmann https://github.com/JohnHeitmann I still can think of a
case where we'd want this on as the default. I realized that making it a
default is kind of a breaking changing but I can't think of who would want
to depend on on this behavior as a default.They'd currently be a in a very
fragile state anyway.

I'd rather flip the use of your utility to turn on deserialization
errors when new fields are added to structures existing clients are unaware
of.


Reply to this email directly or view it on GitHub
#44 (comment).

Message protected by MailGuard: e-mail anti-virus, anti-spam and content
filtering.
http://www.mailguard.com.au/mg

Report this message as spam
https://console.mailguard.com.au/ras/1Nzqq2xa7s/3LHHWoZMk56isao94qjy56/6.077

@softprops
Copy link
Contributor

@JohnHeitmann is there a repo I could clone to build your changes to test it out. This gh issue is the only thing holding me back from switching all my rustc serialize deps over to serde deps so I'm excited someone is looking into a fix.

@JohnHeitmann
Copy link
Contributor

@softprops https://github.com/JohnHeitmann/serde/tree/v0.6.x

Should be in a testable state.

@erickt
Copy link
Member Author

erickt commented Feb 16, 2016

This has been implemented in #201 for serde 0.7.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Development

No branches or pull requests