Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added experimental support for deserializer state #1327

Closed
wants to merge 32 commits into from

Conversation

mitsuhiko
Copy link
Contributor

@mitsuhiko mitsuhiko commented Jun 29, 2018

This implements the proposal in #1325. It's not great yet in performance most likely and i most likely will need to make this state feature optional but i'm generally happy with the concept behind it.

The basic idea is that some state can be associated with the deserializer that a Deserialize can access. Additionally a lot of these wrappers need to forward the state appropriately. Likewise the state is now stored with the internally buffered content as well as otherwise it's impossible to synchronize.

The implementation uses Rc in a vec internally keyed by TypeId. I'm planning on using this to pass path information and out of bounds metadata to wrapper types. The deserializer I'm using this with looks something like this:

impl<'de, T> Deserialize<'de> for Annotated<T>
where
    T: Deserialize<'de>
{
    fn deserialize<D: Deserializer<'de>>(deserializer: D) -> Result<Self, D::Error> {
        let path = deserializer.state().get::<Rc<Path>>().map(|x| x.to_string());
        let mut annotated: Self = match Maybe::deserialize(deserializer)? {
            Maybe::Valid(value) => Annotated::new(value),
            Maybe::Invalid(u) => Annotated::error(format!("unexpected {}", u.0)),
        };
        annotated.path = path;
        Ok(annotated)
    }
}

The state is injected by a wrapping deserializer:

let json = r#"{
    "event_id": "864ee97977bf43ac96d74f7486d138ab",
    "breadcrumbs": {"values":[]}
}"#;
let jd = &mut serde_json::Deserializer::from_str(json);
let e: Annotated<Event> = wrapping_deserialize(jd).unwrap();
assert_eq!(e.value.unwrap().id.path, Some("event_id".to_string()));

Internally the State that is passed around is a copy-on-write like structure. It can be cloned and that either just bumps refcounts or copies a None over. I'm sure this can be optimized further if this approach seems sensible.

@Marwes
Copy link
Contributor

Marwes commented Jun 29, 2018

I have also needed stateful serialization/deserialization so this this seems interesting. Are you aware of my https://travis-ci.org/Marwes/serde_state? This seems potentially simpler though as opposed to the separate traits and therefore derives that my fork requires. The fact that visitors and deserializers must be updated to be state aware seems to be the main issue?

@mitsuhiko
Copy link
Contributor Author

@Marwes yeah. I saw serde_state and that triggered some of that investigation.

With regards to what needs to be updated: since state needs to be kept somewhere anything that involves buffering gets tricky. So if one wants to deserialize into serde-value/serde-json's value then this will obviously break stuff. The internal buffering in content I made state aware.

The only thing that generally needs updating are deserializers wrapping another deserializer. I don't think that's a massive change and with some version checking that implementation can be added only if the serde version depended on has support for that method.

I still don't like some parts of this design (in particular the TLS inspired with is not super great) and it does not pass the tests yet but the approach turned out easier than i thought.

@Marwes
Copy link
Contributor

Marwes commented Jun 29, 2018

The only thing that generally needs updating are deserializers wrapping another deserializer. I don't think that's a massive change and with some version checking that implementation can be added only if the serde version depended on has support for that method.

Hmm, wouldn't say SeqVisitor (used for Vec, BTreeSet etc) also need to be updated

struct SeqVisitor<T $(, $typaram)*> {
? It seems that the visitors generated by the derive needed changing here so I'd imagine the same thing is needed for manually implemented visitors?

@Marwes
Copy link
Contributor

Marwes commented Jun 29, 2018

I'd imagine that technically the value deserializers would also need to be state aware but those could perhaps be ignored since it is probably not very useful https://docs.serde.rs/serde/de/value/index.html

@mitsuhiko
Copy link
Contributor Author

State awareness of value deserializers is not very helpful since they have no place to store the state (on the value). Yes the visitors need updating as well. Depending on how many are out there it might become more or less annoying to update.

I did however so far not notice too many areas outside of serde itself that need updating (and I did not update all there yet, and there are still failing tests).

@mitsuhiko
Copy link
Contributor Author

@Marwes you're right there are more places that need state passing. Trying to hunt them all down but it's quite easy to screw up sadly :(

@mitsuhiko
Copy link
Contributor Author

I need to investigate this still so this is mostly a braindump, but I believe that the content deserializer temporarily passes through the value deserializer in a few places and that means state info is unavailable there.

@mitsuhiko
Copy link
Contributor Author

The last version of this PR now also introduces a replace_state(State) method on the deserializer trait. The default implementation panics if a non empty state is set. This makes it easier to find deserializers which do not correctly implement state forwarding.

Additionally the value deserializers now all can carry state. One problem there is that some of these were Copy in the past which is obviously not going to work with attached state. This is really annoying because I do not see how I can make that part backwards compatible unless I duplicate all those deserializers but since they are created usually via the into_derserializer method that will still not entirely solve that issue.

@mitsuhiko
Copy link
Contributor Author

I just realized that replace_state drives this ad absurdum as then the deserializer needs to figure out when to ignore state replacement.

@mitsuhiko
Copy link
Contributor Author

I think other than it being very complex the biggest issue I foresee from this is IntoDeserializer gaining a into_deserializer_with_state implementation that panics if non empty state is set. It could alternatively silently just drop the state. However currently serde quite frequently dispatches through the value deserializers so losing state seems problematic.

To carry the state through in value I needed to make some types non Copy. Alternatively the deserializers could obviously be duplicated for stateless and stateful ones in the value module.

I'm planning on testing this for a few weeks now as it unblocks us but I probably want to see if I can simplify the design somehow.

jan-auer added 3 commits July 4, 2018 12:15
* master:
  Release 1.0.69
  Local inner macros
  Link to issue picker
  Factor out the getting help links
@dtolnay dtolnay added the wip label Aug 11, 2018
@dtolnay
Copy link
Member

dtolnay commented Aug 11, 2018

Please reopen a pull request when you'd like for me to take a look. Thanks!

@dtolnay dtolnay closed this Aug 11, 2018
@dtolnay dtolnay removed the wip label Aug 11, 2018
@mitsuhiko mitsuhiko deleted the feature/state branch August 11, 2018 20:40
@mitsuhiko mitsuhiko restored the feature/state branch August 11, 2018 20:40
@mitsuhiko
Copy link
Contributor Author

We are currently still using this fork but if time permits we will try some alternative approaches since it's unlikely we want to stick to a permafork of serde.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

4 participants