-
-
Notifications
You must be signed in to change notification settings - Fork 777
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Shrink derive output size #2250
Shrink derive output size #2250
Conversation
What the table headers mean? |
The headings are those used in https://perf.rust-lang.org/compare.html. The important one is the "% Change". |
For a struct like this: ``` struct Small { x: u32, y: u32, #[serde(skip_serializing_if = "Option::is_none")] maybe: Option<&'static str>, } ``` it will change the field count computation from this: ``` false as usize + 1 + 1 + if Option::is_none(&self.maybe) { 0 } else { 1 } ``` to this: ``` 2usize + if Option::is_none(&self.maybe) { 0 } else { 1 } ``` For fields with many fields, the difference will be bigger, avoiding long sequences of `1 + 1 + 1 + ...`. This is a small compile-time win.
756e173
to
184c130
Compare
This turns two matches into one, for each field in a struct.
This turns two matches into one, for each field in a struct.
This turns two matches into one, for each field in a struct.
184c130
to
f8cc37c
Compare
/// calling `Error::invalid_length`. | ||
/// | ||
/// This method exists purely to reduce the size of the generated code, | ||
/// reducing compile times. `SeqAccess` implementations should not override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason to implement is specifically as a method rather than a free function? Using a free function gives the desired no-overriding guarantee, and method call syntax is not used in derived code anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A method was easier, being more similar to the original function. But it's not fundamental.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @AnthonyMikh here, that this would be better as free functions in serde::__private::de
module
Any thoughts on whether this change is a good one? There are 32 positive emoji reactions on the description. I'm happy to make adjustments (e.g. moving that method to a free function), but it's not yet clear to me if this change is wanted by the serde developers. Thanks! |
Summary: the wins in terms of size reduction in a "real world" project that I tested are unfortunately very small. I was very happy to see this PR as I recently found out that generated serde code makes up a big part of the final binary of a project I'm working on as part of my day job. It's a web application with Rust backend that performs various tasks and uses serde derive for lots of different stuff. (It's even open source, if anyone's interested: link). So I thought it might be a valuable data point to see how this PR performs with a "production project". I know that this PR is mainly motivated by compile times, but I thought checking it for binary output size is interesting too. Unfortunately, the results were not as great as I has hoped for. I tested serde 1.0.139 vs. this branch (f8cc37c), using [profile.release]
debug = 1
codegen-units = 1
lto = "thin"
[profile.release.package."*"]
debug = false We need basic debug info to better debug errors in production. And it allows me to check per symbol size! Also, some custom But first, the final file sizes (the
With this patch, the binary including debug info is even larger than with 1.0.139. That was unexpected! At least the stripped one shows a small win. I tried only checking the size of serde functions. I did that via this command chain:
Here is the result matrix for the two variables
That's a roughly 13KiB reduction = 98.3% of the original size. A win is a win, but I had hoped for more and I feel like there are a lot of size optimization that could still be done. I also looked at the biggest functions to see what changed there. 20 largest serde functions
A few things:
Ok, I hope this investigation is useful at all. As someone with 800KiB of serde functions in my release artifact, I'm very interested in improvements in this area ^_^ |
Thank you Lukas, your writeup is enormously helpful in evaluating this PR. One thing I would want to look into is whether any of the "small serde" redesigns (miniserde, deser, …) are viable for your use case, or even do at all better on code size. In response to @nnethercote's last comment: it is not unwanted, but it is unsolicited, and an unsolicited PR (i.e. not in response to a "help wanted" label or an issue discussion which I participate in) is always less likely to appear at the top of my priorities. The way to get a PR reviewed is by being the most impactful thing for me to dedicate time to according to my opinion, and my sense has been this is not that, since there are multiple other serde PRs that have been queued longer and which I would rather land over this. Recently though my biggest chunk of time is going to a major serde_yaml revamp since that crate has been neglected much worse and longer than this crate. You called out how impressive the emoji reactions to this PR are, but that is not a signal I look at in deciding where to invest time. It is predominantly a measure of social media reach over anything else, and prioritizing PRs from whoever has the highest twitter followers, rather than whichever is the most impactful contribution to the project, is not a productive direction I think. |
I made some tests with
Notice how the Still, I think this is worth pursuing. And I'd like to thank nnethercote for putting in the effort to optimize not only the Rust compiler, but also many of the popular third-party crate. |
@LukasKalbertodt, @lnicola: thanks for the data. Your results are expected, though I see now this wasn't clear from my original description. The change I made shrinks the size of the Rust code that the front end has to process, but the new functions are marked For binary size, my measurements show that AFAICT, the generated |
AFAIK, the visitor is driven by the deserializer which can mean that it is driven by the order of the fields in the wire format. Hence it needs to be able to handle any order to be able to deserialize the output of systems other than Serde. |
From the other hand a new struct attribute could be added, for example, |
Something that did likely also help would be an attribute to only allow structs and enums with named fields and variants (for json, toml, ...) or structs and enums with index based fields and variants (for bincode, ...), while disallowing the other for the type it is applied to. In the common case where you only need to serialize/deserialize a single format, that would cut the size of the derived Deserialize impl in half I think. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dtolnay, in light of recent speedup attempts of serde_derive
shouldn't this PR be merged (after resolving conflicts)? At a minimum, it gives a more readable output of cargo expand
and I do not see objective disadvantages.
/// calling `Error::invalid_length`. | ||
/// | ||
/// This method exists purely to reduce the size of the generated code, | ||
/// reducing compile times. `SeqAccess` implementations should not override |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @AnthonyMikh here, that this would be better as free functions in serde::__private::de
module
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not on board with this approach; I am confident that this adds more bloat in the long term than it would remove in the short term, along with other downsides that have not been identified above.
I have planned to prototype a counterproposal but didn't get to it so far.
-
The new factoring here is overly specific to the current set of patterns used by our generated visit_seq and visit_map implementations, which are not good patterns and I have already intended to change them (clarified below).
-
I am not on board with adding new private functions, neither
doc(hidden)
on the public deserializer API traits as done here, nor in__private
as brought up by some commenters above, for vanilla types that don't rely on the various buffering internals. This introduces friction for everyone trying to understand how serde's deserializer API is intended to be used. This PR makes it that much harder to use serde_derive-generated code as a starting point for adapting to fit someone's custom need. I do not agree that this "gives a more readable output of cargo expand". It gives output that is more compact, but wrong, in the sense that you are not allowed to use it. Your code will break if you use it. Previously cargo expand would give you code that you could more easily adapt and use. As a consequence of hygiene it must access things likeResult
andOk
through a private re-export in serde, but all the types and traits and functions used are public API available for the user in their handwritten impls. -
I am also not on board with just making the functions added here public to avoid hidden functions. Related to the first bullet, these will become obsolete, and they would be permanent API bloat. It also exacerbates the issue that the API of serde being challenging is as much of an impediment as compile time instruction counts are.
Regarding the reliance on private functions, including for buffering: one of my projects is to make all private functions relied on by serde_derive's generated code into public API available to users. The only possible way this is viable is if they can be versioned independently of the serde crate. If someone is interested in making progress with this, I would recommend picking up rust-lang/rust#54363 (comment).
Regarding how to do visit_map better: among other things, it definitely should not be producing a separate __Field
type for every data structure you use derive(Deserialize)
on and instantiating the Deserializer
's methods with it. There should be one single non-generic DeserializeSeed
type that is used for deserializing all fields of all data structures, because none of that logic is different depending on whose fields they are.
-
This would reduce the number of instantiations of
Deserializer
methods by 50%, without requiring data formats to do unsafe manual erasing like Optimize compile time of Deserializer trait methods json#687. -
It would also eliminate the code currently reproduced for visit_str/visit_bytes/visit_u64 of all those distinct
__Field
enums, which is additional savings on top of the deserializer instantiations. -
Finally it's actually an opportunity to improve field deserialization performance too, since the one reusable
DeserializeSeed
would be able to implement smarter strategies that are not feasible today, such as optimizing for the common case where fields in the input appear in the same order as fields in the Rust data structure. This is an optimization that exists in Thrift that has been extremely significant. Instead of comparing incoming fields against data structure fields top to bottom every time, there is state to remember and resume search from the most recently matched field. For a large data structure with n fields this is (n/2)× speedup in the overwhelmingly common case (quadratic time becomes linear time) and only 2× slowdown in the uncommon worst case of fields encountered in exactly reverse order, and identical performance for randomly ordered fields.
Further in the direction of non-generic __Field
is the idea that Deserialize
impls should be done by handing off a kind of type-erased bytecode for the deserializer to evaluate. If serde were to end up in the standard library ever, this is the form I would be most comfortable with it taking. This is the ultimate non-generic DeserializeSeed
, essentially. Instead of a __Field
-specific seed that holds a static array of field names and can produce field indices from them during deserialization, which separate code still needs to match on to deal with field values, this could do more of the work non-generically by also representing a callback to run per field, location to write the result, what checks to perform (duplicate fields allowed or not), etc. In this approach no control flow needs to be compiled individually per derived Deserialize impl, and no Deserializer methods need to be instantiated, so Deserialize impls become free — basically a single static
.
Thank you anyway for the PR and the attention to this issue.
Something like this? |
This PR has a series of commits that shrink the size of the code generated for
derive(Serialize, Deserialize)
, particularly for large structs.Here are some instruction count results for some synthetic benchmarks.
big{1,2,3}00
each contain a single struct with {100,200,300}u32
fields that is annotated withderive(Serialize, Deserialize)
.derive-serde
contains a variety of structs and enums. It is identical to https://github.com/rust-lang/rust/blob/master/src/test/ui/deriving/deriving-all-codegen.rs except with the builtin trait derives replaced withderive(Serialize, Deserialize)
.