format_args! is slow #76490

Open
workingjubilee opened this issue Sep 8, 2020 · 6 comments
Labels
A-fmt Area: `std::fmt` C-feature-request Category: A feature request, i.e: not implemented / a PR. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs Relevant to the library team, which will review and decide on the PR/issue.

Comments

@workingjubilee
Member

Like molasses in the Antarctic.

As a consequence, so is any method which depends on its Arguments, like {fmt, io}::Write::write_fmt. The microbenchmarks in this issue about write!'s speed demonstrate that merely running the same arguments through format_args! and then write_fmt, even if it's just a plain string literal without any formatting required, produces a massive slowdown next to just feeding the same through fmt::Write::write_str or io::Write::write_all.
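
A minimal sketch of the kind of comparison described above, assuming a `String` as the `fmt::Write` target. The function names and the iteration count are made up for illustration, and this is not the benchmark from the linked issue. Timings depend heavily on compiler version and optimization level, so no particular ratio is claimed here.

```rust
use std::fmt::Write;
use std::time::Instant;

// Illustrative iteration count, not taken from the original benchmarks.
const ITERS: usize = 100_000;

// Goes through format_args! and Write::write_fmt, even for a plain literal.
fn via_write_fmt(out: &mut String) {
    for _ in 0..ITERS {
        write!(out, "hello").unwrap();
    }
}

// Hands the same literal straight to Write::write_str.
fn via_write_str(out: &mut String) {
    for _ in 0..ITERS {
        out.write_str("hello").unwrap();
    }
}

fn main() {
    let mut a = String::new();
    let start = Instant::now();
    via_write_fmt(&mut a);
    let fmt_time = start.elapsed();

    let mut b = String::new();
    let start = Instant::now();
    via_write_str(&mut b);
    let str_time = start.elapsed();

    // Both paths must produce identical output; only the cost differs.
    assert_eq!(a, b);
    println!("write!: {:?}, write_str: {:?}", fmt_time, str_time);
}
```

Build with optimizations (for example `--release`) before reading anything into the numbers.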

Unfortunately, write!, format!, println!, and other such macros are a common feature of fluent Rust code. Rust promises a lot of zero-cost abstractions, and on a scale from "even better than you could handwrite the asm" to "technically, booting an entire virtual machine is zero cost if you define the expression as booting a virtual machine..." this is currently "not very". Validating and formatting strings correctly can be surprisingly complex, and that complexity is going to increase with features like implicit named arguments in format_args!, so we can expect that increasing speed here may be challenging. However, this should be possible, even if it might require extensive redesign.

Multiple Problems, Multiple Solutions

  • format_args!'s internal machinery in the Rust compiler can likely be improved.
  • Consumers of Arguments, such as fmt::{format, write} and {fmt, io}::Write::write_fmt, can be reviewed for runtime performance.
  • Macros downstream of format_args! often are invoked to do something simple that does not require extensive formatting and can use the pattern-matching feature of macro_rules! to special-case simple patterns to side-step format_args! when it's not needed. This will increase the complexity of those macros and risks breakage if done incautiously, but could be a big gain in itself.
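
The third point can be sketched roughly as follows. `write_simple!` is a hypothetical macro, not something in std, and it assumes a `fmt::Write` destination. It also shows the breakage risk mentioned above: the literal arm writes its argument verbatim, so a literal containing `{{` or an implicit named argument like `{x}` would behave differently than it does under `write!`.

```rust
use std::fmt::Write;

// Hypothetical macro for illustration only; not part of std.
macro_rules! write_simple {
    // A lone literal: skip format_args! and call write_str directly.
    // Caveat: "{{" or "{x}" in the literal is written as-is here.
    ($dst:expr, $s:literal) => {
        $dst.write_str($s)
    };
    // Anything else: fall back to the standard write! macro.
    ($dst:expr, $($arg:tt)*) => {
        write!($dst, $($arg)*)
    };
}

fn render() -> String {
    let mut out = String::new();
    write_simple!(out, "hello, ").unwrap(); // literal arm
    write_simple!(out, "{}!", "world").unwrap(); // fallback arm
    out
}

fn main() {
    assert_eq!(render(), "hello, world!");
}
```

A real version would need to decide what to do with escapes and captured identifiers in the literal arm, and would also have to cope with `io::Write` destinations, which have no `write_str`.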

Unfortunately, some of these cases may run up against complex situations with types, trait bounds, and method resolution, because e.g. io::Write and fmt::Write both exist and write! needs to "serve" both. Fortunately, this is exactly the sort of thing that can benefit from the recent advances in const generics, since it's a lot of compile-time evaluation that could benefit from interacting with types (as opposed to being purely syntactic like macros). In the future, generic associated types and specialization may be able to minimize breakage from type issues as those features come online, so it's a good time to begin reviewing this code.

Related issues and PRs

@jyn514 jyn514 added C-feature-request Category: A feature request, i.e: not implemented / a PR. I-slow Issue: Problems and improvements with respect to performance of generated code. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 8, 2020
@Mark-Simulacrum Mark-Simulacrum added T-libs Relevant to the library team, which will review and decide on the PR/issue. and removed T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Sep 8, 2020
@Mark-Simulacrum
Member

Note that the formatting infrastructure in core::fmt is intentionally not fast, as it optimizes for code size over speed. There are alternatives, e.g., https://github.com/japaric/ufmt which is smaller/faster and makes some different tradeoffs.

I don't know that a blanket issue like this is useful -- I suspect the overall API cannot change at this point, but individual improvements can be, of course, discussed in T-compiler (as this is a libs impl, not T-libs, concern).

@jonas-schievink
Contributor

Its code size is also notoriously poor for embedded systems fwiw

@Mark-Simulacrum
Member

It's true that it may not meet the code size goal well either - I do think we should try and go for size over speed in general, though the two are not always mutually exclusive.

@workingjubilee
Member Author

workingjubilee commented Sep 9, 2020

Are there other alternatives like ufmt primarily for embedded use?
Size is cache and cache is speed, or rather the not needing it. It is probably the case that many optimizations for speed will help reduce overall size as well (and vice versa), and Arguments itself is sequestered from instantiation or introspection and versioned internally. It's not as obscured as a nameless type, but it is likely easy to change many subtle particulars about it without breaking major APIs.
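
As a rough illustration of how little of Arguments is exposed: one public accessor in current stable Rust is `Arguments::as_str`, which returns the literal when there is nothing to format. The sketch below uses it for a fast path in a hypothetical consumer; `write_args` and `render` are made-up names, and newer standard library versions may already do something similar internally.

```rust
use std::fmt::{self, Write};

// Hypothetical consumer of Arguments with a fast path for plain literals.
fn write_args<W: Write>(w: &mut W, args: fmt::Arguments<'_>) -> fmt::Result {
    match args.as_str() {
        // Nothing to format: hand the literal straight to write_str.
        Some(s) => w.write_str(s),
        // Otherwise use the regular formatting machinery.
        None => w.write_fmt(args),
    }
}

fn render(x: u32) -> String {
    let mut out = String::new();
    write_args(&mut out, format_args!("x = ")).unwrap();
    write_args(&mut out, format_args!("{}", x)).unwrap();
    out
}

fn main() {
    assert_eq!(render(7), "x = 7");
}
```

Either branch yields the same output, so a consumer like this can change which path it takes without callers noticing.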

Other crates of interest:

@jonas-schievink
Contributor

Are there other alternatives like ufmt primarily for embedded use?

We've also recently written https://github.com/knurling-rs/defmt, which does the formatting on the host instead of the device through liberal use of the forbidden arts. It is not compatible with the core::fmt syntax though, and can only be used for logging (since the device can't actually use the formatted data).

@mqudsi
Contributor

mqudsi commented Aug 10, 2022

It's annoying that manually repeatedly writing to a Write target is actually faster than issuing a single call to both format-and-write to the destination.
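
For concreteness, here are the two shapes being compared, sketched with an `io::Write` target. The function names and the `key: value` layout are invented for the example. It only shows that the outputs match and makes no claim about which is faster on a given toolchain.

```rust
use std::io::{self, Write};

// One call: builds an Arguments value and goes through write_fmt.
fn single_call<W: Write>(w: &mut W, key: &str, value: &str) -> io::Result<()> {
    write!(w, "{}: {}\n", key, value)
}

// Several calls: each piece is written directly with write_all.
fn piecewise<W: Write>(w: &mut W, key: &str, value: &str) -> io::Result<()> {
    w.write_all(key.as_bytes())?;
    w.write_all(b": ")?;
    w.write_all(value.as_bytes())?;
    w.write_all(b"\n")
}

fn main() -> io::Result<()> {
    let mut a: Vec<u8> = Vec::new();
    let mut b: Vec<u8> = Vec::new();
    single_call(&mut a, "name", "ferris")?;
    piecewise(&mut b, "name", "ferris")?;
    // Same bytes either way; only the path taken differs.
    assert_eq!(a, b);
    Ok(())
}
```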

There are so many issues on fmt performance that I'm not sure this post is even going to be seen, but I wanted to share some resources that may still prove useful, even though they are not directly relevant given the world of difference between how a managed, GC'd language like C# handles formatting and how Rust does. These writeups on improvements to formatting and handling of interpolated strings in C# 10 and 11 are good reads (and include benchmarks with both cycle counts and allocation numbers). There's a world of resources on this topic in the .NET GitHub repo that I've read in the past (for reasons having nothing to do with Rust) that cover the trade-offs of checking for and special-casing constant patterns, patterns with single expressions, patterns that can be transformed directly to string concatenation, and patterns joined by the same constant character (i.e. such that they can be formed by calling string.Join(..), the C# equivalent to slice::join in Rust).

One cool thing C# does (that may (??) not be possible in rust without compiler magic because we don't denote interpolated strings via any special prefix the way C# uses $"this is a {var} string" with the leading $) is that string format evaluation (including the evaluation of any parameters/expressions injected into the string as parameters) can be deferred or even skipped altogether without any macro magic if the function taking the interpolated string specifies that it expects an interpolated string rather than a plain string. That allows for coalescing repeated/nested calls to fmt and lets you implement the equivalent of debug!("value: {}", calculate_something()); where evaluation of calculate_something() is skipped altogether if not running at or above a certain log level -- but without using a macro at all; instead the interpolated string and its arguments/stack are passed as-is, still unevaluated, to the called function.
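
For comparison, this is roughly how the skip-evaluation behavior is obtained in Rust today, using a macro. `debug_log!`, `DEBUG_ENABLED`, and the call counter are hypothetical names for this sketch and are not the API of any real logging crate. The point is that the argument expression sits inside the `if`, so it never runs when logging is off.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

// Hypothetical global switch and call counter, for illustration only.
static DEBUG_ENABLED: AtomicBool = AtomicBool::new(false);
static CALLS: AtomicUsize = AtomicUsize::new(0);

fn calculate_something() -> u32 {
    CALLS.fetch_add(1, Ordering::SeqCst);
    42
}

// Hypothetical logging macro: the arguments are expanded inside the
// `if`, so they are only evaluated when logging is enabled.
macro_rules! debug_log {
    ($($arg:tt)*) => {
        if DEBUG_ENABLED.load(Ordering::SeqCst) {
            eprintln!($($arg)*);
        }
    };
}

// Returns how many times calculate_something() ran for one log call.
fn calls_while(enabled: bool) -> usize {
    DEBUG_ENABLED.store(enabled, Ordering::SeqCst);
    let before = CALLS.load(Ordering::SeqCst);
    debug_log!("value: {}", calculate_something());
    CALLS.load(Ordering::SeqCst) - before
}

fn main() {
    assert_eq!(calls_while(false), 0); // argument never evaluated
    assert_eq!(calls_while(true), 1); // evaluated once and logged
}
```

A plain function taking `fmt::Arguments` could not do this, since the argument expressions would already have been evaluated at the call site by the time the function runs.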

Anyway, I wasn't going anywhere in particular with this but just wanted to put down these thoughts and relevant links for posterity (and for myself to be able to find in the future). There's a lot of prior art out there in the open source community with some cool ideas, even if the actual implementation and all the factors that weigh into such a decision are something very niche and specific to each language and its domain details.
