Propose implicit named arguments for formatting macros #2795

Merged

Conversation

davidhewitt
Contributor

@davidhewitt davidhewitt commented Oct 28, 2019

Rendered

Proposes the ability to pass implicit named arguments to formatting macros, for example:

let person = "Charlie";
print!("Hello, {person}!");    // implicit named argument `person`

A follow up to internals discussion https://internals.rust-lang.org/t/println-use-named-arguments-from-scope/10633 and WIP PR rust-lang/rust#65338
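As a minimal sketch of what the proposal means in practice (the function names here are illustrative and not part of the RFC), the implicit form is intended to behave exactly like spelling out `person = person`. This assumes a toolchain where the feature is available (Rust 1.58 or later, per the stabilization notes further down this thread):

```rust
/// Uses the implicit named argument proposed by the RFC.
fn greet_implicit(person: &str) -> String {
    format!("Hello, {person}!")
}

/// The pre-RFC spelling: the same named argument, passed explicitly.
fn greet_explicit(person: &str) -> String {
    format!("Hello, {person}!", person = person)
}
```

Both functions should produce identical output for the same input; the implicit form only removes the repeated `person = person`.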

@Centril Centril added T-lang Relevant to the language team, which will review and decide on the RFC. T-libs-api Relevant to the library API team, which will review and decide on the RFC. A-fmt Proposals relating to std::fmt and formatting macros. A-macros-libstd Proposals that introduce new standard library macros labels Oct 28, 2019
@kennytm
Member

kennytm commented Oct 28, 2019

An alternative syntax is to clearly distinguish the interpolated variable from named arguments,

println!("Hello, {(person)}!");
//                ^      ^

and it allows all expressions to be used naturally

format_args!("Hello {(get_person())}")
//                   ^~~~~~~~~~~~~^

println!(r#"{("Hello"):~^33}\n{(self.person):~^33}"#);

println!("Hello, {(if self.foo { &self.person } else { &self.other_person })}");

println!(
    "variable-person = {(person1)}, named-person = {person2}, positional-person = {}",
    person3,
    person2=person2,
);

@CAD97

CAD97 commented Oct 28, 2019

and it allows all expressions to be used naturally

First: this is backwards compatible to add on top of the "single Ident special case". Most languages with string interpolation have a special case for single Ident, so we wouldn't be out of line to make that easier. It's also by far the most common case.

Second: allowing arbitrary expressions opens a whole separate can of worms around nesting. Just as a few edge case examples: "{("a")}", "{(\"a\")}", r#"{(")}")}"#. You can make rules for how these resolve (it'd have to be "standard string token first, then work on the content of the string literal"), but this adds a huge number of edge cases that just plain don't exist for the ""{name}" is a shortcut for "{name}", name=name" simple case.

So we definitely should stick to the simple, fairly obvious case that eliminates that stutter, and worry about generalizing it to more expressions later if desired.
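The "standard string token first" rule mentioned above can be checked directly. This is only a sketch (the function names are made up for illustration): by the time a formatting macro sees its format string, escapes have already been resolved, so an escaped literal and a raw literal with the same content are indistinguishable.

```rust
/// The edge-case format string written with backslash escapes.
fn escaped_literal() -> &'static str {
    "{(\")}\")}"
}

/// The same content written as a raw string literal.
fn raw_literal() -> &'static str {
    r#"{(")}")}"#
}
```

Both functions return the eight characters `{(")}")}`, so any rule for nested expressions would have to operate on that resolved content rather than on the source spelling.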

@kennytm
Member

kennytm commented Oct 28, 2019

@CAD97

Most languages with string interpolation have a special case for single Ident

Half of the languages with string interpolation have a special case for single idents.

Firstly, many of them do support interpolating arbitrary expressions, without any special case for single idents:

Furthermore, when the string interpolation syntax is special-cased for single idents, it is often because there is no closing sigil, which is not the case for Rust

  • Kotlin, Groovy, PHP: "$ident ${expr}"
  • Ruby: "#@xxx #@@yyy #$zzz #{expr}" (note: local variables (no sigils) won't work, must use "#{ident}")
  • Scala: s"$ident ${expr}"

Finally, string interpolation is a feature built into the language itself, which often lives independently from the "format" facility, e.g. in Python you write f"{ident}" or "{}".format(ident), but "{ident}".format() is meaningless. This RFC fuses string formatting and interpolation, so the experiences from other languages are not reliable, e.g. they can't express format!("{a} {b}", b=123).

And hence I disagree with this criticism:

Second: allowing arbitrary expressions opens a whole separate can of worms around nesting. Just as a few edge case examples: "{("a")}", "{(\"a\")}", r#"{(")}")}"#. You can make rules for how these resolve (it'd have to be "standard string token first, then work on the content of the string literal"), but this adds a huge number of edge cases that just plain don't exist for the ""{name}" is a shortcut for "{name}", name=name" simple case.

"{("a")}" is simply syntax error. Remember we're talking about a macro format!("...") here, the format string literal is not special in the Rust tokenizer/parser, it is not an interpolated string. To the proc macro format_args!(), the format strings "{(\")}\")}" and r#"{(")}")}"# are equivalent. The format string parser could,

  • If we see {,
    • If we see {, unescape as {
    • If we see an integer, it's a positional argument
    • If we see an identifier, it's a named argument
    • If we see : or }, it's an (implicit) positional argument
    • If we see (, step back one character then parse a Rust token tree
    • Otherwise, emit an error
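The decision list above can be sketched as a small classifier. This is a hypothetical illustration only: `Placeholder` and `classify` are invented names, the function looks at just the first character after an opening `{`, and it is not how `format_args!` is actually implemented.

```rust
/// What the text immediately after an opening `{` denotes.
/// All names here are hypothetical, chosen for this sketch.
#[derive(Debug, PartialEq)]
enum Placeholder {
    EscapedBrace,       // `{{` unescapes to a literal `{`
    Positional,         // `{0}`
    Named,              // `{name}`
    ImplicitPositional, // `{}` or `{:?}`
    Expression,         // `{(expr)}`, the proposed syntax (not valid today)
    Error,
}

/// Classifies a placeholder from the text that follows the opening `{`.
fn classify(after_open_brace: &str) -> Placeholder {
    match after_open_brace.chars().next() {
        Some('{') => Placeholder::EscapedBrace,
        Some(c) if c.is_ascii_digit() => Placeholder::Positional,
        Some(c) if c == '_' || c.is_alphabetic() => Placeholder::Named,
        Some(':') | Some('}') => Placeholder::ImplicitPositional,
        // A real parser would step back and parse a Rust token tree here.
        Some('(') => Placeholder::Expression,
        _ => Placeholder::Error,
    }
}
```

Each branch is decided by a single character of lookahead, which is the property the comment relies on when arguing that the parenthesized form adds few syntactic edge cases.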

I don't see a "huge number of edge cases" syntactically. IMO it has fewer edge cases than the current RFC because when you see an ident, you don't need to determine whether {ident} is a named argument or interpolated variable!

And to clarify, introducing the "stutter" is the primary purpose of my comment above even if we reject interpolating arbitrary expressions. It's to ensure {ident} is always a named argument, which leads to

this is backwards compatible to add on top of the "single Ident special case".

Indeed, if we accepted this RFC as-is, {ident:?} can also be forward extended to {(expr):?}. But you'll need to consider what to do with {expr:?}: either reject all non-ident expressions (thus {self.prop:?} is invalid), or accept some expressions (which leads to "why is my particular expression not accepted?" and true edge cases involving numbers and : and {). So I'd avoid even allowing that in the first place.


It's also by far the most common case.

Yes. But stuff like self.stuff or path.display() is also quite common. Grepping format!, panic! and (e)print(ln)! (including docs) from https://github.com/BurntSushi/ripgrep we get

| Type | Count |
| --- | --- |
| Single ident (e.g. format!("error: {}", err)) | 147 |
| Properties (e.g. format!("value {}", self.x.y.z)) | 22 |
| Other expressions (e.g. format!("{}", xyz[0].replace("\n", "\n\n"))) | 61 |
| Expression with return branch (e.g. println!("{}", foo()?);) | 2 |
| Expression involving macros (e.g. format!("{} {}", $e, crate_version!())) | 3 |

So non-single-ident expressions cover roughly 40% of all usages (88 of 235), which I wouldn't brush off as a mere can-of-worms opener.
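For reference, the share can be recomputed from the counts in the table above. This is a small sketch; the constant and function names are made up for illustration.

```rust
/// Counts from the ripgrep survey above, in table order.
const COUNTS: [(&str, u32); 5] = [
    ("single ident", 147),
    ("properties", 22),
    ("other expressions", 61),
    ("expression with return branch", 2),
    ("expression involving macros", 3),
];

/// Percentage of invocations whose argument is not a single identifier.
fn non_single_ident_percent() -> f64 {
    let total: u32 = COUNTS.iter().map(|&(_, n)| n).sum();
    let single = COUNTS[0].1;
    100.0 * f64::from(total - single) / f64::from(total)
}
```

With these counts the result is a little over 37%, which the thread rounds to 40% (and the single-ident share to 60%).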

Also, when these expressions are interpolated, they are quite readable unlike the RFC's constructed example.

https://github.com/BurntSushi/ripgrep/blob/8892bf648cfec111e6e7ddd9f30e932b0371db68/tests/util.rs#L403-L414

// Current
            panic!("\n\n==========\n\
                    command failed but expected success!\
                    {}\
                    \n\ncommand: {:?}\
                    \ncwd: {}\
                    \n\nstatus: {}\
                    \n\nstdout: {}\
                    \n\nstderr: {}\
                    \n\n==========\n",
                   suggest, self.cmd, self.dir.dir.display(), o.status,
                   String::from_utf8_lossy(&o.stdout),
                   String::from_utf8_lossy(&o.stderr));

// Interpolated
            panic!("\n\n==========\n\
                    command failed but expected success!\
                    {(suggest)}\
                    \n\ncommand: {(self.cmd):?}\
                    \ncwd: {(self.dir.dir.display())}\
                    \n\nstatus: {(o.status)}\
                    \n\nstdout: {(String::from_utf8_lossy(&o.stdout))}\
                    \n\nstderr: {(String::from_utf8_lossy(&o.stderr))}\
                    \n\n==========\n");

https://github.com/BurntSushi/ripgrep/blob/8892bf648cfec111e6e7ddd9f30e932b0371db68/ignore/src/lib.rs#L287-L299

// Current
                write!(f, "{}", msgs.join("\n"))
...
                write!(f, "File system loop found: \
                           {} points to an ancestor {}",
                          child.display(), ancestor.display())

// Interpolated
                write!(f, r#"{(msgs.join("\n"))}"#)
...
                write!(f, "File system loop found: \
                           {(child.display())} points to an ancestor {(ancestor.display())}")

https://github.com/BurntSushi/ripgrep/blob/8892bf648cfec111e6e7ddd9f30e932b0371db68/src/search.rs#L244-L264

// Current
        write!(
            self.get_mut(),
            "
{matches} matches
{lines} matched lines
{searches_with_match} files contained matches
{searches} files searched
{bytes_printed} bytes printed
{bytes_searched} bytes searched
{search_time:0.6} seconds spent searching
{process_time:0.6} seconds
",
            matches = stats.matches(),
            lines = stats.matched_lines(),
            searches_with_match = stats.searches_with_match(),
            searches = stats.searches(),
            bytes_printed = stats.bytes_printed(),
            bytes_searched = stats.bytes_searched(),
            search_time = fractional_seconds(stats.elapsed()),
            process_time = fractional_seconds(total_duration)
        )

// Interpolated
        write!(
            self.get_mut(),
            "
{(stats.matches())} matches
{(stats.matched_lines())} matched lines
{(stats.searches_with_match())} files contained matches
{(stats.searches())} files searched
{(stats.bytes_printed())} bytes printed
{(stats.bytes_searched())} bytes searched
{(fractional_seconds(stats.elapsed())):0.6} seconds spent searching
{(fractional_seconds(total_duration)):0.6} seconds
")

@danielhenrymantilla

danielhenrymantilla commented Oct 28, 2019

As a comparison point, there is a substantial difference between the mentioned ::fstrings crate, featuring deliberately limited interpolation (c.f., danielhenrymantilla/fstrings-rs#1 (comment)), and the ::ifmt crate, featuring "arbitrary" interpolation.

In the linked ::fstrings PR, extending fstrings interpolation to field access is being considered, but nothing more (e.g., no method calls a priori). Imho this leads to a sweet point in letting people use interpolation while preventing abuse of the feature.

@davidhewitt
Contributor Author

I think @kennytm's interpolation proposal is reasonable, and I will add it to the alternatives section in the RFC shortly.

My initial observations about the proposed syntax for interpolation are that it's not actually any shorter than positional formatting arguments, although I would argue it reads easier (especially as the number of arguments scales):

println!("Hello, {(person)}!");
println!("Hello, {}!", person);

However, if the brackets are necessary to resolve parsing ambiguities then that is something we would have to accept. I do find interpolation mechanisms convenient, though it can be difficult to judge how complex interpolations should be before they become hard-to-read / poor style.

I would like to contest this line though:

This RFC fuses string formatting and interpolation

The objective of this RFC is not to add interpolation to Rust's existing formatting macros. It is to add a shorthand to the macros to make them more ergonomic in a very common use case. I'd be glad to take a win on 60% of macro invocations even if this RFC doesn't improve the remaining 40%. This discussion appears to agree that interpolation can be added later in a backwards-compatible way (even if opinions differ on whether that would be desirable).

As pointed out by @petrochenkov in the WIP PR, we actually already have a similar shorthand in struct literal expressions for the special case of single identifiers:

struct Foo { bar: u8 }
let bar = 1u8;

let foo = Foo { bar: bar };
let foo = Foo { bar };        // This shorthand only accepts single identifiers

To make the original intention of the RFC stand out clearer, I will update the wording to state explicitly that adding interpolation to the macros was not the intended goal, as well as add the struct literal shorthand to the prior art section.
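To make the parallel concrete, here is a small sketch placing the two shorthands side by side. `build_and_describe` is a hypothetical helper, and the format-string half assumes a toolchain with this RFC's feature available (Rust 1.58 or later):

```rust
#[derive(Debug, PartialEq)]
struct Foo {
    bar: u8,
}

/// Uses both single-identifier shorthands on the same local binding.
fn build_and_describe(bar: u8) -> (Foo, String) {
    let foo = Foo { bar };              // field init shorthand for `bar: bar`
    let text = format!("bar = {bar}");  // implicit named argument for `bar = bar`
    (foo, text)
}
```

In both cases the shorthand accepts only a bare identifier; anything more complex has to be written out in full.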

The RFC itself is not intended to be a way to sneak interpolation
into the formatting macros; the RFC author believes that they do
not need full interpolation support, although would not rule it
out if it was deemed desirable.

This update to the RFC text clarifies the distinction between
implicit named arguments and interpolation. It also adds a note
on prior art that Field Init Shorthand is an existing precedent
where language ergonomics have introduced a special case for
single identifiers.

@danielhenrymantilla danielhenrymantilla left a comment

I like the nuances you've added.

Could you also state whether format!("{x}, {y}", y = 42) would be accepted or rejected? (it would avoid having let declarations used only for one "fstring" interpolation)

@davidhewitt
Contributor Author

Could you also state whether format!("{x}, {y}", y = 42) would be accepted or rejected? (it would avoid having let declarations used only for one "fstring" interpolation)

I would expect it would be accepted: it seems perfectly valid to me that one named parameter can be implicit and the other explicit.
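A minimal sketch of that mixed case, assuming a toolchain where the feature is available (Rust 1.58 or later); the function name is illustrative:

```rust
/// `x` is captured implicitly from scope, while `y` is supplied explicitly.
fn mixed(x: i32) -> String {
    format!("{x}, {y}", y = 42)
}
```

Here no `let y = 42;` is needed just to feed the format string, which is the convenience the question was asking about.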

@Centril
Contributor

Centril commented Oct 28, 2019

Also, when these expressions are interpolated, they are quite readable unlike the RFC's constructed example.

I think there's value in separating the computing of results and how they are rendered (so that the former can be separated into functions eventually). With the {binding} case, I think a good balance is struck where things don't get too intermingled.

Also, given that 60% of the cases in the data in question were for single-ident cases, it seems to me that {binding} is right to optimize for those cases whereas {{binding}} optimizes for the minority.

The rustc compiler has a lot of diagnostics so it is probably a better data set to dig into if you want more of those.

All in all, I think this RFC finds a well-struck balance.

The examples provided had incorrectly merged Scala and PHP.
Scala's string interpolation is written `s"$foo"` whereas PHP is written `"$foo"`
@tomwhoiscontrary

Would this involve any changes to the language syntax, or can this be implemented entirely in a macro?

I realise that format_args! is not currently implemented as a true macro, but I think it's a good idea to maintain the fiction that it is, or at least could be. format_args! shouldn't do anything a user-written macro can't.

@davidhewitt
Contributor Author

davidhewitt commented Oct 30, 2019

Would this involve any changes to the language syntax, or can this be implemented entirely in a macro?

The fstrings crate implements a PoC of this functionality using proc_macro (and proc_macro_hack, I believe). So this can indeed be implemented in a user-written macro.

Also, the test PR I wrote needed just a tiny adjustment to the internals of format_args! without any changes anywhere else.

@joshtriplett
Member

I'd love to have this. I find myself writing name=name, another_name=another_name all the time.

I fully support the rationale in this RFC to only support single identifiers, and defer any more complex interpolation. I do think we'll want dotted fields in the future, but I added a comment about one bit of complexity even that level of extended interpolation would produce.

@tmccombs

tmccombs commented Nov 2, 2019

I have a couple of reasons not to support arbitrary expressions without requiring parentheses ("{(expr)}"):

  1. There is ambiguity between an integer expression and number indicating a positional argument
  2. There could potentially be ambiguity, or at least confusion, with format arguments, since the colon is used both in Rust syntax and as the delimiter before format arguments.
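Both points can be seen in how existing format strings already behave. The following is only a sketch with made-up function names; the second function assumes implicit capture is available (Rust 1.58 or later):

```rust
/// `{0}` selects the first argument; it is never read as the integer literal 0.
fn positional_index() -> String {
    format!("{0}", "first")
}

/// Everything after `:` is a format spec (here, two decimal places),
/// not a continuation of a Rust expression.
fn colon_starts_spec(value: f64) -> String {
    format!("{value:.2}")
}
```

If bare expressions were allowed inside the braces, a reader could no longer tell whether `{0}` meant an index or a number, or where an expression containing `:` ends and the spec begins.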

With thanks to Lonami, joshtriplett and tmccombs
@davidhewitt
Contributor Author

I have a couple of reasons not to support arbitrary expressions without requiring parentheses ("{(expr)}"):

Fully agree with your arguments. For anything beyond {ident}, I agree {(expr)} is much better than {expr}.

@danielhenrymantilla

danielhenrymantilla commented Nov 3, 2019

A new version of ::fstrings has just been released, that supports interpolation of field accesses / dotted.paths (but not after the :, which is currently just left as is).

It can be useful for those wanting to get a feel for the ergonomic improvements that this RFC and a future one could provide.

@tmccombs

tmccombs commented Nov 3, 2019

Another possible alternative: No special case for identifiers, but allow arbitrary expressions within parentheses. So, {ident} means use the keyword argument named ident, {(ident)} means use the variable ident, and {(expr)} works for any expression.

@rfcbot
Collaborator

rfcbot commented Dec 15, 2019

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

@davidhewitt
Contributor Author

Very excited to see this given a chance! I'm happy to do the implementation work for this going forward.

@davidhewitt
Contributor Author

davidhewitt commented Dec 17, 2019

Checklist for the tracking issue:

Steps:

  • Implementation for format_args! behind format_implicit_args feature (name to be bikeshed?)
  • Implement alternative solution for format_args!(concat!(...)) case that doesn't risk spurious macro hygiene
  • Implementation for panic!. Behind feature gate and / or edition switch (see unresolved question)
  • Stabilize (may depend on new edition)
  • Documentation

Unresolved Questions:

  • Solution for format_args!(concat!(...)) - perhaps try out a new unstable macro as per this comment
  • Final design of the panic! solution - perhaps based off this comment

@nikomatsakis
Contributor

Huzzah! The @rust-lang/lang and @rust-lang/libs teams have decided to accept this RFC.

To track further discussion, subscribe to the tracking issue here:

rust-lang/rust#67984

@bstrie bstrie mentioned this pull request Jan 13, 2020
@UtherII UtherII mentioned this pull request Mar 6, 2020
Manishearth added a commit to Manishearth/rust that referenced this pull request Jul 2, 2020
…varkor

Add `format_args_capture` feature

This is the initial implementation PR for [RFC 2795](rust-lang/rfcs#2795).

Note that, as discussed in the tracking issue (rust-lang#67984), the feature gate has been called `format_args_capture`.

Next up I guess I need to add documentation for this feature. I've not written any docs before for rustc / std so I would appreciate suggestions on where I should add docs.
Manishearth added a commit to Manishearth/rust that referenced this pull request Jul 17, 2020
…nts-as-str, r=Amanieu

Add Arguments::as_str().

There exist quite a few macros in the Rust ecosystem which use `format_args!()` for formatting, but special case the one-argument case for optimization:

```rust
#[macro_export]
macro_rules! some_macro {
    ($s:expr) => { /* print &str directly, no formatting, no buffers */ };
    ($s:expr, $($tt:tt)*) => { /* use format_args to write to a buffer first */ }
}
```

E.g. [here](https://github.com/rust-embedded/cortex-m-semihosting/blob/7a961f0fbe6eb1b29a7ebde4bad4b9cf5f842b31/src/macros.rs#L48-L58), [here](https://github.com/rust-lang-nursery/failure/blob/20f9a9e223b7cd71aed541d050cc73a747fc00c4/src/macros.rs#L9-L17), and [here](https://github.com/fusion-engineering/px4-rust/blob/7b679cd6da9ffd95f36f6526d88345f8b36121da/px4/src/logging.rs#L45-L52).

The problem with these is that a forgotten argument such as in `some_macro!("{}")` will not be diagnosed, but just prints `"{}"`.

With this PR, it is possible to handle the no-arguments case separately *after* `format_args!()`, while simplifying the macro. Then these macros can give the proper error about a missing argument, just like `print!("{}")` does, while still using the same optimized implementation as before.

This is even more important with [RFC 2795](rust-lang/rfcs#2795), to make sure `some_macro!("{some_variable}")` works as expected.
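A minimal sketch of the pattern described above, assuming `fmt::Arguments::as_str()` is available on the toolchain in use. The helper `emit` and the function `demo` are hypothetical names; `some_macro!` always goes through `format_args!`, so a missing argument is diagnosed at compile time, and the fast path is chosen afterwards:

```rust
use std::fmt;

/// Renders the arguments, taking a shortcut when `as_str()` reports that
/// nothing needs formatting at run time. (Hypothetical helper.)
fn emit(args: fmt::Arguments<'_>) -> String {
    match args.as_str() {
        Some(literal) => literal.to_owned(), // plain text, no formatting pass
        None => args.to_string(),            // general case
    }
}

/// A single rule: every invocation is checked by `format_args!`.
macro_rules! some_macro {
    ($($arg:tt)*) => { emit(format_args!($($arg)*)) };
}

/// Exercises both a plain literal and a captured variable.
fn demo(some_variable: u32) -> (String, String) {
    (some_macro!("plain text"), some_macro!("{some_variable}"))
}
```

Whether a given call takes the `Some` or `None` branch is an optimization detail that should not be relied on; the rendered output is the same either way.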
bors added a commit to rust-lang-ci/rust that referenced this pull request Nov 15, 2021
…pture, r=Mark-Simulacrum

stabilize format args capture

Works as expected, and there are widespread reports of success with it, as well as interest in it.

RFC: rust-lang/rfcs#2795
Tracking issue: rust-lang#67984

Addressing items from the tracking issue:

- We don't support capturing arguments from a non-literal format string like `format_args!(concat!(...))`. We could add that in a future enhancement, or we can decide that it isn't supported (as suggested in rust-lang#67984 (comment) ).
- I've updated the documentation.
- `panic!` now supports capture as well.
- There are potentially opportunities to further improve diagnostics for invalid usage, such as if it looks like the user tried to use an expression rather than a variable. However, such cases are all already caught and provide reasonable syntax errors now, and we can always provide even friendlier diagnostics in the future.
@Arnavion

The RFC doesn't document how it interacts with raw idents, so for posterity:

let r#type = 5;

// Before this feature
assert_eq!("5", format!("{}", r#type));
assert_eq!("6", format!("{type}", type = 6)); // Named explicit does not need to be raw.
assert_eq!("5", format!("{type}", type = r#type));
assert_eq!("5", format!("{type}", r#type = r#type)); // Named explicit can be raw and maps to cooked version in the format string.

println!("{type}", type = type); // Error. RHS is an invalid expr.
println!("{r#type}", r#type = r#type); // Error. Named explicit can be raw but the format string can only use the cooked version.

// New after this feature
assert_eq!("5", format!("{type}")); // Format string references cooked ident and value is from raw ident.
                                    // ie compiler is smart enough to treat this as `type = r#type` and not `type = type`

println!("{r#type}"); // Still an error. Format string can only use the cooked version.
                      // So `"{type}"` is the *only* way to write this.

rimutaka added a commit to rimutaka/concatenation_benchmarks-rs that referenced this pull request Feb 23, 2022
* Rust 1.58 onwards allows using implicit arguments in format! macro
* see rust-lang/rfcs#2795
yvt added a commit to r3-os/r3 that referenced this pull request Mar 21, 2022
Updates format strings to use the implicit named arguments
(`format_args_capture`) introduced in [RFC2795][1] whenever possible.

|        Before        |              After               |
| -------------------- | -------------------------------- |
| `xxx!("{}", self.x)` | `xxx!("{}", self.x)` (no change) |
| `xxx!("{}", x)`      | `xxx!("{x}")`                    |

[1]: rust-lang/rfcs#2795
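The two rows of the table can be sketched as follows. `Point` and its methods are hypothetical, and the capture form assumes Rust 1.58 or later. Because only bare identifiers can be captured, a field access either stays positional or is first bound to a local:

```rust
struct Point {
    x: i32,
}

impl Point {
    /// Field access stays a positional argument (first table row, no change).
    fn describe_positional(&self) -> String {
        format!("x = {}", self.x)
    }

    /// Binding the field to a local makes the captured form usable
    /// (second table row).
    fn describe_captured(&self) -> String {
        let x = self.x;
        format!("x = {x}")
    }
}
```

Whether the extra `let` is worth it is a style choice; the commit above simply left such call sites unchanged.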