RFC: Introduce `concat_bytes!()` to join `[u8]` and byte `str` analogous to `concat!` for `str` #2509

estebank · 2018-08-01T03:31:56Z

SimonSapin · 2018-08-01T05:44:39Z

I’d like to see the Motivation section expanded. Why is this useful? Consider describing use cases and adding examples. Currently this section only paraphrases the summary.

The RFC’s title and motivation section mention “byte str” and “byte char”, which I find confusing because there is no such thing in my mental model. Rather, the language syntax contains: string literals of type &'static str, character literals of type char, byte string literals (or bytes literal) of type &'static [u8; N], and byte literals (singular byte) of type u8. In particular, a byte string literal is already an array of integers.

If this is added, I’d prefer it to be a separate macro. concat! today already accepts multiple inputs of different types but always returns &'static str. It would be surprising that adding another input can not only change the return type, but also completely change the handling of integer literals (from ASCII digits serialization to raw byte values).

For that separate macro, the name bconcat! is rather obscure. It is not typical of the standard library’s to do such aggressive abbreviations in public API naming. How about concat_bytes!?

estebank · 2018-08-01T18:10:39Z

@SimonSapin sounds reasonable!

I’d like to see the Motivation section expanded. Why is this useful? Consider describing use cases and adding examples. Currently this section only paraphrases the summary.

Will expand. I can definitely see the case where someone would want to do concat_bytes!(b"foo", 3, 2).

The RFC’s title and motivation section mention “byte str” and “byte char”, which I find confusing because there is no such thing in my mental model. Rather, the language syntax contains: string literals of type &'static str, character literals of type char, byte string literals (or bytes literal) of type &'static [u8; N], and byte literals (singular byte) of type u8. In particular, a byte string literal is already an array of integers.

The confusion seems to be around "byte char" (b'x'), which seems to be referred to as "byte literal" in the docs, but that seems slightly misleading to me as a single char can have multiple bytes.

If this is added, I’d prefer it to be a separate macro. concat! today already accepts multiple inputs of different types but always returns &'static str. It would be surprising that adding another input can not only change the return type, but also completely change the handling of integer literals (from ASCII digits serialization to raw byte values).

Very reasonable. It would be easy enough to point people in the right direction when extending the current error.

For that separate macro, the name bconcat! is rather obscure. It is not typical of the standard library’s to do such aggressive abbreviations in public API naming. How about concat_bytes!?

Works for me. The discoverability issue could be handled when someone attempts to use concat!(b"").

joshtriplett · 2018-08-10T18:43:55Z

I agree with @SimonSapin's objection; I think this should use a separate macro concat_bytes!.

Apart from that, though, I'd love to see this.

jsgf · 2018-08-17T23:39:56Z

Whenever any of the arguments to concat!() is a byte literal, its output will be a byte literal, and the other arguments will be evaluated on their byte contents.

In general I agree with the suggestion of making the type with concat_bytes!, but I think it is also necessary - none of the parameters to concat!()/concat_bytes!() need be a string (concat!(1,2,3) => "123", for example), so there wouldn't be any type cue. You could consider trying to infer the type from the context, but that's getting complex.

Edit: I guess you could require a dummy b"" parameter, but that's pretty hacky.

jsgf · 2018-08-17T23:58:09Z

XX |     concat!(256, b"val");
   |             ^^^ this value is larger than `255`

I think this should have identical behaviour to string concat - ie emit b"256val" in this case. If you want an arbitrary byte then you could do b'\xff'.

SimonSapin · 2018-08-18T04:47:28Z

Well, b'\xff' is 256_u8. It’s only another literal syntax for the same type, just like 0xFF_u8 is yet another. Technically since macro inputs are token tree concat_bytes! could tell the difference between the two, but since we make it look like a function that takes comma-separated values I think it would feel very weird for this "function" to behave so differently with two arguments that are spelled with different syntax but are the same at the type level and expression-value level.

SimonSapin · 2018-08-18T04:50:13Z

Maybe, to avoid this confusion, concat_bytes! should not accept single-integer arguments. Only bytes arrays/slices (with either b"foo" byte string literal syntax or [b'f', 0x6F, 111] array literal syntax)

jsgf · 2018-08-18T07:16:13Z

My point is that concat!(123) -> "123": &str, and I think that concat_bytes!() should have identical behaviour, except that its result type is &[u8] - that is, they should accept the same inputs, and produce byte-for-byte identical output, albeit a slightly different type. The only extension is that concat_bytes!() should also accept binary string literals.

estebank · 2018-08-19T21:37:55Z

@jsgf sounds reasonable for concat and concat_bytes to behave exactly the same with only a difference in the output type. The original PR opened without an RFCS did get concat to change behavior if an b"" was present, but it'd be too "magical" in practice for my taste.

SimonSapin · 2018-08-21T15:42:20Z

That would be one way to define it and it’s not too terrible but:

It’s almost the same as concat!(…).as_bytes() (except it could return a known-size array rather than a slice)
It would implicitly encode Unicode characters as UTF-8, without a visible indication that this is happening. Compare with byte-string literals, where we’ve chosen to disallow non-ASCII characters entirely to "force" users to consider what character encodings they want. (Non-ASCII bytes can be included with hex backslash escapes like \xFF.)

I would prefer concat_bytes! to not accept "normal" (Unicode) string literals or character literals at all.

estebank · 2018-08-21T18:42:59Z

It’s almost the same as concat!(…).as_bytes() (except it could return a known-size array rather than a slice)

For some reason I though arbitrary slices couldn't be const, alas they are. TIL.

I would prefer concat_bytes! to not accept "normal" (Unicode) string literals or character literals at all.

Given that the concat_bytes macro would be a builtin, we can perform very deep evaluation of the values being passed in. We could accept ascii only strings and fail otherwise, or (maybe more interestingly), we could warn on these and provide a way of doing it without having to rely on it, for example suggest concat_bytes!("felíz ", "año") -> concat_bytes!("fel\u{ED}z ", "a\u{F1}o") in the same way that b"" could have \xFF suggested. My reasoning being that unicode literals in strings make it clear how wide they are, making the concern less of a foot-gun.

SimonSapin · 2018-08-21T19:20:49Z

"fel\u{ED}z " is an interesting case (like the rest of the U+0080 to U+00FF range) because the two-digit hexadecimal escape looks like it might represent a single \xED byte but it doesn’t. That string is represented as b"fel\xC3\xADz ", not b"fel\xEDz ".

This is exactly the kind of confusion I’d prefer to avoid.

We could have the proc macro inspect Unicode strings and reject those that contain non-ASCII characters, but at that point why not reject Unicode strings entirely and suggest using byte strings instead?

estebank · 2018-08-21T20:38:59Z

@SimonSapin I'm ok with rejecting unicode strings, as long as we suggest the appropriate hex representation of values outside of the ascii range, as this seems like something easy to include and very useful for end users.

SimonSapin · 2018-08-21T20:42:36Z

Sure, making a custom error message (that mentions UTF-8) for this case sounds good.

whatisaphone · 2020-01-09T22:20:17Z

I'd love to see this capability in the stdlib.

My motivation (which I don't think was mentioned yet) is building null-terminated strings at compile-time, for C interop.

macro_rules! zstr {
    ($str:literal) => {
        concat_bytes!($str, b"\0");
    };
}

zstr!(b"abc") == b"abc\0"

Lokathor · 2020-01-09T23:14:04Z

You can do that already with normal strings. I use a null_str! macro myself

whatisaphone · 2020-01-09T23:20:21Z

Very true! That only works with UTF-8 strings though. I'm unfortunately working with Windows-1252. I guess that was an important detail.

joshtriplett · 2020-08-05T17:35:45Z

@nikomatsakis, @Mark-Simulacrum, and I talked about this one today. We generally think this is a good idea. We feel that, while as a builtin macro this is in the lang team's scope, we'd be comfortable with the @rust-lang/libs team evaluating this as if it were not a builtin macro (in which case it would be entirely within the scope of libs).

I do think this should be a separate macro like concat_bytes! rather than overloading concat!.

Amanieu · 2020-08-05T22:05:36Z

I'm happy to start an FCP on behalf of the libs team once the RFC text is updated to use concat_bytes!.

jplatte · 2020-10-21T17:51:37Z

I'm interested in moving this forward. Should I open a PR on @estebank's branch to update the RFC text?

joshtriplett · 2020-10-21T21:11:37Z

@jplatte You should coordinate with @estebank (here or via Zulip). A PR to help update the RFC text seems like a good step, though, after you've coordinated to confirm that'll work. And thank you!

BurntSushi · 2021-03-17T18:21:43Z

This LGTM. Although, I don't think I'd mind seeing string/char literals supported in concat_bytes!. @SimonSapin mentioned above that this would just be similar to concat!("something").as_bytes(), but AIUI, it would enable things like, concat_bytes!("☃", b"\xFF", "foo"). That is, it would let you mix and match. An undoubtedly niche case, but is there any other way to do such a thing?

m-ou-se · 2021-03-17T19:15:30Z

@rfcbot concern argument types

With concat!() it turned out that it accepting numbers/bools wasn't a great idea: println!(..) expanded to something like print!(concat!(.., "\n")), which caused println!(123) to be accepted, since concat!() converts it silently to a string. This proposed concat_bytes!() macro doesn't do number formatting, but would still accept integers as arguments, which are used as single bytes.

Considering this, I have two small related concerns about the types of arguments to accept:

Inconsistency with concat!(): concat!(123, "\n") produces "123\n", but concat_bytes!(123, b"\n") produces b"\x7b\n".
Making the same mistakes possible as with concat!(): a macro expanding to concat_bytes!($arg, b"\0") such as the zstr!() example above would also accept zstr!(123), which was probably not intended.

SimonSapin · 2021-03-17T19:24:29Z

~~@BurntSushi If Unicode arguments are not supported directly you could still mix and match by converting arguments: concat_bytes!("☃".as_bytes(), b"\xFF")~~

BurntSushi · 2021-03-17T20:11:47Z

@SimonSapin Ah thanks for that. I'm totally fine with that.

SimonSapin · 2021-03-17T21:49:10Z

Oh sorry, I got confused with other macros whose expansion includes expression arguments. The point of this one is to expand to a single literal, so it needs to parse the token of its arguments which therefore can’t be arbitrary expressions.

BurntSushi · 2021-03-17T22:26:56Z

@SimonSapin Ah, maybe that was why I thought the as_bytes() work-around wouldn't work when I wrote my first comment, but it was totally subconscious. Because of that limitation, how do you feel about allowing string literals in concat_bytes!? It's somewhat niche, but there doesn't seem to be a huge drawback to me for allowing it. I suppose we could also just punt on that since it seems like something that could be added backwards compatibly.

SimonSapin · 2021-03-18T05:32:44Z

I’d prefer for the conversion and choice of encoding to be explicit when converting Unicode to bytes (even if the conversion is only type-level because the memory representation happens to match), but my pre-Rust-1.0 proposal to rename str::as_bytes to str::as_utf8 was not accepted so 🤷

jplatte · 2021-03-18T08:24:06Z

Actually no, I don't see how that could possibly work. concat_bytes! operates on literals, not constant expressions.

BurntSushi · 2021-03-18T09:37:51Z

Oh sorry, I got confused with other macros whose expansion includes expression arguments. The point of this one is to expand to a single literal, so it needs to parse the token of its arguments which therefore can’t be arbitrary expressions.

Yeah I share the sentiment. I think my retort would just be that it would otherwise be not possible to mix non-UTF-8 with UTF-8 via the nice syntax of Unicode strings. For me personally, I think that might outweigh the desire for encoding explicitness.

But I do not feel that strongly about this at all.

jplatte · 2021-03-23T11:09:53Z

Is now a good time to also rename the file from 0000-byte-concat.md to 2509-byte-concat.md or is that automated and I don't need to do anything?

joshtriplett · 2021-04-14T19:25:32Z

We discussed this in today's @rust-lang/libs meeting, and came to a resolution on the question of integer literals:

We'd like to reject bare integer literals, and only allow arrays of integer literals. So, concat_bytes!(b"hello", [1, 2, 3], "world") would be accepted, but concat_bytes!(b"hello", 1, 2, 3, "world") would not be.

It would also be nice to have a diagnostic with a suggestion, that suggests wrapping a series of integer literals with square brackets. So, concat_bytes!(b"hello", 1, 2, 3, "world") should get a rustfix-applicable suggestion to become concat_bytes!(b"hello", [1, 2, 3], "world").

bstrie · 2021-05-05T19:10:08Z

Minor issue that doesn't need to block acceptance of the RFC, but recently there have been some discussions about whether or not new macros ought to be properly namespaced rather than just being effectively added to the prelude outright (I'm not even sure if macro namespacing existed or was mature when this RFC was originally written). Is this new item useful enough that it should live in the prelude, or should it rather be, say, std::array::concat_bytes?

jplatte · 2021-05-27T13:05:31Z

@m-ou-se Is your concern resolved with the latest revision?

m-ou-se · 2021-05-27T13:25:59Z

Is now a good time to also rename the file from 0000-byte-concat.md to 2509-byte-concat.md or is that automated and I don't need to do anything?

Usually that's done by whoever merges the RFC after the FCP is finished. But feel free to rename it yourself, to save that person one step in the rfc-merging process. ^^

Is your concern resolved with the latest revision?

Yes

@rfcbot resolve argument types

UTF-8

I'd personally prefer for this macro to accept regular string literals for UTF-8, but we can always add that later if that turns out to be useful, as it'd be backwards compatible.

rfcbot · 2021-05-27T13:26:03Z

🔔 This is now entering its final comment period, as per the review above. 🔔

rfcbot · 2021-06-06T13:38:57Z

The final comment period, with a disposition to merge, as per the review above, is now complete.

As the automated representative of the governance process, I would like to thank the author for their work and everyone else who contributed.

The RFC will be merged soon.

jplatte · 2021-07-26T23:30:39Z

Ping @joshtriplett

m-ou-se · 2021-07-28T15:44:28Z

🎉

This is now tracked in rust-lang/rust#87555

Fishrock123 · 2021-07-29T17:58:06Z

This RFC appears to contain a contradiction:

For example, concat_bytes!(42, b"va", b'l', [1, 2]) evaluates to
[42, 118, 97, 108, 1, 2].

And then later

An earlier version of this RFC proposed to support integer literals outside of
arrays, but that was rejected since it would make the output of
byte_concat!(123, b"\n") inconsistent with the equivalent concat!
invocation.

RFC: Teach concat!() to join [u8] and byte str

ba2d9d7

estebank mentioned this pull request Aug 1, 2018

Teach concat!() to concatenate byte str literals rust-lang/rust#52932

Closed

Centril added the T-libs-api Relevant to the library API team, which will review and decide on the RFC. label Aug 1, 2018

This comment has been minimized.

Sign in to view

Centril mentioned this pull request Oct 7, 2018

Tweaking concat! macro to accept bytes string #1187

Closed

Centril added the A-macros-libstd Proposals that introduce new standard library macros label Nov 22, 2018

This comment has been minimized.

Sign in to view

Centril added the T-lang Relevant to the language team, which will review and decide on the RFC. label Jan 10, 2020

KodrAus added the Libs-Tracked Libs issues that are tracked on the team's project board. label Jul 29, 2020

jplatte mentioned this pull request Mar 23, 2021

Update byte-concat RFC estebank/rfcs#2

Merged

jplatte added 2 commits May 26, 2021 13:42

byte-concat: Delete unneeded sections

32ed992

byte-concat: Address argument type concern

431dadf

rfcbot added final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. and removed proposed-final-comment-period Currently awaiting signoff of all team members in order to enter the final comment period. labels May 27, 2021

rfcbot added finished-final-comment-period The final comment period is finished for this RFC. to-announce and removed final-comment-period Will be merged/postponed/closed in ~10 calendar days unless new substational objections are raised. labels Jun 6, 2021

m-ou-se mentioned this pull request Jul 28, 2021

Tracking Issue for RFC2509: concat_bytes!() rust-lang/rust#87555

Open

4 tasks

m-ou-se merged commit 431dadf into rust-lang:master Jul 28, 2021

Fishrock123 mentioned this pull request Jul 29, 2021

Implement concat_bytes! rust-lang/rust#87599

Merged

RFC: Introduce concat_bytes!() to join [u8] and byte str analogous to concat! for str #2509

RFC: Introduce concat_bytes!() to join [u8] and byte str analogous to concat! for str #2509

Conversation

estebank commented Aug 1, 2018 • edited by m-ou-se Loading

SimonSapin commented Aug 1, 2018

estebank commented Aug 1, 2018

This comment has been minimized.

joshtriplett commented Aug 10, 2018

jsgf commented Aug 17, 2018 • edited Loading

jsgf commented Aug 17, 2018

SimonSapin commented Aug 18, 2018

SimonSapin commented Aug 18, 2018

jsgf commented Aug 18, 2018

estebank commented Aug 19, 2018

SimonSapin commented Aug 21, 2018

estebank commented Aug 21, 2018

SimonSapin commented Aug 21, 2018

estebank commented Aug 21, 2018

SimonSapin commented Aug 21, 2018 • edited Loading

This comment has been minimized.

whatisaphone commented Jan 9, 2020

Lokathor commented Jan 9, 2020

whatisaphone commented Jan 9, 2020 • edited Loading

joshtriplett commented Aug 5, 2020

Amanieu commented Aug 5, 2020

jplatte commented Oct 21, 2020

joshtriplett commented Oct 21, 2020 • edited Loading

BurntSushi commented Mar 17, 2021

m-ou-se commented Mar 17, 2021

SimonSapin commented Mar 17, 2021 • edited Loading

BurntSushi commented Mar 17, 2021

SimonSapin commented Mar 17, 2021

BurntSushi commented Mar 17, 2021

SimonSapin commented Mar 18, 2021

jplatte commented Mar 18, 2021

BurntSushi commented Mar 18, 2021

jplatte commented Mar 23, 2021

joshtriplett commented Apr 14, 2021 • edited Loading

bstrie commented May 5, 2021

jplatte commented May 27, 2021

m-ou-se commented May 27, 2021

rfcbot commented May 27, 2021

rfcbot commented Jun 6, 2021

jplatte commented Jul 26, 2021

m-ou-se commented Jul 28, 2021

Fishrock123 commented Jul 29, 2021

RFC: Introduce `concat_bytes!()` to join `[u8]` and byte `str` analogous to `concat!` for `str` #2509

RFC: Introduce `concat_bytes!()` to join `[u8]` and byte `str` analogous to `concat!` for `str` #2509

estebank commented Aug 1, 2018 •

edited by m-ou-se

Loading

jsgf commented Aug 17, 2018 •

edited

Loading

SimonSapin commented Aug 21, 2018 •

edited

Loading

whatisaphone commented Jan 9, 2020 •

edited

Loading

joshtriplett commented Oct 21, 2020 •

edited

Loading

SimonSapin commented Mar 17, 2021 •

edited

Loading

joshtriplett commented Apr 14, 2021 •

edited

Loading