From 997dc251e4848538f89e925af95cd8af727b4952 Mon Sep 17 00:00:00 2001
From: AlekseyTs
Date: Thu, 16 Jun 2022 11:24:23 -0700
Subject: [PATCH 1/2] Add UTF8 byte representation concatenation operator to utf8-string-literals.md

---
 proposals/utf8-string-literals.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)

diff --git a/proposals/utf8-string-literals.md b/proposals/utf8-string-literals.md
index a35ac5f44a..fbf9ca5e9f 100644
--- a/proposals/utf8-string-literals.md
+++ b/proposals/utf8-string-literals.md
@@ -63,6 +63,22 @@ When the input text for the literal is a malformed UTF16 string, then the langua
 var bytes = "hello \uD801\uD802"u8; // Error: the input string is not valid UTF16
 ```
 
+### Addition operator
+
+A new bullet point will be added to [§11.9.5 Addition operator](https://github.com/dotnet/csharpstandard/blob/draft-v7/standard/expressions.md#1195-addition-operator) as follows.
+
+- UTF8 byte representation concatenation:
+
+    ```csharp
+    ReadOnlySpan<byte> operator +(ReadOnlySpan<byte> x, ReadOnlySpan<byte> y);
+    ```
+
+    This binary `+` operator performs byte sequences concatenation and is applicable if and only if both operands are semantically UTF8 byte representations.
+    An operand is semantically a UTF8 byte representation when it is either a value of a `u8` literal, or a value produced by the UTF8 byte representation concatenation operator.
+
+    The result of the UTF8 byte representation concatenation is a ```ReadOnlySpan<byte>``` that consists of the bytes of the left operand followed by the bytes of the right operand. A null terminator is placed beyond the last byte in memory (and outside the length of the ```ReadOnlySpan<byte>```) in order to handle some
+interop scenarios where the call expects null terminated strings.
+
 ### Lowering
 The language will lower the UTF8 encoded strings exactly as if the developer had typed the resulting `byte[]` literal in code. 
 For example:
@@ -78,6 +94,17 @@ ReadOnlySpan<byte> span = new ReadOnlySpan<byte>(new byte[] { 0x68, 0x65, 0x6c,
 
 That means all optimizations that apply to the `new byte[] { ... }` form will apply to utf8 literals as well. This means the call site will be allocation free as C# will optimize this be stored in the `.data` section of the PE file.
 
+Multiple consequitive applications of UTF8 byte representation concatenation operators are collapsed into a single creation of `ReadOnlySpan<byte>` with byte array containing the final byte sequence.
+
+```c#
+ReadOnlySpan<byte> span = "h"u8 + "el"u8 + "lo"u8;
+
+// Equivalent to
+
+ReadOnlySpan<byte> span = new ReadOnlySpan<byte>(new byte[] { 0x68, 0x65, 0x6c, 0x6c, 0x6f, 0x00 }).
+                          Slice(0,5); // The `Slice` call will be optimized away by the compiler.
+```
+
 ## Drawbacks
 ### Relying on core APIs
 The compiler implementation will use `UTF8Encoding` for both invalid string detection as well as translation to `byte[]`. The exact APIs will possibly depend on which target framework the compiler is using. But `UTF8Encoding` will be the workhorse of the implementation.
@@ -407,3 +434,4 @@ Examples where we leave perf on the table
 
 https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-01-26.md
 https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-04-18.md
+https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-06-06.md

From 1577ecb9eba896104d6ab02d7e08db2a5122f24a Mon Sep 17 00:00:00 2001
From: AlekseyTs
Date: Thu, 16 Jun 2022 15:51:45 -0700
Subject: [PATCH 2/2] Update proposals/utf8-string-literals.md

---
 proposals/utf8-string-literals.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/proposals/utf8-string-literals.md b/proposals/utf8-string-literals.md
index fbf9ca5e9f..6cf61ecfde 100644
--- a/proposals/utf8-string-literals.md
+++ b/proposals/utf8-string-literals.md
@@ -94,7 +94,7 @@ ReadOnlySpan<byte> span = new ReadOnlySpan<byte>(new byte[] { 0x68, 0x65, 0x6c,
 
 That means all optimizations that apply to the `new byte[] { ... }` form will apply to utf8 literals as well. This means the call site will be allocation free as C# will optimize this be stored in the `.data` section of the PE file.
 
-Multiple consequitive applications of UTF8 byte representation concatenation operators are collapsed into a single creation of `ReadOnlySpan<byte>` with byte array containing the final byte sequence.
+Multiple consecutive applications of UTF8 byte representation concatenation operators are collapsed into a single creation of `ReadOnlySpan<byte>` with byte array containing the final byte sequence.
 
 ```c#
 ReadOnlySpan<byte> span = "h"u8 + "el"u8 + "lo"u8;