-
Notifications
You must be signed in to change notification settings - Fork 1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Efficient params and string formatting #2293
Changes from all commits
3d4b9aa
60df1b0
a901f5e
1411f2c
eafd1d2
6829da0
dbee66e
be08f9c
d3527f4
fc0eeac
d4ab7b9
a09c16d
48cc009
ac55957
6e47720
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,334 @@ | ||
# Efficient Params and String Formatting | ||
|
||
## Summary | ||
This combination of features will increase the efficiency of formatting `string` values and passing of `params` style | ||
arguments. | ||
|
||
## Motivation | ||
The allocation overhead of formatting `string` values can dominate the performance of many text based applications: | ||
from the boxing penalty of `struct` types, the `object[]` allocation for `params` and the intermediate `string` | ||
allocations during `string.Format` calls. In order to maintain efficiency such applications often need to abandon | ||
productivity features such as `params` and `string` interpolation and move to non-standard, hand coded solutions. | ||
|
||
Consider MSBuild as an example. This is written using a lot of modern C# features by developers who are conscious of | ||
performance. Yet in one representative build sample MSBuild will generate 262MB of `string` allocation | ||
using minimal verbosity. Of that 1/2 of the allocations are short lived allocations inside `string.Format`. These | ||
features would remove much of that on .NET Desktop and get it down to nearly zero on .NET Core due to the availability of `Span<T>` | ||
|
||
The set of language features described here will enable applications to continue using these features, with very | ||
little or no churn to their application code base, while removing the unintended allocation overhead in the majority of | ||
cases. | ||
|
||
## Detailed Design | ||
There are a set of features that will be used here to achieve these results: | ||
|
||
- Expanding `params` to support a broader set of collection types. | ||
- Allowing for developers to customize how `string` interpolation is achieved. | ||
- Allowing for interpolated `string` to bind to more efficient `string.Format` overloads. | ||
|
||
### Extending params | ||
The language will allow for `params` in a method signature to have the types `Span<T>`, `ReadOnlySpan<T>` and | ||
`IEnumerable<T>`. The same rules for invocation will apply to these new types that apply to `params T[]`: | ||
|
||
- Can't overload where the only difference is a `params` keyword. | ||
- Can invoke by passing a series of arguments that are implicitly convertible to `T` or a single `Span<T>` / | ||
`ReadOnlySpan<T>` / `IEnumerable<T>` argument. | ||
- Must be the last parameter in a method signature. | ||
- Etc ... | ||
|
||
The `Span<T>` and `ReadOnlySpan<T>` variants will be referred to as `Span<T>` below for simplicity. In cases where the | ||
behavior of `ReadOnlySpan<T>` differs it will be explicitly called out. | ||
|
||
The advantage the `Span<T>` variants of `params` provides is it gives the compiler great flexbility in how it allocates | ||
the backing storage for the `Span<T>` value. With a `params T[]` the compiler must allocate a new `T[]` for every | ||
invocation of a `params` method. Re-use is not possible because it must assume the callee stored and reused the | ||
parameter. This can lead to a large inefficiency in methods with lots of `params` invocations. | ||
|
||
Given `Span<T>` variants are `ref struct` the callee cannot store the argument. Hence the compiler can optimize the | ||
call sites by taking actions like re-using the argument. This can make repeated invocations very efficient as compared | ||
to `T[]`. The language though will make no specific guarantees about how such callsites are optimized. Only note that | ||
the compiler is free to use values other than `T[]` when invoking a `params Span<T>` method. | ||
|
||
One such potential implementation is the following. Consider all `params` invocation in a method body. The compiler | ||
could allocate an array which has a size equal to the largest `params` invocation and use that for all of the | ||
invocations by creating appropriately sized `Span<T>` instances over the array. For example: | ||
|
||
``` csharp | ||
static class OneAllocation { | ||
static void Use(params Span<string> spans) { | ||
... | ||
} | ||
|
||
static void Go() { | ||
Use("jaredpar"); | ||
Use("hello", "world"); | ||
Use("a", "longer", "set"); | ||
} | ||
} | ||
``` | ||
|
||
The compiler could choose to emit the body of `Go` as follows: | ||
|
||
``` csharp | ||
static void Go() { | ||
var args = new string[3]; | ||
args[0] = "jaredpar"; | ||
Use(new Span<int>(args, start: 0, length: 1)); | ||
|
||
args[0] = "hello"; | ||
args[1] = "world"; | ||
Use(new Span<int>(args, start: 0, length: 2)); | ||
|
||
args[0] = "a"; | ||
args[1] = "longer"; | ||
args[2] = "set"; | ||
Use(new Span<int>(args, start: 0, length: 3)); | ||
} | ||
``` | ||
|
||
This can significantly reduce the number of arrays allocated in an application. Allocations can be even further | ||
reduced if the runtime provides utilities for smarter stack allocation of arrays. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. could you expand on this further? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. The later sections go into a few of the possible scenarios here around stack allocation. |
||
|
||
This optimization cannot always be applied though. Even though the callee cannot capture the `params` argument it can | ||
still be captured in the caller when there is a `ref` or a `out / ref` parameter that is itself a `ref struct` | ||
type. | ||
|
||
``` csharp | ||
static class SneakyCapture { | ||
static ref int M(params Span<T> span) => ref span[0]; | ||
|
||
static void Oops() { | ||
// This now holds onto the memory backing the Span<T> | ||
ref int r = ref M(42); | ||
} | ||
} | ||
``` | ||
|
||
These cases are statically detectable though. It potentially occurs whenever there is a `ref` return or a `ref struct` | ||
parameter passed by `out` or `ref`. In such a case the compiler must allocate a fresh `T[]` for every invocation. | ||
|
||
Several other potential optimization strategies are discussed at the end of this document. | ||
|
||
The `IEnumerable<T>` variant is a merely a convenience overload. It's useful in scenarios which have frequent uses of | ||
`IEnumerable<T>` but also have lots of `params` usage. When invoked in `T` argument form the backing storage will | ||
be allocated as a `T[]` just as `params T[]` is done today. | ||
|
||
### params overload resolution changes | ||
This proposal means the language now has four variants of `params` where before it had one. It is sensible for methods | ||
to define overloads of methods that differ only on the type of a `params` declarations. | ||
|
||
Consider that `StringBuilder.AppendFormat` would certainly add a `params ReadOnlySpan<object>` overload in addition to | ||
the `params object[]`. This would allow it to substantially improve performance by reducing collection allocations | ||
without requiring any changes to the calling code. | ||
|
||
To facilitate this the language will introduce the following overload resolution tie breaking rule. When the candidate | ||
methods differ only by the `params` parameter then the candidates will be preferred in the following order: | ||
|
||
1. `ReadOnlySpan<T>` | ||
1. `Span<T>` | ||
1. `T[]` | ||
1. `IEnumerable<T>` | ||
|
||
This order is the most to the least efficient for the general case. | ||
|
||
### Variant | ||
CoreFX is prototyping a new managed type named [Variant](https://github.com/dotnet/corefxlab/pull/2595). This type | ||
is meant to be used in APIs which expect heterogeneous values but don't want the overhead brought on by using `object` | ||
as the parameter. The `Variant` type provides universal storage but avoids the boxing allocation for the most commonly | ||
used types. Using this type in APIs like `string.Format` can eliminate the boxing overhead in the majority of cases. | ||
|
||
This type itself is not necessarily special to the language. It is being introduced in this document separately though | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Are you going to be putting the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JeremyKuhne owns that. Pretty sure it's one corefxlab already. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. can't stop laughing about the typos they just keep coming There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Don't worry they won't stop. |
||
as it becomes an implementation detail of other parts of the proposal. | ||
|
||
### Efficient interpolated strings | ||
Interpolated strings are a popular yet inefficient feature in C#. The most common syntax, using an interpolated `string` | ||
as a `string`, translates into a `string.Format(string, params object[])` call. That will incur boxing allocations for | ||
all value types, intermediate `string` allocations as the implementation largely uses `object.ToString` for formatting | ||
as well as array allocations once the number of arguments exceeds the amount of parameters on the "fast" overloads of | ||
`string.Format`. | ||
|
||
The language will change its interpolation lowering to consider alternate overloads of `string.Format`. It will | ||
consider all forms of `string.Format(string, params)` and pick the "best" overload which satisfies the argument types. | ||
The "best" `params` overload will be determined by the rules discussed above. This means interpolated `string` can now | ||
bind to very efficient overloads like `string.Format(string format, params ReadOnlySpan<Variant> args)`. In many cases | ||
this will remove all intermediate allocations. | ||
|
||
### Customizable interpolated strings | ||
Developers are able to customize the behavior of interpolated strings with `FormattableString`. This contains the data | ||
which goes into an interpolated string: the format `string` and the arguments as an array. This though still has the | ||
boxing and argument array allocation as well as the allocation for `FormattableString` (it's an `abstract class`). Hence | ||
it's of little use to applications which are allocation heavy in `string` formatting. | ||
|
||
To make interpolated string formatting efficient the language will recognize a new type: | ||
`System.ValueFormattableString`. All interpolated strings will have a target type conversion to this type. This will | ||
be implemented by translating the interpolated string into the call `ValueFormattableString.Create` exactly as is done | ||
for `FormattableString.Create` today. The language will support all `params` options described in this document when | ||
looking for the most suitable `ValueFormattableString.Create` method. | ||
|
||
``` csharp | ||
readonly struct ValueFormattableString { | ||
public static ValueFormattableString Create(Variant v) { ... } | ||
public static ValueFormattableString Create(string s) { ... } | ||
public static ValueFormattableString Create(string s, params ReadOnlySpan<Variant> collection) { ... } | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. would you need a ReadOnlySpan of hte string portions of hte interpolated string, along with the ReadOnlySPan of interpolation values? Or am i misunderstanding how this would be used? For example, if you have:
how is the ValueFormattableString created? I would prefer a form that gave me |
||
} | ||
|
||
class ConsoleEx { | ||
static void Write(ValueFormattableString f) { ... } | ||
} | ||
|
||
class Program { | ||
static void Main() { | ||
ConsoleEx.Write(42); | ||
ConsoleEx.Write($"hello {DateTime.UtcNow}"); | ||
|
||
// Translates into | ||
ConsoleEx.Write(ValueFormattableString.Create((Variant)42)); | ||
ConsoleEx.Write(ValueFormattableString.Create( | ||
"hello {0}", | ||
new Variant(DateTime.UtcNow)); | ||
} | ||
} | ||
``` | ||
|
||
Overload resolution rules will be changed to prefer `ValueFormattableString` over `string` when the argument is an | ||
interpolated string. This means it will be valuable to have overloads which differ only on `string` and | ||
`ValueFormattableString`. Such an overload today with `FormattableString` is not valuable as the compiler will always | ||
prefer the `string` version (unless the developer uses an explicit cast). | ||
|
||
## Open Issues | ||
|
||
### ValueFormattableString breaking change | ||
The change to prefer `ValueFormattableString` during overload resolution over `string` is a breaking change. It is | ||
possible for a developer to have defined a type called `ValueFormattableString` today and use it in method overloads | ||
with `string`. This proposed change would cause the compiler to pick a different overload once this set of features | ||
was implemented. | ||
|
||
The possibility of this seems reasonably low. The type would need the full name `System.ValueFormattableString` and it | ||
would need to have `static` methods named `Create`. Given that developers are strongly discouraged from defining any | ||
type in the `System` namespace this break seems like a reasonable compromise. | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It would be great if the Language/Compiler codified this (similer to how some languages carve out There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Leaving There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We've essentially drawn the line on types defined under There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Great. I wasn't certain if htat was something official, or more informal. I was basically saying that it would be nice if really just stated it officially. |
||
|
||
### Expanding to more types | ||
Given we're in the area we sohuld consider adding `IList<T>`, `ICollection<T>` and `IReadOnlyList<T>` to the set of | ||
collections for which `params` is supported. In terms of implementation it will cost a small amount over the other | ||
work here. | ||
|
||
LDM needs to decide if the complication to the language is worth it though. The addition of `IEnumerable<T>` removes | ||
a very specific friction point. Lacking this `params` solution many customers were forced to allocate `T[]` from an | ||
`IEnumerable<T>` when calling a `params` method. The addition of `IEnumerable<T>` fixes this though. There is no | ||
specific friction point that the other interfaces fix here. | ||
|
||
## Considerations | ||
|
||
### Variant2 and Variant3 | ||
The CoreFX team also has a non-allocating set of storage types for up to three `Variant` arguments. These are a single | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Was the '3' brought about by an sort of analysis of existing codebases? i'd be really curious what the 'tail' looks like her.e i.e. if 3 gets us 95% of the way, but 4 gets us to 99.9%, that would be interesting to know. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. @JeremyKuhne can comment on that. |
||
`Variant`, `Variant2` and `Variant3`. All have a pair of methods for getting an allocation free `Span<Variant>` off of | ||
them: `CreateSpan` and `KeepAlive`. This means for a `params Span<Variant>` of up to three arguments the call site | ||
can be entirely allocation free. | ||
|
||
``` csharp | ||
static class ZeroAllocation { | ||
static void Use(params Span<Variant> spans) { | ||
... | ||
} | ||
|
||
static void Go() { | ||
Use("hello", "world"); | ||
} | ||
} | ||
``` | ||
|
||
The `Go` method can be lowered to the following: | ||
|
||
``` csharp | ||
static class ZeroAllocation { | ||
static void Go() { | ||
Variant2 _v; | ||
_v.Variant1 = new Variant("hello"); | ||
_v.Variant2 = new Variant("word"); | ||
Use(_v.CreateSpan()); | ||
_v.KeepAlive(); | ||
} | ||
} | ||
``` | ||
|
||
This requires very little work on top of the proposal to re-use `T[]` between `params Span<T>` calls. The compiler | ||
already needs to manage a temporary per call and do clean up work after (even if in one case it's just marking | ||
an internal temp as free). | ||
|
||
Note: the `KeepAlive` function is only necessary on desktop. On .NET Core the method will not be available and hence | ||
the compiler won't emit a call to it. | ||
|
||
### CLR stack allocation helpers | ||
The CLR only provides only | ||
[localloc](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.localloc?redirectedfrom=MSDN&view=netframework-4.7.2) | ||
for stack allocation of contiguous memory. This instruction is limited in that it only works for `unmanaged` types. | ||
This means it can't be used as a universal solution for efficiently allocating the backing storage for `params | ||
Span<T>`. | ||
|
||
This limitation is not some fundamental restriction though but instead more an artifact of history. The CLR could choose | ||
to add new op codes / intrinsics which provide universal stack allocation. These could then be used to allocate the | ||
backing storage for most `params Span<T>` calls. | ||
|
||
``` csharp | ||
static class BetterAllocation { | ||
static void Use(params Span<string> spans) { | ||
... | ||
} | ||
|
||
static void Go() { | ||
Use("hello", "world"); | ||
} | ||
} | ||
``` | ||
|
||
The `Go` method can be lowered to the following: | ||
|
||
``` csharp | ||
static class ZeroAllocation { | ||
static void Go() { | ||
Span<T> span = RuntimeIntrinsic.StackAlloc<string>(length: 2); | ||
span[0] = "hello"; | ||
span[1] = "world"; | ||
Use(span); | ||
} | ||
} | ||
``` | ||
|
||
While this approach is very heap efficient it does cause extra stack usage. In an algorithm which has a deep stack and | ||
lots of `params` usage it's possible this could cause a `StackOverflowException` to be generated where a simple `T[]` | ||
allocation would succeed. | ||
|
||
Unfortunately C# is not set up for the type of inter-method analysis where it could make an educated determination of | ||
whether or not call should use stack or heap allocation of `params`. It can only really consider each call on its | ||
own. | ||
|
||
The CLR is best setup for making this type of determination at runtime. Hence we'd likely have the runtime provide two | ||
methods for universal stack allocation: | ||
|
||
1. `Span<T> StackAlloc<T>(int length)`: this has the same behaviors and limitations of `localloc` except it can work on | ||
any type `T`. | ||
1. `Span<T> MaybeStackAlloc<T>(int length)`: this runtime can choose to implement this by doing a stack or heap | ||
allocation. The runtime can then use the execution context in which it's called to determine how the `Span<T>` is | ||
allocated. The caller though will always treat it as if it were stack allocated. | ||
|
||
For very simple cases, like one to two arguments, the C# compiler could always use `StackAlloc<T>` variant. This is | ||
unlikely to significantly contribute to stack exhaustion in most cases. For other cases the compiler could choose to | ||
use `MaybeStackAlloc<T>` instead and let the runtime make the call. | ||
|
||
How we choose will likely require a deeper investigation and examination of real world apps. But if these new intrinsics | ||
are available then it will give us this type of flexibility. | ||
|
||
### Why not varargs? | ||
The existing | ||
[varargs](https://docs.microsoft.com/en-us/cpp/windows/variable-argument-lists-dot-dot-dot-cpp-cli?view=vs-2017) | ||
feature wsa considered here as a possible solution. This feature though is meant primarily for C++/CLI scenarios and | ||
has known holes for other scenarios. Additionally there is significant cost in porting this to Unix. Hence it wasn't | ||
seen as a viable solution. | ||
|
||
## Related Issues | ||
This spec is related to the following issues: | ||
|
||
- https://github.com/dotnet/csharplang/issues/1757 | ||
- https://github.com/dotnet/csharplang/issues/179 | ||
- https://github.com/dotnet/corefxlab/pull/2595 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if we're going to go to this, wouldit be oke to also allow going to ICollection/IList/IReadOnlyList at teh same time (effectively, all the types that T[] already implements)? It feels like it would be a prime time to just roll this in since we're working in this area.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would you need to support the others, since they all implement
IEnumerable<T>
? Generally speaking, types that implement things likeICollection
orIList
can also efficiently implement their iterators (and do in the case of mostCoreFX
types).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that the currently proposed types cover most of the scenarios. That is,
Span<T>
/ROSpan<T>
make sense since they provide allocation-less calls (and declare the immutability of the input data) andIEnumerable<T>
allows easier interop with LINQ (it avoids the need to realize a query).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because i want to expose this api to users:
Allowing people to just pass their existing list to me. Or, they can pass raw values, and it will be an array, an that doesn't need to be converted into an IReadOnlyList since it already is one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll add it to the open issues. We didn't cover it in LDM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much! I would understand if this didn't happen. However, it's been one of those language niggles for ages. If it's cheap, this would be nice to have :) It would def clean up a few of my apis.
Note: @tannergooding and i discussed this on Gitter. It is definitely a strange thing that the language has baked in understanding that an array has an implicit reference conversion to IList and its ilk. And, yet, for 'params' i must specify only a
T[]
type. It seems like it would be pretty simple to just relax that to "anything an array has reference implicit conversion to".