diff --git a/proposals/format.md b/proposals/format.md new file mode 100644 index 0000000000..774b621362 --- /dev/null +++ b/proposals/format.md @@ -0,0 +1,334 @@ +# Efficient Params and String Formatting + +## Summary +This combination of features will increase the efficiency of formatting `string` values and passing of `params` style +arguments. + +## Motivation +The allocation overhead of formatting `string` values can dominate the performance of many text based applications: +from the boxing penalty of `struct` types, the `object[]` allocation for `params` and the intermediate `string` +allocations during `string.Format` calls. In order to maintain efficiency such applications often need to abandon +productivity features such as `params` and `string` interpolation and move to non-standard, hand coded solutions. + +Consider MSBuild as an example. This is written using a lot of modern C# features by developers who are conscious of +performance. Yet in one representative build sample MSBuild will generate 262MB of `string` allocation +using minimal verbosity. Of that 1/2 of the allocations are short lived allocations inside `string.Format`. These +features would remove much of that on .NET Desktop and get it down to nearly zero on .NET Core due to the availability of `Span` + +The set of language features described here will enable applications to continue using these features, with very +little or no churn to their application code base, while removing the unintended allocation overhead in the majority of +cases. + +## Detailed Design +There are a set of features that will be used here to achieve these results: + +- Expanding `params` to support a broader set of collection types. +- Allowing for developers to customize how `string` interpolation is achieved. +- Allowing for interpolated `string` to bind to more efficient `string.Format` overloads. + +### Extending params +The language will allow for `params` in a method signature to have the types `Span`, `ReadOnlySpan` and +`IEnumerable`. The same rules for invocation will apply to these new types that apply to `params T[]`: + +- Can't overload where the only difference is a `params` keyword. +- Can invoke by passing a series of arguments that are implicitly convertible to `T` or a single `Span` / +`ReadOnlySpan` / `IEnumerable` argument. +- Must be the last parameter in a method signature. +- Etc ... + +The `Span` and `ReadOnlySpan` variants will be referred to as `Span` below for simplicity. In cases where the +behavior of `ReadOnlySpan` differs it will be explicitly called out. + +The advantage the `Span` variants of `params` provides is it gives the compiler great flexbility in how it allocates +the backing storage for the `Span` value. With a `params T[]` the compiler must allocate a new `T[]` for every +invocation of a `params` method. Re-use is not possible because it must assume the callee stored and reused the +parameter. This can lead to a large inefficiency in methods with lots of `params` invocations. + +Given `Span` variants are `ref struct` the callee cannot store the argument. Hence the compiler can optimize the +call sites by taking actions like re-using the argument. This can make repeated invocations very efficient as compared +to `T[]`. The language though will make no specific guarantees about how such callsites are optimized. Only note that +the compiler is free to use values other than `T[]` when invoking a `params Span` method. + +One such potential implementation is the following. Consider all `params` invocation in a method body. The compiler +could allocate an array which has a size equal to the largest `params` invocation and use that for all of the +invocations by creating appropriately sized `Span` instances over the array. For example: + +``` csharp +static class OneAllocation { + static void Use(params Span spans) { + ... + } + + static void Go() { + Use("jaredpar"); + Use("hello", "world"); + Use("a", "longer", "set"); + } +} +``` + +The compiler could choose to emit the body of `Go` as follows: + +``` csharp + static void Go() { + var args = new string[3]; + args[0] = "jaredpar"; + Use(new Span(args, start: 0, length: 1)); + + args[0] = "hello"; + args[1] = "world"; + Use(new Span(args, start: 0, length: 2)); + + args[0] = "a"; + args[1] = "longer"; + args[2] = "set"; + Use(new Span(args, start: 0, length: 3)); + } +``` + +This can significantly reduce the number of arrays allocated in an application. Allocations can be even further +reduced if the runtime provides utilities for smarter stack allocation of arrays. + +This optimization cannot always be applied though. Even though the callee cannot capture the `params` argument it can +still be captured in the caller when there is a `ref` or a `out / ref` parameter that is itself a `ref struct` +type. + +``` csharp +static class SneakyCapture { + static ref int M(params Span span) => ref span[0]; + + static void Oops() { + // This now holds onto the memory backing the Span + ref int r = ref M(42); + } +} +``` + +These cases are statically detectable though. It potentially occurs whenever there is a `ref` return or a `ref struct` +parameter passed by `out` or `ref`. In such a case the compiler must allocate a fresh `T[]` for every invocation. + +Several other potential optimization strategies are discussed at the end of this document. + +The `IEnumerable` variant is a merely a convenience overload. It's useful in scenarios which have frequent uses of +`IEnumerable` but also have lots of `params` usage. When invoked in `T` argument form the backing storage will +be allocated as a `T[]` just as `params T[]` is done today. + +### params overload resolution changes +This proposal means the language now has four variants of `params` where before it had one. It is sensible for methods +to define overloads of methods that differ only on the type of a `params` declarations. + +Consider that `StringBuilder.AppendFormat` would certainly add a `params ReadOnlySpan` overload in addition to +the `params object[]`. This would allow it to substantially improve performance by reducing collection allocations +without requiring any changes to the calling code. + +To facilitate this the language will introduce the following overload resolution tie breaking rule. When the candidate +methods differ only by the `params` parameter then the candidates will be preferred in the following order: + +1. `ReadOnlySpan` +1. `Span` +1. `T[]` +1. `IEnumerable` + +This order is the most to the least efficient for the general case. + +### Variant +CoreFX is prototyping a new managed type named [Variant](https://github.com/dotnet/corefxlab/pull/2595). This type +is meant to be used in APIs which expect heterogeneous values but don't want the overhead brought on by using `object` +as the parameter. The `Variant` type provides universal storage but avoids the boxing allocation for the most commonly +used types. Using this type in APIs like `string.Format` can eliminate the boxing overhead in the majority of cases. + +This type itself is not necessarily special to the language. It is being introduced in this document separately though +as it becomes an implementation detail of other parts of the proposal. + +### Efficient interpolated strings +Interpolated strings are a popular yet inefficient feature in C#. The most common syntax, using an interpolated `string` +as a `string`, translates into a `string.Format(string, params object[])` call. That will incur boxing allocations for +all value types, intermediate `string` allocations as the implementation largely uses `object.ToString` for formatting +as well as array allocations once the number of arguments exceeds the amount of parameters on the "fast" overloads of +`string.Format`. + +The language will change its interpolation lowering to consider alternate overloads of `string.Format`. It will +consider all forms of `string.Format(string, params)` and pick the "best" overload which satisfies the argument types. +The "best" `params` overload will be determined by the rules discussed above. This means interpolated `string` can now +bind to very efficient overloads like `string.Format(string format, params ReadOnlySpan args)`. In many cases +this will remove all intermediate allocations. + +### Customizable interpolated strings +Developers are able to customize the behavior of interpolated strings with `FormattableString`. This contains the data +which goes into an interpolated string: the format `string` and the arguments as an array. This though still has the +boxing and argument array allocation as well as the allocation for `FormattableString` (it's an `abstract class`). Hence +it's of little use to applications which are allocation heavy in `string` formatting. + +To make interpolated string formatting efficient the language will recognize a new type: +`System.ValueFormattableString`. All interpolated strings will have a target type conversion to this type. This will +be implemented by translating the interpolated string into the call `ValueFormattableString.Create` exactly as is done +for `FormattableString.Create` today. The language will support all `params` options described in this document when +looking for the most suitable `ValueFormattableString.Create` method. + +``` csharp +readonly struct ValueFormattableString { + public static ValueFormattableString Create(Variant v) { ... } + public static ValueFormattableString Create(string s) { ... } + public static ValueFormattableString Create(string s, params ReadOnlySpan collection) { ... } +} + +class ConsoleEx { + static void Write(ValueFormattableString f) { ... } +} + +class Program { + static void Main() { + ConsoleEx.Write(42); + ConsoleEx.Write($"hello {DateTime.UtcNow}"); + + // Translates into + ConsoleEx.Write(ValueFormattableString.Create((Variant)42)); + ConsoleEx.Write(ValueFormattableString.Create( + "hello {0}", + new Variant(DateTime.UtcNow)); + } +} +``` + +Overload resolution rules will be changed to prefer `ValueFormattableString` over `string` when the argument is an +interpolated string. This means it will be valuable to have overloads which differ only on `string` and +`ValueFormattableString`. Such an overload today with `FormattableString` is not valuable as the compiler will always +prefer the `string` version (unless the developer uses an explicit cast). + +## Open Issues + +### ValueFormattableString breaking change +The change to prefer `ValueFormattableString` during overload resolution over `string` is a breaking change. It is +possible for a developer to have defined a type called `ValueFormattableString` today and use it in method overloads +with `string`. This proposed change would cause the compiler to pick a different overload once this set of features +was implemented. + +The possibility of this seems reasonably low. The type would need the full name `System.ValueFormattableString` and it +would need to have `static` methods named `Create`. Given that developers are strongly discouraged from defining any +type in the `System` namespace this break seems like a reasonable compromise. + +### Expanding to more types +Given we're in the area we sohuld consider adding `IList`, `ICollection` and `IReadOnlyList` to the set of +collections for which `params` is supported. In terms of implementation it will cost a small amount over the other +work here. + +LDM needs to decide if the complication to the language is worth it though. The addition of `IEnumerable` removes +a very specific friction point. Lacking this `params` solution many customers were forced to allocate `T[]` from an +`IEnumerable` when calling a `params` method. The addition of `IEnumerable` fixes this though. There is no +specific friction point that the other interfaces fix here. + +## Considerations + +### Variant2 and Variant3 +The CoreFX team also has a non-allocating set of storage types for up to three `Variant` arguments. These are a single +`Variant`, `Variant2` and `Variant3`. All have a pair of methods for getting an allocation free `Span` off of +them: `CreateSpan` and `KeepAlive`. This means for a `params Span` of up to three arguments the call site +can be entirely allocation free. + +``` csharp +static class ZeroAllocation { + static void Use(params Span spans) { + ... + } + + static void Go() { + Use("hello", "world"); + } +} +``` + +The `Go` method can be lowered to the following: + +``` csharp +static class ZeroAllocation { + static void Go() { + Variant2 _v; + _v.Variant1 = new Variant("hello"); + _v.Variant2 = new Variant("word"); + Use(_v.CreateSpan()); + _v.KeepAlive(); + } +} +``` + +This requires very little work on top of the proposal to re-use `T[]` between `params Span` calls. The compiler +already needs to manage a temporary per call and do clean up work after (even if in one case it's just marking +an internal temp as free). + +Note: the `KeepAlive` function is only necessary on desktop. On .NET Core the method will not be available and hence +the compiler won't emit a call to it. + +### CLR stack allocation helpers +The CLR only provides only +[localloc](https://docs.microsoft.com/en-us/dotnet/api/system.reflection.emit.opcodes.localloc?redirectedfrom=MSDN&view=netframework-4.7.2) + for stack allocation of contiguous memory. This instruction is limited in that it only works for `unmanaged` types. + This means it can't be used as a universal solution for efficiently allocating the backing storage for `params + Span`. + +This limitation is not some fundamental restriction though but instead more an artifact of history. The CLR could choose +to add new op codes / intrinsics which provide universal stack allocation. These could then be used to allocate the +backing storage for most `params Span` calls. + +``` csharp +static class BetterAllocation { + static void Use(params Span spans) { + ... + } + + static void Go() { + Use("hello", "world"); + } +} +``` + +The `Go` method can be lowered to the following: + +``` csharp +static class ZeroAllocation { + static void Go() { + Span span = RuntimeIntrinsic.StackAlloc(length: 2); + span[0] = "hello"; + span[1] = "world"; + Use(span); + } +} +``` + +While this approach is very heap efficient it does cause extra stack usage. In an algorithm which has a deep stack and +lots of `params` usage it's possible this could cause a `StackOverflowException` to be generated where a simple `T[]` +allocation would succeed. + +Unfortunately C# is not set up for the type of inter-method analysis where it could make an educated determination of +whether or not call should use stack or heap allocation of `params`. It can only really consider each call on its +own. + +The CLR is best setup for making this type of determination at runtime. Hence we'd likely have the runtime provide two +methods for universal stack allocation: + +1. `Span StackAlloc(int length)`: this has the same behaviors and limitations of `localloc` except it can work on +any type `T`. +1. `Span MaybeStackAlloc(int length)`: this runtime can choose to implement this by doing a stack or heap +allocation. The runtime can then use the execution context in which it's called to determine how the `Span` is +allocated. The caller though will always treat it as if it were stack allocated. + +For very simple cases, like one to two arguments, the C# compiler could always use `StackAlloc` variant. This is +unlikely to significantly contribute to stack exhaustion in most cases. For other cases the compiler could choose to +use `MaybeStackAlloc` instead and let the runtime make the call. + +How we choose will likely require a deeper investigation and examination of real world apps. But if these new intrinsics +are available then it will give us this type of flexibility. + +### Why not varargs? +The existing +[varargs](https://docs.microsoft.com/en-us/cpp/windows/variable-argument-lists-dot-dot-dot-cpp-cli?view=vs-2017) +feature wsa considered here as a possible solution. This feature though is meant primarily for C++/CLI scenarios and +has known holes for other scenarios. Additionally there is significant cost in porting this to Unix. Hence it wasn't +seen as a viable solution. + +## Related Issues +This spec is related to the following issues: + +- https://github.com/dotnet/csharplang/issues/1757 +- https://github.com/dotnet/csharplang/issues/179 +- https://github.com/dotnet/corefxlab/pull/2595 +