From 9dd686331bba99b3909a501f2557c8abdbd611d9 Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Wed, 23 Sep 2020 16:29:03 -0700 Subject: [PATCH 1/9] Proposal: Low level struct improvements This design covers adding the following features to C# for low level struct performance improvements: - Allowing `ref struct` to contain `ref` fields. - Allowing `struct` and `ref struct` to return `ref` to their fields. - Allowing safe fixed buffers for unmanaged and managed types. Note: the section on providing parameter escape notations is incomplete at this time. I will send a PR in the future once I finalize this portion. Feel free to still leave comments though. --- proposals/low-level-struct-improvements.md | 972 +++++++++++++++++++++ 1 file changed, 972 insertions(+) create mode 100644 proposals/low-level-struct-improvements.md diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md new file mode 100644 index 0000000000..c368254aee --- /dev/null +++ b/proposals/low-level-struct-improvements.md @@ -0,0 +1,972 @@ +Low Level Struct Improvements +===== + +## Summary +This proposal is an aggregation of several different proposals for `struct` +performance improvements. The goal is a design which takes into account the +various proposals to create a single overarching feature set for `struct` +improvements. + +## Motivation +Over the last few releases C# has added a number of low level performance +features to the language: `ref` returns, `ref struct`, function pointers, +etc. These enabled .NET developers to write highly performant code +while continuing to leverage the C# language rules for type and memory safety. +It also allowed the creation of fundamental performance types in the .NET +libraries like `Span`. + +As these features have gained traction in the .NET ecosystem, developers, both +internal and external, have been providing us with information on remaining +friction points in the ecosystem. 
Places where they still need to drop to +`unsafe` code to get their work done, or that require the runtime to special case types +like `Span`. + +This proposal aims to address many of these concerns by building on top of our +existing low level features. Specifically it aims to: + +- Allow `ref struct` types to declare `ref` fields. +- Allow the runtime to fully define `Span` using the C# type system and +remove special case types like `ByReference` +- Allow `struct` types to return `ref` to their fields. +- Allow the declaration of safe `fixed` buffers for managed and unmanaged types +in `struct` + +## Detailed Design +The rules for `ref struct` safety are defined in the +[span-safety document](https://github.com/dotnet/csharplang/blob/master/proposals/csharp-7.2/span-safety.md). +This proposal describes the changes it requires to that document. +Once accepted as an approved feature these changes will be +incorporated into that document. + +### Provide ref fields +The language will allow developers to declare `ref` fields inside of a +`ref struct`. This can be useful for example when encapsulating large +mutable `struct` instances or defining high performance types like `Span` +in libraries besides the runtime. + +Today `ref` fields are accomplished in the runtime by using the `ByReference` +type which the runtime treats effectively as a `ref` field. This means though +that only the runtime repository can take full advantage of `ref` field like +behavior and all uses of it require manual verification of safety. Part of the +[motivation for this work](https://github.com/dotnet/runtime/issues/32060) is +to remove `ByReference` and use proper `ref` fields in all code bases. +The challenging part about allowing `ref` field declarations though comes in +defining rules such that `Span` can be defined using `ref` fields without +breaking compatibility with existing code. 
+ +Before diving into the problems here it should be noted that `ref` fields only +require a small number of targeted changes to our existing span safety rules. In +some cases it's not even to support new features but to rationalize our existing +`Span` usage of `ref` data. This section of the proposal is quite involved +though because I feel it's important to communicate the "why" of these changes +in as much detail as possible and providing supporting samples. This is to both +ensure the changes are sound as well as giving future developers a better +understanding of the choices made here. + +To understand the challenges here let's first consider how `Span` will look +once `ref` fields are supported. + +```cs +// This is the eventual definition of Span once we add ref fields into the +// language +readonly ref struct Span +{ + ref readonly T _field; + int _length; + + // This constructor does not exist today however will be added as a + // part of changing Span to have ref fields. It is a convenient, and + // safe, way to create a length one span over a stack value that today + // requires unsafe code. + public Span(ref T value) + { + ref _field = ref value; + _length = 1; + } +} +``` + +The constructor defined here presents a problem because its return values must +necessarily have restricted lifetimes for many inputs. Consider that if a +local parameter is passed by `ref` into this constructor that the returned +`Span` must have a *safe-to-escape* scope of the local's declaration scope. + +```cs +Span CreatingAndReturningSpan() +{ + int i = 42; + + // This must be an error in the new design because it stores stack + // state in the Span. + return new Span(ref i); + + // This must be legal in the new design because it is legal today (it + // cannot store stack state) + return new Span(new int[] { }); +} +``` + +At the same time it is legal to have methods today which take a `ref` parameter +and return a `Span`. 
These methods bear a lot of similarity to the newly +added `Span` constructor: take a `ref`, return a `Span`. However the +lifetime of the return value of these methods is never restricted by the inputs. +The existing span safety rules consider such values as effectively always +*safe-to-escape* outside the enclosing method. + +```cs +class ExistingScenarios +{ + Span CreateSpan(ref T p) + { + // The implementation of this method is irrelevant. From the point of + // the consumer the returned value is always safe to return. + ... + } + + Span Examples(ref T p, T[] array) + { + // Legal today + return CreateSpan(ref p); + + // Legal today, must remain legal + T local = default; + return CreateSpan(ref local); + + // Legal for any possible value that could be used as an argument + return CreateSpan(...); + } +} +``` + +The reason that all of the above samples are legal is because in the +existing design there is no way for the return `Span` to store a reference +to the input state of the method call. This is because the span safety rules +explicitly depend on `Span` not having a constructor which takes a `ref` +parameter and stores it as a field. + +```cs +class ExistingAssumptions +{ + Span CreateSpan(ref T p) + { + // !!! Cannot happen today !!! + // The existing span safety rules specifically call out that this method + // cannot exist hence they can assume all returns from CreateSpan are + // safe to return. + return new Span(ref p); + } +} +``` + +The rules we define for `ref` fields must ensure the `Span` constructor +properly restricts the *safe-to-escape* scope of constructed objects in the +cases it captures `ref` state. At the same time it must ensure that we don't +break the existing consumption rules for methods like `CreateSpan`. 
+ +```cs +class GoalOfRefFields +{ + Span CreateSpan(ref T p) + { + // ERROR: the existing consumption rules for CreateSpan believe this + // can never happen hence we must continue to enforce that it cannot + return new Span(ref p); + + // Okay: this is legal today + return new Span(new int[] { }); + } + + Span ConsumptionCompatibility() + { + // Okay: this is legal today and must remain legal. + int local = 42; + return CreateSpan(ref local); + + // Okay: the arguments don't actually matter here. Literally any value + // could be passed to this method and the return of it would still be + // *safe-to-escape* outside the enclosing method. + return CreateSpan(...); + } +} +``` + +This tension between allowing constructors such as `Span(ref T field)` and +ensuring compatibility with `ref struct` returning methods like `CreateSpan` +is a key pivot point in the design of `ref` fields. + +To resolve this tension we will change the escape rules for a constructor invocation, which +today are the same as method invocation, on a `ref struct` that **directly** +contains a `ref` field as follows: +- If the constructor contains any `ref struct`, `ref`, `out` or `in` parameters +then the *safe-to-escape* of the return will be the current scope +- Else the *safe-to-escape* will be outside the method scope + +Let's examine these rules in the context of samples to better understand their +impact. + +```cs +ref struct RS +{ + ref int _field; + + public RS(int[] array, int index) + { + ref _field = ref array[index]; + } + + public RS(ref int i) + { + ref _field = ref i; + } + + static RS CreateRS(ref int i) + { + // The implementation of this method is irrelevant to the safety rule + // examples below. The returned value is always *safe-to-escape* outside + // the enclosing method scope + } + + static RS RuleExamples(ref int i, int[] array) + { + var rs1 = new RS(ref i); + + // ERROR by bullet 1: the safe-to-escape scope of 'rs1' is the current + // scope. 
+ return rs1; + + var rs2 = new RS(array, 0); + + // Okay by bullet 2: the safe-to-escape scope of 'rs2' is outside the + // method scope. + return rs2; + + int local = 42; + + // ERROR by bullet 1: the safe-to-escape scope is the current scope + return new RS(ref local); + return new RS(ref i); + + // Okay because rules for method calls have not changed. This is legal + // today hence it must be legal in the presence of ref fields. + return CreateRS(ref local); + return CreateRS(ref i); + } +} +``` + +It is important to note that for the purposes of the rule above any use of +constructor chaining via `this` is considered a constructor invocation. The +result of the chained constructor call is considered to be returning to the +original constructor hence *safe-to-escape* rules come into play. That is +important in avoiding unsafe examples like the following: + +```cs +ref struct RS1 +{ + ref int _field; + public RS1(ref int p) + { + ref _field = ref p; + } +} + +ref struct RS2 +{ + RS1 _field; + public RS2(RS1 p) + { + // Okay + _field = p; + } + + public RS2(ref int i) + { + // ERROR: The *safe-to-escape* scope of the constructor here is the + // current method scope while the *safe-to-escape* scope of `this` is + // outside the current method scope hence this assignment is illegal + _field = new RS1(ref i); + } + + public RS2(ref int i) + // ERROR: the *safe-to-escape* return of :this is the current method scope + // but the 'this' parameter has a *safe-to-escape* outside the current + // method scope + : this(new RS1(ref i)) + { + + } +} +``` + +The limiting of the constructor rules to just `ref struct` types that directly contain +a `ref` field is another important compatibility concern. Consider that the +majority of `ref struct` types defined today indirectly contain `Span` references. +That means by extension they will indirectly contain `ref` fields once `Span` +adopts `ref` fields. 
Hence it's important to ensure the *safe-to-return* rules +of constructors on these types do not change. That is why the restrictions +must only apply to types that directly contain a `ref` field. + +Example of where this comes into play. + +```cs +ref struct Container +{ + LargeStruct _largeStruct; + Span _span; + + public Container(in LargeStruct largeStruct, Span span) + { + _largeStruct = largeStruct; + _span = span; + } +} +``` + +Much like the `CreateSpan` example before the *safe-to-escape* return of the +`Container` constructor is not impacted by the `largeStruct` parameter. If the +new constructor rules were applied to this type then it would break +compatibility with existing code. The existing rules are also sufficient for +existing constructors to prevent them from simulating `ref` fields by storing +them into `Span` fields. + +```cs +ref struct RS4 +{ + Span _span; + + public RS4(Span span) + { + // Legal today and the rules for this constructor invocation + // remain unchanged + _span = span; + } + + public RS4(ref int i) + { + // ERROR. Bullet 1 of the new constructor rules gives this newly created + // Span a *safe-to-escape* of the current scope. The 'this' parameter + // though has a *safe-to-escape* outside the current method. Hence this + // is illegal by assignment rules because it's assigning a smaller scope + // to a larger one. + _span = new Span(ref i); + } + + // Legal today, must remain legal for compat. If the new constructor rules + // applied to 'RS4' though this would be illegal. This is why the new + // constructor rules have a restriction to directly defining a ref field + // + // Only ref struct which explicitly opt into ref fields would see a breaking + // change here. + static RS4 CreateContainer(ref int i) => new RS4(ref i); +} +``` + +This design also requires that the rules for field lifetimes be expanded as the +rules today simply don't account for them. 
It's important to note that our +expansion of the rules here is not defining new behavior but rather accounting +for behavior that has long existed. The safety rules around using `ref struct` +fully acknowledge and account for the possibility that `ref struct` will +contain `ref` state and that `ref` state will be exposed to consumers. The most +prominent example of this is the indexer on `Span`: + +``` cs +readonly ref struct Span +{ + public ref T this[int index] => ...; +} +``` + +This directly exposes the `ref` state inside `Span` and the span safety +rules account for this. Whether that was implemented as `ByReference` or `ref` +fields is immaterial to those rules. As a part of allowing `ref` fields though +we must define their rules such that they fit into the existing consumption +rules for `ref struct`. Specifically this must account for the fact that it's +legal *today* for a `ref struct` to return it's `ref` state as `ref` to the +consumer. + +To understand the proposed changes it's helpful to first review the existing +rules for method invocation around *ref-safe-to-escape* and how they account for +a `ref struct` exposing `ref` state today: + +> An lvalue resulting from a ref-returning method invocation e1.M(e2, ...) is *ref-safe-to-escape* the smallest of the following scopes: +> 1. The entire enclosing method +> 2. The *ref-safe-to-escape* of all ref and out argument expressions (excluding the receiver) +> 3. For each in parameter of the method, if there is a corresponding expression that is an lvalue, its *ref-safe-to-escape*, otherwise the nearest enclosing scope +> 4. the *safe-to-escape* of all argument expressions (including the receiver) + +The fourth item provides the critical safety point around a `ref struct` +exposing `ref` state to callers. When the `ref` state stored in a `ref struct` +refers to the stack then the *safe-to-escape* scope for that `ref struct` will +be at most the scope which defines the state being referred to. 
Hence limiting +the *ref-safe-to-escape* of invocations of a `ref struct` to the +*safe-to-escape* scope of the receiver ensures the lifetimes are correct. + +Consider as an example the indexer on `Span` which is returning `ref` fields +by `ref` today. The fourth item is what provides the safety here: + +```cs +ref int Examples() +{ + Span<int> s1 = stackalloc int[5]; + // ERROR: illegal because the *safe-to-escape* scope of `s1` is the current + // method scope hence that limits the *ref-safe-to-escape* to the current + // method scope as well. + return ref s1[0]; + + // SUCCESS: legal because the *safe-to-escape* scope of `s2` is outside + // the current method scope hence the *ref-safe-to-escape* is as well + Span<int> s2 = default; + return ref s2[0]; +} +``` + +To account for `ref` fields the *ref-safe-to-escape* rules for fields will be +adjusted to the following: + +> An lvalue designating a reference to a field, e.F, is *ref-safe-to-escape* (by reference) as follows: +> - If `F` is a `ref` field and `e` is `this`, it is *ref-safe-to-escape* from the enclosing method. +> - Else if `F` is a `ref` field its *ref-safe-to-escape* scope is the *safe-to-escape* scope of `e`. +> - Else if `e` is of a reference type, it is *ref-safe-to-escape* from the enclosing method. +> - Else its *ref-safe-to-escape* is taken from the *ref-safe-to-escape* of `e`. + +This explicitly allows for `ref` fields being returned as `ref` from a +`ref struct` but not normal fields (that will be covered later). + +```cs +ref struct RS +{ + ref int _refField; + int _field; + + // Okay: this falls into bullet one above. + public ref int Prop1 => ref _refField; + + // ERROR: This is bullet four above and the *ref-safe-to-escape* of `this` + // in a `struct` is the current method scope. 
+ public ref int Prop2 => ref _field; + + public RS(int[] array) + { + ref _refField = ref array[0]; + } + + public RS(ref int i) + { + ref _refField = ref i; + } + + public RS CreateRS() => ...; + + public ref int M1(RS rs) + { + ref int local1 = ref rs.Prop1; + + // Okay: this falls into bullet two above and the *safe-to-escape* of + // `rs` is outside the current method scope. Hence the *ref-safe-to-escape* + // of `local1` is outside the current method scope. + return ref local1; + + // Okay: this falls into bullet two above and the *safe-to-escape* of + // `local2` is outside the current method scope. Hence the *ref-safe-to-escape* + // of `local2.Prop1` is outside the current method scope. + // + // In fact in this scenario you can guarantee that the value returned + // from Prop1 must exist on the heap. + RS local2 = CreateRS(); + return ref local2.Prop1; + + // ERROR: the *safe-to-escape* of `local4` here is the current method + // scope by the revised constructor rules. This falls into bullet two + // above and hence the *ref-safe-to-escape* is limited to that scope. + int local3 = 42; + var local4 = new RS(ref local3); + return ref local4.Prop1; + + } +} +``` + +The rules for assignment also need to be adjusted to account for `ref` fields. +This design only allows for `ref` assignment of a `ref` field during object +construction: specifically in the constructor of the declaring type, inside +`init` accessors and inside object initializer expressions. Further the `ref` +being assigned to the `ref` field must have *ref-safe-to-escape* greater than +the receiver of the field: + +- Constructors: The value must be *ref-safe-to-escape* outside the constructor +- `init` accessors: The value is limited to heap values as accessors can't have +`ref` parameters +- object initializers: The value can have any *ref-safe-to-escape* value as this +will feed into the calculation of the *safe-to-escape* of the constructed +object by existing rules. 
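
The constructor case in the list above can be sketched with code that compiles on a compiler with `ref` field support. This is an illustration under stated assumptions, not part of the proposal: the `Cell` and `CellDemo` names are hypothetical, and the `ref` assignment is written `_slot = ref array[index];` as in C# 11 and later, rather than the `ref _field = ref ...` form used in this draft. The captured location is an array element on the heap, so it is *ref-safe-to-escape* outside the constructor.

```cs
// Hypothetical types for illustration only; assumes C# 11 or later.
ref struct Cell
{
    // A ref field: stores a reference to an int that lives elsewhere.
    private ref int _slot;

    public Cell(int[] array, int index)
    {
        // Ref assignment during construction. The target is a heap
        // location (an array element), so it outlives the constructor.
        _slot = ref array[index];
    }

    // Exposes the captured location to the caller by reference.
    public ref int Value => ref _slot;
}

static class CellDemo
{
    public static int WriteThroughCell()
    {
        int[] data = { 1, 2, 3 };
        var cell = new Cell(data, 1);

        // Writes through the ref field into the original array.
        cell.Value = 20;
        return data[1];
    }
}
```

Calling `CellDemo.WriteThroughCell()` returns the updated array element, which shows that the field aliases the array slot rather than copying it.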
+ +This design does not allow for general `ref` field assignment outside object +construction due to existing limitations on lifetimes. Specifically it poses +challenges for scenarios like the following: + +```cs +ref struct SmallSpan +{ + public ref int _field; + + // Notice once again we're back at the same problem as the original + // CreateSpan method: a method returning a ref struct and taking a ref + // parameter + SmallSpan TrickyRefAssignment(ref int i) + { + // *safe-to-escape* is outside the current method by current rules. + SmallSpan s = default; + + // The *ref-safe-to-escape* of 'i' is the same as the *safe-to-escape* + // of 's' hence most assignment rules would allow it. + ref s._field = ref i; + + // ERROR: this must be disallowed for the exact same reasons we can't + // return a Span wrapping the parameter: the consumption rules + // believe such state smuggling cannot exist + return s; + } + + SmallSpan BadUsage() + { + // Legal today and must remain legal (and safe) + int i = 0; + return TrickyRefAssignment(ref i); + } +} +``` + +There are design choices we could make to allow more flexible `ref` +re-assignment of fields. For example it could be allowed in cases where we knew +the receiver had a *safe-to-escape* scope that was not outside the current +method scope. Further we could provide syntax for making such downward facing +values easier to declare: essentially values that have *safe-to-escape* scopes +restricted to the current method. Such a design is discussed [here](https://github.com/dotnet/csharplang/discussions/1130). +However the extra complexity of such rules does not seem to be worth the limited cases +this enables. Should compelling samples come up we can revisit this decision. + +This means though that `ref` fields are largely in practice `ref readonly`. The +one exception being inside object initializers. + +A `ref` field will be emitted into metadata using the `ELEMENT_TYPE_BYREF` +signature. 
This is no different than how we emit `ref` locals or `ref` +arguments. For example `ref int _field` will be emitted as +`ELEMENT_TYPE_BYREF ELEMENT_TYPE_I4`. This will require us to update ECMA-335 +to allow this entry but this should be rather straightforward. + +Developers can continue to initialize a `ref struct` with a `ref` field using +the `default` expression in which case all declared `ref` fields will have the +value `null`. Any attempt to use such fields will result in a +`NullReferenceException` being thrown. + +```cs +ref struct S1 +{ + public ref int Value; +} + +S1 local = default; +local.Value.ToString(); // throws NullReferenceException +``` + +While the C# language pretends that a `ref` cannot be `null` this is legal at the +runtime level and has well defined semantics. Developers who introduce `ref` +fields into their types need to be aware of this possibility and should be +**strongly** discouraged from leaking this detail into consuming code. Instead +`ref` fields should be validated as non-null using the [runtime helpers](https://github.com/dotnet/runtime/pull/40008) +and the type should throw when an uninitialized `struct` is used incorrectly. + +```cs +ref struct S1 +{ + private ref int Value; + + public int GetValue() + { + if (System.Runtime.CompilerServices.Unsafe.IsNullRef(ref Value)) + { + throw new InvalidOperationException(...); + } + + return Value; + } +} +``` + +Misc Notes: +- A `ref` field can only be declared inside of a `ref struct` +- A `ref` field cannot be declared `static` +- A `ref` field can only be `ref` assigned in the constructor of the declaring +type. +- The reference assembly generation process must preserve the presence of a +`ref` field inside a `ref struct` +- A `readonly ref struct` must declare its `ref` fields as `ref readonly` +- The span safety rules for constructors, fields and assignment must be updated +as outlined in this document. 
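
The guidance above on validating `ref` fields before use can be sketched as a complete type. This is a minimal sketch under stated assumptions: `IntRef` and `IntRefDemo` are hypothetical names, the code targets a compiler with `ref` field support (C# 11 or later), and it uses the `_value = ref value;` assignment form from that release rather than the syntax in this draft. `Unsafe.IsNullRef` is the runtime helper referenced in the text.

```cs
using System;
using System.Runtime.CompilerServices;

// Hypothetical type for illustration only; assumes C# 11 or later.
ref struct IntRef
{
    private ref int _value;

    public IntRef(ref int value)
    {
        _value = ref value;
    }

    // True when the ref field points at a real location. A
    // default-initialized IntRef holds a null reference.
    public bool IsInitialized => !Unsafe.IsNullRef(ref _value);

    public int GetValue()
    {
        // Fail with a descriptive exception instead of letting a
        // NullReferenceException leak to the consumer.
        if (!IsInitialized)
        {
            throw new InvalidOperationException("IntRef was not initialized.");
        }

        return _value;
    }
}

static class IntRefDemo
{
    public static bool DefaultIsUninitialized()
    {
        IntRef empty = default;
        return !empty.IsInitialized;
    }

    public static int ReadThroughRef()
    {
        int local = 42;
        var wrapper = new IntRef(ref local);
        return wrapper.GetValue();
    }
}
```

Keeping the null check behind a property such as `IsInitialized` means consumers never observe the null `ref` directly, which is the outcome the text recommends.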
+ +### Provide struct this escape annotation +The rules for the scope of `this` in a `struct` limit the *ref-safe-to-escape* +scope to the current method. That means neither `this`, nor any of its fields +can return by reference to the caller. + +```cs +struct S +{ + int _field; + // Error: this, and hence _field, can't return by ref + public ref int Prop => ref _field; +} +``` + +There is nothing inherently wrong with a `struct` escaping `this` by reference. +Instead the justification for this rule is that it strikes a balance between the +usability of `struct` and `interface`. If a `struct` could escape `this` by +reference then it would significantly reduce the use of `ref` returns in +interfaces. + +```cs +interface I1 +{ + ref int Prop { get; } +} + +struct S1 : I1 +{ + int _field; + public ref int Prop => ref _field; + + // When T is a struct type, like S1, this would end up returning a reference + // to the parameter + static ref int M<T>(T p) where T : I1 => ref p.Prop; +} +``` + +The justification here is reasonable but it also introduces unnecessary +friction on `struct` members that don't participate in interface invocations. + +One key compatibility scenario that we have to keep in mind when approaching +changes here is the following: + +```cs +struct S1 +{ + ref int GetValue() => ... +} + +class Example +{ + ref int M() + { + // Okay: this is always allowed no matter how `local` is initialized + S1 local = default; + return ref local.GetValue(); + } +} +``` + +This works because the safety rules for `ref` return today do not take into +account the lifetime of `this` (because it can't return a `ref` to internal +state). This means that `ref` returns from a `struct` can return outside the +enclosing method scope except in cases where there are `ref` parameters or a +`ref struct` which is not *safe-to-escape* outside the enclosing method scope. +Hence the solution here is not as easy as allowing `ref` return of fields in +non-interface methods. 
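
The compatibility scenario above can be made concrete with code that compiles under the existing rules. This is a hedged sketch: `Holder` and `HolderDemo` are hypothetical names chosen for illustration. The `ref` returned by `First` points into a heap array rather than into the struct, which is why a caller may return it even though the receiver is a local.

```cs
// Hypothetical types for illustration only.
struct Holder
{
    private readonly int[] _items;
    private int _count;

    public Holder(int[] items)
    {
        _items = items;
        _count = items.Length;
    }

    // Legal today: the returned reference points into a heap array,
    // not into the storage of the struct itself.
    public ref int First() => ref _items[0];

    public int Count => _count;

    // Not legal today: this would return a reference to the struct's
    // own storage, which is the case this section is about.
    // public ref int CountRef => ref _count;
}

static class HolderDemo
{
    public static readonly int[] Shared = { 1, 2, 3 };

    public static ref int FirstOfShared()
    {
        // The lifetime of 'local' does not restrict the returned ref,
        // because the rules assume 'this' cannot be returned by ref.
        Holder local = new Holder(Shared);
        return ref local.First();
    }
}
```

Any change that lets `First`-style members return `ref` to struct fields must keep `FirstOfShared` legal, which is why the design needs an explicit opt-in rather than a blanket relaxation.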
+ +To remove this friction the language will provide the attribute `[RefEscapes]`. +When this attribute is applied to an instance method, instance property or +instance accessor of a `struct` or `ref struct` then the `this` parameter will +be considered *ref-safe-to-escape* outside the enclosing method. + +This allows for greater flexibility in `struct` definitions as they can begin +returning `ref` to their fields. That allows for types like `FrugalList`: + +```cs +struct FrugalList +{ + private T _item0; + private T _item1; + private T _item2; + + public int Count = 3; + + public ref T this[int index] + { + [RefEscapes] + get + { + switch (index) + { + case 0: return ref _item1; + case 1: return ref _item2; + case 1: return ref _item3; + default: throw null; + } + } + } +} +``` + +This will naturally, by the existing rules in the span safety spec, allow +for returning transitive fields in addition to direct fields. + +```cs +struct ListWithDefault +{ + private FrugalList _list; + private T _default; + + public ref T this[int index] + { + [RefEscapes] + get + { + if (index >= _list.Count) + { + return ref _default; + } + + return ref _list[index]; + } + } +} +``` + +Members which contain the `[RefEscapes]` attribute cannot be used to implement +interface members. This would hide the lifetime nature of the member at +the `interface` call site and would lead to incorrect lifetime calculations. + +To account for this change the "Parameters" section of the span safety document +will be updated to include the following: + +- If the parameter is the `this` parameter of a `struct` type, it is +*ref-safe-to-escape* to the top scope of the enclosing method unless the +method is annotated with `[RefEscapes]` in which case it is *ref-safe-to-escape* +outside the enclosing method. + +Misc Notes: +- A member marked as `[RefEscapes]` can not implement an `interface` method. +- The `RefEscapesAttribute` will be defined in the +`System.Runtime.CompilerServices` namespace. 
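
`[RefEscapes]` is a name proposed in this draft and cannot be compiled against as written. As a rough, runnable approximation of the intended behavior, the sketch below assumes the `[UnscopedRef]` attribute from `System.Diagnostics.CodeAnalysis`, which shipped later (C# 11 with .NET 7) and has a comparable effect on `this`. The `Pair` and `PairDemo` names are hypothetical, and the sketch should not be read as the exact semantics this proposal specifies.

```cs
using System;
using System.Diagnostics.CodeAnalysis;

// Hypothetical type for illustration only; assumes C# 11 and .NET 7 or later.
struct Pair
{
    private int _first;
    private int _second;

    // Opts this method into returning references to the struct's own
    // fields. Without the attribute the 'return ref' lines are errors.
    [UnscopedRef]
    public ref int At(int index)
    {
        switch (index)
        {
            case 0: return ref _first;
            case 1: return ref _second;
            default: throw new ArgumentOutOfRangeException(nameof(index));
        }
    }
}

static class PairDemo
{
    public static int SumAfterWrites()
    {
        Pair pair = default;

        // Each call yields a ref into 'pair', so the writes land in
        // the struct's fields.
        pair.At(0) = 5;
        pair.At(1) = 7;
        return pair.At(0) + pair.At(1);
    }
}
```

The returned references are usable within the scope of `pair` but cannot outlive it, which matches the lifetime relationship the updated "Parameters" rule describes.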
+ +### Safe fixed size buffers +The language will relax the restrictions on fixed sized arrays such that the +can be declared in safe code and the element type can be managed or unmanaged. +This will make types like the following legal: + +```cs +internal struct CharBuffer +{ + internal fixed char Data[128]; +} +``` + +These declarations, much like their `unsafe` counter parts, will define a +sequence of `N` elements in the containing type. These members can be accessed +with an indexer and can also be converted to `Span` and `ReadOnlySpan` +instances. + +When indexing into a `fixed` buffer of type `T` the `readonly` state of the +container must be taken into account. If the container is `readonly` then the +indexer returns `ref readonly T` else it returns `ref T`. + +Accessing a `fixed` buffer without an indexer has no natural type however it is +convertible to `Span` types. In the case the container is `readonly` the +buffer is implicitly convertible to `ReadOnlySpan`, else it can implicitly +convert to `Span` or `ReadOnlySpan` (the `Span` conversion is +considered *better*). + +The resulting `Span` instance will have a length equal to the size declared +on the `fixed` buffer. The *safe-to-escape* scope of the returned value will +be equal to the *safe-to-escape* scope of the container. + +For each `fixed` declaration in a type where the element type is `T` the +language will generate a corresponding `get` only indexer method whose return +type is `ref T`. The indexer will be annotated with the `[RefEscapes]` attribute +as the implementation will be returning fields of the declaring type. The +accessibility of the member will match the accessibility on the `fixed` field. 
+ +For example, the signature of the indexer for `CharBuffer.Data` will be the +following: + +```cs +[RefEscapes] +internal ref char <>DataIndexer(int index) => ...; +``` + +If the provided index is outside the declared bounds of the `fixed` array then +an `IndexOutOfRangeException` will be thrown. In the case a constant value is +provided then it will be replaced with a direct reference to the appropriate +element, unless the constant is outside the declared bounds in which case a +compile time error would occur. + +The backing storage for the buffer will be generated using the +`[InlineArray]` attribute. This is a mechanism discussed in [issue 12320](https://github.com/dotnet/runtime/issues/12320) +which allows specifically for the case of efficiently declaring a sequence of +fields of the same type. + +This particular issue is still under active discussion and the expectation is +that the implementation of this feature will follow however that discussion +goes. + +### Provide parameter escape annotations +**THIS SECTION STILL IN DEVELOPMENT** +One of the rules that causes repeated friction in low level code is the +"Method Arguments must Match" rule. That rule states that in the case a method +call has at least one `ref struct` passed by `ref / out` then none of the +other parameters can have a *safe-to-escape* value narrower than that parameter. +By extension if there are two such parameters then the *safe-to-escape* of +all parameters must be equal. + +This rule exists to prevent scenarios like the following: + +```cs +ref struct RS +{ + Span<int> _field; + void Set(Span<int> p) + { + _field = p; + } + + static void DangerousCode(ref RS p) + { + Span<int> span = stackalloc int[] { 42 }; + + // Error: if allowed this would let the method return a pointer to + // the stack + p.Set(span); + } +} +``` + +This rule exists because the language must assume that these values can escape +to their maximum allowed lifetime. 
In many cases though the method +implementations do not escape these values. Hence the friction caused here is +unnecessary. + +To remove this friction the language will provide the attribute +`[DoesNotEscape]`. When applied to parameters the *safe-to-escape* scope of +the parameter will be considered the top scope of the declaring method. +It cannot return outside of it. Likewise the attribute can be applied to +instance members, instance properties or instance accessors and it will have +the same effect on the `this` parameter. + +To account for this change the "Parameters" section of the span safety document +will be updated to include the following: + +- If the parameter is marked with `[DoesNotEscape]`it is *safe-to-escape* to the +top scope of the containing method. Because this value cannot escape from the +method it is not considered a part of the general *safe-to-escape* input set +when calculating returns of this method. + +**THAT RULE ABOVE NEEDS WORK** + +```cs +struct RS +{ + Span _field; + void Set([DoesNotEscape] Span p) + { + // Error: the *safe-to-escape* of p is the top scope of the method while + // the *safe-to-escape* of 'this' is outside the method. Hence this is + // illegal by the standard assignment rules + _field = p; + } + + static RS M(ref RS rs1, [DoesNotEscape]RS rs2) + { + Span span = stackalloc int[] { 42 }; + + // Okay: The parameter here is not a part of the calculated "must match" + // set because it can't be returned hence this is legal. + p.Set(span); + + // Error: the *safe-to-escape* scope of 'rs2' is the top scope of this + // method + return rs2; + } +} +``` + +Misc Notes: +- The `DoesNotEscapeAttribute` will be defined in the +`System.Runtime.CompilerServices` namespace. + +## Considerations + +### Keywords vs. attributes +This design calls for using attributes to annotate the new lifetime rules for +`struct` members. This also could've been done just as easily with +contextual keywords. 
For instance: `scoped` and `escapes` could have been +used instead of `DoesNotEscape` and `RefEscapes`. + +Keywords, even the contextual ones, have a much heavier weight in the language +than attributes. The use cases these features solve, while very valuable, +impact a small number of developers. Consider that only a fraction of +high end developers are defining `ref struct` instances and then consider that +only a fraction of those developers will be using these new lifetime features. +That doesn't seem to justify adding a new contextual keyword to the language. + +This does mean that program correctness will be defined in terms of attributes +though. That is a bit of a gray area for the language side of things but an +established pattern for the runtime. + +## Open Issues + +## Future Considerations + +## Related Information + +### Issues +The following issues are all related to this proposal: + +- https://github.com/dotnet/csharplang/issues/1130 +- https://github.com/dotnet/csharplang/issues/1147 +- https://github.com/dotnet/csharplang/issues/992 +- https://github.com/dotnet/csharplang/issues/1314 +- https://github.com/dotnet/csharplang/issues/2208 +- https://github.com/dotnet/runtime/issues/32060 + +### Proposals +The following proposals are related to this proposal: + +- https://github.com/dotnet/csharplang/blob/725763343ad44a9251b03814e6897d87fe553769/proposals/fixed-sized-buffers.md + +### Fun Samples + +```cs +ref struct StackLinkedListNode<T> +{ + T _value; + ref StackLinkedListNode<T> _next; + + public T Value => _value; + + public bool HasNext => !Unsafe.IsNullRef(ref _next); + + public ref StackLinkedListNode<T> Next + { + get + { + if (!HasNext) + { + throw new InvalidOperationException("No next node"); + } + + return ref _next; + } + } + + public StackLinkedListNode(T value) + { + this = default; + _value = value; + } + + public StackLinkedListNode(T value, ref StackLinkedListNode<T> next) + { + _value = value; + ref _next = ref next; + } +} +``` From 
b83c02c2494938db904bff55f0cc3ed2c12d345f Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Thu, 24 Sep 2020 09:21:52 -0700 Subject: [PATCH 2/9] Update proposals/low-level-struct-improvements.md Co-authored-by: Ruikuan --- proposals/low-level-struct-improvements.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index c368254aee..062f255cf0 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -698,7 +698,7 @@ struct FrugalList { case 0: return ref _item1; case 1: return ref _item2; - case 1: return ref _item3; + case 2: return ref _item3; default: throw null; } } From 8fff276cc8323b46f99b5108bd6a933e32e60adc Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Thu, 24 Sep 2020 09:22:40 -0700 Subject: [PATCH 3/9] Apply suggestions from code review Co-authored-by: Ruikuan --- proposals/low-level-struct-improvements.md | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index 062f255cf0..fcf15cf366 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -749,7 +749,7 @@ Misc Notes: `System.Runtime.CompilerServices` namespace. ### Safe fixed size buffers -The language will relax the restrictions on fixed sized arrays such that the +The language will relax the restrictions on fixed sized arrays such that they can be declared in safe code and the element type can be managed or unmanaged. This will make types like the following legal: @@ -879,7 +879,7 @@ struct RS // Okay: The parameter here is not a part of the calculated "must match" // set because it can't be returned hence this is legal. 
- p.Set(span); + rs2.Set(span); // Error: the *safe-to-escape* scope of 'rs2' is the top scope of this // method From 610ce4b4b4bcfe278358ee9ace4a49fe92f9359a Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Thu, 24 Sep 2020 10:20:35 -0700 Subject: [PATCH 4/9] Apply suggestions from code review Co-authored-by: Joseph Musser --- proposals/low-level-struct-improvements.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index fcf15cf366..b7e8eae785 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -201,7 +201,7 @@ contains a `ref` field as follows: then the *safe-to-escape* of the return will be the current scope - Else the *safe-to-escape* will be the outside the method scope -Lets examine these rules in the context of samples to better understand their +Let's examine these rules in the context of samples to better understand their impact. ```cs @@ -381,7 +381,7 @@ rules account for this. Whether that was implemented as `ByReference` or `ref fields is immaterial to those rules. As a part of allowing `ref` fields though we must define their rules such that they fit into the existing consumption rules for `ref struct`. Specifically this must account for the fact that it's -legal *today* for a `ref struct` to return it's `ref` state as `ref` to the +legal *today* for a `ref struct` to return its `ref` state as `ref` to the consumer. To understand the proposed changes it's helpful to first review the existing @@ -425,9 +425,9 @@ adjusted to the following: > An lvalue designating a reference to a field, e.F, is *ref-safe-to-escape* (by reference) as follows: > - If `F` is a `ref` field and `e` is `this`, it is *ref-safe-to-escape* from the enclosing method. -> - Else if `F` is a `ref` field it's *ref-safe-to-escape* scope is the *safe-to-escape* scope of `e`. 
+> - Else if `F` is a `ref` field its *ref-safe-to-escape* scope is the *safe-to-escape* scope of `e`. > - Else if `e` is of a reference type, it is *ref-safe-to-escape* from the enclosing method. -> - Else it's *ref-safe-to-escape* is taken from the *ref-safe-to-escape* of `e`. +> - Else its *ref-safe-to-escape* is taken from the *ref-safe-to-escape* of `e`. This explicitly allows for `ref` fields being returned as `ref` from a `ref struct` but not normal fields (that will be covered later). @@ -600,7 +600,7 @@ Misc Notes: type. - The reference assembly generation process must preserve the presence of a `ref` field inside a `ref struct` -- A `ref readonly struct` must declare it's `ref` fields as `ref readonly` +- A `ref readonly struct` must declare its `ref` fields as `ref readonly` - The span safety rules for constructors, fields and assignment must be updated as outlined in this document. From 7af7ca3dc753b45bbbe4ec7a13cf2457394fff4b Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Mon, 28 Sep 2020 06:40:40 -0700 Subject: [PATCH 5/9] Rename to ThisRefEscapes The name `[RefEscapes]` was causing a lot of confusion in the discussions. People were misinterpreting it as a general feature that could apply to other constructs when in reality it's a very specific and targeted feature. Renaming to `[ThisRefEscapes]` to make it clear this is a very targeted feature. --- proposals/low-level-struct-improvements.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index b7e8eae785..03222d889e 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -672,7 +672,7 @@ enclosing method scope except in cases where there are `ref` parameters or a Hence the solution here is not as easy as allowing `ref` return of fields in non-interface methods. 
-To remove this friction the language will provide the attribute `[RefEscapes]`. +To remove this friction the language will provide the attribute `[ThisRefEscapes]`. When this attribute is applied to an instance method, instance property or instance accessor of a `struct` or `ref struct` then the `this` parameter will be considered *ref-safe-to-escape* outside the enclosing method. @@ -691,7 +691,7 @@ struct FrugalList public ref T this[int index] { - [RefEscapes] + [ThisRefEscapes] get { switch (index) @@ -717,7 +717,7 @@ struct ListWithDefault public ref T this[int index] { - [RefEscapes] + [ThisRefEscapes] get { if (index >= _list.Count) @@ -731,7 +731,7 @@ struct ListWithDefault } ``` -Members which contain the `[RefEscapes]` attribute cannot be used to implement +Members which contain the `[ThisRefEscapes]` attribute cannot be used to implement interface members. This would hide the lifetime nature of the member at the `interface` call site and would lead to incorrect lifetime calculations. @@ -740,11 +740,11 @@ will be updated to include the following: - If the parameter is the `this` parameter of a `struct` type, it is *ref-safe-to-escape* to the top scope of the enclosing method unless the -method is annotated with `[RefEscapes]` in which case it is *ref-safe-to-escape* +method is annotated with `[ThisRefEscapes]` in which case it is *ref-safe-to-escape* outside the enclosing method. Misc Notes: -- A member marked as `[RefEscapes]` can not implement an `interface` method. +- A member marked as `[ThisRefEscapes]` can not implement an `interface` method. - The `RefEscapesAttribute` will be defined in the `System.Runtime.CompilerServices` namespace. @@ -781,7 +781,7 @@ be equal to the *safe-to-escape* scope of the container. For each `fixed` declaration in a type where the element type is `T` the language will generate a corresponding `get` only indexer method whose return -type is `ref T`. 
The indexer will be annotated with the `[RefEscapes]` attribute +type is `ref T`. The indexer will be annotated with the `[ThisRefEscapes]` attribute as the implementation will be returning fields of the declaring type. The accessibility of the member will match the accessibility on the `fixed` field. @@ -789,7 +789,7 @@ For example, the signature of the indexer for `CharBuffer.Data` will be the following: ```cs -[RefEscapes] +[ThisRefEscapes] internal ref char <>DataIndexer(int index) => ...; ``` @@ -898,7 +898,7 @@ Misc Notes: This design calls for using attributes to annotate the new lifetime rules for `struct` members. This also could've been done just as easily with contextual keywords. For instance: `scoped` and `escapes` could have been -used instead of `DoesNotEscape` and `RefEscapes`. +used instead of `DoesNotEscape` and `ThisRefEscapes`. Keywords, even the contextual ones, have a much heavier weight in the language than attributes. The use cases these features solve, while very valuable, From fbdda3c0c877239cd6def07ecc17d097d43d05a0 Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Mon, 28 Sep 2020 06:59:34 -0700 Subject: [PATCH 6/9] Make ref field assignment more flexible Based on feedback from several people I've decided that we need to make the rules for `ref` fields more flexible when the values being passed around are known to refer to the heap. In those cases there is simply no reason to restrict the created `ref struct` as it is always *ref-safe-to-escape* to the enclosing method as well as not requiring any adjustment to the method invocation rules. This does mean that we need to add the notion of "refers to the heap" to the span safety document. That doesn't change any assumptions there. It's just that before this change the notion of "refers to the heap" wasn't necessary to describe our safety rules. 
--- proposals/low-level-struct-improvements.md | 42 +++++++++++++++++----- 1 file changed, 34 insertions(+), 8 deletions(-) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index 03222d889e..2cad30cef5 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -197,8 +197,11 @@ is a key pivot point in the design of `ref` fields. To do this we will change the escape rules for a constructor invocation, which today are the same as method invocation, on a `ref struct` that **directly** contains a `ref` field as follows: -- If the constructor contains any `ref struct`, `ref`, `out` or `in` parameters -then the *safe-to-escape* of the return will be the current scope +- If the constructor contains any `ref`, `out` or `in` parameters, and the +arguments do not all refer to the heap, then the *safe-to-escape* of the return +will be the current scope +- Else if the constructor contains any `ref struct` parameters then the +*safe-to-escape* of the return will be the current scope - Else the *safe-to-escape* will be the outside the method scope Let's examine these rules in the context of samples to better understand their @@ -488,18 +491,25 @@ ref struct RS The rules for assignment also need to be adjusted to account for `ref` fields. This design only allows for `ref` assignment of a `ref` field during object -construction. Specifically in the constructor of the declaring type, inside +construction or when the value is known to refer to the heap. Object +construction includes in the constructor of the declaring type, inside `init` accessors and inside object initializer expressions. 
Further the `ref` -being assigned to the `ref` field must have *ref-safe-to-escape* greater than -the receiver of the field: +being assigned to the `ref` field in this case must have *ref-safe-to-escape* +greater than the receiver of the field: - Constructors: The value must be *ref-safe-to-escape* outside the constructor -- `init` accessors: The value limited to heap values as accessors can't have -`ref` parameters, and for object +- `init` accessors: The value is limited to values that are known to refer to the +heap as accessors can't have `ref` parameters - object initializers: The value can have any *ref-safe-to-escape* value as this will feed into the calculation of the *safe-to-escape* of the constructed object by existing rules. +A `ref` field can only be assigned outside a constructor when the value is known +to refer to the heap. That is allowed because it is both safe at the assignment +location (meets the field assignment rules for ensuring the value being +assigned has a lifetime at least as large as the receiver) as well as requires +no updates to the existing method invocation rules. + This design does not allow for general `ref` field assignment outside object construction due to existing limitations on lifetimes. Specifically it poses challenges for scenarios like the following: @@ -527,6 +537,19 @@ ref struct SmallSpan return s; } + SmallSpan SafeRefAssignment() + { + int[] array = new int[] { 42, 13 }; + SmallSpan s = default; + + // Okay: the value being assigned here is known to refer to the heap + // hence it is allowed by our rules above because it requires no changes + // to existing method invocation rules (hence preserves compat) + ref s._field = ref array[0]; + + return s; + } + SmallSpan BadUsage() { // Legal today and must remain legal (and safe) @@ -546,7 +569,8 @@ However extra complexity of such rules do not seem to be worth the limited cases this enables. Should compelling samples come up we can revisit this decision. 
This means though that `ref` fields are largely in practice `ref readonly`. The -one exception being inside object initializers. +main exceptions being object initializers and when the value is known to refer +to the heap. A `ref` field will be emitted into metadata using the `ELEMENT_TYPE_BYREF` signature. This is no different than how we emit `ref` locals or `ref` @@ -603,6 +627,8 @@ type. - A `ref readonly struct` must declare its `ref` fields as `ref readonly` - The span safety rules for constructors, fields and assignment must be updated as outlined in this document. +- The span safety rules need to include the definition of `ref` values that +"refer to the heap". ### Provide struct this escape annotation The rules for the scope of `this` in a `struct` limit the *ref-safe-to-escape* From a34b6d76ab2a2bee38db86311730f1f3967aae85 Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Mon, 28 Sep 2020 07:14:51 -0700 Subject: [PATCH 7/9] Fixed buffer updates --- proposals/low-level-struct-improvements.md | 49 ++++++++++++++++++++++ 1 file changed, 49 insertions(+) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index 2cad30cef5..3a0c4e231b 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -825,6 +825,21 @@ provided then it will be replaced with a direct reference to the appropriate element. Unless the constant is outside the declared bounds in which case a compile time error would occur. +There will also be a named accessor generated for each `fixed` buffer that +provides by value `get` and `set` operations. Having this means that `fixed` +buffers will more closely resemble existing array semantics by having a `ref` +accessor as well as byval `get` and `set` operations. This means compilers will +have the same flexibility when emitting code consuming `fixed` buffers as they +do when consuming arrays. 
This should make operations like `await` over `fixed` +buffers easier to emit. + +This also has the added benefit that it will make `fixed` buffers easier to +consume from other languages. Named indexers are a feature that has existed since +the 1.0 release of .NET. Even languages which cannot directly emit a named +indexer can generally consume them (C# is actually a good example of this). + +There will also be a by value `get` and `set` accessor generated for every + The backing storage for the buffer will be generated using the `[InlineArray]` attribute. This is a mechanism discussed in [issue 12320](https://github.com/dotnet/runtime/issues/12320) which allows specifically for the case of efficiently declaring sequence of fields of the same type. @@ -939,6 +954,40 @@ established pattern for the runtime. ## Open Issues +### Allow fixed buffer locals +This design allows for safe `fixed` buffers that can support any type. One +possible extension here is allowing such `fixed` buffers to be declared as +local variables. This would allow a number of existing `stackalloc` operations +to be replaced with a `fixed` buffer. It would also expand the set of scenarios +where we could have stack style allocations as `stackalloc` is limited to unmanaged +element types while `fixed` buffers are not. + +```cs +class FixedBufferLocals +{ + void Example() + { + Span<int> span = stackalloc int[42]; + int buffer[42]; + } +} +``` + +This holds together but does require us to extend the syntax for locals a bit. +Unclear if this is or isn't worth the extra complexity. Possible we could decide +no for now and bring back later if sufficient need is demonstrated. + +### Allow multi-dimensional fixed buffers +Should the design for `fixed` buffers be extended to include multi-dimensional +style arrays? 
Essentially allowing for declarations like the following: + +```cs +struct Dimensions +{ + int array[42, 13]; +} +``` + ## Future Considerations ## Related Information From da1b2a4b55aaee1b73d184a710f7c7218e17b337 Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Mon, 28 Sep 2020 07:19:46 -0700 Subject: [PATCH 8/9] modreq clarification --- proposals/low-level-struct-improvements.md | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index 3a0c4e231b..da22d5a535 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -770,7 +770,10 @@ method is annotated with `[ThisRefEscapes]` in which case it is *ref-safe-to-esc outside the enclosing method. Misc Notes: -- A member marked as `[ThisRefEscapes]` can not implement an `interface` method. +- A member marked as `[ThisRefEscapes]` can not implement an `interface` method +or be `overrides` +- A member marked as `[ThisRefEscapes]` will be emitted with a `modreq` on that +attribute. - The `RefEscapesAttribute` will be defined in the `System.Runtime.CompilerServices` namespace. From 67e7e40db4878736740681f3b83245970fb83c0d Mon Sep 17 00:00:00 2001 From: Jared Parsons Date: Mon, 28 Sep 2020 07:23:44 -0700 Subject: [PATCH 9/9] Small update --- proposals/low-level-struct-improvements.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/proposals/low-level-struct-improvements.md b/proposals/low-level-struct-improvements.md index da22d5a535..5426fde8b1 100644 --- a/proposals/low-level-struct-improvements.md +++ b/proposals/low-level-struct-improvements.md @@ -980,6 +980,8 @@ This holds together but does require us to extend the syntax for locals a bit. Unclear if this is or isn't worth the extra complexity. Possible we could decide no for now and bring back later if sufficient need is demonstrated. 
+Example of where this would be beneficial: https://github.com/dotnet/runtime/pull/34149 + ### Allow multi-dimensional fixed buffers Should the design for `fixed` buffers be extended to include multi-dimensional style arrays? Essentially allowing for declarations like the following: