Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Low level struct improvements #3936

Merged
merged 9 commits into from
Sep 29, 2020
Merged

Conversation

jaredpar
Copy link
Member

This design covers adding the following features to C# for low level
struct performance improvements:

  • Allowing ref struct to contain ref fields.
  • Allowing struct and ref struct to return ref to their fields.
  • Allowing safe fixed buffers for unmanaged and managed types.

Note: the section on providing parameter escape notations is incomplete
at this time. I will send a PR in the future once I finalize this
portion. Feel free to still leave comments though.

This design covers adding the following features to C# for low level
struct performance improvements:

- Allowing `ref struct` to contain `ref` fields.
- Allowing `struct` and `ref struct` to return `ref` to their fields.
- Allowing safe fixed buffers for unmanaged and managed types.

Note: the section on providing parameter escape notations is incomplete
at this time. I will send a PR in the future once I finalize this
portion. Feel free to still leave comments though.
@VSadov
Copy link
Member

VSadov commented Sep 24, 2020

I like this!!! 👍

@JamesNK
Copy link
Member

JamesNK commented Sep 24, 2020

I think I have a situation where this would be useful. Would it solve this highlighted problem? - https://github.com/protocolbuffers/protobuf/blob/a690227398d307419d45b1f352494a176873897b/csharp/src/Google.Protobuf/ParseContext.cs#L77-L80

One issue is the field is being updated outside of the constructor/initialization. From the spec:

There are designs choices we could make to allow more flexible ref re-assignment of fields. For example it could be allowed in cases where we knew the receiver had a safe-to-escape scope that was not outside the current method scope. Further we could provide syntax for making such downward facing values easier to declare: essentially values that have safe-to-escape scopes restricted to the current method. Such a design is discussed here). However extra complexity of such rules do not seem to be worth the limited cases this enables. Should compelling samples come up we can revisit this decision.

This means though that ref fields are largely in practice ref readonly. The one exception being inside object initializers.


- If the parameter is the `this` parameter of a `struct` type, it is
*ref-safe-to-escape* to the top scope of the enclosing method unless the
method is annotated with `[RefEscapes]` in which case it is *ref-safe-to-escape*
Copy link
Member

@VSadov VSadov Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the ref-escape of [RefEscapes] members will have to match ref-escape of the container.
If we are permitting returning refs to fields, we must ensure the containing variable is not dead by the time the ref is used.

I.E accessing such ref-returning members would work as if taking a ref to a field or the variable itself.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree but I believe that falls out from the proposed rules. Essentially this is meant to change the ref-safe-to-escape scope for this when the [RefEscapes] attribute is applied. that will require an update to the Parameters section of the span safety rules. That should then feed into the existing Field rules for determining appropriate scopes.

@YairHalberstadt
Copy link
Contributor

YairHalberstadt commented Sep 24, 2020

To me the rules around how I can or can't use a ref field in a ref struct are extremely detailed and arcane. I can't imagine remembering them all, and I like remembering language details :-) I'm worried that for any particular use case that doesn't work straight out the box developers are going to waste hours fumbling around trying to find a way around it.

I think either of the following would be much more palatable:

  1. Take the breaking change. If a method accepts a ref, and returns a ref struct or vice versa, the returned scope is equivalent to the passed in scope.

  2. Any method which reads or writes a ref field in a struct must be marked with a modreq. Any method which uses those must be marked with a modreq. For methods marked with this modreq, if a method accepts a ref, and returns a ref struct or vice versa, the returned scope is equivalent to the passed in scope. It is a language decision whether or not you need to explicitly mark this via an attribute since it is a breaking change, but at least the error message is dead simple - add this attribute.

outside the enclosing method.

Misc Notes:
- A member marked as `[RefEscapes]` can not implement an `interface` method.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this true even if the interface is annotated as RefEscapes?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My intent for [RefEscapes] when writing this proposal was that should apply to struct only. Having it work on struct and interface is fine though; it's straight forward to add consistent rules here. I didn't add interface originally because the scenario seemed pretty limited. Then again I have an other proposal in the works to expand the use of ref struct into generics so it's possible this will be more common.

Think I'll change this so that it can apply to struct or interface and see how that conversation goes.

```

Members which contain the `[RefEscapes]` attribute cannot be used to implement
interface members. This would hide the lifetime nature of the member at
Copy link
Member

@davkean davkean Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And overrides/virtuals?

Copy link
Contributor

@Joe4evr Joe4evr Sep 24, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, the only overrides that can exist in structs are GetHashCode() and Equals(object).....

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indeed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the exact structure of object today I don't think it's a problem to have [RefEscapes] on override. There is no method on object that has a ref return hence it can't really feed into this problem.

Generally though the compiler does not make assumptions about object (or most other types). Hence we do need to consider a hypothetical corelib where a ref returning method is added to object. Lets say the following:

class Object 
{
  public virtual ref int GetIt() => ...;
}

Even in this case I believe it would be safe for a struct to override this method and return ref to its internal state. That is because there are only two ways in which this method can be invoked:

  1. Through a reference to the struct. In that case the rules outlined here would come into effect and everything would be safe.
  2. Through a reference to object. At that point the struct is no longer on the stack because it's a boxed object. Hence the compiler interpretation of the ref as being safe-to-escape outside the enclosing method scope is correct because it's returning a ref to the boxed object, not a stack value.

Even after writing that justification out I'm unsure if it should still be allowed. It feels cute and doesn't really open up any scenarios. Going to think on this a bit and see how I feel later in the day.

@YairHalberstadt
Copy link
Contributor

After discussing with @tannergooding on discord, I think what's more important to me from a usability perspective than the actual rules, are the diagnostics which going along with them.

For example, let's say I write:

  public Span<int> M(ref int x) {
        return new Span<int>(ref x);
    }

I think it's important that the error message is really simple and points me to the correct solution:

parameter 'ref int x' must be marked with [RefEscapes]

Instead of:

Span<int>(ref x) cannot be returned from M since x has a scope greater than the current method.

Similiarly if I then use M:

public Span<int> UseM(ref int x)
{
    return M(ref x);
}

The error message should be:

Parameter 'ref int x' must be marked with [RefEscapes]

ensuring compatibility with `ref struct` returning methods like `CreateSpan<T>`
is a key pivot point in the design of `ref` fields.

To do this we will change the escape rules for a constructor invocation, which
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not instead of changing the rules for ref struct constructors (which is a breaking change), mark them with [RefEscapes] explicitly, and consider this to be the return type of a constructor?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These rules only apply to constructors on ref struct which directly contain ref fields. Given that ref fields don't exist today this can't be a breaking change because there are no constructors that they presently apply to.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Except for Span<T>, right?

Copy link
Contributor

@Ruikuan Ruikuan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fix some typos.

proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
jaredpar and others added 2 commits September 24, 2020 09:21
Co-authored-by: Ruikuan <liruikuan@outlook.com>
Co-authored-by: Ruikuan <liruikuan@outlook.com>
@jaredpar
Copy link
Member Author

Thanks @Ruikuan !

Copy link
Contributor

@jnm2 jnm2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay for English

proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
proposals/low-level-struct-improvements.md Outdated Show resolved Hide resolved
@jaredpar
Copy link
Member Author

@JamesNK

I think I have a situation where this would be useful. Would it solve this highlighted problem? -

Not as presently written. The rules are pretty restrictive, they assert that any use of a constructor that has a ref parameter on a ref struct with a ref field is always limited to the current method scope. Hence in this case you wouldn't be able to assign that into ctx here.

In this case though the invocation is safe. It's passing a ref to a known heap location hence there is no reason to restrict the return. An earlier version of the doc had this exception but I pulled it out because I thought it might be too much of a special case. This is the second time though this has come up so I'm leaning towards finding a way to add it back.

@jaredpar
Copy link
Member Author

@YairHalberstadt

I think what's more important to me from a usability perspective than the actual rules, are the diagnostics which going along with them.

Agree. Good diagnostics will be important.

@tannergooding
Copy link
Member

tannergooding commented Sep 24, 2020

Just wanted to confirm something that was discussed in Discord (as its my understanding this is the case).

RefEscapes would allow the following code to work (it is illegal today):

public Span<int> M([RefEscapes]ref int x) {
    return new Span<int>(ref x);
}

While DoesNotEscape would allow the following, currently also illegal:

public ref int M([DoesNotEscape] Span<int> x, int[] y)
{
    return ref y[0];
}

public ref int X(int[] y)
{
    return ref M(stackalloc int[5], y);
}

In the case of the latter, DoesNotEscape would do basic analysis to validate it doesn't escape (I'm not sure if any validation could be done or should be done for RefEscapes).

Would these be just attributes, modreq, modopt?
Would these attributes only be valid on in, out, ref and ref struct?

Co-authored-by: Joseph Musser <me@jnm2.com>
@jaredpar
Copy link
Member Author

Thanks @jnm2. With only three typo suggestions before you, I was beginning to worry that maybe I'd gotten better at English. 😄

@jaredpar
Copy link
Member Author

@tannergooding

RefEscapes would allow the following code to work (it is illegal today):

No it would not. The [RefEscapes] attribute can only be applied to instance members of struct (explicitly called out in the proposal). It cannot be applied to individual parameters of members. Further the [RefEscapes] attribute only impacts the ref-safe-to-escape scope of this. That in turn causes a tiny tweak to the ref-safe-to-escape rules for method invocations returning a ref value. What your discussing requires changes to the safe-to-escape return of all methods.

Essentially the behavior you are looking for here is changing how safe-to-escape is interpreted for method calls. That is a very different feature than [RefEscapes]. Sure it's possible to add such annotations to the language but my intuition is that it's not really worth it.

One thing to remember is that the rules today around how this return pattern is calculated came in large part to our own usability feedback implementing Span<T>. If we were starting from scratch today I think that we would very likely end up with the exact same rules because it optimizes for the majority use case.

While DoesNotEscape would allow the following, currently also illegal:

Correct. This is the exact intended purpose of DoesNotEscape.

If you have real world examples of this pattern I would be eager to see them. It will help ensure the feature is solving the intended problem as well as making me feel more confident that it's worth adding to the language.

Would these be just attributes, modreq, modopt?

modreq

Would these attributes only be valid on in, out, ref and ref struct?

Applicability:

  • RefEscapes applicable to struct instance members only
  • DoesNotEscape likely applicable to parameters and instance members. This section is still a work in progress so less certainty in all the places it will apply

@jnm2
Copy link
Contributor

jnm2 commented Sep 24, 2020

@jaredpar I think the usability of the English language is highly questionable :D

@tannergooding
Copy link
Member

The [RefEscapes] attribute can only be applied to instance members of struct (explicitly called out in the proposal). It cannot be applied to individual parameters of members.

Ah, I see. I had seen it applied to the getter and misinterpreted that.

RefEscapes applicable to struct instance members only

So this is Methods that have a return, or fields and such as well? And it basically just indicates that the ref return is going to be this or a subset of this?

Essentially the behavior you are looking for here is changing how safe-to-escape is interpreted for method calls. That is a very different feature than [RefEscapes].

Hmmm, I had interpreted it as changing how safe-to-escape is interpreted because that seems like a useful feature.
Being able to annotate that a ref is this allows ref returns to non ref-fields. However, being able to annotate that a ref escapes allows things like MemoryMarshal.CreateSpan to exist and to exist "safely" without leaving a hole around something that can't otherwise be expressed.

@jaredpar
Copy link
Member Author

@tannergooding

So this is Methods that have a return, or fields and such as well? And it basically just indicates that the ref return is going to be this or a subset of this?

It indicates that the ref return is potentially returning state inside of this. That effectively makes the following edit to method invocation rules for calculating ref-safe-to-escape when the target method is annotated with [RefEscapes]

  • the ref-safe-to-escape of all ref and out argument expressions (excluding the receiver)

This is a very specific and narrow fix because it's a narrow and specific problem. 😄

Being able to annotate that a ref is this allows ref returns to non ref-fields

No it does not. I understand on the surface they seem related but in terms of how the span safety rules work they are very different problems and require different solutions. The [RefEscapes] proposal is all about changing how ref-safe-to-escape validation works (and it's a very targeted change). The behavior you are asking for is about how safe-to-escape returns are calculated. Attempting to use one solution both unnecessarily complicates the space and very likely creates other side effects that you almost certainly do not want.

I think the name [RefEscapes] is adding to the confusion here. Think of the name as [ThisRefEscapes] instead.

However, being able to annotate that a ref escapes allows things like MemoryMarshal.CreateSpan to exist and to exist "safely" without leaving a hole around something that can't otherwise be expressed.

Annotating that a ref escapes though is somewhat rhetorical because ref can escape today, and they can escape by ref:

ref int Identity(ref int p) => ref p;

The issue is more that ref cannot be smuggled in the proposal. Or rather a ref that sneakily escapes by ref inside a value.

Again though, this is solvable with another annotation. The issue is that when looking at the issues and samples the extra complexity did not seem warranted. All of the features in this proposal line up to specific customer asks based on samples, real scenarios, etc ... The same does not exist for this request. Further the existing rules, and the decisions we made about them originally WRT usability, provide reasonable evidence this is not actually needed as a feature.

If you feel passionately about this I think the way to move it forward is with samples, evidence, etc ....

@tannergooding
Copy link
Member

If you feel passionately about this I think the way to move it forward is with samples, evidence, etc ....

Not super passionate and I don't think I have any specific examples today. I'd have to go back through the various hacks I'm using right now and double-check.

I think I have a better understanding of the differences now though, so thanks!

I think the name [RefEscapes] is adding to the confusion here. Think of the name as [ThisRefEscapes] instead.

Any reason why we couldn't use this name instead, to help reduce confusion?

@jaredpar
Copy link
Member Author

@tannergooding

Any reason why we couldn't use this name instead, to help reduce confusion?

Because once proposals are published they're immutable. 😄

Joking aside it's one of the changes I'm considering.

@gdkchan
Copy link

gdkchan commented Sep 27, 2020

Haven't seen anything about multi-dimensional fixed buffers in the proposal, but it would be useful too.
Example:

struct Matrix4x4
{
    public fixed float M[4][4];
}

It would be equivalent to the following C struct:

typedef struct {
    float M[4][4];
} Matrix4x4;

I'm not sure how one would pass it around though. According to the current proposal, I believe the safe method to pass the fixed buffers around is passing as Span<T>. However there's no multi-dimensional Span<T>, so it wouldn't work for this case.

@Thealexbarney
Copy link

Thealexbarney commented Sep 27, 2020

Haven't seen anything about multi-dimensional fixed buffers in the proposal, but it would be useful too.

I agree. They would be useful.

Multi-dimensional arrays could work without relatively major changes, but only if accessing a single element or getting a span of the lowest dimension are the only things allowed.

Example:

struct MyStruct
{
    public fixed int MyArray[4][5][6];
}

int element = myStruct.MyArray[2][3][4];

Could be equivalent to (Bounds checks skipped for brevity):

struct MyStruct
{
    public fixed int MyArray[120];

    public int GetMyArray(int dim1, int dim2, int dim3)
    {
        return MyArray[dim1 * 5 * 6 + dim2 * 6 + dim3];
    }
}

int element = myStruct.GetMyArray(2, 3, 4);

Getting a single dimension of the array would be more difficult. Getting a span from the least significant dimension wouldn't be difficult:

public Span<int> Get(int dim1, int dim2)
{
    return MyArray.Slice(dim1 * 5 * 6 + dim2 * 6, 6);
}

Getting higher dimensions might be done by encoding the dimension sizes in the type. C# can't do anything like MultiDim<int, 4, 5, 6>, so the only way I see would be to generate a struct for each dimension. However the types would get ugly rather quickly, not to mention that these compiler-generated types would need to be accessible by user-facing code.

Pseudocode example:

public ref struct Array4_5_6
{
    private Span<int> _span;

    public Array5_6 GetDim2(int index)
    {
        return new Array5_6(_span.Slice(index * 5 * 6, 5 * 6));
    }

    public Span<int> GetDim3(int dim1, int dim2)
    {
        return _span.Slice(dim1 * 5 * 6 + dim2 * 6, 6);
    }
}

public ref struct Array5_6
{
    private Span<int> _span;

    public Span<int> GetDim2(int index)
    {
        return _span.Slice(index * 6, 6);
    }

    public int Get(int dim1, int dim2)
    {
        return _span[dim1 * 6 + dim2];
    }
}

The name `[RefEscapes]` was causing a lot of confusion in the
discussions. People were mis-interpretting it as a general feature that
could apply to other constructs when in reality it's a very specific and
targetted feature. Renaming to `[ThisRefEscapes]` to make it clear this
is a very targetted feature.
Based on feedback from several people I've decided that we need to make
the rules for `ref` fields more flexible when the values being passed
around are known to refer to the heap. In those cases there is simply no
reason to restrict the created `ref struct` as it is always
*ref-safe-to-escape* to the enclosing method as well as not requiring
any adjustment to the method invocation rules.

This does mean that we need to add the notion of "refers to the heap" to
the span safety document. That doesn't change any assumptions there.
It's just that before this change the notion of "refers to the heap"
wasn't necessary to describe our safety rules.
@jaredpar
Copy link
Member Author

@gdkchan

I had not considered multi-dimensional array support here. The idea of this proposal was essentially to extend the existing support for fixed sized buffers to include supporting all element types and not just unmanaged types. Multi-dimensional support is essentially expanding the scope of the original feature quite a bit. Given that it's existed for the entirety of C# and there hasn't been a big push to add this support I'm fairly skeptical that it's worth the extra complexity.

Added it to the open issue section though so we can discuss in LDM when we discuss the rest of the feature (this specific section hasn't been covered yet).

@jaredpar
Copy link
Member Author

Thanks for the feedback everyone! Have updated the proposal based on various discussions here. Particularly:

  1. Renamed [RefEscapes] to [ThisRefEscapes] to make the semantics of the feature much clearer.
  2. Made constructors and ref field assignment more flexible when the values being used are known to refer to the heap. Scenarios provided by @JamesNK and @davidwrighton made me feel the extra complexity in the rules are justified (and really it's actually a fairly small change).
  3. Responded to several feedback points by @VSadov on how fixed buffers should be emitted to make them more closely resemble existing arrays.

@jaredpar
Copy link
Member Author

Thanks again for the feedback everyone. At this point the initial design of ref field, [ThisRefEscapes] and safe fixed buffers is a good place. The ref field portion has largely been presented to LDM while the [ThisRefEscapes] + safe fixed buffers should be presented this week / next week. Likely another round of edits will follow after that.

Merging now because I want to make progress on the [DoesNotEscape] section. Using a separate PR for that as I noted that section should not be considered in this PR as it was decidedly incomplete.

@YairHalberstadt
Copy link
Contributor

@jaredpar is there any championed issue for this proposal?

Thanks

@333fred
Copy link
Member

333fred commented Dec 24, 2020

There are a few. #1147 is ref fields.

@YairHalberstadt
Copy link
Contributor

Would it be useful to have a single one for this proposal? Are there any for returning a field of a struct by ref?

@jaredpar
Copy link
Member Author

jaredpar commented Jan 4, 2021

I've been using #1147 as the primary championed issue here.

@CodingMadness
Copy link

@jaredpar is something of this possible by current (.NET 6.0 version, VS2022) ?Would be nice to hear :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.