Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Define "source breaking change", "binary breaking change" in the context of language versioning. #3717

Closed
BillWagner opened this issue Nov 16, 2017 · 27 comments

Comments

@BillWagner
Copy link
Member

BillWagner commented Nov 16, 2017

From @jskeet via LiveFyre:

It would be really nice to see a section on versioning concerns here. Is changing "public struct" to "readonly public struct" a breaking change? Is adding the "in" modifier to a method parameter a breaking change? I'll investigate for myself, but it would be great to see that information here.

More discussion on the topic:

From @kburgoyne

Since "in" means the callee is using a reference to the caller's struct rather than a copy of the caller's struct, there is the potential for it to break code. The key is whether there is any concern for the caller's copy to be changed while the callee is holding a ref to it. "in" really only says the callee won't/can't change the struct. It doesn't say the struct can't be changed elsewhere (by the caller or anything else that can access it). Obviously we're really talking multitasking/multithreading. In purely single threading nothing else has a chance to be executing while the callee is executing, so nothing else has the opportunity to change the struct.

More @jskeet

It feels like more than "potential" to break code - I suspect it's simply not binary compatible, in the same way that changing Foo(int x) to Foo(ref int x) isn't binary compatible. I suspect this will limit where it can be used for public library projects :(

/cc @VSadov @jaredpar

@jaredpar
Copy link
Member

Is changing "public struct" to "readonly public struct" a breaking change?

No. It is both safe from a source and binary perspective. CoreFx is already making progress in converting a number of structs to be readonly

dotnet/corefx#24997

Is adding the "in" modifier to a method parameter a breaking change?

This is safe from a source perspective but breaking from a binary perspective. The in modifier, much like out, is simply a modifier on ref parameters. Hence it's effectively changing a normal parameter, to a ref parameter that C# allows you to pass without any qualifiers.

I suspect this will limit where it can be used for public library projects

For locations where binary compat is important you can simply add an overload that takes in. That preserves compat and allows for the use of in when desired.

@udlose
Copy link

udlose commented Nov 16, 2017

@jaredpar - re your comment:

For locations where binary compat is important you can simply add an overload that takes in. That preserves compat and allows for the use of in when desired.

Per https://docs.microsoft.com/en-us/dotnet/csharp/reference-semantics-with-value-types,

You cannot create overloads of a method that differ only in the presence of in, out or ref

So it appears to contradict your suggestion. Am I misinterpreting?

@kburgoyne
Copy link

I actually think it's a bad idea to not require "in" on the call. The thread-safety issue means the caller should "acknowledge" the lack of thread-safety when passing using "in" versus non-"in". Without requiring "in" on the call, the call looks like any historical call which is thread-safe insofar as the struct arguments. But yet if the method signature has "in" then the call isn't thread-safe even through the calling code lures developers into seeing it as being safe.

Requiring "in" also means the method signature cannot be changed between thread-safe (no-"in") and thread-unsafe ("in") without triggering compiler errors on the calling code and requiring a programmer to recognize the thread safety issue. By far "mostly" this is a concern when changing non-"in" to "in" which seems to be most likely to encounter thread-safety issues for "good" code.

I'm also envisioning "possibly" it also being a concern for code in which the called method waits for a semaphore to release before accessing updated values in a struct ref passed via "in". This certainly sounds on the surface like maybe a pretty distasteful coding practice, but the language specs says it should work.

@jaredpar
Copy link
Member

@udlose that sentence is confusingly worded. The difference is subtle and as I'm trying to type up the explanation I'm beginning to understand why it's hard to construct a single sentence that adequately explains the problem.

A longer way of saying it is:

It is okay for two overloads to differ by in, ref or out when one overload has any of the modifiers but the other has none.
It is not okay for two overloads to differ by in, ref or out when both of the overloads have one of the modifiers

Or in code terms

// Okay
void M(int i) { ... } 
void M(ref int i) { ... }

// Not Okay
void G(in int i) { ... } 
void G(ref int i) { ... }

@jaredpar
Copy link
Member

Requiring "in" also means the method signature cannot be changed between thread-safe (no-"in") and thread-unsafe ("in") without triggering compiler errors on the calling code and requiring a programmer to recognize the thread safety issue.

Why would either of these necessarily be more thread safe than the other? For that matter what do you mean by thread-safe here: threading has far too many scenarios to be covered by a single term. Except of course immutable.

@udlose
Copy link

udlose commented Nov 16, 2017

@jaredpar - hah! that's cheating :)
Thanks for the clarification. Could someone please update the docs to include your clarification?

@BillWagner
Copy link
Member Author

@udlose I will update it. I was extending the current standard language for in. It's not adequate, and I like @jaredpar 's description.

Aside: How to describe (trying to stay in one sentence) generated significant discussion in the C# standards meetings.

@kburgoyne
Copy link

By "thread safe" I mean "safe for use in multithreading scenarios". When a struct is passed-by-copy (as is the case today without "in"), the callee is insensitive to whether the original struct gets subsequently modified by another thread. However when the struct is passed by reference (even using "in"), the original struct is being referenced by the callee. Thus if the original struct is modified by another thread, the callee will "see" the change.

@jaredpar
Copy link
Member

By "thread safe" I mean "safe for use in multithreading scenarios".

Which multithreading scenarios? As noted there are too many, many of which have contradictory correctness rules, to describe in a single all up term.

@udlose
Copy link

udlose commented Nov 16, 2017

@BillWagner - Thank you. oh, to be a fly on the wall! It'd be great to hear those discussions.

@kburgoyne
Copy link

Basically any potential multithreading scenarios in which the struct being passed is being accessed by multiple threads. If the struct is passed-by-copy (historical), then the callee doesn't care because it has its own local stack copy. If the struct is passed by-ref ("in"), then the callee does care whether multiple threads are accessing the original struct.

If a semaphore or other lock is protecting the struct, the callee need not participate in the lock when the struct is passed-by-value. However when the struct is pass by reference, the callee may need to participate in the lock if there is concern about potentially reading a partially updated struct.

Note these concerns apply when passing by "ref" also, except "ref" means also caring about writes (not just reads) to the struct if other threads can access it.

@vladd
Copy link

vladd commented Nov 17, 2017

@kburgoyne

Obviously we're really talking multitasking/multithreading. In purely single threading nothing else has a chance to be executing while the callee is executing, so nothing else has the opportunity to change the struct.

Well, I'm not sure about it; the method can call a callback which can mutate the original struct. Somthing like this:


void F(int x, Action callback)
{
    callback();
    Console.WriteLine(x);
}

int x = 0;
F(x, () => x = 1);

Replacing int x with in int x changes output from 0 to 1. Is this behaviour considered a breaking change?

edit by @BillWagner to attribute quote to original author (see matching edit on issue description)

@jaredpar
Copy link
Member

If the struct is passed-by-copy (historical), then the callee doesn't care because it has its own local stack copy. If the struct is passed by-ref ("in"), then the callee does care whether multiple threads are accessing the original struct.

This is only true for the data directly in the struct though. Data which is indirectly referenced, via say a field to a class object, is just as vulnerable in either scenario.

This is also not applicable when the struct itself is readonly and sitting in a readonly location. In that case "by val" just slows things down.

@kburgoyne
Copy link

@vladd - Touche! Good catch. It is a little less "nefarious" as far as a developer trying to understand the possibilities since the callee is at least participating in that case. But it is certainly another entirely legitimate "difference".

I'm not convinced the not requiring "in" on the call may have been fully explored. I can understand the "friendliness" of not requiring it, but I think there is way too much to be said for making the understanding between the caller and callee more explicit. Particularly when it comes to a future programmer trying to trace through older code -- or even me trying to trace through my code from last week :-).

@kburgoyne
Copy link

@jaredpar -- All true. The debate really isn't about when to use "in" or not. The debate is only regarding clarity (or guidelines) for when a developer decides to use one or the other. The class reference in the struct is true both with and without "in", just like if the class reference was passed as its own parameter. This has more to do with the struct members that would be "safe" during pass-by-copy which become potentially "unsafe" during pass-by-ref.

@VSadov
Copy link
Member

VSadov commented Nov 17, 2017

@kburgoyne what you are describing is basically aliasing of variables. Multithreading does not need to be involved to observe it.
Concurrent use of shared state can have effect of making everything more complex, but it is helpful to separate concerns of sharing/aliasing and threading.

Consider examples:

  • a field X is passed as out parameter P to A method. Method assigns to the P, then makes assignment to X and sees that P has changed. The scenario is completely deterministic with no threading involved.
    Can this happen? - yes. Does this happen often - no.

  • a field X has a struct type and is a receiver of a call to a method M. Method assigns to this.Something, then makes assignment to X.Something and sees that this.Something has changed.
    Again - we are observing aliasing. X.Somehting and this.Something end up bound to the same storage

Also note that if X is readonly, then the method is called on a copy. Then you can observe that there is no aliasing.

Aliasing has complicated, but very well defined behavior. Code that takes advantage of aliasing tend to be more subtle - that is correct. However, the scenarios where you can change one variable and see another seemingly unrelated variable changing its value are not very new.

Specifying in at the call site guarantees aliasing, just like ref would. So if you want to take dependency on aliasing, you can ensure it.

Most users though don`t write concurrent/racey code and do not rely on aliasing to a degree that it cannot be traced and becomes source of bugs.
Basically, even though the language embraces shared state (for good or worse), it is not generally a big problem and the remedies are either "do not overdo it", or "do not make things unnecessary mutable".

@kburgoyne
Copy link

@VSadov -- I think we're getting sidetracked debating multi-threading when it is nothing more than one example of the situation. The underlying issue is that using "in" versus not using "in" does have potential side effects and switching from non-"in" to "in" is potentially a "breaking change" -- which is where this thread got started. My assumption has been that at a minimum, documentation should point this out. Particularly for developers who may not be used to thinking about the details of what's happening at the machine level.

There is also a significant "it's safer" argument to be made for requiring "in" on the call. Lacking that (which I find to be a very bad idea), there is going to need to be some pretty involved documentation behind spec'ing how the compiler selects between "void Foo(int x) {}" and "void Foo(in int x) {}" for the wealth of variants of a "Foo(x: /*different things*/)" call. If the caller is required to have "in", all that "what does the compiler do?" goes away and the intent of the caller is made explicitly clear. It's also symmetrical with the use of "ref".

@bgrainger
Copy link

bgrainger commented Nov 19, 2017

For locations where binary compat is important you can simply add an overload that takes in. That preserves compat and allows for the use of in when desired.

@jaredpar How do you call the non-in overload?

public class C {
    public void M() {
        Point3D a = default, b = default;
        
        // error CS0121: The call is ambiguous between the following methods or properties: 'C.CalculateDistance(Point3D, Point3D)' and 'C.CalculateDistance(in Point3D, in Point3D)'
        CalculateDistance(a, b);
        
        CalculateDistance(in a, in b);
    }
    
    public static double CalculateDistance(Point3D point1, Point3D point2) { ... }
    public static double CalculateDistance(in Point3D point1, in Point3D point2) { ... }
}

SharpLab

Obviously, if in was always required at the callsite, the non-in call would be unambiguous (same as overloading with ref today).

Edit: discussed previously at dotnet/csharplang#945 (and the basic answer in C# 7.2 is "you can't").

@udlose
Copy link

udlose commented Nov 20, 2017

Hopefully, I'm stating the obvious here....I think that writing code that is explicit and clearly states its intention is of the highest importance. Ambiguity in intention is dangerous.

If it was designed to be optional to avoid it being a breaking change, I'd argue that is the wrong reason. Being clear in intention, especially when a new language feature is added, should be considered more important. I'd prefer it to be breaking so that if my devs go sprinkling in everywhere in method declarations, they have to add it to the callsites as well. Again, making it clear.

IMO, in should be required - not optional.

@jaredpar
Copy link
Member

@bgrainger

How do you call the non-in overload?

This is the issue tracking support for in at the callsite being a tie breaker. Unfortunately we just ran out of time to finish this in the 7.2 release

dotnet/csharplang#945

@udlose
Copy link

udlose commented Nov 20, 2017

@jaredpar - so has it been decided on whether or not in will be required at callsites? From that issue, it is difficult to tell. Please consider my comment above.

@jaredpar
Copy link
Member

@udlose

so has it been decided on whether or not in will be required at callsites?

It is not required.

The reason for adding ref and out at the callsite is to make visible that the callee can modify the arguments. In both cases it is possible for the callee to pretty significantly alter the code flow of the caller simply by changing a parameter modifier. As such it was deemed worth the extra overhead to require the ref and out modifier at the call site. This forces both sites to agree that the callee can change the values.

In the case of in though there is no such problem. The addition of in doesn't change the code flow of the caller, but rather the callee itself. No need to force the caller to change here as they aren't impacted by the change in the majority of cases.

This is not something that was decided with the in feature in 7.2, but rather it dates all the way back to C# 4.0. In this release the compiler began looking at COM API metadata to change how callers interacted with APIs. This had two visible effects on callsites:

  • Allowed callers to omit arguments when the parameter was both optional and ref.
  • Allowed callers to omit ref in the case the parameter was marked as in

Our decision around in at callsite is both driven from the reasoning above and to be consistent with our prior decisions around COM APIs.

@bbarry
Copy link

bbarry commented Nov 20, 2017

It is not required.

Which means at the moment you cannot add an overload which accepts an in parameter to an existing class because doing so breaks source level compatibility.

given assembly A and B:

namespace A
{
  public class Ca
  {
    public void M(int i) => Console.WriteLine(i);
  }
}
namespace B
{
  public class Cb
  {
    public void N()
    {
      var i = 0;
      new A.Ca().M(i);
    }
  }
}
  • Changing A.M(int i) to A.M(in int i) will break binary compatibility (require a rebuild of B)
  • Adding new overload A.M(in int i) will break source compatibility (B cannot be built due to error CS0121)

@mairaw mairaw added the discussion Indicates issues that are being discussed label Nov 20, 2017
@mairaw mairaw added this to the Backlog milestone Nov 20, 2017
@eerhardt
Copy link
Member

Another versioning question that I didn't see covered above: Is it a breaking change to add ref readonly to a public static property?

In CoreFX we’ve shipped:

    public partial struct Vector3 : System.IEquatable<System.Numerics.Vector3>, System.IFormattable
        {
        public static System.Numerics.Vector3 UnitX { get { throw null; } }
        public static System.Numerics.Vector3 UnitY { get { throw null; } }
        public static System.Numerics.Vector3 UnitZ { get { throw null; } }
        public static System.Numerics.Vector3 Zero { get { throw null; } }
         }

Is it possible to add ref readonly to these properties without it being a binary breaking change?

@VSadov
Copy link
Member

VSadov commented Nov 21, 2017

adding ref readonly would not be source breaking, but would be a binary breaking change.

Which is unfortunate, since ref readonly would be immensely helpful here.

@VSadov
Copy link
Member

VSadov commented Nov 21, 2017

Same can be said about operator + (in Vector3 left, in Vector3 right)
Binary breaking, but very beneficial.

@BillWagner BillWagner removed this from the Backlog milestone Mar 12, 2018
@BillWagner BillWagner added the P1 label Mar 13, 2018
@BillWagner BillWagner added this to the Sprint 137 (June 11 - June 29) milestone May 21, 2018
@BillWagner BillWagner self-assigned this Jun 29, 2018
@BillWagner BillWagner changed the title Are there breaking changes / versioning issues in C# 7.2 Define "source breaking change", "binary breaking change" in the context of language versioning. Jul 2, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

10 participants