Strongly-typed type aliases #58

Closed
rcook opened this issue Jan 23, 2015 · 20 comments

Comments

@rcook
Contributor

rcook commented Jan 23, 2015

I would love to see strongly-typed type aliases in the C# programming language. These would be roughly equivalent to types declared in Haskell using newtype. Such aliases would differ from regular aliases declared with using directives: although the source-level type would ultimately be erased by the compiler, they would behave as types distinct from the original type at compile time.

I think this would be very helpful for dealing with easily abused types such as string. Here's an example using a hypothetical new language structure using the newtype keyword:

using System;
newtype EmailAddress = System.String;

class Program
{
    static EmailAddress CreateEmailAddress(string text)
    {
        // Valid: use cast-style syntax to "convert" string to EmailAddress
        return (EmailAddress)text;
    }

    static void UseEmailAddress(EmailAddress emailAddress)
    {
        // Valid: everything has ToString
        Console.WriteLine(emailAddress);

        // Invalid: EmailAddress does not directly expose Length
        Console.WriteLine(emailAddress.Length);

        // Valid: EmailAddress instance can be explicitly converted back to System.String
        // Does not result in runtime conversion since EmailAddress type is erased by
        // compiler
        Console.WriteLine(((string)emailAddress).Length);
    }

    static void Main()
    {
        // Valid
        UseEmailAddress(CreateEmailAddress("rcook@rcook.org"));

        // Invalid
        UseEmailAddress("rcook@rcook.org");
    }
}

This is purely syntactic sugar, however, and the compiler will emit all references to type EmailAddress as references to System.String instead. Perhaps some additional metadata could be applied to arguments of these alias types to provide a hint to compilers about the original (source-level) type assigned to it. There are obviously many questions that arise. The main one, I think, is what to do about methods that are declared on the original type: if they are all projected onto this new "virtual" type, then we lose all advantages of this new type safety. I believe that the compiler should prohibit the developer from calling any of the original type's members on a reference to the alias type without inserting an explicit cast back to the underlying type. Such a cast would be purely syntactic and would not incur any runtime overhead.

In traditional C# programming, the developer would most likely wrap a string inside an EmailAddress wrapper class to ensure type safety and to curate access to the underlying string. My proposed language feature would enable the developer to more elegantly express certain concepts without bloating the code with wrapper classes. Importantly, it would also allow generation of extremely efficient code.
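For comparison, here is a minimal sketch of the "traditional" wrapper class described above, written in today's C#. The validation rule is a placeholder assumption (it only checks for an '@' after the first character), not a real email validator. The sketch shows the code such a wrapper needs: a private field, a validating constructor, curated members, a conversion back to string, and equality.

```csharp
using System;

// Hypothetical wrapper class in the "traditional C#" style described above.
// EmailAddress is the thread's example name, not a framework type.
public sealed class EmailAddress : IEquatable<EmailAddress>
{
    // The wrapped string; never exposed directly.
    private readonly string value;

    public EmailAddress(string text)
    {
        if (!IsValidEmailAddress(text))
        {
            throw new ArgumentException("Not a valid email address.", nameof(text));
        }

        value = text;
    }

    // Curated access: only the members chosen here are visible to callers.
    public int Length => value.Length;

    // Explicit conversion back to the underlying string.
    public static explicit operator string(EmailAddress address) => address?.value;

    public override string ToString() => value;

    // Value equality, which a class does not get for free.
    public bool Equals(EmailAddress other) =>
        other != null && string.Equals(value, other.value, StringComparison.Ordinal);

    public override bool Equals(object obj) => Equals(obj as EmailAddress);

    public override int GetHashCode() => value.GetHashCode();

    // Placeholder rule (an assumption): non-empty, with an '@' after the first character.
    private static bool IsValidEmailAddress(string text) =>
        !string.IsNullOrEmpty(text) && text.IndexOf('@') > 0;
}
```

Every instance is also a separate heap allocation, which is part of the runtime overhead the proposal hopes to avoid.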

More issues:

  • What is the scope of a type declared with newtype?

Perhaps newtype could look more like the definition of a class or struct:

namespace MyNewtypeDemo
{
    using System;

    // No inheritance: this is an alias
    // Only methods declared inside the "newtype" declaration
    // can access the members of "string": no casting required
    // We'll reuse the contextual keyword "value"
    public newtype EmailAddress : string
    {
        // Constructor: can assign to "value" which is the actual string
        // Can only initialize via constructor
        public EmailAddress(string text)
        {
            if (!IsValidEmailAddress(text))
            {
                throw new ArgumentException();
            }

            // No cast required since compiler knows that "value" is a string
            value = text;
        }

        // Length is not normally accessible but we can wrap call to value.Length
        public int Length
        {
            get { return value.Length; }
        }

        public string ToActualString()
        {
            return value;
        }

        private static bool IsValidEmailAddress(string text)
        {
            // Validate text etc.
            return true;
        }
    }
}
@mirhagk

mirhagk commented Jan 23, 2015

I am also for newtype because, although wrapper classes would solve many of the same problems, they are non-trivial to write. It's the same reason C# has a problem with most things being mutable: it's just the easiest way to write code, so it's the default in most cases.

newtype should be a very lightweight construct in terms of both runtime cost and developer effort. I'm okay with sacrificing some runtime performance here (possibly even compiling to a wrapper class) if that is decided to give better results. Type erasure would remove some runtime cost (I'm not sure what would happen with value types, so it may not remove all of it; it may require boxing), but it should not be taken lightly, because Java is still paying for its decision to make generics type-erased at compile time. Regardless, type erasure is more of an implementation detail of how newtype compiles to the CLR, and it mostly affects reflection. I'd suggest deferring the topic of type erasure until the rest of what is desired is figured out, so that type erasure does not make us limit features for the sake of performance.

As for the syntax, I think something between the two you've mentioned is probably the best bet.

public newtype EmailAddress : public string
{
    int Length {get;}
    string ToString();
}

Syntactically it would look like an interface declaration, where the implementations come from the underlying type. A simple declaration, public newtype EmailAddress : string, should also be supported; it would expose only Object's methods.

Semantically I think EmailAddress should be considered as an abstract class that inherits from Object, with a new conceptual class StringEmailAddress that provides an implementation for EmailAddress. This conceptual class is not exposed in any way to the developer.

Explicit casts between StringEmailAddress and string should be allowed, and the only way to create it. If wrapper methods like: static EmailAddress CreateEmailAddress(string text) are created, this will basically just hide away the explicit cast along with some potential validation.

The public string part denotes whether such casts are public or private. The basic idea is to force such casts to go through wrapper methods, so that the fact that it's a string is entirely an implementation detail and the explicit casts are hidden from consumers. This is the piece I'm not 100% sure how it would work. What scope would private limit the casts to? Does it mean the helper methods are static methods inside the implementation? Currently explicit operators must be public; does that mean private explicit operators should be allowed?

I think the point of newtype should be similar to the point of auto-implemented properties: something very easy to create that gives you the flexibility to change the implementation without changing the public API and forcing consumers to change. newtype should require no code changes (and ideally no recompilation of consumers) in order to "promote" it to a class that is not a simple facade over string.

The design goals for newtype should be the following:

  1. Ease of creation (the easier it is to create, the more likely programmers will actually use it, giving a huge win for semantic types)
  2. Flexibility in turning into a full-fledged type (if converting it to a full-fledged type breaks consumers, it will be hard to decide whether to use this or a "real" type, and it won't eliminate the majority of wrapper types).
  3. Minimize casts back to underlying type (every time it's cast to the underlying type, that's another piece of code dependent on the implementation, it'd be nice if this could be restricted to certain methods, perhaps only extension methods can do so?)
  4. Minimize runtime overhead (so that the argument can't be made that a programmer uses string for "performance" reasons)

I'm flexible on implementation details, but I think the above goals in that order should be considered. Some are at odds with each other, and in those cases I think the first goals should be given priority.

@theoy theoy added this to the Unknown milestone Jan 23, 2015
@svick
Contributor

svick commented Jan 23, 2015

@rcook

Perhaps newtype could look more like the definition of a class or struct:

If this was the case, wouldn't it mean that newtype saves you exactly one line of code (defining the private string value; field)? I don't think that would be worth it.

@mirhagk

If wrapper methods like: static EmailAddress CreateEmailAddress(string text) are created, this will basically just hide away the explicit cast along with some potential validation.

I think it's a bad idea to have two ways to create a type, one of them validating and the other not, so this feature should not encourage that. This, along with some of your points, indicate it might make sense to allow only what you call private newtype.

@rcook
Contributor Author

rcook commented Jan 23, 2015

@svick

If this was the case, wouldn't it mean that newtype saves you exactly one line of code (defining the private string value; field)? I don't think that would be worth it.

Yes, you're absolutely right. From a syntactic sugar standpoint, it doesn't gain us much. However, my other interest in having such a language feature is in generating efficient runtime code. I'm happy to declare my newtype with syntax that is more or less the same as that of a struct or class. However, I want use of the objects of this type to incur as little overhead as possible over using a raw string. Essentially, I want type safety at compile time and minimal code overhead at runtime.

In many ways this is similar to some of the motivations behind enum types when a language implements them properly. These are essentially constrained integers. I want to be able to constrain strings, or any other type I can think of.

Some links of interest:

@mirhagk

mirhagk commented Jan 23, 2015

Yes, I agree that figuring out private explicit casts would be the preferred approach.

Then again, since casts can be written to do anything (including validation), perhaps the solution is to always cast and put the validation logic in the explicit cast? I'm not sure what best practice would be here.
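The "validate inside the explicit cast" idea can already be sketched with a struct today. In this minimal sketch the constructor is private, so the explicit conversion from string is the only way in, and it performs the (placeholder, assumed) validation. The names follow the thread's EmailAddress example; nothing here is a framework API.

```csharp
using System;

// Hypothetical sketch: the only way to create an EmailAddress is an explicit
// cast from string, and that cast performs the validation.
public readonly struct EmailAddress
{
    private readonly string value;

    // Private, so callers cannot bypass the validating cast.
    private EmailAddress(string value) => this.value = value;

    public static explicit operator EmailAddress(string text)
    {
        // Placeholder rule (an assumption): an '@' after the first character.
        if (text == null || text.IndexOf('@') <= 0)
        {
            throw new ArgumentException("Not a valid email address.", nameof(text));
        }

        return new EmailAddress(text);
    }

    // Explicit conversion back to the underlying string.
    public static explicit operator string(EmailAddress address) => address.value;

    public override string ToString() => value ?? string.Empty;
}
```

One caveat with a struct: `default(EmailAddress)` skips the cast entirely and yields a wrapper around `null`, so neither a validating cast nor a constructor can rule out every invalid value.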

@mirhagk

mirhagk commented Jan 23, 2015

@rcook I think in terms of efficient code generation, something like .NET Native should fairly easily be able to remove any overhead (since it has limitations on reflection anyways, and reflection is the only place you should really be able to notice a difference in whether the type exists or not).

@mikedn

mikedn commented Jan 23, 2015

I don't think you really want this to be just a compile-time thing; if you do that, then "strongly typed" goes out the window as soon as you cross the project/assembly boundary.

You can simply make EmailAddress a struct then you'll get pretty good performance. It doesn't require an additional allocation, it has the same size as the string reference itself. You can add an implicit conversion to string and that will basically be a no-op.

The only trouble with using a struct is that when used with generics (as in List<EmailAddress>) it requires the JIT compiler to produce unshared code. If you use a lot of such types, things may get ugly.
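A minimal sketch of the struct approach described above, assuming the thread's EmailAddress name. The struct holds a single string reference (no extra heap allocation per wrapper), and the implicit conversion to string only reads that field.

```csharp
using System;

// Hypothetical struct wrapper: one reference-typed field, so the struct is
// the size of a string reference and needs no separate allocation.
public readonly struct EmailAddress : IEquatable<EmailAddress>
{
    private readonly string value;

    public EmailAddress(string value) =>
        this.value = value ?? throw new ArgumentNullException(nameof(value));

    // Implicit conversion to string: returns the stored reference, no copying.
    public static implicit operator string(EmailAddress address) => address.value;

    public bool Equals(EmailAddress other) =>
        string.Equals(value, other.value, StringComparison.Ordinal);

    public override bool Equals(object obj) => obj is EmailAddress other && Equals(other);

    public override int GetHashCode() => value?.GetHashCode() ?? 0;

    public override string ToString() => value ?? string.Empty;
}
```

The conversion is one-way: `string s = address;` compiles, but `EmailAddress a = "x";` does not. `address.Length` also does not compile, because member lookup never applies user-defined conversions, so string members stay hidden unless you convert explicitly.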

@mburbea

mburbea commented Jan 26, 2015

Another problem with using a struct is obviously boxing. There is still plenty of framework code that you might call that knows nothing about an EmailAddress or FirstName field. Sure, you could cast them to string first, but that kind of defeats the point of the whole thing.

@svick
Contributor

svick commented Jan 26, 2015

@mburbea What kind of framework code specifically do you have in mind?

The only cases I can think of are methods similar to TaskFactory.StartNew(Action<object>, object). And those are relatively rarely used, I think.

@mburbea

mburbea commented Jan 26, 2015

Heck, even simple things like String.Format only have overloads that accept object. A lot of APIs do not accept generic parameters and just take a plain object.
Some methods, such as the Task.ContinueWith overloads, do let you pass an extra argument to avoid creating a new delegate that captures variables, but they only take a single object state.
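A small sketch of the boxing concern, using a hypothetical FirstName struct (the name comes from the comment above). Passing the struct where `object` is expected boxes it; overriding ToString at least keeps object-typed APIs such as string.Format printing the underlying text.

```csharp
// Hypothetical struct wrapper used only to illustrate boxing.
public readonly struct FirstName
{
    private readonly string value;

    public FirstName(string value) => this.value = value;

    // string.Format calls ToString on its arguments, so this keeps the
    // formatted output equal to the wrapped text.
    public override string ToString() => value ?? string.Empty;
}

// Hypothetical helper contrasting the two ways of calling an object-typed API.
public static class Greeting
{
    // The struct is boxed into an object to match string.Format's parameter.
    public static string Boxed(FirstName name) => string.Format("Hello, {0}!", name);

    // Converting to string first avoids the box, at the cost of the explicit
    // conversion the comment above objects to.
    public static string Unboxed(FirstName name) => string.Format("Hello, {0}!", name.ToString());
}
```

Both calls produce the same text; the difference is only whether a small heap allocation happens for the box.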

@MadsTorgersen
Contributor

We don't expect to do this. You can accomplish this using structs, and we expect source generators #5561 to make that less painful.

@macias

macias commented Sep 17, 2022

@MadsTorgersen this is unfortunate (at least). How do you accomplish it with struct? Either I am missing something or you would have to constantly type something like wrapper.Value.

Anyway, it is 2022, and working with primitive types (like long and int) was a pain years ago and is still a pain. This is because I cannot differentiate that this long is an id for nodes, this long is an id for roads, this long... and so on. To C# they are all the same.

@CyrusNajmabadi
Member

How do you accomplish it with struct?

Wrap the primitive with a struct (or a record-struct).

or you would have to constantly type something like wrapper.Value.

Only if you actually want to work with the underlying value. But then you're losing the strong typing. The point of strong typing would be to not have that happen, but instead to enforce that you only use the actual strong type, while still benefiting from a small internal representation that would not leak out.

For C# they are all the same.

Just have a NodeId/RoadId/EtcId struct for those cases. Now they will be different without issue, while each is still only 64 bits large.
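A minimal sketch of that suggestion, assuming C# 10 or later (for `readonly record struct`). NodeId and RoadId come from the thread; the Graph helper is hypothetical. Each type wraps a single long, and the compiler generates constructor, equality, and `==`.

```csharp
// Hypothetical ID types for the node/road example; requires C# 10+.
// Each wraps a single long, so each value is 64 bits.
public readonly record struct NodeId(long Value);
public readonly record struct RoadId(long Value);

// Hypothetical consumer that accepts only node ids.
public static class Graph
{
    public static string Describe(NodeId id) => $"node {id.Value}";
}
```

`Graph.Describe(new RoadId(42))` or `Graph.Describe(42L)` would not compile, because there is no conversion from `RoadId` or `long` to `NodeId`.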

@macias

macias commented Sep 17, 2022

or you would have to constantly type something like wrapper.Value.

Only if you actually want to work with the underlying value.

At the end of the day I have to.

The point of strong typing would be to not have that happen, but instead to enforce that you only use the actual strong type, while still benefiting from a small internal representation that would not leak out.

I understand it. My point is, struct wrappers are not a solution, they are merely workarounds. You have to type more; your code is bloated just because some feature is missing.

If only a single feature were missing from a given language, fine, but many more are missing, and as a result I constantly write code I shouldn't have to.

@CyrusNajmabadi
Member

struct wrappers are not a solution, they are merely workarounds.

I don't get the distinction.

You have to type more,

You're always going to have to type more. You need some way to define this new alias, and that means more code.

your code is bloated

There should be no bloat. That's the point here. These are structs, so they have the exact size of the data they wrap. It's bloat free.

just because some feature is missing.

This is the feature, though.

@macias

macias commented Sep 17, 2022

@CyrusNajmabadi you do understand the difference between one extra line and multiple/thousands of .Value polluting the code (this is what I meant by bloated code, source code)?

@CyrusNajmabadi
Member

@CyrusNajmabadi you do understand the difference between one extra line and multiple/thousands of .Value polluting the code (this is what I meant by bloated code, source code)?

It's unclear why you need multiple/thousands of .Value polluting your code.

@macias

macias commented Sep 18, 2022

someWrapper1 +??? someWrapper2

@CyrusNajmabadi
Member

@macias I don't know what that code is meant to convey.

It's also unclear why you're adding NodeIds or RoadIds. However, if that is something you intend to do in your domain, then that's fine. There's no reason you have to use .Value in thousands of places to do so.
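A sketch of what "no .Value in thousands of places" can look like when an operation does make sense in the domain: define the operator once on the wrapper. Meters and Route are hypothetical names (a length type fits the roads example better than adding ids); C# 10+ is assumed for `record struct`.

```csharp
// Hypothetical length type; .Value appears only inside the operator bodies.
public readonly record struct Meters(long Value)
{
    public static Meters operator +(Meters left, Meters right) => new Meters(left.Value + right.Value);

    public static Meters operator -(Meters left, Meters right) => new Meters(left.Value - right.Value);
}

// Hypothetical call site that works entirely with the wrapper type.
public static class Route
{
    public static Meters TotalLength(Meters[] segments)
    {
        var total = new Meters(0);
        foreach (var segment in segments)
        {
            total += segment; // compound assignment uses the + operator above
        }

        return total;
    }
}
```

Operations that make no sense in the domain (adding two NodeIds, say) are simply not defined, so they remain compile errors.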

@nyan-cat

It's unclear why you need multiple/thousands of .Value polluting your code.

@CyrusNajmabadi could you give us an example of how you would implement this with a struct, or any other way currently available in the language, with no need to write .Value or something like that? Don't forget about serialization - those aliases should serialize as if they were the aliased types, not something like {"Value": 5}

@CyrusNajmabadi
Member

I'm not sure what you're asking for. What are three cases where you would need to write .Value?

You're making the claim you would have thousands of these. I'm the one asking for examples as I don't get why that would be necessary.

Don't forget about serialization - those aliases should serialize as if they were aliased types, not something like {"Value": 5}

Then write your serialization code to do that.
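One way to do that, sketched with System.Text.Json's `JsonConverter<T>` (available since .NET Core 3.0) and a C# 10 `record struct`. NodeIdJsonConverter is a hypothetical name; the converter writes the wrapped long as a bare JSON number and reads it back.

```csharp
using System;
using System.Text.Json;
using System.Text.Json.Serialization;

// Hypothetical id type; the attribute makes System.Text.Json use the
// converter below wherever NodeId is serialized.
[JsonConverter(typeof(NodeIdJsonConverter))]
public readonly record struct NodeId(long Value);

// Serializes NodeId as a plain number (5) instead of {"Value":5}.
public sealed class NodeIdJsonConverter : JsonConverter<NodeId>
{
    public override NodeId Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options) =>
        new NodeId(reader.GetInt64());

    public override void Write(Utf8JsonWriter writer, NodeId value, JsonSerializerOptions options) =>
        writer.WriteNumberValue(value.Value);
}
```

With this in place, a property of type NodeId serializes as `"Id":5`. The cost is one small converter per wrapper type, which is the kind of repetitive code the source generators mentioned earlier in the thread could plausibly produce.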
