Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Discussion: Proposal to make FormattableString a struct #3981

Closed
justinvp opened this issue Feb 17, 2015 · 7 comments
Closed

Discussion: Proposal to make FormattableString a struct #3981

justinvp opened this issue Feb 17, 2015 · 7 comments

Comments

@justinvp
Copy link
Contributor

@AlexGhiondea, @jaredpar, @nguerrera, @ericstj:

I'd like to discuss a potential alternative approach to FormattableString that avoids the heap allocation associated with this type being a class and still allows for the same optimizations of the current design with minimal impact to the language/compiler. If there's a better place for this discussion, please let me know.

Was a design along the lines of the proposal (below) considered, and if so, why was it rejected?

Background

Here's the current design of the new System.FormattableString and System.Runtime.CompilerServices.FormattableStringFactory types:

namespace System
{
    public abstract class FormattableString : IFormattable
    {
        public static string Invariant(FormattableString formattable);

        public abstract string Format { get; }
        public abstract object[] GetArguments();
        public abstract int ArgumentCount { get; }
        public abstract object GetArgument(int index);
        public override string ToString();
        public abstract string ToString(IFormatProvider formatProvider);
        string IFormattable.ToString(string ignored, IFormatProvider formatProvider);
    }
}

namespace System.Runtime.CompilerServices
{
    public static class FormattableStringFactory
    {
        public static FormattableString Create(string format, params object[] arguments);
    }
}

These types enable the new string interpolation language feature to format strings using the invariant culture:

float latitude = 47.644961f;
float longitude = -122.130690f;
string coords = FormattableString.Invariant($"({latitude},{longitude})");

Which compiles as:

float latitude = 47.644961f;
float longitude = -122.130690f;
string coords = FormattableString.Invariant(FormattableStringFactory.Create("({0},{1})", new object[] { latitude, longitude }));

The upside of this design is that it affords future optimizations that can avoid the argument array allocation and boxing when there are 1, 2, and 3 arguments (via private subclasses of FormattableString):

namespace System.Runtime.CompilerServices
{
    public static class FormattableStringFactory
    {
        public static FormattableString Create<T>(string format, T argument);
        public static FormattableString Create<T1, T2>(string format, T1 argument1, T2 argument2);
        public static FormattableString Create<T1, T2, T3>(string format, T1 argument1, T2 argument2, T3 argument3);
    }
}

The downside is that this always results in a heap allocation of System.FormattableString.

This means that using the string interpolation feature to format with the invariant culture in this way will always be more costly than the default behavior of formatting with the current culture (assuming similar optimized overloads to String.Format).

Proposal

Here's an alternative design that replaces the FormattableString class with a FormattableString<TArgument> struct:

namespace System
{
    public static class FormattableString
    {
        public static string Invariant<TArguments>(FormattableString<TArguments> formattable) where TArguments : IFormattable
    }

    public struct FormattableString<TArguments> : IEquatable<FormattableString<TArguments>>, IFormattable
        where TArguments : IFormattable
    {
        public FormattableString(string format, TArguments arguments);
        public string Format { get; }
        public TArguments Arguments { get; }
        public override string ToString();
        public string ToString(IFormatProvider formatProvider);
        string IFormattable.ToString(string ignored, IFormatProvider formatProvider);
        public bool Equals(FormattableString<TArguments> other);
        public override bool Equals(object obj);
        public override int GetHashCode();
    }
}

namespace System.Runtime.CompilerServices
{
    public static class FormattableStringFactory
    {
        public static FormattableString<Arguments> Create(string format, params object[] arguments);

        public struct Arguments : IEquatable<Arguments>, IFormattable
        {
            public Arguments(object[] arguments);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }
    }
}

Future optimized overloads would look like the following:

namespace System.Runtime.CompilerServices
{
    public static class FormattableStringFactory
    {
        public static FormattableString<Arguments<T>> Create<T>(string format, T argument);
        public static FormattableString<Arguments<T1, T2>> Create<T1, T2>(string format, T1 argument1, T2 argument2);
        public static FormattableString<Arguments<T1, T2, T3>> Create<T1, T2, T3>(string format, T1 argument1, T2 argument2, T3 argument3);

        public struct Arguments<T> : IEquatable<Arguments<T>>, IFormattable
        {
            public Arguments(T argument);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments<T> other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }

        public struct Arguments<T1, T2> : IEquatable<Arguments<T1, T2>>, IFormattable
        {
            public Arguments(T1 argument1, T2 argument2);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments<T1, T2> other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }

        public struct Arguments<T1, T2, T3> : IEquatable<Arguments<T1, T2, T3>>, IFormattable
        {
            public Arguments(T1 argument1, T2 argument2, T3 argument3);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments<T1, T2, T3> other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }
    }
}

The huge upside: no GC heap allocation for the class when using string interpolation with the invariant culture.

This does require additional Arguments struct types. But these types could live in the System.Runtime.CompilerServices namespace (and can be nested within FormattableStringFactory) where users wouldn't have to know or care about them. Users would use the Invariant method with this approach in the exact same way as with the current design, with the benefit of no heap allocation.

This approach fits well with apparent future related work:

  • System.Text.Formatting

    System.Text.Formatting APIs are similar to the existing StringBuilder and TextWriter APIs. They are designed to format values into text streams and to build complex strings. But these APIs are optimized for creating text for the Web. They do formatting with minimum GC heap allocations (1/6 of allocations in some scenarios) and can format directly to UTF8 streams. This can result in significant performance wins for software that does a lot of text formatting for the Web, e.g. generating HTML, JSON, XML.

    • Comment: One can imagine functionality similar to StringBuilder.AppendFormat in potential future System.Text.Formatting APIs that enables the use of the string interpolation language feature with these APIs without the FormattableString GC heap allocation
  • params Arguments<T> to avoid the params array allocation

    • Comment: I'm not suggesting that this proposal's Arguments structs (specific to FormattableString<TArguments>) be the same as the params Arguments<T> struct(s). I just wanted to call attention to the same motivation behind the feature (avoiding unnecessary allocations)

Additional Benefit

As an added bonus, the proposed design enables a new scenario: "deferred string formatting" for APIs that want to avoid all allocations (even avoiding the allocation of the formatted string itself) if the formatting doesn't need to happen.

For example, logging APIs like NLog typically have methods that avoid calling String.Format and avoid allocating the arguments array and boxing when a log level isn't enabled:

public class Logger
{
    public void Write<TArgument>(LogLevel level, string message, TArgument argument)
    { 
        if (IsEnabled(level))
        {
            WriteInternal(level, string.Format(message, new object[] { argument }));
        }
    }

    public void Write<TArgument1, TArgument2>(LogLevel level, string message, TArgument1 argument1, TArgument2 argument2)
    { 
        if (IsEnabled(level))
        {
            WriteInternal(level, string.Format(message, new object[] { argument1, argument2 }));
        }
    }

    public void Write<TArgument1, TArgument2, TArgument3>(LogLevel level, string message, TArgument1 argument1, TArgument2 argument2, TArgument3 argument3)
    { 
        if (IsEnabled(level))
        {
            WriteInternal(level, string.Format(message, new object[] { argument1, argument2, argument3 }));
        }
    }
}

Usage:

long id = 1;
int foo = 500;
_logger.Write(LogLevel.Verbose, "Id: {0} - Foo {1} happened", id, foo);

The proposed design would allow the string interpolation language feature to be used in such APIs without any allocation penalty. Such methods would be similar to FormattableString.Invariant, with an implementation that could choose not to call ToString at all if the particular log level isn't enabled, in which case no allocations would occur:

public class Logger
{
    public void Write<TArgument>(LogLevel level, FormattableString<TArgument> formattable)
        where TArgument : IFormattable
    {
        if (IsEnabled(level))
        {
            WriteInternal(level, formattable.ToString());
        }
    }
}

Usage:

long id = 1;
int foo = 500;
_logger.Write(LogLevel.Verbose, $"Id: {id} - Foo {foo} happened");

Conclusion

I have an implementation of this approach and would be happy to submit a Pull Request (even if folks just want to see what the implementation looks like for discussion).

The compilers (and associated tests) would need to be updated to look for the generic FormattableString<TArgument> struct instead of the non-generic FormattableString class and allow the result of the FormattableStringFactory.Create call to be passed to generic methods like FormattableString.Invariant.

@stephentoub
Copy link
Member

cc @gafter, @KrzysztofCwalina

@gafter
Copy link
Member

gafter commented Feb 17, 2015

To answer your question:

Was a design along the lines of the proposal (below) considered, and if so, why was it rejected?

No, it was not considered. And I'm glad.

I do not think the benefits of this proposal overcome its conceptual surface area. Would the language specification have special rules for each of FormattableString, FormattableString<T>, FormattableString<T, U>, FormattableString<T, U, V>, and then just stop there? That's a pretty arbitrary and complex language specification. It would be painful for clients to specialize APIs for each of them.

A proposal like this would have to have been worked out in detail and implemented a few months ago to have any real chance of affecting the upcoming release.

@justinvp
Copy link
Contributor Author

Would the language specification have special rules for each of FormattableString, FormattableString, FormattableString<T, U>, FormattableString<T, U, V>, and then just stop there?

The proposal is for just one FormattableString<TArguments> type.

@gafter
Copy link
Member

gafter commented Feb 17, 2015

@justinvp Your proposal is for a change to the platform libraries. If you are proposing a language change, please spell it out. What would be the language specification for the interpolated string conversions? What conversions are defined by the language and what are their semantics?

@justinvp
Copy link
Contributor Author

Your proposal is for a change to the platform libraries.

@gafter Exactly. I'm not proposing a language change. I haven't seen an up-to-date language spec for this feature, so I can't easily say whether there's an actual diff to how the spec is currently written, but the semantics would be the same as it is currently.

When converting an interpolated-string-expression to System.IFormattable or System.FormattableString<TArguments>, System.Runtime.CompilerServices.FormattableStringFactory.Create is called, passing the format string and arguments to create an object of type String.Formattable<TArguments>.

The only "language change" is in the compiler implementations, because the compilers currently look for the System.IFormattable and System.FormattableString well known types for the conversion during lowering (source):

        protected override Conversion GetInterpolatedStringConversion(BoundInterpolatedString source, TypeSymbol destination, ref HashSet<DiagnosticInfo> useSiteDiagnostics)
        {
            // An interpolated string expression may be converted to the types
            // System.IFormattable and System.FormattableString
            return (destination == Compilation.GetWellKnownType(WellKnownType.System_IFormattable) ||
                    destination == Compilation.GetWellKnownType(WellKnownType.System_FormattableString))
                ? Conversion.InterpolatedString : Conversion.NoConversion;
        }

WellKnownType.System_FormattableString is defined as "System.FormattableString" (non-generic). It would need to be changed to "System.FormattableString`1" (generic).

Here's an example (it's the exact same process as right now):

Assume the platform libraries look like this:

namespace System
{
    public static class FormattableString
    {
        public static string Invariant<TArguments>(FormattableString<TArguments> formattable) where TArguments : IFormattable
    }

    public struct FormattableString<TArguments> : IEquatable<FormattableString<TArguments>>, IFormattable
        where TArguments : IFormattable
    {
        public FormattableString(string format, TArguments arguments);
        public string Format { get; }
        public TArguments Arguments { get; }
        public override string ToString();
        public string ToString(IFormatProvider formatProvider);
        string IFormattable.ToString(string ignored, IFormatProvider formatProvider);
        public bool Equals(FormattableString<TArguments> other);
        public override bool Equals(object obj);
        public override int GetHashCode();
    }
}

namespace System.Runtime.CompilerServices
{
    public static class FormattableStringFactory
    {
        public static FormattableString<Arguments> Create(string format, params object[] arguments);

        public struct Arguments : IEquatable<Arguments>, IFormattable
        {
            public Arguments(object[] arguments);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }
    }
}

If someone has code like this:

float latitude = 47.644961f;
float longitude = -122.130690f;
string coords = FormattableString.Invariant($"({latitude},{longitude})");

The same exact language conversion lowering process is used to convert $"({latitude},{longitude})" into a FormattableString<TArguments>. First the format string is determined, e.g. "({0},{1})". Then it, along with the captured contents (e.g. latitude and longitude) are passed to the call to System.Runtime.CompilerServices.FormattableStringFactory.Create. Initially, there's only one Create overload, which returns FormattableString<Arguments> (e.g. the format and object args array).

However, if optimized overloads of Create were added like this following:

namespace System.Runtime.CompilerServices
{
    public static class FormattableStringFactory
    {
        public static FormattableString<Arguments<T>> Create<T>(string format, T argument);
        public static FormattableString<Arguments<T1, T2>> Create<T1, T2>(string format, T1 argument1, T2 argument2);
        public static FormattableString<Arguments<T1, T2, T3>> Create<T1, T2, T3>(string format, T1 argument1, T2 argument2, T3 argument3);

        public struct Arguments<T> : IEquatable<Arguments<T>>, IFormattable
        {
            public Arguments(T argument);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments<T> other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }

        public struct Arguments<T1, T2> : IEquatable<Arguments<T1, T2>>, IFormattable
        {
            public Arguments(T1 argument1, T2 argument2);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments<T1, T2> other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }

        public struct Arguments<T1, T2, T3> : IEquatable<Arguments<T1, T2, T3>>, IFormattable
        {
            public Arguments(T1 argument1, T2 argument2, T3 argument3);
            public string ToString(string format, IFormatProvider formatProvider);
            public bool Equals(Arguments<T1, T2, T3> other);
            public override bool Equals(object obj);
            public override int GetHashCode();
        }
    }
}

Then the compiler would end up calling the Create overload that returns FormattableString<Arguments<float, float>> (no boxing, no array allocation, no class allocation).

@gafter
Copy link
Member

gafter commented Feb 17, 2015

You suggest an interpolated string conversion is available when there is a conversion to System.FormattableString<TArguments>. But that isn't a type, because TArguments isn't a type.

@gafter
Copy link
Member

gafter commented Feb 17, 2015

Perhaps you intend the conversion involve type inference to infer the type of TArguments. If that is the case we'd need a specification for the changes to the type inference algorithm.

And to get into the current release, we'd need it about 5-6 months ago.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 30, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 7, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants