Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TypeNameParser API #97566

Closed
adamsitnik opened this issue Jan 26, 2024 · 32 comments · Fixed by #100094
Closed

TypeNameParser API #97566

adamsitnik opened this issue Jan 26, 2024 · 32 comments · Fixed by #100094
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Reflection.Metadata in-pr There is an active PR which will close this issue when it is merged
Milestone

Comments

@adamsitnik
Copy link
Member

adamsitnik commented Jan 26, 2024

Background and Motivation

For years various teams at Microsoft and outside of it had been implementing their own type name parsers.

None of them were good for untrusted input. We want to put an end to it and provide a single, public API for parsing type names from untrusted input.

The new APIs need to be shipped in an OOB package that supports older monikers, as we have first party customers running on Full Framework that are going to use it.

Proposed API

The parser has two modes:

  • Default: ECMA-335 compliant mode where everything that the standard allows for is allowed (used by CLR APIs like Type.GetType).
  • Strict: extends ECMA-335 standard limitations with a set of opinionated rules based on most up-to-date security knowledge (example: no escaping, no null or path characters etc).

To prevent from unbounded recursion for inputs like typeof(List<List<List<List<List<...>>>>>).FullName, the parser introduces the concept of complexity:

  • Represents the total amount of work that needs to be performed to fully inspect given type name, including any generic arguments or underlying or nested types.
  • There's not really a parallel concept to this in reflection. Think of it as the total number of TypeName instances that would be created if you were to totally deconstruct this instance and visit each intermediate TypeName that occurs as part of deconstruction.
  • int and Person each have complexities of 1 because they're standalone types.
  • int[] has a complexity of 2 because to fully inspect it involves inspecting the array type itself, plus unwrapping the underlying type (int) and inspecting that.
  • Dictionary<string, List<int[][]>> has complexity 8 because fully visiting it involves inspecting 8 TypeName instances total:
    • Dictionary<string, List<int[][]>> (the original type)
    • Dictionary2` (the generic type definition)
    • string (a type argument of Dictionary)
    • List<int[][]> (a type argument of Dictionary
    • List`1 (the generic type definition)
    • int[][] (a type argument of List)
    • int[] (the underlying type of int[][])
    • int (the underlying type of int[]

Returned information matches the System.Type APIs: Name, FullName, AssemblyQualifiedName etc.

namespace System.Reflection.Metadata;

public sealed class TypeName : IEquatable<TypeName>
{
    internal TypeName() { }
    
    /// <summary>
    /// The assembly-qualified name of the type; e.g., "System.Int32, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089".
    /// </summary>
    /// <remarks>
    /// If <see cref="GetAssemblyName()"/> returns null, simply returns <see cref="FullName"/>.
    /// </remarks>
    public string AssemblyQualifiedName { get; }
    
    /// <summary>
    /// If this type is a nested type (see <see cref="IsNestedType"/>), gets
    /// the containing type. If this type is not a nested type, returns null.
    /// </summary>
    /// <remarks>
    /// For example, given "Namespace.Containing+Nested", unwraps the outermost type and returns "Namespace.Containing".
    /// </remarks>
    public TypeName? ContainingType { get; }
    
    /// <summary>
    /// The full name of this type, including namespace, but without the assembly name; e.g., "System.Int32".
    /// Nested types are represented with a '+'; e.g., "MyNamespace.MyType+NestedType".
    /// </summary>
    /// <remarks>
    public string FullName { get; }
    
    public bool IsArray { get; }
    public bool IsConstructedGenericType { get; }
    public bool IsElementalType { get; }
    public bool IsManagedPointerType { get; } // name inconsistent with Type.IsByRef
    public bool IsNestedType { get; }
    public bool IsSzArrayType { get; }
    public bool IsUnmanagedPointerType { get; } // name inconsistent with Type.IsPointer
    public bool IsVariableBoundArrayType { get; }
    
    /// <summary>
    /// The name of this type, without the namespace and the assembly name; e.g., "Int32".
    /// Nested types are represented without a '+'; e.g., "MyNamespace.MyType+NestedType" is just "NestedType".
    /// </summary>
    public string Name { get; }

    /// <summary>
    /// Represents the total amount of work that needs to be performed to fully inspect
    /// this instance, including any generic arguments or underlying types.
    /// </summary>
    /// <remarks>
    /// <para>There's not really a parallel concept to this in reflection. Think of it
    /// as the total number of <see cref="TypeName"/> instances that would be created if
    /// you were to totally deconstruct this instance and visit each intermediate <see cref="TypeName"/>
    /// that occurs as part of deconstruction.</para>
    /// <para>"int" and "Person" each have complexities of 1 because they're standalone types.</para>
    /// <para>"int[]" has a complexity of 2 because to fully inspect it involves inspecting the
    /// array type itself, <em>plus</em> unwrapping the underlying type ("int") and inspecting that.</para>
    /// <para>
    /// "Dictionary&lt;string, List&lt;int[][]&gt;&gt;" has complexity 8 because fully visiting it
    /// involves inspecting 8 <see cref="TypeName"/> instances total:
    /// <list type="bullet">
    /// <item>Dictionary&lt;string, List&lt;int[][]&gt;&gt; (the original type)</item>
    /// <item>Dictionary`2 (the generic type definition)</item>
    /// <item>string (a type argument of Dictionary)</item>
    /// <item>List&lt;int[][]&gt; (a type argument of Dictionary)</item>
    /// <item>List`1 (the generic type definition)</item>
    /// <item>int[][] (a type argument of List)</item>
    /// <item>int[] (the underlying type of int[][])</item>
    /// <item>int (the underlying type of int[])</item>
    /// </list>
    /// </para>
    /// </remarks>
    public int TotalComplexity { get; }
    
    /// <summary>
    /// If this type is not an elemental type (see <see cref="IsElementalType"/>), gets
    /// the underlying type. If this type is an elemental type, returns null.
    /// </summary>
    /// <remarks>
    /// For example, given "int[][]", unwraps the outermost array and returns "int[]".
    /// Given "Dictionary&lt;string, int&gt;", returns the generic type definition "Dictionary&lt;,&gt;".
    /// </remarks>
    public TypeName? UnderlyingType { get; }

    public static TypeName Parse(ReadOnlySpan<char> typeName, TypeNameParseOptions? options = null);

    public static bool TryParse(ReadOnlySpan<char> typeName, [NotNullWhenAttribute(true)] out TypeName? result, TypeNameParseOptions? options = null);
    
    public int GetArrayRank();
    
    /// <summary>
    /// Returns assembly name which contains this type, or null if this <see cref="TypeName"/> was not
    /// created from a fully-qualified name.
    /// </summary>
    /// <remarks>Since <seealso cref="AssemblyName"/> is mutable, this method returns a copy of it.</remarks>
    public Reflection.AssemblyName? GetAssemblyName();
    
    /// <summary>
    /// If this <see cref="TypeName"/> represents a constructed generic type, returns a span
    /// of all the generic arguments. Otherwise it returns an empty span.
    /// </summary>
    /// <remarks>
    /// <para>For example, given "Dictionary&lt;string, int&gt;", returns a 2-element span containing
    /// string and int.</para>
    /// </remarks>
    public ReadOnlySpan<TypeName> GetGenericArguments();
}

public sealed class TypeNameParseOptions
{
    public TypeNameParseOptions() { }
    public bool AllowFullyQualifiedName { get; set; } = true;
    
    /// <summary>
    /// Limits the maximum value of <seealso cref="TypeName.TotalComplexity"/> that parser can handle.
    /// </summary>
    public int MaxTotalComplexity { get; set; } = 10;
    
    /// <summary>
    /// Extends ECMA-335 standard limitations with a set of opinionated rules based on most up-to-date security knowledge.
    /// </summary>
    /// <remarks>
    /// When parsing AssemblyName, only Version, Culture and PublicKeyToken attributes are allowed.
    /// The comparison is also case-sensitive (in contrary to <seealso cref="AssemblyName(string)"/> constructor).
    /// </remarks>
    public bool StrictValidation { get; set; } = false;
}

Usage Examples

Sample serialization binder (whole code with tests can be found here):

public override Type? BindToType(string assemblyName, string typeName)
{
    // Fast path for common primitive type names and user-defined type names
    // that use the same syntax and casing as System.Type.FullName API.
    if (TryGetTypeFromFullName(typeName, out Type type))
    {
        return type;
    }

    TypeNameParsingOptions options = new()
    {
        AllowFullyQualifiedName = false,
        MaxTotalComplexity = 10
    };

    if (!TypeName.TryParse(typeName.AsSpan(), out TypeName parsed, options))
    {
        // we can throw any exception, log the information etc
        throw new InvalidOperationException($"Invalid type name: '{typeName}'");
    }

    return GetTypeFromParsedTypeName(parsed);
}

private Type? GetTypeFromParsedTypeName(TypeName parsed)
{
    if (TryGetTypeFromFullName(parsed.FullName, out Type type))
    {
        return type;
    }
    else if (parsed.IsArray)
    {
        TypeName arrayElementTypeName = parsed.UnderlyingType; // equivalent of type.GetElementType()
        Type arrayElementType = GetTypeFromParsedTypeName(arrayElementTypeName); // recursive call allows for creating arrays of arrays etc

        return parsed.IsSzArrayType
            ? arrayElementType.MakeArrayType()
            : arrayElementType.MakeArrayType(parsed.GetArrayRank());
    }
    else if (parsed.IsConstructedGenericType)
    {
        TypeName genericTypeDefinitionName = parsed.UnderlyingType; // equivalent of type.GetGenericTypeDefinition()
        Type genericTypeDefinition = GetTypeFromParsedTypeName(genericTypeDefinitionName);
        Debug.Assert(genericTypeDefinition.IsGenericTypeDefinition);

        ReadOnlySpan<TypeName> genericArgs = parsed.GetGenericArguments();
        Type[] typeArguments = new Type[genericArgs.Length];
        for (int i = 0; i < genericArgs.Length; i++)
        {
            typeArguments[i] = GetTypeFromParsedTypeName(genericArgs[i]); // recursive call allows for generics of generics like "List<int?>"
        }
        return genericTypeDefinition.MakeGenericType(typeArguments);
    }

    throw new ArgumentException($"{parsed.FullName} is not on the allow list.");
}

Risks

Introducing changes to the behavior of strict mode in the future can break the users.
If we ever do that, it will be just to enforce new security best practices and will break only very, very unusual input.

Initial issue description:

I am working on the design of new type parser. It's going to include a new type that represents the parsed type name, the type name parser and most likely an option bag for its customization. It may also include an AssemblyNameParser, but I am not sure whether this is going to be required (nobody has asked for a standalone assembly name parser).

For brevity I am going to call these types: TypeName, TypeNameParser and TypeNameParserOptions.

public sealed class TypeName
{
    // properties that describe parsed type like its generic arguments or array rank
}
public ref struct TypeNameParser
{
    public static TypeName Parse(ReadOnlySpan<char> name, TypeNameParserOptions? options = null);
}
public class TypeNameParserOptions
{
    // properties that describe customizable settings like max allowed recursion depth
}

Before I submit the proposal, I want to have something that:

  • replaces the internal TypeNameParser used by System.Private.CoreLib
  • replaces the internal TypeNameParser used by ILVerification
  • replaces the internal TypeNameParser used by ILCompiler*
  • replaces the parser built by Levi in his private repo
  • can be shipped from dotnet/runtime as part of .NET 9 shared framework and as OOB for .NET Standard 2.0 (so existing customers who target older TFMs can use it for security purposes)
  • all tests are passing everywhere

I'll try to replace the Roslyn parser too, but I can't promise that (please let me know if this is a must have).

The thing I am not sure of is where the mentioned types should belong.

For example, currently the Type.GetType(string name) is part of CoreLib. I believe that my proposal should include a new Type method for loading type from a parsed name:

public class Type
{
    public static Type? GetType(TypeName typeName);
}

So those who have parsed the type name and verified it, could load the type without parsing the type name again. This leads me to thinking, that TypeName should be part of CoreLib.
But can I at the same time ship this type in an OOB package like System.Reflection.Metadata?

  • it targets net9.0, so I would need to exclude it for this TFM
  • it targets netstandard2.0, which could lead into a situation where a .NET 9 apps references a NS2.0 library that references the package and it leads to two types with the same name being loaded and a runtime error when the TypeName from OOB is passed to Type.Load(TypeName) in CoreLib?

I can also simply not extend the Type class and move the Load method to TypeName itself and reference the TypeNameParser as a link in CoreLib, with sth like this:

#if SYSTEM_PRIVATE_CORELIB
    internal
#else
    public
#endif
        struct TypeNameParser

But this would lead into a situation where .NET 9 apps would load two type name parsers: one internal from CoreLib and another, public from the OOB package.

@jkotas @GrabYourPitchforks are there any better solutions?

@adamsitnik adamsitnik added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection.Metadata labels Jan 26, 2024
@adamsitnik adamsitnik added this to the 9.0.0 milestone Jan 26, 2024
@adamsitnik adamsitnik self-assigned this Jan 26, 2024
@ghost
Copy link

ghost commented Jan 26, 2024

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
See info in area-owners.md if you want to be subscribed.

Issue Details

I am working on the design of new type parser. It's going to include a new type that represents the parsed type name, the type name parser and most likely an option bag for its customization. It may also include an AssemblyNameParser, but I am not sure whether this is going to be required (nobody has asked for a standalone assembly name parser).

For brevity I am going to call these types: TypeName, TypeNameParser and TypeNameParserOptions.

public sealed class TypeName
{
    // properties that describe parsed type like its generic arguments or array rank
}
public ref struct TypeNameParser
{
    public static TypeName Parse(ReadOnlySpan<char> name, TypeNameParserOptions? options = null);
}
public class TypeNameParserOptions
{
    // properties that describe customizable settings like max allowed recursion depth
}

Before I submit the proposal, I want to have something that:

  • replaces the internal TypeNameParser used by System.Private.CoreLib
  • replaces the internal TypeNameParser used by ILVerification
  • replaces the internal TypeNameParser used by ILCompiler*
  • replaces the parser built by Levi in his private repo
  • can be shipped from dotnet/runtime as part of .NET 9 shared framework and as OOB for .NET Standard 2.0 (so existing customers who target older TFMs can use it for security purposes)
  • all tests are passing everywhere

I'll try to replace the Roslyn parser too, but I can't promise that (please let me know if this is a must have).

The thing I am not sure of is where the mentioned types should belong.

For example, currently the Type.GetType(string name) is part of CoreLib. I believe that my proposal should include a new Type method for loading type from a parsed name:

public class Type
{
    public static Type? GetType(TypeName typeName);
}

So those who have parsed the type name and verified it, could load the type without parsing the type name again. This leads me to thinking, that TypeName should be part of CoreLib.
But can I at the same time ship this type in an OOB package like System.Reflection.Metadata?

  • it targets net9.0, so I would need to exclude it for this TFM
  • it targets netstandard2.0, which could lead into a situation where a .NET 9 apps references a NS2.0 library that references the package and it leads to two types with the same name being loaded and a runtime error when the TypeName from OOB is passed to Type.Load(TypeName) in CoreLib?

I can also simply not extend the Type class and move the Load method to TypeName itself and reference the TypeNameParser as a link in CoreLib, with sth like this:

#if SYSTEM_PRIVATE_CORELIB
    internal
#else
    public
#endif
        struct TypeNameParser

But this would lead into a situation where .NET 9 apps would load two type name parsers: one internal from CoreLib and another, public from the OOB package.

@jkotas @GrabYourPitchforks are there any better solutions?

Author: adamsitnik
Assignees: adamsitnik
Labels:

api-suggestion, area-System.Reflection.Metadata

Milestone: 9.0.0

@vitek-karas
Copy link
Member

This would be useful for illink and NativeAOT ilc which both currently include a copy of type name parser. /cc @sbomer @agocke

@jkotas
Copy link
Member

jkotas commented Jan 26, 2024

But this would lead into a situation where .NET 9 apps would load two type name parsers: one internal from CoreLib and another, public from the OOB package.

Yes, that's fine. I do not see a problem with it.

I do not think that a public Type.Load method that loads the type from runtime type system should be part of this proposal.

@adamsitnik
Copy link
Member Author

adamsitnik commented Feb 1, 2024

@jkotas @GrabYourPitchforks I've started porting System.Private.CoreLib to the new parser and have stumbled upon a type name that uses a single square bracket (instead of two) to represent generic type name:

[InlineData("System.Nullable`1[System.Int32]", typeof(int?))]

I would expect the type name to be represented in following way (this is what Type.FullName returns):

- System.Nullable`1[System.Int32]
+ System.Nullable`1[[System.Int32]]

Should this syntax be supported by the new parser? If so, only for CoreLib for backward compatibility or also by the new public API?

I am asking because it's the only test that is using such syntax (that I could find by running System.Reflection.Tests for now) and I have no idea whether it's supported on purpose or by accident.

@jkotas
Copy link
Member

jkotas commented Feb 1, 2024

Should this syntax be supported by the new parser?

Yes, it should be supported. It has been supported by the runtime APIs for last 20+ years. I do not see a reason why we would drop it.

The double square brackets are required to avoid ambiguity in fully qualified type names. They are not needed for non-fully qualified type names (type names without assembly name).

We do not have a formal spec for the type name grammar. #4416 has an attempt to create one.

@adamsitnik
Copy link
Member Author

@jkotas Thank you!

@adamsitnik
Copy link
Member Author

Progress

I was able to port CLR, Mono, NativeAOT and internal tools to use the new parser (#97864).
The CLR tests are passing locally for me, there is some NativeAOT failure that I am going to handle later. Along with a long list of TODOs.

My current plan is the following:

  1. @jkotas @GrabYourPitchforks Get your seal of approval for the design proposal and get the API review scheduled.
  2. In the meantime:
    a) add missing test coverage for edge case scenarios (separate PR to main with no parser changes, just new tests)
    b) fix edge case handling (names with no namespace that start with a dot, escaped type names) and TODOs
  3. Get the API approved, apply changes and get it merged.

Design: TypeName

I decided to represent type name information with a single sealed class. I would prefer to avoid introducing another type hierarchy.
I also want to keep it simple: just expose the parsed information with no possibility to load types or perform other kinds of actions. We can always add new methods in the future, for now I don't believe we need them.

public sealed class TypeName : IEquatable<TypeName>
{
    internal TypeName() // encapsulation: only parser can create new instances
    public System.Reflection.AssemblyName? AssemblyName { get; }
    public string AssemblyQualifiedName { get; }
    public System.Reflection.Metadata.TypeName? ContainingType { get; } // returns not null for nested types
    public bool IsArray { get; } // true for [], [*] and [,,,]
    public bool IsConstructedGenericType { get; }
    public bool IsElementalType { get; }
    public bool IsManagedPointerType { get; }
    public bool IsNestedType { get; }
    public bool IsSzArrayType { get; } 
    public bool IsUnmanagedPointerType { get; }
    public bool IsVariableBoundArrayType { get; }
    public string Name { get; }
    public System.Reflection.Metadata.TypeName? UnderlyingType { get; }
    public int GetArrayRank();
    public System.Reflection.Metadata.TypeName[] GetGenericArguments();
}

Questions:

  1. Should the new API follow existing Type API naming or use more meaningfull names? Example: IsManagedPointerType vs IsByRef and IsUnmanagedPointerType vs IsPointer.
  2. In Levi's prototype, the GetGenericArguments method returns a copy of arguments to avoid any mutations. It won't protect us from changing values with private reflection. Should it just return a ReadOnlySpan<TypeName> instead of an array?

Design: customization

It seems to me that the most important aspect of design is the customization. Some examples:

  • Allow all white spaces (CLR) or just spaces (safe strict parsing).
  • Allow for escaping characters in type names (CLR) or not (safe strict parsing).
  • Allow all valid Unicode characters (CLR) or just a subset of ASCII or just ASCII (safe type parser).
  • Set recursive depth limit (safe type parsing).

BTW we could also introduce an optional flag for CLR to set the default parsing rules (example: set it to Strict).

I am not sure to what degree we should allow the users for customization. Few ideas I have are listed below.

Option bag with virtual validation methods

We literally expose a possibility to implement custom validation.

public class TypeNameParserOptions
{
    public TypeNameParserOptions() { }
    public bool AllowFullyQualifiedName { get; set; } = true;
    public int MaxRecursiveDepth { get; set; } = int.MaxValue;
    public bool ThrowOnError { get; set; } = true;
    public virtual bool ValidateTypeName(System.ReadOnlySpan<char> candidate); // when it finds invalid character it returns false when ThrowOnError is set to true, throws when it's set to false
    public virtual bool ValidateAssemblyName(System.ReadOnlySpan<char> candidate);
    public virtual ReadOnlySpan<char> TrimStart(ReadOnlySpan<char> input); // span.TrimStart() by default, span.TrimStart(' ') for strict parser
}
public ref struct TypeNameParser
{
    public static TypeName? Parse(ReadOnlySpan<char> typeName, TypeNameParserOptions? options = null);
}

It has a perf penalty: every time we need non-default settings we need to allocate a class (or maintain a static cache) and pay the virtual method overhead (this can disappear in the future with new JIT versions).
It makes it impossible for us to ensure that the users implement proper validation and we may need to validate the input one more time anyway.

Two dedicated parse methods

Parse that does what Type.GetType allows for: all whitespaces, escaped chars and fully qualified names and possiblity to not throw on error.
ParseStrict that allows only for a specific subset of ASCII (no escaping), recursive depth limit and fully qualified names and possibility to not throw on error.

IMO throwOnError should be exposed by the strict parser, as in theory it can prevent the attacker from causing a lot of exceptions being thrown in the process by sending invalid inputs.

public ref struct TypeNameParser
{
    public static TypeName? Parse(ReadOnlySpan<char> typeName, bool allowFullyQualifiedName = true, bool throwOnError = true);
    public static TypeName? ParseStrict(ReadOnlySpan<char> typeName, bool allowFullyQualifiedName = true, bool throwOnError = true, int maxRecursiveDepth = 10);
}

It opens the door for introducing more and more overloads in the future.

@jkotas @GrabYourPitchforks please let me know what you think

@teo-tsirpanis
Copy link
Contributor

public ref struct TypeNameParser

Shouldn't the parsing methods be in TypeName?

@jkotas
Copy link
Member

jkotas commented Feb 3, 2024

System.Reflection.AssemblyName? AssemblyName { get; }

I think parsed AssemblyName is problematic. It would need to have the culture resolved into a CultureInfo that is not hardened against untrusted source. The caller needs to be able to control whether culture gets resolved into CultureInfo.

Should the new API follow existing Type API naming or use more meaningfull names?

In general, we have a strong preference for consistent naming and I would expect it to be the case here as well. I do not think we want to be creative with inventing a new names for existing concepts. I think the names should match 1:1 with the established names used by reflection (System.Type) if possible.

Allow all white spaces (CLR) or just spaces (safe strict parsing).
Allow for escaping characters in type names (CLR) or not (safe strict parsing).
Allow all valid Unicode characters (CLR) or just a subset of ASCII or just ASCII (safe type parser).

I do not see what is unsafe about these. These are very arbitrary policies. Do these policies need to be part of the System.Reflection.Metadata parser? If the caller does not like non-ASCII characters or spaces for whatever reason, they can implement the policy as part of their TypeName to actual type resolver.

public int MaxRecursiveDepth { get; set; } = int.MaxValue;

Do we also need a cap on max total complexity of the type (ie number of TypeName objects that the parser creates internally)?

Option bag with virtual validation methods

Immutable options bag for things like MaxRecursiveDepths makes sense to me. Validation callbacks for names do to not make sense to me.

@jkotas
Copy link
Member

jkotas commented Feb 3, 2024

Could you please create a code example for how to use this parser to build a hardened runtime type name resolver with allow lists, etc. It will help us to resolve some of the API design questions above.

@adamsitnik
Copy link
Member Author

Could you please create a code example for how to use this parser to build a hardened runtime type name resolver with allow lists, etc.

I've analyzed some of our first party customers code bases and used the new API to implement safe SerializationBinder that could replace the existing real-life implementations: https://github.com/adamsitnik/runtime/blob/281c4f32e73591b03d805a0809626efda19bd4a2/src/libraries/System.Reflection.Metadata/tests/Metadata/TypeNameParserSamples.cs#L49-L113

@jkotas PTAL and let me know what do you think about it.

@adamsitnik
Copy link
Member Author

Shouldn't the parsing methods be in TypeName?

Excellent point! I've applied this suggestion and it looks much better.

@jkotas
Copy link
Member

jkotas commented Feb 17, 2024

@jkotas PTAL and let me know what do you think about it.

I do not understand the comment for AllowFullyQualifiedName or what it is trying to protect against. If somebody creates an type name like that is in the comment, TryGetTypeFromFullName is not going to find it and the deserialization is going to fail.

Looks reasonable to me otherwise.

@adamsitnik adamsitnik added api-ready-for-review API is ready for review, it is NOT ready for implementation blocking Marks issues that we want to fast track in order to unblock other important work and removed api-suggestion Early API idea and discussion, it is NOT ready for implementation labels Feb 23, 2024
@bartonjs
Copy link
Member

bartonjs commented Mar 12, 2024

Video

  • All properties that have the same values from something on Type should use the same name as Type.
    • If any are missed in the approval notes here, they should match what it says in Type, not what it says in these notes.
  • IsElementalType is just TotalComplexity == 1. It's cut. (Though maybe coming back)
  • UnderlyingType was split into GetElementType() and GetGenericTypeDefinition()
  • Added AssemblySimpleName as part of answering the question of whether a type is assembly-qualified.
  • There was a lively debate on whether or not the complexity property should exist. If it does exist, the group seemed to be favoring renaming it from TotalComplexity to just Complexity.
  • GetGenericArguments should return ReadOnlyMemory, not ReadOnlySpan.
  • We don't like the word "Strict" in StrictValidation, but no new name was provided yet
    • Maybe even breaking it into separate flags
namespace System.Reflection.Metadata;

public sealed class TypeName : IEquatable<TypeName>
{
    internal TypeName() { }
    
    public string? AssemblySimpleName { get; }
    public string AssemblyQualifiedName { get; }
    
    public TypeName? DeclaringType { get; }

    public string FullName { get; }
    
    public bool IsArray { get; }
    public bool IsConstructedGenericType { get; }
    public bool IsByRef { get; }
    public bool IsNested { get; }
    public bool IsSZArray { get; }
    public bool IsPointer { get; }
    public bool IsVariableBoundArray { get; }
  
    public string Name { get; }

    public int Complexity { get; }
    
    public TypeName GetElementType();
    public TypeName GetGenericTypeDefinition();

    public static TypeName Parse(ReadOnlySpan<char> typeName, TypeNameParseOptions? options = null);

    public static bool TryParse(ReadOnlySpan<char> typeName, [NotNullWhenAttribute(true)] out TypeName? result, TypeNameParseOptions? options = null);
    
    public int GetArrayRank();
    
    public Reflection.AssemblyName? GetAssemblyName();
    
    public ReadOnlyMemory<TypeName> GetGenericArguments();
}

public sealed class TypeNameParseOptions
{
    public TypeNameParseOptions() { }
    public bool AllowFullyQualifiedName { get; set; } = true;
    
    public int MaxTotalComplexity { get; set; } = 10;
    
    public bool StrictValidation { get; set; } = false;
}

@bartonjs bartonjs added api-needs-work API needs work before it is approved, it is NOT ready for implementation api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Mar 12, 2024
@adamsitnik
Copy link
Member Author

If that is not a problem, I would like to extend the proposal with a method that compares the parsed type name against provided type and can take type forwarding into account.

public class TypeName
{
    public bool Matches(Type type, bool includeTypeForwards = true);
}

cc @terrajobst

@jkotas
Copy link
Member

jkotas commented Mar 18, 2024

public bool Matches(Type type, bool includeTypeForwards = true);

This introduces a dependency on System.Type. I think we have said that this API should not depend on System.Type.

@jkotas
Copy link
Member

jkotas commented Mar 18, 2024

Also, it is not clear where the type forwards would come from. Would this API trigger assembly loading and potential execution of code from assemblies involved?

@terrajobst
Copy link
Member

@jkotas this came up in my review of the SafePayloadReader.

I think it's common for consumers to want to match an instance of System.Type against a TypeName. This logic has to live somewhere and I don't think it belongs into a BF specific payload reader. I don't have super strong feelings about putting it on TypeName but it seems sensible to me.

This introduces a dependency on System.Type. I think we have said that this API should not depend on System.Type

FWIW, I think getting a System.Type out of a TypeName is what I'd consider adding a dependency. Accepting it as in input for comparison seems fine IMHO.

Also, it is not clear where the type forwards would come from.

I think the idea is that if the passed in instance of System.Type was marked with TypeForwardedFromAttribute then the comparison would honor that value as an alternative. This would mostly be relevant to BF but in principle any serializer storing type identity could have used TypeForwardedFromAttribute so I think it's fair to treat this as logically being part of TypeName comparisons rather than being part of some BF specific policy.

@jkotas
Copy link
Member

jkotas commented Mar 18, 2024

We do not mark forwarded types with TypeForwardedFromAttribute in dotnet/runtime libraries. Only the small fraction of types that are binary serializable for compatibility are annotate with TypeForwardedFromAttribute. This would be BF specific policy in practice.

@terrajobst
Copy link
Member

terrajobst commented Mar 18, 2024

What's the downside of exposing that policy on TypeName and allowing other parties who want / needs this policy to get to it?

Seems much more sensible than SerializationRecord.IsTypeMatchedBy(Type) which is the current proposal.

@jkotas
Copy link
Member

jkotas commented Mar 18, 2024

What's the downside of exposing that policy on TypeName and allowing other parties who want / needs this policy to get to it?

Binary formatter specific functionality hiding under non-binary formatter specific name that does not express what the API actually does.

The proper way to check whether a type matches type name is to resolve the type name into a type and see whether it produced the same type instance. It is how the runtime or .NET compilers do this check.

The policy implemented by this API has number of nuances: How do you compare versions? How do you compare strong names? There can be a mismatch between where TypeForwardedFromAttribute points to and where that type is actually forwarded to. Different consumers can have different requirements for how to handle these. I think this type of check belongs to binary formatter specific code. If somebody has a clone of binary formatter, they can copy this logic and tweak it any way they want. I expect that it would not be the only piece of binary formatter specific logic copied by binary formatter clones.

@adamsitnik
Copy link
Member Author

If that is not a problem, I would like to extend the proposal with a method that compares the parsed type name against provided type and can take type forwarding into account.

Let me take that back. @jkotas is right, this is not the job of the TypeName API. We need a method that creates a TypeName out of Type and takes type forwarding into account, but this is not the right place for it.

@bartonjs
Copy link
Member

bartonjs commented Mar 19, 2024

Video

  • All properties that have the same values from something on Type should use the same name as Type.
    • If any are missed in the approval notes here, they should match what it says in Type, not what it says in these notes.
  • IsElementalType has been renamed to IsSimple; and we will also add IsSimple to System.Type.
  • UnderlyingType was split into GetElementType() and GetGenericTypeDefinition()
  • Added AssemblySimpleName as part of answering the question of whether a type is assembly-qualified.
  • There was a lively debate on whether or not the complexity property should exist.
    • It morphed into a GetNodeCount() method.
  • GetGenericArguments now returns ImmutableArray
  • I(Span)Parsable/I(Span)Formattable were brought up, and decided that they're not worth the complexity (since this type will be available for netstandard)
  • We don't like the word "Strict" in StrictValidation, but no new name was provided yet
    • Maybe even breaking it into separate flags
    • It is removed for now, may come back in the future
  • TypeNameParseOptions.AllowFullyQualifiedName has been removed
  • TypeNameParseOptions.MaxTotalComplexity is now MaxNodes
  • MaxNodes will have a default value, less than int.MaxValue, but the actual number is TBD.
namespace System.Reflection.Metadata
{
    public sealed class TypeName : IEquatable<TypeName>
    {
        internal TypeName() { }
        
        public string? AssemblySimpleName { get; }
        public string AssemblyQualifiedName { get; }
        
        public TypeName? DeclaringType { get; }

        public string FullName { get; }
        
        public bool IsArray { get; }
        public bool IsConstructedGenericType { get; }
        public bool IsByRef { get; }
        public bool IsNested { get; }
        public bool IsSZArray { get; }
        public bool IsPointer { get; }
        public bool IsVariableBoundArray { get; }
      
        public string Name { get; }

        public bool IsSimple { get; }
        
        public TypeName GetElementType();
        public TypeName GetGenericTypeDefinition();

        public static TypeName Parse(ReadOnlySpan<char> typeName, TypeNameParseOptions? options = null);

        public static bool TryParse(ReadOnlySpan<char> typeName, [NotNullWhenAttribute(true)] out TypeName? result, TypeNameParseOptions? options = null);
        
        public int GetArrayRank();
        
        public Reflection.AssemblyName? GetAssemblyName();
        
        public ImmutableArray<TypeName> GetGenericArguments();

        public int GetNodeCount();
    }

    public sealed class TypeNameParseOptions
    {
        public TypeNameParseOptions() { }

        public int MaxNodes { get; set; } = 10;
    }
}

namespace System
{
    partial class Type
    {
        public bool IsSimple { get; }
    }
}

@bartonjs bartonjs added api-approved API was approved in API review, it can be implemented and removed blocking Marks issues that we want to fast track in order to unblock other important work api-ready-for-review API is ready for review, it is NOT ready for implementation labels Mar 19, 2024
@jaredpar
Copy link
Member

public override Type? BindToType(string assemblyName, string typeName)

Have you considered how this round tripping will work in the face of say function pointers? Consider the following code:

var type = typeof(C);
var member = type.GetMethod("M")!;
var parameters = member.GetParameters();
Console.WriteLine(parameters[0].ParameterType.AssemblyQualifiedName);

class C {
    public unsafe static void M(delegate*<int, void> f) {} 
}

The type of f here is a function pointer type but it's string representation is actually IntPtr

System.IntPtr, System.Private.CoreLib, Version=7.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e

That means there is some amount of lossy-ness when round tripping between a type in metadata -> TypeName and BindToType. Not sure if that is a significant barrier for this API or not.

@Neme12
Copy link

Neme12 commented Mar 19, 2024

I think defaulting MaxNodes to 10 is too low. Especially if every generic type counts as 2 nodes (for the generic type definition & for the constructed generic type), and that's in addition to all the type parameters. IMO it should be more like 50, maybe even 100.

@bartonjs
Copy link
Member

Especially if every generic type counts as 2 nodes (for the generic type definition & for the constructed generic type),

A generic type counts as the sum of the parameter node counts, plus 1.

  • List<int> is 2, List<int> and int.
  • List<Dictionary<string, int>> is 4: List<Dictionary<string, int>> (4), Dictionary<string, int> (3), string, int
  • List<Dictionary<string[], int>> is 5: List<Dictionary<string[], int>> (5), Dictionary<string[], int> (4), string[] (2), string, and int.

It's basically the number of words that appear in the type name, plus the number of [], plus the number of *s.

*  List<Dictionary<string[], int>>:
     GetGenericArguments() => (1):
*      Dictionary<string[], int>
         GetGenericArguments() => (2):
*          string[]
             GetElementType() =>
*              string (IsSimple)
*          int (IsSimple)

I think defaulting MaxNodes to 10 is too low.

The final number is TBD, they have to do some research to come up with something that looks like it can handle any "reasonable" type.

IMO it should be more like 50, maybe even 100.

If there's a type that you can envision someone actually working with which has a node count that high, please share it. This is 50:

ValueTuple<int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], int[][], string>

or

List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<List<int>>>>>(etc)

@Neme12
Copy link

Neme12 commented Mar 19, 2024

A generic type counts as the sum of the parameter node counts, plus 1.

  • List<int> is 2, List<int> and int.
  • List<Dictionary<string, int>> is 4: List<Dictionary<string, int>> (4), Dictionary<string, int> (3), string, int
  • List<Dictionary<string[], int>> is 5: List<Dictionary<string[], int>> (5), Dictionary<string[], int> (4), string[] (2), string, and int.

This is what I would expect, yes. And I think it would be more intuitive if it worked this way. But this is not what the original post says:

  • Dictionary<string, List<int[][]>> has complexity 8 because fully visiting it involves inspecting 8 TypeName instances total:

    • Dictionary<string, List<int[][]>> (the original type)
    • Dictionary2` (the generic type definition)
    • string (a type argument of Dictionary)
    • List<int[][]> (a type argument of Dictionary
    • List`1 (the generic type definition)
    • int[][] (a type argument of List)
    • int[] (the underlying type of int[][])
    • int (the underlying type of int[]

It says here that a generic type like Dictionary<string, List<int[][]>> counts as the generic type definition (Dictionary`2) plus the constructed generic type (Dictionary<string, List<int[][]>>) plus the type parameters. Not just the generic type plus the type parameters. So the sum of parameter nodes plus 2.

@Neme12
Copy link

Neme12 commented Mar 19, 2024

Why doesn't this include an API to get a Type instance given a TypeName (Type.GetType(TypeName))? If someone already has a TypeName for whatever reason, it seems wasteful to then have to ToString it only for GetType to have to parse it again. I know people shouldn't do this with untrusted input. But not everyone has untrusted input (or people could have already validated the type name). Saying that this API shouldn't exist because people might call it with untrusted input, is like saying that Type.GetType(string) itself shouldn't exist because people might call it with a string from untrusted input. Yet it exists and it's obviously useful.

If someone ever needs this, they're just going to do the less efficient thing of GetType(TypeName.ToString()) and be done with it instead of asking for this API..

And vice versa - Type.TypeName

@jaredpar
Copy link
Member

Why doesn't this include an API to get a Type instance given a TypeName?

Part of my push back would be that TypeName doesn't necessarily uniquely identify a type. A type name is only unique when it is assembly qualified. There is no guarantee that is the case with TypeName.

@Neme12
Copy link

Neme12 commented Mar 19, 2024

Going back to TypeNameParseOptions.AllowFullyQualifiedName (which I think should be called AllowAssemblyQualifiedName if it existed, to match TypeName.AssemblyQualifiedName), while I agree it might be less intuitive if I set it to false for it to disallow the assembly name for the top level type name but allow it for the generic type parameters, in practice, that is the right behavior for anyone who needs this flag (e.g. for something like Assembly.GetType, the top level type name shouldn't be assembly qualified, but the type parameters still should be. Or in any case where you have an AssemblyName separately from a TypeName - the nested types don't necessarily have to be from the same assembly). Although in those cases it's not only important to ensure that the top level type isn't assembly qualified, but also that all the other types inside it are, so this flag wouldn't be sufficient anyway.1 I could definitely see a use case for a flag that requires every type to be assembly qualified though, if you want your TypeName to uniquely identify a type.

1 EDIT: I was wrong, apparently you don't need that

@Neme12
Copy link

Neme12 commented Mar 19, 2024

Part of my push back would be that TypeName doesn't necessarily uniquely identify a type. A type name is only unique when it is assembly qualified. There is no guarantee that is the case with TypeName.

Well, yes, but there's Type.GetType(string) and string doesn't necessarily uniquely identify a type either 😄 and TypeName is certainly more restrictive in that it allows a lot fewer invalid inputs than a string.

@Neme12
Copy link

Neme12 commented Mar 19, 2024

If there's a type that you can envision someone actually working with which has a node count that high, please share it. This is 50:

But this is 10:

ValueTuple<List<string>, Dictionary<string, int>, int>

This certainly doesn't look unreasonable to me. I have definitely written type names that are this long. And there should probably be a good amount of wiggle room. It's not like someone is likely to hit OOM by parsing a type name with 50 nodes (unless the strings are too long, but those are not part of the calculation anyway).

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Mar 21, 2024
@github-actions github-actions bot locked and limited conversation to collaborators May 25, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Reflection.Metadata in-pr There is an active PR which will close this issue when it is merged
Projects
None yet
Development

Successfully merging a pull request may close this issue.

8 participants