Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Opinionated restrictions for TypeName parsing #100920

Open
adamsitnik opened this issue Apr 11, 2024 · 9 comments
Open

[API Proposal]: Opinionated restrictions for TypeName parsing #100920

adamsitnik opened this issue Apr 11, 2024 · 9 comments
Assignees
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection.Metadata binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it
Milestone

Comments

@adamsitnik
Copy link
Member

Background and Motivation

Over the years @GrabYourPitchforks has built an opinionated parser for type name parsing from untrusted input.

The parser has been adopted internally at Microsoft and for .NET 9 we have decided to productize it (and :shipit: ), in order to help us with BinaryFormatter removal effort.

Based on feedback from @jkotas who noticed the problem of having multiple type name parsers implemented by various teams, we have decided to extend the parser with the possibility to parse all type names supported by CLR itself.

The initial proposal for TypeName #97566 contained a very vague boolean flag StrictValidation. In this proposal I would like to introduce two [Flags] enum that allow the users to enable opinionated restrictions.

Note: From my perspective (I don't have security background and I am not the original author of the parser), the flags can be grouped into few categories:

  1. Preventing obvious attacks (examples: using path characters in the assembly name to load the assembly from a disk and perform remote code execution, using non-predefined cultures)
  2. Rejecting valid input that can be used as a vector of attack (example: not following the backtick convention and specifying name like List`1[[DateTime], [Attacker]])
  3. Rejecting valid input that was clearly mangled (example: the names provided by Type.FullName never use whitespaces other than space, while the CLR allows for tabs)

I don't have strong opinions about the last category, but I believe than 1&2 are a must have.

Proposed API

namespace System.Reflection.Metadata;

[Flags]
public enum TypeNameRestrictions
{
    /// <summary>
    /// No restrictions beside default CLR parsing rules with all backward compatibility handling.
    /// </summary>
    None = 0,
    /// <summary>
    /// Type names that contain escaped characters are rejected.
    /// <para>Example: "Not\+Nested".</para>
    /// <para>Example: "NotAPointer\*".</para>
    /// <para>Such names are invalid for C# and F#, require using pure IL (for example via Reflection Emit).</para>
    /// </summary>
    NoEscaping = 1 << 0,
    /// <summary>
    /// Type names that contain quote characters are rejected.
    /// <para>Example: "Quotes\"AreNotAllowed".</para>
    /// <para>Example: "Quote'IsNotAllowed".</para>
    /// </summary>
    NoQuoteCharacters = 1 << 1,
    /// <summary>
    /// Type names that contain NULL character are rejected.
    /// <para>Example: "Na\0me".</para>
    /// </summary>
    NoNullCharacter = 1 << 2,
    /// <summary>
    /// Type names that use whitespaces other than space are rejected.
    /// <para>Example: "Name\t".</para>
    /// <para><seealso cref="Type.FullName"/> never uses such characters.</para>
    /// </summary>
    OnlySpaces = 1 << 3,
    /// <summary>
    /// The number provided after a backtick must specify exact number of generic type arguments.
    /// <para>Example: "List`1[[DateTime], [Attacker]]" gets rejected.</para>
    /// </summary>
    BacktickConvention = 1 << 4,
    /// <summary>
    /// Enforces the parser to reject type names that describe invalid types.
    /// <para>Example: byref of byref (Name&amp;&amp;).</para>
    /// <para>Example: arrays with rank larger than 32 (Name[,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,]).</para>
    /// </summary>
    OnlyLegalTypes = 1 << 5,
    /// <summary>
    /// Enforces all restrictions.
    /// </summary>
    All = ~None
}

[Flags]
public enum AssemblyNameRestrictions
{
    /// <summary>
    /// No restrictions beside default CLR parsing rules with all backward compatibility handling.
    /// </summary>
    None = 0,
    /// <summary>
    /// Assembly names that contain path characters are rejected.
    /// <para>Example: "./../PathToA.dll".</para>
    /// </summary>
    NoPathCharacters = 1 << 0,
    /// <summary>
    /// Assembly names that contain NULL character are rejected.
    /// <para>Example: "Na\0me".</para>
    /// </summary>
    NoNullCharacter = 1 << 1,
    /// <summary>
    /// Assembly names that contain unconsumed trailing characters are rejected.
    /// <para>Example: "Name, ".</para>
    /// </summary>
    NoUnconsumedTrailingCharacters = 1 << 2,
    /// <summary>
    /// Assembly names that use whitespaces other than space are rejected.
    /// <para>Example: "Name,\tVersion=1.2.3.4".</para>
    /// <para><seealso cref="Assembly.FullName"/> never uses such characters.</para>
    /// </summary>
    OnlySpaces = 1 << 3,
    /// <summary>
    /// Assembly names that contain obsolete <seealso cref="AssemblyName"/> attributes are rejected.
    /// <para>Example: "Name, Architecture=x86".</para>
    /// <para>Example: "Name, CodeBase=file://path".</para>
    /// </summary>
    NoObsoleteAttributes = 1 << 4,
    /// <summary>
    /// Unrecognized attributes are rejected.
    /// <para>Example: "Name, Custom=value".</para>
    /// </summary>
    NoUnrecognizedAttributes = 1 << 5,
    /// <summary>
    /// Assembly names that contain duplicate attributes are rejected.
    /// <para>Example: "Name, Version=1.2.3.4, Version=1.2.3.4".</para>
    /// </summary>
    NoDuplicateAttributes = 1 << 6,
    /// <summary>
    /// Exact casing is used to match attributes.
    /// <para>Example: "Name, Version=1.2.3.4" is allowed, but "Name, version=1.2.3.4" is not.</para>
    /// </summary>
    ExactCasing = 1 << 7,
    /// <summary>
    /// Assembly names that use custom, non-predefined cultures are rejected.
    /// <para>Example: "Name, Culture=en-US_XYZ".</para>
    /// </summary>
    OnlyPredefinedCultures = 1 << 8,
    /// <summary>
    /// Enforces all restrictions.
    /// </summary>
    All = ~None
}

public sealed class TypeNameParseOptions
{
    // Existing: int MaxNodes
    public TypeNameRestrictions TypeNameRestrictions { get; set; } = TypeNameRestrictions.BacktickConvention;
    public AssemblyNameRestrictions AssemblyNameRestrictions { get; set; } = AssemblyNameRestrictions.None;
}

Assuming that #100867 gets approved, it's going to require following changes:

public sealed class AssemblyNameInfo : IEquatable<AssemblyNameInfo>
{
-    public static AssemblyNameInfo Parse(ReadOnlySpan<char> assemblyName);
-    public static bool TryParse(ReadOnlySpan<char> assemblyName, [NotNullWhenAttribute(true)] out AssemblyNameInfo? result);
+    public static AssemblyNameInfo Parse(ReadOnlySpan<char> assemblyName, AssemblyNameRestrictions restrictions = AssemblyNameRestrictions.None);
+    public static bool TryParse(ReadOnlySpan<char> assemblyName, [NotNullWhenAttribute(true)] out AssemblyNameInfo? result, AssemblyNameRestrictions restrictions = AssemblyNameRestrictions.None);
}

Usage Examples

TypeNameParsingOptions options = new()
{
    TypeNameRestrictions = TypeNameRestrictions.All,
    AssemblyNameRestrictions = AssemblyNameRestrictions.All
};
    
TypeName parsed = TypeName.Parse(typeName, options);

Alternative Designs

We could implement those restrictions on top of the existing TypeName API, but it would:

  • be hard to do right and require us to repeat a LOT of logic
    • example: to enforce backtick convention we would need to re-parse part of the input again, find the beginning of generic arguments, then the backtick, ensure it's not escaped, try to parse the number and handle all edge cases like int overflow.
  • would require maintaining another package (finding a name, repo, release schedule, patching strategy and OWNERS),
  • would make it harder to discover and use for the end users. And this is against our ultimate goal of making .NET apps secure.

And again, it was not our goal in the first place (we needed an opinionated, secure parser, not a general-purpose parser)

Risks

Using int-based enumeration limits us to only 32 flags, but it should be enough.

@adamsitnik adamsitnik added api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection.Metadata labels Apr 11, 2024
@adamsitnik adamsitnik added this to the 9.0.0 milestone Apr 11, 2024
@adamsitnik adamsitnik self-assigned this Apr 11, 2024
Copy link
Contributor

Tagging subscribers to this area: @dotnet/area-system-reflection-metadata
See info in area-owners.md if you want to be subscribed.

@colejohnson66
Copy link

colejohnson66 commented Apr 11, 2024

BacktickConvention sounds weird IMO. EnforceGenericArity would be more clear about what it means.

@jkotas
Copy link
Member

jkotas commented Apr 11, 2024

Based on feedback from @jkotas who noticed the problem of having multiple type name parsers implemented by various team

More context: My specific objection was with having yet another type name parser mantained by .NET team.

names are invalid for C# and F#

The options do not cover all possible names that are invalid in C#. For example, emojis are invalid in C# type names too. Would it make more sense for the opionated hardened layer built on top of this to scan the input for allowed/disallowed characters using something like IndexOfAny (no parsing required)?

OnlyLegalTypes

The interpretation of this rule is runtime-version specific. For example, byref of byref is not legal today, but it may be legal tomorrow. There are number of rules that you can invent for given runtime version. For example, generics of byref or generics of pointers are not legal today but may become legal tomorrow. This option is trying to partially duplicate type validation checks done by runtime type loader for given .NET version that sounds like a bad idea to me.

Rejecting valid input that was clearly mangled

Nit: This assumes that the input was generated by .NET type name formatter. The type and assembly names are frequently hand-authored and it is not unusual to see casing and white space differences in hand-authored names.

OnlyPredefinedCultures

Is the list of pre-defined cultures going to depend on the ICU that happens to be installed on the machine? I think that having the type parser to have different behavior from machine to machine is poor behavior, and it may actually open you to interesting security issues.

Do we actually need this given the AssemblyNameInfo proposal?

be hard to do right and require us to repeat a LOT of logic
example: to enforce backtick convention we would need to re-parse part of the input again, find the beginning of generic arguments, then the backtick, ensure it's not escaped, try to parse the number and handle all edge cases like int overflow.

I am not convinced by this example. All you need is to check that the name ends with $"`{expectedGenericArity}". No need to parse anything.

@jkotas
Copy link
Member

jkotas commented Apr 11, 2024

would require maintaining another package (finding a name, repo, release schedule, patching strategy and OWNERS)

We have been saying that there is going to be a hardened binary serializer type resolver (with allow lists, etc.) built on top the core type name parser. Most (if not all) of these rules can be enforced by the hardened binary serializer type resolver. No new package required. Why is that not possible?

@adamsitnik
Copy link
Member Author

Most (if not all) of these rules can be enforced by the hardened binary serializer type resolver. No new package required. Why is that not possible?

BinaryFormatter is being moved to the OOB package, which will be marked as unsafe/unsecure/BAD from day 1. If we release the allow-list binder (it's on our TODO list but it's not committed, more like a nice to have if we have enough time), it will be released in that new OOB package. If we release the high-level TypeName validator in that package, only those who install this package can use it. But there are other serializers that may need to use secure type name parsing and I am afraid they can not have unsafe/unsecure dependency.

@adamsitnik
Copy link
Member Author

adamsitnik commented Apr 12, 2024

The interpretation of this rule is runtime-version specific

You are right, I am going to remove it from the proposal.

This assumes that the input was generated by .NET type name formatter. The type and assembly names are frequently hand-authored and it is not unusual to see casing and white space differences in hand-authored names

Our main target for the opinionated restrictions are serializers, as they usually work with untrusted input. Do you really believe that there are serializers which implement their own type-name formatting rather than using Type.FullName or Type.AssemblyQualifiedName?

Is the list of pre-defined cultures going to depend on the ICU that happens to be installed on the machine? I think that having the type parser to have different behavior from machine to machine is poor behavior, and it may actually open you to interesting security issues.

The implementation is going to use CultureInfo.GetCultureInfo(cultureName, predefinedOnly: true), so yes, it may be machine-specific. But lets consider all three scenarios:

  1. The culture name is valid and installed: everything works like a charm.
  2. The culture name is valid, but not installed: instead of getting the exception during CultureInfo creation the user gets it during parsing (a little bit earlier). So it would fail in both cases, but with different exceptions?
  3. The culture is invalid (and not installed of course): we prevent from attack, which IMO is the win (and from my understanding this is our goal with the binary formatter removal)

Do we actually need this given the AssemblyNameInfo proposal?

Yes and no. CultureInfo won't be created during TypeName parsing, but there is nothing to stop the users from calling TypeName.AssemblyName.ToAssemblyNameInfo() and create it anyway.

All you need is to check that the name ends with $"`{expectedGenericArity}"

And also that it does not end with $"`\{expectedGenericArity}"?

@jeffhandley
Copy link
Member

If we release the allow-list binder (it's on our TODO list but it's not committed, more like a nice to have if we have enough time), it will be released in that new OOB package. If we release the high-level TypeName validator in that package, only those who install this package can use it. But there are other serializers that may need to use secure type name parsing and I am afraid they can not have unsafe/unsecure dependency.

I'm not yet sure what the shipping vehicle would be for an allow-list binder, so if we do produce a type name validator as a by-product of that feature, we could evaluate the best shipping vehicle(s)--either together or separate. That binder would likely be the first consumer of this logic. Taking inspiration from @jkotas's comments, we could choose to defer this proposal here until we've built that binder, prove the need for the logic to be reusable in other contexts, and then consider refactoring the logic out into a natural place for reuse.

@jkotas
Copy link
Member

jkotas commented Apr 15, 2024

Do you really believe that there are serializers which implement their own type-name formatting

Our own binary serializer can take the assembly name from TypeForwardedFrom attribute. TypeForwardedFrom attributes are hand-authored and so they are susceptible to both intentional and unintentional formatting variations. For example, run the following and look at the serialized payload:

using System.Runtime.CompilerServices;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

using (var fs = new FileStream("temp.bin", FileMode.Create))
{
    var bf = new BinaryFormatter();
    bf.Serialize(fs, new MyType());
}

[TypeForwardedFrom("MyAssembly, PUBLICkeyTOKEN=b77a5c561934e089")]
[Serializable]
class MyType
{
}

I won't be surprised if other binary-formatter like serializers that store type names into the payload have additional paths to introduce atypical formatting of type names.

And also that it does not end with $"`{expectedGenericArity}"?

I am not sure what you meant by this exactly. I agree that you can invent many empirical rules for validating type names. For example, you can create a rule that enforces that the type name does not start with a digit (System.Reflection.Metadata does not look like a good home for this sort of empirical rules).

@jkotas jkotas added the binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it label May 29, 2024
@steveharter
Copy link
Member

Moving to v10 as we have reached the "feature complete" milestone.

See also
#100094
#101910

@steveharter steveharter modified the milestones: 9.0.0, 10.0.0 Jul 24, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Reflection.Metadata binaryformatter-migration Issues related to the removal of BinaryFormatter and migrations away from it
Projects
None yet
Development

No branches or pull requests

5 participants