Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add [CallerMustBeUnsafe] attribute to denote APIs which should be called in an unsafe block #31354

Open
GrabYourPitchforks opened this issue Oct 31, 2019 · 16 comments
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime.CompilerServices code-analyzer Marks an issue that suggests a Roslyn analyzer Security
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

Per dotnet/roslyn-analyzers#972, there's an outstanding feature for a Roslyn analyzer which would require a caller to use an unsafe block to make an API call that's pointer-equivalent dangerous but which doesn't take pointers.

For example:

public static Span<T> ToMutableSpan(ReadOnlySpan<T> span)
{
    // When the Roslyn analyzer is active, the following code will produce a warning
    // unless it is wrapped within an "unsafe" block.

    return MemoryMarshal.CreateSpan(ref Unsafe.AsRef(in MemoryMarshal.GetReference(span)), span.Length);
}

Whether that feature exists as a standalone analyzer or whether it gets moved into Roslyn proper (see dotnet/roslyn#8663), we still need to define the attribute in corefx and apply it to the appropriate methods.

API proposal

namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Field | AttributeTargets.Method | AttributeTargets.Property, AllowMultiple = false, Inherited = true)]
    public sealed class CallerMustBeUnsafeAttribute : Attribute
    {
    }
}

In this proposal I'm not applying AttributeTargets.Class or AttributeTargets.Struct because I don't want constructs like Type t = typeof(SomeType) to require an unsafe block. Technically AttributeTargets.Property isn't needed because the property getter / setter can be annotated (same with events), but allowing it on properties seems like a decent convenience.

The prime candidates to annotate are most methods on the MemoryMarshal type, some constructor-bypassing logic in FormatterServices, and any "fast" object factories which allow bypassing normal constructor validation.

Edit: I suppose since we've said that we see ArrayPool<T> as a malloc equivalent and since it now returns uninitialized memory we should annotate its methods as well, but due to how widely used that type is that risks opening a huge can of worms.

@Wraith2
Copy link
Contributor

Wraith2 commented Oct 31, 2019

The prime candidates to annotate are most methods on the MemoryMarshal type, some constructor-bypassing logic in FormatterServices, and any "fast" object factories which allow bypassing normal constructor validation.

And some of the Unsafe methods?

@GrabYourPitchforks
Copy link
Member Author

And some of the Unsafe methods?

Maybe. On one hand the developer already quite literally wrote Unsafe.<something> in their code, so requiring an unsafe block seems redundant. But there is something to be said for consistency.

@huoyaoyuan
Copy link
Member

Today, unsafe means enable the unsafe syntax. With this analyzer, it will mean "doing unsafe things". It may conflict with some analyzer saying "unnecessary 'unsafe' modifier".

@GrabYourPitchforks
Copy link
Member Author

@huoyaoyuan You raise a good point. We need to set a bar for ourselves on where we want to apply this attribute. If the unsafe keyword right now essentially means "let me do pointer-related stuff," then maybe we set the bar to "apply it to wherever an API allows you do to something that would normally require a pointer." This gets a little hairy though, since there are some things that are pointer-esque (e.g., reading uninitialized memory or bypassing normal visibility or type-safety checks), and it's not immediately clear where things should fall. I'm leaning toward annotating things that would normally require writing unverifiable IL, but feedback is always appreciated.

@huoyaoyuan
Copy link
Member

huoyaoyuan commented Oct 31, 2019

The safety of an API may also varies. For example Unsafe.As, it can be super dangerous like As<T[], SZArrayHelper<T>>, or super safe like As<float, int>

@BreyerW
Copy link

BreyerW commented Oct 31, 2019

I would like to raise a point about forward-compatibility - will this analyzer be optional? And if yes be on or off by default on net 5?

These apis may be rarely used by avg developer but many companies develop programs that have at least one bottleneck where good throughput is bare minimum. If this analyzer will be forced in net 5 (via integrating directly to roslyn without ability to disable this particular feature) then upgrading from net core 3 and ealier to net 5 may suddenly be painful enough to put a stop to such endeavor due to errors (even if they will be mere warnings thats still a bit of a problem since it isnt unusual for companies to have warnings as errors by default) popping in many places where performance is critical.

For completely new apis developed exclusively for net 5 and later forced unsafe block isnt a problem but for anything ealier it may be problematic. Just my 2 cents.

@GrabYourPitchforks
Copy link
Member Author

@huoyaoyuan I already have a similar API proposal at https://github.com/dotnet/corefx/issues/42133. One of the downsides is that it might allow an explosion of APIs since there are many different combinations we'd want to allow. As a strawman, I wonder if it might be better to ship an API Safe.Cast<T, U>(...) which will allow the conversion only if T and U are primitive numeric types and have the same width. It'd mean the call might fail at runtime, but it should be straightforward to write a code analyzer which could enforce correct usage.

@dasMulli
Copy link
Contributor

I also think that mixing language-safe/unsafe concepts with API safe/unsafe is not very good.

The unsafe keyword in C# denotes that you agree to loose promises that the C# language gives you otherwise, thus being able to use additional syntax and language features. It means that the compiler cannot protect you at language level that what you do is okay.

Now there are a lot of APIs that do things that can break you in recoverable and non-recoverable ways. For methods that shift sanity-checking on the programmer, these have been traditionally called UnsafeXYZ or are on the Unsafe class. But nearly everything that goes a little bit deeper than traditional safe APIs can end up bad (e.g. function pointers returned by the marshaller with the referenced delegate already been garbage collected).

Mixing the language concern with the API concern is theoretically possible, but then it needs to be done for all sorts of APIs and possibly still can't protect you from the results of using these APIs outside of unsafe contexts; think modifying a mutable span of a string - you would need an unsafe block to do that casts but if you later on use that mutable span elsewhere to do modifications you might be causing bad things to happen (and hours spent debugging because this will look like spooky action at a difference with interned strings). Many programmers work on the same project and if one opts into some unsafe actions for implementing a safe api, it doesn't mean that other programmers using safe APIs won't then be able to run into issues using this safe api in a different way.

Another concern of mine is that an "you need 'unsafe' here" diagnostic will only make people enable unsafe blocks and use the language feature without thinking too much about it. Because that will be the StackOverflow answer explaining to you how to use that method. No thinking about how that will impact your program involved.

@GrabYourPitchforks
Copy link
Member Author

Would also likely apply to https://github.com/dotnet/corefx/issues/32582.

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 5.0 milestone Feb 1, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@maryamariyan maryamariyan removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@terrajobst terrajobst added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review labels Jun 12, 2020
@terrajobst
Copy link
Member

terrajobst commented Jun 12, 2020

Video

  • The idea is interesting, but we have concerns:
    • Which kind of APIs would we apply this too? Every p/Invoke? CollectionMarshal.GetSpanForList()?
    • The name should reflect what the method does, rather than what mechanically has to happen on the call side.
    • Would we turn this on for existing APIs? If so, it seems this analyzer needs to opt-in. If it's opt-in, is it still useful?
namespace System.Runtime.CompilerServices
{
    [AttributeUsage(AttributeTargets.Field |
                    AttributeTargets.Method |
                    AttributeTargets.Property,
                    AllowMultiple = false,
                    Inherited = true)]
    public sealed class CallerMustBeUnsafeAttribute : Attribute
    {
    }
}

@GrabYourPitchforks
Copy link
Member Author

/cc @bartonjs @stephentoub

This may have non-obvious impact on generated code. There was previously discussion about [DllImportGenerator]. But see also this logic from the new regex source generators.

// For strings more than two characters and when performing case-sensitive searches, we try to do fewer comparisons
// by comparing 2 or 4 characters at a time. Because we might be compiling on one endianness and running on another,
// both little and big endian values are emitted and which is used is selected at run-time.
ReadOnlySpan<byte> byteStr = MemoryMarshal.AsBytes(str.AsSpan());
bool useMultiCharReads = !caseInsensitive && byteStr.Length >= sizeof(uint);
if (useMultiCharReads)
{
writer.WriteLine($"byteSpan = global::System.Runtime.InteropServices.MemoryMarshal.AsBytes({textSpanLocal});");
}

Here, MemoryMarshal.AsBytes<T>(Span<T>) would likely be annotated as [CallerMustBeUnsafe] (see #41418). We could probably add an overload MemoryMarshal.AsBytes(Span<char>) not annotated as [CallerMustBeUnsafe], and that would be preferred during overload resolution, but it does merit further thought so that we don't break these scenarios.

Maybe a better solution would be to exclude generated code from [CallerMustBeUnsafe] enforcement? If we assume that generated code is better tested and won't generally make the same types of mistakes that human-written code would, then I think even by excluding such code we'd still be within the spirit of this proposal.

@ghost ghost added the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Nov 30, 2021
@bartonjs
Copy link
Member

Either excluding generated code, or just having the generator throw the pragma out saying it promises that it knows what it's talking about, seem fine.

@stephentoub
Copy link
Member

stephentoub commented Nov 30, 2021

or just having the generator throw the pragma out saying it promises that it knows what it's talking about

Note the cited RegexGenerator.Emitter.cs already does that for a few warnings; this would "just" be one (or more) more.

private static readonly string[] s_headers = new string[]
{
"// <auto-generated/>",
"#nullable enable",
"#pragma warning disable CS0162 // Unreachable code",
"#pragma warning disable CS0164 // Unreferenced label",
"#pragma warning disable CS0168 // Variable declared but never used",
"#pragma warning disable CS0219 // Variable assigned but never used",
"",
};

That said, I also opened an issue about trying to drive that list to zero 😄
#61666

@dasMulli
Copy link
Contributor

Regarding naming: Should this be exclusive to C# or is there an expectation that the presence of a CallerMustBeUnsafe attribute has an effect (produces diagnostic) on languages that do not have a language-level concept of „unsafe“ like C#?

If not so, then I’d favor naming it agnostic to language constructs - e.g. [UnsafeOperation] / [MemoryUnsafeOperation] - that could have semantically meaningful diagnostics and resolutions („suppress warning about calling members that perform unsafe operations“) regardless of the language used. So only the C# analyzer would require a using block and/or AllowUnsafeBlocks options while analyzers / linters for other languages could use the same message suppression mechanism as they would for any other diagnostic.

@joperezr joperezr removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Jan 25, 2022
@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 28, 2022

Discussion on this topic in Discord just now came up with names like [RequiresUnsafeContext] and [UnsafeCallersOnly] (the latter having nice parity with the recent [UnmanagedCallersOnly]).

I'm actually liking [UnsafeCallersOnly] for that reason, so I'm putting my vote in that hat.

@teo-tsirpanis
Copy link
Contributor

How about [UnsafeToCall]?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Runtime.CompilerServices code-analyzer Marks an issue that suggests a Roslyn analyzer Security
Projects
No open projects
Development

No branches or pull requests