Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GC.AllocateUninitializedArray should tighten its logic around when to return zeroed arrays #13292

Closed
GrabYourPitchforks opened this issue Aug 21, 2019 · 8 comments

Comments

@GrabYourPitchforks
Copy link
Member

In dotnet/coreclr#24504, the ArrayPool<T>-derived types were changed to use GC.AllocateUninitializedArray<T>(...) under the covers as a performance optimization. The AllocateUninitializedArray method uses RuntimeHelpers.IsReferenceOrContainsReferences as an implementation detail when determining whether to return a zeroed array or an uninitialized array back to the caller.

https://github.com/dotnet/coreclr/blob/af9edac7acd4735eebd7a293b27b1c509989dae8/src/System.Private.CoreLib/src/System/GC.cs#L665-L668

This logic is a little too relaxed. It appears that the AllocateUninitializedArray method is trying to ask the question "Is any arbitrary bit pattern valid for type T?", but the IsReferenceOrContainsReferences method answers the separate question "Is type T 'interesting' to the GC?"

There are several value types where IsReferenceOrContainsReferences returns false but which cannot store arbitrary bit patterns without causing problems. Among them:

  • System.Boolean (most consumers expect it to contain only 0 or 1)
  • System.Decimal (see https://github.com/dotnet/coreclr/issues/16336)
  • System.DateTime / System.DateTimeOffset
  • System.Runtime.InteropServices.GCHandle
  • System.Buffers.StandardFormat
  • System.Text.Rune

The behaviors of calling instance methods on these types or of consuming these types when they've been initialized with arbitrary bit patterns is undefined, and it could result in anything from nonsensical answers being generated to crashing the runtime.

Normally we'd tell developers not to look at the elements of arrays returned by GC.AllocateUninitializedArray or ArrayPool<T>.Rent, but there are some situations where they might not be able to control it. For example, if I'm under the VS debugger and I step past a call to AllocateUninitializedArray or Rent, the debugger could start listing the array's elements automatically even though my own code wouldn't have done such a thing.

If we do not want to change GC.AllocateUninitializedArray (perhaps because we consider it an "unsafe" API), we should at least consider putting the stronger checks into the ArrayPool<T>-derived classes.

@jkotas
Copy link
Member

jkotas commented Aug 23, 2019

AllocateUninitializedArray method is trying to ask the question "Is any arbitrary bit pattern valid for type T?

AllocateUninitializedArray is trying to ask a question: Is GC going to crash when it sees arbitrary bit pattern in this array? If the GC was able to handle object references that contain garbage, we would remove this check completely.

@GrabYourPitchforks
Copy link
Member Author

AllocateUninitializedArray is trying to ask a question: Is GC going to crash when it sees arbitrary bit pattern in this array?

I think I worded my original description poorly. This was inspired by https://github.com/dotnet/corefx/issues/40485, where that user did not expect the private array pool to be returning "garbage" data. In that case, they expected the values coming back from the pooled arrays to be constrained to a particular range. In the issue here, I'm considering the more general case of being able to assign arbitrary bit patterns to value types having bypassed the ctor and all other public APIs.

Consider the DateTime type as mentioned earlier. It's backed by a uint64, but the type places constraints on how that backing field can be initialized. For example, the type cannot be initialized with a backing field whose value is long.MaxValue. There's no ctor that will allow this (not even the serialization ctor) and no public API that will allow this, and AFAIK there's also no way to achieve this condition through struct tearing. You have to resort to private reflection or other unsafe mechanisms to achieve this.

Given these constraints on DateTime, it would be awfully surprising if I called ArrayPool<DateTime>.Shared.Rent(...) and got back a DateTime[] populated with elements that are "impossible", such as elements whose backing fields have the value long.MaxValue. Attempting to operate on these elements would have undefined behavior. What this essentially means is that ArrayPool<T>.Shared.Rent(...) can violate type safety depending on the T, but the ArrayPool<T> API hasn't historically been considered dangerous in that particular regard.

@jkotas
Copy link
Member

jkotas commented Aug 24, 2019

I think about ArrayPool.Shared as C++ new/delete with all its unsafety. It is not surprising to me that you get back unitialized memory with invalid values. You are not supposed to operate on the elements in these arrays before initializing them with good values, just like in C++.

For example, the type cannot be initialized with a backing field whose value is long.MaxValue. There's no ctor that will allow this (not even the serialization ctor) and no public API that will allow this, and AFAIK there's also no way to achieve this condition through struct tearing.

It can tear on 32-bit platforms. And I think bulk copying via Array.Copy or Span.Copy can tear struct like this as well in corner cases on 64-bit platforms.

@benaadams
Copy link
Member

benaadams commented Aug 29, 2019

The shared pool would return junky data before (outside direct user control; due to it being shared); so shouldn't be an issue to return junkier data?

i.e. you couldn't rely on it having anything sensible for your problem domain in it (a bigger constraint than not valid for the data type); without overwriting the returned array first. So reading from it without writing to it first is an invalid use.

This differs from the custom pool as that can be fully under user control for a specific use (though reading without overwriting first; or always clearing on return, for a custom pool still seems overly dangerous as random results will bleed through)

@Maoni0
Copy link
Member

Maoni0 commented Nov 7, 2019

@GrabYourPitchforks are you ok closing this or is there still a concern?

@GrabYourPitchforks
Copy link
Member Author

Looks like our stance is that ArrayPool<T> is inherently unsafe (see comment at https://github.com/dotnet/coreclr/issues/26304#issuecomment-524579762). I'm fine with this being resolved "by design" then.

@Maoni0 Maoni0 closed this as completed Nov 7, 2019
@GrabYourPitchforks
Copy link
Member Author

Thanks for checking up on it Maoni. Much appreciated. :)

@Maoni0
Copy link
Member

Maoni0 commented Nov 7, 2019

of course! :)

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the 5.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

5 participants