Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Empty ArrayPool not returning zero-inited array in .NET Core 3.0 #30649

Closed
ppekrol opened this issue Aug 21, 2019 · 16 comments · Fixed by dotnet/coreclr#26338
Closed

Empty ArrayPool not returning zero-inited array in .NET Core 3.0 #30649

ppekrol opened this issue Aug 21, 2019 · 16 comments · Fixed by dotnet/coreclr#26338
Assignees
Milestone

Comments

@ppekrol
Copy link

ppekrol commented Aug 21, 2019

In .NET Core 2.2 following code could run indefinitely (simple console app):

            var pool = ArrayPool<byte>.Create();

            var iteration = 0;
            while (true)
            {
                Console.WriteLine(iteration);

                var buffer = pool.Rent(32 * 1024);

                for (int i = 0; i < buffer.Length; i++)
                {
                    if (buffer[i] != 0)
                        throw new InvalidOperationException();
                }

                iteration++;
            }

In .NET Core 3.0 it fails around 50-100 iteration. Is this by design? I was assuming that if I'm not returning anything, then the pool would have to init new arrays and those would have zeros?

I'm using .NET Core 3.0 Preview 8 on Win10 x64

dotnet --version
3.0.100-preview8-013656
@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Aug 21, 2019

This is by design and is an optimization. See the PR dotnet/coreclr#24504 for more information.

Summary of that PR: Since the contents of the array returned by ArrayPool<T>.Rent are undefined anyway, the ArrayPool<T> implementation is free to tell the GC to allocate the array from any available block of memory, including memory that contains old data which has not yet been zeroed by the GC. (I'm handwaving a bit, but I hope this conveys the general idea.)

@ppekrol
Copy link
Author

ppekrol commented Aug 21, 2019

Interesting. A bit problematic for my case, but interesting.

I was assuming (since this is not Shared array) that I will get either zeroes or my previous values in the array (was not clearing the array on return). Which was fine for my system, because those values never exceeded 10 (getting IndexOutOfRangeException in few spots now). Now I need to clear an array on Rent, which is an additional operation that needs to be performed.

What would be the purpose of clearArray in Return now? Security?

@GrabYourPitchforks
Copy link
Member

I wonder if your use case would be an argument for introducing an overload ArrayPool<T>.Create(..., options) that would allow you to configure the "should I use the uninitialized pool?" behavior. I'll let others chime in on that. :)

What would be the purpose of clearArray in Return now? Security?

It was intended to enforce the behavior "nobody else who rents an array can inspect the contents of the thing I'm returning." But I think you're hinting that the parameter is a little less useful now since there's a chance that a new buffer could be allocated from the existing data - and I'd be inclined to agree with you here. We do have ZeroMemory for folks who really need to ensure that a buffer is wiped.

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 21, 2019

If I'm choosing to rent an array for use as an intermediate buffer in a crypto operation (or similar) then I should never be reading data from the array that I don't know that I put there, So initialization to 0 isn't needed and I don't want to pay that cost. When I return that array I want to make sure that my intermediate data is not available for anyone else to inspect, in total or partially, which is supported by the clear operation, this cost I must pay to be secure. This is the sort of use case I think is currently expected.

Wanting to clear on rent but not on return seems a less disciplined approach to me but I can see how it might be unexpected that rent returns a non-zero array if you expect standard semantics outside corefx where everything is defaulted at creation.

@stephentoub
Copy link
Member

cc: @VSadov

@VSadov
Copy link
Member

VSadov commented Aug 22, 2019

My understanding on having clear at return is that return codepath generally is less perf sensitive than the renting side. - If you are the only user of the pool and want clean buffers, clearing at return is an option that is less likely to affect latencies of the app.
It also allows skipping clearing if array was unused.

The value of clearing at return is indeed diminished by the optimization, since now that does not guarantee clean buffers even if you are the only user of the pool.

Maybe we need a parameter to suppress the optimization when nonshared pool is constructed?

@ppekrol
Copy link
Author

ppekrol commented Aug 23, 2019

If you could introduce that switch that would be outstanding for me. My 'private' pool is used in a hot-path (Lucene.NET clone) and I cannot afford clearing the array on each 'Rent'. My assumption was: I'm the only user of that pool so the values that are returned from it are predictable. Because of that assumption, and because it was working, 'private' array pool was a perfect replacement for custom pool implementation there.

@benaadams
Copy link
Member

Do you populate randomly elements of the array after rent; or is it sequential but you don't use all of it?

If its the later you can slice it as either Memory<byte> or Span<byte> to constrain the size.

@ppekrol
Copy link
Author

ppekrol commented Aug 23, 2019

Size of an array is not a problem, the values inside it are. I do not expect to have different values than 10 at each index.

@stephentoub
Copy link
Member

I hadn't comprehended/internalized that dotnet/coreclr#24504 changed both the shared and isolated pools; in my mind it only did it for the former. I'd be supportive of backing out this change for the latter... with the former, you have no control over who is returning what to the pool you're using, but you do with the latter, and so the argument that the contents of the rented arrays is arbitrary is on shakier ground.

@VSadov
Copy link
Member

VSadov commented Aug 23, 2019

Yes, I think we should just undo this optimization for private pools.

@stephentoub
Copy link
Member

Re-opening to track release/3.0 port.

@stephentoub stephentoub reopened this Aug 26, 2019
@GSPP
Copy link

GSPP commented Aug 26, 2019

In private pools, users can clear data at a point of their choosing. It does not have to be part of the pool API.

If user code cleans the arrays, it can optimize cleaning to only the portion of the array that actually was used.

@stephentoub
Copy link
Member

In private pools, users can clear data at a point of their choosing. It does not have to be part of the pool API.

Such functionality could be added as opt-in. But code has already been written (as outlined in this issue) that relies on the clearing already being done.

@danmoseley
Copy link
Member

@stephentoub we can close this now right?

@stephentoub
Copy link
Member

Yup

@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 3.0 milestone Feb 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants