Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a new System.Buffers namespace to the BCL for Resource Pooling #15725

Closed
jonmill opened this issue Nov 17, 2015 · 34 comments
Closed

Add a new System.Buffers namespace to the BCL for Resource Pooling #15725

jonmill opened this issue Nov 17, 2015 · 34 comments
Assignees
Milestone

Comments

@jonmill
Copy link
Contributor

jonmill commented Nov 17, 2015

Summary

Currently, the BCL does not have support for resource pooling of any kind. A Buffer Pool type should be added to allow both the Framework and external engineers to utilize Pooling of primitive and typically-short-lived types.

Rationale and Usage

Current engineers, both internally and externally, are required to create their own Pool system for short-lived arrays or create and destroy arrays on demand, leading to lots of objects to be collected during Generation 0 Garbage Collection and therefore taking a runtime performance hit. Lots of simple pooling scenarios can be solved with a generic Rent-Return contract, allowing for a genetic Pool implementation that will prevent short-term objects or custom Pool implementations for a majority of cases.

Note

The new System.Buffers namespace is not meant to cover all Pooling cases; in some instances, the requirements are so specific to the application being written that it would not be feasible to make a pool for that case in the BCL. This addition is also not meant for object pooling; this addition is specific to buffer pooling of primitive or struct-based types. Object pooling has different requirements that are not necessarily fulfilled by buffer pooling.

Proposed API

namespace System.Buffers
{
    public abstract class ArrayPool<T>
    {
        public static ArrayPool<T> Shared { get; internal set; }

        public static ArrayPool<T> Create(int maxBufferSize = <number>, int numberOfBuffers = <number>);

        public T[] Rent(int size);

        public T[] Enlarge(T[] buffer, int newSize, bool clearBuffer = false);

        public void Return(T[] buffer, bool clearBuffer = false);
    }
}

Class Description

The ArrayPool is a new class that will manage tiered buffers for the caller. The Pool is named ArrayBufferPool due to the specific nature of this pool; the buffers from this pool are expected to be used in managed code and while they can be pinned and passed to Native code, it is not expected that they will be. There are preliminary talks to create a NativeBufferPool to help with the PInvoke scenarios, but that implementation will rely on Span, which is not ready yet. The ManagedBufferPool class will be generic in order to allow callers to Rent arrays of type T, which is expected to be primitive types such as byte or char. The Pool will be lightweight and thread-safe, allowing for fast Rent and Return calls from any thread within the process, along with minimal locking overhead, and 0 heap allocations on most Rent calls (exceptions to this will be called out below in the description of the Rent function).

To allow for resizing and to minimize fragmentation, the Pool will use Bucketing to create different buffer sizes up to the specified maximum. This allows callers to request multiple buffer sizes without needing multiple pools; also, Bucket sizes will be determined ahead of time but will not allocate ahead of time. This trade off means many different Bucket sizes can be specified without putting a strain on memory utilization unless requested.

Usage examples can be seen in the Examples section below.

Public Function Descriptions

Constructor

public ManagedBufferPool(int maxBufferSize = <number>, int numberOfBuffersPerBucket = <number>);

The ManagedBufferPool constructor takes in two arguments: the maximum buffer that is expected to be requested, and the number of buffers per BufferBucket. Both of these arguments are optional and can be used to tweak the Pool to situation-specific circumstances as well as have default values that will be tailored to most situations if they are not specified. The constructor will not allocate the buffers at this time; the pool is Lazy Loaded so that the Bucket sizes will be determined, based on the maximum size, but the memory for each Bucket will not be allocated until requested.

RentBuffer

public T[] RentBuffer(int size, bool clearBuffer = false)

The Rent(..) function is used to request a buffer of a specific size from the Pool. The caller may request that the buffer be cleared before it is Rented, but this defaults to false for performance reasons. The Pool is guaranteed to return a Buffer of at least the specified size; the actual size may be larger to due buffer availability. If the Bucket containing the specified size has not been allocated yet, it will be created at this time; any further Rent calls that hit this Bucket will not allocate any data on the Heap. This function is thread-safe.

EnlargeBuffer

public void EnlargeBuffer(ref T[] buffer, int newSize, bool clearFreeSpace = false);

The EnlargeBuffer(..) function is used to request a larger buffer than the one specified. The new buffer returned will be at least the specified size and will contain all data in the passed in buffer. The buffer musted be passed as a reference since the previous reference will no longer be valid. The caller may also request that any excess space between the end of the previous buffer to the end of the new buffer be cleared; this defaults to false for performance reasons. Like Rent, this call may allocate if the new size hits a Bucket that has not been allocated yet; if the Bucket has been allocated, this call will not allocate any data on the Heap. Only buffers that have been received via calls to RentBuffer should be passed to this function. This function is thread-safe.

ReturnBuffer

public void ReturnBuffer(ref T[] buffer, bool clearBuffer = false);

The ReturnBuffer(..) function is used to give up ownership of a buffer received from calls to RentBuffer. The call takes a reference to a buffer since the reference will no longer be valid after the call returns. The buffer can be cleared by passing true to the optional parameter, but this defaults to false for performance reasons. Only buffers that have been received by calls to RentBuffer should be passed in and a buffer can only be Returned once. This function is thread-safe.

Static Declarations

SharedBufferPool

public static ManagedBufferPool<T> SharedBufferPool {get; }
This static property allows for a Shared pool to be used in cases where multiple components in a system will require access to buffers for the duration of the process lifetime, such as a web server. This instance will be created with the default parameters and is readonly. It follows the same usage and allocation patterns described above, so if the caller does not use it, nothing will be allocated.

Future Entries into System.Buffers

Going forward, we will look into adding more types of resource pooling into the System.Buffers namespace; currently, a NativeBufferPool and an ObjectPool are in the very initial stages.

Examples

Simple Rent and Return

public class Program
{
    public static void Main(string[] args)
    {
        ManagedBufferPool<byte> pool = new ManagedBufferPool<byte>();
        for (int i = 0; i < 50; i++)
        {
            byte[] buffer = pool.RentBuffer(1024);

            // Do something to fill the buffer
            DoSomething(buffer);

            string s = System.Text.Encoding.ASCII.GetString(buffer);
            Console.WriteLine(s);
            pool.ReturnBuffer(ref buffer);
        }
    }
}

Multithreaded Example

public class Program
{
    // We know our packets will be at most 1024 bytes and we need at most 8 of them
    static ManagedBufferPool<byte> _pool = new ManagedBufferPool<byte>(1024, 8);

    public static void Main(string[] args)
    {
        for (int i = 0; i < 8; i++)
        {
            new Thread(new StartStart(DoWork)).Start()
        }

        // Pseudo-code to wait
        WaitForAllThreads();
    }

    private static void DoWork()
    {
        while (true)
        {
            byte[] buffer = _pool.RentBuffer(512);

            // Pseudo-code to get data from the network and do something with it
            GetDataFromNetwork(buffer);
            DoSomethingWithFilledBuffer(buffer);

            _pool.ReturnBuffer(ref buffer);
        }
    }
}

Updates

Update 1

After some thinking and initial feedback, removing the Shared pool due to the possibility of confusing or misuse and the possibility that components will use the Shared pool, growing the Process memory unnecessarily.

Update 2

Updated the API referenced based on the review feedback.

@jonmill jonmill self-assigned this Nov 17, 2015
@stephentoub
Copy link
Member

Nice proposal. A few initial thoughts:

  • ManagedBufferPool<T> should really derive from an abstract base class and/or implement a public interface. For components that don't want to or can't use the shared pool, they should be able to accept one of these passed into them, and they should be written to wor with other pool implementations, not just this specific kind. For example, if we were to add a ctor to DeflateStream that accepted a buffer pool, we'd want to do it via an IBufferPool<T> or a base BufferPool<T>, not with ManagedBufferPoolt<T>.
  • We should strive to avoid default arguments, and instead have overloads of, for example, RentBuffer, one that doesn't take clearBuffer and one that does, with the former delegating to the latter. If you went with an abstract base class, the former could be non-virtual and the latter could be virtual. If you went with an interface, the latter could be on the interface and the former could be an extension method.
  • Why must T be a struct? What happens if it's not a primitive type, e.g. a struct wrapper over an object reference?
  • How do we plan to rationalize the design of ManagedBufferPool with NativeBufferPool, if the former is giving out managed T[]s. There'd be no relation between them and no way to substitute one for the other?
  • Is there any mechanism employed for detecting misuse, e.g. returning a buffer you didn't get from the pool, returning a buffer twice, using a buffer after it's already been returned, etc.? Presumably in some cases there's nothing we can do, but it'd be good to be clear about our intent here.
  • I don't understand why ReturnBuffer takes a ref. I know the description says it does so because the buffer is no longer valid after that, but nulling out that reference doesn't guarantee that the caller can't access it, as the caller could easily have multiple copies of the reference. Seems like having it take a ref is misleading.
  • What are the maximums used by the shared buffer pool? I assume there aren't any? How does it handle reclamation, or does it hold on to everything given back to it?
  • What library would this be in, and where would that library sit in the layering? e.g. could we use this from System.IO.dll?

@mellinoe
Copy link
Contributor

Got a couple of the same questions as Stephen.

  • Why does it need to be limited to any type in particular? I can understand if we were talking about the NativeBufferPool which would necessarily need to be some kind of blittable structure (not necessarily a primitive), but since it's a "Managed"BufferPool, why can't I make a pool of any kind of array? It would also unnecessarily restrict me from doing something like making a ManagedBufferPool<Vector3>, which technically is just a wrapper structure for 3 floats.
  • I'm also concerned about the reclamation behavior. If there's no way to get back the memory you've allocated, then you're stuck with that memory forever. For example, if we used this in System.IO and I only opened a few streams or files or something at the beginning of my program, then I'm stuck with that memory forever, even if I never use the buffer pool again. Similarly, if I only rarely open a file (maybe due to occasional user input or something), then I don't want permanent memory around for that, I'd just rather allocate a one-off array and have it GC'd normally. If this stuff is baked into a low level library, then I'd be stuck with permanent memory regardless of my usage patterns in my app. Obviously it won't be a significant amount of memory in most cases, but it could probably add up.

@jonmill
Copy link
Contributor Author

jonmill commented Nov 17, 2015

@stephentoub Responses below

ManagedBufferPool should really derive from an abstract base class and/or implement a public interface. For components that don't want to or can't use the shared pool, they should be able to accept one of these passed into them, and they should be written to wor with other pool implementations, not just this specific kind. For example, if we were to add a ctor to DeflateStream that accepted a buffer pool, we'd want to do it via an IBufferPool or a base BufferPool, not with ManagedBufferPoolt.

Good idea. I'll think on this and have an answer for the review.

We should strive to avoid default arguments, and instead have overloads of, for example, RentBuffer, one that doesn't take clearBuffer and one that does, with the former delegating to the latter. If you went with an abstract base class, the former could be non-virtual and the latter could be virtual. If you went with an interface, the latter could be on the interface and the former could be an extension method.

Just out of curiosity, what is the reason for avoiding default arguments? I'm not partial one way or another, I'm just curious about what problems they cause so I can avoid them in the future :)

Why must T be a struct? What happens if it's not a primitive type, e.g. a struct wrapper over an object reference?

I made T a struct so that it is restricted to primitives and structs and not classes; this pool is explicitly not and Object pool. The use cases I have had so far, working with the ASP.NET folks, have been around primitive and struct arrays that benefit from Pooling.

How do we plan to rationalize the design of ManagedBufferPool with NativeBufferPool, if the former is giving out managed T[]s. There'd be no relation between them and no way to substitute one for the other?

I think that the use cases for Managed and Native pools are so different that swapping between them would only add to confusion around which to use. By keeping the usage patterns slightly off, the usage of one or the other becomes much more clear; for example, if one is using PInvoke calls with buffers, you usually want to pass pointers and Span has a way to retrieve that while Arrays require unsafe code to do so. Conversely, if you want to pass some buffers around from receiving I/O, to formatting or translating, and back out to I/O, the current API set uses Arrays, so it makes it immediately obvious which pool to use. Just my thought process here

Is there any mechanism employed for detecting misuse, e.g. returning a buffer you didn't get from the pool, returning a buffer twice, using a buffer after it's already been returned, etc.? Presumably in some cases there's nothing we can do, but it'd be good to be clear about our intent here.

I haven't planned anything right now. I think if we did add any of that, it would be good to do in Debug mode only. A P0 aim was to make this as efficient as possible and all of that adds overhead; however, it does add a lot of value while debugging, so maybe adding some Debug build-specific code to check for these scenarios would be useful. I'll think more on this

I don't understand why ReturnBuffer takes a ref. I know the description says it does so because the buffer is no longer valid after that, but nulling out that reference doesn't guarantee that the caller can't access it, as the caller could easily have multiple copies of the reference. Seems like having it take a ref is misleading.

I made it use ref to try and remove all the easy ways to shoot yourself in the foot; the easiest being code like the following:

byte[] buffer = Rent(...);
// Do things....
Return(buffer, true /*clean the array*/);
// A little later
buffer = ReadIo();

I agree that it's very hard to prevent misuse if the caller makes copies of the reference but making the parameter ref means that they would have to take extra steps to misuse the buffer since the returned value becomes unusable. Making it ref also helps enforce that in another way; since you can't pass a Property as a ref, this encourages callers to not copy the reference they have unless absolutely necessary, which will help (potentially) cut down on these types of bugs.

What are the maximums used by the shared buffer pool? I assume there aren't any? How does it handle reclamation, or does it hold on to everything given back to it?

So I'm 50/50 on if I want to include the Shared pool or not; there are definitely use-cases where a Shared pool is helpful, but I also think some extra glue code to get around not having a Shared pool wouldn't be the worst thing.

In terms of defaults, the Shared pool that is in CoreFxLab uses the default parameters of the Pool constructor, which is a max buffer size of 2 MB (lazy loaded) and a max number of buffers at 50-per-bucket.

What library would this be in, and where would that library sit in the layering? e.g. could we use this from System.IO.dll?

Ideally, this would be a new library with a few dependencies as possible; it would be great if the Pooling could be used in the framework in places where it makes sense, so limiting dependencies helps. Currently, the only dependencies in the CoreFxLab prototype are:

System.Runtime.CompilerServices (for inlining tiny, math-based functions)
System.Diagnostics (for Debug checks, can be removed)
System.Threading (for Interlocked operations and SpinLock)

@jonmill
Copy link
Contributor Author

jonmill commented Nov 17, 2015

@mellinoe Responses below

Why does it need to be limited to any type in particular? I can understand if we were talking about the NativeBufferPool which would necessarily need to be some kind of blittable structure (not necessarily a primitive), but since it's a "Managed"Buffer pool, why can't I make a pool of any kind of array? It would also unnecessarily restrict me from doing something like making a ManagedBufferPool, which technically is just a wrapper structure for 3 floats.

The reason this is restricted is because this is meant to be a Buffer pool, not an Object pool; the usage patterns and underlying implementation will differ between these types. Obviously, this can be worked around by callers by just making their Objects a struct, but that will be a misuse that we can't really avoid. The use-cases I designed this around came from ASP.NET and their current Buffer Pools, which required primitive arrays and struct arrays.

I'm also concerned about the reclamation behavior. If there's no way to get back the memory you've allocated, then you're stuck with that memory forever. For example, if we used this in System.IO and I only opened a few streams or files or something at the beginning of my program, then I'm stuck with that memory forever, even if I never use the buffer pool again. Similarly, if I only rarely open a file (maybe due to occasional user input or something), then I don't want permanent memory around for that, I'd just rather allocate a one-off array and have it GC'd normally. If this stuff is baked into a low level library, then I'd be stuck with permanent memory regardless of my usage patterns in my app. Obviously it won't be a significant amount of memory in most cases, but it could probably add up.

If you use the Shared pool, then yes; that is one big reason I'm not sold on the Shared pool and may take it out. If you use an instance of the BufferPool, your instance is tracked by the GC so if you only need the Pool for a certain amount of time, it will be reclaimed once you no longer need it just like any other managed object...leaking this or keeping the memory around for the lifetime of the process would be a usage problem and not a Pool problem

@mellinoe
Copy link
Contributor

So I think that you are saying it can be used with any struct type, not just primitives, right? Just wanted to clarify. This wording was a bit unclear:

The ManagedBufferPool class will be generic in order to allow callers to Rent arrays of type T, which is expected to be primitive types such as byte or char.

I'm aware that you can't actually put a generic restriction on "primitive" types, but we do some weird stuff in System.Numerics.Vectors where we actually do artificially limit the generic type to "primitive" structs and prevent you from instantiating a non-primitive struct. I think what you're saying is that I could indeed create a ManagedBufferPool<Vector3>, so that sounds fine to me.

it will be reclaimed once you no longer need it just like any other managed object...leaking this or keeping the memory around for the lifetime of the process would be a usage problem and not a Pool problem

Yeah, that sounds like how I would expect it to behave. I was thinking more for our internal uses in low-level libraries, which for their uses might just keep around a static buffer pool, which is never reclaimed. This is how the long path stuff that @JeremyKuhne added works, right? As long as the overhead is tiny, it's not really a problem, but if the pattern proliferates throughout the BCL then it might become noticeable depending on your app's usage patterns.

@jonmill
Copy link
Contributor Author

jonmill commented Nov 17, 2015

So I think that you are saying it can be used with any struct type, not just primitives, right? Just wanted to clarify. This wording was a bit unclear:

Yeah, sorry for being unclear. I'll go change that line; you can use either primitives OR structs.

I was thinking more for our internal uses in low-level libraries, which for their uses might just keep around a static buffer pool, which is never reclaimed.

Yeah, the more I think about it, the more the static Shared pool becomes a problem. I'm going to remove it

@stephentoub
Copy link
Member

I made T a struct so that it is restricted to primitives and structs and not classes; this pool is explicitly not and Object pool. The use cases I have had so far, working with the ASP.NET folks, have been around primitive and struct arrays that benefit from Pooling.

I still don't understand the struct limitation. What's the difference between:

class Bar { }
...
ManagedBufferPool<Bar>

and:

struct BarWrapper { public Bar Bar; }
class Bar { }
...
ManagedBufferPool<BarWrapper>

?

You've stated that there are different use cases for buffer pools vs object pools. Sure. But why does a buffer of reference types not count as a buffer but a buffer of value types does? Given a reference type T, there's a difference between an ObjectPool<T> and an ObjectPool<T[]>... the latter looks an awfully lot like a buffer pool.

I'm simply not understanding why we're artificially limiting it. What value does that limitation bring? I understand the primary focus is on primitives (which are structs / value types), but why prevent other usage? Especially when we're not really preventing it... we're just making folks work a bit harder to get it, by wrapping their reference types in a value type like I did above.

@jonmill
Copy link
Contributor Author

jonmill commented Nov 18, 2015

I'm simply not understanding why we're artificially limiting it. What value does that limitation bring? I understand the primary focus is on primitives (which are structs / value types), but why prevent other usage? Especially when we're not really preventing it... we're just making folks work a bit harder to get it, but wrapping their reference types in a value type like I did above.

I suppose that's true. It would make it confusing later on if we added an ObjectPool though...then it would be confusing on which one I use since both work. The limitation with using this for classes would be that the class must have a default constructor and be non-disposable; plus, renting arrays of complex objects seems...odd. I realize that this implementation can be coerced to work with classes but you have to try to do that and it is obvious that it is not the goal.

Maybe a different solution would be to make this primitive-only and then make a separate ObjectPool that can take T and have that be a part of this proposal as well. What do you think about this alternative @stephentoub? Then the use-case for the Buffer Pool is obvious and if you want to use complex types, there is another class that is built for that in the same namespace.

@stephentoub
Copy link
Member

The limitation with using this for classes would be that the class must have a default constructor and be non-disposable

Why? I think we're misunderstanding each other. I'm not talking about a need to maintain the objects in the buffer; in fact, I expect the primary use case would involve setting clearBuffers to true when returning them to the pool. This is about not having to allocate the array / buffer itself.

For example, consider params arrays in C#. Today when you have a function like:

public void Bar(params object[] args);

and you call:

Bar("hello", 42);

the compiler needs to allocate a new object[2] in which to store the arguments to pass into the array. If the C# compiler team did perf measurements and found it to result in more efficient code, why couldn't they instead use a ManagedBufferPool<object>?

As another example, the String.Concat overloads that take objects need to internally allocate a temporary string[] in order to store the strings created from the supplied objects, e.g. https://github.com/dotnet/coreclr/blob/master/src/mscorlib/src/System/String.cs#L3082. If we did perf measurements and found that pooling that array was better than allocating it each time, why couldn't we use ManagedBufferPool<string>?

it is obvious that it is not the goal

And I'm trying to understand why that is. The goal here is to avoid needing to allocating arrays when we can reuse ones we've cached. What's so special about value types that we have a goal for working with value types and not for working with reference types? (Other than one of the most common uses for this being byte[].)

Maybe a different solution would be to make this primitive-only

It's not possible with C# to do that statically... we'd need to do run-time checks to validate the primitiveness, which makes the APIs more complicated to use, especially for something as hopefully general purpose as a buffer pool.

then make a separate ObjectPool that can take T

I'm fine with the idea of having an ObjectPool<T>, but I don't believe it should be done instead of allowing reference types to work with a buffer pool. I'd also recommend it be a separate proposal.

@nguerrera
Copy link
Contributor

In sample, does ReturnBuffer need to be in a finally? Could exceptions cause leaks otherwise? If so, should we have something that works with using (...) construct to keep the calling code tighter?

If I attempt to Rent a buffer greater than maxSize, does it:

  1. throw?
  2. allocate outside the pool and no-op on Return?
  3. keep the large buffer around for future Rent's (IOW, is maxSize just a hint)?

@jonmill
Copy link
Contributor Author

jonmill commented Nov 18, 2015

@stephentoub:

And I'm trying to understand why that is. The goal here is to avoid needing to allocating arrays when we can reuse ones we've cached. What's so special about value types that we have a goal for working with value types and not for working with reference types? (Other than one of the most common uses for this being byte[].)

I suppose you're right...there, fundamentally, is no reason we can't remove the where T : struct clause since the pool would work with Objects. That, honestly, wasn't my intention when writing the proposal; however, if there is value added by allowing it and the underlying code would support those scenarios, I think you're right that we should allow them 😃

@nguerrera

In sample, does ReturnBuffer need to be in a finally? Could exceptions cause leaks otherwise? If so, should we have something that works with using (...) construct to keep the calling code tighter?

That's up to the caller; since the buffers are all managed, if they have a logic error where they forget to return the buffer, it will be free'd when the buffer is GC'd. If this is in a loop, then they will drain the pool of resources, which will resort to allocating when they request buffers and we don't have any. In response to a comment @stephentoub made earlier, I'm thinking about adding some Debug-specific code that will try and trap these errors.

If I attempt to Rent a buffer greater than maxSize, does it:

  1. throw?
  2. allocate outside the pool and no-op on Return?
  3. keep the large buffer around for future 'Rent`s (IOW, is maxSize just a hint)?

If you request a size > the initial maxSize passed to the constructor, the buffer is allocated on demand and is dropped when passed to ReturnBuffer; we will not keep around buffers that exceed our Bucket sizes.

@rynowak
Copy link
Member

rynowak commented Nov 18, 2015

The overhead of renting and returning buffers from the prototype pool is lower than the one we built internally 👍 nice!

It will be really exciting to us to see something like this go into CoreFx because it means that community packages can start to take a dependency on it.

Some feedback


I made T a struct so that it is restricted to primitives and structs and not classes; this pool is explicitly not and (sic) Object pool. The use cases I have had so far, working with the ASP.NET folks, have been around primitive and struct arrays that benefit from Pooling.

We (ASP.NET) absolutely have the requirement to store reference types in a pooled buffer. I'm currently doing exactly what @stephentoub highlighted - creating a struct wrapper around an object. It's a pain and just seems unnecessary for something we want to put in a public API.

The limitation with using this for classes would be that the class must have a default constructor and be non-disposable;

I think this is a misunderstanding. Constructors aren't related at all, this ain't c++ where you have uninitialized memory, you have null. Array.Clear() handles this case, so why the limitation?

Structs can implement IDisposable as well, this concern is orthogonal.

renting arrays of complex objects seems...odd.

I'm not renting an array of complex objects, I'm renting a place to store complex objects.

Maybe a different solution would be to make this primitive-only and then make a separate ObjectPool that can take T and have that be a part of this proposal as well.

As discussed several times, our usage in this manner is very different from canonical object-pooling scenarios.

I'd suggest not gating this proposal on other proposals that meet different sets of requirements. If this doesn't land in a way we can ship some time in the next few weeks, then it's at risk for us to use in v1.


In answer to some of the other discussions about returning/leaking - consider an alternative from our humble buffer pooling strawman: https://github.com/aspnet/Common/blob/dev/src/Microsoft.Extensions.MemoryPool/IArraySegmentPool.cs

The standard currency here is LeasedArraySegment<T> not the buffer itself. This makes it impossible to return something that didn't come from the pool. Associating the owner with the segment directly allows you to build in better diagnostics around incorrect usage, and uses the type system to reflect the difference between any random array, and something that you borrow rather than own.

In this pattern the concrete implementation can also subclass LeasedArraySegment<T> and store any necessary state there. In our case we used it to implement diagnostics for leaks.


Consider also here using ArraySegment<T> on the base class defintions. This makes a variety of memory management strategies on the part of the pool possible. The way the APIs are defined in the proposal pretty much limits you to exactly what the implementation does.

The objection that was raised to this earlier was that not many APIs in .net take ArraySegment<T> directly, but I don't find that persuasive. I really struggle to think of any framework API I'd want to call that can't also be given an offset. Examples: Array.BlockCopy Encoder.GetText/GetBytes TextReader/Writer.

For instance in Kestrel, the memory pool allocates an LOH sized block of memory and then pins it and hands out chunks. This allows you to avoid having to 'graduate' objects that you know have a static lifetime via the normal GC process. This also should avoid the GC attempting to move the pooled memory around as part of compaction.

@steveharter
Copy link
Member

There should be a security review here that depends on the internal implementation. Could I call ReturnBuffer, but hang onto the array pointer and watch the buffer change by other threads\code and watch the data that I shouldn't?

It partially depends on where the ManagedBufferPool is created - static, local, thread, etc. so any issues of misuse would require access to the ManagedBufferPool instance.

@steveharter
Copy link
Member

It would be good to have performance numbers with and without the ManagedBufferPool to verify the premise and make sure the gen0 collections and re-allocs are indeed expensive.

@omariom
Copy link
Contributor

omariom commented Nov 21, 2015

Just read in the APi review

Returning a T[] that holds onto huge object graphs is problematic, hence we should consider always clearing on return if T isn't a primitive type

This is another case when having ContainsReferences property would be beneficial.

@benaadams
Copy link
Member

@steveharter re: is GC a problem without pooling?

For ASP.NET at its target speed around 7Mrps just for a single type it would be allocating 3.2 GBytes and 42.3M transitory objects per second and there are many more different types it allocates during the request processing; which is an enromous amount for the GC to clean up.

Prior to pooling; under high load, after about 3 minutes, it would literally flat-line on network performance and stop serving requests while still at 100% cpu on 8 cores only performing GC.

image

The GC is an exceptionally good general purpose allocator and deallocator; however object pooling has the advantage that comes from being a very single purpose object recycler.

Its easier to reuse the objects already allocated then clean up the memory; and reallocate them.

@benaadams
Copy link
Member

@sokket could an object based pool such as @rynowak mentions take types that implement IResetable which would be like IDisposable but reset the properties ready for resuse (called in Return...)

e.g.

public interface IObjectPool<T> where T : IResetable object pool also I don't think would be arrays of objects but single objects (hopefully backed by an array)

@benaadams
Copy link
Member

Also I'd heavily favor an interface over class or abstract base class; with a default/standard implementation then provided; its more flexible, composable and testable - and allows separation of contracts from implementation - even in different assemblies. (e.g. have interface assembly and implementation assembly - easier for pay-on-use)

@benaadams
Copy link
Member

clearBuffer in constructor; clear on return, if you are paranoid and want your data cleared you probably don't want to even share your buffer pool?

@clrjunkie
Copy link

Just finished watching the DR Video.

  1. I think RentBuffer should be renamed to TakeBuffer because:
    • Renting without specifying a duration or some offer as part of a contract is Taking.
    • Renting implies you don’t have ownership of the buffer which is in contrast to the actual implementation as noted in the DR.
    • It should already be obvious to the user, from the class name, that this is a Pool interface and there is no need to reinforce the concept with “Rent”.
  2. The DR briefly discussed the need for pooling of buffers that don’t need to be pinned hinting to ASP.NET requirements but I think this is key and such requirements should be detailed as to what are the exact scenarios. My experience with buffer pools in .NET servers is primarily to tackle the GC delay in compacting Gen0 due to artificial fragmentation caused by pinning during I/O.
  3. I’m also unclear about the name and goals of this class:

The “Rationale and Usage” above states

“This addition is also not meant for object pooling; this addition is specific to buffer pooling”

But the class take on T and I can ask for RentBuffer(1) which will return an array with one element.

The “Class Description” states:

“The Pool is named ArrayBufferPool”

But later on states:

“The ManagedBufferPool class will be generic in order to allow callers to Rent arrays of type T, which is expected to be primitive types such as byte or char”

I think we need a better understanding of why this pool is better than a “BufferPool” of byte arrays or a generic “ObjectPool” that has no “Buffer” (a.k.a byte array) specific semantics.

@ghost
Copy link

ghost commented Dec 4, 2015

RentBuffer should be renamed to TakeBuffer

Or simply GetBuffer (int size) (simpledb style: http://db.csail.mit.edu/6.830/simpledb/doc/simpledb/BufferPool.html) or AquireBuffer.

@clrjunkie
Copy link

Get is also good but is commonly paired with a Set operation.
I think “Return” what you Take is more natural in this case.

@jonmill
Copy link
Contributor Author

jonmill commented Dec 4, 2015

@benaadams I'm in the beginning stages of investigating what's necessary for an Object Pool; the ArrayPool has some very specific uses around renting buffers, admittedly I originally thought of purely value types but there was a request to relax that, that are different than the uses around Object Pooling. I definitely agree that there needs to be a way to reset the objects when returned. I'm also thinking heavily about how to make the Pool test-friendly, both internally and by also allowing consumers to swap it out to easily test their use of it; this is a very-high priority item and I'll be sure to have a story for how to do that when it is checked in.

@clrjunky I named the function Rent because the consumer does not own the buffer; the buffer is still owned by the pool and is simply loaned to the consumer for them to use. By calling the function Rent, it implies this relationship and also implies that the consumer should not keep references to this after returning it to the pool...semantics on how to enforce that are being thought through now. IRT pinning, it is an explicit goal of this Pool to not be pinned; pinning is usually done when PInvoking and we are investigating a pool specific for those purposes where the buffers will live on the Native heap so GC pinning and marshalling wouldn't be necessary. I say that with a heavy disclaimer that this idea is not near completion and is still in the investigation phase. In the mean time, if the buffers Rented from this pool are going to be pinned, then it needs to be done by the caller otherwise it would have unnecessary consequences and performance drawbacks for cases where it isn't necessary.

Sorry about the confusion around the name...I haven't had a chance to go through and fully update the original issue with the full rename and from all the feedback; I'm working through that now.

@jonmill
Copy link
Contributor Author

jonmill commented Dec 4, 2015

@benaadams I don't see an email on your profile...would you mind shooting me an email to chat about some of the scenarios you have in mind with the extension of and testing of the Buffer Pool?

@clrjunkie
Copy link

@sokket

I named the function Rent because the consumer does not own the buffer; the buffer is still owned by the pool and is simply loaned to the consumer for them to use

From the DR video (17:15) "Once we give you a buffer nobody holds a reference to this T Array"

Your "Renting" in theory the consumer is "Taking" in practice.

By calling the function Rent , it implies this relationship and also implies that the consumer should not keep references to this after returning it to the pool

The semantics are already defined by the class being a pool (what does Return mean without the notion of a "Pool")

pinning is usually done when PInvoking

clrjunky wrote:
"My experience (read: PInvoking) with buffer pools in .NET servers is primarily to tackle the GC delay in compacting Gen0 due to artificial fragmentation caused by pinning during I/O".

we are investigating a pool specific for those purposes where the buffers will live on the Native heap
so GC pinning and marshalling wouldn't be necessary

...and once you fill them how will I be able to copy them into my Dictionary without pinning the memory? How will I know when data is ready to be copied from the native memory?

@benaadams
Copy link
Member

@sokket sent you and email, will send followup - but with Span doing interesting things with types I could see shared type pools e.g.

BufferPool : IArrayPool<byte>, IArrayPool<char>, IArrayPool<int>

char pools being half size of byte pools, and int half size of char, but all on same underlying data.

Re: object pools I put more or less what we use into this gist as I can't find open issue and so as not to derail the thread anymore https://gist.github.com/benaadams/5f5ea438d733de6f762a

@tenor
Copy link

tenor commented Dec 7, 2015

@sokket I like this proposal. My only concern is that you'd still have a lot of allocations if the sizes of pooled buffers varies significantly.
One way to reduce allocations is to allocate larger memory slabs, and rent out segments out of those slabs.

I wrote a functioning thread-safe bufferpool that takes this approach a while ago. Here is the repo and accompanying blog post.

The pool will expand or contract (by adding or removing slabs) as buffers are rented and returned and segments can span multiple slabs.

My motivation was to prevent fragmentation during socket operations but it would work fine for simply pooling bytes. You'd have to modify methods that use the buffer to accept IEnumerable<ArraySegment<byte>> instead of, or in addition to byte[]

@jonmill
Copy link
Contributor Author

jonmill commented Dec 7, 2015

From the DR video (17:15) "Once we give you a buffer nobody holds a reference to this T Array"

Your "Renting" in theory the consumer is "Taking" in practice.

True, but this is an implementation detail that consumers won't know about...the use semantics should be such that the consumer can immediately tell that this buffer is not permanently theirs...it is still owned by the Pool and they are simply using it for the time being 😃

...and once you fill them how will I be able to copy them into my Dictionary without pinning the memory? How will I know when data is ready to be copied from the native memory?

That problem is for the Native Pool :) for this pool, the cost of always pinning the pool is an overhead that many cases won't want. If the your use case requires pinning then you have ability to do so, but having the arrays always pinned seems overkill.

char pools being half size of byte pools, and int half size of char, but all on same underlying data.

Indeed, there are lots of interesting things we can do with Span...I'll be sure to add you to that thread when the proposal is worked through.

You'd have to modify methods that use the buffer to accept IEnumerable<ArraySegment> instead of, or in addition to byte[]

Therein lies the problem; this Pool is going to be used inside the framework as well, and going back to rewrite all the functions that take in T[] to use ArraySegment or IEnumerable is a non-starter...there are simply too many places to update / replace those functions. Due to that, we need to have our base be T[], which will fit with all current code (in the framework as well as outside) so the Pool can just be plugged in and work.

My only concern is that you'd still have a lot of allocations if the sizes of pooled buffers varies significantly.

True, but the allocations are front-loaded when you first request one...as an implementation detail, the buckets can be lazy loaded so that your app memory pressure doesn't explode. Once you pay the upfront cost, Renting is super cheap and you save on GC passes since the objects will be long-lived and won't need to be reclaimed; that is where the real cost savings come in.

@tenor
Copy link

tenor commented Dec 8, 2015

Therein lies the problem; this Pool is going to be used inside the framework as well, and going back to rewrite all the functions that take in T[] to use ArraySegment or IEnumerable is a non-starter...there are simply too many places to update / replace those functions. Due to that, we need to have our base be T[], which will fit with all current code (in the framework as well as outside) so the Pool can just be plugged in and work.

A pooled stream can be useful here, to work with existing methods that accept a stream.
So you can have something like:
pool.GetStream(size)
or
new PooledMemoryStream(pool, size)
The pooled stream will encompass the rented byte[T] (or IEnumerable<ArraySegment<byte>> in my approach).
Disposing the pooled stream will return the buffer.
An advantage with using a stream is that the user cannot hold onto the byte array after the buffer has been returned since they don't have access to it.
There's the issue of allocations for the actual stream object and the List<ArraySegment<byte>> object holding the segments but I guess those can be pooled too.

This is one distinction between an ObjectPool and a BufferPool. You can access a BufferPool as a stream but not an ObjectPool.
I think this is worth considering as you work through the proposal.

@clrjunkie
Copy link

@sokket

True, but this is an implementation detail that consumers won't know about

It was stated afterwards that this pool "can’t leak" which led me to believe that consumer ownership of the returned buffer is actually part of the pool specification.

the use semantics should be such that the consumer can immediately tell that this buffer is not permanently theirs

I believe this is already made clear by the class being a “Pool”. While “Rent” implies the pool has ownership of the buffer to me it sounds moot because nothing actually backs this up and it doesn’t really help in avoiding leaks (which the class doesn’t anyway). Just my 2c.

the cost of always pinning the pool is an overhead that many cases won't want.

I never suggested that the pool includes pinning as part of its contract, all I said that I needed buffer pools when pinning was an issue which indeed seems like the common scenario judging by others recent posts on this thread. The DR should include at least two short non-pinning code examples (a.k.a use cases) to justify the existence of this class in the FW.

@JamesNK
Copy link
Member

JamesNK commented Dec 8, 2015

Hi

Is the "Proposed API" still the latest API? I modeled an interface for a buffer pool off this about a month ago - https://github.com/JamesNK/Newtonsoft.Json/blob/master/Src/Newtonsoft.Json/IJsonBufferPool.cs

Has the API changed at all since then? Will it change? I might as well keep mine in sync until it is released and is locked down.

@jonmill
Copy link
Contributor Author

jonmill commented Dec 8, 2015

It was stated afterwards that this pool "can’t leak" which led me to believe that consumer ownership of the returned buffer is actually part of the pool specification.

Ah I see...sorry for the confusion there. I referred to "leak" in the context of alive-but-not-GC'd, which cannot happen.

I never suggested that the pool includes pinning as part of its contract, all I said that I needed buffer pools when pinning was an issue which indeed seems like the common scenario judging by others recent posts on this thread. The DR should include at least two short non-pinning code examples (a.k.a use cases) to justify the existence of this class in the FW.

Sorry about that, my mistake. I'll add an example with Pinning :)

Has the API changed at all since then? Will it change? I might as well keep mine in sync until it is released and is locked down.

The Proposed API is not the latest...I'm close to locking down the final API now; I'm finishing up some conversations about shared vs instance, as well as around testability around the pool (internal testing and the ability to swap out the shared pool for a custom one).

Thanks for all the feedback folks. I'll be updating the above proposal with the concrete API by, hopefully, the end of the week. There's been a lot of great feedback from everyone and I have a good idea of what requirements everyone has. I'll spend some time coming up with a proposal that can benefit as many of the requirements as possible and update the proposal with the finalized API, along with usage descriptions and code samples. Thanks everyone for helping!

@ghost
Copy link

ghost commented Dec 8, 2015

Thank you @sokket, we adore your attitude. :)

@jonmill
Copy link
Contributor Author

jonmill commented Dec 14, 2015

The initial implementation has been merged. Thanks for the feedback everyone!

@jonmill jonmill closed this as completed Dec 14, 2015
marfenij referenced this issue in marfenij/protobuf-net-data Feb 16, 2018
stream.CopyTo work incorrect, this bug occurce aftter https://github.com/dotnet/corefx/issues/4547 introduced
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 1.0.0-rc2 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Jan 4, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests