Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add Allocator type param to MutableBuffer #6336

Closed
wants to merge 14 commits into from

Conversation

waynexia
Copy link
Member

Which issue does this PR close?

Part of #3960

Rationale for this change

I find #3960 for the same reason -- tracking memory consumption with Arrays. The most straightforward way is to use the unstable Allocator API.

By adding a decoration layer to Allocator we can keep track of the real-time memory consumption of each container instance. There is a tiny tool that implements this https://github.com/waynexia/unkai

What changes are included in this PR?

The most important changes in this PR are

  • add a feature gate allocator_api, same name as the unstable rust feature
  • add an optional type parameter A: Allocator = Global to MutableBuffer like Vec

Since this requires an unstable rust feature, allocator_api can only be used in the nightly toolchain. To keep this library still working in the stable toolchain when the feature gate is disabled, this PR defines many dummy structs like Global and Allocator to substitute those from std lib as they are not referencable without unstable feature.

When allocator_api is enabled, users can either appoint their Allocator by new APIs new_in and with_capacity_in or invoking From<Vec<T, A>> which will inherit the allocator from vec.

This PR is tested with MIRI:

cargo +nightly miri nextest run -p arrow-buffer -F allocator_api
cargo +nightly miri nextest run -p arrow-buffer

Are there any user-facing changes?

New APIs as described above

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@github-actions github-actions bot added the arrow Changes to the arrow crate label Aug 30, 2024
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
arrow-buffer/src/buffer/mutable.rs Outdated Show resolved Hide resolved
return;
}

#[cfg(feature = "allocator_api")]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The entire MutableBuffer has been gated under allocator_api. Do we still need it?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You may misread this. Only those new public APIs are limited under the feature gate.

@@ -28,6 +31,23 @@ use crate::{

use super::Buffer;

#[cfg(not(feature = "allocator_api"))]
pub trait Allocator: private::Sealed {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At first glance, it's somewhat confusing to define a sealed trait here. Maybe add some comments here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added, please take a look

arrow-buffer/src/buffer/mutable.rs Show resolved Hide resolved
@@ -28,6 +31,23 @@ use crate::{

use super::Buffer;

#[cfg(not(feature = "allocator_api"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if it's a good idea to introduce allocator-api2 as a compatibility layer?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your information. It looks like allocator-api2 provides many things we don't need. And I'm hesitant to add a new dep for the normal case. We can add it in the future if it turns out that allocator-api2 suits our use case well. What we need here is

  • Placeholders to standard library things that are not available without unstable features (Allocator and Global)
  • Wrapped alloc and dealloc methods

As for now we can define them by ourselves in few lines.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Copy link
Member

@Xuanwo Xuanwo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you very much for your work. Overall, it looks good to me.

By the way, I still believe introducing allocator_api2 is a good idea, as it eliminates the need for a feature that requires nightly Rust.

The decision is up to you.


This PR also makes me wonder if we can introduce an allocator for all similar APIs.

@waynexia
Copy link
Member Author

Thanks for your review @Xuanwo

This PR also makes me wonder if we can introduce an allocator for all similar APIs.

I plan to add this for Buffer as well for the next step

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @waynexia and @Xuanwo for the reviews

I think this idea is quite cool 🙏 However, in order to get it ready to merge a few more things are needed:

  1. Documentation (specifically document the new feature here https://crates.io/crates/arrow)
  2. Tests

In terms of testing, what I think would be the most useful would be an "end to end" test / example of how to use this feature. For example to create a MutableBuffer with a custom allocator and report on the allocations performed or something.

Perhaps we could add such an example to the examples directory https://github.com/apache/arrow-rs/tree/master/arrow/examples and add a pointer to that that to the documentation?

@@ -61,8 +61,8 @@ jobs:
submodules: true
- name: Setup Rust toolchain
uses: ./.github/actions/setup-builder
- name: Test arrow-buffer with all features
run: cargo test -p arrow-buffer --all-features
- name: Test arrow-buffer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found it confusing at first why this line doesn't follow the model of the rest of the tests in this workflow which run with --all-features

I am also concerned this may remove coverage for some of the other features such as prettyprint, ffi and pyarrow

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

arrow-buffer only has one feature gate allocator_api from this PR. This specific change won't affect other tests like prettyprint as the command -p arrow-buffer only runs on this sub-crate. But I agree it's easy to forget when the new feature comes in the future... Sadly, cargo seems not to support opt-out one feature in CLI 😢

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh, the same problem comes to the main arrow lib because I added a delegate feature allocator_api = ["arrow-buffer/allocator_api"]. Looks like we have to enumerate all available features except allocator_api in CI?

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia
Copy link
Member Author

waynexia commented Sep 2, 2024

Thanks for reviewing @alamb. I've created a new example allocator_api.rs under examples/, and a paragraph in the doc comment of MutableBuffer that describes the usage and links to the example.

For example to create a MutableBuffer with a custom allocator and report on the allocations performed or something.

I tried to use unkai at first to demo the ability to track memory usage but then I realized I can't add an optional dev-dependence. An unconditional dep to unkai will break other examples without a nightly toolchain. So the example only uses Global. (maybe I should add a similar feature gate to unkai?)

#[cfg(not(feature = "allocator_api"))]
allocator: Global,
#[cfg(feature = "allocator_api")]
allocator: *value.allocator(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The From impl would benefit from having a type parameter for the allocator, otherwise the allocator here would always implicitly be the global one. That probably requires two separate impl blocks depending on the feature flag.

The same would be useful for Buffer::from_vec(Vec).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will default to Global without allocator_api, and inherit from the vector via Vec::allocator() API. Is this behavior ideal?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got what you mean https://github.com/apache/arrow-rs/pull/6336/files#diff-371342744df1b634b0bd9d90f4fe38c1eb0096df322fd3cc2fbc513f3428046cR692-R696

We should have two impl blocks indeed for different type parameters

@jhorstmann
Copy link
Contributor

An unconditional dep to unkai will break other examples without a nightly toolchain. So the example only uses Global. (maybe I should add a similar feature gate to unkai?)

A minimal allocator implementation that tracks the memory usage shouldn't be too big to include directly in the example, without dependencies.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@alamb
Copy link
Contributor

alamb commented Sep 18, 2024

I am depressed about the large review backlog in this crate. We are looking for more help from the community reviewing PRs -- see #6418 for more

@jhorstmann
Copy link
Contributor

I plan to look into this in more detail as it might be useful in our codebase, which makes good use of a custom allocator to track memory usage.

A unit test with a custom allocator, to be run with miri, would be very nice. I have a concern that after turning the MutableBuffer into a Buffer it would not be freed with the correct allocator. Adding a generic parameter to Buffer would affect a lot more apis, maybe it could use the existing Deallocation::Custom strategy somehow.

Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
Signed-off-by: Ruihang Xia <waynestxia@gmail.com>
@waynexia
Copy link
Member Author

I have a concern that after turning the MutableBuffer into a Buffer it would not be freed with the correct allocator. Adding a generic parameter to Buffer would affect a lot more apis, maybe it could use the existing Deallocation::Custom strategy somehow.

Good catch 👍 I've limited those freeze() APIs to Global alloc in 6b28b10. Now it should not be possible to construct a Buffer with memory from other allocators. I'll add this functionalities back when implementing allocator param for Buffer in the follow up PR.

A unit test with a custom allocator, to be run with miri, would be very nice.

New test mutable_buffer_with_custom_allocator in 02d73ad. There is a script .github/workflows/miri.sh to run it

Copy link
Contributor

@alamb alamb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand this API correctly that if a user directly creates Arrays from a Vec then they could provide their own allocator, but otherwise the default allocator will be used?

I am thinking about a usecase where

  1. A user creates a new Array that uses a custom allocator
  2. Then calls some kernel like take or filter that creates a new Array

It seems like the new Array will use the default system allocator rather than the custom allocator which will limit the effectiveness of this approach

Would it be possible with this API to support changing the global allocator for all new Arrays that are created?

100,
allocator_tracker.clone(),
);
let mut buffer = MutableBuffer::from(vector);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very cool

@jhorstmann
Copy link
Contributor

Now it should not be possible to construct a Buffer with memory from other allocators. I'll add this functionalities back when implementing allocator param for Buffer in the follow up PR.

Would it be possible to open that followup PR already in draft mode? I think it would be easier to review the impact of this PR if I could already see how it would affect the Buffer and Array apis. Would the concrete Array implementations also get a type parameter for the allocator? Could an array contains buffers from different allocators?

@waynexia
Copy link
Member Author

It seems like the new Array will use the default system allocator rather than the custom allocator which will limit the effectiveness of this approach

Yes, it will be Global without additional code changes. This is the downside of this Allocator trait 😢

Would it be possible with this API to support changing the global allocator for all new Arrays that are created?

Still take unkai as an example, it also provides one global allocator wrapper for similar things here, which I think would be a common pattern if one wants global memory usage tracking. But on the other hand, the global style can't tell the usage per object (or per session, of course).

I don't wish for the full support of Allocator type param in even one or two releases. But with this patch as the beginning, we can expect more cases to use it like in #6439. Or tracking & limiting memory consumption of one write buffer (a group of mutable buffers in GreptimeDB).

Would it be possible to open that followup PR already in draft mode?

Here it is #6455, I only changed the into_buffer() which could demonstrate the conversion between mutable buffer to immutable buffer (with some new miri test). As for the type param, since we have our own Bytes definition with trait object, I think we don't need to add new param to them like in MutableBuffer.

@tustvold
Copy link
Contributor

tustvold commented Sep 26, 2024

I've not had time to look at this PR in detail, but my initial response is to echo the concern Andrew raised regarding the general utility of this abstraction. As I understand it this mechanism would require making every codepath that allocates arrays be generic on an Allocator and take an Allocator as a parameter. Even putting aside the fact that this would only be possible on nightly, this is a monumental amount of work effectively doubling the API surface.

Taking a step back, memory allocation in arrow should rarely be on a hot path. As such having monomorphised access to the allocator implementation is perhaps not as important as for say a standard library collection type, we can afford to have a moderate additional overhead per allocation.

Additionally the proposed use-cases appear to largely center around memory tracking, as opposed to altering the behaviour of the system allocator. As such we could potentially just store a std::sync::Weak<AtomicUsize> in thread local storage, with allocations decrementing / incrementing this as appropriate. Implementations can then override to capture allocations performed in that context. Some care would be needed in async contexts, e.g. by wrapping spawned futures, but nothing overly intrusive.

If this sounds like an avenue worth exploring I can probably find time to write the idea up in more detail / file a PR with this

@waynexia
Copy link
Member Author

As I understand it this mechanism would require making every codepath that allocates arrays be generic on an Allocator and take an Allocator as a parameter.

IMO this is not that horrible in two aspects:

  • I suspect the type param on struct would only reside on MutableBuffer for a long time, maybe until the Allocator gets stable. From Buffer and above the type param is erased (see Allocator api for immutable Buffer #6455). Like how we interact with Vec which may or may not belong to a customized allocator: nothing is wrong if we just ignore it, only a "feature" is missing.
  • We can gain great progress after the major mechanisms of constructing an array like through Vec or MutableBuffer are handled.

There is still much work to do to adapt those methods. But they are not that tightly coupled together like MutableBuffer and Buffer.

Additionally the proposed use-cases appear to largely center around memory tracking, as opposed to altering the behaviour of the system allocator. As such we could potentially just store a std::sync::Weak in thread local storage, with allocations decrementing / incrementing this as appropriate.

If we don't need per-object tracking, simply replace the global allocator as @alamb mentioned before should be okay. And this is already viable in the current version as we don't need to do any change in the arrow side.

And considering the arrow is a large group of libs, we don't have to take this allocator feature as "can or can't" for the entire project. It is acceptable from my side that this feature is only available in a few low-level APIs like arrow-buffer and isn't in others like arrow-compute in a specific period.

@tustvold
Copy link
Contributor

and isn't in others like arrow-compute in a specific period.

This is the concern, by going down this path are we signing up for a long slog that we already suspect isn't going to ever be finished?

I think we should approach this from first principles, specifically what are we trying to achieve, e.g. allow tracking per-query memory usage reliably, and then discuss and propose concrete designs to achieve this. As it stands this PR adds some functionality without really articulating a practical vision for how this might be used in a downstream library like datafusion to achieve a stated goal

@alamb
Copy link
Contributor

alamb commented Sep 28, 2024

If we don't need per-object tracking, simply replace the global allocator as @alamb mentioned before should be okay. And this is already viable in the current version as we don't need to do any change in the arrow side.

I think we should approach this from first principles, specifically what are we trying to achieve, e.g. allow tracking per-query memory usage reliably, and then discuss and propose concrete designs to achieve this. As it stands this PR adds some functionality without really articulating a practical vision for how this might be used in a downstream library like datafusion to achieve a stated goal

I agree. I think we should continue the big picture discussion on #3960

@tustvold
Copy link
Contributor

Alternate proposal - #6590

@tustvold
Copy link
Contributor

Closing as I think the consensus is to proceed with an approach based around #6590

@tustvold tustvold closed this Nov 29, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants