-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(allocator): Introduce scoped allocator #9230
Conversation
**Related issue:** - This PR is a part of #9230.
**Description:** This is a part of #9230. I profiled the performance, and `thread_local` took too long to get the address of the thread-local variable. So, I inlined the reference into the allocator. # Benchmark result ``` Gnuplot not found, using plotters backend common/allocator/alloc/std/1000000 time: [4.9478 ms 4.9653 ms 4.9922 ms] Found 17 outliers among 100 measurements (17.00%) 4 (4.00%) high mild 13 (13.00%) high severe common/allocator/alloc/no-scope/1000000 time: [5.4821 ms 5.4938 ms 5.5068 ms] Found 17 outliers among 100 measurements (17.00%) 2 (2.00%) high mild 15 (15.00%) high severe common/allocator/alloc/scoped/1000000 time: [3.1401 ms 3.1456 ms 3.1518 ms] Found 12 outliers among 100 measurements (12.00%) 3 (3.00%) high mild 9 (9.00%) high severe common/allocator/alloc/cached-no-scope/1000000 time: [5.0992 ms 5.1090 ms 5.1198 ms] Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) high mild 9 (9.00%) high severe common/allocator/alloc/cached-scoped/1000000 time: [3.0191 ms 3.0230 ms 3.0273 ms] Found 11 outliers among 100 measurements (11.00%) 2 (2.00%) low mild 1 (1.00%) high mild 8 (8.00%) high severe ```
**Description:** This PR is a part of #9230
Doesn't this break rusts safety guarantees or am i missing something? |
Yeah, just had a conversation with @kdy1 about this. The safety problem boils down to this line: swc/crates/swc_allocator/src/alloc.rs Lines 36 to 39 in d2e96bf
I think this can be mitigated, though it might be hard and require more API changes:
|
Marking the scoped api unsafe would be correct but would give the user of the api the impossible task of verifying that no code within the callback violate the safety comment. Have you considered reference counting the allocator? With If you are ok with the Ast being Sync but not Send you cold even get away with using I made an attempt at implementing it for fun: use allocator_api2::alloc::{AllocError, Allocator, Layout};
use fastalloc::FastAlloc;
use std::{cell::Cell, ptr::NonNull, thread};
// The module containing FastAlloc is intentionally as small as possible to make it easy to verify the safety comment.
mod fastalloc {
use bumpalo::Bump;
use std::rc::Rc;
#[derive(Default)]
pub struct FastAlloc(Rc<Bump>);
impl FastAlloc {
pub fn clone(&mut self) -> FastAlloc {
FastAlloc(self.0.clone())
}
pub fn alloc(&self) -> &Bump {
self.0.as_ref()
}
}
// Safety:
// The Rc in FastAlloc can only be cloned by first getting a mutable reference to FastAlloc.
// Getting a mutable reference to FastAlloc on another thread is impossible because FastAlloc doesn't implement Send.
unsafe impl Sync for FastAlloc {}
}
// Safety:
// Implementation is delegated to Bump
unsafe impl Allocator for FastAlloc {
fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
self.alloc().allocate(layout)
}
unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
self.alloc().deallocate(ptr, layout)
}
fn allocate_zeroed(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
self.alloc().allocate(layout)
}
unsafe fn grow(
&self,
ptr: NonNull<u8>,
old_layout: Layout,
new_layout: Layout,
) -> Result<NonNull<[u8]>, AllocError> {
self.alloc().grow(ptr, old_layout, new_layout)
}
unsafe fn grow_zeroed(
&self,
ptr: NonNull<u8>,
old_layout: Layout,
new_layout: Layout,
) -> Result<NonNull<[u8]>, AllocError> {
self.alloc().grow_zeroed(ptr, old_layout, new_layout)
}
unsafe fn shrink(
&self,
ptr: NonNull<u8>,
old_layout: Layout,
new_layout: Layout,
) -> Result<NonNull<[u8]>, AllocError> {
self.alloc().shrink(ptr, old_layout, new_layout)
}
}
thread_local! {
static ALLOC: Cell<Option<FastAlloc>> = const { Cell::new(None) };
}
fn scope(f: impl FnOnce()) {
ALLOC.set(Some(FastAlloc::default()));
f();
ALLOC.set(None);
}
fn batch_alloc<T>(f: impl FnOnce(&mut FastAlloc) -> T) -> T {
let mut alloc: FastAlloc = ALLOC.take().unwrap(); // Fallback to another allocator instead of panicking if desired.
let result = f(&mut alloc);
ALLOC.set(Some(alloc));
result
}
#[derive(Debug)]
struct Box<T>(allocator_api2::boxed::Box<T, FastAlloc>);
impl<T> Box<T> {
fn new(val: T) -> Self {
batch_alloc(|alloc| Self::new_in(val, alloc))
}
fn new_in(val: T, alloc: &mut FastAlloc) -> Self {
Box(allocator_api2::boxed::Box::new_in(val, alloc.clone()))
}
}
#[derive(Debug)]
enum LinkedList<T> {
Empty,
Cons(T, Box<LinkedList<T>>),
}
fn main() {
scope(|| {
let list = LinkedList::Empty;
let list = LinkedList::Cons(3, Box::new(list));
let list = LinkedList::Cons(2, Box::new(list));
let list = LinkedList::Cons(1, Box::new(list));
// Taking and resetting ALLOC might be expensive so i provided a batch api.
// I have not profiled anything so it might not be needed.
let faster_list = batch_alloc(|alloc| {
let list = LinkedList::Empty;
let list = LinkedList::Cons(3, Box::new_in(list, alloc));
let list = LinkedList::Cons(2, Box::new_in(list, alloc));
let list = LinkedList::Cons(1, Box::new_in(list, alloc));
list
});
// Even though Box can't be sent to another thread. Sending shared references to other threads is fine.
thread::scope(|s| {
s.spawn(|| dbg!(&list));
s.spawn(|| dbg!(&list));
s.spawn(|| dbg!(&faster_list));
s.spawn(|| dbg!(&faster_list));
});
});
} |
I already tried |
Closing as this is an issue more than a PR |
Description:
We don't need to add lifetimes to everywhere!
It's 16 bytes to avoid the cost of thread-local accesses.swc_allocator::boxed::Box<T>
isstd::boxed::Box<T>
orbumpalo::boxed::Box<T>
, but in a compact form by using pointer magic. The size ofBox<T>
is 8 bytes.scoped_tls
.FastAlloc
will be used to cache the context because accessing a thread-local variable is an expensive operation.FastAlloc
instead ofBox::new()
wherever possible.swc_allocator/scoped
.Box<T>
andVec<T>
should work identically as ones fromstd
.swc_allocator::scope()
that enables allocations usingbumpalo
.swc_core
should reexport the feature.swc_core
should reexport the allocator crate.@swc/core
Box<T>
allocated from another thread.