Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(allocator): Introduce scoped allocator #9230

Closed
wants to merge 65 commits into from
Closed

Conversation

kdy1
Copy link
Member

@kdy1 kdy1 commented Jul 13, 2024

Description:

We don't need to add lifetimes to everywhere!

  • swc_allocator::boxed::Box<T> is std::boxed::Box<T> or bumpalo::boxed::Box<T>, but in a compact form by using pointer magic. The size of Box<T> is 8 bytes. It's 16 bytes to avoid the cost of thread-local accesses.
  • How it is allocated is determined based on the context. Namely, using scoped_tls.
  • FastAlloc will be used to cache the context because accessing a thread-local variable is an expensive operation.
  • Parser should use FastAlloc instead of Box::new() wherever possible.
  • It should be behind a feature flag. swc_allocator/scoped.
  • If the feature is disabled, Box<T> and Vec<T> should work identically as ones from std.
  • If the features is enabled, swc_allocator::scope() that enables allocations using bumpalo.
  • swc_core should reexport the feature.
  • swc_core should reexport the allocator crate.
  • Document downside: User should use same allocator while allocating/deallocating using scoped API.
  • Document that recommendation is enabling it only for fully single threaded tasks.
  • Use it from @swc/core
  • Test multi-threaded usecase. Namely, there should be a test about dropping Box<T> allocated from another thread.
  • [ ]

kdy1 added a commit that referenced this pull request Jul 14, 2024
**Related issue:**

 - This is a part of #9230
kdy1 added a commit that referenced this pull request Jul 14, 2024
**Related issue:**

 - This PR is a part of #9230.
kdy1 added a commit that referenced this pull request Jul 14, 2024
**Description:**

This is a part of #9230. I
profiled the performance, and `thread_local` took too long to get the
address of the thread-local variable. So, I inlined the reference into
the allocator.

# Benchmark result

```

Gnuplot not found, using plotters backend
common/allocator/alloc/std/1000000
                        time:   [4.9478 ms 4.9653 ms 4.9922 ms]
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe

common/allocator/alloc/no-scope/1000000
                        time:   [5.4821 ms 5.4938 ms 5.5068 ms]
Found 17 outliers among 100 measurements (17.00%)
  2 (2.00%) high mild
  15 (15.00%) high severe

common/allocator/alloc/scoped/1000000
                        time:   [3.1401 ms 3.1456 ms 3.1518 ms]
Found 12 outliers among 100 measurements (12.00%)
  3 (3.00%) high mild
  9 (9.00%) high severe

common/allocator/alloc/cached-no-scope/1000000
                        time:   [5.0992 ms 5.1090 ms 5.1198 ms]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe

common/allocator/alloc/cached-scoped/1000000
                        time:   [3.0191 ms 3.0230 ms 3.0273 ms]
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) low mild
  1 (1.00%) high mild
  8 (8.00%) high severe
```
kdy1 added a commit that referenced this pull request Jul 15, 2024
**Description:**

This PR is a part of #9230
kdy1 added a commit that referenced this pull request Jul 15, 2024
**Description:**

In the default mode, `Box<T>` and `Vec<T>` should work just like them from std.

**Related issue:**

 - This PR is part of #9230
@GabbeV
Copy link

GabbeV commented Jul 15, 2024

Doesn't this break rusts safety guarantees or am i missing something?
What prevents me from dereferencing a Box in another thread or after the scope has ended?

@bgw
Copy link

bgw commented Jul 15, 2024

Doesn't this break rusts safety guarantees or am i missing something? What prevents me from dereferencing a Box in another thread or after the scope has ended?

Yeah, just had a conversation with @kdy1 about this. The safety problem boils down to this line:

let s = unsafe {
// Safery: We are using a scoped API
transmute::<&'a Allocator, &'static Allocator>(self)
};

I think this can be mitigated, though it might be hard and require more API changes:

  • Optional (but on-by-default?) runtime checks in debug to detect escapes when they happen. This may require adding some guard objects and refcounts.
  • Marking the scoped API as unsafe so that people are required to make their own safety assertions.

@GabbeV
Copy link

GabbeV commented Jul 15, 2024

Marking the scoped api unsafe would be correct but would give the user of the api the impossible task of verifying that no code within the callback violate the safety comment.

Have you considered reference counting the allocator?

With Arc<Bump> instead of &'static Bump you should be able to avoid lifetime completely.

If you are ok with the Ast being Sync but not Send you cold even get away with using Rc<Bump> if the overhead of atomics turn out to be too high.

I made an attempt at implementing it for fun:

use allocator_api2::alloc::{AllocError, Allocator, Layout};
use fastalloc::FastAlloc;
use std::{cell::Cell, ptr::NonNull, thread};

// The module containing FastAlloc is intentionally as small as possible to make it easy to verify the safety comment.
mod fastalloc {
    use bumpalo::Bump;
    use std::rc::Rc;

    #[derive(Default)]
    pub struct FastAlloc(Rc<Bump>);

    impl FastAlloc {
        pub fn clone(&mut self) -> FastAlloc {
            FastAlloc(self.0.clone())
        }

        pub fn alloc(&self) -> &Bump {
            self.0.as_ref()
        }
    }

    // Safety:
    // The Rc in FastAlloc can only be cloned by first getting a mutable reference to FastAlloc.
    // Getting a mutable reference to FastAlloc on another thread is impossible because FastAlloc doesn't implement Send.
    unsafe impl Sync for FastAlloc {}
}

// Safety:
// Implementation is delegated to Bump
unsafe impl Allocator for FastAlloc {
    fn allocate(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        self.alloc().allocate(layout)
    }

    unsafe fn deallocate(&self, ptr: NonNull<u8>, layout: Layout) {
        self.alloc().deallocate(ptr, layout)
    }

    fn allocate_zeroed(&self, layout: Layout) -> Result<NonNull<[u8]>, AllocError> {
        self.alloc().allocate(layout)
    }

    unsafe fn grow(
        &self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<NonNull<[u8]>, AllocError> {
        self.alloc().grow(ptr, old_layout, new_layout)
    }

    unsafe fn grow_zeroed(
        &self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<NonNull<[u8]>, AllocError> {
        self.alloc().grow_zeroed(ptr, old_layout, new_layout)
    }

    unsafe fn shrink(
        &self,
        ptr: NonNull<u8>,
        old_layout: Layout,
        new_layout: Layout,
    ) -> Result<NonNull<[u8]>, AllocError> {
        self.alloc().shrink(ptr, old_layout, new_layout)
    }
}

thread_local! {
    static ALLOC: Cell<Option<FastAlloc>> = const { Cell::new(None) };
}

fn scope(f: impl FnOnce()) {
    ALLOC.set(Some(FastAlloc::default()));
    f();
    ALLOC.set(None);
}

fn batch_alloc<T>(f: impl FnOnce(&mut FastAlloc) -> T) -> T {
    let mut alloc: FastAlloc = ALLOC.take().unwrap(); // Fallback to another allocator instead of panicking if desired.
    let result = f(&mut alloc);
    ALLOC.set(Some(alloc));
    result
}

#[derive(Debug)]
struct Box<T>(allocator_api2::boxed::Box<T, FastAlloc>);

impl<T> Box<T> {
    fn new(val: T) -> Self {
        batch_alloc(|alloc| Self::new_in(val, alloc))
    }

    fn new_in(val: T, alloc: &mut FastAlloc) -> Self {
        Box(allocator_api2::boxed::Box::new_in(val, alloc.clone()))
    }
}

#[derive(Debug)]
enum LinkedList<T> {
    Empty,
    Cons(T, Box<LinkedList<T>>),
}

fn main() {
    scope(|| {
        let list = LinkedList::Empty;
        let list = LinkedList::Cons(3, Box::new(list));
        let list = LinkedList::Cons(2, Box::new(list));
        let list = LinkedList::Cons(1, Box::new(list));

        // Taking and resetting ALLOC might be expensive so i provided a batch api.
        // I have not profiled anything so it might not be needed.
        let faster_list = batch_alloc(|alloc| {
            let list = LinkedList::Empty;
            let list = LinkedList::Cons(3, Box::new_in(list, alloc));
            let list = LinkedList::Cons(2, Box::new_in(list, alloc));
            let list = LinkedList::Cons(1, Box::new_in(list, alloc));
            list
        });

        // Even though Box can't be sent to another thread. Sending shared references to other threads is fine.
        thread::scope(|s| {
            s.spawn(|| dbg!(&list));
            s.spawn(|| dbg!(&list));
            s.spawn(|| dbg!(&faster_list));
            s.spawn(|| dbg!(&faster_list));
        });
    });
}

@kdy1
Copy link
Member Author

kdy1 commented Jul 15, 2024

I already tried Arc, but it was too slow.

@kdy1
Copy link
Member Author

kdy1 commented Nov 7, 2024

Closing as this is an issue more than a PR

@kdy1 kdy1 closed this Nov 7, 2024
@kdy1 kdy1 deleted the boxed branch November 7, 2024 07:01
@kdy1 kdy1 modified the milestones: Planned, v1.9.2 Nov 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants