feat: use bucket mutex instead of global mutex for dynamic set #268

Boshen · 2023-02-16T15:31:36Z

Background:

I'm working a concurrent program using swc to parse files in parallel. swc uses string-cache.

I have found the global mutex in the dynamic set being the major performance bottleneck for our application. Using instruments as the profiler, we can see that string-cache contributes to 11.4% (9.6s) of the time:

By switch to a a bucket level mutex (this PR), string-cache drops to 1.6% (1.04s) of the time:

For this PR, can I kindly ask for a code review even if this PR cannot be merged due to any kind of circumstances, so I can safely patch my application with a fork, or help me polish this so everyone using swc can benefit from this PR.

Thank you all in advance. 🍻

Manishearth · 2023-02-16T16:05:18Z

cc @jdm @mrobinson if either of you are familiar with the guts of this crate and what the perf implications would be here

(I could try and figure this out but it would take me a while; I don't have that much time rn)

YoniFeng · 2023-02-16T21:50:25Z

(Just curious, feel free to ignore - I have no Rust or general low-level programming expertise)

I'm failing to imagine how this can have negative perf implications.
I looked around for concurrent HashMap implementations, locking per-bucket seems standard (Folly's ConcurrentHashMap, oneTBB concurrent_hash_map, .NET's ConcurrentDictionary).

froydnj · 2023-02-16T21:53:17Z

FWIW, we adopted this strategy in Firefox's atom table (which is essentially the same thing as this crate), and noticed no ill effects (as well as significant performance gains for parallel modification).

jdm · 2023-02-17T04:58:20Z

I read through the changes and they look sensible to me. I have one question: if the focus of this PR is about switching from a global mutex to a mutex per bucket, why the switch from what looks like a linked list implementation to a vector instead? I don't see a reason that that change is required for the mutex one; am I missing anything?

Boshen · 2023-02-17T05:14:55Z

I read through the changes and they look sensible to me. I have one question: if the focus of this PR is about switching from a global mutex to a mutex per bucket, why the switch from what looks like a linked list implementation to a vector instead? I don't see a reason that that change is required for the mutex one; am I missing anything?

It was because I had trouble with adding Mutex onto the linked list. Now you have brought this up, should I switch it to std::collections::LinkedList?

Boshen · 2023-02-17T05:26:48Z

Or maybe std::collections::VecDeque? Given the LinkedList specifically doc notes

NOTE: It is almost always better to use Vec or VecDeque because array-based containers are generally faster, more memory efficient, and make better use of CPU cache.

Boshen · 2023-02-17T06:45:19Z

I profiled the three implementations (hand-rolled linked list vs VecDeque vs Vec). Vec seems to have the best performance.

Boshen · 2023-02-17T07:47:15Z

This Vec version seems flawed. I ran my application for a few dozen times and it occasionally reported false postives (not getting the correct string), but I can't find where the bug is.

Let me switch this PR back to the slower but correct hand-rolled linked list version.
And we can iterate from here.

Boshen · 2023-02-17T08:12:22Z

@jdm Would you like to take another look?

YoniFeng · 2023-02-17T15:05:21Z

This Vec version seems flawed. I ran my application for a few dozen times and it occasionally reported false postives (not getting the correct string), but I can't find where the bug is.

I don't understand what the "reported false positives" means. However, I did notice a subtle difference between your Vec implementation and the existing, so maybe this helps:
When inserting, in the case where entry.ref_count.fetch_add(1, SeqCst) returns 0, you push the new entry to the vector (i.e. to the back/end) whereas the existing code adds the entry to the head of the linked list.
This means it's possible for the next insert() usage to again find() the entry that is about to be removed, and we'll end up inserting another duplicate entry (i.e. an unintended 3rd, 4th etc.).
With the existing implementation, the next insert() is guaranteed to find() the new entry that was added since it's at the front.

If this can explain the issue you saw with the Vec version, I think you should feel comfortable using a VecDeque and using push_front() (for the better performance).

YoniFeng · 2023-02-19T15:06:56Z

src/dynamic_set.rs

            hash,
            ref_count: AtomicIsize::new(1),
            string: string.into_boxed_str(),
        });
        let ptr = NonNull::from(&mut *entry);
-        self.buckets[bucket_index] = Some(entry);
-
+        vec.push(entry);


The existing code adds the entry to the head of the linked list.
This opens the possibility for the next insert() usage to again find() the entry that is about to be removed, and we'll end up inserting another duplicate entry (i.e. an unintended 3rd, 4th etc.).
With the existing implementation, the next insert() is guaranteed to find() the new entry that was added (and increment its ref count) since it's at the front.

jdm

Thanks for reducing the scope of this change! Would you mind squashing these commits?

This implementation uses bucket level mutex with linear probing.

Boshen · 2023-02-20T01:33:12Z

Thanks for reducing the scope of this change! Would you mind squashing these commits?

@jdm Done.

jdm · 2023-02-20T13:57:04Z

@bors-servo r+

bors-servo · 2023-02-20T13:57:06Z

📌 Commit b473a4a has been approved by jdm

bors-servo · 2023-02-20T13:57:10Z

⌛ Testing commit b473a4a with merge 51743a7...

bors-servo · 2023-02-20T13:58:58Z

☀️ Test successful - checks-github
Approved by: jdm
Pushing 51743a7 to master...

kdy1 · 2023-02-22T10:42:48Z

Can you publish a new version, please?

mrobinson · 2023-02-22T10:53:11Z

Can you publish a new version, please?

I've opened a PR to bump the version here: #270

kdy1 · 2023-02-22T10:55:19Z

Thank you!

mrobinson · 2023-02-22T13:46:24Z

https://crates.io/crates/string-cache/0.8.5 is now available on crates.io.

**Description:** I want to check the performance difference caused by servo/string-cache#268.

Revert trivial impl of Borrow<str> for Atom #266 caused a breaking change (see #271) in 0.8.5 (which was yanked). This PR fixes #271 / will allow publishing new versions (so #268 can make it out). I've just started learning Rust, haven't been able to 100% wrap my head around the issue. The breaking change manifesting in the "browserlist-rs" crate: ```rust error[E0283]: type annotations needed --> {.. snip ..}\browserslist-rs-0.12.3\src\data\caniuse.rs:91:35 | 91 | let chrome = CANIUSE_BROWSERS.get(&"chrome".into()).unwrap(); | ^^^ ---------------- type must be known at this point | | | cannot infer type of the type parameter `Q` declared on the associated function `get` | = note: multiple `impl`s satisfying `Atom<BrowserNameAtomStaticSet>: Borrow<_>` found in the following crates: `core`, `string_cache`: - impl<Static> Borrow<str> for Atom<Static> where Static: StaticAtomSet; - impl<T> Borrow<T> for T where T: ?Sized; note: required by a bound in `AHashMap::<K, V, S>::get` --> {.. snip ..}\ahash-0.7.6\src\hash_map.rs:81:12 | 81 | K: Borrow<Q>, | ^^^^^^^^^ required by this bound in `AHashMap::<K, V, S>::get` help: consider specifying the generic argument | 91 | let chrome = CANIUSE_BROWSERS.get::<Q>(&"chrome".into()).unwrap(); | +++++ ``` `CANIUSE_BROWSERS` is a ```rust AHashMap<BrowserNameAtom, BrowserStat> ``` where `BrowserNameAtom` is an Atom generated with `string_cache_codegen`. The signature of the `get` function: ```rust pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V> where K: Borrow<Q>, Q: Hash + Eq, { self.0.get(k) } ``` `K` is the generic key type for the hash map. This is the exact same signature as the std lib HashMap, to which it delegates. When I debug 0.8.4 usage locally, the `From` that gets used for the `into()` is ```rust impl<'a, Static: StaticAtomSet> From<&'a str> for Atom<Static> { #[inline] fn from(string_to_add: &str) -> Self { Atom::from(Cow::Borrowed(string_to_add)) } } ``` It isn't clear to me how the `Borrow<str>` impl that was added "competes" with the std lib's implementation here. It doesn't seem relevant?

fix: move debug_assert check Fix needed after #268 The fix is only for debug_assert correctness. There's no functional effect. i.e. version 0.8.6 might cause debug tests to fail, but it isn't a (functionally) breaking change. **Explanation** Moving the lock to be per-bucket changed the `DYNAMIC_SET`'s API so that it doesn't need to be locked (i.e. `DYANMIC_SET` is not wrapped with a `Mutex`). The `Atom`'s `fn drop` changed from ```rust fn drop(&mut self) { if self.tag() == DYNAMIC_TAG { let entry = self.unsafe_data.get() as *const Entry; if unsafe { &*entry }.ref_count.fetch_sub(1, SeqCst) == 1 { drop_slow(self) } } // Out of line to guide inlining. fn drop_slow<Static>(this: &mut Atom<Static>) { DYNAMIC_SET .lock() .remove(this.unsafe_data.get() as *mut Entry); } } ``` to ```rust impl<Static> Drop for Atom<Static> { #[inline] fn drop(&mut self) { if self.tag() == DYNAMIC_TAG { let entry = self.unsafe_data.get() as *const Entry; if unsafe { &*entry }.ref_count.fetch_sub(1, SeqCst) == 1 { drop_slow(self) } } // Out of line to guide inlining. fn drop_slow<Static>(this: &mut Atom<Static>) { DYNAMIC_SET.remove(this.unsafe_data.get() as *mut Entry); } } ``` (the `lock()` is gone) Now when we enter `DYNAMIC_SET.remove()`, there's no longer a lock guarantee. Until we lock the bucket, another thread could be in the middle of performing a `DYNAMIC_SET.insert()` for the same string. Therefore, the `debug_assert!(value.ref_count.load(SeqCst) == 0)` is premature - it needs to occur after we take the lock for the bucket. Caught at swc-project/swc#6980 in a [failed test](https://github.com/swc-project/swc/actions/runs/4326536698/jobs/7554083939).

This was referenced Feb 17, 2023

perf: patch string-cache with dashmap version swc-project/swc#6937

Closed

perf: replace string-cache with bucket-level version web-infra-dev/rspack#1879

Merged

perf: replace string-cache's global HashMap with DashMap web-infra-dev/rspack#1833

Closed

hyf0 mentioned this pull request Feb 17, 2023

Use better string type swc-project/swc#4946

Closed

6 tasks

YoniFeng reviewed Feb 19, 2023

View reviewed changes

jdm approved these changes Feb 19, 2023

View reviewed changes

feat: use bucket mutex instead of global mutex for dynamic set

b473a4a

This implementation uses bucket level mutex with linear probing.

Boshen force-pushed the bucket-mutex-upstream branch from 73d5cfd to b473a4a Compare February 20, 2023 01:31

bors-servo merged commit 51743a7 into servo:master Feb 20, 2023

This was referenced Feb 20, 2023

Drop memory after releasing the global lock #261

Open

chore: 🤖 make parking_lot dependency optional #265

Open

Boshen deleted the bucket-mutex-upstream branch February 20, 2023 14:38

kdy1 mentioned this pull request Feb 22, 2023

chore(deps): Update string-cache dudykr/stc#743

Merged

kdy1 added a commit to dudykr/stc that referenced this pull request Feb 22, 2023

chore(deps): Update string-cache (#743)

f991878

**Description:** I want to check the performance difference caused by servo/string-cache#268.

YoniFeng mentioned this pull request Feb 23, 2023

Revert trivial impl of Borrow<str> for Atom #272

Merged

YoniFeng mentioned this pull request Mar 3, 2023

fix: move debug_assert check #275

Merged

overlookmotel mentioned this pull request Feb 29, 2024

Proper string type for rolldown rolldown/rolldown#427

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat: use bucket mutex instead of global mutex for dynamic set #268

feat: use bucket mutex instead of global mutex for dynamic set #268

Boshen commented Feb 16, 2023

Manishearth commented Feb 16, 2023

YoniFeng commented Feb 16, 2023 •

edited

Loading

froydnj commented Feb 16, 2023

jdm commented Feb 17, 2023

Boshen commented Feb 17, 2023

Boshen commented Feb 17, 2023 •

edited

Loading

Boshen commented Feb 17, 2023

Boshen commented Feb 17, 2023

Boshen commented Feb 17, 2023

YoniFeng commented Feb 17, 2023 •

edited

Loading

YoniFeng Feb 19, 2023

jdm left a comment

Boshen commented Feb 20, 2023

jdm commented Feb 20, 2023

bors-servo commented Feb 20, 2023

bors-servo commented Feb 20, 2023

bors-servo commented Feb 20, 2023

kdy1 commented Feb 22, 2023

mrobinson commented Feb 22, 2023

kdy1 commented Feb 22, 2023

mrobinson commented Feb 22, 2023

feat: use bucket mutex instead of global mutex for dynamic set #268

feat: use bucket mutex instead of global mutex for dynamic set #268

Conversation

Boshen commented Feb 16, 2023

Manishearth commented Feb 16, 2023

YoniFeng commented Feb 16, 2023 • edited Loading

froydnj commented Feb 16, 2023

jdm commented Feb 17, 2023

Boshen commented Feb 17, 2023

Boshen commented Feb 17, 2023 • edited Loading

Boshen commented Feb 17, 2023

Boshen commented Feb 17, 2023

Boshen commented Feb 17, 2023

YoniFeng commented Feb 17, 2023 • edited Loading

YoniFeng Feb 19, 2023

Choose a reason for hiding this comment

jdm left a comment

Choose a reason for hiding this comment

Boshen commented Feb 20, 2023

jdm commented Feb 20, 2023

bors-servo commented Feb 20, 2023

bors-servo commented Feb 20, 2023

bors-servo commented Feb 20, 2023

kdy1 commented Feb 22, 2023

mrobinson commented Feb 22, 2023

kdy1 commented Feb 22, 2023

mrobinson commented Feb 22, 2023

YoniFeng commented Feb 16, 2023 •

edited

Loading

Boshen commented Feb 17, 2023 •

edited

Loading

YoniFeng commented Feb 17, 2023 •

edited

Loading