Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: use bucket mutex instead of global mutex for dynamic set #268

Merged
merged 1 commit into from
Feb 20, 2023

Conversation

Boshen
Copy link
Contributor

@Boshen Boshen commented Feb 16, 2023

Background:

I'm working a concurrent program using swc to parse files in parallel. swc uses string-cache.

I have found the global mutex in the dynamic set being the major performance bottleneck for our application. Using instruments as the profiler, we can see that string-cache contributes to 11.4% (9.6s) of the time:

image

By switch to a a bucket level mutex (this PR), string-cache drops to 1.6% (1.04s) of the time:

image


For this PR, can I kindly ask for a code review even if this PR cannot be merged due to any kind of circumstances, so I can safely patch my application with a fork, or help me polish this so everyone using swc can benefit from this PR.

Thank you all in advance. 🍻

@Manishearth
Copy link
Member

cc @jdm @mrobinson if either of you are familiar with the guts of this crate and what the perf implications would be here

(I could try and figure this out but it would take me a while; I don't have that much time rn)

@YoniFeng
Copy link
Contributor

YoniFeng commented Feb 16, 2023

(Just curious, feel free to ignore - I have no Rust or general low-level programming expertise)

I'm failing to imagine how this can have negative perf implications.
I looked around for concurrent HashMap implementations, locking per-bucket seems standard (Folly's ConcurrentHashMap, oneTBB concurrent_hash_map, .NET's ConcurrentDictionary).

@froydnj
Copy link

froydnj commented Feb 16, 2023

FWIW, we adopted this strategy in Firefox's atom table (which is essentially the same thing as this crate), and noticed no ill effects (as well as significant performance gains for parallel modification).

@jdm
Copy link
Member

jdm commented Feb 17, 2023

I read through the changes and they look sensible to me. I have one question: if the focus of this PR is about switching from a global mutex to a mutex per bucket, why the switch from what looks like a linked list implementation to a vector instead? I don't see a reason that that change is required for the mutex one; am I missing anything?

@Boshen
Copy link
Contributor Author

Boshen commented Feb 17, 2023

I read through the changes and they look sensible to me. I have one question: if the focus of this PR is about switching from a global mutex to a mutex per bucket, why the switch from what looks like a linked list implementation to a vector instead? I don't see a reason that that change is required for the mutex one; am I missing anything?

It was because I had trouble with adding Mutex onto the linked list. Now you have brought this up, should I switch it to std::collections::LinkedList?

@Boshen
Copy link
Contributor Author

Boshen commented Feb 17, 2023

Or maybe std::collections::VecDeque? Given the LinkedList specifically doc notes

NOTE: It is almost always better to use Vec or VecDeque because array-based containers are generally faster, more memory efficient, and make better use of CPU cache.

@Boshen
Copy link
Contributor Author

Boshen commented Feb 17, 2023

I profiled the three implementations (hand-rolled linked list vs VecDeque vs Vec). Vec seems to have the best performance.

image

@Boshen
Copy link
Contributor Author

Boshen commented Feb 17, 2023

This Vec version seems flawed. I ran my application for a few dozen times and it occasionally reported false postives (not getting the correct string), but I can't find where the bug is.

Let me switch this PR back to the slower but correct hand-rolled linked list version.
And we can iterate from here.

@Boshen
Copy link
Contributor Author

Boshen commented Feb 17, 2023

@jdm Would you like to take another look?

@YoniFeng
Copy link
Contributor

YoniFeng commented Feb 17, 2023

This Vec version seems flawed. I ran my application for a few dozen times and it occasionally reported false postives (not getting the correct string), but I can't find where the bug is.

I don't understand what the "reported false positives" means. However, I did notice a subtle difference between your Vec implementation and the existing, so maybe this helps:
When inserting, in the case where entry.ref_count.fetch_add(1, SeqCst) returns 0, you push the new entry to the vector (i.e. to the back/end) whereas the existing code adds the entry to the head of the linked list.
This means it's possible for the next insert() usage to again find() the entry that is about to be removed, and we'll end up inserting another duplicate entry (i.e. an unintended 3rd, 4th etc.).
With the existing implementation, the next insert() is guaranteed to find() the new entry that was added since it's at the front.

If this can explain the issue you saw with the Vec version, I think you should feel comfortable using a VecDeque and using push_front() (for the better performance).

hash,
ref_count: AtomicIsize::new(1),
string: string.into_boxed_str(),
});
let ptr = NonNull::from(&mut *entry);
self.buckets[bucket_index] = Some(entry);

vec.push(entry);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existing code adds the entry to the head of the linked list.
This opens the possibility for the next insert() usage to again find() the entry that is about to be removed, and we'll end up inserting another duplicate entry (i.e. an unintended 3rd, 4th etc.).
With the existing implementation, the next insert() is guaranteed to find() the new entry that was added (and increment its ref count) since it's at the front.

Copy link
Member

@jdm jdm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for reducing the scope of this change! Would you mind squashing these commits?

This implementation uses bucket level mutex with linear probing.
@Boshen Boshen force-pushed the bucket-mutex-upstream branch from 73d5cfd to b473a4a Compare February 20, 2023 01:31
@Boshen
Copy link
Contributor Author

Boshen commented Feb 20, 2023

Thanks for reducing the scope of this change! Would you mind squashing these commits?

@jdm Done.

@jdm
Copy link
Member

jdm commented Feb 20, 2023

@bors-servo r+

@bors-servo
Copy link
Contributor

📌 Commit b473a4a has been approved by jdm

@bors-servo
Copy link
Contributor

⌛ Testing commit b473a4a with merge 51743a7...

@bors-servo
Copy link
Contributor

☀️ Test successful - checks-github
Approved by: jdm
Pushing 51743a7 to master...

@bors-servo bors-servo merged commit 51743a7 into servo:master Feb 20, 2023
@Boshen Boshen deleted the bucket-mutex-upstream branch February 20, 2023 14:38
@kdy1
Copy link
Contributor

kdy1 commented Feb 22, 2023

Can you publish a new version, please?

@mrobinson
Copy link
Member

Can you publish a new version, please?

I've opened a PR to bump the version here: #270

@kdy1
Copy link
Contributor

kdy1 commented Feb 22, 2023

Thank you!

@mrobinson
Copy link
Member

https://crates.io/crates/string-cache/0.8.5 is now available on crates.io.

kdy1 added a commit to dudykr/stc that referenced this pull request Feb 22, 2023
**Description:**

I want to check the performance difference caused by servo/string-cache#268.
bors-servo added a commit that referenced this pull request Feb 23, 2023
Revert trivial impl of Borrow<str> for Atom

#266 caused a breaking change (see #271) in 0.8.5 (which was yanked).
This PR fixes #271 / will allow publishing new versions (so #268 can make it out).

I've just started learning Rust, haven't been able to 100% wrap my head around the issue.

The breaking change manifesting in the "browserlist-rs" crate:
```rust
error[E0283]: type annotations needed
  --> {.. snip ..}\browserslist-rs-0.12.3\src\data\caniuse.rs:91:35
   |
91 |     let chrome = CANIUSE_BROWSERS.get(&"chrome".into()).unwrap();
   |                                   ^^^ ---------------- type must be known at this point
   |                                   |
   |                                   cannot infer type of the type parameter `Q` declared on the associated function `get`
   |
   = note: multiple `impl`s satisfying `Atom<BrowserNameAtomStaticSet>: Borrow<_>` found in the following crates: `core`, `string_cache`:
           - impl<Static> Borrow<str> for Atom<Static>
             where Static: StaticAtomSet;
           - impl<T> Borrow<T> for T
             where T: ?Sized;
note: required by a bound in `AHashMap::<K, V, S>::get`
  --> {.. snip ..}\ahash-0.7.6\src\hash_map.rs:81:12
   |
81 |         K: Borrow<Q>,
   |            ^^^^^^^^^ required by this bound in `AHashMap::<K, V, S>::get`
help: consider specifying the generic argument
   |
91 |     let chrome = CANIUSE_BROWSERS.get::<Q>(&"chrome".into()).unwrap();
   |                                      +++++
```

`CANIUSE_BROWSERS` is a
```rust
AHashMap<BrowserNameAtom, BrowserStat>
```

where `BrowserNameAtom` is an Atom generated with `string_cache_codegen`.

The signature of the `get` function:
```rust
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq,
    {
        self.0.get(k)
    }
```

`K` is the generic key type for the hash map.
This is the exact same signature as the std lib HashMap, to which it delegates.

When I debug 0.8.4 usage locally, the `From` that gets used for the `into()` is
```rust
impl<'a, Static: StaticAtomSet> From<&'a str> for Atom<Static> {
    #[inline]
    fn from(string_to_add: &str) -> Self {
        Atom::from(Cow::Borrowed(string_to_add))
    }
}
```

It isn't clear to me how the `Borrow<str>` impl that was added "competes" with the std lib's implementation here.
It doesn't seem relevant?
bors-servo added a commit that referenced this pull request Feb 23, 2023
Revert trivial impl of Borrow<str> for Atom

#266 caused a breaking change (see #271) in 0.8.5 (which was yanked).
This PR fixes #271 / will allow publishing new versions (so #268 can make it out).

I've just started learning Rust, haven't been able to 100% wrap my head around the issue.

The breaking change manifesting in the "browserlist-rs" crate:
```rust
error[E0283]: type annotations needed
  --> {.. snip ..}\browserslist-rs-0.12.3\src\data\caniuse.rs:91:35
   |
91 |     let chrome = CANIUSE_BROWSERS.get(&"chrome".into()).unwrap();
   |                                   ^^^ ---------------- type must be known at this point
   |                                   |
   |                                   cannot infer type of the type parameter `Q` declared on the associated function `get`
   |
   = note: multiple `impl`s satisfying `Atom<BrowserNameAtomStaticSet>: Borrow<_>` found in the following crates: `core`, `string_cache`:
           - impl<Static> Borrow<str> for Atom<Static>
             where Static: StaticAtomSet;
           - impl<T> Borrow<T> for T
             where T: ?Sized;
note: required by a bound in `AHashMap::<K, V, S>::get`
  --> {.. snip ..}\ahash-0.7.6\src\hash_map.rs:81:12
   |
81 |         K: Borrow<Q>,
   |            ^^^^^^^^^ required by this bound in `AHashMap::<K, V, S>::get`
help: consider specifying the generic argument
   |
91 |     let chrome = CANIUSE_BROWSERS.get::<Q>(&"chrome".into()).unwrap();
   |                                      +++++
```

`CANIUSE_BROWSERS` is a
```rust
AHashMap<BrowserNameAtom, BrowserStat>
```

where `BrowserNameAtom` is an Atom generated with `string_cache_codegen`.

The signature of the `get` function:
```rust
pub fn get<Q: ?Sized>(&self, k: &Q) -> Option<&V>
    where
        K: Borrow<Q>,
        Q: Hash + Eq,
    {
        self.0.get(k)
    }
```

`K` is the generic key type for the hash map.
This is the exact same signature as the std lib HashMap, to which it delegates.

When I debug 0.8.4 usage locally, the `From` that gets used for the `into()` is
```rust
impl<'a, Static: StaticAtomSet> From<&'a str> for Atom<Static> {
    #[inline]
    fn from(string_to_add: &str) -> Self {
        Atom::from(Cow::Borrowed(string_to_add))
    }
}
```

It isn't clear to me how the `Borrow<str>` impl that was added "competes" with the std lib's implementation here.
It doesn't seem relevant?
bors-servo added a commit that referenced this pull request Mar 5, 2023
fix: move debug_assert check

Fix needed after #268
The fix is only for debug_assert correctness. There's no functional effect.
i.e. version 0.8.6 might cause debug tests to fail, but it isn't a (functionally) breaking change.

**Explanation**
Moving the lock to be per-bucket changed the `DYNAMIC_SET`'s API so that it doesn't need to be locked (i.e. `DYANMIC_SET` is not wrapped with a `Mutex`).
The `Atom`'s `fn drop` changed from
```rust
fn drop(&mut self) {
        if self.tag() == DYNAMIC_TAG {
            let entry = self.unsafe_data.get() as *const Entry;
            if unsafe { &*entry }.ref_count.fetch_sub(1, SeqCst) == 1 {
                drop_slow(self)
            }
        }

        // Out of line to guide inlining.
        fn drop_slow<Static>(this: &mut Atom<Static>) {
            DYNAMIC_SET
                .lock()
                .remove(this.unsafe_data.get() as *mut Entry);
        }
    }
```
to
```rust
impl<Static> Drop for Atom<Static> {
    #[inline]
    fn drop(&mut self) {
        if self.tag() == DYNAMIC_TAG {
            let entry = self.unsafe_data.get() as *const Entry;
            if unsafe { &*entry }.ref_count.fetch_sub(1, SeqCst) == 1 {
                drop_slow(self)
            }
        }

        // Out of line to guide inlining.
        fn drop_slow<Static>(this: &mut Atom<Static>) {
            DYNAMIC_SET.remove(this.unsafe_data.get() as *mut Entry);
        }
    }
```
(the `lock()` is gone)
Now when we enter `DYNAMIC_SET.remove()`, there's no longer a lock guarantee.
Until we lock the bucket, another thread could be in the middle of performing a `DYNAMIC_SET.insert()` for the same string.
Therefore, the `debug_assert!(value.ref_count.load(SeqCst) == 0)` is premature - it needs to occur after we take the lock for the bucket.

Caught at swc-project/swc#6980 in a [failed test](https://github.com/swc-project/swc/actions/runs/4326536698/jobs/7554083939).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants