Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add bit removal methods to SparseBitMatrix and factor *BitSet relational methods into more extensible trait #88272

Merged
merged 19 commits into from
Sep 1, 2021

Conversation

willcrichton
Copy link
Contributor

I need the ability to clear the bits out of a row from SparseBitMatrix. Currently, all the mutating methods only allow insertion of bits, and there is no way to get access to the underlying data.

One approach is simply to make ensure_row public, since it grants &mut access to the underlying HybridBitSet. This PR adds the pub modifier. However, presumably this method was private for a reason, so I'm open to other designs. I would prefer general mutable access to the rows, because that way I can add many mutating operations (clear, intersect, etc.) without filing a PR each time :-)

r? @ecstatic-morse

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 23, 2021
@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Aug 24, 2021

row is already public, so the fact that each row is stored as a HybridBitSet is already exposed to consumers, and this change doesn't break encapsulation. However, it still feels a bit wrong to me. Calling ensure_row may cause the rows vector to grow, which isn't even necessary if you're just clearing bits.

Is there anything you could need beyond clear, intersect and remove_bit? They won't be too much effort to implement in a way that handles empty rows gracefully, and they might benefit other rustc users someday. I'd prefer you go that route if possible, but this is pretty low stakes.

@willcrichton
Copy link
Contributor Author

I could implement those methods, although they would still end up calling ensure_row internally. An alternative would be to add the following to parallel SparseBitMatrix::row:

fn row_mut(&mut self, row: R) -> Option<&mut HybridBitSet<C>>;

This eliminates the weird side effect, and also matches the existing API. But if you have any preference towards lifting the set methods to matrix, I can do that instead.

@ecstatic-morse
Copy link
Contributor

My whole point is that they would not call ensure_mut internally. There's no need to ensure that a row has backing storage just to intersect with or clear it.

@willcrichton
Copy link
Contributor Author

Ah I understand, I can do that.

One other question. For operations like intersect and subtract, would you like me to implement those methods on SparseBitSet / HybridBitSet first, and then extend them to SparseBitMatrix? I've already had to implement these methods in my own codebase, but they seem useful to have in rustc proper: https://github.com/willcrichton/flowistry/blob/032a5982984291b3d1e791706d0607b7be09410c/src/core/indexed.rs#L204-L234

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Aug 24, 2021

One other question. For operations like intersect and subtract, would you like me to implement those methods on SparseBitSet / HybridBitSet first, and then extend them to SparseBitMatrix? I've already had to implement these methods in my own codebase, but they seem useful to have in rustc proper: https://github.com/willcrichton/flowistry/blob/032a5982984291b3d1e791706d0607b7be09410c/src/core/indexed.rs#L204-L234

Yes, I think that's the right path, although it means handling all combinations in {SparseBitSet, BitSet}^2. I thought we did this already, but I guess I was wrong. This is useful, but a bit more work than I was expecting. I wonder if some parts of rustc convert to a dense BitSet just to do subtraction?

You probably know this, but an in-tree impl can take advantage of the fact that SparseBitSet is in sorted order, which means you can do both intersect and subtract in a single pass with ArrayVec::retain.

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Aug 24, 2021

If you don't want to make the changes to HybridBitSet, I'm willing to sign off on the visibility change for ensure_row as long as you add some documentation and a FIXME noting that it should be removed in favor of subtract_row, clear_row, etc. when the corresponding methods are available on HybridBitSet.

Ideally we would make the HybridBitSet changes now, though. I'm happy to write up mentoring instructions, since there's a few helper traits (SubtractFromBitSet) that should be rewritten into more general forms.

Let me know which option you prefer.

@willcrichton
Copy link
Contributor Author

I'm happy to do the broader HybridBitSet changes. Let me know what additional changes you'd like (eg helper traits) and I can do it today / tomorrow.

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Aug 24, 2021

Alright, so we have three different bitset types---SparseBitSet, BitSet, and HybridBitSet (the union of the other two). clear, insert and remove can be trivially implemented for all of them (and mostly already have been). We also have three binary operations we want to implement---intersection, union and subtraction. The current approach uses helper traits to make these operations generic over the different bitsets (intersection is missing at the moment):

/// This is implemented by all the bitsets so that BitSet::union() can be
/// passed any type of bitset.
pub trait UnionIntoBitSet<T: Idx> {
// Performs `other = other | self`.
fn union_into(&self, other: &mut BitSet<T>) -> bool;
}
/// This is implemented by all the bitsets so that BitSet::subtract() can be
/// passed any type of bitset.
pub trait SubtractFromBitSet<T: Idx> {
// Performs `other = other - self`.
fn subtract_from(&self, other: &mut BitSet<T>) -> bool;
}

We want to change these to make both sides of the operation generic, just like (for example) ops::AddAssign. Note that the existing impls put the mutable argument on the right, unlike AddAssign. You can leave this as is, or change it, up to you. This will require 3^2 = 9 impls, but only 4 of those are interesting: the other 5 simply dispatch to the appropriate variant of HybridBitSet. The majority of the interesting ones already exist in the code; you just have to collect them in the appropriate place. Don't worry about going from a dense bitset back to a sparse one.

Finally, you should expose the operations as inherent methods on each of the bitset types, just like the current code does.

Once all that is done, it should be easy to add generic versions of intersect_row, subtract_row, and union_row to SparseBitMatrix, as well as remove and clear (which don't require any of the helper trait changes). You should avoid calling ensure_row unless its actually necessary. You'll have a small problem with the lazy, generic version of union_row, but you'll find a solution 😄 .

Let me know if any of that is unclear, or if you have a better approach. And if you want to just do the ensure_row thing so you can continue working on your project, just say the word. This was a bit more involved than I had first thought. 😆

@inquisitivecrystal inquisitivecrystal added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Aug 24, 2021
@willcrichton
Copy link
Contributor Author

I'm trying to understand the reason for having separate a UnionIntoBitSet::union_into trait method and BitSet::union inherent method. Is it for convenience so downstream users don't have to import the UnionIntoBitSet trait? As opposed to just having users call union_into directly.

@ecstatic-morse
Copy link
Contributor

That's correct.

@willcrichton
Copy link
Contributor Author

Still need to add comments and tests, but @ecstatic-morse can you let me know if this general design looks right to you? In short:

  • I consolidated the union/subtract (+ intersect) traits into a single BitRelations trait (otherwise there was an ungodly number of impl blocks).
  • I added a bit_relations_inherent_impls macro that adds inherent implementations of each trait method to all three BitSet types.
  • I factored out all the existing union/subtract/intersect logic into impls for these traits. The only impls I'm not sure about are the subtract/intersect impls for HybridBitSet -- I'm not sure if you want to do something especially efficient in these cases like for union.
  • I added all the new methods to SparseBitMatrix.

@rust-log-analyzer

This comment has been minimized.

Copy link
Contributor

@ecstatic-morse ecstatic-morse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @willcrichton!

I left a few notes. There's an issue I didn't foresee with the SparseBitSet impls, which is going to cause problems, but for the most part this looks good.

compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
compiler/rustc_index/src/bit_set.rs Show resolved Hide resolved
compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
@willcrichton willcrichton changed the title Make SparseBitMatrix::ensure_row public to enable general mutation of rows Add bit removal methods to SparseBitMatrix and factor *BitSet relational methods into more extensible trait Aug 25, 2021
Copy link
Contributor

@ecstatic-morse ecstatic-morse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Almost there. All of my old review comments have been resolved satisfactorily. There's a bug and an unnecessary dense BitSet allocation in the new code, however.

Can I ask for some unit tests for the new impls? I'm worried that I might have missed something just from staring at it.

compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
@rust-log-analyzer

This comment has been minimized.

@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Aug 26, 2021

Is there a faster way to run the tests than this?

 ./x.py test --stage 0 compiler/rustc_index

This takes 10-15 minutes on my laptop, which is a drag...

@willcrichton, does running cargo test in rustc_index not work? It's not officially supported AFAIK, but works fine for small helper crates like this one.

@rust-log-analyzer

This comment has been minimized.

@willcrichton
Copy link
Contributor Author

@willcrichton, does running cargo test in rustc_index not work? It's not officially supported AFAIK, but works fine for small helper crates like this one.

:O I didn't know that! Works like a charm, thanks.

@ecstatic-morse
Copy link
Contributor

I usually run ./x.py fmt --check as a pre-push hook as well, it makes things go a bit smoother.

@willcrichton
Copy link
Contributor Author

@ecstatic-morse I think this is about good. One last implementation issue. In the case of intersect(HybridBitSet::Dense, BitSet) (or subtract), is it important to convert HybridBitSet::Dense back to Sparse if it's below the sparsity threshold?

Copy link
Contributor

@ecstatic-morse ecstatic-morse left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright! This is basically there. I left one nit and one question. Thanks for sticking with it @willcrichton. Really we should be using proptest or quickcheck for better test coverage, but I don't think we can do that in-tree. We have smoke tests at least, and I've looked things over as carefully as I can. Next time we'll just make the private method public 😄.

Regarding the dense -> sparse transformation, I don't think we should try hard to make it happen. It's not always more efficient, since it can lead to thrashing when we're adding/subtracting elements while close to the threshold. I think the dense/sparse intersect case is worth it, since the result is guaranteed to be below the sparse threshold and it saves us from traversing the dense bitset to clear it, but that's pretty much the only case.

compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
compiler/rustc_index/src/bit_set.rs Outdated Show resolved Hide resolved
@ecstatic-morse
Copy link
Contributor

@bors try
@rust-timer queue

Just in case we've eliminated an optimized code path or messed with inlining thresholds.

@rust-timer
Copy link
Collaborator

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 27, 2021
@bors
Copy link
Contributor

bors commented Aug 27, 2021

⌛ Trying commit 86bd551 with merge 988d819e13d73f484d8370357ba9d1f8696de90f...

@bors
Copy link
Contributor

bors commented Aug 27, 2021

☀️ Try build successful - checks-actions
Build commit: 988d819e13d73f484d8370357ba9d1f8696de90f (988d819e13d73f484d8370357ba9d1f8696de90f)

@rust-timer
Copy link
Collaborator

Queued 988d819e13d73f484d8370357ba9d1f8696de90f with parent 4a6547c, future comparison URL.

@rust-timer
Copy link
Collaborator

Finished benchmarking try commit (988d819e13d73f484d8370357ba9d1f8696de90f): comparison url.

Summary: This benchmark run did not return any relevant changes.

If you disagree with this performance assessment, please file an issue in rust-lang/rustc-perf.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf -perf-regression

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Aug 28, 2021
@ecstatic-morse
Copy link
Contributor

ecstatic-morse commented Aug 30, 2021

I gave this one last look today. Given the number of bugs found during the review process, it seemed wise. Everything (still) looks correct.

For posterity, in addition to the SparseBitMatrix changes, this PR extends HybridBitSet with in-place versions of the fundamental set operations. They aren't quite optimal, since they fail to take into account the sorted-ness of the elements in a sparse bitset, but should be no slower than the existing impls. As a result, we have the option to use HybridBitSet in more places (e.g. dataflow).

@bors r+

@bors
Copy link
Contributor

bors commented Aug 30, 2021

📌 Commit e340a0e has been approved by ecstatic-morse

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 30, 2021
@bors
Copy link
Contributor

bors commented Sep 1, 2021

⌛ Testing commit e340a0e with merge 608b5e1...

@bors
Copy link
Contributor

bors commented Sep 1, 2021

☀️ Test successful - checks-actions
Approved by: ecstatic-morse
Pushing 608b5e1 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Sep 1, 2021
@bors bors merged commit 608b5e1 into rust-lang:master Sep 1, 2021
@rustbot rustbot added this to the 1.56.0 milestone Sep 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants