Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add binary_search and friends #282

Merged
merged 6 commits into from
Oct 31, 2023
Merged

Conversation

Eh2406
Copy link
Contributor

@Eh2406 Eh2406 commented Oct 24, 2023

This is a start on fixing #281.

I copied the implementation of these methods from std, except for binary_search_by where I used std directly. Alternatively we could have all implementations directly forward to .as_slice().foobar or .entries.foobar, if you would prefer that.

I left off the plane binary_search{_keys} on set and map, because they already provide a O(1) way to look up the index for a key. I left them in on Slice because they do not.

For documentation I adapted the first sentence of std, and provided a link to std.

how would you like to see these tested?

@cuviper
Copy link
Member

cuviper commented Oct 25, 2023

Alternatively we could have all implementations directly forward to .as_slice().foobar or .entries.foobar, if you would prefer that.

I would, yeah, at least for partition_point since that's less trivial. The rest can go either way.

I left off the plane binary_search{_keys} on set and map, because they already provide a O(1) way to look up the index for a key. I left them in on Slice because they do not.

That's an interesting point! I wonder if small enough maps/sets might still win by binary search, but if you're too small then even a linear search might be fastest. But anyway, if someone really wants it they can use binary_search_by the same way our impl would, and we could choose to add it later.

how would you like to see these tested?

We haven't been great about that, but let's at least add sanity checks to map/tests.rs and set/tests.rs. But we don't need to be very thorough when it's relying directly on the standard library, and then type checking already proves the remaining differences.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Oct 26, 2023

I will work on it. I have PackagingCon for then next few days a may not have much time.

@Eh2406
Copy link
Contributor Author

Eh2406 commented Oct 26, 2023

If I had been more awake you would have gone shorter tests. :-P I was not up to determining interesting test cases. So instead you got tests from STD copied in and adapted. They may be more thorough than you asked for.

@cuviper
Copy link
Member

cuviper commented Oct 27, 2023

Same vibe as "If I had more time, I would have written a shorter letter." :)

It's a lot, but that's fine.

Copy link
Member

@cuviper cuviper left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks - the implementation looks good, just need some cleanup in the docs and a few comments.

src/map.rs Outdated Show resolved Hide resolved
src/map.rs Outdated Show resolved Hide resolved
src/map/slice.rs Outdated Show resolved Hide resolved
src/map/tests.rs Outdated Show resolved Hide resolved
src/set/tests.rs Outdated Show resolved Hide resolved
@cuviper cuviper linked an issue Oct 31, 2023 that may be closed by this pull request
@cuviper
Copy link
Member

cuviper commented Oct 31, 2023

I left off the plane binary_search{_keys} on set and map, because they already provide a O(1) way to look up the index for a key. I left them in on Slice because they do not.

That's an interesting point! I wonder if small enough maps/sets might still win by binary search, but if you're too small then even a linear search might be fastest. But anyway, if someone really wants it they can use binary_search_by the same way our impl would, and we could choose to add it later.

Actually, there's still a gap there because binary search also finds the insertion point for missing items. So I think it's good to have the doc comment about scalability, but we should still want the map/set methods for the Err case.

I'll go ahead and add this myself while I'm thinking about it...

While `IndexMap::binary_search_keys` and `IndexSet::binary_search` don't
scale as well as their corresponding `get_index_of`, O(log n) vs. O(1),
they do still offer the benefit of returning an `Err` insertion point.
@cuviper
Copy link
Member

cuviper commented Oct 31, 2023

Thanks again! I'll put together a release soon.

@cuviper cuviper merged commit cd641d2 into indexmap-rs:master Oct 31, 2023
14 checks passed
@cuviper
Copy link
Member

cuviper commented Nov 2, 2023

FYI, I did publish this as 2.1.0 the other day.

I'd be interested to see where you use this, if that code is open.

@Eh2406 Eh2406 deleted the binary_search branch November 2, 2023 19:12
@Eh2406
Copy link
Contributor Author

Eh2406 commented Nov 2, 2023

It is for https://github.com/pubgrub-rs/pubgrub, although the particular performance experiments where this would be useful have not yet pulled their weight. Many data structures have to "backtrack" to reset their state as of some previous point in time. For each individual data structure there are many ways this can be achieved.

  • The data structure can be "cleared" and reconstructed from other data sources.
  • Items in the data structure can include a "timestamp" and then the reset is .retain(|_, v| v.timestamp < reset_to_timestamp)
  • The data structure can be maintained in an ordered fashion and reset by truncating newer modifications. If you need to scan the entire data structure to decide where to cut, then this is the same as the previous. You can have an auxiliary data structure recording timestamp -> cutpoint, for O(1) time reset but additional allocations. Using partition point to make a cut, gives O(ln(size)) reset with no additional space. Which sure seems tempting.

The particular example of this pattern that inspired the PR was contradicted_incompatibilities, which currently uses the first strategy but could easily use the last one. I will definitely drop you a link when/if progress toward using this PR is achieved.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

binary_search and partition_point on Slice
2 participants