Extend entry API to work on borrowed keys. #1769
text/0000-entry-into-owned.md
Outdated
// ...
*string_map.entry("foo").or_insert(0) += 1; // Clones if "foo" not in map.
*string_map.entry("bar".to_string()) += 1; // By-value, never clones.
Nit: is this missing `.or_insert(0)`?
Done, thanks!
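For context, the cost this RFC targets can be seen on stable Rust today. The sketch below is illustrative only: the function names are made up for this example, and the second function shows the usual lookup-first workaround rather than anything proposed in the RFC.

```rust
use std::collections::HashMap;

/// Counts words with the by-value `entry` API: every call allocates a
/// `String`, even when the word is already present in the map.
fn count_words_entry(text: &str) -> HashMap<String, u32> {
    let mut counts = HashMap::new();
    for word in text.split_whitespace() {
        *counts.entry(word.to_string()).or_insert(0) += 1;
    }
    counts
}

/// Common workaround: look up by `&str` first and allocate only on a miss,
/// at the price of hashing and probing a second time when inserting.
fn count_words_lookup_first(text: &str) -> HashMap<String, u32> {
    let mut counts: HashMap<String, u32> = HashMap::new();
    for word in text.split_whitespace() {
        if let Some(n) = counts.get_mut(word) {
            *n += 1;
        } else {
            counts.insert(word.to_string(), 1);
        }
    }
    counts
}
```

Both functions produce the same map; they differ only in when the key is allocated and how many lookups a miss costs.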
33f259d
to
6952692
Compare
This looks like a good way to avoid yet more clones while making |
@llogiq I added a counterpoint to the 'Drawbacks' section that Also added drawback pointing out that impl would be insta-stable. |
I've only given this a cursory glance, so maybe I do not understand it yet, but... There are plenty of situations where delaying the duplication of the key sounds useful, but many also just call for better use of hash functions, i.e. if the key value still exists elsewhere, then you likely only need the hash, not the key itself. Or maybe that's kinda what you're effectively trying to do here?
@burdges I'm not sure I follow what you're saying:
Are you suggesting storing only the hash in the map? The map needs to own the key itself on insertion, regardless of whether it exists elsewhere or not. This is to be able to compare for equality when hashes match. Maybe I'm misunderstanding?
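To illustrate the point about equality checks, here is a small sketch with a deliberately degenerate hasher. `ConstantHasher` and `colliding_map` are hypothetical names invented for this example; the rest is the standard library.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasherDefault, Hasher};

/// Hypothetical hasher that gives every key the same hash value,
/// so every lookup has to fall back on comparing stored keys.
#[derive(Default)]
struct ConstantHasher;

impl Hasher for ConstantHasher {
    fn write(&mut self, _bytes: &[u8]) {}
    fn finish(&self) -> u64 {
        0
    }
}

/// Builds a map in which "foo" and "bar" collide on their hash.
fn colliding_map() -> HashMap<String, u32, BuildHasherDefault<ConstantHasher>> {
    let mut map = HashMap::default();
    map.insert("foo".to_string(), 1);
    map.insert("bar".to_string(), 2);
    map
}
```

Even though all hashes are equal, the map keeps the two entries apart because it owns the keys and compares them with `Eq`.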
This concern has now been included in the RFC, you can skip reading this comment
Ok, so I ran into trouble: the `{Vacant,Occupied,}Entry` types have a `.key() -> &K` method on them. My proposed implementation would have stored a `Q: IntoOwned` in the entries rather than a `K`.
I can see two options, both far from perfect:
I'd really appreciate some ideas around this. |
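As a reminder of the existing surface area in question, the following sketch uses the stable `key()` accessors on both entry variants. The `describe` function is a made-up helper for this illustration.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// Both entry variants hand out `&K` today, which is why storing a
/// not-yet-owned `Q` inside the entry is awkward.
fn describe(map: &mut HashMap<String, u32>, key: String) -> String {
    match map.entry(key) {
        Entry::Occupied(e) => format!("occupied: {}", e.key()),
        Entry::Vacant(e) => format!("vacant: {}", e.key()),
    }
}
```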
@cristicbz Can't You could add additional helper methods for accessing the key when |
@Diggsey Ah yes! That's a good point, using Right now I'm running into trouble over the comparison implementation. I sort of waved my hand in the POC, by always performing It feels like there should be a natural way of extending the ability to do |
In hindsight, I think my comment is kinda off topic for this pull request, @cristicbz. I basically said: if you need this for performance, then maybe you should use probabilistic data structures more aggressively instead. In particular, you might use a data structure and algorithm where you can proceed if the hashes match and need not worry about the actual key being correct. Rust is kinda lacking on probabilistic data structures though, and they always require more detailed analysis, including estimating the expected table size in advance. It's clear someone might want this optimization without going through the hassle of tweaking cuckoo hash table parameters or something.
Actually, we don't need The only problem is that people might accidentally pass in e.g. And yes, |
@Kixunil I tried to explain the problem with The gist is it requires clone-able keys. So it'd work fine for
That's roughly what the impl
expresses. |
This solution has now been simplified and included in the RFC, you can skip reading this comment
[New playground link!](https://play.rust-lang.org/?gist=97b23ca418b29b0009fbfab2299874f3&version=nightly&backtrace=1)
Ok, so I think I worked out a solution for the
pub trait AsBorrowOf<T, B: ?Sized>: IntoOwned<T> where T: Borrow<B> {
    fn as_borrow_of(&self) -> &B;
}
impl<T> AsBorrowOf<T, T> for T {
    default fn as_borrow_of(&self) -> &Self {
        self
    }
}
impl<'a, B: ToOwned + ?Sized> AsBorrowOf<B::Owned, B> for &'a B {
    default fn as_borrow_of(&self) -> &B {
        *self
    }
}
Then, the signature of
#[stable(feature = "rust1", since = "1.0.0")]
pub fn entry<Q, B: ?Sized>(&mut self, key: Q) -> Entry<Q, V, K>
    where Q: AsBorrowOf<K, B>,
          K: Borrow<B>,
          B: Hash + Eq {
    // Gotta resize now.
    self.reserve(1);
    self.search_mut(key.as_borrow_of()).into_entry(key).expect("unreachable")
}
The trick is to not ever use
There are two remaining issues as far as I can tell:
let mut hash_map = HashMap::new();
*hash_map.entry("hello").or_insert(0u64) += 1;
currently the inference
fn generic<K: Hash + Eq + Borrow<Q>, V, Q>(map: &mut HashMap<K, V>, key: K) {
    map.entry(key);
}
but after changing the signature of
fn generic<K: Hash + Eq + Borrow<Q> + Borrow<K>, V, Q>(map: &mut HashMap<K, V>, key: K) {
    map.entry(key);
}
I'll update the RFC with this information and add a link with a preliminary rustc branch tomorrow.
I wonder if there are specialization extensions which would enable the approach ruled out at the beginning, rather than adding |
@cristicbz oh, seems like I skimmed through it too fast. Sorry. Anyway, I think |
@withoutboats impl<'a, T: ?Sized> From<&'a T> for T::Owned where T: ToOwned { /* ... */ } (edit: I fixed the above signature which was broken before) Not sure if there is any reasonable way of adding this backwards compatibly (even with specialization). But maybe there is, there's a lot I don't get about how far we can push coherence. I don't know if we can get around expressing somehow though Maybe some arcane magick exists to sort this out some other way. The counterpoint to 'identical traits with untyped contracts' is that they allow distinguishing between semantics (referring to |
This solution is now the main version proposed in the RFC, you can skip reading this comment
So it turns out these three traits can all be compacted into a single one without specialisation:
pub trait AsBorrowOf<T, B: ?Sized>: Sized where T: Borrow<B> {
    fn into_owned(self) -> T;
    fn as_borrow_of(&self) -> &B;
}
impl<T> AsBorrowOf<T, T> for T {
    fn into_owned(self) -> T { self }
    fn as_borrow_of(&self) -> &Self {
        self
    }
}
impl<'a, B: ToOwned + ?Sized> AsBorrowOf<B::Owned, B> for &'a B {
    fn into_owned(self) -> B::Owned { self.to_owned() }
    fn as_borrow_of(&self) -> &B {
        *self
    }
}
The meaning of
For
It is annoying that for a given |
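A rough sketch of how the single trait could be exercised on stable Rust follows. The trait and its two impls are copied from the comment above; `get_or_insert` and `demo` are hypothetical helpers written for this illustration, not the RFC's `entry`. Unlike a real entry, the helper does two lookups on a miss, since it only uses public `HashMap` methods.

```rust
use std::borrow::Borrow;
use std::collections::HashMap;
use std::hash::Hash;

pub trait AsBorrowOf<T, B: ?Sized>: Sized
where
    T: Borrow<B>,
{
    fn into_owned(self) -> T;
    fn as_borrow_of(&self) -> &B;
}

impl<T> AsBorrowOf<T, T> for T {
    fn into_owned(self) -> T { self }
    fn as_borrow_of(&self) -> &Self { self }
}

impl<'a, B: ToOwned + ?Sized> AsBorrowOf<B::Owned, B> for &'a B {
    fn into_owned(self) -> B::Owned { self.to_owned() }
    fn as_borrow_of(&self) -> &B { *self }
}

/// Hypothetical helper: accepts either an owned key or a borrowed one,
/// and converts to the owned form only when the key is absent.
pub fn get_or_insert<K, V, Q, B>(map: &mut HashMap<K, V>, key: Q, default: V) -> &mut V
where
    K: Hash + Eq + Borrow<B>,
    Q: AsBorrowOf<K, B>,
    B: Hash + Eq + ?Sized,
{
    // Fully qualified calls keep the intended `AsBorrowOf<K, B>` bound explicit.
    if map.contains_key(<Q as AsBorrowOf<K, B>>::as_borrow_of(&key)) {
        map.get_mut(<Q as AsBorrowOf<K, B>>::as_borrow_of(&key))
            .expect("key was found by contains_key")
    } else {
        let owned = <Q as AsBorrowOf<K, B>>::into_owned(key);
        map.entry(owned).or_insert(default)
    }
}

/// Usage with both kinds of key on a map with an explicitly typed key.
pub fn demo() -> HashMap<String, u32> {
    let mut counts: HashMap<String, u32> = HashMap::new();
    *get_or_insert(&mut counts, "foo", 0) += 1; // borrowed key, cloned on miss
    *get_or_insert(&mut counts, "foo", 0) += 1; // borrowed key, already present
    *get_or_insert(&mut counts, "bar".to_string(), 0) += 1; // owned key, moved in
    counts
}
```

The map's key type is spelled out in `demo` on purpose: with an untyped `HashMap::new()` the key type could no longer be inferred from the argument alone, which is the kind of inference concern raised earlier in the thread.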
740c4f4
to
9e35d5e
Compare
OK now that I've written an implementation and found a bunch of issues, I updated the RFC to address them and moved to the single-trait-no-specialization |
I added an analysis of the crater run results. Manually went through each one, created a minimal test case and described the causes and results. In short, there are 9 irreconcilable regressions caused by actual inference failures introduced by widening the allowed argument types of
Instances of rust-lang/rust#37164 and rust-lang/rust#37138 appear once each. |
@Skrapion I think that case would be supported with the proposed API by adding a |
@aturon @withoutboats I'd love to get this going again, what do you think of the |
@cristicbz Gah, I'm incredibly sorry I've let this sit idle for so long again; it just never reaches the top of my stack. Please feel free to ping me on IRC or Twitter to get my attention back on it. I will push toward a resolution ASAP. |
Ok I've been roped into helping out here. Disclaimer: I am so bloody sick of this problem and I am fairly certain it will haunt me on my deathbed. As such I am inclined to pick the maximally flexible solution, no matter how ugly it is. So presumably the second option in #1769 (comment). People will ask what the heck is up with it, and we'll shrug and say "computers are bad". With that out of the way, I would like to discuss where we're heading with this in bigger picture terms. This is basically a big aside, and everyone can ignore it if they want. Again, I want this to all go away, and the choice I mentioned above does that... for now. So the Entry API was born of two things:
In solving this issue, we took on several other constraints:
Constraint (3) mostly meant we won't let you give one thing for lookup, and then another thing for the actual insertion. I fear this constraint has led us awry, and caused us to sacrifice too much of (2). As a result later generations ran into several issues in serving (3).
Which is what this RFC is trying to address. However I foresee further complaints that this still isn't addressing these other common complaints:
I would like to propose a rejection of (3), producing an incredibly flexible API that you can legitimately misuse and lose your keys with. It is a three-phase entry:
let map = ...;
let key = ...;
// Start the algorithm
let raw_entry = map.raw_entry();
// First do hashing (can memoize)
let hashed_entry = raw_entry.hash_with(|hasher| -> u64 {
    key.handle_hashing(hasher)
});
// Then search, transforming both keys however you need
let searched_entry = hashed_entry.search_with(|found_key| -> bool {
    key.field == &**found_key.whatever
});
// And finally handle the entry
let val = match searched_entry {
    Occupied(entry) => entry.into_mut(),
    // And you can just give the dang key here
    Vacant(entry) => entry.set(key.clone(), 0),
};
*val += 1;
I believe this would handle everyone's usecase for this nonsense from now until eternity. It would be as efficient as possible. It would put no constraints on implementations. And it lets you misuse it as much as you please. (BTreeMap would just remove the hash_with part, and search_with would be -> Ordering) We could also add conveniences so you can do |
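The three phases (hash, search, insert) can be sketched on a toy table to show how they decouple. Everything below is hypothetical and exists only for illustration: `ToyMap`, `find_or_insert_with`, `hash_str` and `bump` are not part of any proposed or existing API, and the fixed bucket count is an arbitrary simplification.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy chained hash table with a fixed number of buckets.
struct ToyMap<K, V> {
    buckets: Vec<Vec<(u64, K, V)>>,
}

impl<K, V> ToyMap<K, V> {
    fn new() -> Self {
        ToyMap { buckets: (0..16).map(|_| Vec::new()).collect() }
    }

    /// Phase 1: the caller supplies `hash`. Phase 2: `is_match` decides
    /// whether a stored key is the one wanted. Phase 3: `make_entry`
    /// builds the owned key and value, and runs only on a miss.
    fn find_or_insert_with<F, G>(&mut self, hash: u64, mut is_match: F, make_entry: G) -> &mut V
    where
        F: FnMut(&K) -> bool,
        G: FnOnce() -> (K, V),
    {
        let idx = (hash % self.buckets.len() as u64) as usize;
        let bucket = &mut self.buckets[idx];
        let found = bucket.iter().position(|(h, k, _)| *h == hash && is_match(k));
        let pos = match found {
            Some(p) => p,
            None => {
                let (k, v) = make_entry();
                bucket.push((hash, k, v));
                bucket.len() - 1
            }
        };
        &mut bucket[pos].2
    }
}

/// Hashes a borrowed string; the same function must be used for every key.
fn hash_str(s: &str) -> u64 {
    let mut hasher = DefaultHasher::new();
    s.hash(&mut hasher);
    hasher.finish()
}

/// Increments the count for `word`, allocating a `String` only on a miss,
/// and returns the updated count.
fn bump(map: &mut ToyMap<String, u32>, word: &str) -> u32 {
    let n = map.find_or_insert_with(
        hash_str(word),
        |stored| stored.as_str() == word,
        || (word.to_string(), 0),
    );
    *n += 1;
    *n
}
```

Nothing in this shape stops a caller from supplying a hash or an inserted key that disagrees with the search predicate, which is the "you can legitimately misuse it" trade-off described above.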
@gankro When you say 2nd option, to be maximally clear, do you specifically mean Regarding your sketch proposal, it sounds a little like #1533. My personal preference would be for something like that to be added in addition to this RFC. For the common case of borrow+clone (like word count, or for caching), raw_entry feels like dropping too many levels of abstraction, but as you point out, examples like @shepmaster's would still need the power provided by it. On mobile and thumbs hurt, but thanks a lot for breathing some new life into this RFC; it would make me so happy to see it resolved one way or another.
I was suggesting taking "entry_or_clone with explicit Query bound, Query without ToOwned blanket" (I would call it cow_entry/entry_cow, but that's a minor quibble). It's ugly, but it gives our users the most (I think? there's a lot of moving parts and it's hard to keep them all in my head at once.) I agree that landing this on its own is probably fine.
@cristicbz So I spent some time re-reading the thread, still haunted by the feeling that there must be a way we can do better. I wonder whether we could consider deprecating |
I've been travelling, so sorry if I'm slow to reply. Two things here:
It's a bit laborious, but I'd need to experiment to see how such an impl would pan out. |
While far from addressing all the issues this RFC addressed, I submitted a temporary fix (non-breaking) for some of the issues with Entry: rust-lang/rust#44278 |
Sadly, I do not have the resources I had when I initially wrote this RFC and I have left it in limbo for way too long a time. I appreciate everyone's time and feedback and I think some really useful research came out of this, but it doesn't seem like the initially proposed solution is good enough to get merged. Rather than keep this RFC open and modifying it many times asynchronously, it'd be better for someone else to come with a fresh perspective and improve on the ideas from this conversation, crystallising them into a new RFC. Then it would be easier to request a fresh batch of comments and feedback from the community on that new design.
Just as an aside, I'd love to see (a) more probabilistic data structures implemented in Rust and (b) an effort to unify their interface in some trait hierarchy, but with both happening outside the core language. |
Rendered
Playground
Prototype Implementation
Analysis of a Preliminary Crater Run
Alternative to #1203 and #1533.
cc @aturon @gankro @gereeter
(@aturon this works around the coherence issues we were talking about so it's fully general!)