-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collection Views #216
Merged
Merged
Collection Views #216
Changes from all commits
Commits
Show all changes
6 commits
Select commit
Hold shift + click to select a range
ca618d8
Create 0000-collection-views.md
Gankra 6ee0ee0
Update 0000-collection-views.md
Gankra 873c1ad
Update 0000-collection-views.md
Gankra 26328ed
Update 0000-collection-views.md
Gankra 9a77de8
Learn 2 code
Gankra b175db9
Major rewrite based on enum design and comments
Gankra File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,205 @@ | ||
- Start Date: 2014-08-28 | ||
- RFC PR: (leave this empty) | ||
- Rust Issue: (leave this empty) | ||
|
||
# Summary | ||
|
||
Add additional iterator-like Entry objects to collections. | ||
Entries provide a composable mechanism for in-place observation and mutation of a | ||
single element in the collection, without having to "re-find" the element multiple times. | ||
This deprecates several "internal mutation" methods like hashmap's `find_or_insert_with`. | ||
|
||
# Motivation | ||
|
||
As we approach 1.0, we'd like to normalize the standard APIs to be consistent, composable, | ||
and simple. However, this currently stands in opposition to manipulating the collections in | ||
an *efficient* manner. For instance, if one wishes to build an accumulating map on top of one | ||
of the concrete maps, they need to distinguish between the case when the element they're inserting | ||
is *already* in the map, and when it's *not*. One way to do this is the following: | ||
|
||
``` | ||
if map.contains_key(&key) { | ||
let v = map.find_mut(&key).unwrap(); | ||
let new_v = *v + 1; | ||
*v = new_v; | ||
} else { | ||
map.insert(key, 1); | ||
} | ||
``` | ||
|
||
However, searches for `key` *twice* on every operation. | ||
The second search can be squeezed out the `update` re-do by matching on the result | ||
of `find_mut`, but the `insert` case will always require a re-search. | ||
|
||
To solve this problem, Rust currently has an ad-hoc mix of "internal mutation" methods which | ||
take multiple values or closures for the collection to use contextually. Hashmap in particular | ||
has the following methods: | ||
|
||
``` | ||
fn find_or_insert<'a>(&'a mut self, k: K, v: V) -> &'a mut V | ||
fn find_or_insert_with<'a>(&'a mut self, k: K, f: |&K| -> V) -> &'a mut V | ||
fn insert_or_update_with<'a>(&'a mut self, k: K, v: V, f: |&K, &mut V|) -> &'a mut V | ||
fn find_with_or_insert_with<'a, A>(&'a mut self, k: K, a: A, found: |&K, &mut V, A|, not_found: |&K, A| -> V) -> &'a mut V | ||
``` | ||
|
||
Not only are these methods fairly complex to use, but they're over-engineered and | ||
combinatorially explosive. They all seem to return a mutable reference to the region | ||
accessed "just in case", and `find_with_or_insert_with` takes a magic argument `a` to | ||
try to work around the fact that the *two* closures it requires can't both close over | ||
the same value (even though only one will ever be called). `find_with_or_insert_with` | ||
is also actually performing the role of `insert_with_or_update_with`, | ||
suggesting that these aren't well understood. | ||
|
||
Rust has been in this position before: internal iteration. Internal iteration was (author's note: I'm told) | ||
confusing and complicated. However the solution was simple: external iteration. You get | ||
all the benefits of internal iteration, but with a much simpler interface, and greater | ||
composability. Thus, this RFC proposes the same solution to the internal mutation problem. | ||
|
||
# Detailed design | ||
|
||
A fully tested "proof of concept" draft of this design has been implemented on top of hashmap, | ||
as it seems to be the worst offender, while still being easy to work with. You can | ||
[read the diff here](https://github.com/Gankro/rust/commit/39a1fa7c7362a3e22e59ab6601ac09475daff39b). | ||
|
||
All the internal mutation methods are replaced with a single method on a collection: `entry`. | ||
The signature of `entry` will depend on the specific collection, but generally it will be similar to | ||
the signature for searching in that structure. `entry` will in turn return an `Entry` object, which | ||
captures the *state* of a completed search, and allows mutation of the area. | ||
|
||
For convenience, we will use the hashmap draft as an example. | ||
|
||
``` | ||
/// Get an Entry for where the given key would be inserted in the map | ||
pub fn entry<'a>(&'a mut self, key: K) -> Entry<'a, K, V>; | ||
|
||
/// A view into a single occupied location in a HashMap | ||
pub struct OccupiedEntry<'a, K, V>{ ... } | ||
|
||
/// A view into a single empty location in a HashMap | ||
pub struct VacantEntry<'a, K, V>{ ... } | ||
|
||
/// A view into a single location in a HashMap | ||
pub enum Entry<'a, K, V> { | ||
/// An occupied Entry | ||
Occupied(OccupiedEntry<'a, K, V>), | ||
/// A vacant Entry | ||
Vacant(VacantEntry<'a, K, V>), | ||
} | ||
``` | ||
|
||
Of course, the real meat of the API is in the View's interface (impl details removed): | ||
|
||
``` | ||
impl<'a, K, V> OccupiedEntry<'a, K, V> { | ||
/// Get a reference to the value of this Entry | ||
pub fn get(&self) -> &V; | ||
|
||
/// Get a mutable reference to the value of this Entry | ||
pub fn get_mut(&mut self) -> &mut V; | ||
|
||
/// Set the value stored in this Entry | ||
pub fn set(mut self, value: V) -> V; | ||
|
||
/// Take the value stored in this Entry | ||
pub fn take(self) -> V; | ||
} | ||
|
||
impl<'a, K, V> VacantEntry<'a, K, V> { | ||
/// Set the value stored in this Entry | ||
pub fn set(self, value: V); | ||
} | ||
``` | ||
|
||
There are definitely some strange things here, so let's discuss the reasoning! | ||
|
||
First, `entry` takes a `key` by value, because this is the observed behaviour of the internal mutation | ||
methods. Further, taking the `key` up-front allows implementations to avoid *validating* provided keys if | ||
they require an owned `key` later for insertion. This key is effectively a *guarantor* of the entry. | ||
|
||
Taking the key by-value might change once collections reform lands, and Borrow and ToOwned are available. | ||
For now, it's an acceptable solution, because in particular, the primary use case of this functionality | ||
is when you're *not sure* if you need to insert, in which case you should be prepared to insert. | ||
Otherwise, `find_mut` is likely sufficient. | ||
|
||
The result is actually an enum, that will either be Occupied or Vacant. These two variants correspond | ||
to concrete types for when the key matched something in the map, and when the key didn't, repsectively. | ||
|
||
If there isn't a match, the user has exactly one option: insert a value using `set`, which will also insert | ||
the guarantor, and destroy the Entry. This is to avoid the costs of maintaining the structure, which | ||
otherwise isn't particularly interesting anymore. | ||
|
||
If there is a match, a more robust set of options is provided. `get` and `get_mut` provide access to the | ||
value found in the location. `set` behaves as the vacant variant, but also yields the old value. `take` | ||
simply removes the found value, and destroys the entry for similar reasons as `set`. | ||
|
||
Let's look at how we one now writes `insert_or_update`: | ||
|
||
``` | ||
match map.entry(key) { | ||
Occupied(entry) => { | ||
let v = entry.get_mut(); | ||
let new_v = *v + 1; | ||
*v = new_v; | ||
} | ||
Vacant(entry) => { | ||
entry.set(1); | ||
} | ||
} | ||
``` | ||
|
||
One can now write something equivalent to the "intuitive" inefficient code, but it is now as efficient as the complex | ||
`insert_or_update` methods. In fact, this matches so closely to the inefficient manipulation | ||
that users could reasonable ignore Entries *until performance becomes an issue*. At which point | ||
it's an almost trivial migration. Closures also aren't needed to dance around the fact that one may | ||
want to avoid generating some values unless they have to, because that falls naturally out of | ||
normal control flow. | ||
|
||
If you look at the actual patch that does this, you'll see that Entry itself is exceptionally | ||
simple to implement. Most of the logic is trivial. The biggest amount of work was just | ||
capturing the search state correctly, and even that was mostly a cut-and-paste job. | ||
|
||
With Entries, the gate is also opened for... *adaptors*! | ||
Really want `insert_or_update` back? That can be written on top of this generically with ease. | ||
However, such discussion is out-of-scope for this RFC. Adaptors can | ||
be tackled in a back-compat manner after this has landed, and usage is observed. Also, this | ||
proposal does not provide any generic trait for Entries, preferring concrete implementations for | ||
the time-being. | ||
|
||
# Drawbacks | ||
|
||
* More structs, and more methods in the short-term | ||
|
||
* More collection manipulation "modes" for the user to think about | ||
|
||
* `insert_or_update_with` is kind of convenient for avoiding the kind of boiler-plate | ||
found in the examples | ||
|
||
# Alternatives | ||
|
||
* Just put our foot down, say "no efficient complex manipulations", and drop | ||
all the internal mutation stuff without a replacement. | ||
|
||
* Try to build out saner/standard internal manipulation methods. | ||
|
||
* Try to make this functionality a subset of [Cursors](http://discuss.rust-lang.org/t/pseudo-rfc-cursors-reversible-iterators/386/7), | ||
which would be effectively a bi-directional mut_iter | ||
where the returned references borrow the cursor preventing aliasing/safety issues, | ||
so that mutation can be performed at the location of the cursor. | ||
However, preventing invalidation would be more expensive, and it's not clear that | ||
cursor semantics would make sense on e.g. a HashMap, as you can't insert *any* key | ||
in *any* location. | ||
|
||
* This RFC originally [proposed a design without enums that was substantially more complex] | ||
(https://github.com/Gankro/rust/commit/6d6804a6d16b13d07934f0a217a3562384e55612). | ||
However it had some interesting ideas about Key manipulation, so we mention it here for | ||
historical purposes. | ||
|
||
# Unresolved questions | ||
The internal mutation methods cannot actually be implemented in terms of the View, because | ||
they return a mutable reference at the end, and there's no way to do that with the current | ||
View design. However, it's not clear why this is done by them. We believe it's simply to | ||
validate what the method *actually did*. If this is the case, then Views make this functionality | ||
obsolete. However, if this is *still* desirable, `set` could be tweaked to do this as well. | ||
However for some structures it may incur additional cost. Is this desirable functionality? | ||
|
||
Naming bikesheds! |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason this can't be written to use
if let
ormatch
onview.get_mut()
?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm convinced it's because the borrow of
view
currently doesn't end even withget_mut
returningNone
. Using an "empty/full" enum would make matching possible (and perhaps easier, too).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Lexical borrow scopes.
view.get_mut()
mutably borrowsview
, so using it in amatch
expression prevents mutatingview
in the fallback arm:I'm currently working on a modification to
if let
to fix the expression borrow scope issue, which would let you express this asHowever, this is not part of the
if let
RFC, and may or may not be accepted.FWIW, non-lexical borrow scopes will fix the
match
approach (and will fix the RFC-specifiedif let
as well).There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I figured it might be something like this.