-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leaf node linked list for iteration #33
Conversation
0194dba
to
17d5ff3
Compare
hey @declanvk can you explain how this works and why it makes things easier ? None of the papers I read mentioned using a linked list in the leafs, did you come up with this for yourself ? |
Linked node is a pretty common optimization in B+-tree's (see https://en.wikipedia.org/wiki/B%2B_tree#Implementation), but I don't think I saw it anywhere in the ART paper or associated stuff.
The linked list works from the base case of a single tree containing only a single leaf node. The
On delete, we can easily remove a leaf node from the linked list by pointing the previous and next leaf nodes at each other.
|
I was also considering an implementation based on parent pointers (see #7 ), which is similar to how the Rust BTreeMap does it (see https://doc.rust-lang.org/1.80.1/src/alloc/collections/btree/node.rs.html#52), but I thought that the leaf siblings would be more direct, since iteration would just become traversing the linked list. The downsides are definitely the increase in space usage, if you see the tests based on
I haven't tested what the impact on iteration speed looks like though, which is what I would be weighing the size increase against. |
An ART implementation's biggest strength is definitely its compact size and excellent cache performance. I'm hoping that even though there is a significant increase in leaf node size and overall tree allocation size, that won't impact the search/insert/delete paths too much because the inner node have not change in size. However, if the benchmarks at the end show negative impacts to the main operations (get/insert/delete) that may outweight performance improvements to the iteration/range/etc |
Makes a lot of sense, please share your findings when you get there, if there is anything I can help, just let me know. |
1a158fb
to
d5ddd4a
Compare
**Description** - Rename `LeafNode::new` to `LeafNode::with_no_siblings` to emphasize that that the sibling pointers are unint - Add `LeafNode` functions for inserting and removing from the list - Minor changes like: - Switch `panic!` to `unreachable!` where relevant - Type alias for optional leaf pointer, since the type was large - Add a helper type to the for the `DotPrinter` and make the settings non-exhaustive to allow future settings without breakage - Fixup comments and make some functions unsafe that were not
**Description** - Add steps to the insert and delete operations so that the leaf linked list will be maintained with the correct order - This is the next step before we're able to use the linked list in various iteration processes - There is a TODO left to make the `deep_clone` functions work with the linked list - Add public API to the TreeMap type for converting to and from opaque node pointers. These functions are useful when running tests on the tree internals
**Description** - Add two new errors types for the WF checker that concern the contents and ordering of the leaf linked list - Add a FIXME note in the `impl Visitable for NodePtr` impl, the current state is probably broken
b737763
to
717a64d
Compare
Benchmark results from
Important bits:
There are other minor regressions and improvements too. I was expecting more of an impact on the insert path, but I'm not seeing very much of it. |
31a1271
to
9d9a2d2
Compare
Updated benchmarks after re-implementing the prefix iterators:
|
I think I'm going to |
**Description** Cache the minimum and maximum leaf nodes as part of the tree internal state so that it is easy to start tree iteration using the leaves linked list. This also speeds up the min/max access functions.
**Description** - Create a new unsafe "iterator" that will traverse the leaf node linked list. - Use the leaf node linked list iterator to implement the `Iter`, `IterMut`, `Keys`, `Values`, and `ValuesMut` iterators **Motivation** The changes to use the linked list iterator should be faster to iterate (needs benchmark) and less complex to implement. **Testing Done** `./scripts/full_test.sh nightly`
**Description** Fixes #19 This commit modifies the `IntoIter` implementation to be multiple steps: 1. Deallocate all the inner node of the trie, leaving only the leave nodes 2. Iterate through the leaf nodes by linked list and deallocate the leaf node and return the key-value pairs. 3. If there are any leaf nodes remaining on drop, deallocate all of them This commit also moves the tree dealloc operation into a separate module, and adds a couple variants of the dealloc operation to support the `IntoIter` modifications. The TreeMap fuzz tests are also updated to exercise the `.into_iter` implementation in a way that should cover all 3 steps listed above. **Motivation** This change makes the `IntoIter` more efficient, since we don't have to maintain a valid trie between each call to `next()`. **Testing Done** `./scripts/full-test.sh nightly`
**Description** - The previous implementation of `deallocate_leaves` was faulty because it would read all the way through the linked list of leaf nodes even if some leaves had been popped off the back by the iterator. - Fixing this by changing the `deallocate_leaves` take an iterator with start and end, instead of just start pointer. - Added some missing safety docs - Modified the `IntoIter` fuzz tests to iterate on the front and the back - Added new unit test to ensure drop count was correct **Testing Done** Ran fuzzer and full test script
**Description** - Fix bug where range iteration could sometimes be off by one on the end bound - Add debug_assertion checks on all inner node range iterator impls to check for cases "illegal" bound cases. Also add unit tests on these checks **Motivation** This change is supporting the range iterators change, since the inner node iterators are required. **Testing Done** Added unit tests, ran full nightly test
**Description** - Implement the basic range iterator which will returns tuples of `(key, value)`. - Add the `TreeMap::range` function and a doctest example - Remove some dead code comments that were lying around **Motivation** This is the central feature that I wanted to implement, the ability to query a sub-section of the `TreeMap` using the native key types. This commit only implements the `Range` iterator, not the `RangeMut` iterator so that the initial commit is devoid of `macro_rules` stuff. This commit also includes some useful machinery that I think could be applied to the `Prefix*` iterators, namely the `find_terminating_node` function. **Testing Done** Full nightly test. I was not able to run a fuzzer because of some difficulties setting it up.
**Description** - Fix another issue in the compressed inner node range iterator bounds handling and add some additional tests. - Factor out some common code in the range iterator lookup function - Fix a bug in the header `Debug` impl when it would read out of slice bounds when there were implicit bytes in the header. **Motivation** I made these changes while extending the range iterator to support `RangeMut`, and found the test case from `(Unbounded, Included(128))` **Testing Done** Added some more unit tests, tested with fully nightly
**Description** - Add the `RangeMut` iterator by generalizing the existing range iterator using a `macro_rules` macro. - Also add the `TreeMap::range_mut` method and docs + example **Motivation** This change is needed to reach feature parity with the `BTreeMap` which has a `range_mut` method. **Testing Done** Added doc test and new unit test. The `RangeMut` test coverage is pretty low, but I'm banking on the fact that the implementation is largely the same as `Range`.
**Description** - Add new concrete pointer type which only points to inner nodes - Swap equality condition in `TreeMap::eq` function **Motivation** - The new concrete pointer type is helpful in cases where I want to statically know that the pointer cannot be a leaf node pointer. There are a couple cases of `LeafNode(_) => unreachable!()` in match statements that I'd like to remove later - The equality condition was checking the expensive part first, having the check on number of elements go first means it could easily short-circuit on maps of different size **Testing Done** `cargo test`
**Description** The specialized clone based on `deep_clone` was removed because it did not account for the leaf node linked list. This commit re-implements `clone()` using a non-recursive algorithm that should be more efficient than creating an empty tree and inserting elements. This commit is needed, otherwise it would be a significant perf hit to add the leaf node linked list. This commit also removes the `deep_clone` functions and associated stuff. **Testing Done** `./scripts/full-test.sh nightly`
**Description** - Apply some clippy lints to cleanup the build - Fix some test cases for miri, and reduce their size - Converting `debug_assert` to `assert` to make sure the panic message in tests is consistent in release mode
**Description** - Add `criterion` and `iai-callgrind` benchmark for the range iterator - Modify some existing `criterion` benches to use the `dictionary_tree` helper
**Description** - Re-implement the prefix iterators using the leaf node linked-list and some of the helper functions from the `range` module. - Remove the `prefix_keys`, `prefix_values`, `prefix_values_mut` functions and iterators - Add some additional unit tests **Motivation** - The new implementation is faster and uses less code, since it can re-use the search function from the `range` module. - These iterator variants didn't add much value, aside from matching the existing `keys`, `values`, `values_mut` iterators. **Testing Done** `./scripts/full-test.sh nightly`
4c262c6
to
33fe4c3
Compare
I know you already merged and did everything, but anyway nice changes, ty |
This PR adds a doubly linked list to the leaf nodes to implement iteration in a more natural way. This should make it easier to implement things like range or prefix and maybe even the drain + filter.