Weekly Sync 2024-09-13 #6

Merged 4 commits on Sep 13, 2024

2 changes: 1 addition & 1 deletion .bleep
@@ -1 +1 @@
-46cdb8138867aa29ff1fd9d672c1c4bdd63914f7
+6f6a59de57389578cd13e173b6f8cf2069ea83e1
2 changes: 1 addition & 1 deletion .github/workflows/build.yml
@@ -6,7 +6,7 @@ jobs:
  trie-hard:
    strategy:
      matrix:
-        toolchain: [nightly, 1.72, 1.80.0]
+        toolchain: [nightly, 1.74, 1.80.0]
    runs-on: ubuntu-latest
    # Only run on "pull_request" event for external PRs. This is to avoid
    # duplicate builds for PRs created from internal branches.
4 changes: 2 additions & 2 deletions Cargo.toml
@@ -11,8 +11,8 @@ Fast implementation of a trie data structure
"""

[dev-dependencies]
-rstest = "0.21.0"
-criterion = "0.3"
+rstest = "0.22.0"
+criterion = "0.5.1"
radix_trie = "0.2.1"
paste = "1.0.15"
once_cell = "1.19.0"
8 changes: 4 additions & 4 deletions README.md
@@ -7,7 +7,7 @@ This crate is an implementation of the [trie](https://en.wikipedia.org/wiki/Trie

## Performance

-There are several other trie implementations for rust that are more full featured, so it you are looking for a more robust tool, you will probably want to check out [`radix_trie`](https://crates.io/crates/radix_trie) which seems to have the best features and performance. On the other hand, if you want raw speed and have the same narrow use case, you came to the right place!
+There are several other trie implementations for rust that are more full-featured, so if you are looking for a more robust tool, you will probably want to check out [`radix_trie`](https://crates.io/crates/radix_trie) which seems to have the best features and performance. On the other hand, if you want raw speed and have the same narrow use case, you came to the right place!

Here is a chart showing the time taken to read 10k entries from a map that consists of 119 entries containing only lower-case characters, numbers, and `-`. As you can see, when miss rate gets above 50% the performance of trie-hard surpasses `std::HashMap` and improves as miss rates get higher.

@@ -69,7 +69,7 @@ let root = Node {

This tells us that if a byte other than `a` or `d` appears in the first position, the key being tested does not appear in the trie. This ability to make an exclusion decision at every step is what makes tries more appealing than even hashmaps in some cases. Searching for a string in a hashmap requires hashing the entire string whereas a trie can potentially determine that a string is not part of a set within a single byte.

-If the byte is `a` or `d` we still need to know which node to go to next. All nodes in the graph are stored in contiguous a vector (with the root node at index zero). Each node will contain the information on where its child appears in the array of nodes. In our example the root node will point to nodes with indexes 1 and 2. Where 1 is the index with keys starting with `a` and 2 is the node for keys starting with `d`. It is important that these child nodes are ordered by their corresponding byte.
+If the byte is `a` or `d` we still need to know which node to go to next. All nodes in the graph are stored in a contiguous vector (with the root node at index zero). Each node will contain the information on where its child appears in the array of nodes. In our example the root node will point to nodes with indexes 1 and 2. Where 1 is the index with keys starting with `a` and 2 is the node for keys starting with `d`. It is important that these child nodes are ordered by their corresponding byte.

```rust
let root = Node {
@@ -89,7 +89,7 @@ At this point we can visualize the conceptual trie and trie-hard like this
| ------------------------------------------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| ![First Layer Conceptual Trie](https://github.com/cloudflare/trie-hard/blob/main/resources/FirstLayerVanilla.png?raw=true "Header Read vs HashMap Benchmark") | ![Trie Hard read is faster than HashMap for small maps where miss rate is high](https://github.com/cloudflare/trie-hard/blob/main/resources/FirstLayerTrieHard.png?raw=true "Header Read vs HashMap Benchmark") |

-Because of the recursive nature of a trie, we can repeat the same process of creating a mask based on allowed bytes at each node and preparing a set of children for each node. When we reach a complete word that appears in the initial set, we need to signify that the node is a valid word. Visually we will mark them with greed, but in rust they just appear as a different enum variant of `TrieNode`.
+Because of the recursive nature of a trie, we can repeat the same process of creating a mask based on allowed bytes at each node and preparing a set of children for each node. When we reach a complete word that appears in the initial set, we need to signify that the node is a valid word. Visually we will mark them with green, but in rust they just appear as a different enum variant of `TrieNode`.

After repeating for one more layer, we can visualize the trie like the this.

@@ -99,7 +99,7 @@ After repeating for one more layer, we can visualize the trie like the this.

Notice that `do` shows up as green because it is a complete word found in the original collection.

-Finally we add the last layer and complete this small trie.
+Finally, we add the last layer and complete this small trie.


| Conceptual | Trie-Hard |
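
Note for readers of this diff: the lookup scheme the README text describes (a per-node mask of allowed bytes, all nodes in one contiguous vector with the root at index zero, children ordered by their byte) can be sketched roughly as below. This is only an illustration under stated assumptions, not the crate's actual implementation: the `Node` fields and the `bit`, `build`, and `contains` functions are hypothetical names, the mask is restricted to `a`-`z` so it fits in a `u32`, and the sample words are chosen for the example.

```rust
use std::collections::VecDeque;

/// Hypothetical node layout for illustration only (not the crate's real types).
struct Node {
    /// Bit `n` is set when byte `b'a' + n` may appear at this position.
    mask: u32,
    /// Index in the node vector of this node's first (lowest-byte) child.
    first_child: usize,
    /// True when the path to this node spells a complete word.
    is_word: bool,
}

/// Assumption of this sketch: only lowercase ASCII letters get a mask bit.
fn bit(byte: u8) -> Option<u32> {
    byte.is_ascii_lowercase().then(|| 1u32 << (byte - b'a'))
}

/// Builds the node vector breadth-first so that each node's children are
/// contiguous and ordered by byte, with the root at index zero.
fn build(words: &[&str]) -> Vec<Node> {
    let mut sorted: Vec<&[u8]> = words.iter().map(|w| w.as_bytes()).collect();
    sorted.sort();
    sorted.dedup();
    let mut nodes = vec![Node { mask: 0, first_child: 0, is_word: false }];
    // Queue entries: (node index, words sharing that node's prefix, prefix length).
    let mut queue = VecDeque::new();
    queue.push_back((0usize, sorted, 0usize));
    while let Some((idx, group, depth)) = queue.pop_front() {
        let first_child = nodes.len();
        let (mut mask, mut is_word, mut i) = (0u32, false, 0usize);
        while i < group.len() {
            if group[i].len() == depth {
                // The prefix itself is one of the words.
                is_word = true;
                i += 1;
                continue;
            }
            let byte = group[i][depth];
            // The group is sorted, so words sharing the next byte are adjacent.
            let run = group[i..].iter().take_while(|w| w[depth] == byte).count();
            mask |= bit(byte).expect("this sketch only supports a-z");
            nodes.push(Node { mask: 0, first_child: 0, is_word: false });
            queue.push_back((nodes.len() - 1, group[i..i + run].to_vec(), depth + 1));
            i += run;
        }
        nodes[idx] = Node { mask, first_child, is_word };
    }
    nodes
}

/// Walks the trie one byte at a time, rejecting as soon as a byte is not in
/// the current node's mask.
fn contains(nodes: &[Node], key: &str) -> bool {
    let mut node = &nodes[0];
    for &byte in key.as_bytes() {
        let Some(b) = bit(byte) else { return false };
        if node.mask & b == 0 {
            return false;
        }
        // Children are ordered by byte, so the child's offset is the number
        // of allowed bytes that sort below this one.
        let offset = (node.mask & (b - 1)).count_ones() as usize;
        node = &nodes[node.first_child + offset];
    }
    node.is_word
}

fn main() {
    let nodes = build(&["and", "ant", "dad", "do", "dot"]);
    assert!(contains(&nodes, "do"));
    assert!(!contains(&nodes, "d"));
    assert!(!contains(&nodes, "xyz"));
    println!("root's children start at index {}", nodes[0].first_child);
}
```

With these sample words the root allows only `a` and `d`, and its two children sit at indexes 1 and 2, which matches the layout the README paragraph walks through. A key such as `xyz` is rejected after inspecting a single byte.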