-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimise std::trie's iterator #11497
Conversation
Doesn't the trie have a bounded tree height since the keys are integers? (specifically it seems something like at most 16 levels on 64-bit) If so, why not just replace the growable heap-allocated array in the iterators with an immediate fixed-size array? That should result in the same or faster performance than an internal iterator. |
I agree with @bill-myers, I would rather attempt to optimize the external iterators rather than adding internal ones. Another option would be to have the iterator maintain a "reasonably sized" fixed-size vector which then gets spilled onto the heap if you iterate farther down perhaps? Regardless, I'd like to pursue optimization before adding internal iterators. If it does indeed turn out that internal iterators are fundamentally faster, then I think it's ok to add them, but right now I wouldn't necessarily be convinced that there's a fundamental reason external iterators can't be faster. |
No need for internal iterators: just a gentle application of |
I like that this is faster, but it makes me worried. The trie can rewritten with raw pointers and have over 5x faster insert/removal too... but isn't this an admission that Rust's safety model doesn't really work for a systems language? It's going to occur for all owned tree-like data structures. I think a general solution without unsafe code is necessary. |
I agree in principal, but I'm not sure it is possible to have an safe solution that is this low level/performant, at least, not without some sort of dependent typing. I certainly don't have a better proposal than carefully writing parts of the stdlib (I seriously had to go all the way down to stepping through by |
There are plenty of user-defined tree-like data structures too, and Rust needs a solution to avoid them being painfully slow compared to C++. It's not good that after lots of tweaking, |
It's a bit off-topic, but... The issue with self.stack.unsafe_mut_ref(self.length) and start_ptr.ofsset(self.length) is that the element type is VecMutIterator, which is large, and x86 can only address arrays or compute addresses with element sizes of 1, 2, 4 or 8 for free. If the element size is different, you need an explicit multiplication instruction to compute the final address, which of course reduces performance since it has lower throughput and latency. This multiplication and safety issue can be solved by using iterators for the fixed array instead of raw pointers or integer offsets. However, there is still a bounds check on each iterator increment, which is unavoidable for safety, because the compiler has no easy way of divining that the increment will not happen more times than the fixed stack size (that would require automatic theorem proving based, in this case, on the fact that if you shift any integer by K, it will become zero in max_int/K steps and this means that nodes at that level won't be split, etc.) Ultimately, the outcome is that library data structures probably must be written with some unsafe code, and using only safe code is possible only for applications. That's still far better than the C++ situation. |
Owned tree data structures are common in user code too and they'll certainly be everywhere in Servo. I don't think it's acceptable to require that these all use |
It's actually "only" 3 words (which is still too large on 64-bit platforms; even if/when we remove the unnecessary 3rd word).
We'd need a reversible iterator of some sort to be able to push and pop (and, as you say, there's the bounds checking).
I imagine we could get somewhere with some limited bounded integer/pointer types, but I don't see how we can offer a general solution other than dependent typing, or getting rustc/LLVM to recognise & optimise the "swap out, put back in" idiom (for inserts/removes) where possible (which may not be many places: probably interacts poorly with destructors/failure). |
Well, they could use a common library implementation of a tree. Specifically, the bounds checks for the iteration stack can only be avoided for trees with a constant bound on their height, which generally means they are either tries, balanced search trees or heaps, which are or can all be provided by libstd. If you are iterating over an unbounded tree (say, an HTML document), then you need a growable heap-allocated stack (or something that is either on the stack or on the heap depending on size), and you only need to add/peek/remove the last element, so you don't save any bounds checks with unsafe code. Also, with code that actually does something with the elements its iterating on, I doubt that that bounds check matters.
I think you in fact just need a fixed-capacity stack type with fast access to the last element (i.e. something that holds a [T, ..n] and the current length in bytes). Then, optimal performance only requires an "unsafe_push" function that doesn't check for going out of bounds by assuming that you know it can't happen. Without integer generic parameters it seems that such a type cannot be implemented as a generic struct, but it should be implementable as a macro. |
Looks like the internal iterators were removed from this commit, and it's just the external iterators with some optimizations are left (nice job!) r=me with a rebase |
This are *trivial* to reimplement in terms of each_reverse if that extra little bit of performance is needed.
This stores the stack of iterators inline (we have a maximum depth with `uint` keys), and then uses direct pointer offsetting to manipulate it, in a blazing fast way: Before: bench_iter_large ... bench: 43187 ns/iter (+/- 3082) bench_iter_small ... bench: 618 ns/iter (+/- 288) After: bench_iter_large ... bench: 13497 ns/iter (+/- 1575) bench_iter_small ... bench: 220 ns/iter (+/- 91)
This stores the stack of iterators inline (we have a maximum depth with `uint` keys), and then uses direct pointer offsetting to manipulate it, in a blazing fast way: Before: bench_iter_large ... bench: 43187 ns/iter (+/- 3082) bench_iter_small ... bench: 618 ns/iter (+/- 288) After: bench_iter_large ... bench: 13497 ns/iter (+/- 1575) bench_iter_small ... bench: 220 ns/iter (+/- 91) Also, removes `.each_{key,value}_reverse` as an offering to placate the gods of external iterators for my heinous sin of attempting to add new internal ones (in a previous version of this PR).
Support multi-character punct tokens in MBE Fixes rust-lang#11497 In the context of MBE, consecutive puncts are parsed as multi-character punct tokens whenever possible. For example, `:::` is parsed as ``[Punct(`::`), Punct(`:`)]`` and shouldn't get matched to patterns like `: : :` or `: ::`. We have implemented this behavior only for when we match puncts against `tt` fragments, but not when we match puncts literally. This PR extracts the multi-character punct handling procedure into a separate method and extends its support for literal matching. For good measure, this PR adds support for `<-` token, which is still [considered as one token in rustc](https://github.com/rust-lang/rust/blob/e3961864075eaa9e855e5eec6b4f148029684539/compiler/rustc_ast/src/token.rs#L249) despite the placement syntax having been removed.
Remove `derive_new` test dependency It is the last thing depending on syn 1.0 in clippy changelog: none
This stores the stack of iterators inline (we have a maximum depth with
uint
keys), and then uses direct pointer offsetting to manipulate it,in a blazing fast way:
Before:
After:
Also, removes
.each_{key,value}_reverse
as an offering toplacate the gods of external iterators for my heinous sin of
attempting to add new internal ones (in a previous version of this
PR).