Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimise std::trie's iterator #11497

Merged
merged 2 commits into from
Jan 18, 2014
Merged

Optimise std::trie's iterator #11497

merged 2 commits into from
Jan 18, 2014

Conversation

huonw
Copy link
Member

@huonw huonw commented Jan 12, 2014

This stores the stack of iterators inline (we have a maximum depth with
uint keys), and then uses direct pointer offsetting to manipulate it,
in a blazing fast way:

Before:

bench_iter_large          ... bench:     43187 ns/iter (+/- 3082)
bench_iter_small          ... bench:       618 ns/iter (+/- 288)

After:

bench_iter_large          ... bench:     13497 ns/iter (+/- 1575)
bench_iter_small          ... bench:       220 ns/iter (+/- 91)

Also, removes .each_{key,value}_reverse as an offering to
placate the gods of external iterators for my heinous sin of
attempting to add new internal ones (in a previous version of this
PR).

@bill-myers
Copy link
Contributor

Doesn't the trie have a bounded tree height since the keys are integers? (specifically it seems something like at most 16 levels on 64-bit)

If so, why not just replace the growable heap-allocated array in the iterators with an immediate fixed-size array?

That should result in the same or faster performance than an internal iterator.

@alexcrichton
Copy link
Member

I agree with @bill-myers, I would rather attempt to optimize the external iterators rather than adding internal ones. Another option would be to have the iterator maintain a "reasonably sized" fixed-size vector which then gets spilled onto the heap if you iterate farther down perhaps?

Regardless, I'd like to pursue optimization before adding internal iterators. If it does indeed turn out that internal iterators are fundamentally faster, then I think it's ok to add them, but right now I wouldn't necessarily be convinced that there's a fundamental reason external iterators can't be faster.

@huonw
Copy link
Member Author

huonw commented Jan 13, 2014

No need for internal iterators: just a gentle application of unsafe to get the external iterators even faster than the internal ones were.

@thestinger
Copy link
Contributor

I like that this is faster, but it makes me worried. The trie can rewritten with raw pointers and have over 5x faster insert/removal too... but isn't this an admission that Rust's safety model doesn't really work for a systems language? It's going to occur for all owned tree-like data structures. I think a general solution without unsafe code is necessary.

@huonw
Copy link
Member Author

huonw commented Jan 13, 2014

I agree in principal, but I'm not sure it is possible to have an safe solution that is this low level/performant, at least, not without some sort of dependent typing. I certainly don't have a better proposal than carefully writing parts of the stdlib unsafely for performance (at least generics mean that we only have to write it once for each data structure, but as you say, there are many tree-like data structures).

(I seriously had to go all the way down to stepping through by .offseting pointers directly to get this performance, the following were all at least 2x slower self.stack[self.length], self.stack.unsafe_mut_ref(self.length), start_ptr.offset(self.length).)

@thestinger
Copy link
Contributor

There are plenty of user-defined tree-like data structures too, and Rust needs a solution to avoid them being painfully slow compared to C++. It's not good that after lots of tweaking, TrieMap and TreeMap are several times slower at inserts and removals compared to easier to write unsafe code. I really don't want the solution to be that you have to write your tree with unsafe code for fast insertion, removal and iteration.

@bill-myers
Copy link
Contributor

It's a bit off-topic, but...

The issue with self.stack.unsafe_mut_ref(self.length) and start_ptr.ofsset(self.length) is that the element type is VecMutIterator, which is large, and x86 can only address arrays or compute addresses with element sizes of 1, 2, 4 or 8 for free.

If the element size is different, you need an explicit multiplication instruction to compute the final address, which of course reduces performance since it has lower throughput and latency.

This multiplication and safety issue can be solved by using iterators for the fixed array instead of raw pointers or integer offsets.

However, there is still a bounds check on each iterator increment, which is unavoidable for safety, because the compiler has no easy way of divining that the increment will not happen more times than the fixed stack size (that would require automatic theorem proving based, in this case, on the fact that if you shift any integer by K, it will become zero in max_int/K steps and this means that nodes at that level won't be split, etc.)

Ultimately, the outcome is that library data structures probably must be written with some unsafe code, and using only safe code is possible only for applications.

That's still far better than the C++ situation.

@thestinger
Copy link
Contributor

Owned tree data structures are common in user code too and they'll certainly be everywhere in Servo. I don't think it's acceptable to require that these all use unsafe code to avoid a 2-3x performance hit on everything but non-iterative searches.

@huonw
Copy link
Member Author

huonw commented Jan 14, 2014

VecMutIterator, which is large

It's actually "only" 3 words (which is still too large on 64-bit platforms; even if/when we remove the unnecessary 3rd word).

This multiplication issue can be solved by using iterators for the fixed array instead of raw pointers or integer offsets.

We'd need a reversible iterator of some sort to be able to push and pop (and, as you say, there's the bounds checking).

Owned tree data structures are common in user code too and they'll certainly be everywhere in Servo. I don't think it's acceptable to require that these all use unsafe code to avoid a 2-3x performance hit on everything but non-iterative searches.

I imagine we could get somewhere with some limited bounded integer/pointer types, but I don't see how we can offer a general solution other than dependent typing, or getting rustc/LLVM to recognise & optimise the "swap out, put back in" idiom (for inserts/removes) where possible (which may not be many places: probably interacts poorly with destructors/failure).

@bill-myers
Copy link
Contributor

@thestinger

Well, they could use a common library implementation of a tree.

Specifically, the bounds checks for the iteration stack can only be avoided for trees with a constant bound on their height, which generally means they are either tries, balanced search trees or heaps, which are or can all be provided by libstd.

If you are iterating over an unbounded tree (say, an HTML document), then you need a growable heap-allocated stack (or something that is either on the stack or on the heap depending on size), and you only need to add/peek/remove the last element, so you don't save any bounds checks with unsafe code.

Also, with code that actually does something with the elements its iterating on, I doubt that that bounds check matters.

@huonw

We'd need a reversible iterator of some sort to be able to push and pop (and, as you say, there's the bounds checking).

I think you in fact just need a fixed-capacity stack type with fast access to the last element (i.e. something that holds a [T, ..n] and the current length in bytes).

Then, optimal performance only requires an "unsafe_push" function that doesn't check for going out of bounds by assuming that you know it can't happen.

Without integer generic parameters it seems that such a type cannot be implemented as a generic struct, but it should be implementable as a macro.

@alexcrichton
Copy link
Member

Looks like the internal iterators were removed from this commit, and it's just the external iterators with some optimizations are left (nice job!)

r=me with a rebase

huonw added 2 commits January 18, 2014 10:45
This are *trivial* to reimplement in terms of each_reverse if that extra
little bit of performance is needed.
This stores the stack of iterators inline (we have a maximum depth with
`uint` keys), and then uses direct pointer offsetting to manipulate it,
in a blazing fast way:

Before:

    bench_iter_large          ... bench:     43187 ns/iter (+/- 3082)
    bench_iter_small          ... bench:       618 ns/iter (+/- 288)

After:

    bench_iter_large          ... bench:     13497 ns/iter (+/- 1575)
    bench_iter_small          ... bench:       220 ns/iter (+/- 91)
bors added a commit that referenced this pull request Jan 18, 2014
This stores the stack of iterators inline (we have a maximum depth with
`uint` keys), and then uses direct pointer offsetting to manipulate it,
in a blazing fast way:

Before:

    bench_iter_large          ... bench:     43187 ns/iter (+/- 3082)
    bench_iter_small          ... bench:       618 ns/iter (+/- 288)

After:

    bench_iter_large          ... bench:     13497 ns/iter (+/- 1575)
    bench_iter_small          ... bench:       220 ns/iter (+/- 91)

Also, removes `.each_{key,value}_reverse` as an offering to
placate the gods of external iterators for my heinous sin of 
attempting to add new internal ones (in a previous version of this
PR).
@bors bors closed this Jan 18, 2014
@bors bors merged commit 0148055 into rust-lang:master Jan 18, 2014
@huonw huonw deleted the trie-internal-iter branch June 27, 2014 06:48
arcnmx pushed a commit to arcnmx/rust that referenced this pull request Jan 9, 2023
Support multi-character punct tokens in MBE

Fixes rust-lang#11497

In the context of MBE, consecutive puncts are parsed as multi-character punct tokens whenever possible. For example, `:::` is parsed as ``[Punct(`::`), Punct(`:`)]`` and shouldn't get matched to patterns like `: : :` or `: ::`.

We have implemented this behavior only for when we match puncts against `tt` fragments, but not when we match puncts literally. This PR extracts the multi-character punct handling procedure into a separate method and extends its support for literal matching.

For good measure, this PR adds support for `<-` token, which is still [considered as one token in rustc](https://github.com/rust-lang/rust/blob/e3961864075eaa9e855e5eec6b4f148029684539/compiler/rustc_ast/src/token.rs#L249) despite the placement syntax having been removed.
flip1995 pushed a commit to flip1995/rust that referenced this pull request Sep 25, 2023
Remove `derive_new` test dependency

It is the last thing depending on syn 1.0 in clippy

changelog: none
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants