
perf: Skip the first goto call in full construction (-35%) #33

Closed
wants to merge 4 commits

Conversation

Marwes (Contributor) commented Jul 12, 2018

  • perf: Skip the first goto call in full construction (-35%)

This is probably less significant when a larger number of needles is used,
but it should still be a decent speed improvement.

```
 name                     old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 bench_construction       131,148      122,956            -8,192   -6.25%   x 1.07
 bench_full_construction  309,493      200,595          -108,898  -35.19%   x 1.54
```

  • perf: Use for_each_transition in AcAutomaton::fill (-84%)

```
 name                     old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 bench_construction       167,415      25,804           -141,611  -84.59%   x 6.49
 bench_full_construction  275,170      133,253          -141,917  -51.57%   x 2.07
```

(The two benchmarks were run on different computers.)

Marwes commented Jul 12, 2018

This should be the last PR :) It does complicate things a bit more, but it really helps when querying for the next state of Dense, as many states (at least in these short benchmarks) only have a distance of 1 to the first non-fail state.

However, if next_state needs to traverse more than one state, then it should be increasingly likely that the memoization kicks in, so chances are the memoization gives a performance improvement greater than 12% in that case instead!
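For concreteness, here is a minimal sketch of the scheme being described, with simplified types and invented names (`Sparse`, `build_full`, `FAIL_STATE`, `ROOT_STATE` are assumptions of the sketch, not the crate's actual internals): the state's own edge is checked first, the generic `next_state` walk is only a fallback, and rows filled earlier act as the memo.

```rust
const FAIL_STATE: usize = 0; // sentinel meaning "no edge" (assumption of this sketch)
const ROOT_STATE: usize = 1;

struct Sparse {
    trans: Vec<[usize; 256]>, // trans[s][b] == FAIL_STATE means state s has no edge on b
    fail: Vec<usize>,
}

impl Sparse {
    /// Generic lookup: walk fail links until some state has an edge on `b`.
    fn next_state(&self, mut s: usize, b: u8) -> usize {
        loop {
            let t = self.trans[s][b as usize];
            if t != FAIL_STATE {
                return t;
            }
            if s == ROOT_STATE {
                return ROOT_STATE;
            }
            s = self.fail[s];
        }
    }
}

/// Dense-table ("full") construction. Checking the state's own edge first skips
/// the redundant first lookup that a generic next_state(si, b) call would repeat,
/// and because rows are filled in increasing index order, a fail state with a
/// smaller index already has a finished row that can simply be copied from.
fn build_full(ac: &Sparse) -> Vec<[usize; 256]> {
    let n = ac.trans.len();
    let mut full = vec![[ROOT_STATE; 256]; n];
    for si in ROOT_STATE..n {
        for b in 0..=255u8 {
            let own = ac.trans[si][b as usize];
            full[si][b as usize] = if own != FAIL_STATE {
                own // the state's own edge: no fail-chain walk at all
            } else if si == ROOT_STATE {
                ROOT_STATE
            } else {
                let fsi = ac.fail[si];
                if fsi < si {
                    full[fsi][b as usize] // memoized: that row is already final
                } else {
                    ac.next_state(fsi, b) // fallback: walk the fail chain
                }
            };
        }
    }
    full
}
```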

@Marwes Marwes force-pushed the opt branch 2 times, most recently from 46fd1e8 to f4ccc69 on July 12, 2018 21:22
Marwes commented Jul 13, 2018

Another thing that could have improved things further would be to actively update full_automaton as the automaton is searched (essentially doing path compression, https://en.wikipedia.org/wiki/Disjoint-set_data_structure#Path_compression). Unfortunately, needing to query full_automaton and check for != FAIL_STATE on each loop iteration to see whether the state is already set, instead of just checking si < current_si, seems to negate any improvement from doing so.
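For reference, this is the textbook form of path compression from the linked article (standard union-find code, nothing from this crate); the idea above is the analogous rewrite of full_automaton entries while searching.

```rust
// Union-find `find` with path compression: rewrite entries along the chain so
// later lookups jump straight to the answer.
fn find(parent: &mut [usize], x: usize) -> usize {
    let p = parent[x];
    if p == x {
        x
    } else {
        let root = find(parent, p);
        parent[x] = root; // path compression: point directly at the root
        root
    }
}

fn main() {
    let mut parent = vec![0, 0, 1, 2]; // chain: 3 -> 2 -> 1 -> 0
    assert_eq!(find(&mut parent, 3), 0);
    // After one lookup the whole chain points at the root.
    assert_eq!(parent, vec![0, 0, 0, 0]);
}
```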

Marwes commented Jul 13, 2018

(*) Second to last PR: it is possible to improve (and complicate) the AcAutomaton a bit more as well (-24%).

BurntSushi (Owner) commented:
Thanks for this! I'm going to let this sit for at least a week to allow any additional changes you might have planned to accrue here in order to minimize churn and context switching.

Marwes commented Aug 4, 2018

> (*) Second to last PR: it is possible to improve (and complicate) the AcAutomaton a bit more as well (-24%).

This ended up not having that large of an impact so I scrapped it.

Instead, I realized that for_each_transition can be used pervasively with some slight restructuring, which LLVM seems to really like. Since Dense states are just stored as a Vec<(u8, StateIdx)>, we avoid iterating over the entire 0..=255 range and instead only iterate over the few transitions in the Vec, yielding a massive speedup in constructing the AcAutomaton.
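A rough sketch of where that saving comes from, using made-up types and a textbook-style fail-link computation rather than the crate's actual fill routine: the BFS only ever needs the transitions a state really has, so walking the stored pairs replaces a 256-iteration probe per state.

```rust
use std::collections::VecDeque;

type StateIdx = usize;

const ROOT: StateIdx = 0; // assumption of this sketch: state 0 is the root

struct State {
    transitions: Vec<(u8, StateIdx)>, // "dense" layout: sorted (byte, next) pairs
    fail: StateIdx,
}

/// Look up the edge on `b` out of `s`, if any.
fn goto(states: &[State], s: StateIdx, b: u8) -> Option<StateIdx> {
    states[s]
        .transitions
        .binary_search_by_key(&b, |&(tb, _)| tb)
        .ok()
        .map(|i| states[s].transitions[i].1)
}

/// BFS fail-link computation over an already-built trie. The inner loops walk
/// each state's stored pairs directly instead of probing every byte with
/// `for b in 0..=255 { ... }`.
fn fill(states: &mut [State]) {
    let mut queue = VecDeque::new();
    let root_children: Vec<StateIdx> =
        states[ROOT].transitions.iter().map(|&(_, next)| next).collect();
    for next in root_children {
        states[next].fail = ROOT;
        queue.push_back(next);
    }
    while let Some(si) = queue.pop_front() {
        let outgoing = states[si].transitions.clone();
        for (b, next) in outgoing {
            // The child's fail state is found by following the parent's fail
            // chain until some state has an edge on `b`.
            let mut f = states[si].fail;
            let child_fail = loop {
                if let Some(t) = goto(states, f, b) {
                    break t;
                }
                if f == ROOT {
                    break ROOT;
                }
                f = states[f].fail;
            };
            states[next].fail = child_fail;
            queue.push_back(next);
        }
    }
}
```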

The only idea I have left to try is to re-apply the above idea, as it could be more significant after this improvement, but I don't have those sources accessible atm so that will have to wait.

Since the previous commits were so significant, I find it safest to add an explicit function for this so we don't rely on LLVM doing the optimization for dense states.
Marwes commented Aug 13, 2018

Rebased and added for_each_ok_transition so we don't have to rely on LLVM for the improvement in 2b9df73.
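A guess at the shape of the two helpers (the bodies below are illustrative only, not the crate's code): one reports an entry for every byte, using a FAIL_STATE sentinel for absent edges, while the other reports only the transitions that exist, so a caller that ignores fail entries no longer depends on LLVM eliminating the dead work.

```rust
type StateIdx = u32;
const FAIL_STATE: StateIdx = 0; // assumption of this sketch

struct DenseState {
    transitions: Vec<(u8, StateIdx)>, // sorted by byte; only real edges stored
}

impl DenseState {
    /// Reports an entry for every possible byte, with FAIL_STATE for absent ones.
    fn for_each_transition<F: FnMut(u8, StateIdx)>(&self, mut f: F) {
        let mut it = self.transitions.iter().peekable();
        for b in 0..=255u8 {
            match it.peek() {
                Some(&&(tb, si)) if tb == b => {
                    it.next();
                    f(b, si);
                }
                _ => f(b, FAIL_STATE),
            }
        }
    }

    /// Reports only the transitions that actually exist.
    fn for_each_ok_transition<F: FnMut(u8, StateIdx)>(&self, mut f: F) {
        for &(b, si) in &self.transitions {
            f(b, si);
        }
    }
}

fn main() {
    let s = DenseState { transitions: vec![(b'a', 2), (b'z', 3)] };
    let mut all = 0;
    s.for_each_transition(|_, _| all += 1);
    let mut ok = 0;
    s.for_each_ok_transition(|_, _| ok += 1);
    assert_eq!((all, ok), (256, 2));
}
```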

> The only idea I have left to try is to re-apply the above idea, as it could be more significant after this improvement, but I don't have those sources accessible atm so that will have to wait.

Still no improvement from this so I am finished with all optimizations I am able to do here.

Marwes commented Aug 22, 2018

ping @BurntSushi (this is done)

BurntSushi (Owner) commented:
Awesome! Thanks so much for this. Sorry it took a while. I needed to make time to carefully review this. I think I buy everything! I've merged this locally (to drop the unneeded CI commit---thank you for that! I fixed it elsewhere) and pushed it to crates.io in aho-corasick 0.6.8.

These speed improvements are really really nice. It actually makes a huge difference in a real test case I've been playing with: BurntSushi/ripgrep#838

@Marwes Marwes deleted the opt branch August 27, 2018 08:04