Optimize ebml::reader::vuint_at() #11498

c-a · 2014-01-12T12:36:37Z

Use a lookup table, SHIFT_MASK_TABLE, that holds, for every possible
four-bit prefix, the number of bits the value should be shifted right by and
the mask to apply to the shifted value. This gets rid of the branches, which
in my testing gives approximately a 2x speedup.
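
For readers who want to see the idea in isolation, here is a minimal sketch in present-day Rust. It is not the code from this PR: the `Res` layout, the use of `u32::from_be_bytes`, the table contents, and the choice to decode the invalid prefix `0000` as value 0 are all assumptions made for illustration. It reads four bytes as a big-endian word and uses the top four bits to pick a `(shift, mask)` pair.

```rust
/// Decoded value plus the index of the first byte after it.
/// (Hypothetical layout, chosen for this sketch.)
#[derive(Debug, PartialEq)]
pub struct Res {
    pub val: u32,
    pub next: usize,
}

/// (shift, mask) for each possible four-bit prefix of the big-endian word.
/// Index 0 has no length marker in the top four bits; this sketch simply
/// maps it to value 0 (an assumption, not necessarily what the PR does).
const SHIFT_MASK_TABLE: [(u32, u32); 16] = [
    (0, 0x0),                                     // 0000: no marker bit
    (0, 0x0fff_ffff),                             // 0001: 4-byte form
    (8, 0x1f_ffff), (8, 0x1f_ffff),               // 001x: 3-byte form
    (16, 0x3fff), (16, 0x3fff),                   // 01xx: 2-byte form
    (16, 0x3fff), (16, 0x3fff),
    (24, 0x7f), (24, 0x7f), (24, 0x7f), (24, 0x7f), // 1xxx: 1-byte form
    (24, 0x7f), (24, 0x7f), (24, 0x7f), (24, 0x7f),
];

/// Table-driven decode. Panics if fewer than four bytes are readable at
/// `start`; a caller would check that first and use a slower path otherwise.
pub fn vuint_at(data: &[u8], start: usize) -> Res {
    let word = u32::from_be_bytes([
        data[start],
        data[start + 1],
        data[start + 2],
        data[start + 3],
    ]);
    // The top four bits select the row; no per-width branching is needed.
    let (shift, mask) = SHIFT_MASK_TABLE[(word >> 28) as usize];
    Res {
        val: (word >> shift) & mask,
        // (32 - shift) is the encoded width in bits; divide by 8 for bytes.
        next: start + ((32 - shift) >> 3) as usize,
    }
}
```

With this sketch, `vuint_at(&[0x40, 0x01, 0x00, 0x00], 0)` yields `Res { val: 1, next: 2 }`: the prefix `0100` selects `(16, 0x3fff)`, so the word is shifted right by 16 bits and the marker bit is masked off.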

Timings on Intel(R) Core(TM) i5-3570K CPU @ 3.40GHz

-- Before --
running 5 tests
test ebml::tests::test_vuint_at ... ok
test ebml::bench::vuint_at_A_aligned ... bench: 494 ns/iter (+/- 3)
test ebml::bench::vuint_at_A_unaligned ... bench: 494 ns/iter (+/- 4)
test ebml::bench::vuint_at_D_aligned ... bench: 467 ns/iter (+/- 5)
test ebml::bench::vuint_at_D_unaligned ... bench: 467 ns/iter (+/- 5)

-- After --
running 5 tests
test ebml::tests::test_vuint_at ... ok
test ebml::bench::vuint_at_A_aligned ... bench: 181 ns/iter (+/- 2)
test ebml::bench::vuint_at_A_unaligned ... bench: 192 ns/iter (+/- 1)
test ebml::bench::vuint_at_D_aligned ... bench: 181 ns/iter (+/- 3)
test ebml::bench::vuint_at_D_unaligned ... bench: 197 ns/iter (+/- 6)

Since reader::vuint_at() returns a value of type reader::Res, it makes sense to make that type public. Because of how Rust currently handles externally referenced private structures (rust-lang#10573), callers could still use the result and bind it to a variable as long as they let the compiler infer the type, but they could not explicitly annotate a variable as a reader::Res.

huonw · 2014-01-12T13:36:25Z

src/libextra/ebml.rs

@@ -130,32 +130,32 @@ pub mod reader {
            return vuint_at_slow(data, start);
        }

+        static shift_table: [uint, ..16] = [


Conventionally these would be SHIFT_TABLE and MASK_TABLE.

c-a · 2014-01-12T15:27:28Z

New measurement storing mask and shift in a single table:
-- (u32, u32) tuple --
running 5 tests
test ebml::tests::test_vuint_at ... ok
test ebml::bench::vuint_at_A_aligned ... bench: 181 ns/iter (+/- 2)
test ebml::bench::vuint_at_A_unaligned ... bench: 192 ns/iter (+/- 1)
test ebml::bench::vuint_at_D_aligned ... bench: 181 ns/iter (+/- 3)
test ebml::bench::vuint_at_D_unaligned ... bench: 197 ns/iter (+/- 6)

And, for academic interest, without the bounds check at the top of the function:
-- Without fallback to vuint_at_slow --
running 5 tests
test ebml::tests::test_vuint_at ... ok
test ebml::bench::vuint_at_A_aligned ... bench: 44 ns/iter (+/- 1)
test ebml::bench::vuint_at_A_unaligned ... bench: 40 ns/iter (+/- 1)
test ebml::bench::vuint_at_D_aligned ... bench: 40 ns/iter (+/- 1)
test ebml::bench::vuint_at_D_unaligned ... bench: 44 ns/iter (+/- 1)
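
The bounds check matters because the fast path reads four bytes at once, so inputs with fewer than four bytes remaining need a different route. The following is a rough sketch of what a byte-at-a-time fallback could look like in present-day Rust. The name `vuint_at_slow` comes from the diff context, but the body, the `Option` return type, and the `Res` layout are assumptions for illustration rather than the code in the repository.

```rust
/// Decoded value plus the index of the first byte after it.
/// (Hypothetical layout, chosen for this sketch.)
#[derive(Debug, PartialEq)]
pub struct Res {
    pub val: u32,
    pub next: usize,
}

/// Branching decode that only touches the bytes the encoding actually uses.
/// Returns None if the prefix is invalid or the input is truncated.
pub fn vuint_at_slow(data: &[u8], start: usize) -> Option<Res> {
    let a = *data.get(start)? as u32;
    // The position of the first set bit in the leading byte gives the width.
    let (len, first_mask) = if a & 0x80 != 0 {
        (1, 0x7f)
    } else if a & 0x40 != 0 {
        (2, 0x3f)
    } else if a & 0x20 != 0 {
        (3, 0x1f)
    } else if a & 0x10 != 0 {
        (4, 0x0f)
    } else {
        return None; // no marker bit in the top four bits
    };
    // Bounds-checked view of exactly `len` bytes; None if truncated.
    let bytes = data.get(start..start + len)?;
    // Strip the marker from the first byte, then append the remaining bytes.
    let mut val = a & first_mask;
    for &b in &bytes[1..] {
        val = (val << 8) | b as u32;
    }
    Some(Res { val, next: start + len })
}
```

This version accepts a one-byte buffer such as `&[0x80]`, which a four-byte read could not, at the cost of up to four data-dependent branches per call.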

alexcrichton · 2014-01-12T18:22:26Z

This looks pretty awesome, thanks!

Could you add some comments to the lookup table as to why it exists and what the values/rows signify?

brson · 2014-01-13T00:37:49Z

Very cool. What kind of impact does this have on rustc compiles?

brson · 2014-01-13T00:39:03Z

You may also be interested in #9303

c-a added 2 commits January 12, 2014 13:33

extra::ebml: Add unit test for vuint_at()

1130886

huonw reviewed Jan 12, 2014
fixup! ebml::extra: Optimize reader::vuint_at()

f4c9ed4

bors closed this Jan 17, 2014

bors merged commit f4c9ed4 into rust-lang:master Jan 17, 2014
