Allow intrinsics to use #[fixed_stack_segment], and use it for numeric intrinsics #5975
Closed
Conversation
Achieves at least a 5x speed-up for some functions! Also reorganises the delegation code so that the delegated function wrappers have the #[inline(always)] annotation, and reduces the repetition of delegate!(..).
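The repetition-reducing delegation described here can be sketched with a small macro_rules! macro. This is a hypothetical stand-in for the PR's actual delegate! macro, with std's f32 methods in place of the raw intrinsics so the snippet compiles on its own:

```rust
// Hypothetical sketch only: not the delegate! macro from this PR.
// It stamps out #[inline(always)] wrappers that forward to an existing
// function, which is the repetition-reducing pattern the summary describes.
macro_rules! delegate {
    ($(fn $name:ident($arg:ident: $ty:ty) -> $ret:ty => $target:path;)*) => {
        $(
            #[inline(always)]
            pub fn $name($arg: $ty) -> $ret {
                $target($arg)
            }
        )*
    };
}

delegate! {
    fn sin(x: f32) -> f32 => f32::sin;
    fn cos(x: f32) -> f32 => f32::cos;
    fn sqrt(x: f32) -> f32 => f32::sqrt;
}

fn main() {
    assert_eq!(sqrt(9.0), 3.0);
}
```

Marking every generated wrapper #[inline(always)] is what keeps the extra layer of delegation from costing anything at the call site.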
@JensNockert and @brson were involved in the original discussion and are possibly interested in this.
Cool!
bors added a commit that referenced this pull request on Apr 20, 2013
…lton

This implements the fixed_stack_segment for items with the rust-intrinsic ABI, and then uses it to make f32 and f64 use intrinsics where appropriate, but without overflowing stacks and killing canaries (cf. #5686 and #5697). Hopefully.

@pcwalton, the fixed_stack_segment implementation involved mirroring its implementation in `base.rs` in `trans_closure`, but without adding the `set_no_inline` (reasoning: that would defeat the purpose of intrinsics), which is possibly incorrect. I'm a little hazy about how the underlying structure works, so I've annotated the 4 that have caused problems so far, but there's no guarantee that the other intrinsics are entirely well-behaved.

Anyway, it has good results (the following are just summing the result of each function for 1 up to 100 million):

```
$ ./intrinsics-perf.sh f32
func    new    old    speedup
sin     0.80   2.75   3.44
cos     0.80   2.76   3.45
sqrt    0.56   2.73   4.88
ln      1.01   2.94   2.91
log10   0.97   2.90   2.99
log2    1.01   2.95   2.92
exp     0.90   2.85   3.17
exp2    0.92   2.87   3.12
pow     6.95   8.57   1.23

geometric mean: 2.97

$ ./intrinsics-perf.sh f64
func    new    old    speedup
sin     12.08  14.06  1.16
cos     12.04  13.67  1.14
sqrt    0.49   2.73   5.57
ln      4.11   5.59   1.36
log10   5.09   6.54   1.28
log2    2.78   5.10   1.83
exp     2.00   3.97   1.99
exp2    1.71   3.71   2.17
pow     5.90   7.51   1.27

geometric mean: 1.72
```

So about 3x faster on average for f32, and 1.7x for f64. This isn't exactly apples to apples though, since this patch also adds #[inline(always)] to all the function definitions, which possibly gives a speedup.

(fwiw, GitHub is showing 93c0888 after d9c54f8 (since I cherry-picked the latter from #5697), but git's order is the other way.)
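The intrinsics-perf.sh harness itself is not shown in the PR text, but the measurement it describes (summing each function over 1 to 100 million and timing it) can be sketched roughly as below; the bench helper and its output format here are assumptions, not code from the PR:

```rust
// Hypothetical sketch of the measurement described above: sum f(i) for i
// from 1 to 100 million and report the wall-clock time, so an old and a
// new build of the same wrappers can be compared.
use std::time::Instant;

fn bench(name: &str, f: fn(f32) -> f32) {
    let start = Instant::now();
    let mut acc = 0.0_f32;
    for i in 1..=100_000_000_u32 {
        acc += f(i as f32);
    }
    // Printing the accumulator keeps the loop from being optimised away.
    println!("{name}: {:.2}s (sum = {acc})", start.elapsed().as_secs_f64());
}

fn main() {
    bench("sin", f32::sin);
    bench("sqrt", f32::sqrt);
}
```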
flip1995 pushed a commit to flip1995/rust that referenced this pull request on Sep 10, 2020
default_trait_access: Fix wrong suggestion

rust-lang/rust-clippy#5975 (comment):

> I think the underlying problem is that clippy suggests code with complete parameters, not that clippy triggers this lint even for complex types.

AFAIK, if code compiles with `Default::default`, it doesn't need to specify any parameters, as type inference is working. (So, in this case, `default_trait_access` should suggest `RefCell::default`.)

Fixes rust-lang#5975
Fixes rust-lang#5990

changelog: `default_trait_access`: fixed wrong suggestion
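A minimal, hypothetical example of the behaviour this commit message describes (not code from the PR): with a binding type annotation, `Default::default()` compiles without any explicit type parameters, so the fixed `default_trait_access` suggestion can be the short `RefCell::default()` and still let inference fill in the rest.

```rust
use std::cell::RefCell;

fn main() {
    // This form should trigger clippy::default_trait_access.
    let cell: RefCell<Vec<String>> = Default::default();

    // What the fixed lint suggests instead; type inference still supplies
    // the Vec<String> parameter, so no turbofish is needed.
    let cell2: RefCell<Vec<String>> = RefCell::default();

    assert_eq!(cell.borrow().len(), cell2.borrow().len());
}
```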