Allow intrinsics to use #[fixed_stack_segment], and use it for numeric intrinsics #5975
Closed
Conversation
Achieves at least a 5x speed-up for some functions! Also reorganises the delegation code so that the delegated function wrappers have the #[inline(always)] annotation, and reduces the repetition of delegate!(..).
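The repetition-reducing delegation described here can be sketched with a small macro_rules! macro. This is a hypothetical stand-in for the PR's actual delegate! macro, with std's f32 methods in place of the raw intrinsics so the snippet compiles on its own:

```rust
// Hypothetical sketch only: not the delegate! macro from this PR.
// It stamps out #[inline(always)] wrappers that forward to an existing
// function, which is the repetition-reducing pattern the summary describes.
macro_rules! delegate {
    ($(fn $name:ident($arg:ident: $ty:ty) -> $ret:ty => $target:path;)*) => {
        $(
            #[inline(always)]
            pub fn $name($arg: $ty) -> $ret {
                $target($arg)
            }
        )*
    };
}

delegate! {
    fn sin(x: f32) -> f32 => f32::sin;
    fn cos(x: f32) -> f32 => f32::cos;
    fn sqrt(x: f32) -> f32 => f32::sqrt;
}

fn main() {
    assert_eq!(sqrt(9.0), 3.0);
}
```

Marking every generated wrapper #[inline(always)] is what keeps the extra layer of delegation from costing anything at the call site.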
@JensNockert and @brson were involved in the original discussion and are possibly interested in this.
Cool!
bors added a commit that referenced this pull request on Apr 20, 2013
…lton

This implements the fixed_stack_segment for items with the rust-intrinsic ABI, and then uses it to make f32 and f64 use intrinsics where appropriate, but without overflowing stacks and killing canaries (cf. #5686 and #5697). Hopefully.

@pcwalton, the fixed_stack_segment implementation involved mirroring its implementation in `base.rs` in `trans_closure`, but without adding the `set_no_inline` (reasoning: that would defeat the purpose of intrinsics), which is possibly incorrect. I'm a little hazy about how the underlying structure works, so I've annotated the 4 that have caused problems so far, but there's no guarantee that the other intrinsics are entirely well-behaved.

Anyway, it has good results (the following are just summing the result of each function for 1 up to 100 million):

```
$ ./intrinsics-perf.sh f32
func    new    old    speedup
sin     0.80   2.75   3.44
cos     0.80   2.76   3.45
sqrt    0.56   2.73   4.88
ln      1.01   2.94   2.91
log10   0.97   2.90   2.99
log2    1.01   2.95   2.92
exp     0.90   2.85   3.17
exp2    0.92   2.87   3.12
pow     6.95   8.57   1.23

geometric mean: 2.97

$ ./intrinsics-perf.sh f64
func    new    old    speedup
sin     12.08  14.06  1.16
cos     12.04  13.67  1.14
sqrt    0.49   2.73   5.57
ln      4.11   5.59   1.36
log10   5.09   6.54   1.28
log2    2.78   5.10   1.83
exp     2.00   3.97   1.99
exp2    1.71   3.71   2.17
pow     5.90   7.51   1.27

geometric mean: 1.72
```

So about 3x faster on average for f32, and 1.7x for f64. This isn't exactly apples to apples though, since this patch also adds #[inline(always)] to all the function definitions, which possibly gives a speedup.

(fwiw, GitHub is showing 93c0888 after d9c54f8 (since I cherry-picked the latter from #5697), but git's order is the other way.)
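The intrinsics-perf.sh harness itself is not shown in the PR text, but the measurement it describes (summing each function over 1 to 100 million and timing it) can be sketched roughly as below; the bench helper and its output format here are assumptions, not code from the PR:

```rust
// Hypothetical sketch of the measurement described above: sum f(i) for i
// from 1 to 100 million and report the wall-clock time, so an old and a
// new build of the same wrappers can be compared.
use std::time::Instant;

fn bench(name: &str, f: fn(f32) -> f32) {
    let start = Instant::now();
    let mut acc = 0.0_f32;
    for i in 1..=100_000_000_u32 {
        acc += f(i as f32);
    }
    // Printing the accumulator keeps the loop from being optimised away.
    println!("{name}: {:.2}s (sum = {acc})", start.elapsed().as_secs_f64());
}

fn main() {
    bench("sin", f32::sin);
    bench("sqrt", f32::sqrt);
}
```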
flip1995 pushed a commit to flip1995/rust that referenced this pull request on Sep 10, 2020
default_trait_access: Fix wrong suggestion

rust-lang/rust-clippy#5975 (comment):

> I think the underlying problem is that clippy suggests code with complete parameters, not that clippy triggers this lint even for complex types.

AFAIK, if code compiles with `Default::default`, it doesn't need to specify any parameters, as type inference is working. (So, in this case, `default_trait_access` should suggest `RefCell::default`.)

Fixes rust-lang#5975
Fixes rust-lang#5990

changelog: `default_trait_access`: fixed wrong suggestion
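A minimal, hypothetical example of the behaviour this commit message describes (not code from the PR): with a binding type annotation, `Default::default()` compiles without any explicit type parameters, so the fixed `default_trait_access` suggestion can be the short `RefCell::default()` and still let inference fill in the rest.

```rust
use std::cell::RefCell;

fn main() {
    // This form should trigger clippy::default_trait_access.
    let cell: RefCell<Vec<String>> = Default::default();

    // What the fixed lint suggests instead; type inference still supplies
    // the Vec<String> parameter, so no turbofish is needed.
    let cell2: RefCell<Vec<String>> = RefCell::default();

    assert_eq!(cell.borrow().len(), cell2.borrow().len());
}
```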