add more inlines for better efficiency #999
On my Mac (M1), the numbers look similar. It would be nice to benchmark this also on a simple x86 machine.
I likely overdid it with the |
That's a lot of I agree there are definitely measurable places it helps, but it would be nice to check each one's impact versus just an |
I agree. Would this be something that could be merged if I reduce the inlines to the minimum subset for the improvement? However, in general it might make sense to without |
I would feel a lot better about merging ones where you saw measurable improvements |
The smallest subset of operations that improves the performance of point-scalar multiplication (the operation we are most interested in) seems to be just 5 inlines.
I also manually checked that removing |
If I add more `inline`s, the performance in most current benchmarks improves on my machine (AMD EPYC 7302). Benchmarks were run as `RUSTFLAGS='-C target-cpu=native' cargo bench --features expose-field`.

This PR also adds `criterion::black_box()` to the input variables in benchmarks so that they are not optimized away.
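
For readers unfamiliar with the two techniques discussed here, the following is a minimal sketch and not code from this PR. `Fe`, `P`, `pow`, and `time_pow` are hypothetical names for a toy field, and `std::hint::black_box` stands in for `criterion::black_box()` so the example needs no external crates. It shows `#[inline]` hints on small arithmetic helpers and `black_box` on benchmark inputs and outputs so the compiler cannot constant-fold the measured work.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Small prime modulus for the toy field (an assumption for illustration).
const P: u64 = 1_000_000_007;

/// Hypothetical toy field element; not the type used by the crate in this PR.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Fe(pub u64);

impl Fe {
    /// `#[inline]` is only a hint; it also makes the body available for
    /// inlining across crate boundaries when the function is not generic.
    #[inline]
    pub fn add(self, rhs: Fe) -> Fe {
        Fe((self.0 + rhs.0) % P)
    }

    /// Operands are below P (about 2^30), so the product fits in a u64.
    #[inline]
    pub fn mul(self, rhs: Fe) -> Fe {
        Fe((self.0 * rhs.0) % P)
    }
}

/// Square-and-multiply exponentiation, a stand-in for a hot loop that
/// calls the small helpers many times.
pub fn pow(base: Fe, mut exp: u64) -> Fe {
    let mut acc = Fe(1);
    let mut b = base;
    while exp > 0 {
        if exp & 1 == 1 {
            acc = acc.mul(b);
        }
        b = b.mul(b);
        exp >>= 1;
    }
    acc
}

/// Crude timing loop. `black_box` hides the inputs from the optimizer and
/// marks the result as used, so the calls are not folded away.
pub fn time_pow(iters: u32) -> (Fe, Duration) {
    let base = Fe(3);
    let exp = 1_000_003u64;
    let mut out = Fe(0);
    let start = Instant::now();
    for _ in 0..iters {
        out = pow(black_box(base), black_box(exp));
    }
    (black_box(out), start.elapsed())
}
```

Calling `time_pow(10_000)` from a `main` function and printing the returned duration gives only a rough number. Whether a given `#[inline]` helps is best checked one attribute at a time with a proper benchmark harness, as suggested in the conversation above.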