-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tristan/war 540 sylow initial performance optimization #27
Tristan/war 540 sylow initial performance optimization #27
Conversation
WAR-540 Sylow initial performance optimization
Address the initial findings of WAR-528 and fix:
|
The big changes in this are the following:
the results of this pr are the following benchmarks: generating a signature from a floating point to a signed Warlock price takes 670 us (compared to 750 us before), and executing an entire pairing operation now takes 8.3 ms (compared to a massive 15 ms before). The State-of-the-art (speed wise) is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice.
@@ -72,7 +72,7 @@ impl<'a, 'b, const D: usize, const N: usize, F: FieldExtensionTrait<D, N>> | |||
Add<&'b FieldExtension<D, N, F>> for &'a FieldExtension<D, N, F> | |||
{ | |||
type Output = FieldExtension<D, N, F>; | |||
|
|||
#[inline] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, interesting, didn't know this existed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah its not exactly what you might expect from inline
in c++. in debug mode, its basically totally ignore by the compiler, and in release mode, the compiler will codegen an #[inline] function into every single referencing codegen unit, and then it will also add inlinehint. This means that if you have 16 CGUs and they all reference an item, every single one is getting the entire item's implementation inlined into it, which isn't exactly the same as c++
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
an excellent overview is given here rust-lang/hashbrown#119
let tmp = (c0_squared | ||
- (c1_squared * <Fp as FieldExtensionTrait<1, 1>>::quadratic_non_residue())) | ||
.inv(); | ||
let tmp = (c0_squared - (c1_squared * FP_QUADRATIC_NON_RESIDUE)).inv(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could insert a trace here.
#![allow( | ||
clippy::needless_doctest_main, | ||
clippy::doc_lazy_continuation, | ||
clippy::too_long_first_doc_paragraph |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Heh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
a bit ridiculous imo, but a necessary evil for now lol
@@ -102,6 +102,7 @@ impl MillerLoopResult { | |||
/// This implements algorithm 9 from https://eprint.iacr.org/2010/354.pdf, with the notable | |||
/// difference that instead of passing an element of Fp4 (which I did not implement), we pass | |||
/// in only the two components from Fp2 that comprise the Fp4 element. | |||
#[must_use] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the rationale for these? As these are calculations one would assume they wouldn't simply be discarded by client code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the#[must_use]
s are mainly for developer assistance since its a nested function call, which requires that the inputs and outputs to those nested functions are all explicitly handled. for instance, the Result
type is a must_use
to force the user to handle the case where nothing is returned. it's mainly to handhold devs to use those functions right, but not strictly necessary
@@ -202,7 +204,7 @@ impl<const D: usize, const N: usize, F: FieldExtensionTrait<D, N>> GroupAffine<D | |||
infinity: Choice::from(1u8), | |||
} | |||
} | |||
|
|||
#[inline(always)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rationale for inline vs inline(always)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a micro-optimization, basically sometimes the comiler will choose to not actually inline things, so #[inline]
is really a suggestion to the compiler, and #[inline(always)]
is a strong suggestion to the compiler to essentially tell it I know more than you, which you should really only ever do for non-generic, very small, nearly trivial things, or unless you're confident you knoww hat you're doing, since otherwise small trivial functions just won't get inlined due to the lack of apparent improvement in code execution compared to the increase in compile time estimated by the compiler for that function
This addresses the issues presented by #24, and makes substrantial improvements to pairing speeds, now clocking in at 8 ms per pairing, as opposed to 15 ms before.