-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement alternative semi-fixslice #195
Conversation
- under cfg feature 'semi_fixslice' - reduces code size at small cost to performance
I originally translated this as well but got rid of it as I wasn't really sure the meager code size benefits were actually worth it (or how much code size it actually saves you in compiled Rust) |
inv_shift_rows_1(&mut rkeys[i..(i + 8)]); | ||
inv_shift_rows_2(&mut rkeys[(i + 8)..(i + 16)]); | ||
inv_shift_rows_3(&mut rkeys[(i + 16)..(i + 24)]); | ||
#[cfg(feature = "semi_fixslice")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It doesn't look like you have a semi_fixslice
feature declared in Cargo.toml?
I'm also not sure how I feel about a "feature" making things slower.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, wasn't aware that was a thing. I can add it. As to making things slower, there is the already-existing example of the "no_unroll" feature (in the kuznyechik implementation) that makes the code smaller and slower.
Is it possible in rust to have a public feature like "small_code" that is translated internally in each algorithm to whatever local features make the code smaller?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish there were better way to hint at these sort of things in Rust, but unfortunately there aren't AFAIK.
The problem with using features to degrade performance is it no longer becomes possible for a dependency to request the faster implementation, since feature unification will mean anything that requests the smaller implementation overrides it.
Going by the size of the rlib files in target/release/deps, semi_fixslice reduces code size by about 15%. The performance difference is tiny on my x86_64 machine; an educated guess based on the paper and the code is that it would be ~4% slower on constrained 32-bit devices (even smaller performance loss on arm because of the barrel shifter). |
It might be worth it, but it comes at a cost of more possible codepaths to test and upkeep/maintenance. I think there's also a lot bigger wins to be made in the short term here, like eliminating the decryption code for AES-CTR use. Speaking of codepaths to test, can you edit |
Alternative, slightly simpler variant described in the paper. I'm not sure if this is the ideal way to present the alternative, but it gets working code in place.