-
Notifications
You must be signed in to change notification settings - Fork 135
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
initial rc5 impl #346
initial rc5 impl #346
Conversation
rc5/src/block_cipher.rs
Outdated
use cipher::{impl_simple_block_encdec, AlgorithmName, KeyInit}; | ||
use cipher::{inout::InOut, Block, BlockCipher, KeySizeUser}; | ||
|
||
pub struct RC5_32_20_16 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The choice of 20 rounds is not obvious. Both OpenSSL and RFC 2040 default to 12 rounds.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from the: https://www.ietf.org/archive/id/draft-krovetz-rc6-rc5-vectors-00.txt
" The original RC5 publication suggested using 12 rounds when w=32 and
16 rounds when w=64. In response to cryptanalysis, the authors
changed the recommendation when w=32 to 16 rounds [RC5sec]."
I'm not sure what should be the defaults. I have no problem changing it to 12. But where should I find the test vectors?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure how to proceed. Should i expose RC5 32, 12, 16 or 32, 16, 16? I implemented RC5 32, 20, 16 since that is the only w=32 test vector i found already generated in the krovetz draft. I suppose it is fine to use the code given there to generate others.
A possible solution would be to implement cipher api for all posible w/r/b. Then we expose some recomended / common configurations through type aliases. The issue I see is covering all the possible configurations with tests. Would it could be okay to test only the recomended configurations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exposing them as generic parameters makes sense to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll tweak the code to do that. I don't see how to escape repeating the bounds for every impl. The code wont be pretty :D And I still have to add bounds to limit the range of rounds and key bytes.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added the generic implementation, but the compiler seems to strugle with optimizing it. I notice a 3x penalty in performance compared to the previous version.
name control ns/iter generic ns/iter diff ns/iter diff % speedup
rc5_32_12_16_decrypt_block 14,611 (560 MB/s) 63,711 (128 MB/s) 49,100 336.05% x 0.23
rc5_32_12_16_decrypt_blocks 14,659 (558 MB/s) 58,842 (139 MB/s) 44,183 301.41% x 0.25
rc5_32_12_16_encrypt_block 11,242 (728 MB/s) 51,916 (157 MB/s) 40,674 361.80% x 0.22
rc5_32_12_16_encrypt_blocks 11,467 (714 MB/s) 51,360 (159 MB/s) 39,893 347.89% x 0.22
rc5_32_16_16_decrypt_block 19,621 (417 MB/s) 77,881 (105 MB/s) 58,260 296.93% x 0.25
rc5_32_16_16_decrypt_blocks 19,800 (413 MB/s) 76,284 (107 MB/s) 56,484 285.27% x 0.26
rc5_32_16_16_encrypt_block 17,708 (462 MB/s) 81,863 (100 MB/s) 64,155 362.29% x 0.22
rc5_32_16_16_encrypt_blocks 17,704 (462 MB/s) 75,664 (108 MB/s) 57,960 327.38% x 0.23
Any idea why this might be?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've used cargo-show-asm to generate the asm of the bench functions in order to provide some clarity on the performance regression.
I've replaced the bench with something easier to generate asm from and looked at the asm from fn encrypt_
#![feature(test)]
extern crate test;
use cipher::generic_array::GenericArray;
use cipher::typenum::U8;
use cipher::BlockEncryptMut;
use cipher::{block_decryptor_bench, block_encryptor_bench};
use rc5::{RC5_32_12_16, RC5_32_16_16};
#[bench]
pub fn encrypt(bh: &mut test::Bencher) {
use cipher::KeyInit;
let key = test::black_box(Default::default());
let mut cipher = RC5_32_12_16::new(&key);
let mut block: GenericArray<u8, U8> = test::black_box(Default::default());
bh.iter(|| {
encrypt_(&mut cipher, &mut block);
});
bh.bytes = (block.len()) as u64;
}
#[inline(never)]
pub fn encrypt_(cipher: &mut RC5_32_12_16, block: &mut GenericArray<u8, U8>) {
cipher.encrypt_block_mut(block);
}
The asm for the non generic code just calls the fn encrypt_block
. This function contains the call to
the generic version of fn encrypt, but with fixed generic parameters. The asm for the less performant generic version inlines the entire encrypt algorithm.
I've managed to reproduce the same performance regresion in the non generic code by adding #[inline(always)]
to the fn encrypt_block
.
It is unclear for me why this happens or how to fix it. Should this pr be turned into a draft until I find a solution or should the non generic version be submitted in the meantime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Upon closer inspection i found that the generic version would not inline calls to word functions, while the non generic one would. After adding inline annotations to Word impl the performance is once again comparable.
after 5b84ad4: generic rc5 pefromance regression was noticed. after analizing the generated asm I noticed that while the generic top level generic impl does get inlined the actual algorithm impl differs in the fact that it does not inline Word functions. After adding inline annotations performance is once again compareable.
Can we get this merged and released? Is it usable @antonio-dropulic? I'd very much like to use it, and would be happy to do from your branch until it gets merged. |
Implemented RC5 based on the revised paper. Got test vector at: https://www.ietf.org/archive/id/draft-krovetz-rc6-rc5-vectors-00.txt
Core impl features a generic impl using typenum and generic array. Const generic implementation
without generic expressions doesn't seem appropriate.
I was not sure what api would work best with the cipher crate so i decided to propagate the InOut ptr.
Put all the core fn behind a trait to avoid rewriting generic bounds.
For now only 32, 20, 20 parametrization made public.