Implement AES backend for riscv using crypto extensions #399

Closed


@silvanshade silvanshade commented Jan 21, 2024

Continued from #397.

This PR implements AES for riscv64 using the scalar and vector crypto extensions.

The scalar implementation is complete, sans hazmat support.

The vector implementation is complete, sans hazmat support. I plan to make some further refinements and optimizations though.

Currently there's no easy way to auto-detect RISC-V features, so implementation selection instead relies on the appropriate target-features being enabled at compile time (plus a cfg flag called target_feature_zvkned, since zvkned is apparently too new to have its own Rust target-feature).
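As a rough sketch of what this kind of compile-time selection looks like (not the actual code in this PR): the `Backend` enum and `selected_backend` function are hypothetical names, and the exact target-feature names gating each path (`v`, `zkne`, `zknd`) are assumptions. Only the `target_feature_zvkned` cfg flag comes from the description above; it would be passed with something like `RUSTFLAGS="--cfg target_feature_zvkned"`.

```rust
/// Hypothetical backend tag, for illustration only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Backend {
    Vector,
    Scalar,
    Soft,
}

/// Picks a backend purely from compile-time configuration.
/// No runtime feature detection is performed.
#[allow(unexpected_cfgs)] // `target_feature_zvkned` is a custom cfg, unknown to rustc
fn selected_backend() -> Backend {
    if cfg!(all(
        target_arch = "riscv64",
        target_feature = "v",  // assumed gate for the vector extension
        target_feature_zvkned  // custom cfg flag standing in for the missing target-feature
    )) {
        Backend::Vector
    } else if cfg!(all(
        target_arch = "riscv64",
        target_feature = "zkne", // assumed gate: scalar AES encryption
        target_feature = "zknd"  // assumed gate: scalar AES decryption
    )) {
        Backend::Scalar
    } else {
        // Anything else falls back to a portable software implementation.
        Backend::Soft
    }
}
```

Because everything is decided by `cfg!`, a binary built without these flags always takes the fallback path, even on hardware that supports the extensions.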

For the vector implementation, I resorted to using global_asm!. There are a few reasons behind this:

First, Rust doesn't have a way to represent scalable vectors yet, so there's no clear way to even encode the appropriate function signatures. It might be possible to do it through an encoding using opaque types and a C-FFI API wrapping the C intrinsics, but that would be a lot of effort for this relatively limited use case.

There is some progress toward addressing this situation though: rust-lang/rust#118917

Secondly, there seem to be some problems with using certain RISC-V features with inline asm!. It may be related to this issue, which also affects global_asm!, but in the case of asm! the errors actually cause compilation to fail (whereas for global_asm! they just add noise, but can be silenced with the .architecture attribute). The feature in question is zvkned, and it may be that this causes a problem because Rust doesn't even expose this feature yet (which is why I'm using the cfg setting).

In any case, things work fine with global_asm!.

I don't have any real hardware that implements these extensions, so I don't have any useful information about benchmarks. If anyone has access to hardware that can run this it would be great to get some feedback about that (or any other issues).

One last thing to note: I think it might be worth considering extending the cipher API to account for VLA-style implementations.

For example, consider the vector implementation for AES-128:

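fn encrypt_vla(keys: &RoundKeys<11>, mut data: InOut<'_, '_, Block>, blocks: usize) {
    // Raw pointers to the output and input buffers.
    let dst = data.get_out().as_mut_ptr();
    let src = data.get_in().as_ptr();
    // Total length in bytes: each AES block is 16 bytes.
    let len = blocks * 16;
    // The 11 AES-128 round keys, passed as a pointer to 32-bit words.
    let key = keys.as_ptr().cast::<u32>();
    // SAFETY: the caller must ensure `data` really covers `blocks` contiguous blocks.
    unsafe { aes_riscv_rv64_vector_encdec_aes128_encrypt(dst, src, len, key) };
}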

fn encrypt1(keys: &RoundKeys<11>, mut data: InOut<'_, '_, Block>) {
    let data = unsafe {
        InOut::from_raw(
            data.get_in().as_ptr().cast::<Block>(),
            data.get_out().as_mut_ptr().cast::<Block>(),
        )
    };
    encrypt_vla(keys, data, 1)
}

fn encrypt8(keys: &RoundKeys<11>, mut data: InOut<'_, '_, Block8>) {
    let data = unsafe {
        InOut::from_raw(
            data.get_in().as_ptr().cast::<Block>(),
            data.get_out().as_mut_ptr().cast::<Block>(),
        )
    };
    encrypt_vla(keys, data, 8)
}

Here you can see that encrypt_vla is the general case, which handles any number of blocks (per the blocks argument). The 1-block and 8-block functions are just special cases of it and entirely redundant.
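To make the redundancy concrete, here is a minimal portable sketch of the same shape. It is not AES: `process_vla` is a hypothetical stand-in that just XORs every byte with 0xFF, and `process_n` and `process1` are illustrative names. The point is only that any fixed-size entry point reduces to a one-line call into the length-parameterized routine.

```rust
type Block = [u8; 16];

/// Stand-in for the length-parameterized routine (NOT real AES):
/// flips every bit of every block in place.
fn process_vla(blocks: &mut [Block]) {
    for block in blocks.iter_mut() {
        for byte in block.iter_mut() {
            *byte ^= 0xFF;
        }
    }
}

/// Any fixed block count N is just the general routine applied to N blocks.
fn process_n<const N: usize>(blocks: &mut [Block; N]) {
    process_vla(blocks.as_mut_slice());
}

/// The single-block case, viewed as a slice of length one.
fn process1(block: &mut Block) {
    process_vla(core::slice::from_mut(block));
}
```

With this shape, calling `process_n` on a `[Block; 8]` plays the role of `encrypt8`, and no per-size implementation is needed.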

Generally you would want to pass as many blocks as possible to encrypt_vla, which might be substantially larger than 8, depending on the size of the vector registers for the specific RISC-V platform.

An appropriate API might then be a method on one of the traits that queries the implementation for a "block parallelism hint", indicating how many blocks would be optimal to pass to the codec. This hint method would in turn execute some RVV instructions to determine the (current) maximum available vector length for processing.
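A minimal sketch of what such a hint could look like, under stated assumptions: `BlockParallelismHint`, `MockVectorBackend`, and `plan_calls` are hypothetical names and not part of the cipher crate. A real backend would obtain the vector length by executing RVV instructions; here it is modeled as plain fields so the example runs anywhere.

```rust
/// Size of one AES block in bits.
const BLOCK_BITS: usize = 128;

/// Hypothetical trait: how many blocks the backend would like per call.
trait BlockParallelismHint {
    fn par_blocks_hint(&self) -> usize;
}

/// Mock backend. `vlen_bits` and `lmul` stand in for values a real
/// implementation would query from the hardware at runtime.
struct MockVectorBackend {
    vlen_bits: usize, // width of one vector register, in bits
    lmul: usize,      // number of registers grouped together
}

impl BlockParallelismHint for MockVectorBackend {
    fn par_blocks_hint(&self) -> usize {
        // Blocks that fit in one register group, and never fewer than one.
        (self.vlen_bits * self.lmul / BLOCK_BITS).max(1)
    }
}

/// Splits `total_blocks` into per-call block counts sized by the hint.
fn plan_calls<B: BlockParallelismHint>(backend: &B, total_blocks: usize) -> Vec<usize> {
    let hint = backend.par_blocks_hint();
    let mut plan = Vec::new();
    let mut remaining = total_blocks;
    while remaining > 0 {
        let n = remaining.min(hint);
        plan.push(n);
        remaining -= n;
    }
    plan
}
```

For a mock backend with 256-bit registers and a group of 4, the hint is 8 blocks, so 20 blocks would be issued as calls of 8, 8, and 4. Callers stay agnostic of the hardware vector length and simply follow the hint.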
