Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension fast track] extra vector crypto instructions, Zvbc32e/Zvkgs #362

Closed
wants to merge 26 commits into from

Conversation

nibrunieAtSi5
Copy link
Contributor

@nibrunieAtSi5 nibrunieAtSi5 commented Aug 24, 2023

/!\ This pull request has been moved to the main riscv-isa-manual repository: riscv/riscv-isa-manual#1306

This pull requests draft the changes associated with two fast track extensions for vector crypto.

During the specification process for vector crypto 1.0.0 a few items had to be discarded because they appeared too late in the process. This fast track extension tries to address some of them.

The official demand that will be discussed in the Task Group and submitted to the Unpriv Committee is being drafter here: https://docs.google.com/document/d/1zpYhnZi2NxhjfcBGvPOy0oDhx6lTXchscG17Qcl6wv8/edit?usp=sharing

New features:

  • Zvbc32e: Extending vclmul[h].v[vh] instruction to support SEW=32-bit value
    • should be available standalone (ELEN >= 32) or in addition to Zvbc (ELEN >= 64)
    • no new encoding
  • Zvkgs: Adding .vs variants to vghsh and vghmul
    • should depend on Zvkgs
    • new encodings

Open questions:

  • Should Zvbc32e be allowed when ELEN >= 32 without depending on Zvbc ? (Answer: YES)
  • Should Zvbc32e support SEW=16 ? (SEW=8 ?)
  • Find encodings
  • How to name the two new extensions
  • Do we need to define a Zvkt(bc/bc32e) to extend Zvkt to the extension of vclmul[h/] defined in Zvbc32e ?

Related changes:

Draft versions:

Version pdf
v0.0.1 (August 31st 2023) https://github.com/riscv/riscv-crypto/files/12487628/riscv-crypto-spec-vector-extra.pdf
v0.0.2 (January 17th 2024) https://github.com/riscv/riscv-crypto/files/13970691/riscv-crypto-spec-vector-extra.pdf
v0.0.3 (February 1st 2024) https://github.com/riscv/riscv-crypto/files/14146438/riscv-crypto-spec-vector-extra_v0.0.3.pdf
v0.0.4 (February 6th 2024) riscv-crypto-spec-vector-extra_v0.0.4.pdf
v0.0.5 (March 7th 2024) riscv-crypto-spec-vector-extra_v0.0.5.pdf

|

Original Plan for the fast track schedule

image

References

@kdockser
Copy link
Collaborator

Hi Nicolas.
Are you intending to merge these new instructions into the existing Vector Crypto specification? That cannot be done as Vector Crypto is frozen. You will need to create a different specification for these new extensions.

@nibrunieAtSi5
Copy link
Contributor Author

I was wondering what was the best way to eventually specify the fast track, I think @jjscheel mentioned that some fast tracks were specified as patch / diff over existing specifications.
At this point, this patch should not be seen as something that I intend to merge as is but as a basis for the proposal (describing its content using a diff with the frozen spec under ratification) I will adapt it to a proper patch (on a different directory certainly) if the crypto group agrees with the proposal submission, the committee authorizes the presentation to the ARC and the ARC gives its approval to move forward.

@kdockser
Copy link
Collaborator

My suggestion (which could very well be a bad idea) would be to create a new directory under /riscv-crypto/doc for this extension. That would match what we have done for scalar, vector, and vector-all rounds.

The Zve32 extension should be able to stand on its own. Thus, for example, it could be implemented to just add 32-bit vclmul* vector instructions without the need for any of the Vector Crypto Extensions. Or it could implemented to augment Zvbc. It might only find use in Zve32 implementations, or it might be used to add 32-bit clmul to, for example, V implementations.

The Zvkgs extensions, on the other hand, only make sense as an optional augmentation of the Zvkg extension. One option would be to define this as standalone extension that could optionally be combined with Zvkg as yet another extension. Or, perhaps Zvkgs is the superset. In such a case, I would think that it would reference the Zvkg specification and then define the new variants. I don't think it would be wise to have this extension redefine the existing instructions, even if the exact same words are used - it is best to define each instruction only once.

I hope that helps.

@nibrunieAtSi5
Copy link
Contributor Author

Thank you @kdockser that helps a lot. I will perform the changes before the next TG meeting.

Signed-off-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>
nibrunieAtSi5 and others added 3 commits August 31, 2023 08:11
Signed-off-by: Nicolas Brunie <82109999+nibrunieAtSi5@users.noreply.github.com>
@nibrunie
Copy link
Contributor

Draft generated August 31st 2023 (version 0.0.1):
riscv-crypto-spec-vector-extra.pdf

Copy link
Contributor

@lianakoleva lianakoleva left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work! Definitely the next logical step, especially as explicitly outlined as a future direction in the ratified v1.0 spec. Just in case you missed it, I think it would be good to change 128-bit to 64-bit in vclmul.adoc and vclmulh.adoc.

@nibrunie
Copy link
Contributor

Draft generated Jan 17th 2024 (thank you to @lianakoleva for spotting typos)
riscv-crypto-spec-vector-extra.pdf

@nibrunieAtSi5 nibrunieAtSi5 changed the title Zv fast track [extension fast track] extra vector crypto instructions, Zvbc32e/Zvkgs Feb 2, 2024
@nibrunie
Copy link
Contributor

nibrunie commented Feb 2, 2024

Draft generated Feb 1st 2024 (more cleanups and self review): riscv-crypto-spec-vector-extra_v0.0.3.pdf

This document describes the proposed _vector_ _extra_ cryptography
extensions for RISC-V.
Those extensions extend the _vector_ cryptography extensions for RISC-V,
providing extra features not mandatory for a high performace implementation but which
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does "not mandatory for a high performance implementation" mean? RISC-V extensions do not specify whether they are mandatory or not. That is the job of separate documents.

Note: performace is misspelled.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was formulated to distinguish the extra vector crypto extension from the vector crypto extension itself.
This set of instructions is considered to be an addition to the current vector crypto spec
The vector crypto spec is the main extension implementers and users may want to adopt to get good performance for the supported cryptographic primitives.
These new extensions are additional improvements to the base vector crypto extensions.

I will rephrase this and fix the typo.

extensions for RISC-V.
Those extensions extend the _vector_ cryptography extensions for RISC-V,
providing extra features not mandatory for a high performace implementation but which
can help further improve the efficiency of the algorithms that use them.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the "them" in "algorithms that use them"? The new extensions or the existing ones?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was suppose to be "extra features" so "new extensions".

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, without the extensions there are no algorithms that use them, so that phrasing is a bit strange. Maybe list some of the specific algorithms that are intended to be improved?

[[zvkgs,Zvkgs]]
=== `Zvkgs` - Vector-Scalar GCM/GMAC

`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

", it" => ". It "

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.
`Zvkgs` depends on `Zvkg`. It extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.

[colophon]
= Colophon

This document describes the Vector Cryptography Extra extensions to the
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a better name for this? What name will be used for the next set of "extra" extensions? What is the difference between an extra extension and a non-extra extension?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zfa is using "additional" which may be better here (https://github.com/riscv/riscv-isa-manual/blob/main/src/zfa.adoc).

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That document is titled ""Zfa" Standard Extension for Additional Floating-Point Instructions, Version 1.0". That's less ambiguous because it includes the actual extension code Zfa, which won't be reused by any future extension. This document just refers to "Vector Cryptography Extra extensions" as if there will never be any more. Maybe include the actual extension codes in the title?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I will update the title to manage room for future "extra"/additional extensions.

It is important to note that the Vector Crypto instructions are independent of the
implementation of the `Zkt` extension and do not require that `Zkt` is implemented.

//This specification includes a <<Zvkt>> extension that, when implemented, requires certain vector instructions
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why doesn't Zvkt require that these new instructions be constant-time?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Zvkt will require that these new extensions be constant time, I need to clarify this.
More precisely, Similarly to Zvkg which mandates

To help avoid side-channel timing attacks, these instructions shall be implemented with data-independent timing.

Zvkgs should inherit the same mandate and require constant-time even when Zvkt is not implemented.

Constant time for Zvbc32e will depend on Zvkt.

@@ -0,0 +1,23 @@
[[zvbc32e,Zvbc32e]]
=== `Zvbc32e` - Vector Carryless Multiplication
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe "32-bit Vector Carryless Multiplication"?

Also, are there specific use cases in mind where this feature would be useful? #309 mentioned CRC-32, but I'm not sure why that would be true. CRC-32 implementations that use "folding" with carryless multiplication multiply segments of the data by a 32-bit multiplicand, but those segments of data can be any length. Other CPUs (and the currently ratified Zvbc extension) provide 64-bit carryless multiplication, so CRC-32 implementations typically multiply 64 bits of data by a 32-bit multiplier to get a 95-bit result. You could multiply 32 bits of data by a 32-bit multiplier to get a 63-bit result, but that would result in twice as many multiplications being needed to compute the CRC-32, as each one would process half as much data. That probably wouldn't be faster.

A similar observation applies to e.g. CRC-16 where the multiplications are typically 64-bit by 16-bit.

BTW, if adding 32-bit, why not also add 16-bit and 8-bit? What is special about 32?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those are very good points.
There are multiple considerations at play here.

  • 32-bit CLM allows vector implementation with smaller ELEN=32 to implement it (which was not possible for Zvbc)
  • 64-bit or 32-bit CLM for folding CRC does not make a performance difference as far as I can tell but should make a power difference. It is true that in folding CRC the data segment block can be of arbitrary length but the constant multiplicand is of the size of the CRC so in the case of a CRC-32 at least half the 64-bit CLM would be unused (which might not be an issue on all micro-arch, but 32-bit CRC should allow easier determination of expected multiplier activity).

And finally there is nothing special about 32-bit. It was initially introduced because of the ELEN=32 case, but we listed in the questions above

Should Zvbc32e support SEW=16 ? (SEW=8 ?)

I think there is a good case for supporting other SEW:

  • no need encoding required
  • should allow other use cases

If we want to go that route the question is:

  • Should we split different SEW support into different sub-extensions (Zvbc16e, Zvbc8e) so people can pick and chose ? I think this might not be such a good idea because I don't expect supporting smaller element width to have a big cost (although I did not measure it yet) and the added complexity of numerous extensions does not seem worth it.

(One of the option which was discarded for the fast track was suggesting a widening CLM to merge vclmul and vclmulh together, but it seems too big of a change to fit into a fast track and the potential impact of having to switch SEW back and forth was not yet properly evaluated on the target use cases).


`Zvkgs` depends on `Zvkg`, it extends the existing `vghsh.vv` and `vgmul.vv` instructions with new vector-scalar variants: `vghsh.vs` and `vgmul.vs`.

Instructions to enable the efficient implementation of parallel versions of GHASH~H~ which is used in Galois/Counter Mode (GCM) and
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a complete sentence.

More importantly, it's not clear to me how these vs instructions for GHASH are helpful, because parallelized GHASH requires multiplying by powers of the key (like H, H^2, H^3, H^4, ...). Does that not preclude the use of these vs instructions?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are multiple different implementations of parallel GHASH.
One implementation on 4 block in paralell uses a vector of 4 constants H^4 in the loop body which could leverage the .vs. It still needs a reduction with H, H^2, H^3 in the loop epilog.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I rephrased things a bit (in the upcoming v0.0.4). The introduction sentence is re-used from https://github.com/riscv/riscv-crypto/blob/main/doc/vector/riscv-crypto-vector-zvkg.adoc.

I need to work on an example of use of these instructions.

@nibrunie
Copy link
Contributor

nibrunie commented Feb 7, 2024

Draft generated Feb 6th 2024:
riscv-crypto-spec-vector-extra_v0.0.4.pdf

Changelog:

@nibrunieAtSi5
Copy link
Contributor Author

I did some experiments around a multi-width vector carry-less multiply unit.
This unit has a 128-bit datapath and support high or low operations.
I tested several variants, with one or more element widths supported. The unit provides 128 / SEW results (e.g. 2 results for 64-bit vclmul or vclmulh, 4 results for 32-bit, ...).

Synthesis were all done at the same frequency / corner / tech node.
The max number of logic levels is 17 for the 8/16/32/64 variant.

Here are the normalized results:
image

  • adding 32-bit support costs about 14% extra area for the vector carry-less multiplier (combinational area increase)
  • adding 16 and 32-bit support costs about 25% extra area (+9 points from adding 32-bit only)
  • adding 8, 16 and 32-bit support costs about 32% extra area

@nibrunie
Copy link
Contributor

nibrunie commented Mar 8, 2024

Draft generated March 7th 2024 (v0.0.5)
riscv-crypto-spec-vector-extra_v0.0.5.pdf

Changelog:

  • Numerous typo fixes and corrections
  • Adding RVV encoding array (with Zvbc32e in bold)

@wmat
Copy link
Contributor

wmat commented Mar 22, 2024

Note that this PR will need to be applied against the integrated chapter in the riscv-isa-manual repository if it is still relevant. This repository will be made read only and archived.

@nibrunieAtSi5
Copy link
Contributor Author

Note that this PR will need to be applied against the integrated chapter in the riscv-isa-manual repository if it is still relevant. This repository will be made read only and archived.

Ok, I will transition to a PR on the main repo.

@nibrunieAtSi5
Copy link
Contributor Author

Moving to the main riscv-isa-manual repository: riscv/riscv-isa-manual#1306

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants