Skip to content
This repository has been archived by the owner on Nov 15, 2023. It is now read-only.

VRF refactory #13889

Merged
Merged

Conversation

davxy
Copy link
Member

@davxy davxy commented Apr 12, 2023

For SASSAFRAS we need to introduce a new VRF primitive.

This new primitive:

  • doesn't depend on schnorrkel
  • Currently it uses merlin as vrf transcript backend but in the future will not depend on it (cc @burdges).

Right now schnorrkel and merlin dependencies are too much exposed all over the project.

We wish to bury them in the sp-core crate to allow better coexistence with other VRF implementations and future changes of backends.


In this PR:

  • merlin and schnorrkel are confined to sp-core. Their direct usage has been removed from client, frame and other primitive crates (e.g. keystore).
  • Two new traits are introduced in crypto module:
    • VrfSigner: for types capable of producing a vrf signature (implemented by sr25519::Pair)
    • VrfVerifier: for types capable of verify a vrf signature (implemented by sr25519::Public)
  • sp-consensus-vrf crate and sp-keystore::vrf::schnorrkel module were removed. We rely on vrf types defined by sp-core::crypto
  • VrfProof and VrfPreOut are kept together as VrfSignature in the BABE digests. This simplifies a bit the code and prevents some copies. Because the objects order is preserved, serialization using SCALE is backward compatible

Further work:

  • In a follow up (not very urgent) I'm going to change the naming of VrfOutput in the vrf signature to something like VrfState or (as Jeff suggests) VrfPreOut. Because this is what it is. Technically the VRF output is what is returned by the make_bytes function called for example to get babe randomness or the slot lottery value.

polkadot companion: paritytech/polkadot#7063

@davxy davxy requested a review from andresilva as a code owner April 12, 2023 07:55
@davxy davxy marked this pull request as draft April 12, 2023 09:38
@burdges
Copy link

burdges commented Apr 12, 2023

We should rename sp_core::sr25519::vrf::VrfOutput to VrfPreOut or similar.

At a high level, these transcript flavored hashers like merlin or now ark-transcript are safer interfaces than regular hash functions, largely by making it hard to fuck up internal domain separation. A priori I'd prefer if they were used more, not encapsulated more deeply. Anything opinionated brings messy-make-work-arounds though, so not really sure the best course there.

I suspect VrfTranscript existed for the remote signer, but afaik now serves no real purpose. A remote signer which signs whatever you ask achieves little anyways, so maybe easier to use merlin directly, which saves building the Vec. Also VRFs access the secret key more than once, and run their own logic in between, so remote signers become really messy for them. We always use merlin directly in parachains code in polkadot, which we decided does not benefit from a remote signer anyways..

We've superficial similarities between schnorrkel types and ring vrf types, but they've no common ground under the hood: merlin vs ark-transcript, dalek type wrappers vs arkworks type wrappers, etc.

As an aside, a VrfSignature is logically a VrfProof along with some VrfPreOuts, attached to a multi-part message that determined the inputs associated to each VrfPreOut and any extra message being signed. The reason why VrfPreOut stays detached from the VrfProof in schnorrkel is simply that I do not know the number or order of construction for the VrfPreOuts. If you're downstream then you can make these decisions, ala https://github.com/w3f/ring-vrf/blob/master/bandersnatch_vrfs/src/lib.rs#L136 or https://github.com/w3f/ring-vrf/blob/master/bandersnatch_vrfs/src/lib.rs#L161 or https://github.com/w3f/ring-vrf/blob/master/nugget_bls/src/lib.rs#L160 or https://github.com/w3f/ring-vrf/blob/master/nugget_bls/src/lib.rs#L132-L140

@davxy davxy added A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit labels Apr 12, 2023
@davxy davxy self-assigned this Apr 12, 2023
@davxy davxy marked this pull request as ready for review April 12, 2023 14:04
@paritytech paritytech deleted a comment from paritytech-cicd-pr Apr 12, 2023
@paritytech paritytech deleted a comment from paritytech-cicd-pr Apr 13, 2023
@davxy
Copy link
Member Author

davxy commented Apr 13, 2023

We should rename sp_core::sr25519::vrf::VrfOutput to VrfPreOut or similar.

As I wrote in the PR description I definitely will. But I would prefer doing that in a follow-up PR do don't mess-up too much stuff in a single PR.


A priori I'd prefer if they were used more, not encapsulated more deeply. Anything opinionated brings messy-make-work-arounds though, so not really sure the best course there.

IMO exposing crypto primitives libraries interfaces in a framework is not a good idea and almost always introduces some form of technical debt in the long term. May be good for quick prototyping but as a best engineering practice is better to keep the dependencies as buried as possible.


We always use merlin directly in parachains code in polkadot,

This (again IMO) doesn't justify adopting the same strategy in a framework.
Indeed (again IMO 🤣) I would prefer an application that when uses a framework type does rely on the framework interfaces and not force the framework implementor to keep the newtypes inner types public (as here)


We've superficial similarities between schnorrkel types and ring vrf types, but they've no common ground under the hood: merlin vs ark-transcript, dalek type wrappers vs arkworks type wrappers, etc.

I had a quick look at the ark-transcript transcript structures.
If I got it right here the domain separation between the commitments is done using the lengths of the committed messages (before and after) instead of having to pass a label for each added message.

This indeed make the type diverge from the one used by merlin and thus my initial idea of having a unique shared Transcript type doesn't fit very well. So I agree with you.
I changed the code to have a dedicated Transcript associated type in the VrfSigner and VrfVerifier traits.
Now sr25519 just uses a merlin::Transcript newtype


As an aside, a VrfSignature is logically a VrfProof along with some VrfPreOuts, ...

I agree


Once this goes upstream I start the integration of bandersnatch-vrf

@davxy davxy requested a review from a team April 13, 2023 10:33
@burdges
Copy link

burdges commented Apr 13, 2023

If I got it right here the domain separation between the commitments is done using the lengths of the committed messages (before and after) instead of having to pass a label for each added message.

ark-transcript commits lengths after the bytes, not before. It's literally just shake128 with postfix lengths in big endian u32s. This makes it possible to stream bytes into the transcript, and only learn how much you streamed at the end. This is necessary to maximize compatibility with arkworks' polymorphism and serialization.

We've no reason for an additional VrfTranscript abstracted type when using ark-transcript because I added an accumulation mode in w3f/ring-vrf@2ae00a1

If you need to pass a partial transcript to a remote signer or whatever, then you accumulate the transcript in a Vec<u8>, pass the Vec<u8>, and then start the real transcript on the signing end. This does not break the domain separation, although maybe you want to commit an application label to the protocol on the remote end.

Copy link
Contributor

@koute koute left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice cleanups!

client/consensus/babe/src/verification.rs Outdated Show resolved Hide resolved
Comment on lines +1119 to +1120
/// Verify input data signature.
fn vrf_verify(&self, data: &Self::VrfInput, signature: &Self::VrfSignature) -> bool;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
/// Verify input data signature.
fn vrf_verify(&self, data: &Self::VrfInput, signature: &Self::VrfSignature) -> bool;
/// Verify input data signature.
#[must_use]
fn vrf_verify(&self, data: &Self::VrfInput, signature: &Self::VrfSignature) -> bool;

Comment on lines 565 to 571
impl Deref for VrfTranscript {
type Target = merlin::Transcript;

fn deref(&self) -> &Self::Target {
&self.0
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we want to get rid of merlin wouldn't it be better not to expose its type here and just add necessary wrappers in impl VrfTranscript for whatever is accessed through this deref?

(Also similar for other Derefs which are going to leak the underlying type.)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is planned but I prefer doing it in a follow up since it affects too much stuff on polkadot

@davxy davxy requested a review from a team April 18, 2023 07:41
Co-authored-by: Koute <koute@users.noreply.github.com>
Comment on lines +704 to +715
MuSigInconsistent { musig_stage: Commitment, duplicate: true } =>
"Signature error: `MuSigInconsistent` at stage `Commitment` on duplicate".into(),
MuSigInconsistent { musig_stage: Commitment, duplicate: false } =>
"Signature error: `MuSigInconsistent` at stage `Commitment` on not duplicate".into(),
MuSigInconsistent { musig_stage: Reveal, duplicate: true } =>
"Signature error: `MuSigInconsistent` at stage `Reveal` on duplicate".into(),
MuSigInconsistent { musig_stage: Reveal, duplicate: false } =>
"Signature error: `MuSigInconsistent` at stage `Reveal` on not duplicate".into(),
MuSigInconsistent { musig_stage: Cosignature, duplicate: true } =>
"Signature error: `MuSigInconsistent` at stage `Cosignature` on duplicate".into(),
MuSigInconsistent { musig_stage: Cosignature, duplicate: false } =>
"Signature error: `MuSigInconsistent` at stage `Cosignature` on not duplicate"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Do we need to hand-write all combinations here? Could we use format! or sth similar?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The thing is that we are using some functions returning schnorrkel errors in no_std environments (e.g. make_bytes). And the schnorrkel::Error to_string will return a String.

Even though is possible to use String in no_std (because is exposed there via some other crates...) , I'm not sure we want to do that 🤔 . In general I see that frame pallets avoid using directly Strings.

primitives/core/src/sr25519.rs Outdated Show resolved Hide resolved
Comment on lines 574 to 580
#[derive(Clone, Debug, PartialEq, Eq, Encode, Decode, MaxEncodedLen, TypeInfo)]
pub struct VrfSignature {
/// The initial VRF configuration
pub output: VrfOutput,
/// The calculated VRF proof
pub proof: VrfProof,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we put some comment that this structure is part of header digest and its layout needs to be backward compatible?
It is not obvious (or my understanding is not correct).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

at this level (i.e. in the sr25519 module) this is not defined to be part of something. Is just the output of the vrf_sign primitive.

The fact that babe is going to serialize it and requires a precise order is something that will be eventually catched by some test.

Indeed the same argument applies if we store them separated since babe relies on a VrfProof encoding that comes from outside (i.e. the order is decided by the to_bytes() method of the VrfProof.).

So probably (for these structures that are supposed to be serialized to the wire) each protocol (as babe) should define some tests vectors to check for encoding assumptions, thus to double check that nothing is broken in the external dep.

I'm going to add some test vectors. Ty for the hint

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, some explicit test makes more sense.

client/consensus/babe/src/lib.rs Show resolved Hide resolved
@michalkucharczyk michalkucharczyk requested a review from a team April 18, 2023 09:21
pub use schnorrkel::vrf::VRF_OUTPUT_LENGTH;
use schnorrkel::{errors::MultiSignatureStage, vrf::VRF_PROOF_LENGTH, SignatureError};
use schnorrkel::{
errors::MultiSignatureStage,
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we're going to deprecate the current multi-sig there btw, but maybe that does not help you right now

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know that schnorrkel 0.11 is coming soon. As soon as it is released there will be a follow up PR with renaming of VrfOutput -> VrfPreOut, and all not required stuff removal.

This PR is mostly to prepare the soil for a painless introduction of bandersnatch-vrf

@davxy
Copy link
Member Author

davxy commented Apr 18, 2023

bot merge

@paritytech-processbot
Copy link

Error: "Check reviews" status is not passing for paritytech/polkadot#7063

@davxy
Copy link
Member Author

davxy commented Apr 19, 2023

bot merge

@paritytech paritytech deleted a comment from paritytech-processbot bot Apr 19, 2023
@paritytech paritytech deleted a comment from paritytech-processbot bot Apr 19, 2023
@davxy
Copy link
Member Author

davxy commented Apr 19, 2023

bot merge

@paritytech paritytech deleted a comment from paritytech-processbot bot Apr 19, 2023
@davxy
Copy link
Member Author

davxy commented Apr 19, 2023

bot merge

@paritytech-processbot paritytech-processbot bot merged commit 37f3d58 into paritytech:master Apr 19, 2023
/// VRF Verifier.
pub trait VrfVerifier: VrfCrypto {
/// Verify input data signature.
fn vrf_verify(&self, data: &Self::VrfInput, signature: &Self::VrfSignature) -> bool;
Copy link

@burdges burdges Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the general type signature of an EC VRF verify method. An extra message is signed too in general. There may also be multiple input-output pairs. Also these pairs should usually be returned by the verify routine. Also, crypto verifiers should be #[must_use] which usually suggests they return Result not bool.

pub output: VrfOutput,
/// The calculated VRF proof
pub proof: VrfProof,
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good idea to consolidate the VrfPreOut(s) inside the VrfSignature like this, but realize there might be more than one, as happens in sassafras.

/// VRF Signer.
pub trait VrfSigner: VrfCrypto {
/// Sign input data.
fn vrf_sign(&self, data: &Self::VrfInput) -> Self::VrfSignature;
Copy link

@burdges burdges Apr 24, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not possible to know if you should sign or not given this interface. It slows down singing somewhat too.

@davxy davxy deleted the davxy-sr25519-vrf-refactory branch May 4, 2023 14:25
nathanwhit pushed a commit to nathanwhit/substrate that referenced this pull request Jul 19, 2023
* First iteration to encapsulate schnorrkel and merlin usage

* Remove schnorkel direct dependency from BABE pallet

* Remove schnorrkel direct dependency from BABE client

* Trivial renaming for VrfTranscript data and value

* Better errors

* Expose a function to get a schnorrkel friendly transcript

* Keep the vrf signature stuff together (preventing some clones around)

* Fix tests

* Remove vrf agnostic transcript and define it as an associated type for VrfSigner and VrfVerifier

* Fix babe pallet mock

* Inner types are required to be public for polkadot

* Update client/consensus/babe/src/verification.rs

Co-authored-by: Koute <koute@users.noreply.github.com>

* Nit

* Remove Deref implementations

* make_bytes as a method

* Trigger CI

---------

Co-authored-by: Koute <koute@users.noreply.github.com>
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A0-please_review Pull request needs code review. B0-silent Changes should not be mentioned in any release notes C1-low PR touches the given topic and has a low impact on builders. D3-trivial 🧸 PR contains trivial changes in a runtime directory that do not require an audit
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants