
[Design Discussion] Zk Accel API via trait object [skip ci - Don't merge] #13

Draft · wants to merge 2 commits into base: taiko/unstable

Conversation

@mratsim commented Dec 18, 2023

This draft PR should not be merged; it is for laying out the design tradeoffs of ZAL in Halo2.
See #12 for the generics approach.

This uses:

  • An extra dyn trait object to add any backend supporting MsmAccel<C: CurveAffine> to Halo2
  • An extra lifetime parameter to represent the ZalEngine lifetime.

Zal: https://github.com/taikoxyz/halo2curves/blob/zal-on-0.3.2/src/zal.rs

// The ZK Accel Layer API
// ---------------------------------------------------

pub trait ZalEngine: Debug {}

pub trait MsmAccel<C: CurveAffine>: ZalEngine {
    fn msm(&self, coeffs: &[C::Scalar], base: &[C]) -> C::Curve;
}

// ZAL using Halo2curves as a backend
// ---------------------------------------------------

#[derive(Debug)]
pub struct H2cEngine;

impl H2cEngine {
    pub fn new() -> Self {
        Self {}
    }
}

impl ZalEngine for H2cEngine {}

impl<C: CurveAffine> MsmAccel<C> for H2cEngine {
    fn msm(&self, coeffs: &[C::Scalar], bases: &[C]) -> C::Curve {
        #[allow(deprecated)]
        best_multiexp(coeffs, bases)
    }
}

impl Default for H2cEngine {
    fn default() -> Self {
        Self::new()
    }
}
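
For reference, a minimal usage sketch; the bn256 types, the commit_with_engine helper, and the coeffs/bases inputs are illustrative and not part of the PR:

use halo2curves::bn256::{Fr, G1, G1Affine};

// Hypothetical usage sketch: the caller owns the engine and hands Halo2 a
// trait object, so any backend implementing MsmAccel<C> can be swapped in.
fn commit_with_engine(engine: &dyn MsmAccel<G1Affine>, coeffs: &[Fr], bases: &[G1Affine]) -> G1 {
    // Halo2 only sees the trait object; the backend decides how the MSM runs.
    engine.msm(coeffs, bases)
}

// let engine = H2cEngine::new();
// let commitment = commit_with_engine(&engine, &coeffs, &bases);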

Issues:

  • Reasonably noisy: there is an extra 'zal lifetime parameter on some traits, though at a high level it gets merged into the 'params lifetime.
  • ZalEngine does not implement Send + Sync, and some high-level code in shplonk requires it for Plonk permutations (one possible direction is sketched below).
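
One possible direction for the second issue, not implemented in this PR, would be to require thread safety at the trait level, at the cost of forcing every backend to be thread-safe:

use std::fmt::Debug;

// Hypothetical variant: with Send + Sync as supertraits, a &dyn ZalEngine
// can be shared across rayon's parallel iterators.
pub trait ZalEngine: Debug + Send + Sync {}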

cc @einar-taiko

einar-taiko and others added 2 commits on December 13, 2023:

  • Deprecate pre-ZAL API
  • Insert patch in `Cargo.toml` for `../halo2curves`
     .map(|commitment| {
         let evals: Vec<F> = rotations_vec
-            .par_iter()
+            // .par_iter()
+            .iter()
mratsim (Author): This requires Send + Sync, but having the ZalEngine be part of Commitment prevents that.

@@ -192,7 +192,8 @@ where
     let v: ChallengeV<_> = transcript.squeeze_challenge_scalar();

     let quotient_polynomials = rotation_sets
-        .par_iter()
+        // .par_iter()
mratsim (Author): Ditto.

@@ -62,7 +62,7 @@ where
         mut msm_accumulator: DualMSM<'params, E>,
     ) -> Result<Self::Guard, Error>
     where
-        I: IntoIterator<Item = VerifierQuery<'com, E::G1Affine, MSMKZG<E>>> + Clone,
+        I: IntoIterator<Item = VerifierQuery<'com, E::G1Affine, MSMKZG<'params, E>>> + Clone,
mratsim (Author): The compiler complains here (screenshot of the lifetime error omitted), but changing to 'com or adding a 'params: 'com constraint then yields an insufficiently constrained lifetime.

@@ -58,7 +58,7 @@ where
         mut msm_accumulator: DualMSM<'params, E>,
     ) -> Result<Self::Guard, Error>
     where
-        I: IntoIterator<Item = VerifierQuery<'com, E::G1Affine, MSMKZG<E>>> + Clone,
+        I: IntoIterator<Item = VerifierQuery<'com, E::G1Affine, MSMKZG<'params, E>>> + Clone,
mratsim (Author): Same issue as in shplonk (screenshot omitted).

@mratsim (Author) commented Dec 18, 2023

While the 'com and 'params / 'zal lifetime issue can probably be solved, solving the Send + Sync issue for parallelizing rotations/permutations in Plonk would likely require a large refactoring to put the engine in a different data structure.

Stuffing the engine into those data structures was done in the first place to avoid changing function signatures everywhere, but in the end we have to change them anyway.

Hence we should probably make the engine an input only in the functions that require MSM evaluation and pass it as an extra input (see the sketch after this list):

  • we would have had to change the function signatures anyway;
  • no lifetimes to deal with in the MSM, MSMKZG, DualMSM, MSMIPA, CommitmentScheme, ... data structures;
  • relatively easy to maintain and understand for further refactoring.
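
A rough sketch of that shape; the function name is illustrative (not the actual halo2 API) and the CurveAffine import path is assumed:

use halo2curves::CurveAffine;

// Hypothetical sketch: only functions that actually evaluate an MSM take the
// engine, so the commitment-scheme structs carry no engine reference or lifetime.
fn msm_with_engine<C: CurveAffine>(
    engine: &dyn MsmAccel<C>,
    coeffs: &[C::Scalar],
    bases: &[C],
) -> C::Curve {
    engine.msm(coeffs, bases)
}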

@einar-taiko commented Dec 20, 2023

While the 'com and 'params / 'zal lifetime issue can probably be solved

Regarding the lifetime issue: I think it could make a lot of sense to use the 'static lifetime for the engine reference, implying that the engine is available for the full running time of the program.
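
For illustration, one way to obtain such a 'static reference would be to leak a one-time heap allocation at startup; the helper name is hypothetical:

// Hypothetical sketch: leak a single heap allocation at program start so the
// engine reference is 'static and no 'zal lifetime parameter is needed.
fn static_engine() -> &'static dyn ZalEngine {
    let engine: &'static H2cEngine = Box::leak(Box::new(H2cEngine::new()));
    engine
}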

@einar-taiko commented:

I think I need to better understand the nature of the ZAL engine object. Here is my current understanding:

We have a caller, e.g. the zkevm-circuits crate.
We have a callee, the halo2 proof system crate.

  1. Caller creates a single engine object (unknown size so dyn trait object) on the stack and keeps ownership.
  2. Caller passes a reference (immutable borrow) to this object to the callee who may lend multiple borrows of it internally.
  3. Callee uses these references to conduct the computations.
  4. Callee returns to caller who may or may not deallocate the engine.

If this is the way, I don't think we can avoid heavy lifetime annotation. My intuition is that if we do not annotate everywhere we pass something that contains a reference, how can the borrow checker know which lifetimes to link? This seems to be a heavy drawback of the &'zal dyn ZalEngine approach.

An alternative could be to put it in an Arc<dyn ZalEngine> on the heap and pass reference-counted smart pointers around at runtime. Since we already allocate the engine dynamically, keeping track of the references at runtime should not incur any significant overhead.
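
A sketch of that alternative, assuming the ZAL traits from this PR; note that sharing across threads would additionally require the trait object to be Send + Sync:

use std::sync::Arc;

// Hypothetical sketch: the engine lives on the heap and reference counting
// replaces lifetime annotations; each component clones the Arc it needs.
fn make_shared_engine() -> Arc<dyn ZalEngine> {
    Arc::new(H2cEngine::new())
}

// let engine = make_shared_engine();
// let for_prover = Arc::clone(&engine); // cheap pointer copy plus refcount bump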

The last approach, i.e.

make the engine an input only in the functions that require MSM evaluation and pass it as an extra input:

raises some questions I need input on:

  1. Can the engine object contain data?
  2. Can methods on the engine object be called in parallel by different threads at the same time?
  3. Should it contain a synchronized work queue?
  4. (or) Should the second thread to invoke an operation panic?
  5. Is the engine Send?
  6. Is the engine Sync?
  7. Can you copy/clone the engine object?
  8. If yes, what happens, if two engines schedule work on a single GPU?

I think it boils down to this: the API is clear, but the semantics are still in flux.

@mratsim (Author) commented Dec 20, 2023

I think I need to better understand the nature of the ZAL engine object. Here is my current understanding:

We have a caller, e.g. the zkevm-circuits crate. We have a callee, the halo2 proof system crate.

1. Caller creates a single engine object (unknown size so `dyn` trait object) on the stack and keeps ownership.

Yes for the engine object.
The caller actually knows the size, and within Halo2 it's just a pointer; dyn was there for type erasure and to avoid generics.

2. Caller passes a reference (immutable borrow) to this object to the callee who may lend multiple borrows of it internally.

3. Callee uses these references to conduct the computations.

4. Callee returns to caller who may or may not deallocate the engine.

Yes, exactly.

If this is the way, I don't think we can avoid heavy lifetime annotation. My intuition is that if we do not annotate everywhere we pass something that contains a reference, how can the borrow checker know which lifetimes to link? This seems to be a heavy drawback of the &'zal dyn ZalEngine approach.

Adding lifetimes within Halo2 is OK; having new lifetimes leak into end users like zkevm-circuits or Powdr is API breakage. If we break the API anyway, I think the smarter way is to simply not store the engine in the low-level data structures of Halo2 and just pass it around: no lifetimes, no Send + Sync issues, easy to understand, maintain, and refactor.

An alternative could be to put it in an Arc<dyn ZalEngine> on the heap and pass reference-counted smart pointers around at runtime. Since we already allocate the engine dynamically, keeping track of the references at runtime should not incur any significant overhead.

This is possible, though as Arc<&dyn ZalEngine> or Arc<Box<dyn ZalEngine>>, I think.

Regarding overhead, the whole point of Send + Sync is to allow the following section to run in parallel:

let rotation_sets = rotation_set_commitment_map
    .into_par_iter()
    .map(|(rotations, commitments)| {
        let rotations_vec = rotations.iter().collect::<Vec<_>>();
        let commitments: Vec<Commitment<F, Q::Commitment>> = commitments
            .into_par_iter()
            .map(|commitment| {
                let evals: Vec<F> = rotations_vec
                    .par_iter()
                    .map(|&&rotation| get_eval(commitment, rotation))
                    .collect();
                Commitment((commitment, evals))
            })
            .collect();

with three nested parallel loops that would all increment and decrement the Arc reference count, leading to cache flushes, for a resource that is not even used in those loops.

So we would be using Arc to allow this section to be parallel, but introducing overhead in the process.

Hence:

  1. Either this section is a bottleneck, parallelism helps, and the Arc should go (and the engine should not be stored there);
  2. or this section is not a bottleneck, we don't care about making it parallel, so we don't require Send + Sync and therefore don't require Arc.

The last approach, i.e.

make the engine an input only in the functions that require MSM evaluation and pass it as an extra input:

raises some questions I need input on:

1. Can the engine object contain data?

Halo2 doesn't need to know; it only has a pointer to it. The engine may use thread-local resources.

2. Can methods on the engine object be called in parallel by different threads at the same time?

In Halo2, there is no part of the code that calls two MSMs in parallel.

In general, I think that if we use an "accelerator", Halo2 should express concurrency to it (with the async API) and let it handle parallelism as it sees fit.

Calling the engine from arbitrary threads would also mess with thread-local storage.

For example, assuming CUDA, you would really want those concurrent MSMs to be issued on different CUDA streams; similarly for OpenCL event queues; or via approaches using a supervisor thread with a load-balancing queue to reduce contention. Only the accel runtime knows about those, not rayon.

Also, the async API is out of scope for this PR. In order of increasing implementation complexity:

  1. (Proposed) Engine always called from the same thread; the function returns when the result is there.
  2. (Async API) Engine always called from the same thread; the function returns immediately and leaves a handle. Halo2 can issue another computation (from the same thread) and receive a second handle, then wait/synchronize on those handles for the computations to be ready (a rough sketch follows below).
  3. (Send + Sync engine) Any thread can call the engine, meaning it internally requires queues for handling incoming jobs/tasks.
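
A rough, hypothetical sketch of the option-2 shape; none of these names exist in the PR:

// Hypothetical async-style ZAL sketch: issuing an MSM returns a handle
// immediately; the caller later blocks on the handle to get the result.
pub trait MsmAccelAsync<C: CurveAffine>: ZalEngine {
    type Handle;
    fn msm_begin(&self, coeffs: &[C::Scalar], bases: &[C]) -> Self::Handle;
    fn msm_wait(&self, handle: Self::Handle) -> C::Curve;
}
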
3. Should it contain a synchronized work queue?

This is a per-accelerator design decision.

4. (or) Should the second thread to invoke an operation panic?

If two MSMs can be issued in parallel, then we can use the async API, but that is not in scope here. And Halo2 doesn't issue multiple MSMs in parallel, as far as I've seen.

5. Is the engine Send?

"Cannot be moved to another thread, cannot be copied" is the least restrictive constraint for engine design. But we only use a reference to it anyway.

It is possible to allow an engine to be called from any thread using locks or lock-free queues, but debugging concurrent data structures to remove deadlocks, livelocks, and race conditions is extremely time-consuming, when it doesn't outright require formal verification.

6. Is the engine Sync?

Same as above.

7. Can you copy/clone the engine object?

No, just like you can't copy a database handle, an allocated memory region, a network connection, or a GPU. It's an uncopyable and unmovable resource, and you interact with it through a reference.

8. If yes, what happens, if two engines schedule work on a single GPU?

N/A

I think it boils down to this: the API is clear, but the semantics are still in flux.
