This repository has been archived by the owner on Nov 6, 2020. It is now read-only.

Feat/stream contrib #8

Closed
wants to merge 4 commits
30 changes: 16 additions & 14 deletions src/lib.rs
@@ -799,23 +799,18 @@ impl MPCParameters {
for (base, projective) in bases.iter_mut().zip(projective.iter_mut()) {
*projective = wnaf.base(base.into_projective(), 1).scalar(coeff);
}


Do you know if this sped things up? I believe the reason why we wait for multiplication to end before converting from projective into affine is that we don't want point conversion stealing cpu from the multiplications.

Author

I may have observed a slight improvement, or it might have been noise. It didn't get worse.

I would be surprised if the other way were better. The first part assigns the work to specific threads, one per available CPU, so they should all be running in parallel. Then the work is chunked back up into (presumably) the same number of threads and continued. It seems to me that this incurs at least:

  • the penalty of waiting for all threads to complete before beginning any normalization/conversion.
  • the rayon dispatch overhead.
  • possibly moving data between CPUs, if work isn't reassigned to the same CPU it started on (despite the same chunking).

As far as 'stealing CPU' goes, the normalization and conversion are single-threaded, so they shouldn't be stealing CPU from other threads; they're just using the already-assigned, otherwise idle CPU.

Doing all the work for each chunk on the originally allocated thread at least avoids all of the potential costs above.

As an aside, if we weren't using the raw format to speed up I/O, we could probably improve performance by moving the affine<->uncompressed conversions into this parallel section. And we could likely get even more speedup than raw by just reading/writing the projective points and skipping the conversions entirely (at the cost of larger small-params files). The latter might be worth exploring for future ceremonies.
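The pattern I'm describing can be sketched with scoped std threads. This is a minimal toy model, not the actual code: `multiply` and `batch_normalization` are placeholder stand-ins for the wnaf scalar multiplication and the curve's batch normalization, and the real implementation uses rayon and the group crate's projective/affine types. The point is only the shape: each thread multiplies, normalizes, and converts its own chunk, so there is no barrier or second dispatch between the phases.

```rust
use std::thread;

// Toy stand-ins for the real types (hypothetical; the actual code uses
// group projective/affine curve points).
#[derive(Clone, Copy)]
struct Projective { x: u64, z: u64 }

#[derive(Clone, Copy, Default, PartialEq, Debug)]
struct Affine { x: u64 }

// Placeholder for `wnaf.base(..).scalar(coeff)`.
fn multiply(base: u64, coeff: u64) -> Projective {
    Projective { x: base * coeff, z: 2 }
}

// Placeholder for `C::Projective::batch_normalization`.
fn batch_normalization(chunk: &mut [Projective]) {
    for p in chunk.iter_mut() {
        p.x /= p.z;
        p.z = 1;
    }
}

// Each spawned thread multiplies, normalizes, and converts its own chunk,
// so no thread waits on a global barrier between phases.
fn process(bases: &[u64], coeff: u64) -> Vec<Affine> {
    let n_threads = 4;
    let chunk_size = (bases.len() + n_threads - 1) / n_threads;
    let mut out = vec![Affine::default(); bases.len()];
    thread::scope(|s| {
        for (base_chunk, out_chunk) in
            bases.chunks(chunk_size).zip(out.chunks_mut(chunk_size))
        {
            s.spawn(move || {
                let mut projective: Vec<Projective> =
                    base_chunk.iter().map(|b| multiply(*b, coeff)).collect();
                // Normalize and convert on the same thread that did the
                // multiplications: no second dispatch, no data movement.
                batch_normalization(&mut projective);
                for (p, a) in projective.iter().zip(out_chunk.iter_mut()) {
                    *a = Affine { x: p.x };
                }
            });
        }
    });
    out
}

fn main() {
    let affine = process(&[1, 2, 3, 4, 5, 6, 7, 8], 10);
    assert_eq!(affine[0], Affine { x: 5 }); // (1 * 10) / 2
}
```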

C::Projective::batch_normalization(projective);
projective
    .iter()
    .zip(bases.iter_mut())
    .for_each(|(projective, affine)| {
        *affine = projective.into_affine();
    });
});
}
})
.unwrap();

// Perform batch normalization
projective
    .par_chunks_mut(chunk_size)
    .for_each(|p| C::Projective::batch_normalization(p));

// Turn it all back into affine points
projective
    .par_iter()
    .zip(bases.par_iter_mut())
    .for_each(|(projective, affine)| {
        *affine = projective.into_affine();
    });
}

let delta_inv = privkey.delta.inverse().expect("nonzero");
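A note on why `batch_normalization` is batched at all: a single field inversion (like the explicit `privkey.delta.inverse()` above) is expensive, and Montgomery's trick amortizes one inversion across a whole slice via prefix products. A toy sketch of that trick over integers modulo a prime; the helper names are hypothetical and this is not the actual group-crate API:

```rust
// Illustrates Montgomery's batch-inversion trick over Z mod P.
const P: u64 = 1_000_000_007;

// Modular exponentiation by repeated squaring.
fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 { acc = acc * b % P; }
        b = b * b % P;
        e >>= 1;
    }
    acc
}

// The one expensive inversion (Fermat's little theorem).
fn inv(a: u64) -> u64 { pow_mod(a, P - 2) }

/// Invert every element in-place using a single field inversion.
fn batch_inverse(vals: &mut [u64]) {
    let n = vals.len();
    if n == 0 { return; }
    // prefix[i] = vals[0] * vals[1] * ... * vals[i]
    let mut prefix = vec![1u64; n];
    prefix[0] = vals[0];
    for i in 1..n {
        prefix[i] = prefix[i - 1] * vals[i] % P;
    }
    // Invert only the total product, then walk backwards,
    // peeling off one inverse at a time.
    let mut acc = inv(prefix[n - 1]);
    for i in (1..n).rev() {
        let v = vals[i];
        vals[i] = acc * prefix[i - 1] % P;
        acc = acc * v % P;
    }
    vals[0] = acc;
}

fn main() {
    let orig = vec![2u64, 3, 5, 7];
    let mut xs = orig.clone();
    batch_inverse(&mut xs);
    // Every output is the modular inverse of the corresponding input.
    for (x, o) in xs.iter().zip(orig.iter()) {
        assert_eq!(x * o % P, 1);
    }
}
```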
@@ -1131,7 +1126,14 @@ impl MPCParameters {
"small params are internally inconsistent wrt. G1 deltas"
);

let MPCSmall { delta_g1, delta_g2, h, l, contributions, .. } = contrib;
let MPCSmall {
delta_g1,
delta_g2,
h,
l,
contributions,
..
} = contrib;
self.params.vk.delta_g1 = delta_g1;
self.params.vk.delta_g2 = delta_g2;
self.params.h = Arc::new(h);