
Simplify MSM implementation, and small speedup #157

Merged 4 commits into master on Dec 30, 2020

Conversation

@Pratyush (Member) commented Dec 29, 2020:

Description

As per the title, this PR simplifies the internal implementation of variable-base MSMs, and also provides a small speedup (by doing less work).


Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.

  • Targeted PR against correct branch (main)
  • Linked to GitHub issue with discussion and accepted design OR have an explanation in the PR that describes this work.
  • Wrote unit tests
  • Updated relevant documentation in the code
  • Added a relevant changelog entry to the Pending section in CHANGELOG.md
  • Re-reviewed Files changed in the GitHub PR explorer

@weikengchen (Member) commented:

The PR looks good. Do we need to update CHANGELOG.md for this? It counts as an improvement.

@Pratyush (Member, Author) commented:

I'll update the CHANGELOG; I guess it is a slight performance improvement.

Comment on lines 28 to 31
let size = ark_std::cmp::min(bases.len(), scalars.len());
let scalars = &scalars[..size];
let bases = &bases[..size];
let scalars_and_bases = scalars.iter().zip(bases).filter(|(s, _)| !s.is_zero());
@Pratyush (Member, Author) commented:

This change is because in some downstream projects (e.g. poly-commit), algorithms that use MSM are slower if the input vectors have mismatched lengths. The zip alone should fix this, but in benchmarks it doesn't.

(For that matter, neither does this new code, so I can remove it if that's better)
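For illustration, here's a minimal stdlib-only sketch of the truncate-and-filter pattern from the diff, with integer stand-ins for field elements and curve bases (the function and types are made up for the example):

```rust
// Illustrative sketch: u64 "scalars" and u32 "bases" stand in for field
// elements and group elements.
fn nonzero_pairs(bases: &[u32], scalars: &[u64]) -> Vec<(u64, u32)> {
    // Truncate both slices to the shorter length, mirroring the PR's
    // `ark_std::cmp::min(bases.len(), scalars.len())` logic.
    let size = bases.len().min(scalars.len());
    let scalars = &scalars[..size];
    let bases = &bases[..size];
    // Skip zero scalars: they contribute nothing to the MSM result.
    scalars
        .iter()
        .zip(bases)
        .filter(|(s, _)| **s != 0)
        .map(|(s, b)| (*s, *b))
        .collect()
}

fn main() {
    // Mismatched lengths: the extra bases are ignored, and the zero
    // scalar is filtered out, leaving a single (scalar, base) pair.
    let pairs = nonzero_pairs(&[10, 20, 30], &[1, 0]);
    assert_eq!(pairs, vec![(1, 10)]);
}
```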

A reviewer (Member) commented:

Looks to me like this is due to line 19, where c is computed from the scalars length alone? That should probably be moved down and calculated from size.

@Pratyush (Member, Author) replied:

To clarify, this was happening when bases.len() > scalars.len(), so the c calculation wasn't affected.

(But you're right, we should move the c calculation down anyway)
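To sketch the suggested fix: compute the window size c from the truncated size rather than from scalars.len(). The exact formula below (a float-free approximation of ln(size) + 2) is an assumption for illustration, not necessarily the one used in ark-ec:

```rust
// Hypothetical window-size calculation, moved to after the truncation so
// it depends on `size = min(bases.len(), scalars.len())`.
fn window_size(size: usize) -> usize {
    if size < 32 {
        3
    } else {
        // log2(size) * 69 / 100 approximates ln(size) without floats,
        // since ln(x) = log2(x) * ln(2) and ln(2) ~= 0.693.
        let log2 = (usize::BITS - 1 - size.leading_zeros()) as usize;
        log2 * 69 / 100 + 2
    }
}

fn main() {
    assert_eq!(window_size(16), 3);
    assert_eq!(window_size(1024), 8); // log2 = 10 -> 10 * 69 / 100 + 2 = 8
}
```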

@ValarDragon (Member) commented Dec 29, 2020:

Also, scalars should never be bigger than the number of bases, right? Perhaps an assert should be added for that?

@Pratyush (Member, Author) replied:

Hm, not sure if that's necessary; you could imagine a case where zero-padding makes scalars larger, but because the extras are zero, they don't affect the result.

(This might be happening in ark-poly-commit already)

A reviewer (Member) replied:

Hrmm, the function doesn't really make sense in the case where it's non-zero. (And in the case where there are more zeros than bases, it's being inefficient in allocating them.)

Regardless of whether this function accepts extra zeroes, IMO we should (eventually) change poly-commit to not allocate extra zeros.

res += running_sum;
}

buckets.into_iter().rev().for_each(|b| {
A reviewer (Member) commented:

As an aside, we should probably parallelize this loop. (This will add some complexity to handle the running-sum updates for subsequent chunks.)
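For reference, the running-sum reduction in the snippet above computes sum_i (i+1) * buckets[i] in a single backwards pass. A sketch with integers standing in for group elements:

```rust
// Sketch of the running-sum bucket reduction, with u64 values in place
// of curve points (illustrative only). One pass from the top bucket
// down: each bucket's value ends up weighted by its (1-based) index.
fn reduce_buckets(buckets: &[u64]) -> u64 {
    let mut res = 0u64;
    let mut running_sum = 0u64;
    for b in buckets.iter().rev() {
        running_sum += b;
        res += running_sum;
    }
    res
}

fn main() {
    // 1*1 + 2*2 + 3*3 = 14
    assert_eq!(reduce_buckets(&[1, 2, 3]), 14);
}
```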

@Pratyush (Member, Author) replied:

Is that worthwhile? We're already parallelizing the outer loop (over the windows)

A reviewer (Member) replied:

Hrmm, we should probably benchmark this. I thought the number of buckets is linear in the degree.

@Pratyush (Member, Author) replied:

> I thought its a linear number of buckets in the degree

It is, but the idea right now is that each thread gets its own window to operate over. Usually the number of windows exceeds the number of threads, so the existing allocation is already fine in those cases. However, if the number of threads is higher, we could try to leverage the extra threads for better parallelism; we just have to do so in a way that doesn't harm the normal mode of operation.

One idea would be to conditionally parallelize only if we have spare threads; this can be done via rayon's ThreadPool. The idea would be to allocate some number of threads to the pool (say 2) and then execute operations inside the pool. The thread-pool approach ensures that all rayon operations inside the pool use at most two threads.

@ValarDragon (Member) commented Dec 29, 2020:

Ah, I see. I didn't realize it's already parallelized; then I agree it's probably unlikely to be a perf bottleneck.

@ValarDragon (Member) left a review:

LGTM now! Out of curiosity, any idea how much of a speedup this was?

@Pratyush (Member, Author) commented:

It was pretty small, less than 5% at best.

(It did get slightly more pronounced with more threads, but that might have been because normalization is multi-threaded by default, so there was some cross-thread interference.)

@Pratyush Pratyush merged commit 64ec4fe into master Dec 30, 2020
@Pratyush Pratyush deleted the better-msm-api branch December 30, 2020 00:20
@ValarDragon (Member) commented:

I see; we should probably expose a single-core batch inversion then.
