Simplify MSM implementation, and small speedup #157
Conversation
Force-pushed from 595dee7 to e0a9ff4
The PR looks good. Do we need to update the CHANGELOG.md for this? It counts as an improvement.
I'll update the CHANGELOG; I guess it is a slight performance improvement.
ec/src/msm/variable_base.rs
Outdated
```rust
let size = ark_std::cmp::min(bases.len(), scalars.len());
let scalars = &scalars[..size];
let bases = &bases[..size];
let scalars_and_bases = scalars.iter().zip(bases).filter(|(s, _)| !s.is_zero());
```
This change is because in some downstream projects (e.g. poly-commit), algorithms that use MSM are slower when the two vectors' lengths are mismatched. The zip should fix this, but in benchmarks it doesn't.
(For that matter, neither does this new code, so I can remove it if that's better.)
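Roughly, the shape of the snippet under discussion, sketched with `u64` scalars standing in for the arkworks field and group types (the stand-in types and function name are illustrative, not the PR's code):

```rust
// Sketch of the length-matching logic: u64 scalars stand in for field
// elements, with 0 playing the role of Scalar::zero().
fn msm_like_sum(bases: &[u64], scalars: &[u64]) -> u64 {
    // Truncate both slices to the shorter length, as in the PR.
    let size = bases.len().min(scalars.len());
    let (bases, scalars) = (&bases[..size], &scalars[..size]);

    // zip would truncate on its own, but slicing first makes the intent
    // explicit; skipping zero scalars avoids useless bucket additions.
    scalars
        .iter()
        .zip(bases)
        .filter(|(s, _)| **s != 0)
        .map(|(s, b)| s * b)
        .sum()
}

fn main() {
    // Mismatched lengths: the extra base is ignored, the zero scalar skipped.
    assert_eq!(msm_like_sum(&[2, 3, 4], &[10, 0]), 20);
    println!("ok");
}
```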
Looks to me like this is due to line 19, where c is set based only on the scalars' length? That should probably be moved down and calculated based on size.
To clarify, this was happening when bases.len() > scalars.len(), so the c calculation wasn't affected.
(But you're right, we should move the c calculation down anyway.)
Also, scalars should never be bigger than the number of bases, right? Perhaps an assert should be added for that?
Hm, not sure if that's necessary; you could imagine a case where zero-padding makes scalars larger, but because the extras are zero, it doesn't affect the result.
(This might be happening in ark-poly-commit already.)
Hrmm, the function doesn't really make sense in the case where the extras are non-zero. (And in the case where there are more 0's than bases, it's inefficient to allocate them.)
Regardless of whether this function accepts extra zeroes, IMO we should (eventually) change poly-commit to not allocate extra zeros.
```rust
        res += running_sum;
    }

    buckets.into_iter().rev().for_each(|b| {
```
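For context, the reverse bucket walk in this hunk is the standard Pippenger reduction: it computes the weighted bucket sum with two additions per bucket instead of a scalar multiplication. A dependency-free sketch with `u64` values standing in for curve points (the function name and stand-in types are illustrative):

```rust
// Pippenger-style bucket reduction: bucket i (1-indexed) should contribute
// i * buckets[i-1] to the window result. Walking the buckets in reverse with
// a running sum achieves sum_i(i * B_i) using only additions.
fn reduce_buckets(buckets: &[u64]) -> u64 {
    let mut running_sum = 0u64;
    let mut res = 0u64;
    buckets.iter().rev().for_each(|b| {
        running_sum += b; // running_sum = B_i + B_{i+1} + ... + B_m
        res += running_sum; // after the loop, res = sum_i (i * B_i)
    });
    res
}

fn main() {
    // Buckets for digits 1, 2, 3: expect 1*3 + 2*5 + 3*2 = 19.
    assert_eq!(reduce_buckets(&[3, 5, 2]), 19);
    println!("ok");
}
```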
As an aside, we should probably parallelize this loop. (It will add some complexity to handle the running sum's updates for subsequent chunks.)
Is that worthwhile? We're already parallelizing the outer loop (over the windows)
Hrmm, we should probably benchmark this. I thought the number of buckets is linear in the degree.
> I thought the number of buckets is linear in the degree

It is, but the idea right now is that each thread gets its own window to operate over. Usually the number of windows exceeds the number of threads, so the existing allocation is already fine for those cases. However, if the number of threads is higher, then we could try to leverage the extra threads for better parallelism; but we have to do so in a way that doesn't harm the normal mode of operation.
One idea would be to conditionally parallelize only if we have spare threads; this can be done via rayon's ThreadPool. The idea would be to allocate some number of threads to the pool (say 2), and then execute operations inside the pool. The threadpool approach would ensure that all Rayon operations inside the pool use at most two threads.
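Rayon's `ThreadPoolBuilder::num_threads` provides this cap directly (build a pool and run the work inside `pool.install(...)`). A dependency-free sketch of the same capping idea using only `std::thread`, where the chunking, thread count, and function name are all illustrative rather than the PR's code:

```rust
use std::thread;

// Process `data` using at most `max_threads` worker threads, mimicking the
// cap a rayon ThreadPool built with num_threads(2) would enforce.
fn sum_with_cap(data: &[u64], max_threads: usize) -> u64 {
    // Split into at most `max_threads` chunks (ceiling division).
    let chunk = ((data.len() + max_threads - 1) / max_threads).max(1);
    thread::scope(|s| {
        let handles: Vec<_> = data
            .chunks(chunk)
            .map(|c| s.spawn(move || c.iter().sum::<u64>()))
            .collect();
        // Join the per-chunk partial sums.
        handles.into_iter().map(|h| h.join().unwrap()).sum()
    })
}

fn main() {
    assert_eq!(sum_with_cap(&[1, 2, 3, 4, 5], 2), 15);
    println!("ok");
}
```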
Ah, I see. I didn't realize it's already parallelized; then I agree it's probably unlikely to be a perf bottleneck.
Force-pushed from 852fc1c to 656bc6b
LGTM now! Out of curiosity, any idea how much of a speedup this was?
It was pretty small, less than 5% at best. (It did get slightly more pronounced with more threads, but that might have been because normalization is multi-threaded by default, so there was some cross-thread interference.)
I see; we should probably expose a single-core batch inversion then.
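The batch inversion referred to is presumably Montgomery's trick: n inverses for the cost of one inversion plus O(n) multiplications, which is what makes a single-core variant worth exposing. A sketch over a small prime field, where the modulus, Fermat-based `pow_mod`, and function names are illustrative stand-ins for the arkworks field API:

```rust
const P: u64 = 1_000_000_007; // illustrative prime modulus, not an arkworks field

// Modular exponentiation; pow_mod(a, P - 2) inverts a by Fermat's little theorem.
fn pow_mod(mut b: u64, mut e: u64) -> u64 {
    let mut acc = 1u64;
    b %= P;
    while e > 0 {
        if e & 1 == 1 {
            acc = (acc as u128 * b as u128 % P as u128) as u64;
        }
        b = (b as u128 * b as u128 % P as u128) as u64;
        e >>= 1;
    }
    acc
}

// Montgomery batch inversion: replaces each v[i] with its inverse using a
// single modular inversion plus O(n) multiplications. Assumes no zeros in v.
fn batch_inverse(v: &mut [u64]) {
    // Prefix products: prefix[i] = v[0] * ... * v[i-1].
    let mut prefix = vec![1u64; v.len() + 1];
    for (i, x) in v.iter().enumerate() {
        prefix[i + 1] = (prefix[i] as u128 * *x as u128 % P as u128) as u64;
    }
    // One inversion of the total product.
    let mut inv = pow_mod(prefix[v.len()], P - 2);
    // Walk backwards, peeling off one element at a time:
    // at step i, inv == (v[0] * ... * v[i])^{-1}.
    for i in (0..v.len()).rev() {
        let inv_i = (inv as u128 * prefix[i] as u128 % P as u128) as u64;
        inv = (inv as u128 * v[i] as u128 % P as u128) as u64;
        v[i] = inv_i;
    }
}

fn main() {
    let mut v = vec![2u64, 3, 12345];
    let orig = v.clone();
    batch_inverse(&mut v);
    for (a, ia) in orig.iter().zip(&v) {
        assert_eq!((*a as u128 * *ia as u128 % P as u128) as u64, 1);
    }
    println!("ok");
}
```

Running this inside (or outside) a rayon pool is then just a matter of where the call is issued, which is why exposing a non-parallel entry point is cheap.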
Description
As per the title, this PR simplifies the internal implementation of variable-base MSMs, and also provides a small speedup (by doing less work).
Before we can merge this PR, please make sure that all the following items have been
checked off. If any of the checklist items are not applicable, please leave them but
write a little note why.
- `Pending` section in `CHANGELOG.md`
- `Files changed` in the Github PR explorer