-
Notifications
You must be signed in to change notification settings - Fork 51
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fboemer/ntt fix #58
Fboemer/ntt fix #58
Conversation
@@ -266,7 +266,7 @@ void ForwardTransformToBitReverseAVX512( | |||
const uint64_t* W = &root_of_unity_powers[W_idx]; | |||
const uint64_t* W_precon = &precon_root_of_unity_powers[W_idx]; | |||
|
|||
if (input_mod_factor <= 2) { | |||
if ((input_mod_factor <= 2) && (recursion_depth == 0)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The actual fix
@@ -35,7 +35,7 @@ static void BM_EltwiseFMAModAddNative(benchmark::State& state) { // NOLINT | |||
|
|||
BENCHMARK(BM_EltwiseFMAModAddNative) | |||
->Unit(benchmark::kMicrosecond) | |||
->ArgsProduct({{1024, 8192, 16384}, {false, true}}); | |||
->ArgsProduct({{1024, 4096, 16384}, {false, true}}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just making parameters consistent with other benchmarks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LG though I added an optional suggestion
std::random_device rd; | ||
std::mt19937 gen(rd()); | ||
std::uniform_int_distribution<uint64_t> distrib(1, modulus - 1); | ||
|
||
for (size_t trial = 0; trial < num_trials; ++trial) { | ||
std::vector<uint64_t> input64(N, 0); | ||
for (size_t i = 0; i < N; ++i) { | ||
input64[i] = distrib(gen); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code block is repeated a few times. Would it be worth, refactoring into a helper function that returns a vector? Or, abstract away distrib and gen into a functor and use std::generate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the suggestion. I've created https://jiratest.idoc.intel.com/browse/GLADE-79 to fix in a separate PR
* Fix NTT AVX512 implementation
Fixes bug in AVX512 NTT that appeared in the IFMA implementation with primes close to 2**50. Added
bool prefer_small_primes
parameter toGeneratePrimes()
which is used to trigger the failing condition.