Randomized Quasi-Monte Carlo Sampling in The Random Ray Method #3268
base: develop
Conversation
question: Wouldn't it also be nice if this procedure could be generalized for use in stochastic volume runs?
Thanks @spasmann for putting in this PR -- it's nicely implemented and is a very fun addition to the code. I also did some of my own testing of the new feature. I ran 2D C5G7 30 times with each sampling method (with different seeds each time) but otherwise the same configuration (flat source, 142k FSRs, 1750 rays/iteration). The ensemble statistics look fantastic for the new randomized Halton sampling scheme:
| | PRNG | Halton |
|---|---|---|
| k-eff | 1.18675 | 1.18676 |
| k-eff std. dev. [pcm] | 39 | 15 |
| Average Pin Error (AAPE) | 0.53 | 0.47 |
| AAPE std. dev. | 0.11 | 0.05 |
| RMS | 0.68 | 0.59 |
| RMS std. dev. | 0.14 | 0.06 |
| Max pin power error | 2.37 | 2.19 |
| Max std. dev. | 0.46 | 0.29 |
The spatial error metrics show a modest ~10% absolute improvement (and look pretty much identical to what you reported). There is also a really huge reduction in the std. dev. of the spatial error metrics throughout the ensemble, generally around 2x on average. The improvement in the eigenvalue is really major: without introducing any observable bias, the std. dev. is reduced by 2.6x. Assuming approximately equal runtimes, this is like a 7x FOM improvement overall! I'll also note that these are all ensemble uncertainties (rather than what gets listed by OpenMC), so this is definitely a legitimate reduction; we are not getting fooled by underestimation of uncertainty.
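For reference, the figure-of-merit arithmetic behind that estimate (the standard definition with variance $\sigma^2$ and runtime $T$, assuming equal runtimes):

$$
\mathrm{FOM} = \frac{1}{\sigma^2 T}, \qquad
\frac{\mathrm{FOM}_{\text{Halton}}}{\mathrm{FOM}_{\text{PRNG}}} \approx \left( \frac{\sigma_{\text{PRNG}}}{\sigma_{\text{Halton}}} \right)^{2} = 2.6^2 \approx 6.8
$$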
I also tried out the Kobayashi dog leg problem (a fixed source problem). Ensemble testing has shown both methods get the same answer, so that's good! However, there isn't much impact on uncertainty: for a tally in the far region of the problem, PRNG gets a 2.4% std. dev., whereas RQMC gets 2.5%.
Anyway, my early take is that it really helps in some cases and doesn't hurt in others, so on the whole I'd recommend folks use it!
```cpp
// Function to generate randomized Halton sequence samples
//
// Algorithm adapted from:
// A. B. Owen. A randomized Halton algorithm in R. arXiv, June 2017.
// URL https://arxiv.org/abs/1706.02808
vector<vector<float>> rhalton(int N, int dim, uint64_t* seed, int64_t skip = 0)
{
  int b;
  double b2r;
  vector<double> ans(N);
  vector<int> ind(N);
  vector<int> primes = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
  vector<vector<float>> halton(N, vector<float>(dim, 0.0));

  std::iota(ind.begin(), ind.end(), skip);

  for (int D = 0; D < dim; ++D) {
    b = primes[D];
    b2r = 1.0 / b;
    vector<int> res(ind);
    std::fill(ans.begin(), ans.end(), 0.0);

    while ((1.0 - b2r) < 1.0) {
      vector<int> dig(N);
      // Randomly permute the digits 0..b-1 for this base
      vector<int> perm(b);
      std::iota(perm.begin(), perm.end(), 0);
      fisher_yates_shuffle(perm, seed);

      for (int i = 0; i < N; ++i) {
        dig[i] = res[i] % b;
        ans[i] += perm[dig[i]] * b2r;
        res[i] = (res[i] - dig[i]) / b;
      }
      b2r /= b;
    }

    for (int i = 0; i < N; ++i) {
      halton[i][D] = ans[i];
    }
  }

  return halton;
}
```
- Given that the particle positions and directions are stored in FP64, would it make sense to allow the sampling to also be 64-bit?
- This is a nice and high-level/abstract implementation of the sampling scheme, though the "abstractness" is a double-edged sword, as it requires a significant amount of dynamic memory allocation through all the usage of `vector`. My guess is that this causes most of the added expense of the sampling scheme compared to PRNG sampling. With that in mind, as the function is only ever called in OpenMC as `rhalton(1, 5, current_seed(), skip)`, we could in theory make a simpler and faster function that only generates a single 5-dimensional sample, thus eliminating the need for several vectors (or replacing them with statically sized `std::array`). However, it's possible other areas of OpenMC may want to use Halton sampling in the future, in which case perhaps it is worthwhile to keep things abstract? At the very least, it seems like we can limit things to a single sample at a time (eliminating the `N` argument and thus several dynamically sized vectors), as in the sketch below.
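To make the suggestion above concrete, here is a minimal sketch (not part of the PR) of what an allocation-free, single-sample variant might look like. The function name `rhalton_single` is hypothetical, the permutation uses an inline Fisher-Yates loop in place of the PR's `fisher_yates_shuffle()` helper, and `prn()` is assumed to be OpenMC's native generator from `random_lcg.h`; it also returns FP64 samples, per the first point above.

```cpp
#include <array>
#include <cstdint>
#include <numeric>
#include <utility>

// Assumed available from OpenMC's random_lcg.h: returns a uniform
// sample on [0,1) and advances the seed in place.
double prn(uint64_t* seed);

// Hypothetical allocation-free variant of rhalton(): generates a single
// 5-dimensional randomized Halton point, replacing the dynamically
// sized vectors with fixed-size std::array.
std::array<double, 5> rhalton_single(uint64_t* seed, int64_t skip = 0)
{
  constexpr std::array<int, 5> primes {2, 3, 5, 7, 11};
  constexpr int max_base = 11;
  std::array<double, 5> sample {};

  for (int D = 0; D < 5; ++D) {
    const int b = primes[D];
    double b2r = 1.0 / b;
    int64_t res = skip; // index of this point in the Halton sequence
    double ans = 0.0;

    while ((1.0 - b2r) < 1.0) {
      // Randomly permute the digits 0..b-1 (inline Fisher-Yates)
      std::array<int, max_base> perm;
      std::iota(perm.begin(), perm.begin() + b, 0);
      for (int i = b - 1; i > 0; --i) {
        int j = static_cast<int>(prn(seed) * (i + 1));
        std::swap(perm[i], perm[j]);
      }

      const int dig = static_cast<int>(res % b);
      ans += perm[dig] * b2r;
      res = (res - dig) / b;
      b2r /= b;
    }
    sample[D] = ans;
  }
  return sample;
}
```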
```cpp
// Sample spatial distribution
Position xi {samples[0][0], samples[0][1], samples[0][2]};
Position shift {1e-9, 1e-9, 1e-9};
```
Wherever possible, "magic numbers" in OpenMC are defined in `constants.h`, so it's easier to keep track of how many things like the `1e-9` shift here we have baked into the code. You could add a new constant in `constants.h` with this value, or potentially just reuse an existing one, since this shift is related to geometry floating-point sensitivity. Some existing options from `constants.h`:
```cpp
// Used for surface current tallies
constexpr double TINY_BIT {1e-8};

// Used for precision in geometry
constexpr double FP_PRECISION {1e-14};
constexpr double FP_REL_PRECISION {1e-5};
constexpr double FP_COINCIDENT {1e-12};
```
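For illustration, the first option might look something like the following sketch (the constant name `RAY_SOURCE_SHIFT` is hypothetical, not an existing OpenMC identifier):

```cpp
// In constants.h (hypothetical new entry):
constexpr double RAY_SOURCE_SHIFT {1e-9};

// At the sampling site, replacing the bare 1e-9 literals:
Position shift {RAY_SOURCE_SHIFT, RAY_SOURCE_SHIFT, RAY_SOURCE_SHIFT};
```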
```cpp
// Set random number seed
int64_t batch_seed = (simulation::current_batch - 1) * settings::n_particles;
int64_t skip = id();
init_particle_seeds(batch_seed, seeds());
stream() = STREAM_TRACKING;
```
In a similar manner to my comment regarding simplification vs. leaving things abstract for the `rhalton` function, if we end up deciding to just have it generate a single sample, can we change the interface to rely only on the `current_seed()` (appropriately offset by the particle ID), thus allowing this code to be moved back up to the caller so that it doesn't need to be (partially) reproduced between `sample_rng()` and `sample_halton()`? The logic of RQMC sampling makes it seem like we need to keep the seed batch-specific, but I wanted to double check!
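For illustration, the refactor being suggested might look roughly like this (a hypothetical sketch: the `sample_rng`/`sample_halton` signatures and the `settings::use_halton` flag are guesses, while `init_particle_seeds`, `seeds()`, `stream()`, and `STREAM_TRACKING` come from the diff above):

```cpp
// Caller performs the batch-specific seeding once...
int64_t batch_seed = (simulation::current_batch - 1) * settings::n_particles;
init_particle_seeds(batch_seed, seeds());
stream() = STREAM_TRACKING;

// ...so both samplers can take the prepared seed directly and neither
// needs to (partially) reproduce the setup internally:
SourceSite site = settings::use_halton ? sample_halton(current_seed(), id())
                                       : sample_rng(current_seed());
```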
Hey @ebknudsen! Definitely -- I have talked with @paulromano and @pshriwise about this (in particular, with regard to PR #3129). The tentative plan is to get #3129 merged, and then we can look at testing alternative methods for sampling the rays.
Description
This PR adds a randomized Halton sequence, a form of randomized quasi-Monte Carlo (RQMC) sampling, as an alternative to typical pseudo-random number sampling in The Random Ray Method (TRRM). RQMC methods decrease the variance of the simulation by providing a more uniform distribution of samples for initializing ray positions and directions of travel compared to random samples.
The Halton sampling method can be selected through the Python API's random ray settings. OpenMC's native RNG remains the default sampling method, but it can also be specified explicitly.
QMC & RQMC Background
Quasi-Monte Carlo (QMC) techniques use low-discrepancy sequences in place of the typical pseudo-random number generators used in Monte Carlo techniques. Like pseudo-random number generators, low-discrepancy sequences are intended to produce $N$ points distributed as $U[0,1]^D$. Unlike pseudo-random numbers, low-discrepancy sequences use deterministic algorithms designed to maximize the distance between the points, which is measured using discrepancy statistics.
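As background (standard QMC material, not specific to this PR), the most common such statistic is the star discrepancy: the largest deviation between the empirical fraction of points falling in an anchored box and that box's volume,

$$
D_N^* = \sup_{\mathbf{t} \in [0,1]^D} \left| \frac{\#\{\, \mathbf{x}_i \in [\mathbf{0}, \mathbf{t}) \,\}}{N} - \prod_{d=1}^{D} t_d \right|.
$$

Low-discrepancy constructions such as the Halton sequence achieve $D_N^* = O(\log^D N / N)$, compared with roughly $N^{-1/2}$ for pseudo-random points.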
Despite improved sampling characteristics, QMC techniques are not commonly used in particle transport applications. This is because each point generated in the sequence depends on the previous ones, making QMC techniques ill-suited for modeling the particle random-walk process. Fortunately, as the name implies, TRRM performs a ray-tracing procedure uniformly through the problem, bypassing the need to explicitly model the scattering or fission processes. This makes it a well-suited application for (R)QMC techniques.
Standard QMC methods are inherently deterministic: each time a sequence is initialized, the same samples are generated. However, for each batch in TRRM, rays are emitted at different locations and travel in different directions, so new QMC samples are required for each batch. Most QMC sequences are extensible in $D$; however, QMC performance takes a significant hit beyond $D > 9$, so simply extending the sequence to new dimensions for each batch is impractical. Instead, we can utilize randomized QMC (RQMC) techniques. RQMC refers to a host of techniques that randomize or shuffle the original QMC sequence while maintaining the low discrepancy of the samples. By randomizing the sequence each iteration, we can generate a unique set of RQMC samples for each set of particles. Additionally, by randomizing the QMC samples we can estimate the variance of the integral (just like in analog MC and traditional random ray), something that is difficult to do with unrandomized QMC, as shown below.
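Concretely (standard RQMC practice rather than anything specific to this implementation), running $R$ independent randomizations gives estimates $\hat{\mu}_1, \ldots, \hat{\mu}_R$ that can be combined and used for an unbiased variance estimate, just as with independent MC replicas:

$$
\hat{\mu} = \frac{1}{R} \sum_{r=1}^{R} \hat{\mu}_r, \qquad
\widehat{\operatorname{Var}}(\hat{\mu}) = \frac{1}{R(R-1)} \sum_{r=1}^{R} \left( \hat{\mu}_r - \hat{\mu} \right)^2.
$$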
The image below shows samples from a pseudo-random number generator, the Halton sequence, and the randomized Halton sequence.
A few helpful resources on QMC and RQMC include:
Implementation
The RQMC method implemented here is Owen's randomization of the Halton sequence. Owen's randomization is well-suited for applications like TRRM because it introduces independent random permutations to the samples, making it possible to generate $R$ distinct sample sets. As with OpenMC's native RNG, reproducibility of the samples is ensured by seeding; OpenMC's native RNG is itself used in the randomization process. Additionally, Owen's randomization allows for the efficient generation of new samples in parallel by constructing rows in the range $\left[N, N^\prime\right]$. Finally, Owen's randomization method was chosen for its straightforward implementation and lack of any additional user input.
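The row-range extensibility can be illustrated with the `rhalton()` function from the diff above. This small sketch (not from the PR) shows that rows generated in a separate call via the `skip` argument line up with rows of a single larger call when the starting seeds match, since the per-digit permutations depend only on the seed, not on `N` or `skip`:

```cpp
#include <cstdint>
#include <vector>
using std::vector;

// Signature from this PR's diff:
vector<vector<float>> rhalton(int N, int dim, uint64_t* seed, int64_t skip = 0);

int main()
{
  uint64_t seed_a = 42;
  auto all_rows = rhalton(104, 5, &seed_a, 0); // rows 0..103

  uint64_t seed_b = 42;
  auto tail_rows = rhalton(4, 5, &seed_b, 100); // rows 100..103 directly

  // Expect all_rows[100 + i][d] == tail_rows[i][d] for all i and d,
  // which is what lets independent ranks generate disjoint row ranges
  // in parallel.
  return 0;
}
```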
Verification & Performance
The implementation has been verified by evaluating pin cell fission powers from the 2D C5G7 problem. The performance boost provided by RQMC samples compared to random samples will vary by problem. In general, RQMC should give better performance on problems with higher spatial sensitivity, e.g., a fine spatial mesh or TRRM's linear source approximation.
Results were generated on one Perlmutter node (2x AMD EPYC 7763), utilizing one MPI rank and 256 OpenMP threads. In the following figures, each data point represents the mean result from 80 individual simulations (each set with a different seed), while the shaded regions represent one standard deviation.
142,964 Flat Source Regions
Surprisingly, simulations using Halton samples were on average over 10% faster on the fine mesh.
17,924 Linear Source Regions
The coarse mesh runtimes are nearly identical.
Runtime Performance
It is very unlikely that the Halton sampling algorithm itself is faster than OpenMC's native RNG (although I have not compared them). My current hypothesis is that the Halton samples result in reduced shared memory contention because the samples are generated in such a way that each subsequent point will be "far away" from the previous point. The plot below shows the percent speedup of the Halton sample simulations compared to the RNG simulations as a function of the number of OpenMP threads used and generally supports this hypothesis.
However, the trend is noisy, and I would have expected this phenomenon to be more pronounced on coarse meshes; instead, the runtime performance of the coarse mesh simulations was nearly identical. I'll need to run more tests to figure out exactly what is giving Halton sampling a runtime advantage in the fine mesh simulations. In any case, it doesn't seem that Halton sampling hurts runtime, which makes sense, as the bulk of the compute time is spent in the ray tracing procedure.
Checklist