Uniform float improvements #1289
Conversation
Note: sample_single and sample_single_inclusive use different code paths for sampling.
Around 20% faster for f32, slightly less for f64.
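For context, a minimal usage sketch (assuming rand's 0.8-style public API, where `gen_range` with a half-open range dispatches to `sample_single` and an inclusive range to `sample_single_inclusive`):

```rust
use rand::Rng;

fn main() {
    let mut rng = rand::thread_rng();
    // Half-open range: goes through the sample_single code path.
    let x: f32 = rng.gen_range(0.0..10.0);
    // Inclusive range: goes through the sample_single_inclusive code path.
    let y: f32 = rng.gen_range(0.0..=10.0);
    println!("x = {x}, y = {y}");
}
```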
Closes #1270.

This branch was created to evaluate the mentioned paper: https://hal.science/hal-03282794/document

The paper ignores possible underflow, but notes the following potential issues for uniform FP sampling from a defined range:

- GSL's method, …
- The method using …

The paper defines method SESS (Same Exponent Same Sign), which we shall ignore due to the narrow circumstances in which it is applicable. The paper also defines method γsection for general applicability. This was implemented for evaluation as follows:

```rust
let (a, b) = (low, high);
// γ: the coarser of the ULP spacings at the two endpoints.
let g = (a.next_up() - a).max(b - b.next_down());
// s: the (possibly inexact) number of γ-steps spanning [a, b];
// eps tracks the rounding error of that computation.
let s = b / g - a / g;
let eps = if a.abs() <= b.abs() {
    -a / g - (s - b / g)
} else {
    b / g - (s + a / g)
};
let si = s.ceil() as $uty;
let hi = if s != si as $ty {
    si
} else {
    si + (eps > 0.0) as $uty
};
// Choose one of the hi + 1 grid points.
let k = (0..=hi).sample_single(rng).unwrap();
// Step from the endpoint of larger magnitude; k == hi maps to the other endpoint.
Ok(if a.abs() <= b.abs() {
    if k == hi {
        a
    } else {
        b - (k as $ty) * g
    }
} else {
    if k == hi {
        b
    } else {
        a + (k as $ty) * g
    }
})
```

Results are around 10 times slower.
In comparison, our current implementations benchmark at 1-1.5ns for distribution sampling and ~2ns for single sampling. The paper's implementation of SESS benchmarked at 10ns per value for both. Note also that #531 should (if I remember correctly) be able to offer more precise sampling with similar performance, thus I do not see much benefit in considering the γsection method further.
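For reference, a standalone sketch of the γsection code above, monomorphized for f64 (illustration only: the helper name `gamma_section` is mine, the integer draw uses rand's public `gen_range`, and `f64::next_up`/`next_down` from a recent toolchain stand in for whatever helpers the branch uses):

```rust
use rand::Rng;

/// Sketch of the γsection method for f64; not the branch's implementation.
fn gamma_section<R: Rng>(rng: &mut R, a: f64, b: f64) -> f64 {
    // γ: the coarser of the ULP spacings at the two endpoints.
    let g = (a.next_up() - a).max(b - b.next_down());
    // Number of γ-steps across [a, b], with eps tracking rounding error.
    let s = b / g - a / g;
    let eps = if a.abs() <= b.abs() {
        -a / g - (s - b / g)
    } else {
        b / g - (s + a / g)
    };
    let si = s.ceil() as u64;
    let hi = if s != si as f64 {
        si
    } else {
        si + (eps > 0.0) as u64
    };
    // Choose one of the hi + 1 grid points.
    let k = rng.gen_range(0..=hi);
    // Step from the endpoint of larger magnitude; k == hi maps to the other endpoint.
    if a.abs() <= b.abs() {
        if k == hi { a } else { b - (k as f64) * g }
    } else {
        if k == hi { b } else { a + (k as f64) * g }
    }
}

fn main() {
    let mut rng = rand::thread_rng();
    println!("{}", gamma_section(&mut rng, -1.5, 3.0));
}
```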
Are there any significant optimizations we can make for the [0,1) case?
I didn't look at that here, but I don't think so (we essentially just mask and bit-cast an integer, then subtract 1).
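For illustration, a minimal sketch of that mask-and-bit-cast approach for f32 (a sketch of the technique, not rand's exact source): keep 23 random bits as the mantissa on top of the exponent of 1.0 to get a value in [1, 2), then subtract 1 to land in [0, 1).

```rust
// Illustration of mask + bit-cast + subtract for f32 in [0, 1).
fn bits_to_f32_01(bits: u32) -> f32 {
    // Keep 23 random bits for the mantissa field.
    let mantissa = bits >> 9;
    // OR onto the bit pattern of 1.0f32, giving a value in [1.0, 2.0).
    let value = f32::from_bits(1.0f32.to_bits() | mantissa);
    // Shift down to [0.0, 1.0).
    value - 1.0
}

fn main() {
    for bits in [0u32, 0x8000_0000, u32::MAX] {
        println!("{bits:#010x} -> {}", bits_to_f32_01(bits));
    }
}
```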
Looks good! I only have some small questions about the benchmarks.
Also avoid using the internal API of the Uniform distribution.
sample_single_inclusive (~20% perf increase)