Speed up Uniform Duration sampling. #583
Conversation
Looks good, though there might be some ranges which are tricky to sample (e.g. 4.999s to 5.001s). Since such ranges are (probably) unlikely, this may not be a big issue. This does bring up two points though:
Adding
By the way: your extended benchmarks show a significant performance regression for one set of bounds (I don't know what these are). I'm not keen on accepting this as-is due to the potential for very poor performance on certain bounds (but if we did, then there should definitely be a benchmark highlighting the problem).
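To make the tricky case concrete, here is a minimal sketch (assuming the rand 0.5-era `Uniform` API; the exact bounds are illustrative) of a range that straddles a second boundary, where only a sliver of nanosecond values on each side is valid:

```rust
use std::time::Duration;
use rand::distributions::{Distribution, Uniform};

fn main() {
    // A narrow range straddling the 5 s boundary: [4.999 s, 5.001 s).
    // Naive rejection sampling over the full nanosecond field accepts
    // only ~0.1% of candidates for a range this narrow.
    let low = Duration::new(4, 999_000_000);
    let high = Duration::new(5, 1_000_000);
    let range = Uniform::new(low, high);
    let mut rng = rand::thread_rng();
    let d = range.sample(&mut rng);
    assert!(d >= low && d < high);
}
```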
I hadn't really considered the worst case. I think it should have much better worst-case performance now: I reduced the rejection zone by using an offset.
Current bench-difference
new:
One thing that is a little weird right now is that I've set the offset used in the Large case as a field on the struct instead of as a field on the enum variant itself. This is simply because I found it improved the benchmarks.
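For reference, the layout being described looks roughly like this (the struct fields match the diff further down; the enum's contents are elided here):

```rust
// Sketch of the layout under discussion: the offset lives on the
// struct even though only the Large sampling mode reads it.
enum UniformDurationMode { /* Small, Medium, Large (see the diffs below) */ }

struct UniformDuration {
    mode: UniformDurationMode,
    offset: u32, // used only by the Large sampling mode
}
```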
I think that's just noise caused by my computer prioritizing something else.
Nice work. Unfortunately it's also wrong; I'm surprised it didn't panic in your tests! Should be easy to fix though.
src/distributions/uniform.rs
Outdated
    high_n = high_n + 1_000_000_000;
} else {
    high_s = high_s;
    high_n = high_n;
redundant
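A sketch of that normalization with the redundant branch removed (the condition and the `high_s` decrement are assumptions; only the carry into `high_n` appears in the diff):

```rust
// Normalize bounds: borrow one second from the upper bound so that its
// nanosecond field is at least the lower bound's; no else branch needed.
fn normalize(low_n: u32, mut high_s: u64, mut high_n: u32) -> (u64, u32) {
    if high_n < low_n {
        high_s -= 1;
        high_n += 1_000_000_000;
    }
    (high_s, high_n)
}
```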
benches/distributions.rs
Outdated
    Uniform::new(Duration::new(10000, 423423), Duration::new(200000, 6969954))
);
distr_duration!(distr_uniform_duration_edge,
    Uniform::new_inclusive(Duration::new(u64::max_value() / 10, 999_999_999), Duration::new((u64::max_value() / 10) + 1, 0))
This is clearer if you put the "max / 10" bit above (let binding); you can use any value > 4 (to avoid the Medium sampling method). Edge case with high_ns = 1 may be worse in some implementations and is probably more important to check (keeping low_ns the same).
The Medium sampling method actually occurs for any value > (2^64)/(10^9). But I simplified it by using a const binding and the same number of seconds in distr_uniform_duration_edge and distr_uniform_duration_large.
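That might look roughly like this (the const name is hypothetical; `distr_duration!` is the benchmark macro from this PR, and the bounds of the `large` bench are illustrative):

```rust
// Same number of seconds in both benches, large enough
// (> 2^64 / 10^9) to select the Large sampling mode.
const LARGE_SEC: u64 = u64::max_value() / 10;

distr_duration!(distr_uniform_duration_edge,
    Uniform::new_inclusive(Duration::new(LARGE_SEC, 999_999_999),
                           Duration::new(LARGE_SEC + 1, 0)));
distr_duration!(distr_uniform_duration_large,
    Uniform::new(Duration::new(LARGE_SEC, 0),
                 Duration::new(LARGE_SEC + 1_000, 0)));
```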
mode: UniformDurationMode,
offset: u32,
The offset is only used by Large mode, so it could be moved there.
Edit: sorry, I see your comment about this. Fine. It should make no difference to the size of the struct either way.
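For comparison, the variant-local alternative raised here would look roughly like this (field names are taken from the diffs in this review; their grouping into variants is an assumption):

```rust
use rand::distributions::Uniform;

enum UniformDurationMode {
    Small { secs: u64, nanos: Uniform<u32> },
    Medium { nanos: Uniform<u64> },
    Large {
        max_secs: u64,
        max_nanos: u32,
        secs: Uniform<u64>,
        nano_range: Uniform<u32>,
        offset: u32, // moved from the struct onto the variant
    },
}
```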
match self.mode {
    UniformDurationMode::Small { secs, nanos } => {
        let n = nanos.sample(rng);
        Duration::new(secs, n)
Unfortunately this is now incorrect since it's possible that n > 10^9 (in this case `new` should panic). Hopefully using an `if` to check here won't have too big an impact.
`Duration::new()` actually handles that already. In the documentation it says:
If the number of nanoseconds is greater than 1 billion (the number of nanoseconds in a second), then it will carry over into the seconds provided.
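That carry behavior is easy to confirm (this is stable std behavior, independent of this PR):

```rust
use std::time::Duration;

fn main() {
    // 4 s + 1.5e9 ns normalizes to 5 s + 0.5e9 ns.
    assert_eq!(Duration::new(4, 1_500_000_000),
               Duration::new(5, 500_000_000));
}
```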
Maybe add a `debug_assert`?
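A minimal sketch of that suggestion, assuming the intent is to document that the overshoot relies on `Duration::new`'s carry and never exceeds one extra second:

```rust
// Hypothetical guard: n may exceed 1e9 (Duration::new carries it over),
// but in this sampling scheme it should never reach 2e9.
debug_assert!(n < 2_000_000_000);
Duration::new(secs, n)
```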
let s = secs.sample(rng);
let n = nano_range.sample(rng);
if !(s == max_secs && n > max_nanos) {
    let sum = n + self.offset;
After adding the offset back, `s * 1_000_000_000 + sum` is correct, but again `sum` may be more than 10^9. Again, using an `if` to check whether to decrement `sum` and increment `s` is probably the best option.
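A sketch of that fix in context (variable names follow the diff above):

```rust
// If adding the offset pushes the nanosecond sum past one second,
// carry it into the seconds by hand.
let mut sum = n + self.offset;
let mut s = s;
if sum >= 1_000_000_000 {
    sum -= 1_000_000_000;
    s += 1;
}
return Duration::new(s, sum);
```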
❌ Build rand 1.0.59 failed (commit 970720fb41 by @Pazzaz)
These changes speed up the process of sampling Durations from a uniform distribution primarily by avoiding two expensive operations:
I also added some simple benchmarks to measure these changes.
Before:
After:
To make sure there were no regressions, I also ran a more comprehensive benchmark.
The code for those benchmarks (Rust to benchmark and Python to graph) can be found here.
Comprehensive Benchmark Results
In these heatmaps, the x-axis is the upper bound and the y-axis is the lower bound. 800 different Durations were used, ranging from 0s to 18446744074.7095512s in increasing order.
The z-value is the average number of nanoseconds it took to sample from the corresponding distribution.
Before:
After:
Percentage of improvement:
Zoomed in:
So, about a 10-20% improvement; the values outside that range seem to be just noise.
Edit: changed a call to `Duration::from_nanos` to `Duration::new(nanos / 1_000_000_000, (nanos % 1_000_000_000) as u32)` for compatibility with older Rust versions. Shouldn't make much of a difference, but the benchmark can be rerun.
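The two forms are equivalent for any u64 nanosecond count; `Duration::from_nanos` simply requires Rust 1.27+:

```rust
use std::time::Duration;

fn main() {
    let nanos: u64 = 3_123_456_789;
    assert_eq!(Duration::from_nanos(nanos),
               Duration::new(nanos / 1_000_000_000,
                             (nanos % 1_000_000_000) as u32));
}
```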