-
Notifications
You must be signed in to change notification settings - Fork 97
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce generators that respect function domains #348
base: master
Are you sure you want to change the base?
Conversation
6838d3b
to
7c236b8
Compare
Rebased to build on #349 |
107b4ab
to
a16e073
Compare
dc1a1da
to
385e850
Compare
@beetrees @quaternic you both suggested this at different points, would you mind reviewing this? I still have to wire up tests but I think the generator itself is good, see the above plots. The interesting parts are |
const DEFINED: (Bound<T>, Bound<T>); | ||
|
||
/// The region, if any, for which the function repeats. Used to test within. | ||
const PERIODIC: Option<(Bound<T>, Bound<T>)> = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced periodicity will be that useful here, since not that many of the functions actually are periodic. Mainly sin
, cos
, tan
, and fract
I suppose. Out of those, only fract
has a period (1.0
) which is actually representable as a float. The functions periodic in π
are problematic exactly because of that; the worst-case inputs are very large ones where the function just happens to land very close to zero. Unlike fract
which is just exactly zero for numbers large enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I got this from another math library (I thought it was core-math maybe not since I can't find it now...) that suggested the heaviest testing within one period to get better coverage with the larger float types. But in retrospect this doesn't make a lot of sense to me given that -π..π already includes more than 50% of the float's representable values.
Do you think it is better to just drop any references to periodicity and instead test with Unbounded
? Or is there still some advantage to heavier testing within this range, or keeping the information around for some future use.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd just drop it, but it's not something I've thought about all that much. It could be useful to have the information somewhere (e.g. [0,2π] or [-π,π] might be very reasonable input ranges to have on some benchmarks) but it's mostly only relevant for sin/cos/tan.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it applies to a lot less than I expected it to when starting this. Okay, I'll remove it (but not immediately in case other refactoring comes up in review). Keeping check_points
with multiples of π/2
seems useful to generate extra tests around there, considering I think the approximations tend to blow up at poles & zeros of
Github thinks you're going to comment in the future? 🤔 edit: oh, must be related to DST in the US
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was wondering about that as well, good to see it's just a visual glitch. Yours is too 🤔 Apparently the US is turning clocks back at this time (daylight savings)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for removing PERIODIC
.
Fix up traits Wip up down Tests pass! Cleanup Enable full test range More tests Code cleanup More tests, all pass Cleanup, passing Add visualization scripts Add visualization scripts updates to script clippy clippy clippy Docs update docs docs
/// Number of values near an interesting point to check. | ||
const AROUND: usize = 100; | ||
|
||
/// Number of tests to run. | ||
const NTESTS: usize = { | ||
if cfg!(optimizations_enabled) { | ||
if crate::emulated() | ||
|| !cfg!(target_pointer_width = "64") | ||
|| cfg!(all(target_arch = "x86_64", target_vendor = "apple")) | ||
{ | ||
// Tests are pretty slow on non-64-bit targets, x86 MacOS, and targets that run | ||
// in QEMU. | ||
100_000 | ||
} else { | ||
5_000_000 | ||
} | ||
} else { | ||
// Without optimizations just run a quick check | ||
800 | ||
} | ||
}; | ||
|
||
/// Some functions have infinite asymptotes, limit how many we check. | ||
const MAX_ASYMPTOTES: usize = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need to refactor this all a bit considering I just copied and pasted NTESTS from the random tests. Still brainstorming how to do this (input appreciated) but I'm thinking maybe something like:
- Have
N
number of tests by default, which is 5M in the existing code (but probably needs to be reduced based on the below) f32
unary functions runN
random testsf64
andf128
unary functions runN * 4
random tests- If the function takes two inputs, multiply the number of tests by 4. If it takes three inputs, multiply by 8.
- If a domain test exists for the function, run
N
domain-based tests and reduce the number of random tests by a factor of 100 - If an exhaustive test for
f32
(not yet implemented) or high-iteration test (f64
/f128
or multi-input functions, also not implemented) should be run, still run the "interesting points" portion of domain-based tests but replace thelogspace
tests with whatever fits. I should probably split this into two generators rather than chaining...
Basically in all cases, either the exhaustive tests or the logspace are going to consume the bulk of time. But we still want to run "interesting points" and the random tests in hopes that they will find errors earlier than waiting for the whole exhaustive check to run. Also random tests should cover some signaling NaNs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a reasonable plan to start with, especially checking the "interesting" cases first will make development easier. Ultimately the total number of test cases that get run is mainly a function of how much CI time we're willing to spend running them; it might be worth deciding on a rough estimate for that and then tuning the total test count to fit within it. Also, all the "magic" somewhat-arbitrary constants (AROUND
, NTESTS
, MAX_ASYMPTOTES
etc.) for how many of each type of test to run should probably be centralised in a single file somewhere to make it easier to keep track of what tests are being run.
86fe390
to
668ea32
Compare
let is_nan = |x: Self| -> bool { | ||
// } | ||
// fn is_nan(x: Self) -> bool { | ||
// When using mangled-names, the "real" compiler-builtins might not have the | ||
// necessary builtin (__unordtf2) to test whether `f128` is NaN. | ||
// FIXME(f16_f128): Remove once the nightly toolchain has the __unordtf2 builtin | ||
// x is NaN if all the bits of the exponent are set and the significand is non-0 | ||
x.to_bits() & Self::EXP_MASK == Self::EXP_MASK | ||
&& x.to_bits() & Self::SIG_MASK != Self::Int::ZERO | ||
}; | ||
if is_nan(self) && is_nan(rhs) { true } else { self.to_bits() == rhs.to_bits() } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let is_nan = |x: Self| -> bool { | |
// } | |
// fn is_nan(x: Self) -> bool { | |
// When using mangled-names, the "real" compiler-builtins might not have the | |
// necessary builtin (__unordtf2) to test whether `f128` is NaN. | |
// FIXME(f16_f128): Remove once the nightly toolchain has the __unordtf2 builtin | |
// x is NaN if all the bits of the exponent are set and the significand is non-0 | |
x.to_bits() & Self::EXP_MASK == Self::EXP_MASK | |
&& x.to_bits() & Self::SIG_MASK != Self::Int::ZERO | |
}; | |
if is_nan(self) && is_nan(rhs) { true } else { self.to_bits() == rhs.to_bits() } | |
if self.is_nan() && rhs.is_nan() { true } else { self.to_bits() == rhs.to_bits() } |
__unordtf2
is in nightly now, so may as well remove the workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh right, we could do this in builtins as well.
pub const ALL_LEN: usize = 240; | ||
|
||
/// All non-infinite non-NaN values of `f8` excluding `-0`. | ||
pub const ALL: [Self; Self::ALL_LEN] = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could a compile time for loop be used to generate this array instead of listing all the values out manually? Something like:
pub const ALL: [Self; Self::ALL_LEN] = {
let mut all = [Self(0); Self::ALL_LEN];
let mut i = 0;
let mut next = 0b1_1110_111;
while next >= 0b1_0000_000 {
all[i] = Self(next);
i += 1;
next -= 1;
}
let mut next = 0b0_0000_000;
while next <= 0b0_1110_111 {
all[i] = Self(next);
i += 1;
next += 1;
}
assert!(i == Self::ALL_LEN);
all
};
impl f8 { | ||
pub const ALL_LEN: usize = 240; | ||
|
||
/// All non-infinite non-NaN values of `f8` excluding `-0`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comment says the array excludes -0
, but the array includes -0
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I did change that at some point.
// comparison to get the correct result. (This assumes a twos- or ones- | ||
// complement integer representation; if integers are represented in a | ||
// sign-magnitude representation, then this flip is incorrect). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// comparison to get the correct result. (This assumes a twos- or ones- | |
// complement integer representation; if integers are represented in a | |
// sign-magnitude representation, then this flip is incorrect). | |
// comparison to get the correct result. |
Rust integers are always twos-complement.
const DEFINED: (Bound<T>, Bound<T>); | ||
|
||
/// The region, if any, for which the function repeats. Used to test within. | ||
const PERIODIC: Option<(Bound<T>, Bound<T>)> = None; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 for removing PERIODIC
.
for i in 0..f8::ALL_LEN { | ||
let v = f8::ALL[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i in 0..f8::ALL_LEN { | |
let v = f8::ALL[i]; | |
for (i, v) in f8::ALL.into_iter().enumerate() { |
for i in 0..f8::ALL_LEN { | ||
let v = f8::ALL[i]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i in 0..f8::ALL_LEN { | |
let v = f8::ALL[i]; | |
for (i, v) in f8::ALL.into_iter().enumerate() { |
for i in 0..f8::ALL_LEN { | ||
for j in 0..f8::ALL_LEN { | ||
let x = f8::ALL[i]; | ||
let y = f8::ALL[j]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for i in 0..f8::ALL_LEN { | |
for j in 0..f8::ALL_LEN { | |
let x = f8::ALL[i]; | |
let y = f8::ALL[j]; | |
for (i, x) in f8::ALL.into_iter().enumerate() { | |
for (j, y) in f8::ALL.into_iter().enumerate() { |
values.extend(count_down(F::MAX).take(AROUND)); | ||
|
||
// Check some special values that aren't included in the above ranges | ||
values.push(F::NAN); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe check +
/-
an sNaN and -F::NAN
here as well?
/// Number of values near an interesting point to check. | ||
const AROUND: usize = 100; | ||
|
||
/// Number of tests to run. | ||
const NTESTS: usize = { | ||
if cfg!(optimizations_enabled) { | ||
if crate::emulated() | ||
|| !cfg!(target_pointer_width = "64") | ||
|| cfg!(all(target_arch = "x86_64", target_vendor = "apple")) | ||
{ | ||
// Tests are pretty slow on non-64-bit targets, x86 MacOS, and targets that run | ||
// in QEMU. | ||
100_000 | ||
} else { | ||
5_000_000 | ||
} | ||
} else { | ||
// Without optimizations just run a quick check | ||
800 | ||
} | ||
}; | ||
|
||
/// Some functions have infinite asymptotes, limit how many we check. | ||
const MAX_ASYMPTOTES: usize = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like a reasonable plan to start with, especially checking the "interesting" cases first will make development easier. Ultimately the total number of test cases that get run is mainly a function of how much CI time we're willing to spend running them; it might be worth deciding on a rough estimate for that and then tuning the total test count to fit within it. Also, all the "magic" somewhat-arbitrary constants (AROUND
, NTESTS
, MAX_ASYMPTOTES
etc.) for how many of each type of test to run should probably be centralised in a single file somewhere to make it easier to keep track of what tests are being run.
Introduce a
Domain
trait that allows us to define what the interesting inputs are on a per-function basis. This trait is used ingen::domain
to create sequences of values that are either (1) around interesting points of this domain, or (2) logarithmically spaced within the domain.Compared to the random generators, this means that we don't waste time checking large quantities of different NaNs or out of bound inputs (e.g. negative numbers and NaNs take up more than half the float space, this would be wasted checking
sqrt
which is only defined for x >= 0). It also means we know that coverage is uniform across the entire domain.Currently only unary operations are supported.
This also includes a
f8
type that is just helpful for testing ULP ops since it is easily possible to list all values. I was going to remove this, but it turned out to be useful enough that I think I'll keep it around for future development.