Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce generators that respect function domains #348

Draft
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented Oct 29, 2024

Introduce a Domain trait that allows us to define what the interesting inputs are on a per-function basis. This trait is used in gen::domain to create sequences of values that are either (1) around interesting points of this domain, or (2) logarithmically spaced within the domain.

Compared to the random generators, this means that we don't waste time checking large quantities of different NaNs or out of bound inputs (e.g. negative numbers and NaNs take up more than half the float space, this would be wasted checking sqrt which is only defined for x >= 0). It also means we know that coverage is uniform across the entire domain.

Currently only unary operations are supported.

This also includes a f8 type that is just helpful for testing ULP ops since it is easily possible to list all values. I was going to remove this, but it turned out to be useful enough that I think I'll keep it around for future development.

@tgross35
Copy link
Contributor Author

tgross35 commented Nov 1, 2024

Rebased to build on #349

@tgross35 tgross35 force-pushed the function-domains branch 2 times, most recently from 107b4ab to a16e073 Compare November 2, 2024 06:33
@tgross35
Copy link
Contributor Author

tgross35 commented Nov 2, 2024

I think the current logic is pretty good. Still needs cleanup but these are the check points in debug mode:

sqrt-inputs
cbrt-inputs
cos-inputs

Release numbers take forever to generate the plots but it just shows more even coverage as expected

sqrt-inputs-log

@tgross35 tgross35 force-pushed the function-domains branch 7 times, most recently from dc1a1da to 385e850 Compare November 2, 2024 22:40
@tgross35 tgross35 changed the title Function domains Generators that respect function domains Nov 2, 2024
@tgross35
Copy link
Contributor Author

tgross35 commented Nov 2, 2024

@beetrees @quaternic you both suggested this at different points, would you mind reviewing this? I still have to wire up tests but I think the generator itself is good, see the above plots.

The interesting parts are crates/libm-test/src/domain.rs (domain definitions) crates/libm-test/src/gen/domain.rs (making use of the domain to generate test cases) and crates/libm-test/src/num.rs (operations for skipping up/down a fixed number of ULP).

const DEFINED: (Bound<T>, Bound<T>);

/// The region, if any, for which the function repeats. Used to test within.
const PERIODIC: Option<(Bound<T>, Bound<T>)> = None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not convinced periodicity will be that useful here, since not that many of the functions actually are periodic. Mainly sin, cos, tan, and fract I suppose. Out of those, only fract has a period (1.0) which is actually representable as a float. The functions periodic in π are problematic exactly because of that; the worst-case inputs are very large ones where the function just happens to land very close to zero. Unlike fract which is just exactly zero for numbers large enough.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I got this from another math library (I thought it was core-math maybe not since I can't find it now...) that suggested the heaviest testing within one period to get better coverage with the larger float types. But in retrospect this doesn't make a lot of sense to me given that -π..π already includes more than 50% of the float's representable values.

Do you think it is better to just drop any references to periodicity and instead test with Unbounded? Or is there still some advantage to heavier testing within this range, or keeping the information around for some future use.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just drop it, but it's not something I've thought about all that much. It could be useful to have the information somewhere (e.g. [0,2π] or [-π,π] might be very reasonable input ranges to have on some benchmarks) but it's mostly only relevant for sin/cos/tan.

Copy link
Contributor Author

@tgross35 tgross35 Nov 3, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it applies to a lot less than I expected it to when starting this. Okay, I'll remove it (but not immediately in case other refactoring comes up in review). Keeping check_points with multiples of π/2 seems useful to generate extra tests around there, considering I think the approximations tend to blow up at poles & zeros of $f^{(n)}(x)$.

image

Github thinks you're going to comment in the future? 🤔 edit: oh, must be related to DST in the US

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was wondering about that as well, good to see it's just a visual glitch. Yours is too 🤔 Apparently the US is turning clocks back at this time (daylight savings)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for removing PERIODIC.

Fix up traits

Wip up down

Tests pass!

Cleanup

Enable full test range

More tests

Code cleanup

More tests, all pass

Cleanup, passing

Add visualization scripts

Add visualization scripts

updates to script

clippy

clippy

clippy

Docs update

docs

docs
Comment on lines +21 to +44
/// Number of values near an interesting point to check.
const AROUND: usize = 100;

/// Number of tests to run.
const NTESTS: usize = {
if cfg!(optimizations_enabled) {
if crate::emulated()
|| !cfg!(target_pointer_width = "64")
|| cfg!(all(target_arch = "x86_64", target_vendor = "apple"))
{
// Tests are pretty slow on non-64-bit targets, x86 MacOS, and targets that run
// in QEMU.
100_000
} else {
5_000_000
}
} else {
// Without optimizations just run a quick check
800
}
};

/// Some functions have infinite asymptotes, limit how many we check.
const MAX_ASYMPTOTES: usize = 10;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I need to refactor this all a bit considering I just copied and pasted NTESTS from the random tests. Still brainstorming how to do this (input appreciated) but I'm thinking maybe something like:

  • Have N number of tests by default, which is 5M in the existing code (but probably needs to be reduced based on the below)
  • f32 unary functions run N random tests
  • f64 and f128 unary functions run N * 4 random tests
  • If the function takes two inputs, multiply the number of tests by 4. If it takes three inputs, multiply by 8.
  • If a domain test exists for the function, run N domain-based tests and reduce the number of random tests by a factor of 100
  • If an exhaustive test for f32 (not yet implemented) or high-iteration test (f64/f128 or multi-input functions, also not implemented) should be run, still run the "interesting points" portion of domain-based tests but replace the logspace tests with whatever fits. I should probably split this into two generators rather than chaining...

Basically in all cases, either the exhaustive tests or the logspace are going to consume the bulk of time. But we still want to run "interesting points" and the random tests in hopes that they will find errors earlier than waiting for the whole exhaustive check to run. Also random tests should cover some signaling NaNs.

Copy link
Contributor

@beetrees beetrees Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a reasonable plan to start with, especially checking the "interesting" cases first will make development easier. Ultimately the total number of test cases that get run is mainly a function of how much CI time we're willing to spend running them; it might be worth deciding on a rough estimate for that and then tuning the total test count to fit within it. Also, all the "magic" somewhat-arbitrary constants (AROUND, NTESTS, MAX_ASYMPTOTES etc.) for how many of each type of test to run should probably be centralised in a single file somewhere to make it easier to keep track of what tests are being run.

@tgross35 tgross35 changed the title Generators that respect function domains Introduce generators that respect function domains Nov 3, 2024
Comment on lines +82 to +92
let is_nan = |x: Self| -> bool {
// }
// fn is_nan(x: Self) -> bool {
// When using mangled-names, the "real" compiler-builtins might not have the
// necessary builtin (__unordtf2) to test whether `f128` is NaN.
// FIXME(f16_f128): Remove once the nightly toolchain has the __unordtf2 builtin
// x is NaN if all the bits of the exponent are set and the significand is non-0
x.to_bits() & Self::EXP_MASK == Self::EXP_MASK
&& x.to_bits() & Self::SIG_MASK != Self::Int::ZERO
};
if is_nan(self) && is_nan(rhs) { true } else { self.to_bits() == rhs.to_bits() }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
let is_nan = |x: Self| -> bool {
// }
// fn is_nan(x: Self) -> bool {
// When using mangled-names, the "real" compiler-builtins might not have the
// necessary builtin (__unordtf2) to test whether `f128` is NaN.
// FIXME(f16_f128): Remove once the nightly toolchain has the __unordtf2 builtin
// x is NaN if all the bits of the exponent are set and the significand is non-0
x.to_bits() & Self::EXP_MASK == Self::EXP_MASK
&& x.to_bits() & Self::SIG_MASK != Self::Int::ZERO
};
if is_nan(self) && is_nan(rhs) { true } else { self.to_bits() == rhs.to_bits() }
if self.is_nan() && rhs.is_nan() { true } else { self.to_bits() == rhs.to_bits() }

__unordtf2 is in nightly now, so may as well remove the workaround.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh right, we could do this in builtins as well.

pub const ALL_LEN: usize = 240;

/// All non-infinite non-NaN values of `f8` excluding `-0`.
pub const ALL: [Self; Self::ALL_LEN] = [
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could a compile time for loop be used to generate this array instead of listing all the values out manually? Something like:

    pub const ALL: [Self; Self::ALL_LEN] = {
        let mut all = [Self(0); Self::ALL_LEN];
        let mut i = 0;
        let mut next = 0b1_1110_111;
        while next >= 0b1_0000_000 {
            all[i] = Self(next);
            i += 1;
            next -= 1;
        }
        let mut next = 0b0_0000_000;
        while next <= 0b0_1110_111 {
            all[i] = Self(next);
            i += 1;
            next += 1;
        }
        assert!(i == Self::ALL_LEN);
        all
    };

impl f8 {
pub const ALL_LEN: usize = 240;

/// All non-infinite non-NaN values of `f8` excluding `-0`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment says the array excludes -0, but the array includes -0.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I did change that at some point.

Comment on lines +453 to +455
// comparison to get the correct result. (This assumes a twos- or ones-
// complement integer representation; if integers are represented in a
// sign-magnitude representation, then this flip is incorrect).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// comparison to get the correct result. (This assumes a twos- or ones-
// complement integer representation; if integers are represented in a
// sign-magnitude representation, then this flip is incorrect).
// comparison to get the correct result.

Rust integers are always twos-complement.

const DEFINED: (Bound<T>, Bound<T>);

/// The region, if any, for which the function repeats. Used to test within.
const PERIODIC: Option<(Bound<T>, Bound<T>)> = None;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 for removing PERIODIC.

Comment on lines +168 to +169
for i in 0..f8::ALL_LEN {
let v = f8::ALL[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i in 0..f8::ALL_LEN {
let v = f8::ALL[i];
for (i, v) in f8::ALL.into_iter().enumerate() {

Comment on lines +256 to +257
for i in 0..f8::ALL_LEN {
let v = f8::ALL[i];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i in 0..f8::ALL_LEN {
let v = f8::ALL[i];
for (i, v) in f8::ALL.into_iter().enumerate() {

Comment on lines +305 to +308
for i in 0..f8::ALL_LEN {
for j in 0..f8::ALL_LEN {
let x = f8::ALL[i];
let y = f8::ALL[j];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
for i in 0..f8::ALL_LEN {
for j in 0..f8::ALL_LEN {
let x = f8::ALL[i];
let y = f8::ALL[j];
for (i, x) in f8::ALL.into_iter().enumerate() {
for (j, y) in f8::ALL.into_iter().enumerate() {

values.extend(count_down(F::MAX).take(AROUND));

// Check some special values that aren't included in the above ranges
values.push(F::NAN);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check +/- an sNaN and -F::NAN here as well?

Comment on lines +21 to +44
/// Number of values near an interesting point to check.
const AROUND: usize = 100;

/// Number of tests to run.
const NTESTS: usize = {
if cfg!(optimizations_enabled) {
if crate::emulated()
|| !cfg!(target_pointer_width = "64")
|| cfg!(all(target_arch = "x86_64", target_vendor = "apple"))
{
// Tests are pretty slow on non-64-bit targets, x86 MacOS, and targets that run
// in QEMU.
100_000
} else {
5_000_000
}
} else {
// Without optimizations just run a quick check
800
}
};

/// Some functions have infinite asymptotes, limit how many we check.
const MAX_ASYMPTOTES: usize = 10;
Copy link
Contributor

@beetrees beetrees Nov 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds like a reasonable plan to start with, especially checking the "interesting" cases first will make development easier. Ultimately the total number of test cases that get run is mainly a function of how much CI time we're willing to spend running them; it might be worth deciding on a rough estimate for that and then tuning the total test count to fit within it. Also, all the "magic" somewhat-arbitrary constants (AROUND, NTESTS, MAX_ASYMPTOTES etc.) for how many of each type of test to run should probably be centralised in a single file somewhere to make it easier to keep track of what tests are being run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants