-
Notifications
You must be signed in to change notification settings - Fork 12k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc][math] Implement double precision sin correctly rounded to all rounding modes. #95736
Conversation
@llvm/pr-subscribers-libc Author: None (lntue) Changes
Patch is 65.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95736.diff 18 Files Affected:
diff --git a/libc/config/darwin/arm/entrypoints.txt b/libc/config/darwin/arm/entrypoints.txt
index e1303265b9ac4..d486916a542d6 100644
--- a/libc/config/darwin/arm/entrypoints.txt
+++ b/libc/config/darwin/arm/entrypoints.txt
@@ -226,6 +226,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.scalbnl
libc.src.math.sincosf
libc.src.math.sinhf
+ libc.src.math.sin
libc.src.math.sinf
libc.src.math.sqrt
libc.src.math.sqrtf
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index dfed6acbdf257..07d6583c0a816 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -481,6 +481,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.scalbnl
libc.src.math.sincosf
libc.src.math.sinhf
+ libc.src.math.sin
libc.src.math.sinf
libc.src.math.sqrt
libc.src.math.sqrtf
diff --git a/libc/config/linux/arm/entrypoints.txt b/libc/config/linux/arm/entrypoints.txt
index d4f932416bd9f..5716b777c8c96 100644
--- a/libc/config/linux/arm/entrypoints.txt
+++ b/libc/config/linux/arm/entrypoints.txt
@@ -345,6 +345,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.scalbnf
libc.src.math.scalbnl
libc.src.math.sincosf
+ libc.src.math.sin
libc.src.math.sinf
libc.src.math.sinhf
libc.src.math.sqrt
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index e12d6b3957e51..18968f5b07b59 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -489,6 +489,7 @@ set(TARGET_LIBM_ENTRYPOINTS
libc.src.math.scalbnl
libc.src.math.sincosf
libc.src.math.sinhf
+ libc.src.math.sin
libc.src.math.sinf
libc.src.math.sqrt
libc.src.math.sqrtf
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index 293edd1c15100..48867d7012b51 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
@@ -314,7 +314,7 @@ Higher Math Functions
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
| rsqrt | | | | | | 7.12.7.9 | F.10.4.9 |
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| sin | |check| | large | | | | 7.12.4.6 | F.10.1.6 |
+| sin | |check| | |check| | | | | 7.12.4.6 | F.10.1.6 |
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
| sincos | |check| | large | | | | | |
+-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
diff --git a/libc/src/__support/FPUtil/double_double.h b/libc/src/__support/FPUtil/double_double.h
index b9490b52f6b41..a94902b7ec7ae 100644
--- a/libc/src/__support/FPUtil/double_double.h
+++ b/libc/src/__support/FPUtil/double_double.h
@@ -44,7 +44,12 @@ LIBC_INLINE constexpr DoubleDouble add(const DoubleDouble &a, double b) {
return exact_add(r.hi, r.lo + a.lo);
}
-// Velkamp's Splitting for double precision.
+// Veltkamp's Splitting for double precision.
+// Note: This is the original version of Veltkamp's Splitting, which is only
+// correct for round-to-nearest mode. See:
+// Graillat, S., Lafevre, V., and Muller, J.-M., "Alternative Split Functions
+// and Dekker's Product," ARITH'2020.
+// http://arith2020.arithsymposium.org/resources/paper_31.pdf
LIBC_INLINE constexpr DoubleDouble split(double a) {
DoubleDouble r{0.0, 0.0};
// Splitting constant = 2^ceil(prec(double)/2) + 1 = 2^27 + 1.
@@ -56,6 +61,9 @@ LIBC_INLINE constexpr DoubleDouble split(double a) {
return r;
}
+// Note: When FMA instruction is not available, the `exact_mult` function relies
+// on Veltkamp's Splitting algorithm, and is only correct for round-to-nearest
+// mode.
LIBC_INLINE DoubleDouble exact_mult(double a, double b) {
DoubleDouble r{0.0, 0.0};
diff --git a/libc/src/__support/FPUtil/dyadic_float.h b/libc/src/__support/FPUtil/dyadic_float.h
index 12a69228d36c7..9e25cc6486011 100644
--- a/libc/src/__support/FPUtil/dyadic_float.h
+++ b/libc/src/__support/FPUtil/dyadic_float.h
@@ -270,11 +270,11 @@ LIBC_INLINE constexpr DyadicFloat<Bits> quick_add(DyadicFloat<Bits> a,
// don't need to normalize the inputs again in this function. If the inputs are
// not normalized, the results might lose precision significantly.
template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> quick_mul(DyadicFloat<Bits> a,
- DyadicFloat<Bits> b) {
+LIBC_INLINE constexpr DyadicFloat<Bits> quick_mul(const DyadicFloat<Bits> &a,
+ const DyadicFloat<Bits> &b) {
DyadicFloat<Bits> result;
result.sign = (a.sign != b.sign) ? Sign::NEG : Sign::POS;
- result.exponent = a.exponent + b.exponent + int(Bits);
+ result.exponent = a.exponent + b.exponent + static_cast<int>(Bits);
if (!(a.mantissa.is_zero() || b.mantissa.is_zero())) {
result.mantissa = a.mantissa.quick_mul_hi(b.mantissa);
@@ -301,7 +301,7 @@ multiply_add(const DyadicFloat<Bits> &a, const DyadicFloat<Bits> &b,
// Simple exponentiation implementation for printf. Only handles positive
// exponents, since division isn't implemented.
template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(DyadicFloat<Bits> a,
+LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(const DyadicFloat<Bits> &a,
uint32_t power) {
DyadicFloat<Bits> result = 1.0;
DyadicFloat<Bits> cur_power = a;
@@ -317,7 +317,7 @@ LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(DyadicFloat<Bits> a,
}
template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> mul_pow_2(DyadicFloat<Bits> a,
+LIBC_INLINE constexpr DyadicFloat<Bits> mul_pow_2(const DyadicFloat<Bits> &a,
int32_t pow_2) {
DyadicFloat<Bits> result = a;
result.exponent += pow_2;
diff --git a/libc/src/__support/macros/optimization.h b/libc/src/__support/macros/optimization.h
index 59886ca44be12..05a47791deed8 100644
--- a/libc/src/__support/macros/optimization.h
+++ b/libc/src/__support/macros/optimization.h
@@ -33,4 +33,18 @@ LIBC_INLINE constexpr bool expects_bool_condition(T value, T expected) {
#error "Unhandled compiler"
#endif
+// Defining optimization options for math functions.
+// TODO: Exporting this to public generated headers?
+#define LIBC_MATH_SKIP_ACCURATE_PASS 0x01
+#define LIBC_MATH_SMALL_TABLES 0x02
+#define LIBC_MATH_NO_ERRNO 0x04
+#define LIBC_MATH_NO_EXCEPT 0x08
+#define LIBC_MATH_FAST \
+ (LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES | \
+ LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)
+
+#ifndef LIBC_MATH
+#define LIBC_MATH 0
+#endif // LIBC_MATH
+
#endif // LLVM_LIBC_SRC___SUPPORT_MACROS_OPTIMIZATION_H
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index aa0069d821d0d..2e33c37dd90f0 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -135,6 +135,22 @@ add_header_library(
libc.src.__support.common
)
+add_header_library(
+ range_reduction_double
+ HDRS
+ range_reduction_double.h
+ range_reduction_double_fma.h
+ DEPENDS
+ libc.src.__support.FPUtil.double_double
+ libc.src.__support.FPUtil.dyadic_float
+ libc.src.__support.FPUtil.fp_bits
+ libc.src.__support.FPUtil.fma
+ libc.src.__support.FPUtil.multiply_add
+ libc.src.__support.FPUtil.nearest_integer
+ libc.src.__support.common
+ libc.src.__support.integer_literals
+)
+
add_header_library(
sincosf_utils
HDRS
@@ -146,6 +162,15 @@ add_header_library(
libc.src.__support.common
)
+add_header_library(
+ sincos_eval
+ HDRS
+ sincos_eval.h
+ DEPENDS
+ libc.src.__support.FPUtil.double_double
+ libc.src.__support.FPUtil.multiply_add
+)
+
add_entrypoint_object(
cosf
SRCS
@@ -167,6 +192,29 @@ add_entrypoint_object(
-O3
)
+add_entrypoint_object(
+ sin
+ SRCS
+ sin.cpp
+ HDRS
+ ../sin.h
+ DEPENDS
+ libc.hdr.errno_macros
+ libc.src.errno.errno
+ libc.src.__support.FPUtil.double_double
+ libc.src.__support.FPUtil.dyadic_float
+ libc.src.__support.FPUtil.fenv_impl
+ libc.src.__support.FPUtil.fp_bits
+ libc.src.__support.FPUtil.fma
+ libc.src.__support.FPUtil.multiply_add
+ libc.src.__support.FPUtil.nearest_integer
+ libc.src.__support.FPUtil.polyeval
+ libc.src.__support.FPUtil.rounding_mode
+ libc.src.__support.macros.optimization
+ COMPILE_OPTIONS
+ -O3
+)
+
add_entrypoint_object(
sinf
SRCS
diff --git a/libc/src/math/generic/range_reduction_double.h b/libc/src/math/generic/range_reduction_double.h
new file mode 100644
index 0000000000000..3cd76d722da1a
--- /dev/null
+++ b/libc/src/math/generic/range_reduction_double.h
@@ -0,0 +1,67 @@
+//===-- Range reduction for double precision sin/cos/tan --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
+#define LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
+
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+using fputil::DoubleDouble;
+
+LIBC_INLINE constexpr int FAST_PASS_EXPONENT = 23;
+
+// Digits of pi/128, generated by Sollya with:
+// > a = round(pi/128, D, RN);
+// > b = round(pi/128 - a, D, RN);
+LIBC_INLINE constexpr DoubleDouble PI_OVER_128 = {0x1.1a62633145c07p-60,
+ 0x1.921fb54442d18p-6};
+
+// Digits of -pi/128, generated by Sollya with:
+// > a = round(pi/128, 25, RN);
+// > b = round(pi/128 - a, 23, RN);
+// > c = round(pi/128 - a - b, 25, RN);
+// > d = round(pi/128 - a - b - c, D, RN);
+// The precisions of the parts are chosen so that:
+// 1) k * a, k * b, k * c are exact in double precision
+// 2) k * b + fractional part of (k * a) is exact in double precsion
+LIBC_INLINE constexpr double MPI_OVER_128[4] = {
+ -0x1.921fb5p-6, -0x1.110b48p-32, +0x1.ee59dap-56, -0x1.98a2e03707345p-83};
+
+LIBC_INLINE constexpr double ONE_TWENTY_EIGHT_OVER_PI_D = 0x1.45f306dc9c883p5;
+
+namespace generic {
+
+LIBC_INLINE int range_reduction_small(double x, DoubleDouble &u) {
+ double prod_hi = x * ONE_TWENTY_EIGHT_OVER_PI_D;
+ double kd = fputil::nearest_integer(prod_hi);
+ int k = static_cast<int>(kd);
+
+ // x - k * (pi/128)
+ double c = fputil::multiply_add(kd, MPI_OVER_128[0], x); // Exact
+ double y_hi = fputil::multiply_add(kd, MPI_OVER_128[1], c); // Exact
+ double y_lo = fputil::multiply_add(kd, MPI_OVER_128[2], kd * MPI_OVER_128[3]);
+ u = fputil::exact_add(y_hi, y_lo);
+
+ return k;
+}
+
+// TODO: Implement generic's range_reduction_large correctly rounded for all
+// rounding modes. The current fma's range_reduction_large only works for
+// round-to-nearest without FMA instruction.
+
+} // namespace generic
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
diff --git a/libc/src/math/generic/range_reduction_double_fma.h b/libc/src/math/generic/range_reduction_double_fma.h
new file mode 100644
index 0000000000000..dafcefe4ef72e
--- /dev/null
+++ b/libc/src/math/generic/range_reduction_double_fma.h
@@ -0,0 +1,334 @@
+//===-- Range reduction for double precision sin/cos/tan w/ FMA -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_FMA_H
+#define LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_FMA_H
+
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/dyadic_float.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/common.h"
+#include "src/__support/integer_literals.h"
+
+namespace LIBC_NAMESPACE {
+
+namespace fma {
+
+using fputil::DoubleDouble;
+using Float128 = fputil::DyadicFloat<128>;
+
+LIBC_INLINE constexpr int FAST_PASS_EXPONENT = 32;
+
+// Digits of pi/128, generated by Sollya with:
+// > a = round(pi/128, D, RN);
+// > b = round(pi/128 - a, D, RN);
+LIBC_INLINE constexpr DoubleDouble PI_OVER_128 = {0x1.1a62633145c07p-60,
+ 0x1.921fb54442d18p-6};
+LIBC_INLINE constexpr Float128 PI_OVER_128_F128 = {
+ Sign::POS, -133, 0xc90f'daa2'2168'c234'c4c6'628b'80dc'1cd1_u128};
+
+// Digits of 2^(16*i) / pi, generated by Sollya with:
+// For [2..62]:
+// > for i from 3 to 63 do {
+// pi_inv = 2^(16*(i - 3)) / pi;
+// pn = nearestint(pi_inv);
+// pi_frac = pi_inv - pn;
+// a = round(pi_frac, D, RN);
+// b = round(pi_frac - a, D, RN);
+// c = round(pi_frac - a - b, D, RN);
+// d = round(pi_frac - a - b - c, D, RN);
+// print("{", 2^7 * a, ",", 2^7 * b, ",", 2^7 * c, ",", 2^7 * d, "},");
+// };
+// For [0..1]:
+// The leading bit of 2^(16*(i - 3)) / pi is very small, so we add 0.25 so that
+// the conditions for the algorithms are still satisfied, and one of those
+// conditions guarantees that ulp(0.25 * x_reduced) >= 2, and will safely be
+// discarded.
+// > for i from 0 to 2 do {
+// pi_frac = 0.25 + 2^(16*(i - 3)) / pi;
+// a = round(pi_frac, D, RN);
+// b = round(pi_frac - a, D, RN);
+// c = round(pi_frac - a - b, D, RN);
+// d = round(pi_frac - a - b - c, D, RN);
+// print("{", 2^7 * a, ",", 2^7 * b, ",", 2^7 * c, ",", 2^7 * d, "},");
+// };
+// For The fast pass using double-double, we only need 3 parts (a, b, c), but
+// for the accurate pass using Float128, instead of using another table of
+// Float128s, we simply add the fourth path (a, b, c, d), which simplify the
+// implementation a bit and saving some memory.
+LIBC_INLINE constexpr double ONE_TWENTY_EIGHT_OVER_PI[64][4] = {
+ {0x1.0000000000014p5, 0x1.7cc1b727220a9p-49, 0x1.3f84eafa3ea6ap-103,
+ -0x1.11f924eb53362p-157},
+ {0x1.0000000145f3p5, 0x1.b727220a94fe1p-49, 0x1.d5f47d4d37703p-104,
+ 0x1.b6295993c439p-158},
+ {0x1.000145f306dcap5, -0x1.bbead603d8a83p-50, 0x1.f534ddc0db629p-106,
+ 0x1.664f10e4107f9p-160},
+ {0x1.45f306dc9c883p5, -0x1.6b01ec5417056p-49, -0x1.6447e493ad4cep-103,
+ 0x1.e21c820ff28b2p-157},
+ {-0x1.f246c6efab581p4, 0x1.3abe8fa9a6eep-53, 0x1.b6c52b3278872p-107,
+ 0x1.07f9458eaf7afp-164},
+ {0x1.391054a7f09d6p4, -0x1.70565911f924fp-53, 0x1.2b3278872084p-107,
+ -0x1.ae9c5421443aap-162},
+ {0x1.529fc2757d1f5p2, 0x1.a6ee06db14acdp-53, -0x1.8778df7c035d4p-107,
+ 0x1.d5ef5de2b0db9p-161},
+ {-0x1.ec54170565912p-1, 0x1.b6c52b3278872p-59, 0x1.07f9458eaf7afp-116,
+ -0x1.d4f246dc8e2dfp-173},
+ {-0x1.505c1596447e5p5, 0x1.b14acc9e21c82p-49, 0x1.fe5163abdebbcp-106,
+ 0x1.586dc91b8e909p-160},
+ {-0x1.596447e493ad5p1, 0x1.93c439041fe51p-54, 0x1.8eaf7aef1586ep-108,
+ -0x1.b7238b7b645a4p-163},
+ {0x1.bb81b6c52b328p5, -0x1.de37df00d74e3p-49, 0x1.7bd778ac36e49p-103,
+ -0x1.1c5bdb22d1ffap-158},
+ {0x1.b6c52b3278872p5, 0x1.07f9458eaf7afp-52, -0x1.d4f246dc8e2dfp-109,
+ 0x1.374b801924bbbp-164},
+ {0x1.2b3278872084p5, -0x1.ae9c5421443aap-50, 0x1.b7246e3a424ddp-106,
+ 0x1.700324977504fp-161},
+ {-0x1.8778df7c035d4p5, 0x1.d5ef5de2b0db9p-49, 0x1.1b8e909374b8p-104,
+ 0x1.924bba8274648p-160},
+ {-0x1.bef806ba71508p4, -0x1.443a9e48db91cp-50, -0x1.6f6c8b47fe6dbp-104,
+ -0x1.115f62e6de302p-158},
+ {-0x1.ae9c5421443aap-2, 0x1.b7246e3a424ddp-58, 0x1.700324977504fp-113,
+ -0x1.cdbc603c429c7p-167},
+ {-0x1.38a84288753c9p5, -0x1.b7238b7b645a4p-51, 0x1.924bba8274648p-112,
+ 0x1.cfe1deb1cb12ap-166},
+ {-0x1.0a21d4f246dc9p3, 0x1.d2126e9700325p-53, -0x1.a22bec5cdbc6p-107,
+ -0x1.e214e34ed658cp-162},
+ {-0x1.d4f246dc8e2dfp3, 0x1.374b801924bbbp-52, -0x1.f62e6de301e21p-106,
+ -0x1.38d3b5963045ep-160},
+ {-0x1.236e4716f6c8bp4, -0x1.1ff9b6d115f63p-50, 0x1.921cfe1deb1cbp-106,
+ 0x1.29a73ee88235fp-162},
+ {0x1.b8e909374b802p4, -0x1.b6d115f62e6dep-50, -0x1.80f10a71a76b3p-105,
+ 0x1.cfba208d7d4bbp-160},
+ {0x1.09374b801924cp4, -0x1.15f62e6de301ep-50, -0x1.0a71a76b2c609p-105,
+ 0x1.1046bea5d7689p-159},
+ {-0x1.68ffcdb688afbp3, -0x1.736f180f10a72p-53, 0x1.62534e7dd1047p-107,
+ -0x1.0568a25dbd8b3p-161},
+ {0x1.924bba8274648p0, 0x1.cfe1deb1cb12ap-54, -0x1.63045df7282b4p-108,
+ -0x1.44bb7b16638fep-162},
+ {-0x1.a22bec5cdbc6p5, -0x1.e214e34ed658cp-50, -0x1.177dca0ad144cp-106,
+ 0x1.213a671c09ad1p-160},
+ {0x1.3a32439fc3bd6p1, 0x1.cb129a73ee882p-54, 0x1.afa975da24275p-109,
+ -0x1.8e3f652e8207p-164},
+ {-0x1.b78c0788538d4p4, 0x1.29a73ee88235fp-50, 0x1.4baed1213a672p-104,
+ -0x1.fb29741037d8dp-159},
+ {0x1.fc3bd63962535p5, -0x1.822efb9415a29p-51, 0x1.a24274ce38136p-105,
+ -0x1.741037d8cdc54p-159},
+ {-0x1.4e34ed658c117p2, -0x1.f7282b4512edfp-52, 0x1.d338e04d68bfp-107,
+ -0x1.bec66e29c67cbp-162},
+ {0x1.62534e7dd1047p5, -0x1.0568a25dbd8b3p-49, -0x1.c7eca5d040df6p-105,
+ -0x1.9b8a719f2b318p-160},
+ {-0x1.63045df7282b4p4, -0x1.44bb7b16638fep-50, 0x1.ad17df904e647p-104,
+ 0x1.639835339f49dp-158},
+ {0x1.d1046bea5d769p5, -0x1.bd8b31c7eca5dp-49, -0x1.037d8cdc538dp-107,
+ 0x1.a99cfa4e422fcp-161},
+ {0x1.afa975da24275p3, -0x1.8e3f652e8207p-52, 0x1.3991d63983534p-106,
+ -0x1.82d8dee81d108p-160},
+ {-0x1.a28976f62cc72p5, 0x1.35a2fbf209cc9p-53, -0x1.4e33e566305b2p-109,
+ 0x1.08bf177bf2507p-163},
+ {-0x1.76f62cc71fb29p5, -0x1.d040df633714ep-49, -0x1.9f2b3182d8defp-104,
+ 0x1.f8bbdf9283b2p-158},
+ {0x1.d338e04d68bfp5, -0x1.bec66e29c67cbp-50, 0x1.9cfa4e422fc5ep-105,
+ -0x1.036be27003b4p-161},
+ {0x1.c09ad17df904ep4, 0x1.91d639835339fp-50, 0x1.272117e2ef7e5p-104,
+ -0x1.7c4e007680022p-158},
+ {0x1.68befc827323bp5, -0x1.c67cacc60b638p-50, 0x1.17e2ef7e4a0ecp-104,
+ 0x1.ff897ffde0598p-158},
+ {-0x1.037d8cdc538dp5, 0x1.a99cfa4e422fcp-49, 0x1.77bf250763ff1p-103,
+ 0x1.7ffde05980fefp-158},
+ {-0x1.8cdc538cf9599p5, 0x1.f49c845f8bbep-50, -0x1.b5f13801da001p-104,
+ 0x1.e05980fef2f12p-158},
+ {-0x1.4e33e566305b2p3, 0x1.08bf177bf2507p-51, 0x1.8ffc4bffef02dp-105,
+ -0x1.fc04343b9d298p-160},
+ {-0x1.f2b3182d8dee8p4, -0x1.d1081b5f13802p-52, 0x1.2fffbc0b301fep-107,
+ -0x1.a1dce94beb25cp-163},
+ {-0x1.8c16c6f740e88p5, -0x1.036be27003b4p-49, -0x1.0fd33f8086877p-109,
+ -0x1.d297d64b824b2p-164},
+ {0x1.3908bf177bf25p5, 0x1.d8ffc4bffef03p-53, -0x1.9fc04343b9d29p-108,
+ -0x1.f592e092c9813p-162},
+ {0x1.7e2ef7e4a0ec8p4, -0x1.da00087e99fcp-56, -0x1.0d0ee74a5f593p-110,
+ 0x1.f6d367ecf27cbp-166},
+ {-0x1.081b5f13801dap4, -0x1.0fd33f8086877p-61, -0x1.d297d64b824b2p-116,
+ -0x1.8130d834f648bp-170},
+ {-0x1.af89c00ed0004p5, -0x1.fa67f010d0ee7p-50, -0x1.297d64b824b26p-104,
+ -0x1.30d834f648b0cp-162},
+ {-0x1.c00ed00043f4dp5, 0x1.fde5e2316b415p-55, -0x1.2e092c98130d8p-110,
+ -0x1.a7b24585ce...
[truncated]
|
Performance tests with the CORE-MATH project for FMA version:
Latency:
Latency
|
it seems the inexact exception is not handled:
|
On a Intel(R) Xeon(R) Silver 4214 I get:
|
How does this perform on targets without 128-bit float support? Guessing those skip the final accurate pass? |
The 128-bit precision floating point type that we use for accurate pass is not quad precision, and it is emulated using |
…ystem malloc in overlay mode. (#95845) This causes an issue in overlay mode: #95736 (comment)
@zimmermann6 : Do you mind sync to the latest update and check again? It was caused by some unrelated changes. Thanks! |
unless I did something wrong, I still get missing inexact exceptions:
|
Can you apply #95845 to your repo and rerun the cmake + ninja commands? |
thanks to the help of Tue, now all tests do pass (including the check of the inexact exception) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not finished reviewing, but I'm in and out of meetings all day. Posting what I have for now.
Thanks @zimmermann6 for letting me know about the recent results at: https://inria.hal.science/hal-04480440. With the insight from that paper, I was able to make non-FMA targets also correctly rounded to all rounding modes. So I rearrange the range reduction headers, with |
I still have a question. Since the full set of hard-to-round cases for sin is not known, in CORE-MATH we issue an error message when we are not able to guarantee correct rounding. How do you deal with this issue? |
@zimmermann6 : right now we will probably rely on fuzz testing for it, comparing against MPFR outputs. @nickdesaulniers : shall we add accuracy assertion under |
Yes, I'm a big fan of |
I created an issue (#96452) and added a TODO to provide an exact error bound for the accurate pass and add assertion. |
@@ -36,7 +36,8 @@ TEST_F(LlvmLibcSinTest, TrickyInputs) { | |||
0x1.e639103a05997p+2, 0x1.13114266f9764p+4, -0x1.3eec5912ea7cdp+331, | |||
0x1.08087e9aad90bp+887, 0x1.2b5fe88a9d8d5p+903, -0x1.a880417b7b119p+1023, | |||
-0x1.6deb37da81129p+205, 0x1.08087e9aad90bp+887, 0x1.f6d7518808571p+1023, | |||
-0x1.8bb5847d49973p+845, 0x1.f08b14e1c4d0fp+890, | |||
-0x1.8bb5847d49973p+845, 0x1.f08b14e1c4d0fp+890, 0x1.6ac5b262ca1ffp+849, | |||
0x1.e0000000001c2p-20, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's up with the two additional constants?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Two extra worst cases that popped up with the CORE-MATH test suites when I was playing around with the error bounds for fast passes for non-FMA targets.
…recision sincos correctly rounded to all rounding modes. (#96719) Sharing the same algorithm as double precision sin: llvm/llvm-project#95736 and cos: llvm/llvm-project#96591 GitOrigin-RevId: 8eda5e8754fda4422617d069315bd0ce34d99b0d Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993 Roller-URL: https://ci.chromium.org/b/8743956522971831761 CQ-Do-Not-Cancel-Tryjobs: true Change-Id: Ic8e99f78feabb9deb3ac49fc04608cfadfc16a66 Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1073452
… sincos correctly rounded to all rounding modes. (#96719) Sharing the same algorithm as double precision sin: llvm/llvm-project#95736 and cos: llvm/llvm-project#96591 GitOrigin-RevId: 8eda5e8754fda4422617d069315bd0ce34d99b0d Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993 Change-Id: Ie2bd882ad9952a0969feb84edcf1daffab6901b4
…] Implement double precision sincos correctly rounded to all rounding modes. (#96719) Sharing the same algorithm as double precision sin: llvm/llvm-project#95736 and cos: llvm/llvm-project#96591 GitOrigin-RevId: 6e644e80df2ddd3b6292ca7f77c48922a7848a1c Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993 Roller-URL: https://ci.chromium.org/b/8743956522971831761 CQ-Do-Not-Cancel-Tryjobs: true Original-Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1073452 Original-Revision: 660fa89a50c932fbfcacafdc59eab859631e4390 Change-Id: Iba8f96b2ab385de0dc2388226870f9c5af969b2b
…ll rounding modes. (llvm#96719) Sharing the same algorithm as double precision sin: llvm#95736 and cos: llvm#96591
…rounding modes. (llvm#95736) - Algorithm: - Step 1 - Range reduction: for a double precision input `x`, return `k` and `u` such that - k is an integer - u = x - k * pi / 128, and |u| < pi/256 - Step 2 - Calculate `sin(u)` and `cos(u)` in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA. - Step 3 - Calculate `sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u)` using look-up table for `sin(k*pi/128)` and `cos(k*pi/128)`. - Step 4 - Use Ziv's rounding test to decide if the result is correctly rounded. - Step 4' - If the Ziv's rounding test failed, redo step 1-3 using 128-bit precision. - Currently, without FMA instructions, the large range reduction only works correctly for the default rounding mode (FE_TONEAREST). - Provide `LIBC_MATH` flag so that users can set `LIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS` to build the `sin` function without step 4 and 4'.
…rounding modes. (llvm#96591) Sharing the same algorithm as double precision sin: llvm#95736
…ll rounding modes. (llvm#96719) Sharing the same algorithm as double precision sin: llvm#95736 and cos: llvm#96591
x
, returnk
andu
such thatsin(u)
andcos(u)
in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u)
using look-up table forsin(k*pi/128)
andcos(k*pi/128)
.LIBC_MATH
flag so that users can setLIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS
to build thesin
function without step 4 and 4'.