[libc][math] Implement double precision sin correctly rounded to all rounding modes. #95736

lntue · 2024-06-17T05:03:15Z

Algorithm:
- Step 1 - Range reduction: for a double precision input x, return k and u such that
  - k is an integer
  - u = x - k * pi / 128, and |u| < pi/256
- Step 2 - Calculate sin(u) and cos(u) in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
- Step 3 - Calculate sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u) using look-up table for sin(k*pi/128) and cos(k*pi/128).
- Step 4 - Use Ziv's rounding test to decide if the result is correctly rounded.
- Step 4' - If the Ziv's rounding test failed, redo step 1-3 using 128-bit precision.
Currently, without FMA instructions, the large range reduction only works correctly for the default rounding mode (FE_TONEAREST).
Provide LIBC_MATH flag so that users can set LIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS to build the sin function without step 4 and 4'.

lntue · 2024-06-17T05:03:49Z

@zimmermann6

llvmbot · 2024-06-17T05:03:55Z

@llvm/pr-subscribers-libc

Author: None (lntue)

Changes

Algorithm:
- Step 1 - Range reduction: for a double precision input x, return k and u such that
  - k is an integer
  - u = x - k * pi / 128, and |u| < pi/256
- Step 2 - Calculate sin(u) and cos(u) in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
- Step 3 - Calculate sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u) using look-up table for sin(k*pi/128) and cos(k*pi/128).
- Step 4 - Use Ziv's rounding test to decide if the result is correctly rounded.
- Step 4' - If the Ziv's rounding test failed, redo step 1-3 using 128-bit precision.
Currently, without FMA instructions, the large range reduction only works correctly for the default rounding mode (FE_TONEAREST).
Provide LIBC_MATH flag so that users can set LIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS to build the sin function without step 4 and 4'.

Patch is 65.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95736.diff

18 Files Affected:

(modified) libc/config/darwin/arm/entrypoints.txt (+1)
(modified) libc/config/linux/aarch64/entrypoints.txt (+1)
(modified) libc/config/linux/arm/entrypoints.txt (+1)
(modified) libc/config/linux/riscv/entrypoints.txt (+1)
(modified) libc/docs/math/index.rst (+1-1)
(modified) libc/src/__support/FPUtil/double_double.h (+9-1)
(modified) libc/src/__support/FPUtil/dyadic_float.h (+5-5)
(modified) libc/src/__support/macros/optimization.h (+14)
(modified) libc/src/math/generic/CMakeLists.txt (+48)
(added) libc/src/math/generic/range_reduction_double.h (+67)
(added) libc/src/math/generic/range_reduction_double_fma.h (+334)
(added) libc/src/math/generic/sin.cpp (+567)
(added) libc/src/math/generic/sincos_eval.h (+81)
(modified) libc/src/math/x86_64/CMakeLists.txt (-10)
(removed) libc/src/math/x86_64/sin.cpp (-19)
(modified) libc/test/src/math/sin_test.cpp (+93-13)
(modified) libc/test/src/math/smoke/CMakeLists.txt (+10)
(added) libc/test/src/math/smoke/sin_test.cpp (+26)

diff --git a/libc/config/darwin/arm/entrypoints.txt b/libc/config/darwin/arm/entrypoints.txt
index e1303265b9ac4..d486916a542d6 100644
--- a/libc/config/darwin/arm/entrypoints.txt
+++ b/libc/config/darwin/arm/entrypoints.txt
@@ -226,6 +226,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnl
     libc.src.math.sincosf
     libc.src.math.sinhf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sqrt
     libc.src.math.sqrtf
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index dfed6acbdf257..07d6583c0a816 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -481,6 +481,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnl
     libc.src.math.sincosf
     libc.src.math.sinhf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sqrt
     libc.src.math.sqrtf
diff --git a/libc/config/linux/arm/entrypoints.txt b/libc/config/linux/arm/entrypoints.txt
index d4f932416bd9f..5716b777c8c96 100644
--- a/libc/config/linux/arm/entrypoints.txt
+++ b/libc/config/linux/arm/entrypoints.txt
@@ -345,6 +345,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnf
     libc.src.math.scalbnl
     libc.src.math.sincosf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sinhf
     libc.src.math.sqrt
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index e12d6b3957e51..18968f5b07b59 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -489,6 +489,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnl
     libc.src.math.sincosf
     libc.src.math.sinhf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sqrt
     libc.src.math.sqrtf
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index 293edd1c15100..48867d7012b51 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
@@ -314,7 +314,7 @@ Higher Math Functions
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | rsqrt     |                  |                 |                        |                      |                        | 7.12.7.9               | F.10.4.9                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| sin       | |check|          | large           |                        |                      |                        | 7.12.4.6               | F.10.1.6                   |
+| sin       | |check|          | |check|         |                        |                      |                        | 7.12.4.6               | F.10.1.6                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | sincos    | |check|          | large           |                        |                      |                        |                        |                            |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
diff --git a/libc/src/__support/FPUtil/double_double.h b/libc/src/__support/FPUtil/double_double.h
index b9490b52f6b41..a94902b7ec7ae 100644
--- a/libc/src/__support/FPUtil/double_double.h
+++ b/libc/src/__support/FPUtil/double_double.h
@@ -44,7 +44,12 @@ LIBC_INLINE constexpr DoubleDouble add(const DoubleDouble &a, double b) {
   return exact_add(r.hi, r.lo + a.lo);
 }
 
-// Velkamp's Splitting for double precision.
+// Veltkamp's Splitting for double precision.
+// Note: This is the original version of Veltkamp's Splitting, which is only
+// correct for round-to-nearest mode.  See:
+//   Graillat, S.,  Lafevre, V., and Muller, J.-M., "Alternative Split Functions
+//   and Dekker's Product," ARITH'2020.
+//   http://arith2020.arithsymposium.org/resources/paper_31.pdf
 LIBC_INLINE constexpr DoubleDouble split(double a) {
   DoubleDouble r{0.0, 0.0};
   // Splitting constant = 2^ceil(prec(double)/2) + 1 = 2^27 + 1.
@@ -56,6 +61,9 @@ LIBC_INLINE constexpr DoubleDouble split(double a) {
   return r;
 }
 
+// Note: When FMA instruction is not available, the `exact_mult` function relies
+// on Veltkamp's Splitting algorithm, and is only correct for round-to-nearest
+// mode.
 LIBC_INLINE DoubleDouble exact_mult(double a, double b) {
   DoubleDouble r{0.0, 0.0};
 
diff --git a/libc/src/__support/FPUtil/dyadic_float.h b/libc/src/__support/FPUtil/dyadic_float.h
index 12a69228d36c7..9e25cc6486011 100644
--- a/libc/src/__support/FPUtil/dyadic_float.h
+++ b/libc/src/__support/FPUtil/dyadic_float.h
@@ -270,11 +270,11 @@ LIBC_INLINE constexpr DyadicFloat<Bits> quick_add(DyadicFloat<Bits> a,
 // don't need to normalize the inputs again in this function.  If the inputs are
 // not normalized, the results might lose precision significantly.
 template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> quick_mul(DyadicFloat<Bits> a,
-                                                  DyadicFloat<Bits> b) {
+LIBC_INLINE constexpr DyadicFloat<Bits> quick_mul(const DyadicFloat<Bits> &a,
+                                                  const DyadicFloat<Bits> &b) {
   DyadicFloat<Bits> result;
   result.sign = (a.sign != b.sign) ? Sign::NEG : Sign::POS;
-  result.exponent = a.exponent + b.exponent + int(Bits);
+  result.exponent = a.exponent + b.exponent + static_cast<int>(Bits);
 
   if (!(a.mantissa.is_zero() || b.mantissa.is_zero())) {
     result.mantissa = a.mantissa.quick_mul_hi(b.mantissa);
@@ -301,7 +301,7 @@ multiply_add(const DyadicFloat<Bits> &a, const DyadicFloat<Bits> &b,
 // Simple exponentiation implementation for printf. Only handles positive
 // exponents, since division isn't implemented.
 template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(DyadicFloat<Bits> a,
+LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(const DyadicFloat<Bits> &a,
                                               uint32_t power) {
   DyadicFloat<Bits> result = 1.0;
   DyadicFloat<Bits> cur_power = a;
@@ -317,7 +317,7 @@ LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(DyadicFloat<Bits> a,
 }
 
 template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> mul_pow_2(DyadicFloat<Bits> a,
+LIBC_INLINE constexpr DyadicFloat<Bits> mul_pow_2(const DyadicFloat<Bits> &a,
                                                   int32_t pow_2) {
   DyadicFloat<Bits> result = a;
   result.exponent += pow_2;
diff --git a/libc/src/__support/macros/optimization.h b/libc/src/__support/macros/optimization.h
index 59886ca44be12..05a47791deed8 100644
--- a/libc/src/__support/macros/optimization.h
+++ b/libc/src/__support/macros/optimization.h
@@ -33,4 +33,18 @@ LIBC_INLINE constexpr bool expects_bool_condition(T value, T expected) {
 #error "Unhandled compiler"
 #endif
 
+// Defining optimization options for math functions.
+// TODO: Exporting this to public generated headers?
+#define LIBC_MATH_SKIP_ACCURATE_PASS 0x01
+#define LIBC_MATH_SMALL_TABLES 0x02
+#define LIBC_MATH_NO_ERRNO 0x04
+#define LIBC_MATH_NO_EXCEPT 0x08
+#define LIBC_MATH_FAST                                                         \
+  (LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES |                     \
+   LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)
+
+#ifndef LIBC_MATH
+#define LIBC_MATH 0
+#endif // LIBC_MATH
+
 #endif // LLVM_LIBC_SRC___SUPPORT_MACROS_OPTIMIZATION_H
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index aa0069d821d0d..2e33c37dd90f0 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -135,6 +135,22 @@ add_header_library(
     libc.src.__support.common
 )
 
+add_header_library(
+  range_reduction_double
+  HDRS
+    range_reduction_double.h
+    range_reduction_double_fma.h
+  DEPENDS
+  libc.src.__support.FPUtil.double_double
+  libc.src.__support.FPUtil.dyadic_float
+  libc.src.__support.FPUtil.fp_bits
+  libc.src.__support.FPUtil.fma
+  libc.src.__support.FPUtil.multiply_add
+  libc.src.__support.FPUtil.nearest_integer
+  libc.src.__support.common
+  libc.src.__support.integer_literals
+)
+
 add_header_library(
   sincosf_utils
   HDRS
@@ -146,6 +162,15 @@ add_header_library(
     libc.src.__support.common
 )
 
+add_header_library(
+  sincos_eval
+  HDRS
+    sincos_eval.h
+  DEPENDS
+    libc.src.__support.FPUtil.double_double
+    libc.src.__support.FPUtil.multiply_add
+)
+
 add_entrypoint_object(
   cosf
   SRCS
@@ -167,6 +192,29 @@ add_entrypoint_object(
     -O3
 )
 
+add_entrypoint_object(
+  sin
+  SRCS
+    sin.cpp
+  HDRS
+    ../sin.h
+  DEPENDS
+    libc.hdr.errno_macros
+    libc.src.errno.errno
+    libc.src.__support.FPUtil.double_double
+    libc.src.__support.FPUtil.dyadic_float
+    libc.src.__support.FPUtil.fenv_impl
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.fma
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.nearest_integer
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.rounding_mode
+    libc.src.__support.macros.optimization
+  COMPILE_OPTIONS
+    -O3
+)
+
 add_entrypoint_object(
   sinf
   SRCS
diff --git a/libc/src/math/generic/range_reduction_double.h b/libc/src/math/generic/range_reduction_double.h
new file mode 100644
index 0000000000000..3cd76d722da1a
--- /dev/null
+++ b/libc/src/math/generic/range_reduction_double.h
@@ -0,0 +1,67 @@
+//===-- Range reduction for double precision sin/cos/tan --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
+#define LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
+
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+using fputil::DoubleDouble;
+
+LIBC_INLINE constexpr int FAST_PASS_EXPONENT = 23;
+
+// Digits of pi/128, generated by Sollya with:
+// > a = round(pi/128, D, RN);
+// > b = round(pi/128 - a, D, RN);
+LIBC_INLINE constexpr DoubleDouble PI_OVER_128 = {0x1.1a62633145c07p-60,
+                                                  0x1.921fb54442d18p-6};
+
+// Digits of -pi/128, generated by Sollya with:
+// > a = round(pi/128, 25, RN);
+// > b = round(pi/128 - a, 23, RN);
+// > c = round(pi/128 - a - b, 25, RN);
+// > d = round(pi/128 - a - b - c, D, RN);
+// The precisions of the parts are chosen so that:
+// 1)  k * a, k * b, k * c are exact in double precision
+// 2)  k * b + fractional part of (k * a) is exact in double precsion
+LIBC_INLINE constexpr double MPI_OVER_128[4] = {
+    -0x1.921fb5p-6, -0x1.110b48p-32, +0x1.ee59dap-56, -0x1.98a2e03707345p-83};
+
+LIBC_INLINE constexpr double ONE_TWENTY_EIGHT_OVER_PI_D = 0x1.45f306dc9c883p5;
+
+namespace generic {
+
+LIBC_INLINE int range_reduction_small(double x, DoubleDouble &u) {
+  double prod_hi = x * ONE_TWENTY_EIGHT_OVER_PI_D;
+  double kd = fputil::nearest_integer(prod_hi);
+  int k = static_cast<int>(kd);
+
+  // x - k * (pi/128)
+  double c = fputil::multiply_add(kd, MPI_OVER_128[0], x);    // Exact
+  double y_hi = fputil::multiply_add(kd, MPI_OVER_128[1], c); // Exact
+  double y_lo = fputil::multiply_add(kd, MPI_OVER_128[2], kd * MPI_OVER_128[3]);
+  u = fputil::exact_add(y_hi, y_lo);
+
+  return k;
+}
+
+// TODO: Implement generic's range_reduction_large correctly rounded for all
+// rounding modes.  The current fma's range_reduction_large only works for
+// round-to-nearest without FMA instruction.
+
+} // namespace generic
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
diff --git a/libc/src/math/generic/range_reduction_double_fma.h b/libc/src/math/generic/range_reduction_double_fma.h
new file mode 100644
index 0000000000000..dafcefe4ef72e
--- /dev/null
+++ b/libc/src/math/generic/range_reduction_double_fma.h
@@ -0,0 +1,334 @@
+//===-- Range reduction for double precision sin/cos/tan w/ FMA -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_FMA_H
+#define LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_FMA_H
+
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/dyadic_float.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/common.h"
+#include "src/__support/integer_literals.h"
+
+namespace LIBC_NAMESPACE {
+
+namespace fma {
+
+using fputil::DoubleDouble;
+using Float128 = fputil::DyadicFloat<128>;
+
+LIBC_INLINE constexpr int FAST_PASS_EXPONENT = 32;
+
+// Digits of pi/128, generated by Sollya with:
+// > a = round(pi/128, D, RN);
+// > b = round(pi/128 - a, D, RN);
+LIBC_INLINE constexpr DoubleDouble PI_OVER_128 = {0x1.1a62633145c07p-60,
+                                                  0x1.921fb54442d18p-6};
+LIBC_INLINE constexpr Float128 PI_OVER_128_F128 = {
+    Sign::POS, -133, 0xc90f'daa2'2168'c234'c4c6'628b'80dc'1cd1_u128};
+
+// Digits of 2^(16*i) / pi, generated by Sollya with:
+// For [2..62]:
+// > for i from 3 to 63 do {
+//     pi_inv = 2^(16*(i - 3)) / pi;
+//     pn = nearestint(pi_inv);
+//     pi_frac = pi_inv - pn;
+//     a = round(pi_frac, D, RN);
+//     b = round(pi_frac - a, D, RN);
+//     c = round(pi_frac - a - b, D, RN);
+//     d = round(pi_frac - a - b - c, D, RN);
+//     print("{", 2^7 * a, ",", 2^7 * b, ",", 2^7 * c, ",", 2^7 * d, "},");
+//   };
+// For [0..1]:
+// The leading bit of 2^(16*(i - 3)) / pi is very small, so we add 0.25 so that
+// the conditions for the algorithms are still satisfied, and one of those
+// conditions guarantees that ulp(0.25 * x_reduced) >= 2, and will safely be
+// discarded.
+// > for i from 0 to 2 do {
+//     pi_frac = 0.25 + 2^(16*(i - 3)) / pi;
+//     a = round(pi_frac, D, RN);
+//     b = round(pi_frac - a, D, RN);
+//     c = round(pi_frac - a - b, D, RN);
+//     d = round(pi_frac - a - b - c, D, RN);
+//     print("{", 2^7 * a, ",", 2^7 * b, ",", 2^7 * c, ",", 2^7 * d, "},");
+//   };
+// For The fast pass using double-double, we only need 3 parts (a, b, c), but
+// for the accurate pass using Float128, instead of using another table of
+// Float128s, we simply add the fourth path (a, b, c, d), which simplify the
+// implementation a bit and saving some memory.
+LIBC_INLINE constexpr double ONE_TWENTY_EIGHT_OVER_PI[64][4] = {
+    {0x1.0000000000014p5, 0x1.7cc1b727220a9p-49, 0x1.3f84eafa3ea6ap-103,
+     -0x1.11f924eb53362p-157},
+    {0x1.0000000145f3p5, 0x1.b727220a94fe1p-49, 0x1.d5f47d4d37703p-104,
+     0x1.b6295993c439p-158},
+    {0x1.000145f306dcap5, -0x1.bbead603d8a83p-50, 0x1.f534ddc0db629p-106,
+     0x1.664f10e4107f9p-160},
+    {0x1.45f306dc9c883p5, -0x1.6b01ec5417056p-49, -0x1.6447e493ad4cep-103,
+     0x1.e21c820ff28b2p-157},
+    {-0x1.f246c6efab581p4, 0x1.3abe8fa9a6eep-53, 0x1.b6c52b3278872p-107,
+     0x1.07f9458eaf7afp-164},
+    {0x1.391054a7f09d6p4, -0x1.70565911f924fp-53, 0x1.2b3278872084p-107,
+     -0x1.ae9c5421443aap-162},
+    {0x1.529fc2757d1f5p2, 0x1.a6ee06db14acdp-53, -0x1.8778df7c035d4p-107,
+     0x1.d5ef5de2b0db9p-161},
+    {-0x1.ec54170565912p-1, 0x1.b6c52b3278872p-59, 0x1.07f9458eaf7afp-116,
+     -0x1.d4f246dc8e2dfp-173},
+    {-0x1.505c1596447e5p5, 0x1.b14acc9e21c82p-49, 0x1.fe5163abdebbcp-106,
+     0x1.586dc91b8e909p-160},
+    {-0x1.596447e493ad5p1, 0x1.93c439041fe51p-54, 0x1.8eaf7aef1586ep-108,
+     -0x1.b7238b7b645a4p-163},
+    {0x1.bb81b6c52b328p5, -0x1.de37df00d74e3p-49, 0x1.7bd778ac36e49p-103,
+     -0x1.1c5bdb22d1ffap-158},
+    {0x1.b6c52b3278872p5, 0x1.07f9458eaf7afp-52, -0x1.d4f246dc8e2dfp-109,
+     0x1.374b801924bbbp-164},
+    {0x1.2b3278872084p5, -0x1.ae9c5421443aap-50, 0x1.b7246e3a424ddp-106,
+     0x1.700324977504fp-161},
+    {-0x1.8778df7c035d4p5, 0x1.d5ef5de2b0db9p-49, 0x1.1b8e909374b8p-104,
+     0x1.924bba8274648p-160},
+    {-0x1.bef806ba71508p4, -0x1.443a9e48db91cp-50, -0x1.6f6c8b47fe6dbp-104,
+     -0x1.115f62e6de302p-158},
+    {-0x1.ae9c5421443aap-2, 0x1.b7246e3a424ddp-58, 0x1.700324977504fp-113,
+     -0x1.cdbc603c429c7p-167},
+    {-0x1.38a84288753c9p5, -0x1.b7238b7b645a4p-51, 0x1.924bba8274648p-112,
+     0x1.cfe1deb1cb12ap-166},
+    {-0x1.0a21d4f246dc9p3, 0x1.d2126e9700325p-53, -0x1.a22bec5cdbc6p-107,
+     -0x1.e214e34ed658cp-162},
+    {-0x1.d4f246dc8e2dfp3, 0x1.374b801924bbbp-52, -0x1.f62e6de301e21p-106,
+     -0x1.38d3b5963045ep-160},
+    {-0x1.236e4716f6c8bp4, -0x1.1ff9b6d115f63p-50, 0x1.921cfe1deb1cbp-106,
+     0x1.29a73ee88235fp-162},
+    {0x1.b8e909374b802p4, -0x1.b6d115f62e6dep-50, -0x1.80f10a71a76b3p-105,
+     0x1.cfba208d7d4bbp-160},
+    {0x1.09374b801924cp4, -0x1.15f62e6de301ep-50, -0x1.0a71a76b2c609p-105,
+     0x1.1046bea5d7689p-159},
+    {-0x1.68ffcdb688afbp3, -0x1.736f180f10a72p-53, 0x1.62534e7dd1047p-107,
+     -0x1.0568a25dbd8b3p-161},
+    {0x1.924bba8274648p0, 0x1.cfe1deb1cb12ap-54, -0x1.63045df7282b4p-108,
+     -0x1.44bb7b16638fep-162},
+    {-0x1.a22bec5cdbc6p5, -0x1.e214e34ed658cp-50, -0x1.177dca0ad144cp-106,
+     0x1.213a671c09ad1p-160},
+    {0x1.3a32439fc3bd6p1, 0x1.cb129a73ee882p-54, 0x1.afa975da24275p-109,
+     -0x1.8e3f652e8207p-164},
+    {-0x1.b78c0788538d4p4, 0x1.29a73ee88235fp-50, 0x1.4baed1213a672p-104,
+     -0x1.fb29741037d8dp-159},
+    {0x1.fc3bd63962535p5, -0x1.822efb9415a29p-51, 0x1.a24274ce38136p-105,
+     -0x1.741037d8cdc54p-159},
+    {-0x1.4e34ed658c117p2, -0x1.f7282b4512edfp-52, 0x1.d338e04d68bfp-107,
+     -0x1.bec66e29c67cbp-162},
+    {0x1.62534e7dd1047p5, -0x1.0568a25dbd8b3p-49, -0x1.c7eca5d040df6p-105,
+     -0x1.9b8a719f2b318p-160},
+    {-0x1.63045df7282b4p4, -0x1.44bb7b16638fep-50, 0x1.ad17df904e647p-104,
+     0x1.639835339f49dp-158},
+    {0x1.d1046bea5d769p5, -0x1.bd8b31c7eca5dp-49, -0x1.037d8cdc538dp-107,
+     0x1.a99cfa4e422fcp-161},
+    {0x1.afa975da24275p3, -0x1.8e3f652e8207p-52, 0x1.3991d63983534p-106,
+     -0x1.82d8dee81d108p-160},
+    {-0x1.a28976f62cc72p5, 0x1.35a2fbf209cc9p-53, -0x1.4e33e566305b2p-109,
+     0x1.08bf177bf2507p-163},
+    {-0x1.76f62cc71fb29p5, -0x1.d040df633714ep-49, -0x1.9f2b3182d8defp-104,
+     0x1.f8bbdf9283b2p-158},
+    {0x1.d338e04d68bfp5, -0x1.bec66e29c67cbp-50, 0x1.9cfa4e422fc5ep-105,
+     -0x1.036be27003b4p-161},
+    {0x1.c09ad17df904ep4, 0x1.91d639835339fp-50, 0x1.272117e2ef7e5p-104,
+     -0x1.7c4e007680022p-158},
+    {0x1.68befc827323bp5, -0x1.c67cacc60b638p-50, 0x1.17e2ef7e4a0ecp-104,
+     0x1.ff897ffde0598p-158},
+    {-0x1.037d8cdc538dp5, 0x1.a99cfa4e422fcp-49, 0x1.77bf250763ff1p-103,
+     0x1.7ffde05980fefp-158},
+    {-0x1.8cdc538cf9599p5, 0x1.f49c845f8bbep-50, -0x1.b5f13801da001p-104,
+     0x1.e05980fef2f12p-158},
+    {-0x1.4e33e566305b2p3, 0x1.08bf177bf2507p-51, 0x1.8ffc4bffef02dp-105,
+     -0x1.fc04343b9d298p-160},
+    {-0x1.f2b3182d8dee8p4, -0x1.d1081b5f13802p-52, 0x1.2fffbc0b301fep-107,
+     -0x1.a1dce94beb25cp-163},
+    {-0x1.8c16c6f740e88p5, -0x1.036be27003b4p-49, -0x1.0fd33f8086877p-109,
+     -0x1.d297d64b824b2p-164},
+    {0x1.3908bf177bf25p5, 0x1.d8ffc4bffef03p-53, -0x1.9fc04343b9d29p-108,
+     -0x1.f592e092c9813p-162},
+    {0x1.7e2ef7e4a0ec8p4, -0x1.da00087e99fcp-56, -0x1.0d0ee74a5f593p-110,
+     0x1.f6d367ecf27cbp-166},
+    {-0x1.081b5f13801dap4, -0x1.0fd33f8086877p-61, -0x1.d297d64b824b2p-116,
+     -0x1.8130d834f648bp-170},
+    {-0x1.af89c00ed0004p5, -0x1.fa67f010d0ee7p-50, -0x1.297d64b824b26p-104,
+     -0x1.30d834f648b0cp-162},
+    {-0x1.c00ed00043f4dp5, 0x1.fde5e2316b415p-55, -0x1.2e092c98130d8p-110,
+     -0x1.a7b24585ce...
[truncated]

lntue · 2024-06-17T05:19:57Z

Performance tests with the CORE-MATH project for FMA version:

For inputs in [-pi, pi]
Reciprocal Throughput

CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 38.193 + 0.560 clc/call; Median-Min = 0.364 clc/call; Max = 40.305 clc/call;
--- System LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 24.891 + 1.007 clc/call; Median-Min = 0.888 clc/call; Max = 26.638 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 22.702 + 0.400 clc/call; Median-Min = 0.270 clc/call; Max = 25.537 clc/call;

Latency:

CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH latency ---
[####################] 100 %
Ntrial = 20 ; Min = 73.839 + 0.949 clc/call; Median-Min = 0.807 clc/call; Max = 75.513 clc/call;
--- System LIBC latency ---
[####################] 100 %
Ntrial = 20 ; Min = 48.495 + 2.750 clc/call; Median-Min = 2.026 clc/call; Max = 53.403 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 58.582 + 0.534 clc/call; Median-Min = 0.537 clc/call; Max = 59.231 clc/call;

For inputs ~ 2^100
Reciprocal Throughput

CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 78.974 + 0.237 clc/call; Median-Min = 0.214 clc/call; Max = 81.168 clc/call;
--- System LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 174.806 + 0.875 clc/call; Median-Min = 0.886 clc/call; Max = 176.161 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 37.598 + 0.094 clc/call; Median-Min = 0.081 clc/call; Max = 38.052 clc/call;

Latency

CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH latency ---
[####################] 100 %
Ntrial = 20 ; Min = 114.356 + 0.779 clc/call; Median-Min = 0.819 clc/call; Max = 115.353 clc/call;
--- System LIBC latency ---
[####################] 100 %
Ntrial = 20 ; Min = 187.295 + 0.258 clc/call; Median-Min = 0.250 clc/call; Max = 189.725 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 85.237 + 0.507 clc/call; Median-Min = 0.521 clc/call; Max = 85.883 clc/call

zimmermann6 · 2024-06-17T08:59:54Z

it seems the inexact exception is not handled:

zimmerma@croissant:~/svn/core-math$ CORE_MATH_CHECK_STD=true LIBM=$L ./check.sh sin
Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.86fbbc688f0e9p-24 (y=0x1.86fbbc688f0ep-24)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.fe945340d1525p-22 (y=0x1.fe945340d13d3p-22)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

zimmermann6 · 2024-06-17T09:04:54Z

On a Intel(R) Xeon(R) Silver 4214 I get:

zimmerma@croissant:~/svn/core-math$ LIBM=$L CORE_MATH_PERF_MODE=rdtsc ./perf.sh sin
GNU libc version: 2.38
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 61.961 + 0.506 clc/call; Median-Min = 0.548 clc/call; Max = 62.771 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 32.978 + 0.803 clc/call; Median-Min = 0.627 clc/call; Max = 35.687 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 30.249 + 0.946 clc/call; Median-Min = 0.809 clc/call; Max = 32.471 clc/call;
zimmerma@croissant:~/svn/core-math$ LIBM=$L CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh sin
GNU libc version: 2.38
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 88.747 + 0.985 clc/call; Median-Min = 0.854 clc/call; Max = 90.962 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 58.363 + 0.368 clc/call; Median-Min = 0.270 clc/call; Max = 59.351 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 62.464 + 0.795 clc/call; Median-Min = 0.687 clc/call; Max = 64.072 clc/call;

jhuber6 · 2024-06-17T11:57:12Z

How does this perform on targets without 128-bit float support? Guessing those skip the final accurate pass?

lntue · 2024-06-17T13:24:08Z

How does this perform on targets without 128-bit float support? Guessing those skip the final accurate pass?

The 128-bit precision floating point type that we use for accurate pass is not quad precision, and it is emulated using uint64_t internally: https://github.com/llvm/llvm-project/blob/main/libc/src/__support/FPUtil/dyadic_float.h#L33

…ystem malloc in overlay mode. (#95845) This causes an issue in overlay mode: #95736 (comment)

lntue · 2024-06-17T21:27:03Z

it seems the inexact exception is not handled:

zimmerma@croissant:~/svn/core-math$ CORE_MATH_CHECK_STD=true LIBM=$L ./check.sh sin
Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.86fbbc688f0e9p-24 (y=0x1.86fbbc688f0ep-24)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.fe945340d1525p-22 (y=0x1.fe945340d13d3p-22)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

@zimmermann6 : Do you mind sync to the latest update and check again? It was caused by some unrelated changes. Thanks!

zimmermann6 · 2024-06-18T06:59:17Z

unless I did something wrong, I still get missing inexact exceptions:

Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.de85173083242p-20 (y=0x1.de851730820d8p-20)
Missing inexact exception for x=0x1.8d06555815a2cp+9 (y=0x1.65b10b8443ecep-1)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

lntue · 2024-06-18T11:38:10Z

unless I did something wrong, I still get missing inexact exceptions:

Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.de85173083242p-20 (y=0x1.de851730820d8p-20)
Missing inexact exception for x=0x1.8d06555815a2cp+9 (y=0x1.65b10b8443ecep-1)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

Can you apply #95845 to your repo and rerun the cmake + ninja commands?
Thanks,

zimmermann6 · 2024-06-18T13:01:09Z

thanks to the help of Tue, now all tests do pass (including the check of the inexact exception)

nickdesaulniers

Not finished reviewing, but I'm in and out of meetings all day. Posting what I have for now.

libc/src/math/generic/range_reduction_double.h

libc/src/math/generic/range_reduction_double_fma.h

libc/src/math/generic/sin.cpp

…rounding modes.

libc/src/math/generic/range_reduction_double.h

libc/src/math/generic/range_reduction_double_fma.h

libc/src/math/generic/range_reduction_double.h

libc/src/math/x86_64/sin.cpp

libc/test/src/math/sin_test.cpp

…ounding modes.

lntue · 2024-06-21T05:06:37Z

Thanks @zimmermann6 for letting me know about the recent results at: https://inria.hal.science/hal-04480440.

With the insight from that paper, I was able to make non-FMA targets also correctly rounded to all rounding modes.

So I rearrange the range reduction headers, with range_reduction_double_fma.h and range_reduction_double_nofma.h contain their own specialized look-up tables, and fast small range reduction, while range_reduction_double_common.h contains double-double large range reduction and 128-bit precision small and large range reductions.

zimmermann6 · 2024-06-21T05:38:37Z

I still have a question. Since the full set of hard-to-round cases for sin is not known, in CORE-MATH we issue an error message when we are not able to guarantee correct rounding. How do you deal with this issue?

lntue · 2024-06-21T14:08:05Z

I still have a question. Since the full set of hard-to-round cases for sin is not known, in CORE-MATH we issue an error message when we are not able to guarantee correct rounding. How do you deal with this issue?

@zimmermann6 : right now we will probably rely on fuzz testing for it, comparing against MPFR outputs.

@nickdesaulniers : shall we add accuracy assertion under #ifndef NDEBUG guard?

nickdesaulniers · 2024-06-21T15:27:52Z

@nickdesaulniers : shall we add accuracy assertion under #ifndef NDEBUG guard?

Yes, I'm a big fan of -DLLVM_ENABLE_ASSERTIONS=ON. We also have LIBC_ASSERT (src/__support/libc_assert.h). The intention of -DLLVM_ENABLE_ASSERTIONS=ON is to trade some performance overhead for additional safety checks. Assertions builds are generally much faster than -DCMAKE_BUILD_TYPE=Debug builds of the rest of LLVM in my experience, so I usually develop with assertions enabled but not a full blow debug build (which takes longer to compiler and link).

libc/src/math/generic/range_reduction_double_common.h

lntue · 2024-06-24T04:48:40Z

@nickdesaulniers : shall we add accuracy assertion under #ifndef NDEBUG guard?

Yes, I'm a big fan of -DLLVM_ENABLE_ASSERTIONS=ON. We also have LIBC_ASSERT (src/__support/libc_assert.h). The intention of -DLLVM_ENABLE_ASSERTIONS=ON is to trade some performance overhead for additional safety checks. Assertions builds are generally much faster than -DCMAKE_BUILD_TYPE=Debug builds of the rest of LLVM in my experience, so I usually develop with assertions enabled but not a full blow debug build (which takes longer to compiler and link).

I created an issue (#96452) and added a TODO to provide an exact error bound for the accurate pass and add assertion.

nickdesaulniers · 2024-06-24T21:34:45Z

libc/test/src/math/sin_test.cpp

@@ -36,7 +36,8 @@ TEST_F(LlvmLibcSinTest, TrickyInputs) {
      0x1.e639103a05997p+2,    0x1.13114266f9764p+4,   -0x1.3eec5912ea7cdp+331,
      0x1.08087e9aad90bp+887,  0x1.2b5fe88a9d8d5p+903, -0x1.a880417b7b119p+1023,
      -0x1.6deb37da81129p+205, 0x1.08087e9aad90bp+887, 0x1.f6d7518808571p+1023,
-      -0x1.8bb5847d49973p+845, 0x1.f08b14e1c4d0fp+890,
+      -0x1.8bb5847d49973p+845, 0x1.f08b14e1c4d0fp+890, 0x1.6ac5b262ca1ffp+849,
+      0x1.e0000000001c2p-20,


What's up with the two additional constants?

Two extra worst cases that popped up with the CORE-MATH test suites when I was playing around with the error bounds for fast passes for non-FMA targets.

…rounding modes. (#96591) Sharing the same algorithm as double precision sin: #95736

…ll rounding modes. (#96719) Sharing the same algorithm as double precision sin: #95736 and cos: #96591

…recision sincos correctly rounded to all rounding modes. (#96719) Sharing the same algorithm as double precision sin: llvm/llvm-project#95736 and cos: llvm/llvm-project#96591 GitOrigin-RevId: 8eda5e8754fda4422617d069315bd0ce34d99b0d Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993 Roller-URL: https://ci.chromium.org/b/8743956522971831761 CQ-Do-Not-Cancel-Tryjobs: true Change-Id: Ic8e99f78feabb9deb3ac49fc04608cfadfc16a66 Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1073452

… sincos correctly rounded to all rounding modes. (#96719) Sharing the same algorithm as double precision sin: llvm/llvm-project#95736 and cos: llvm/llvm-project#96591 GitOrigin-RevId: 8eda5e8754fda4422617d069315bd0ce34d99b0d Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993 Change-Id: Ie2bd882ad9952a0969feb84edcf1daffab6901b4

…] Implement double precision sincos correctly rounded to all rounding modes. (#96719) Sharing the same algorithm as double precision sin: llvm/llvm-project#95736 and cos: llvm/llvm-project#96591 GitOrigin-RevId: 6e644e80df2ddd3b6292ca7f77c48922a7848a1c Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993 Roller-URL: https://ci.chromium.org/b/8743956522971831761 CQ-Do-Not-Cancel-Tryjobs: true Original-Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1073452 Original-Revision: 660fa89a50c932fbfcacafdc59eab859631e4390 Change-Id: Iba8f96b2ab385de0dc2388226870f9c5af969b2b

…ll rounding modes. (llvm#96719) Sharing the same algorithm as double precision sin: llvm#95736 and cos: llvm#96591

…rounding modes. (llvm#95736) - Algorithm: - Step 1 - Range reduction: for a double precision input `x`, return `k` and `u` such that - k is an integer - u = x - k * pi / 128, and |u| < pi/256 - Step 2 - Calculate `sin(u)` and `cos(u)` in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA. - Step 3 - Calculate `sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u)` using look-up table for `sin(k*pi/128)` and `cos(k*pi/128)`. - Step 4 - Use Ziv's rounding test to decide if the result is correctly rounded. - Step 4' - If the Ziv's rounding test failed, redo step 1-3 using 128-bit precision. - Currently, without FMA instructions, the large range reduction only works correctly for the default rounding mode (FE_TONEAREST). - Provide `LIBC_MATH` flag so that users can set `LIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS` to build the `sin` function without step 4 and 4'.

…rounding modes. (llvm#96591) Sharing the same algorithm as double precision sin: llvm#95736

…ll rounding modes. (llvm#96719) Sharing the same algorithm as double precision sin: llvm#95736 and cos: llvm#96591

lntue requested review from michaelrj-google and nickdesaulniers June 17, 2024 05:03

llvmbot added the libc label Jun 17, 2024

lntue requested a review from petrhosek June 17, 2024 05:20

lntue mentioned this pull request Jun 17, 2024

[libc][stdlib] Only add internal malloc in full build mode. Use the system malloc in overlay mode. #95845

Merged

lntue added a commit that referenced this pull request Jun 17, 2024

[libc][stdlib] Only add internal malloc in full build mode. Use the s…

a50bcc0

…ystem malloc in overlay mode. (#95845) This causes an issue in overlay mode: #95736 (comment)

lntue force-pushed the sin branch from 5e20d89 to c30826c Compare June 17, 2024 21:25

nickdesaulniers reviewed Jun 18, 2024

View reviewed changes

shiltian mentioned this pull request Jun 18, 2024

[MLIR] Fix an assert that contains a mistake in conditional operator (#95668) shiltian/llvm-project#6

Closed

lntue added 3 commits June 19, 2024 13:43

[libc][math] Implement double precision sin correctly rounded to all …

ffaf0ba

…rounding modes.

Fix small inputs for non-FMA targets.

6210cca

Address comments.

696f27a

lntue force-pushed the sin branch from c30826c to 696f27a Compare June 19, 2024 13:45

nickdesaulniers reviewed Jun 20, 2024

View reviewed changes

Address comments and make non-FMA targets correctly rounded for all r…

6913aa9

…ounding modes.

nickdesaulniers reviewed Jun 21, 2024

View reviewed changes

libc/src/math/generic/range_reduction_double_common.h Outdated Show resolved Hide resolved

libc/src/math/generic/range_reduction_double_common.h Outdated Show resolved Hide resolved

libc/src/math/generic/range_reduction_double_common.h Outdated Show resolved Hide resolved

lntue and others added 2 commits June 22, 2024 04:10

Refactor large range reduction.

927687e

Add TODO to add assertion for accuracy tests in debug mode.

93999a0

nickdesaulniers approved these changes Jun 24, 2024

View reviewed changes

lntue merged commit 16903ac into llvm:main Jun 24, 2024
7 checks passed

lntue deleted the sin branch June 24, 2024 21:57

lntue mentioned this pull request Jun 25, 2024

[libc][math] Implement double precision cos correctly rounded to all rounding modes. #96591

Merged

lntue added a commit that referenced this pull request Jun 25, 2024

[libc][math] Implement double precision cos correctly rounded to all …

88f80ae

…rounding modes. (#96591) Sharing the same algorithm as double precision sin: #95736

lntue mentioned this pull request Jun 26, 2024

[libc][math] Implement double precision sincos correctly rounded to all rounding modes. #96719

Merged

lntue added a commit that referenced this pull request Jun 27, 2024

[libc][math] Implement double precision sincos correctly rounded to a…

4080f17

…ll rounding modes. (#96719) Sharing the same algorithm as double precision sin: #95736 and cos: #96591

AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024

[libc][math] Implement double precision cos correctly rounded to all …

8c4a7e9

…rounding modes. (llvm#96591) Sharing the same algorithm as double precision sin: llvm#95736

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[libc][math] Implement double precision sin correctly rounded to all rounding modes. #95736

[libc][math] Implement double precision sin correctly rounded to all rounding modes. #95736

lntue commented Jun 17, 2024

lntue commented Jun 17, 2024

llvmbot commented Jun 17, 2024

lntue commented Jun 17, 2024

zimmermann6 commented Jun 17, 2024

zimmermann6 commented Jun 17, 2024 •

edited

Loading

jhuber6 commented Jun 17, 2024

lntue commented Jun 17, 2024

lntue commented Jun 17, 2024

zimmermann6 commented Jun 18, 2024

lntue commented Jun 18, 2024

zimmermann6 commented Jun 18, 2024

nickdesaulniers left a comment

lntue commented Jun 21, 2024

zimmermann6 commented Jun 21, 2024

lntue commented Jun 21, 2024

nickdesaulniers commented Jun 21, 2024

lntue commented Jun 24, 2024

nickdesaulniers Jun 24, 2024

lntue Jun 24, 2024

[libc][math] Implement double precision sin correctly rounded to all rounding modes. #95736

[libc][math] Implement double precision sin correctly rounded to all rounding modes. #95736

Conversation

lntue commented Jun 17, 2024

lntue commented Jun 17, 2024

llvmbot commented Jun 17, 2024

lntue commented Jun 17, 2024

zimmermann6 commented Jun 17, 2024

zimmermann6 commented Jun 17, 2024 • edited Loading

jhuber6 commented Jun 17, 2024

lntue commented Jun 17, 2024

lntue commented Jun 17, 2024

zimmermann6 commented Jun 18, 2024

lntue commented Jun 18, 2024

zimmermann6 commented Jun 18, 2024

nickdesaulniers left a comment

Choose a reason for hiding this comment

lntue commented Jun 21, 2024

zimmermann6 commented Jun 21, 2024

lntue commented Jun 21, 2024

nickdesaulniers commented Jun 21, 2024

lntue commented Jun 24, 2024

nickdesaulniers Jun 24, 2024

Choose a reason for hiding this comment

lntue Jun 24, 2024

Choose a reason for hiding this comment

zimmermann6 commented Jun 17, 2024 •

edited

Loading