Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc][math] Implement double precision sin correctly rounded to all rounding modes. #95736

Merged
merged 6 commits into from
Jun 24, 2024

Conversation

lntue
Copy link
Contributor

@lntue lntue commented Jun 17, 2024

  • Algorithm:
    • Step 1 - Range reduction: for a double precision input x, return k and u such that
      • k is an integer
      • u = x - k * pi / 128, and |u| < pi/256
    • Step 2 - Calculate sin(u) and cos(u) in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
    • Step 3 - Calculate sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u) using look-up table for sin(k*pi/128) and cos(k*pi/128).
    • Step 4 - Use Ziv's rounding test to decide if the result is correctly rounded.
    • Step 4' - If the Ziv's rounding test failed, redo step 1-3 using 128-bit precision.
  • Currently, without FMA instructions, the large range reduction only works correctly for the default rounding mode (FE_TONEAREST).
  • Provide LIBC_MATH flag so that users can set LIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS to build the sin function without step 4 and 4'.

@lntue
Copy link
Contributor Author

lntue commented Jun 17, 2024

@zimmermann6

@llvmbot
Copy link
Collaborator

llvmbot commented Jun 17, 2024

@llvm/pr-subscribers-libc

Author: None (lntue)

Changes
  • Algorithm:
    • Step 1 - Range reduction: for a double precision input x, return k and u such that
      • k is an integer
      • u = x - k * pi / 128, and |u| < pi/256
    • Step 2 - Calculate sin(u) and cos(u) in double-double using Taylor polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
    • Step 3 - Calculate sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) * sin(u) using look-up table for sin(k*pi/128) and cos(k*pi/128).
    • Step 4 - Use Ziv's rounding test to decide if the result is correctly rounded.
    • Step 4' - If the Ziv's rounding test failed, redo step 1-3 using 128-bit precision.
  • Currently, without FMA instructions, the large range reduction only works correctly for the default rounding mode (FE_TONEAREST).
  • Provide LIBC_MATH flag so that users can set LIBC_MATH = LIBC_MATH_SKIP_ACCURATE_PASS to build the sin function without step 4 and 4'.

Patch is 65.27 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/95736.diff

18 Files Affected:

  • (modified) libc/config/darwin/arm/entrypoints.txt (+1)
  • (modified) libc/config/linux/aarch64/entrypoints.txt (+1)
  • (modified) libc/config/linux/arm/entrypoints.txt (+1)
  • (modified) libc/config/linux/riscv/entrypoints.txt (+1)
  • (modified) libc/docs/math/index.rst (+1-1)
  • (modified) libc/src/__support/FPUtil/double_double.h (+9-1)
  • (modified) libc/src/__support/FPUtil/dyadic_float.h (+5-5)
  • (modified) libc/src/__support/macros/optimization.h (+14)
  • (modified) libc/src/math/generic/CMakeLists.txt (+48)
  • (added) libc/src/math/generic/range_reduction_double.h (+67)
  • (added) libc/src/math/generic/range_reduction_double_fma.h (+334)
  • (added) libc/src/math/generic/sin.cpp (+567)
  • (added) libc/src/math/generic/sincos_eval.h (+81)
  • (modified) libc/src/math/x86_64/CMakeLists.txt (-10)
  • (removed) libc/src/math/x86_64/sin.cpp (-19)
  • (modified) libc/test/src/math/sin_test.cpp (+93-13)
  • (modified) libc/test/src/math/smoke/CMakeLists.txt (+10)
  • (added) libc/test/src/math/smoke/sin_test.cpp (+26)
diff --git a/libc/config/darwin/arm/entrypoints.txt b/libc/config/darwin/arm/entrypoints.txt
index e1303265b9ac4..d486916a542d6 100644
--- a/libc/config/darwin/arm/entrypoints.txt
+++ b/libc/config/darwin/arm/entrypoints.txt
@@ -226,6 +226,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnl
     libc.src.math.sincosf
     libc.src.math.sinhf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sqrt
     libc.src.math.sqrtf
diff --git a/libc/config/linux/aarch64/entrypoints.txt b/libc/config/linux/aarch64/entrypoints.txt
index dfed6acbdf257..07d6583c0a816 100644
--- a/libc/config/linux/aarch64/entrypoints.txt
+++ b/libc/config/linux/aarch64/entrypoints.txt
@@ -481,6 +481,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnl
     libc.src.math.sincosf
     libc.src.math.sinhf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sqrt
     libc.src.math.sqrtf
diff --git a/libc/config/linux/arm/entrypoints.txt b/libc/config/linux/arm/entrypoints.txt
index d4f932416bd9f..5716b777c8c96 100644
--- a/libc/config/linux/arm/entrypoints.txt
+++ b/libc/config/linux/arm/entrypoints.txt
@@ -345,6 +345,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnf
     libc.src.math.scalbnl
     libc.src.math.sincosf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sinhf
     libc.src.math.sqrt
diff --git a/libc/config/linux/riscv/entrypoints.txt b/libc/config/linux/riscv/entrypoints.txt
index e12d6b3957e51..18968f5b07b59 100644
--- a/libc/config/linux/riscv/entrypoints.txt
+++ b/libc/config/linux/riscv/entrypoints.txt
@@ -489,6 +489,7 @@ set(TARGET_LIBM_ENTRYPOINTS
     libc.src.math.scalbnl
     libc.src.math.sincosf
     libc.src.math.sinhf
+    libc.src.math.sin
     libc.src.math.sinf
     libc.src.math.sqrt
     libc.src.math.sqrtf
diff --git a/libc/docs/math/index.rst b/libc/docs/math/index.rst
index 293edd1c15100..48867d7012b51 100644
--- a/libc/docs/math/index.rst
+++ b/libc/docs/math/index.rst
@@ -314,7 +314,7 @@ Higher Math Functions
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | rsqrt     |                  |                 |                        |                      |                        | 7.12.7.9               | F.10.4.9                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
-| sin       | |check|          | large           |                        |                      |                        | 7.12.4.6               | F.10.1.6                   |
+| sin       | |check|          | |check|         |                        |                      |                        | 7.12.4.6               | F.10.1.6                   |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
 | sincos    | |check|          | large           |                        |                      |                        |                        |                            |
 +-----------+------------------+-----------------+------------------------+----------------------+------------------------+------------------------+----------------------------+
diff --git a/libc/src/__support/FPUtil/double_double.h b/libc/src/__support/FPUtil/double_double.h
index b9490b52f6b41..a94902b7ec7ae 100644
--- a/libc/src/__support/FPUtil/double_double.h
+++ b/libc/src/__support/FPUtil/double_double.h
@@ -44,7 +44,12 @@ LIBC_INLINE constexpr DoubleDouble add(const DoubleDouble &a, double b) {
   return exact_add(r.hi, r.lo + a.lo);
 }
 
-// Velkamp's Splitting for double precision.
+// Veltkamp's Splitting for double precision.
+// Note: This is the original version of Veltkamp's Splitting, which is only
+// correct for round-to-nearest mode.  See:
+//   Graillat, S.,  Lafevre, V., and Muller, J.-M., "Alternative Split Functions
+//   and Dekker's Product," ARITH'2020.
+//   http://arith2020.arithsymposium.org/resources/paper_31.pdf
 LIBC_INLINE constexpr DoubleDouble split(double a) {
   DoubleDouble r{0.0, 0.0};
   // Splitting constant = 2^ceil(prec(double)/2) + 1 = 2^27 + 1.
@@ -56,6 +61,9 @@ LIBC_INLINE constexpr DoubleDouble split(double a) {
   return r;
 }
 
+// Note: When FMA instruction is not available, the `exact_mult` function relies
+// on Veltkamp's Splitting algorithm, and is only correct for round-to-nearest
+// mode.
 LIBC_INLINE DoubleDouble exact_mult(double a, double b) {
   DoubleDouble r{0.0, 0.0};
 
diff --git a/libc/src/__support/FPUtil/dyadic_float.h b/libc/src/__support/FPUtil/dyadic_float.h
index 12a69228d36c7..9e25cc6486011 100644
--- a/libc/src/__support/FPUtil/dyadic_float.h
+++ b/libc/src/__support/FPUtil/dyadic_float.h
@@ -270,11 +270,11 @@ LIBC_INLINE constexpr DyadicFloat<Bits> quick_add(DyadicFloat<Bits> a,
 // don't need to normalize the inputs again in this function.  If the inputs are
 // not normalized, the results might lose precision significantly.
 template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> quick_mul(DyadicFloat<Bits> a,
-                                                  DyadicFloat<Bits> b) {
+LIBC_INLINE constexpr DyadicFloat<Bits> quick_mul(const DyadicFloat<Bits> &a,
+                                                  const DyadicFloat<Bits> &b) {
   DyadicFloat<Bits> result;
   result.sign = (a.sign != b.sign) ? Sign::NEG : Sign::POS;
-  result.exponent = a.exponent + b.exponent + int(Bits);
+  result.exponent = a.exponent + b.exponent + static_cast<int>(Bits);
 
   if (!(a.mantissa.is_zero() || b.mantissa.is_zero())) {
     result.mantissa = a.mantissa.quick_mul_hi(b.mantissa);
@@ -301,7 +301,7 @@ multiply_add(const DyadicFloat<Bits> &a, const DyadicFloat<Bits> &b,
 // Simple exponentiation implementation for printf. Only handles positive
 // exponents, since division isn't implemented.
 template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(DyadicFloat<Bits> a,
+LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(const DyadicFloat<Bits> &a,
                                               uint32_t power) {
   DyadicFloat<Bits> result = 1.0;
   DyadicFloat<Bits> cur_power = a;
@@ -317,7 +317,7 @@ LIBC_INLINE constexpr DyadicFloat<Bits> pow_n(DyadicFloat<Bits> a,
 }
 
 template <size_t Bits>
-LIBC_INLINE constexpr DyadicFloat<Bits> mul_pow_2(DyadicFloat<Bits> a,
+LIBC_INLINE constexpr DyadicFloat<Bits> mul_pow_2(const DyadicFloat<Bits> &a,
                                                   int32_t pow_2) {
   DyadicFloat<Bits> result = a;
   result.exponent += pow_2;
diff --git a/libc/src/__support/macros/optimization.h b/libc/src/__support/macros/optimization.h
index 59886ca44be12..05a47791deed8 100644
--- a/libc/src/__support/macros/optimization.h
+++ b/libc/src/__support/macros/optimization.h
@@ -33,4 +33,18 @@ LIBC_INLINE constexpr bool expects_bool_condition(T value, T expected) {
 #error "Unhandled compiler"
 #endif
 
+// Defining optimization options for math functions.
+// TODO: Exporting this to public generated headers?
+#define LIBC_MATH_SKIP_ACCURATE_PASS 0x01
+#define LIBC_MATH_SMALL_TABLES 0x02
+#define LIBC_MATH_NO_ERRNO 0x04
+#define LIBC_MATH_NO_EXCEPT 0x08
+#define LIBC_MATH_FAST                                                         \
+  (LIBC_MATH_SKIP_ACCURATE_PASS | LIBC_MATH_SMALL_TABLES |                     \
+   LIBC_MATH_NO_ERRNO | LIBC_MATH_NO_EXCEPT)
+
+#ifndef LIBC_MATH
+#define LIBC_MATH 0
+#endif // LIBC_MATH
+
 #endif // LLVM_LIBC_SRC___SUPPORT_MACROS_OPTIMIZATION_H
diff --git a/libc/src/math/generic/CMakeLists.txt b/libc/src/math/generic/CMakeLists.txt
index aa0069d821d0d..2e33c37dd90f0 100644
--- a/libc/src/math/generic/CMakeLists.txt
+++ b/libc/src/math/generic/CMakeLists.txt
@@ -135,6 +135,22 @@ add_header_library(
     libc.src.__support.common
 )
 
+add_header_library(
+  range_reduction_double
+  HDRS
+    range_reduction_double.h
+    range_reduction_double_fma.h
+  DEPENDS
+  libc.src.__support.FPUtil.double_double
+  libc.src.__support.FPUtil.dyadic_float
+  libc.src.__support.FPUtil.fp_bits
+  libc.src.__support.FPUtil.fma
+  libc.src.__support.FPUtil.multiply_add
+  libc.src.__support.FPUtil.nearest_integer
+  libc.src.__support.common
+  libc.src.__support.integer_literals
+)
+
 add_header_library(
   sincosf_utils
   HDRS
@@ -146,6 +162,15 @@ add_header_library(
     libc.src.__support.common
 )
 
+add_header_library(
+  sincos_eval
+  HDRS
+    sincos_eval.h
+  DEPENDS
+    libc.src.__support.FPUtil.double_double
+    libc.src.__support.FPUtil.multiply_add
+)
+
 add_entrypoint_object(
   cosf
   SRCS
@@ -167,6 +192,29 @@ add_entrypoint_object(
     -O3
 )
 
+add_entrypoint_object(
+  sin
+  SRCS
+    sin.cpp
+  HDRS
+    ../sin.h
+  DEPENDS
+    libc.hdr.errno_macros
+    libc.src.errno.errno
+    libc.src.__support.FPUtil.double_double
+    libc.src.__support.FPUtil.dyadic_float
+    libc.src.__support.FPUtil.fenv_impl
+    libc.src.__support.FPUtil.fp_bits
+    libc.src.__support.FPUtil.fma
+    libc.src.__support.FPUtil.multiply_add
+    libc.src.__support.FPUtil.nearest_integer
+    libc.src.__support.FPUtil.polyeval
+    libc.src.__support.FPUtil.rounding_mode
+    libc.src.__support.macros.optimization
+  COMPILE_OPTIONS
+    -O3
+)
+
 add_entrypoint_object(
   sinf
   SRCS
diff --git a/libc/src/math/generic/range_reduction_double.h b/libc/src/math/generic/range_reduction_double.h
new file mode 100644
index 0000000000000..3cd76d722da1a
--- /dev/null
+++ b/libc/src/math/generic/range_reduction_double.h
@@ -0,0 +1,67 @@
+//===-- Range reduction for double precision sin/cos/tan --------*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
+#define LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
+
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/common.h"
+
+namespace LIBC_NAMESPACE {
+
+using fputil::DoubleDouble;
+
+LIBC_INLINE constexpr int FAST_PASS_EXPONENT = 23;
+
+// Digits of pi/128, generated by Sollya with:
+// > a = round(pi/128, D, RN);
+// > b = round(pi/128 - a, D, RN);
+LIBC_INLINE constexpr DoubleDouble PI_OVER_128 = {0x1.1a62633145c07p-60,
+                                                  0x1.921fb54442d18p-6};
+
+// Digits of -pi/128, generated by Sollya with:
+// > a = round(pi/128, 25, RN);
+// > b = round(pi/128 - a, 23, RN);
+// > c = round(pi/128 - a - b, 25, RN);
+// > d = round(pi/128 - a - b - c, D, RN);
+// The precisions of the parts are chosen so that:
+// 1)  k * a, k * b, k * c are exact in double precision
+// 2)  k * b + fractional part of (k * a) is exact in double precsion
+LIBC_INLINE constexpr double MPI_OVER_128[4] = {
+    -0x1.921fb5p-6, -0x1.110b48p-32, +0x1.ee59dap-56, -0x1.98a2e03707345p-83};
+
+LIBC_INLINE constexpr double ONE_TWENTY_EIGHT_OVER_PI_D = 0x1.45f306dc9c883p5;
+
+namespace generic {
+
+LIBC_INLINE int range_reduction_small(double x, DoubleDouble &u) {
+  double prod_hi = x * ONE_TWENTY_EIGHT_OVER_PI_D;
+  double kd = fputil::nearest_integer(prod_hi);
+  int k = static_cast<int>(kd);
+
+  // x - k * (pi/128)
+  double c = fputil::multiply_add(kd, MPI_OVER_128[0], x);    // Exact
+  double y_hi = fputil::multiply_add(kd, MPI_OVER_128[1], c); // Exact
+  double y_lo = fputil::multiply_add(kd, MPI_OVER_128[2], kd * MPI_OVER_128[3]);
+  u = fputil::exact_add(y_hi, y_lo);
+
+  return k;
+}
+
+// TODO: Implement generic's range_reduction_large correctly rounded for all
+// rounding modes.  The current fma's range_reduction_large only works for
+// round-to-nearest without FMA instruction.
+
+} // namespace generic
+
+} // namespace LIBC_NAMESPACE
+
+#endif // LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_H
diff --git a/libc/src/math/generic/range_reduction_double_fma.h b/libc/src/math/generic/range_reduction_double_fma.h
new file mode 100644
index 0000000000000..dafcefe4ef72e
--- /dev/null
+++ b/libc/src/math/generic/range_reduction_double_fma.h
@@ -0,0 +1,334 @@
+//===-- Range reduction for double precision sin/cos/tan w/ FMA -*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+
+#ifndef LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_FMA_H
+#define LLVM_LIBC_SRC_MATH_GENERIC_RANGE_REDUCTION_DOUBLE_FMA_H
+
+#include "src/__support/FPUtil/FPBits.h"
+#include "src/__support/FPUtil/double_double.h"
+#include "src/__support/FPUtil/dyadic_float.h"
+#include "src/__support/FPUtil/multiply_add.h"
+#include "src/__support/FPUtil/nearest_integer.h"
+#include "src/__support/common.h"
+#include "src/__support/integer_literals.h"
+
+namespace LIBC_NAMESPACE {
+
+namespace fma {
+
+using fputil::DoubleDouble;
+using Float128 = fputil::DyadicFloat<128>;
+
+LIBC_INLINE constexpr int FAST_PASS_EXPONENT = 32;
+
+// Digits of pi/128, generated by Sollya with:
+// > a = round(pi/128, D, RN);
+// > b = round(pi/128 - a, D, RN);
+LIBC_INLINE constexpr DoubleDouble PI_OVER_128 = {0x1.1a62633145c07p-60,
+                                                  0x1.921fb54442d18p-6};
+LIBC_INLINE constexpr Float128 PI_OVER_128_F128 = {
+    Sign::POS, -133, 0xc90f'daa2'2168'c234'c4c6'628b'80dc'1cd1_u128};
+
+// Digits of 2^(16*i) / pi, generated by Sollya with:
+// For [2..62]:
+// > for i from 3 to 63 do {
+//     pi_inv = 2^(16*(i - 3)) / pi;
+//     pn = nearestint(pi_inv);
+//     pi_frac = pi_inv - pn;
+//     a = round(pi_frac, D, RN);
+//     b = round(pi_frac - a, D, RN);
+//     c = round(pi_frac - a - b, D, RN);
+//     d = round(pi_frac - a - b - c, D, RN);
+//     print("{", 2^7 * a, ",", 2^7 * b, ",", 2^7 * c, ",", 2^7 * d, "},");
+//   };
+// For [0..1]:
+// The leading bit of 2^(16*(i - 3)) / pi is very small, so we add 0.25 so that
+// the conditions for the algorithms are still satisfied, and one of those
+// conditions guarantees that ulp(0.25 * x_reduced) >= 2, and will safely be
+// discarded.
+// > for i from 0 to 2 do {
+//     pi_frac = 0.25 + 2^(16*(i - 3)) / pi;
+//     a = round(pi_frac, D, RN);
+//     b = round(pi_frac - a, D, RN);
+//     c = round(pi_frac - a - b, D, RN);
+//     d = round(pi_frac - a - b - c, D, RN);
+//     print("{", 2^7 * a, ",", 2^7 * b, ",", 2^7 * c, ",", 2^7 * d, "},");
+//   };
+// For The fast pass using double-double, we only need 3 parts (a, b, c), but
+// for the accurate pass using Float128, instead of using another table of
+// Float128s, we simply add the fourth path (a, b, c, d), which simplify the
+// implementation a bit and saving some memory.
+LIBC_INLINE constexpr double ONE_TWENTY_EIGHT_OVER_PI[64][4] = {
+    {0x1.0000000000014p5, 0x1.7cc1b727220a9p-49, 0x1.3f84eafa3ea6ap-103,
+     -0x1.11f924eb53362p-157},
+    {0x1.0000000145f3p5, 0x1.b727220a94fe1p-49, 0x1.d5f47d4d37703p-104,
+     0x1.b6295993c439p-158},
+    {0x1.000145f306dcap5, -0x1.bbead603d8a83p-50, 0x1.f534ddc0db629p-106,
+     0x1.664f10e4107f9p-160},
+    {0x1.45f306dc9c883p5, -0x1.6b01ec5417056p-49, -0x1.6447e493ad4cep-103,
+     0x1.e21c820ff28b2p-157},
+    {-0x1.f246c6efab581p4, 0x1.3abe8fa9a6eep-53, 0x1.b6c52b3278872p-107,
+     0x1.07f9458eaf7afp-164},
+    {0x1.391054a7f09d6p4, -0x1.70565911f924fp-53, 0x1.2b3278872084p-107,
+     -0x1.ae9c5421443aap-162},
+    {0x1.529fc2757d1f5p2, 0x1.a6ee06db14acdp-53, -0x1.8778df7c035d4p-107,
+     0x1.d5ef5de2b0db9p-161},
+    {-0x1.ec54170565912p-1, 0x1.b6c52b3278872p-59, 0x1.07f9458eaf7afp-116,
+     -0x1.d4f246dc8e2dfp-173},
+    {-0x1.505c1596447e5p5, 0x1.b14acc9e21c82p-49, 0x1.fe5163abdebbcp-106,
+     0x1.586dc91b8e909p-160},
+    {-0x1.596447e493ad5p1, 0x1.93c439041fe51p-54, 0x1.8eaf7aef1586ep-108,
+     -0x1.b7238b7b645a4p-163},
+    {0x1.bb81b6c52b328p5, -0x1.de37df00d74e3p-49, 0x1.7bd778ac36e49p-103,
+     -0x1.1c5bdb22d1ffap-158},
+    {0x1.b6c52b3278872p5, 0x1.07f9458eaf7afp-52, -0x1.d4f246dc8e2dfp-109,
+     0x1.374b801924bbbp-164},
+    {0x1.2b3278872084p5, -0x1.ae9c5421443aap-50, 0x1.b7246e3a424ddp-106,
+     0x1.700324977504fp-161},
+    {-0x1.8778df7c035d4p5, 0x1.d5ef5de2b0db9p-49, 0x1.1b8e909374b8p-104,
+     0x1.924bba8274648p-160},
+    {-0x1.bef806ba71508p4, -0x1.443a9e48db91cp-50, -0x1.6f6c8b47fe6dbp-104,
+     -0x1.115f62e6de302p-158},
+    {-0x1.ae9c5421443aap-2, 0x1.b7246e3a424ddp-58, 0x1.700324977504fp-113,
+     -0x1.cdbc603c429c7p-167},
+    {-0x1.38a84288753c9p5, -0x1.b7238b7b645a4p-51, 0x1.924bba8274648p-112,
+     0x1.cfe1deb1cb12ap-166},
+    {-0x1.0a21d4f246dc9p3, 0x1.d2126e9700325p-53, -0x1.a22bec5cdbc6p-107,
+     -0x1.e214e34ed658cp-162},
+    {-0x1.d4f246dc8e2dfp3, 0x1.374b801924bbbp-52, -0x1.f62e6de301e21p-106,
+     -0x1.38d3b5963045ep-160},
+    {-0x1.236e4716f6c8bp4, -0x1.1ff9b6d115f63p-50, 0x1.921cfe1deb1cbp-106,
+     0x1.29a73ee88235fp-162},
+    {0x1.b8e909374b802p4, -0x1.b6d115f62e6dep-50, -0x1.80f10a71a76b3p-105,
+     0x1.cfba208d7d4bbp-160},
+    {0x1.09374b801924cp4, -0x1.15f62e6de301ep-50, -0x1.0a71a76b2c609p-105,
+     0x1.1046bea5d7689p-159},
+    {-0x1.68ffcdb688afbp3, -0x1.736f180f10a72p-53, 0x1.62534e7dd1047p-107,
+     -0x1.0568a25dbd8b3p-161},
+    {0x1.924bba8274648p0, 0x1.cfe1deb1cb12ap-54, -0x1.63045df7282b4p-108,
+     -0x1.44bb7b16638fep-162},
+    {-0x1.a22bec5cdbc6p5, -0x1.e214e34ed658cp-50, -0x1.177dca0ad144cp-106,
+     0x1.213a671c09ad1p-160},
+    {0x1.3a32439fc3bd6p1, 0x1.cb129a73ee882p-54, 0x1.afa975da24275p-109,
+     -0x1.8e3f652e8207p-164},
+    {-0x1.b78c0788538d4p4, 0x1.29a73ee88235fp-50, 0x1.4baed1213a672p-104,
+     -0x1.fb29741037d8dp-159},
+    {0x1.fc3bd63962535p5, -0x1.822efb9415a29p-51, 0x1.a24274ce38136p-105,
+     -0x1.741037d8cdc54p-159},
+    {-0x1.4e34ed658c117p2, -0x1.f7282b4512edfp-52, 0x1.d338e04d68bfp-107,
+     -0x1.bec66e29c67cbp-162},
+    {0x1.62534e7dd1047p5, -0x1.0568a25dbd8b3p-49, -0x1.c7eca5d040df6p-105,
+     -0x1.9b8a719f2b318p-160},
+    {-0x1.63045df7282b4p4, -0x1.44bb7b16638fep-50, 0x1.ad17df904e647p-104,
+     0x1.639835339f49dp-158},
+    {0x1.d1046bea5d769p5, -0x1.bd8b31c7eca5dp-49, -0x1.037d8cdc538dp-107,
+     0x1.a99cfa4e422fcp-161},
+    {0x1.afa975da24275p3, -0x1.8e3f652e8207p-52, 0x1.3991d63983534p-106,
+     -0x1.82d8dee81d108p-160},
+    {-0x1.a28976f62cc72p5, 0x1.35a2fbf209cc9p-53, -0x1.4e33e566305b2p-109,
+     0x1.08bf177bf2507p-163},
+    {-0x1.76f62cc71fb29p5, -0x1.d040df633714ep-49, -0x1.9f2b3182d8defp-104,
+     0x1.f8bbdf9283b2p-158},
+    {0x1.d338e04d68bfp5, -0x1.bec66e29c67cbp-50, 0x1.9cfa4e422fc5ep-105,
+     -0x1.036be27003b4p-161},
+    {0x1.c09ad17df904ep4, 0x1.91d639835339fp-50, 0x1.272117e2ef7e5p-104,
+     -0x1.7c4e007680022p-158},
+    {0x1.68befc827323bp5, -0x1.c67cacc60b638p-50, 0x1.17e2ef7e4a0ecp-104,
+     0x1.ff897ffde0598p-158},
+    {-0x1.037d8cdc538dp5, 0x1.a99cfa4e422fcp-49, 0x1.77bf250763ff1p-103,
+     0x1.7ffde05980fefp-158},
+    {-0x1.8cdc538cf9599p5, 0x1.f49c845f8bbep-50, -0x1.b5f13801da001p-104,
+     0x1.e05980fef2f12p-158},
+    {-0x1.4e33e566305b2p3, 0x1.08bf177bf2507p-51, 0x1.8ffc4bffef02dp-105,
+     -0x1.fc04343b9d298p-160},
+    {-0x1.f2b3182d8dee8p4, -0x1.d1081b5f13802p-52, 0x1.2fffbc0b301fep-107,
+     -0x1.a1dce94beb25cp-163},
+    {-0x1.8c16c6f740e88p5, -0x1.036be27003b4p-49, -0x1.0fd33f8086877p-109,
+     -0x1.d297d64b824b2p-164},
+    {0x1.3908bf177bf25p5, 0x1.d8ffc4bffef03p-53, -0x1.9fc04343b9d29p-108,
+     -0x1.f592e092c9813p-162},
+    {0x1.7e2ef7e4a0ec8p4, -0x1.da00087e99fcp-56, -0x1.0d0ee74a5f593p-110,
+     0x1.f6d367ecf27cbp-166},
+    {-0x1.081b5f13801dap4, -0x1.0fd33f8086877p-61, -0x1.d297d64b824b2p-116,
+     -0x1.8130d834f648bp-170},
+    {-0x1.af89c00ed0004p5, -0x1.fa67f010d0ee7p-50, -0x1.297d64b824b26p-104,
+     -0x1.30d834f648b0cp-162},
+    {-0x1.c00ed00043f4dp5, 0x1.fde5e2316b415p-55, -0x1.2e092c98130d8p-110,
+     -0x1.a7b24585ce...
[truncated]

@lntue
Copy link
Contributor Author

lntue commented Jun 17, 2024

Performance tests with the CORE-MATH project for FMA version:

  • For inputs in [-pi, pi]
    Reciprocal Throughput
CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 38.193 + 0.560 clc/call; Median-Min = 0.364 clc/call; Max = 40.305 clc/call;
--- System LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 24.891 + 1.007 clc/call; Median-Min = 0.888 clc/call; Max = 26.638 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 22.702 + 0.400 clc/call; Median-Min = 0.270 clc/call; Max = 25.537 clc/call;

Latency:

CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH latency ---
[####################] 100 %
Ntrial = 20 ; Min = 73.839 + 0.949 clc/call; Median-Min = 0.807 clc/call; Max = 75.513 clc/call;
--- System LIBC latency ---
[####################] 100 %
Ntrial = 20 ; Min = 48.495 + 2.750 clc/call; Median-Min = 2.026 clc/call; Max = 53.403 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 58.582 + 0.534 clc/call; Median-Min = 0.537 clc/call; Max = 59.231 clc/call;
  • For inputs ~ 2^100
    Reciprocal Throughput
CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 78.974 + 0.237 clc/call; Median-Min = 0.214 clc/call; Max = 81.168 clc/call;
--- System LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 174.806 + 0.875 clc/call; Median-Min = 0.886 clc/call; Max = 176.161 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 37.598 + 0.094 clc/call; Median-Min = 0.081 clc/call; Max = 38.052 clc/call;

Latency

CORE_MATH_PERF_MODE (perf or rdtsc) environment variable is not set. The default is rdtsc.
LIBC-location: /usr/local/google/home/lntue/experiment/llvm/llvm-project/build/projects/libc/lib/libllvmlibc.a
GNU libc version: 2.37
GNU libc release: stable
--- CORE-MATH latency ---
[####################] 100 %
Ntrial = 20 ; Min = 114.356 + 0.779 clc/call; Median-Min = 0.819 clc/call; Max = 115.353 clc/call;
--- System LIBC latency ---
[####################] 100 %
Ntrial = 20 ; Min = 187.295 + 0.258 clc/call; Median-Min = 0.250 clc/call; Max = 189.725 clc/call;
--- LIBC reciprocal throughput ---
[####################] 100 %
Ntrial = 20 ; Min = 85.237 + 0.507 clc/call; Median-Min = 0.521 clc/call; Max = 85.883 clc/call

@lntue lntue requested a review from petrhosek June 17, 2024 05:20
@zimmermann6
Copy link

it seems the inexact exception is not handled:

zimmerma@croissant:~/svn/core-math$ CORE_MATH_CHECK_STD=true LIBM=$L ./check.sh sin
Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.86fbbc688f0e9p-24 (y=0x1.86fbbc688f0ep-24)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.fe945340d1525p-22 (y=0x1.fe945340d13d3p-22)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

@zimmermann6
Copy link

zimmermann6 commented Jun 17, 2024

On a Intel(R) Xeon(R) Silver 4214 I get:

zimmerma@croissant:~/svn/core-math$ LIBM=$L CORE_MATH_PERF_MODE=rdtsc ./perf.sh sin
GNU libc version: 2.38
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 61.961 + 0.506 clc/call; Median-Min = 0.548 clc/call; Max = 62.771 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 32.978 + 0.803 clc/call; Median-Min = 0.627 clc/call; Max = 35.687 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 30.249 + 0.946 clc/call; Median-Min = 0.809 clc/call; Max = 32.471 clc/call;
zimmerma@croissant:~/svn/core-math$ LIBM=$L CORE_MATH_PERF_MODE=rdtsc PERF_ARGS=--latency ./perf.sh sin
GNU libc version: 2.38
GNU libc release: stable
[####################] 100 %
Ntrial = 20 ; Min = 88.747 + 0.985 clc/call; Median-Min = 0.854 clc/call; Max = 90.962 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 58.363 + 0.368 clc/call; Median-Min = 0.270 clc/call; Max = 59.351 clc/call;
[####################] 100 %
Ntrial = 20 ; Min = 62.464 + 0.795 clc/call; Median-Min = 0.687 clc/call; Max = 64.072 clc/call;

@jhuber6
Copy link
Contributor

jhuber6 commented Jun 17, 2024

How does this perform on targets without 128-bit float support? Guessing those skip the final accurate pass?

@lntue
Copy link
Contributor Author

lntue commented Jun 17, 2024

How does this perform on targets without 128-bit float support? Guessing those skip the final accurate pass?

The 128-bit precision floating point type that we use for accurate pass is not quad precision, and it is emulated using uint64_t internally: https://github.com/llvm/llvm-project/blob/main/libc/src/__support/FPUtil/dyadic_float.h#L33

@lntue
Copy link
Contributor Author

lntue commented Jun 17, 2024

it seems the inexact exception is not handled:

zimmerma@croissant:~/svn/core-math$ CORE_MATH_CHECK_STD=true LIBM=$L ./check.sh sin
Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.86fbbc688f0e9p-24 (y=0x1.86fbbc688f0ep-24)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.fe945340d1525p-22 (y=0x1.fe945340d13d3p-22)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

@zimmermann6 : Do you mind sync to the latest update and check again? It was caused by some unrelated changes. Thanks!

@zimmermann6
Copy link

unless I did something wrong, I still get missing inexact exceptions:

Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.de85173083242p-20 (y=0x1.de851730820d8p-20)
Missing inexact exception for x=0x1.8d06555815a2cp+9 (y=0x1.65b10b8443ecep-1)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

@lntue
Copy link
Contributor Author

lntue commented Jun 18, 2024

unless I did something wrong, I still get missing inexact exceptions:

Running worst cases check in --rndn mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Missing inexact exception for x=0x1.de85173083242p-20 (y=0x1.de851730820d8p-20)
Missing inexact exception for x=0x1.8d06555815a2cp+9 (y=0x1.65b10b8443ecep-1)
Running worst cases check in --rndz mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)
Running worst cases check in --rndu mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88fp-1)
Running worst cases check in --rndd mode...
Missing inexact exception for x=0x1.005023d32fee5p+1 (y=0x1.d109ad145c88ep-1)

Can you apply #95845 to your repo and rerun the cmake + ninja commands?
Thanks,

@zimmermann6
Copy link

thanks to the help of Tue, now all tests do pass (including the check of the inexact exception)

Copy link
Member

@nickdesaulniers nickdesaulniers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not finished reviewing, but I'm in and out of meetings all day. Posting what I have for now.

libc/src/math/generic/range_reduction_double.h Outdated Show resolved Hide resolved
libc/src/math/generic/range_reduction_double.h Outdated Show resolved Hide resolved
libc/src/math/generic/range_reduction_double.h Outdated Show resolved Hide resolved
libc/src/math/generic/range_reduction_double_fma.h Outdated Show resolved Hide resolved
libc/src/math/generic/range_reduction_double_fma.h Outdated Show resolved Hide resolved
libc/src/math/generic/range_reduction_double_fma.h Outdated Show resolved Hide resolved
libc/src/math/generic/range_reduction_double_fma.h Outdated Show resolved Hide resolved
libc/src/math/generic/sin.cpp Outdated Show resolved Hide resolved
libc/src/math/generic/sin.cpp Outdated Show resolved Hide resolved
libc/src/math/generic/sin.cpp Show resolved Hide resolved
libc/src/math/generic/range_reduction_double.h Outdated Show resolved Hide resolved
libc/src/math/generic/range_reduction_double.h Outdated Show resolved Hide resolved
libc/src/math/x86_64/sin.cpp Show resolved Hide resolved
libc/test/src/math/sin_test.cpp Outdated Show resolved Hide resolved
@lntue
Copy link
Contributor Author

lntue commented Jun 21, 2024

Thanks @zimmermann6 for letting me know about the recent results at: https://inria.hal.science/hal-04480440.

With the insight from that paper, I was able to make non-FMA targets also correctly rounded to all rounding modes.

So I rearrange the range reduction headers, with range_reduction_double_fma.h and range_reduction_double_nofma.h contain their own specialized look-up tables, and fast small range reduction, while range_reduction_double_common.h contains double-double large range reduction and 128-bit precision small and large range reductions.

@zimmermann6
Copy link

I still have a question. Since the full set of hard-to-round cases for sin is not known, in CORE-MATH we issue an error message when we are not able to guarantee correct rounding. How do you deal with this issue?

@lntue
Copy link
Contributor Author

lntue commented Jun 21, 2024

I still have a question. Since the full set of hard-to-round cases for sin is not known, in CORE-MATH we issue an error message when we are not able to guarantee correct rounding. How do you deal with this issue?

@zimmermann6 : right now we will probably rely on fuzz testing for it, comparing against MPFR outputs.

@nickdesaulniers : shall we add accuracy assertion under #ifndef NDEBUG guard?

@nickdesaulniers
Copy link
Member

@nickdesaulniers : shall we add accuracy assertion under #ifndef NDEBUG guard?

Yes, I'm a big fan of -DLLVM_ENABLE_ASSERTIONS=ON. We also have LIBC_ASSERT (src/__support/libc_assert.h). The intention of -DLLVM_ENABLE_ASSERTIONS=ON is to trade some performance overhead for additional safety checks. Assertions builds are generally much faster than -DCMAKE_BUILD_TYPE=Debug builds of the rest of LLVM in my experience, so I usually develop with assertions enabled but not a full blow debug build (which takes longer to compiler and link).

@lntue
Copy link
Contributor Author

lntue commented Jun 24, 2024

@nickdesaulniers : shall we add accuracy assertion under #ifndef NDEBUG guard?

Yes, I'm a big fan of -DLLVM_ENABLE_ASSERTIONS=ON. We also have LIBC_ASSERT (src/__support/libc_assert.h). The intention of -DLLVM_ENABLE_ASSERTIONS=ON is to trade some performance overhead for additional safety checks. Assertions builds are generally much faster than -DCMAKE_BUILD_TYPE=Debug builds of the rest of LLVM in my experience, so I usually develop with assertions enabled but not a full blow debug build (which takes longer to compiler and link).

I created an issue (#96452) and added a TODO to provide an exact error bound for the accurate pass and add assertion.

@@ -36,7 +36,8 @@ TEST_F(LlvmLibcSinTest, TrickyInputs) {
0x1.e639103a05997p+2, 0x1.13114266f9764p+4, -0x1.3eec5912ea7cdp+331,
0x1.08087e9aad90bp+887, 0x1.2b5fe88a9d8d5p+903, -0x1.a880417b7b119p+1023,
-0x1.6deb37da81129p+205, 0x1.08087e9aad90bp+887, 0x1.f6d7518808571p+1023,
-0x1.8bb5847d49973p+845, 0x1.f08b14e1c4d0fp+890,
-0x1.8bb5847d49973p+845, 0x1.f08b14e1c4d0fp+890, 0x1.6ac5b262ca1ffp+849,
0x1.e0000000001c2p-20,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with the two additional constants?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two extra worst cases that popped up with the CORE-MATH test suites when I was playing around with the error bounds for fast passes for non-FMA targets.

@lntue lntue merged commit 16903ac into llvm:main Jun 24, 2024
7 checks passed
@lntue lntue deleted the sin branch June 24, 2024 21:57
lntue added a commit that referenced this pull request Jun 25, 2024
…rounding modes. (#96591)

Sharing the same algorithm as double precision sin:
#95736
lntue added a commit that referenced this pull request Jun 27, 2024
…ll rounding modes. (#96719)

Sharing the same algorithm as double precision sin:
#95736 and cos:
#96591
bherrera pushed a commit to misttech/mist-os that referenced this pull request Jul 1, 2024
…recision sincos correctly rounded to all rounding modes. (#96719)

Sharing the same algorithm as double precision sin:
llvm/llvm-project#95736 and cos:
llvm/llvm-project#96591

GitOrigin-RevId: 8eda5e8754fda4422617d069315bd0ce34d99b0d
Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993
Roller-URL: https://ci.chromium.org/b/8743956522971831761
CQ-Do-Not-Cancel-Tryjobs: true
Change-Id: Ic8e99f78feabb9deb3ac49fc04608cfadfc16a66
Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1073452
bherrera pushed a commit to misttech/integration that referenced this pull request Jul 1, 2024
… sincos correctly rounded to all rounding modes. (#96719)

Sharing the same algorithm as double precision sin:
llvm/llvm-project#95736 and cos:
llvm/llvm-project#96591

GitOrigin-RevId: 8eda5e8754fda4422617d069315bd0ce34d99b0d
Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993
Change-Id: Ie2bd882ad9952a0969feb84edcf1daffab6901b4
bherrera pushed a commit to misttech/integration that referenced this pull request Jul 1, 2024
…] Implement double precision sincos correctly rounded to all rounding modes. (#96719)

Sharing the same algorithm as double precision sin:
llvm/llvm-project#95736 and cos:
llvm/llvm-project#96591

GitOrigin-RevId: 6e644e80df2ddd3b6292ca7f77c48922a7848a1c
Original-Revision: 4e4be54614351dfbf2503dd5369b0837e3f56993
Roller-URL: https://ci.chromium.org/b/8743956522971831761
CQ-Do-Not-Cancel-Tryjobs: true
Original-Reviewed-on: https://fuchsia-review.googlesource.com/c/fuchsia/+/1073452
Original-Revision: 660fa89a50c932fbfcacafdc59eab859631e4390
Change-Id: Iba8f96b2ab385de0dc2388226870f9c5af969b2b
lravenclaw pushed a commit to lravenclaw/llvm-project that referenced this pull request Jul 3, 2024
…ll rounding modes. (llvm#96719)

Sharing the same algorithm as double precision sin:
llvm#95736 and cos:
llvm#96591
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…rounding modes. (llvm#95736)

- Algorithm:
- Step 1 - Range reduction: for a double precision input `x`, return `k`
and `u` such that
    - k is an integer
    - u = x - k * pi / 128, and |u| < pi/256
- Step 2 - Calculate `sin(u)` and `cos(u)` in double-double using Taylor
polynomials with errors < 2^-70 with FMA or < 2^-66 w/o FMA.
- Step 3 - Calculate `sin(x) = sin(k*pi/128) * cos(u) + cos(k*pi/128) *
sin(u)` using look-up table for `sin(k*pi/128)` and `cos(k*pi/128)`.
- Step 4 - Use Ziv's rounding test to decide if the result is correctly
rounded.
- Step 4' - If the Ziv's rounding test failed, redo step 1-3 using
128-bit precision.
- Currently, without FMA instructions, the large range reduction only
works correctly for the default rounding mode (FE_TONEAREST).
- Provide `LIBC_MATH` flag so that users can set `LIBC_MATH =
LIBC_MATH_SKIP_ACCURATE_PASS` to build the `sin` function without step 4
and 4'.
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…rounding modes. (llvm#96591)

Sharing the same algorithm as double precision sin:
llvm#95736
AlexisPerry pushed a commit to llvm-project-tlp/llvm-project that referenced this pull request Jul 9, 2024
…ll rounding modes. (llvm#96719)

Sharing the same algorithm as double precision sin:
llvm#95736 and cos:
llvm#96591
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants