Skip to content

Commit

Permalink
Merge pull request #694 from daniestevez/divide-aarch64
Browse files Browse the repository at this point in the history
volk_32fc_x2_divide_32fc: add documentation about numerical accuracy
  • Loading branch information
jdemel authored Nov 7, 2023
2 parents f2535d0 + c364110 commit 56a893c
Showing 1 changed file with 38 additions and 0 deletions.
38 changes: 38 additions & 0 deletions kernels/volk/volk_32fc_x2_divide_32fc.h
Original file line number Diff line number Diff line change
Expand Up @@ -54,6 +54,44 @@
* volk_free(input_vector);
* volk_free(out);
* \endcode
*
* \b Numerical accuracy
*
* The division of complex numbers (a + bj)/(c + dj) is calculated as
* (a*c + b*d)/(c*c + d*d) + (b*c - a*d)/(c*c + d*d) j.
*
* Since Volk is built using unsafe math optimizations
* (-fcx-limited-range -funsafe-math-optimizations with gcc), after computing
* the two numerators and the denominator in the above formula, the compiler
* can either:
*
* a) compute the two divisions;
* b) compute the inverse of the denominator, 1.0/(c*c + d*d), and then
* multiply this number by each of the numerators.
*
* Under gcc, b) is allowed by -freciprocal-math, which is included in
* -funsafe-math-optimizations.
*
* Depending on the architecture and the estimated cost of multiply and divide
* instructions, the compiler can perform a) or b). gcc performs a) under x86_64
* and armv7, and b) under aarch64.
*
* To avoid obtaining inf or nan in some architectures, care should be taken
* that c*c + d*d is not too small. In particular, if c*c + d*d < FLT_MAX, then
* the calculation of 1.0/(c*c + d*d) will yield inf.
*
* For more information about numerical accuracy of complex division, see the
* following:
*
* - https://godbolt.org/z/z9z5b9oM5 Compiler Explorer example showing how
* complex division is done in different architectures and with different
* optimization options.
*
* -
* https://github.com/gcc-mirror/gcc/blob/7e3c58bfc1d957e48faf0752758da0fed811ed58/libgcc/libgcc2.c#L2707
* gcc's implementation of the __divsc3 function, which implements C99 complex
* division semantics and handles several potential numerical problems. This
* function is used whenever -fcx-limited-range is not enabled.
*/

#ifndef INCLUDED_volk_32fc_x2_divide_32fc_u_H
Expand Down

0 comments on commit 56a893c

Please sign in to comment.