-
Notifications
You must be signed in to change notification settings - Fork 310
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve float to int truncation precision #835
Conversation
c58eb96
to
8bde04d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this. Nice work.
Hi Dmitry, Phil and I have reviewed your change together. Improving the converter code is definitely something we want to do. We have the following critical questions which need to be covered first:
Then there are lesser issues that will need to be discussed/addressed prior merge:
|
note so I don't forget: any change here may affect the relevance of |
8bde04d
to
fed6f7e
Compare
Hi Ross! Thank you for your comments.
Yes it is indeed possible by setting
Windows platform but PA GitHub actions also compile for Ubuntu and OSX. I did not implement PA tests for assembler inserts for SSE and ARMv8 but that code is taken from my project with >10 years of operation including ARMv8 on Linux and iOS. It makes sense to make tests if we agree on
I am able to compile for Windows XP with MinGW but do not have older MSVC than VS2019. MSDN has this API available for vs140 tools which if I am not mistaken can be used to target Windows XP builds. If anybody reports incompatibility I could add workaround in the future.
Must be fairly portable as that intrinsic is available on all compilers.
I updated implementation to limit to GCC and Clang but utilized platforms flags were set by GCC or Clang only, so it was fine already.
Yes, adjusted. To preserve backwards compatibility with older PA versions I probably need to change implementation to |
…rea by forcing floating-point rounding mode to rounding to nearest.
fed6f7e
to
bc52043
Compare
After double checking PA implementation and specifically Therefore, I updated description and implementation to set rounding to nearest mode and removed all SIMD-related code which was doing rounding towards zero. That simplified PR and made it similar to
@RossBencina, @philburk would you please check implementation again as it looks quite straight forward now. |
I tried this out with C++ on a Mac:
and got:
|
Additionally, I thought the goal was to round to nearest, not to truncate. Also, |
So it seems that fesetround() does not affect casting. Maybe we should be calling nearbyint() instead. |
We're bumping this for a future release. It's important to fix but it's not ready. |
The MS docs are vague on explicit casts, but I presume that they are saying that that fesetround doesn't affect float-to-int casts. In any case we have Phil's test results. https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fegetround-fesetround2?view=msvc-170 |
According to Phil's results setting
Yes, those functions had more work but the code changed during our discussion. I propose to leave just C-style casts as it was before, i.e. I will undo
|
Remove inline keyword.
I made changes, now implementation is limited to only setting/resetting the rounding mode around conversion code. |
@RossBencina I executed Phil's code on MSVC and got the same result: fegetround = 0
((int32_t) 0.99) = 0
(nearbyint(0.99) = 1
FE_TONEAREST = 0
((int32_t) 0.99) = 0
(nearbyint(0.99) = 1
FE_DOWNWARD = 256
((int32_t) 0.99) = 0
(nearbyint(0.99) = 0 Therefore the proposed PR is indeed useful to enforce |
I further debugged statement in Microsoft's documentation (https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/fegetround-fesetround2?view=msvc-170) and Floating-point to integer implicit casts and conversions, which always round towards zero.
volatile float fv = 0.99;
// Compare casts
std::cout << "fegetround = " << fegetround() << std::endl;
volatile int32_t v1 = fv;
std::cout << "((int32_t) 0.99) = " << v1 << std::endl;
std::cout << "(nearbyint(0.99) = " << nearbyint(0.99) << std::endl;
fesetround(FE_TONEAREST);
std::cout << "FE_TONEAREST = " << FE_TONEAREST << std::endl;
volatile int32_t v2 = fv;
std::cout << "((int32_t) 0.99) = " << v2 << std::endl;
std::cout << "(nearbyint(0.99) = " << nearbyint(0.99) << std::endl;
fesetround(FE_DOWNWARD);
std::cout << "FE_DOWNWARD = " << FE_DOWNWARD << std::endl;
volatile int32_t v3 = fv;
std::cout << "((int32_t) 0.99) = " << v3 << std::endl;
std::cout << "(nearbyint(0.99) = " << nearbyint(0.99) << std::endl; result: fegetround = 0
((int32_t) 0.99) = 0
(nearbyint(0.99) = 1
FE_TONEAREST = 0
((int32_t) 0.99) = 0
(nearbyint(0.99) = 1
FE_DOWNWARD = 256
((int32_t) 0.99) = 0
(nearbyint(0.99) = 0 Checking the assembly (x86-64) reveals that in all 3 cases compiler is using the same instruction 01C33DE3 movss xmm0,dword ptr [fv]
01C33DE8 cvttss2si eax,xmm0
01C33DEC mov dword ptr [v1],eax
01C33ED6 movss xmm0,dword ptr [fv]
01C33EDB cvttss2si eax,xmm0
01C33EDF mov dword ptr [v2],eax
01C33FCF movss xmm0,dword ptr [fv]
01C33FD4 cvttss2si eax,xmm0
01C33FD8 mov dword ptr [v3],eax |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am still concerned about the new function names. See unresolved comments above.
@@ -358,18 +383,21 @@ static void Float32_To_Int32_Dither( | |||
{ | |||
float *src = (float*)sourceBuffer; | |||
PaInt32 *dest = (PaInt32*)destinationBuffer; | |||
int prevMode = SetRoundingMode(); | |||
|
|||
while( count-- ) | |||
{ | |||
/* REVIEW */ | |||
double dither = PaUtil_GenerateFloatTriangularDither( ditherGenerator ); | |||
/* use smaller scaler to prevent overflow when we add the dither */ | |||
double dithered = ((double)*src * (2147483646.0)) + dither; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now that we are rounding to nearest, will this smaller scaler be small enough to prevent numeric overflow during the cast?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR does not change rounding mode because rounding to nearest is the default rounding mode when process starts and therefore de facto PA is already assuming this rounding mode. Proposed change enforces this rounding mode explicitly to avoid situation that user sets for example FE_DOWNWARD
which actually changes rounding mode and thus the result of PA converters will be different if we do not set FE_TONEAREST
explicitly.
None of this surprises me. My conclusion is that setting the rounding mode to |
Ross, you are right! My attention was somehow misled by result of While in my test I was doing implicit casts: volatile float fv = 0.99;
// Implicit cast of fv to v1
volatile int32_t v1 = fv; those tests gave the same result (truncation towards zero) with implicit and explicit casts. Therefore we can conclude that if PA converters do not use math functions which are affected by rounding mode and rely on explicit or implicit floating-point to integer casts then they are unaffected by rounding mode and do not require any explicit rounding mode to be set. I am closing this PR, feel free to reopen it for some reason. |
Improve float to int truncation precision by forcing floating-point rounding mode to rounding
towards zero or by using CPU SIMD which truncates by rounding towards zeroto nearest.This PR addresses issues discussed in #390 and PR #403 which removed
lrint
in favor of C-style cast of floating point value to integer.Currently, conversion via C-style cast works as expected, i.e. truncates floating point value to integer with rounding
towards zeroto nearest, only if CPU is set to roundingtowards zeroto nearest mode. But, if user code or compilation flags set some other rounding mode the conversion via C-style castbecomes erroneousresults in unexpected result.To circumvent it this PR proposes to set
neededrequired rounding mode viafesetround
API and reset it back to previous user-set rounding mode when conversion operation completes.Besidesfesetround
API it is also possible to use specialized CPU SIMD which truncates with rounding towards zero, for example it can be x86 SSE and SSE2, ARMv8. By using CPU SIMD it is possible to avoid callingfesetround
API completely that improves run-time performance.This PR implements
these 2 approaches:fesetround
to set required rounding mode when no CPU SIMD is present for the truncation of float and double to int for the required rounding modeUse CPU SIMD for truncation of float and double to int and in this case code relatedfesetround
is optimized away by the compiler in Release build, so no overhead happens during run-time