fast_rsqrt for f32 and f64 types #13718

jacob-hegna · 2014-04-24T00:49:40Z

This function is aimed at applications that need to be as fast as possible: it computes an approximate inverse square root faster than the current rsqrt function. Both functions are needed, since they serve different purposes.

brson · 2014-04-24T01:17:36Z

Is there precedent for fast inverse square root in standard libraries? I'm only familiar with its use in Quake.

jacob-hegna · 2014-04-24T01:21:12Z

It is used quite frequently for normalizing vectors; it just so happens that a lot of quick vector normalization occurs in lighting for game engines. There also isn't a ton of precedent for inverse square roots in standard libraries in general (from what I have gathered, at least). The fact that Rust already has rsqrt probably means a fast version isn't much of a stretch, considering they each have their separate use cases.
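For readers unfamiliar with the technique being discussed, here is a minimal sketch in present-day Rust of the Quake-style approximation applied to vector normalization. It is not the code from this PR: the names `fast_rsqrt_f32` and `normalize` are hypothetical, and `0x5f3759df` is the commonly cited single-precision constant, assumed here rather than taken from the patch.

```rust
/// Approximate 1/sqrt(x) for positive, finite, normal `x`.
/// Uses the bit-level initial guess popularized by Quake III Arena,
/// refined with a single Newton-Raphson step.
fn fast_rsqrt_f32(x: f32) -> f32 {
    let half_x = 0.5 * x;
    // Reinterpret the float's bits as an integer, halve them, and
    // subtract from the magic constant to get a rough first guess.
    let i = 0x5f37_59df_u32.wrapping_sub(x.to_bits() >> 1);
    let y = f32::from_bits(i);
    // One Newton-Raphson iteration: y <- y * (1.5 - 0.5 * x * y^2).
    y * (1.5 - half_x * y * y)
}

/// Scale a 3-component vector to (approximately) unit length.
fn normalize(v: [f32; 3]) -> [f32; 3] {
    let len_sq = v[0] * v[0] + v[1] * v[1] + v[2] * v[2];
    let inv_len = fast_rsqrt_f32(len_sq);
    [v[0] * inv_len, v[1] * inv_len, v[2] * inv_len]
}
```

With one refinement step the result is typically within a fraction of a percent of the exact value, which is why the approach suits lighting code but not general-purpose math.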

sfackler · 2014-04-24T03:14:14Z

Is this the right implementation? The Quake fast inverse square root algorithm is both an order of magnitude slower and an order of magnitude less accurate than the rsqrtss SSE instruction: http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/

jacob-hegna · 2014-04-24T03:23:45Z

It should be faster than the current Rust rsqrt, but I would not be surprised if there is an x86 assembly instruction that does it faster. However, if we assume that Rust will be used on non-x86 platforms (e.g. ARM), then using x86-specific code isn't really the optimal solution. I could be totally wrong though, and if rsqrtss is platform-independent then I would be all for just adopting it in the regular rsqrt function (because if it is exact then there is no reason to have a separate function for it).
Edit: also, as pointed out by @killmous, rsqrtss only works on x86 processors that support SSE (introduced in 1999). Obviously that only excludes a small portion of users, but I'm sure there are some people with old hardware out there.
Also, according to your link, "both the MSVC and GCC compilers default to exclusively using the x87 for scalar math, so unless you edit the 'code generation' project properties... you’ll be stuck with code that uses the old slow way."

Aatch · 2014-04-24T04:53:52Z

I'm against merging this in as it currently stands.

Given the massive performance increase of using rsqrtss over Carmack's algorithm, throwing it out because it 1) doesn't work on 15-year-old hardware and 2) only works on x86 is silly. We have conditional compilation to manage the CPU platform differences, and using something like CPUID to check for SSE seems like a faster option, even with the branch. Or are you planning on supporting the 21-year-old computer market as well? It seems like adding a second function for performance reasons, then ignoring an order-of-magnitude faster possibility, defeats the point.
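The combination described here, conditional compilation per architecture plus a runtime SSE check, could look roughly like the following in present-day Rust. This is a sketch under assumptions, not code proposed in the thread: `rsqrt_approx` and `rsqrt_portable` are hypothetical names, the standard `is_x86_feature_detected!` macro stands in for a hand-written CPUID query, and the fallback simply computes the exact value.

```rust
/// Portable fallback: exact reciprocal square root via the standard library.
fn rsqrt_portable(x: f32) -> f32 {
    1.0 / x.sqrt()
}

/// x86 / x86_64: use the rsqrtss instruction when SSE is present.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn rsqrt_approx(x: f32) -> f32 {
    #[cfg(target_arch = "x86")]
    use std::arch::x86::{_mm_cvtss_f32, _mm_rsqrt_ss, _mm_set_ss};
    #[cfg(target_arch = "x86_64")]
    use std::arch::x86_64::{_mm_cvtss_f32, _mm_rsqrt_ss, _mm_set_ss};

    // Runtime feature check (CPUID-based under the hood).
    if is_x86_feature_detected!("sse") {
        // SAFETY: the check above guarantees SSE is available.
        unsafe { _mm_cvtss_f32(_mm_rsqrt_ss(_mm_set_ss(x))) }
    } else {
        rsqrt_portable(x)
    }
}

/// Every other architecture: fall back to the portable version.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn rsqrt_approx(x: f32) -> f32 {
    rsqrt_portable(x)
}
```

Note that rsqrtss is itself an approximation (roughly 12 bits of precision), so callers that need an exact result would still want the portable path.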

This shouldn't be part of the Float trait. The very concept of a fast rsqrt function suggests that it isn't widely applicable and indeed doesn't make a lick of sense for a hypothetical arbitrary precision floating point type.

The implementation of fast_rsqrt for f64 is unacceptable. Losing half the precision is not a valid solution. Either it shouldn't be available at all or it should be implemented in a way that doesn't produce such a massive amount of error. A brief search finds that the double-precision version is identical except for using i64 and 0x5fe6eb50c7b537a9 as the magic number.
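As an illustration of that double-precision variant, a sketch in present-day Rust might look like this. The function name `fast_rsqrt_f64` is hypothetical, and the sketch uses `u64` with `to_bits`/`from_bits` instead of a transmute to `i64`, which gives the same bit pattern for positive inputs.

```rust
/// Approximate 1/sqrt(x) for positive, finite, normal `x` at f64 width.
/// Same structure as the single-precision trick, using a 64-bit integer
/// and the magic number quoted in the comment above.
fn fast_rsqrt_f64(x: f64) -> f64 {
    let half_x = 0.5 * x;
    // Initial guess from the halved bit pattern and the 64-bit constant.
    let i = 0x5fe6_eb50_c7b5_37a9_u64.wrapping_sub(x.to_bits() >> 1);
    let y = f64::from_bits(i);
    // One Newton-Raphson iteration: y <- y * (1.5 - 0.5 * x * y^2).
    y * (1.5 - half_x * y * y)
}
```

A single refinement step still leaves the result far short of full f64 precision; further Newton-Raphson iterations would be needed to narrow the error.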

There are no benchmarks. Irrespective of platform capabilities, anything that touts performance needs benchmarks.

thestinger · 2014-04-24T05:07:40Z

> This shouldn't be part of the Float trait. The very concept of a fast rsqrt function suggests that it isn't widely applicable and indeed doesn't make a lick of sense for a hypothetical arbitrary precision floating point type.

The Float trait is no longer going to work for arbitrary precision floating point because #13597 switched it to taking all the parameters by-value. I wasn't very enthusiastic about that and to me it indicates a major language flaw - but I didn't voice much concern because it wasn't consistent in the usage of by-reference parameters.

Aatch · 2014-04-24T05:08:09Z

Actually, thinking on this further, I am completely against using the Quake III algorithm at all, at least on x86. The Steam Hardware & Software Survey does not even have a line for whether or not SSE exists on the person's machine, and 99.95% of computers have SSE2. In other words, fewer than 1 in 2000 machines that Steam collects data from will crash on the rsqrtss instruction. For some context, there are more people reporting dial-up than reporting a lack of SSE2.

jacob-hegna · 2014-04-24T05:09:48Z

That's fair. I'll drop the Quake algorithm and modify the current rsqrt to use rsqrtss. Sorry about the original implementation.
Should this be in a new PR?

Aatch · 2014-04-24T05:22:50Z

@jacob-hegna I'd hold off on doing anything right now until the question of "is this something we want at all" is answered.

brson · 2014-04-25T19:00:03Z

Thank you for the contribution but I'm going to close this PR without merging for the following reasons: there seems to be a lack of precedent for providing this function in a general purpose stdlib; I believe the demand for this function is minimal.

Every function that goes into std is a burden that must be carried by all Rust software forever, so if there's a reasonable case for not including it, then we often should not.

Again, thanks.

jacob-hegna · 2014-04-25T19:53:07Z

No problem, just trying to help out however I can.

jacob-hegna added 4 commits April 23, 2014 19:46

Added a fast_rsqrt function to compute, well, fast inverse square roo…

55cf3bc

…ts. It uses the Newton's method approach originally seen in Quake Arena

Made lines in f32.rs and f64.rs shorter than 100 chars

9d1b8a0

eliminated the std:: prepended to transmute calls that originated fro…

fe6f663

…m testing the function

Added fast_rsqrt to Float's traits

e4e847a

removed dead assignment

1ef5302

brson closed this Apr 25, 2014
