Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fast_rsqrt for f32 and f64 types #13718

Closed
wants to merge 5 commits into from

Conversation

jacob-hegna
Copy link
Contributor

Useful for applications that need to be as fast as possible, this function allows for faster inverse square root computation than the current rsqrt function. Both are necessary, and serve different purposes.

@brson
Copy link
Contributor

brson commented Apr 24, 2014

Is there precedent for fast inverse square root in standard libraries? I'm only familiar with it's use in Quake.

@jacob-hegna
Copy link
Contributor Author

It is used in normalizing vectors quite frequently, it just so happens that a lot of quick vector normalization occurs in lighting for game engines. There also isn't a ton of precedent for inverse square roots in standard libraries in general (from what I have gathered at least), the fast that rust already has rsqrt probably means a fast version isn't much of a stretch considering they each have their separate use cases

@sfackler
Copy link
Member

Is this the right implementation? The Quake fast inverse square root algorithm is both an order of magnitude slower and an order of magnitude less accurate than the rsqrtss SSE instruction: http://assemblyrequired.crashworks.org/2009/10/16/timing-square-root/

@jacob-hegna
Copy link
Contributor Author

It should be faster than the current rust rsqrt, but I would not be surprised if there is an x86 assembly instruction that does it faster. However, if we assume that rust will be used on non x86 platforms (ie ARM) then using x86 specific code isn't really the optimal solution. I could be totally wrong though, and if rsqrtss is platform independent then I would be all for just adopting that in the regular rsqrt function (because if it is exact then there is no reason to have a separate function for it).
Edit: also, as pointed out by @killmous rsqrtss only works with x86 processors with the SSE chip (introduced in 1999). Obviously that only excludes a small portion of users, but I'm sure there are some people with old hardware out there.
Also, according to your link, "both the MSVC and GCC compilers default to exclusively using the x87 for scalar math, so unless you edit the 'code generation' project properties... you’ll be stuck with code that uses the old slow way."

@Aatch
Copy link
Contributor

Aatch commented Apr 24, 2014

I'm against merging this in as it currently stands.

Given the massive performance increase of using rsqrtss over Carmack's algorithm, throwing it out because it 1) doesn't work on 15-year-old hardware and 2) only works on x86 is silly. We have conditional compilation to manage the cpu platform differences and using something like CPUID to check for SSE seems like a faster option, even with the branch. Or are you planning on supporting the 21-year old computer market as well? It seems like adding a second function for performance reasons, then ignoring an order-of-magnitude faster possibility defeats the point.

This shouldn't be part of the Float trait. The very concept of a fast rsqrt function suggests that it isn't widely applicable and indeed doesn't make a lick of sense for a hypothetical arbitrary precision floating point type.

The implementation of fast_rsqrt for f64 is unacceptable. Loosing half the precision is not a valid solution. Either it shouldn't be available at all or should be implemented in a way that doesn't produce such a massive amount of error. A brief search finds that the double-precision version is identical except for using i64 and 0x5fe6eb50c7b537a9 as the magic number.

There are no benchmarks. Irrelevant of platform capabilities, anything that touts performance needs benchmarks.

@thestinger
Copy link
Contributor

This shouldn't be part of the Float trait. The very concept of a fast rsqrt function suggests that it isn't widely applicable and indeed doesn't make a lick of sense for a hypothetical arbitrary precision floating point type.

The Float trait is no longer going to work for arbitrary precision floating point because #13597 switched it to taking all the parameters by-value. I wasn't very enthusiastic about that and to me it indicates a major language flaw - but I didn't voice much concern because it wasn't consistent in the usage of by-reference parameters.

@Aatch
Copy link
Contributor

Aatch commented Apr 24, 2014

Actually, thinking on this further, I am completely against using the Quake III algorithm at all, at least on x86. The Steam Hardware & Software Survey does not even have a line for whether or not SSE exists on the persons machine and 99.95% of computers have SSE2. In other words, less than 1 in 2000 machines that steam collects data from will crash on the rsqrtss instruction. For some context, there are more people reporting dial up than reporting a lack of SSE2.

@jacob-hegna
Copy link
Contributor Author

That's fair - I'll drop the Quake algorithm and modify the current rsqrt to use rsqrtss, sorry about the original implementation.
Should this be in a new PR?

@Aatch
Copy link
Contributor

Aatch commented Apr 24, 2014

@jacob-hegna I'd hold off on doing anything right now until the question of "is this something we want at all" is answered.

@brson
Copy link
Contributor

brson commented Apr 25, 2014

Thank you for the contribution but I'm going to close this PR without merging for the following reasons: there seems to be a lack of precedent for providing this function in a general purpose stdlib; I believe the demand for this function is minimal.

Every function that goes into std is a burden that must be carried by all Rust software forever, so if there's a reasonable case for not including it, then we often should not.

Again, thanks.

@brson brson closed this Apr 25, 2014
@jacob-hegna
Copy link
Contributor Author

No problem, just trying to help out however I can

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants