Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid union reinterpret #16817

Closed

Conversation

unknownbrackets
Copy link
Collaborator

It seems like the consensus is that this is undefined behavior in C++, though not newer C. Let's just avoid it, whether or not it fixes the M1 thing. It does seem like it would explain it, though. It's unfortunate, this may make debug mode a bit slower for IR/interp...

Might've missed some but I think I got most of them.

-[Unknown]

@unknownbrackets unknownbrackets force-pushed the avoid-union-reinterpret branch from 617b236 to 14bc0f9 Compare January 19, 2023 05:30
@hrydgard
Copy link
Owner

hrydgard commented Jan 19, 2023

So, somewhat disappointingly, it's not this either. Graphics are still broken in GTA on M1, at least.

I actually do think that unions should be safe enough since they're quite widely used for this purpose, although that'll probably only last to the next c++ standard or something..

So I'm a little undecided on merging this..

@iota97
Copy link
Contributor

iota97 commented Jan 19, 2023

If this doesn't fix Intel either we should probably check if we can get some consistence Windows/Intel Mac behavior with a "self made" sine and cosine function. Taylor should do just fine, but there sure are lot of implementation on the internet.

Can't really think of anything else on that PR...

@unknownbrackets
Copy link
Collaborator Author

Taylor should do just fine, but there sure are lot of implementation on the internet.

It would be best to try to match what the PSP actually generates, see #2990. I don't know if sin/cos would specifically solve that, but we already had issues that using the double version of sin/cos helped. It's important that the results line up with the PSP results.

So, somewhat disappointingly, it's not this either. Graphics are still broken in GTA on M1, at least.

Darn, it would've explained the strange NANs. That said, the screenshots showed it working on softgpu for Intel, which is (if I understand correctly) not the case for the M1. So they're different problems, apparently. Though, these unions are used in both cases anyway, I'd think...

-[Unknown]

@Kethen
Copy link
Contributor

Kethen commented Jan 20, 2023

Sadly it also does not help on intel

That said, the screenshots showed it working on softgpu for Intel

If you're referring to the screenshots I took on software rendering, each screenshot was taken with the windows version running in wine on top, native version under; the native MacOS version has the same rendering issues even on software rendering

Clarification:

On intel x86_64 hardware, it works in wine with Windows builds but not native MacOS native builds, and bisect points to 0ccc63b where the issue first appeared on MacOS x86_64 builds

With MacOS x86_64 builds, the issue is observed with and without software rendering enabled

Given how the Windows x86_64 builds are issue-free in Wine even on MacOS, softgpu has the same issue on MacOS x86_64 builds, I made a guess that the issue resides in the different compilers used

@Kethen
Copy link
Contributor

Kethen commented Jan 20, 2023

@hrydgard since bisect was done on Intel x86_64 builds, is it possible to build and test dbe6658 (issue-free on intel) and 0ccc63b (issue first observed on intel) on your M1 device to find out whether there is a separated issue originated from arm64/x86_64?

@hrydgard
Copy link
Owner

Well, when I get around to it, I'll start by reproducing on my old Intel Mac, and start taking that commit apart..

@iota97
Copy link
Contributor

iota97 commented Jan 20, 2023

It would be best to try to match what the PSP actually generates, see #2990.

Don't get me wrong, I just wanted to propose a quick test to see if like this Windows and Mac behaved the same, both broken or not doesn't really matter as it would give us some good hint on what's the problem.

Different implementation could give zero or different sign leading to NaN on some edge cases.

@unknownbrackets
Copy link
Collaborator Author

Well, if it doesn't help either, I suppose we should close this. I wish unions were definitively correct for these types of conversions.

I guess a quick check would be trying to switch back to sinf/cosf/sincosf. That would cause problems, but if it undoes the breakage it would answer questions.

-[Unknown]

@hrydgard
Copy link
Owner

Yeah, will try some stuff like that soon.

@fp64
Copy link
Contributor

fp64 commented Jan 25, 2023

It would be best to try to match what the PSP actually generates

Is there a table somewhere of known 'input->output' pairs for PSP FPU functions?
In case someone wants to try to reverse-engineer it.
Full dump would be 16GB per function, which is probably not feasible.

@fp64
Copy link
Contributor

fp64 commented Feb 2, 2023

Addendum (maybe it's offtopic here?).
Does the code from https://github.com/pspdev/pspsdk/blob/master/src/fpu/pspfpu.c match what PSP produces (I suspect not)?

So, as I understand it, vfpu_sincos actually does sinpi(0.5f*x), cospi(0.5f*) (sinpi and cospi are supposed to become part of <math.h> in C23).

I think the current implementation, i.e. (float)sin((double)x*M_PI_2), (float)cos((double)x*M_PI_2), should (for a decent library implementations of sin, cos) produce correctly rounded result on [-1;+1] except, possibly, the following values:

sin:

x Bitpattern Correctly rounded Bitpattern
8.54222264e-07f 0x35654DB5u 1.34180914e-06f 0x35B41836u
4.48000446e-06f 0x369652F2u 7.03717433e-06f 0x36EC20DEu
6.12366421e-06f 0x36CD79E0u 9.6190297e-06f 0x3721616Bu
0.000104116247f 0x38DA58FCu 0.000163545425f 0x392B7D64u
0.000111957939f 0x38EACAF6u 0.000175863112f 0x393867E5u
0.0582824349f 0x3D6EB990u 0.0914220065f 0x3DBB3B76u
0.125765383f 0x3E00C8A4u 0.196269348f 0x3E48FAD5u
0.24755609f 0x3E7D7F58u 0.37913394f 0x3EC21DD8u

cos:

x Bitpattern Correctly rounded Bitpattern
0.000269203563f 0x398D23E4u 0.99999994f 0x3F7FFFFFu
0.00051548559f 0x3A0721A7u 0.999999702f 0x3F7FFFFBu
0.00505788904f 0x3BA5BCA6u 0.99996841f 0x3F7FFDEEu
0.0276315138f 0x3CE25B7Cu 0.999058247f 0x3F7FC248u
0.75244391f 0x3F40A02Au 0.37913394f 0x3EC21DD8u
0.874234617f 0x3F5FCDD7u 0.196269348f 0x3E48FAD5u
0.941717565f 0x3F711467u 0.0914220065f 0x3DBB3B76u

with cos for x=0.75244391f being the only one actually wrong on my gcc, accurate values being:

Expression Value
12721624/2^25 3.791339397430419921875
cos(6311957/2^23*π/2) 3.79133954644203171221131811...
12721624.5/2^25 3.7913395464420318603515625
12721625/2^25 3.791339695453643798828125

@unknownbrackets
Copy link
Collaborator Author

match what PSP produces (I suspect not)?

I'm sure it doesn't. I did create a table of inputs/outputs to do some simulations against. Here's an example:

input output simulated error
0x44400009 0x3a623240 0x3a6231d4 -442368
0x4440002c 0x3b8a3ae0 0x3b8a3ac8 -786432
0x409a158a 0x3f754810 0x3f754814 16777216
0x35d36040 0x36234000 0x362603a8 2898560
0x35d360ff 0x36234000 0x3626043c 2900928

I only saved the full table as a CSV and it's quite large, but maybe I can convert it to something smaller and attach it somewhere.

As I recall, it was clear the the result was also only correct to a certain number of bits in "infinite bits" space, before normalizing the exponent. Specifically, exp 0x68 (-24) always had zeroes in the lowest 18 bits, until 0x7A (-6) which could have all bits set in the result. That's just what I had in my notes that it was looking like.

At the time I was thinking maybe it used CORDIC because of the vrot instruction, but knowing that it definitely happens in two steps internally now, I guess a taylor expansion is more likely, possibly just at a fixed precision.

-[Unknown]

@hrydgard
Copy link
Owner

hrydgard commented Feb 6, 2023

@unknownbrackets Just curious, the error column seems a little .. off? Or what does it represent?

@unknownbrackets
Copy link
Collaborator Author

unknownbrackets commented Feb 6, 2023

I don't remember, it might've just been bits of mantissa error (i.e. error as an integer adjusted to a certain exp or something.)

-[Unknown]

@fp64
Copy link
Contributor

fp64 commented Feb 6, 2023

i.e. error as an integer adjusted to a certain exp or something

It matches error==(bit_cast<float>(simulated)-bit_cast<float>(output))*pow(2.0,46.0).

Yes, seeing these 0's in LSB is interesting. It's clearly far from correctly rounded.

@fp64
Copy link
Contributor

fp64 commented Feb 6, 2023

output as float as normalized as fixed
0x3A623240 0.000862870365 0xE2324*2^-30 0x00388C9*2^-28
0x3B8A3AE0 0.00421844423 0x8A3AE*2^-27 0x011475C*2^-28
0x3F754810 0.958130836 0xF5481*2^-20 0xF548100*2^-28
0x36234000 2.43261456e-06 0xA3400*2^-38 0x000028D*2^-28
0x36234000 2.43261456e-06 0xA3400*2^-38 0x000028D*2^-28

So, at least in this table:

  1. output is always integer (sometimes odd integer) divided by 2^28.
  2. output has at most (sometimes exactly) 20 significant binary digits.

@unknownbrackets
Copy link
Collaborator Author

Here's a quick upload of the csv (compressed using zstd):
https://forums.ppsspp.org/uploads/sin.csv.zst

-[Unknown]

@fp64
Copy link
Contributor

fp64 commented Feb 6, 2023

Downloaded, thanks!

@hrydgard
Copy link
Owner

hrydgard commented Feb 7, 2023

I'm gonna go ahead and close this, as whatever the standard may say, unions are well support and convenient for this. It turned out to be something completely different in #15149 .

@hrydgard hrydgard closed this Feb 7, 2023
@unknownbrackets unknownbrackets deleted the avoid-union-reinterpret branch February 8, 2023 02:26
@unknownbrackets
Copy link
Collaborator Author

Fair enough, I'd definitely prefer to keep the unions as long as they're valid everywhere and stay valid.....

If we figure out the actual model of sin/cos, might be nice to use it (even optionally.)

I'm also now very curious if the wrong driving in #2990 is differently wrong on macOS.

-[Unknown]

@unknownbrackets unknownbrackets removed this from the v1.15.0 milestone Feb 8, 2023
@fp64 fp64 mentioned this pull request Feb 9, 2023
2 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants