Avoid union reinterpret #16817

unknownbrackets · 2023-01-19T04:12:36Z

It seems like the consensus is that this is undefined behavior in C++, though not newer C. Let's just avoid it, whether or not it fixes the M1 thing. It does seem like it would explain it, though. It's unfortunate, this may make debug mode a bit slower for IR/interp...

Might've missed some but I think I got most of them.

-[Unknown]

hrydgard · 2023-01-19T20:37:52Z

So, somewhat disappointingly, it's not this either. Graphics are still broken in GTA on M1, at least.

I actually do think that unions should be safe enough since they're quite widely used for this purpose, although that'll probably only last to the next c++ standard or something..

So I'm a little undecided on merging this..

iota97 · 2023-01-19T22:09:29Z

If this doesn't fix Intel either we should probably check if we can get some consistence Windows/Intel Mac behavior with a "self made" sine and cosine function. Taylor should do just fine, but there sure are lot of implementation on the internet.

Can't really think of anything else on that PR...

unknownbrackets · 2023-01-19T23:51:45Z

Taylor should do just fine, but there sure are lot of implementation on the internet.

It would be best to try to match what the PSP actually generates, see #2990. I don't know if sin/cos would specifically solve that, but we already had issues that using the double version of sin/cos helped. It's important that the results line up with the PSP results.

So, somewhat disappointingly, it's not this either. Graphics are still broken in GTA on M1, at least.

Darn, it would've explained the strange NANs. That said, the screenshots showed it working on softgpu for Intel, which is (if I understand correctly) not the case for the M1. So they're different problems, apparently. Though, these unions are used in both cases anyway, I'd think...

-[Unknown]

Kethen · 2023-01-20T09:01:48Z

Sadly it also does not help on intel

That said, the screenshots showed it working on softgpu for Intel

If you're referring to the screenshots I took on software rendering, each screenshot was taken with the windows version running in wine on top, native version under; the native MacOS version has the same rendering issues even on software rendering

Clarification:

On intel x86_64 hardware, it works in wine with Windows builds but not native MacOS native builds, and bisect points to 0ccc63b where the issue first appeared on MacOS x86_64 builds

With MacOS x86_64 builds, the issue is observed with and without software rendering enabled

Given how the Windows x86_64 builds are issue-free in Wine even on MacOS, softgpu has the same issue on MacOS x86_64 builds, I made a guess that the issue resides in the different compilers used

Kethen · 2023-01-20T09:46:24Z

@hrydgard since bisect was done on Intel x86_64 builds, is it possible to build and test dbe6658 (issue-free on intel) and 0ccc63b (issue first observed on intel) on your M1 device to find out whether there is a separated issue originated from arm64/x86_64?

hrydgard · 2023-01-20T09:53:08Z

Well, when I get around to it, I'll start by reproducing on my old Intel Mac, and start taking that commit apart..

iota97 · 2023-01-20T21:57:40Z

It would be best to try to match what the PSP actually generates, see #2990.

Don't get me wrong, I just wanted to propose a quick test to see if like this Windows and Mac behaved the same, both broken or not doesn't really matter as it would give us some good hint on what's the problem.

Different implementation could give zero or different sign leading to NaN on some edge cases.

unknownbrackets · 2023-01-22T15:46:51Z

Well, if it doesn't help either, I suppose we should close this. I wish unions were definitively correct for these types of conversions.

I guess a quick check would be trying to switch back to sinf/cosf/sincosf. That would cause problems, but if it undoes the breakage it would answer questions.

-[Unknown]

hrydgard · 2023-01-22T16:30:52Z

Yeah, will try some stuff like that soon.

fp64 · 2023-01-25T14:01:21Z

It would be best to try to match what the PSP actually generates

Is there a table somewhere of known 'input->output' pairs for PSP FPU functions?
In case someone wants to try to reverse-engineer it.
Full dump would be 16GB per function, which is probably not feasible.

fp64 · 2023-02-02T14:53:34Z

Addendum (maybe it's offtopic here?).
Does the code from https://github.com/pspdev/pspsdk/blob/master/src/fpu/pspfpu.c match what PSP produces (I suspect not)?

So, as I understand it, vfpu_sincos actually does sinpi(0.5f*x), cospi(0.5f*) (sinpi and cospi are supposed to become part of <math.h> in C23).

I think the current implementation, i.e. (float)sin((double)x*M_PI_2), (float)cos((double)x*M_PI_2), should (for a decent library implementations of sin, cos) produce correctly rounded result on [-1;+1] except, possibly, the following values:

sin:

x	Bitpattern	Correctly rounded	Bitpattern
8.54222264e-07f	0x35654DB5u	1.34180914e-06f	0x35B41836u
4.48000446e-06f	0x369652F2u	7.03717433e-06f	0x36EC20DEu
6.12366421e-06f	0x36CD79E0u	9.6190297e-06f	0x3721616Bu
0.000104116247f	0x38DA58FCu	0.000163545425f	0x392B7D64u
0.000111957939f	0x38EACAF6u	0.000175863112f	0x393867E5u
0.0582824349f	0x3D6EB990u	0.0914220065f	0x3DBB3B76u
0.125765383f	0x3E00C8A4u	0.196269348f	0x3E48FAD5u
0.24755609f	0x3E7D7F58u	0.37913394f	0x3EC21DD8u

cos:

x	Bitpattern	Correctly rounded	Bitpattern
0.000269203563f	0x398D23E4u	0.99999994f	0x3F7FFFFFu
0.00051548559f	0x3A0721A7u	0.999999702f	0x3F7FFFFBu
0.00505788904f	0x3BA5BCA6u	0.99996841f	0x3F7FFDEEu
0.0276315138f	0x3CE25B7Cu	0.999058247f	0x3F7FC248u
0.75244391f	0x3F40A02Au	0.37913394f	0x3EC21DD8u
0.874234617f	0x3F5FCDD7u	0.196269348f	0x3E48FAD5u
0.941717565f	0x3F711467u	0.0914220065f	0x3DBB3B76u

with cos for x=0.75244391f being the only one actually wrong on my gcc, accurate values being:

Expression	Value
12721624/2^25	3.791339397430419921875
cos(6311957/2^23*π/2)	3.79133954644203171221131811...
12721624.5/2^25	3.7913395464420318603515625
12721625/2^25	3.791339695453643798828125

unknownbrackets · 2023-02-06T07:48:30Z

match what PSP produces (I suspect not)?

I'm sure it doesn't. I did create a table of inputs/outputs to do some simulations against. Here's an example:

input	output	simulated	error
0x44400009	0x3a623240	0x3a6231d4	-442368
0x4440002c	0x3b8a3ae0	0x3b8a3ac8	-786432
0x409a158a	0x3f754810	0x3f754814	16777216
0x35d36040	0x36234000	0x362603a8	2898560
0x35d360ff	0x36234000	0x3626043c	2900928

I only saved the full table as a CSV and it's quite large, but maybe I can convert it to something smaller and attach it somewhere.

As I recall, it was clear the the result was also only correct to a certain number of bits in "infinite bits" space, before normalizing the exponent. Specifically, exp 0x68 (-24) always had zeroes in the lowest 18 bits, until 0x7A (-6) which could have all bits set in the result. That's just what I had in my notes that it was looking like.

At the time I was thinking maybe it used CORDIC because of the vrot instruction, but knowing that it definitely happens in two steps internally now, I guess a taylor expansion is more likely, possibly just at a fixed precision.

-[Unknown]

hrydgard · 2023-02-06T08:59:27Z

@unknownbrackets Just curious, the error column seems a little .. off? Or what does it represent?

unknownbrackets · 2023-02-06T14:42:56Z

I don't remember, it might've just been bits of mantissa error (i.e. error as an integer adjusted to a certain exp or something.)

-[Unknown]

fp64 · 2023-02-06T18:21:21Z

i.e. error as an integer adjusted to a certain exp or something

It matches error==(bit_cast<float>(simulated)-bit_cast<float>(output))*pow(2.0,46.0).

Yes, seeing these 0's in LSB is interesting. It's clearly far from correctly rounded.

fp64 · 2023-02-06T19:45:29Z

output	as float	as normalized	as fixed
0x3A623240	0.000862870365	0xE2324*2^-30	0x00388C9*2^-28
0x3B8A3AE0	0.00421844423	0x8A3AE*2^-27	0x011475C*2^-28
0x3F754810	0.958130836	0xF5481*2^-20	0xF548100*2^-28
0x36234000	2.43261456e-06	0xA3400*2^-38	0x000028D*2^-28
0x36234000	2.43261456e-06	0xA3400*2^-38	0x000028D*2^-28

So, at least in this table:

output is always integer (sometimes odd integer) divided by 2^28.
output has at most (sometimes exactly) 20 significant binary digits.

unknownbrackets · 2023-02-06T20:55:26Z

Here's a quick upload of the csv (compressed using zstd):
https://forums.ppsspp.org/uploads/sin.csv.zst

-[Unknown]

fp64 · 2023-02-06T23:10:50Z

Downloaded, thanks!

hrydgard · 2023-02-07T17:49:00Z

I'm gonna go ahead and close this, as whatever the standard may say, unions are well support and convenient for this. It turned out to be something completely different in #15149 .

unknownbrackets · 2023-02-08T02:27:29Z

Fair enough, I'd definitely prefer to keep the unions as long as they're valid everywhere and stay valid.....

If we figure out the actual model of sin/cos, might be nice to use it (even optionally.)

I'm also now very curious if the wrong driving in #2990 is differently wrong on macOS.

-[Unknown]

unknownbrackets added 3 commits January 18, 2023 18:17

Core: Avoid float/int32 union aliasing for VFPU.

bc1a894

interp: Use memcpy instead of union for float conv.

5ddcd98

arm64jit: Avoid using a union for sin/cos.

977182e

unknownbrackets added this to the v1.15.0 milestone Jan 19, 2023

unknownbrackets mentioned this pull request Jan 19, 2023

GTA : LCS - All character and vehicle models are invisible on M1 Mac #15149

Closed

5 tasks

unknownbrackets added 6 commits January 18, 2023 21:30

Debugger: Avoid int/float conv through union.

36b3665

GPU: Avoid int/float conv through union.

9d118eb

Cheats: Avoid int/float conv through union.

4ca11af

Common: Avoid union in half/float conv.

f55d537

irjit: Avoid int/float conv through union.

cfdefdf

interp: Remove float/int unions in interp.

14bc0f9

unknownbrackets force-pushed the avoid-union-reinterpret branch from 617b236 to 14bc0f9 Compare January 19, 2023 05:30

hrydgard closed this Feb 7, 2023

unknownbrackets deleted the avoid-union-reinterpret branch February 8, 2023 02:26

unknownbrackets removed this from the v1.15.0 milestone Feb 8, 2023

fp64 mentioned this pull request Feb 9, 2023

Improving accuracy of vfpu_sin #16946

Open

2 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Avoid union reinterpret #16817

Avoid union reinterpret #16817

unknownbrackets commented Jan 19, 2023

hrydgard commented Jan 19, 2023 •

edited

Loading

iota97 commented Jan 19, 2023

unknownbrackets commented Jan 19, 2023

Kethen commented Jan 20, 2023 •

edited

Loading

Kethen commented Jan 20, 2023

hrydgard commented Jan 20, 2023

iota97 commented Jan 20, 2023

unknownbrackets commented Jan 22, 2023

hrydgard commented Jan 22, 2023

fp64 commented Jan 25, 2023 •

edited

Loading

fp64 commented Feb 2, 2023 •

edited

Loading

unknownbrackets commented Feb 6, 2023

hrydgard commented Feb 6, 2023

unknownbrackets commented Feb 6, 2023 •

edited

Loading

fp64 commented Feb 6, 2023

fp64 commented Feb 6, 2023

unknownbrackets commented Feb 6, 2023

fp64 commented Feb 6, 2023

hrydgard commented Feb 7, 2023

unknownbrackets commented Feb 8, 2023

Avoid union reinterpret #16817

Avoid union reinterpret #16817

Conversation

unknownbrackets commented Jan 19, 2023

hrydgard commented Jan 19, 2023 • edited Loading

iota97 commented Jan 19, 2023

unknownbrackets commented Jan 19, 2023

Kethen commented Jan 20, 2023 • edited Loading

Kethen commented Jan 20, 2023

hrydgard commented Jan 20, 2023

iota97 commented Jan 20, 2023

unknownbrackets commented Jan 22, 2023

hrydgard commented Jan 22, 2023

fp64 commented Jan 25, 2023 • edited Loading

fp64 commented Feb 2, 2023 • edited Loading

unknownbrackets commented Feb 6, 2023

hrydgard commented Feb 6, 2023

unknownbrackets commented Feb 6, 2023 • edited Loading

fp64 commented Feb 6, 2023

fp64 commented Feb 6, 2023

unknownbrackets commented Feb 6, 2023

fp64 commented Feb 6, 2023

hrydgard commented Feb 7, 2023

unknownbrackets commented Feb 8, 2023

hrydgard commented Jan 19, 2023 •

edited

Loading

Kethen commented Jan 20, 2023 •

edited

Loading

fp64 commented Jan 25, 2023 •

edited

Loading

fp64 commented Feb 2, 2023 •

edited

Loading

unknownbrackets commented Feb 6, 2023 •

edited

Loading