Ridge Racers (USJS00001) - CPU autodrive was Algorithm buggy #2990

Open
triglav1024 opened this issue Jul 30, 2013 · 57 comments

Comments

@triglav1024

triglav1024 commented Jul 30, 2013

Options -> AV player mode
https://www.youtube.com/watch?v=hRrVBM2-OWc
https://www.youtube.com/watch?v=XJkM729PeeE
https://www.youtube.com/watch?v=3eQ7BlocmUo

Ridge Racers - JP 1.01 / USA 1.00 / EUR 1.00 / HK 1.00 / Asia 1.00

@thedax
Collaborator

thedax commented Jul 30, 2013

This is probably some sort of CPU or VFPU bug, I'd guess. A similar behaviour occurs in Dolphin with replays in Mario Kart Wii.

One question though: does the bug occur immediately in every replay (say you start the game and then launch a replay), or does it take 10-20 minutes to appear? If it takes a while, try the Unlock CPU Speed option/hack as a workaround, since some games get buggy if the emulated PSP CPU speed is changed often. I'd check the debug log to see whether it calls scePowerSetClockFrequency often (#2104).

@triglav1024
Author

I had not thought of the CPU hack. I set the clock to 333 MHz.
However, the result was the same as with the default clock.

This behavior occurs immediately after startup. Every time, it shows up about 40 seconds after the start.
In addition, there is no randomness: the same car is always selected, and it is always buggy.

@unknownbrackets
Collaborator

Has this improved at all, or does it still do this? There were some timing fixes not that long ago.

-[Unknown]

@ppmeis
Contributor

ppmeis commented May 15, 2014

I just tested this issue. All replays made by the CPU are buggy: the car does strange things during the race (like constantly hitting the wall). But personal replays work fine.

Tested with latest build 0.9.8-676

@unknownbrackets
Collaborator

Could this have possibly improved with the vrot fix?

Does having jit off affect it?

-[Unknown]

@ppmeis
Contributor

ppmeis commented Jul 22, 2014

Tested with latest build. CPU replays still buggy:
image

Jit off does not help:
image

@unknownbrackets
Collaborator

Does this still happen in the latest git build?

Make sure you don't have that GEB save compat thing changed from the default.

-[Unknown]

@ppmeis
Contributor

ppmeis commented Aug 25, 2014

Tested with latest build, bug is still present:
image

@ppmeis
Contributor

ppmeis commented Feb 1, 2015

Tested with latest build. Same status:
image

@unknownbrackets
Collaborator

I have the US version of this game, but have not really played it much.

What's the easiest and fastest way to reproduce this issue from scratch (e.g. no savedata / blank slate)? I want to try to see if I can at least cause the autodrive to be wrong in different ways.

Edit: hmm, I think I can repro without savedata actually, n/m.

-[Unknown]

@unknownbrackets
Collaborator

Excluding alu and lsu like lv/sv/lwc/swc/mt*/mf* type instructions, here's a list of the ones this game does during the AV thing. The value in parens is number of times it was hit, I've moved all the super unlikely ones to the bottom.

mul.s     (54993288)  // Small error = major driving glitches.
add.s     (28356670)  // Small error = major driving glitches.
c.le      (13832384)  // Change to lt = driving glitches happen differently.
sub.s     (13597462)  // Small error = major driving glitches.
vdot      (12944555)  // MAYBE: Introducing a small error makes glitches happen quicker.
vadd      (9333422)   // Small error = major driving glitches.
vscl      (8506915)   // Small error = major driving glitches.
vsub      (5637652)   // Small error = driving glitches happen differently.
trunc.w.s (5548058)   // Small error = driving glitches happen differently.
cvt.s.w   (4223431)   // Small error = major driving glitches.
vsqrt     (3891304)   // MAYBE: Introducing a small error makes glitches MUCH worse.
div.s     (3243544)   // Small error = major driving glitches.
v(h)tfm4  (2253749)   // MAYBE: Introducing a small error makes glitches MUCH worse.
vpfxt     (2184294)   // Ignore = no driving change, but major gfx glitches.  Might still be wrong prefix handling.
vrsq      (971862)    // MAYBE: Introducing a small error makes glitches MUCH worse.
v(h)tfm3  (783310)    // Small error = major driving glitches.
vdiv      (572562)    // Small error = driving glitches happen differently.
sqrt.s    (533778)    // Small error = driving glitches happen differently.
vcrsp.t   (100890)    // MAYBE: Introducing a small error makes glitches happen quicker.

c.lt      (29268876)  // Any change = breaks everything, but unlikely.
mov.s     (15284356)
vone      (9659339)
neg.s     (3638275)
c.eq      (1284924)   // Any change = breaks everything, but unlikely.
abs.s     (562281)    // Small error = crash, unplayable... unlikely.
vmov      (452786)
vneg      (276140)
vmidt     (180679)
vmmov     (62622)
vzero     (14834)

vmul      (4225115)   // Not so small error = no difference.
vi2f      (1135038)   // Makes no difference.
vi2uc     (568000)    // Makes no difference.
vabs      (567519)    // Not so small error = no difference.
cvt.w.s   (379639)    // Not so small error = no difference.
vrot      (284752)    // Not so small error = no difference.
vmmul     (164305)    // Not so small error = no difference.
vqmul.q   (100890)    // Not so small error = no difference.
vcos      (21269)     // Not so small error = no difference.
vrcp      (15796)     // Makes no difference.
vsin      (8350)      // Not so small error = no difference.
vf2iz     (481)       // Makes no difference, even if hardcoded (but graphical glitches yes.)
vrndf1    (245)       // Makes no difference.

AFAICT, it does not change the rounding mode ever from the default.

If it's not a cpu instruction, then maybe it's timing somehow. But man, almost every instruction I try has a major impact on driving, so it could be anything...

-[Unknown]

@ppmeis
Contributor

ppmeis commented Mar 2, 2015

@unknownbrackets it's as simple as navigating to Settings > AV Player and selecting Accept; then autodrive will start.

Tested with latest build. Same status.

@hrydgard
Owner

hrydgard commented Mar 2, 2015

Hm, vrndf1 seems like a suspicious candidate - IIRC we don't reseed the random number generator when a game would write directly to the random context registers of the VFPU. But if it doesn't make a difference if you modify it, then unless the game depends on a particular sequence (that we can't repro anyway as we don't know how the PSP's rndgen works) it's probably not it...

@unknownbrackets
Collaborator

I thought so too, but no matter what result I make it generate (I tried statically generating 0, 0.5, and I think one other number), it is the same exact incorrect driving, so it seems like it can't be that one...

-[Unknown]

@unknownbrackets
Collaborator

Okay, well, I've eliminated as many instructions as I could:
#2990 (comment)

Still not guaranteed to be a cpu bug...

-[Unknown]

@ppmeis
Contributor

ppmeis commented Jul 25, 2015

Tested with latest build. Same status:
image

@unknownbrackets
Collaborator

Some stats (not sure if useful) showing float usage of various instructions from game start until after the game has clearly gone wrong.

Leftmost number is total floats processed. Then Infinity, NaN, negative zero, and subnormals/denormals.

Since it really goes off a cliff at one point, I was thinking it's possible this is subnormal related... it doesn't ever set the flush to zero flag.
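For anyone wanting to reproduce this kind of tally, here is a minimal sketch of how a raw 32-bit float pattern can be sorted into the categories used in the table below. It is written in Python purely for illustration and is not the instrumentation that produced these numbers; the names `classify_f32` and `tally` are made up for this example.

```python
from collections import Counter

def classify_f32(bits):
    """Classify an IEEE 754 single-precision bit pattern (given as an int)."""
    exp = (bits >> 23) & 0xFF      # 8-bit biased exponent
    mant = bits & 0x7FFFFF         # 23-bit mantissa field
    if exp == 0xFF:
        return "INF" if mant == 0 else "NAN"
    if exp == 0:
        if mant != 0:
            return "SUB"           # subnormal / denormal
        return "NZ" if bits & 0x80000000 else "ZERO"
    return "NORMAL"

def tally(bit_patterns):
    """Count how many values fall into each category."""
    return Counter(classify_f32(b) for b in bit_patterns)
```

Calling `tally` on the operands of one instruction gives per-category counts comparable to one row of the table.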

mul.s:      128779215, INF:0     NAN:0     NZ:2239473 SUB:11966
neg.s:      5245302,   INF:0     NAN:0     NZ:79209   SUB:130  
mov.s:      22309050,  INF:0     NAN:5392  NZ:260968  SUB:528    NAN:7fffff-7fffff
vcos:       51246,     INF:0     NAN:0     NZ:0       SUB:0    
vi2f:       3481408,   INF:0     NAN:0     NZ:0       SUB:0    
vadd:       55346715,  INF:0     NAN:0     NZ:413464  SUB:12001
cvt.s.w:    3348734,   INF:0     NAN:0     NZ:0       SUB:0    
div.s:      7354260,   INF:0     NAN:0     NZ:7887    SUB:0    
c.le:       20980612,  INF:0     NAN:0     NZ:40030   SUB:2494 
add.s:      67100736,  INF:0     NAN:0     NZ:1448224 SUB:3376 
trunc.w.s:  4127106,   INF:0     NAN:0     NZ:0       SUB:2448 
sub.s:      30248208,  INF:0     NAN:0     NZ:349366  SUB:5221 
cvt.w.s:    268353,    INF:0     NAN:0     NZ:0       SUB:0    
vf2in:      1740704,   INF:0     NAN:0     NZ:0       SUB:0    
c.eq:       1906462,   INF:0     NAN:0     NZ:425     SUB:40   
abs.s:      874098,    INF:0     NAN:0     NZ:1214    SUB:0    
c.lt:       44685816,  INF:0     NAN:0     NZ:175307  SUB:315  
vdot:       77938675,  INF:0     NAN:0     NZ:26024   SUB:0    
vneg:       759288,    INF:0     NAN:0     NZ:23272   SUB:0    
vrsq:       2117025,   INF:0     NAN:0     NZ:0       SUB:0    
vsat0:      3216,      INF:0     NAN:0     NZ:0       SUB:0    
vscl:       42447592,  INF:0     NAN:0     NZ:124420  SUB:0    
vsub:       33862734,  INF:0     NAN:0     NZ:1483129 SUB:1050 
vsqrt:      7829373,   INF:0     NAN:0     NZ:0       SUB:0    
sqrt.s:     791694,    INF:0     NAN:0     NZ:0       SUB:0    
vcrsp/vqmu: 685962,    INF:0     NAN:0     NZ:11169   SUB:0    
v(h)tfm3:   8347410,   INF:0     NAN:0     NZ:37392   SUB:80   
vrot:       876736,    INF:0     NAN:0     NZ:66106   SUB:0    
vmmul:      5170227,   INF:0     NAN:0     NZ:82346   SUB:0    
vmov:       4018263,   INF:0     NAN:0     NZ:0       SUB:199248
vmmov:      811746,    INF:0     NAN:0     NZ:0       SUB:0    
v(h)tfm4:   40416408,  INF:0     NAN:0     NZ:11657   SUB:0    
vmul:       8497689,   INF:0     NAN:0     NZ:0       SUB:0    
vabs:       2611056,   INF:0     NAN:0     NZ:0       SUB:9    
vdiv:       1316103,   INF:0     NAN:0     NZ:0       SUB:0    
vrcp:       38124,     INF:0     NAN:0     NZ:0       SUB:0    
vsin:       17454,     INF:0     NAN:0     NZ:0       SUB:0    
vrndf1:     735,       INF:0     NAN:0     NZ:0       SUB:0    
vf2iz:      1072,      INF:0     NAN:0     NZ:0       SUB:0    

-[Unknown]

@unknownbrackets
Collaborator

I had tried some things before, but just wanted to note that I've tried forcing subnormal results to 0 (as always seems to happen with many vfpu ops) for vmul/vadd/vsub/vtfm3/vhtfm3/etc., as well as forcing nan to 0x7f800001. There was no change in the failure.
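As a rough illustration of those two experiments, the following Python sketch applies them to raw 32-bit results. It is not the code that was used; the function names are invented, and keeping the sign bit when flushing to zero is an assumption made for this example.

```python
def flush_subnormal(bits):
    """Force a subnormal result to zero.

    Assumption for this sketch: the sign bit is kept, so a negative
    subnormal becomes negative zero.
    """
    if (bits & 0x7F800000) == 0:       # exponent field is zero
        return bits & 0x80000000       # keep only the sign
    return bits

def force_nan(bits, nan_pattern=0x7F800001):
    """Replace any NaN with one fixed pattern (0x7f800001 in the experiment)."""
    is_nan = (bits & 0x7F800000) == 0x7F800000 and (bits & 0x7FFFFF) != 0
    return nan_pattern if is_nan else bits
```

Both helpers leave ordinary values and infinities untouched, so they only affect the rare inputs counted in the SUB and NAN columns of the earlier stats.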

I do think there's a good chance it's related to multiply accuracy.

-[Unknown]

@unknownbrackets
Collaborator

unknownbrackets commented Jun 9, 2019

Update: as a very rough measure, I tried & 0xFFFFFFFE for all the results of vtfm, vadd, vsub, vdiv, and vmul.

Normally, things go wrong right before the second tunnel. With this change, things go wrong before the third tunnel, and it looks right for longer. So this is promising.
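To make the masking concrete: `& 0xFFFFFFFE` clears the lowest mantissa bit of a result, which throws away one bit of precision. A minimal Python sketch follows (the helper names are hypothetical, and this is not the actual patch):

```python
import struct

def to_bits(x):
    """Bit pattern of x after conversion to single precision."""
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(b):
    """Single-precision value for a 32-bit pattern."""
    return struct.unpack("<f", struct.pack("<I", b))[0]

def mask_low_bit(b):
    """Clear the least significant mantissa bit, as in the experiment."""
    return b & 0xFFFFFFFE
```

For example, `mask_low_bit(0x3F800001)` gives `0x3F800000`, so a result one ulp above 1.0 collapses to exactly 1.0, while a result with an even mantissa is left alone.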

Trying to dig into which instruction gets tougher, though. Just disabling the masking for one op at once:

  • vtfm without mask: still lasts longer, but goes wrong slightly earlier than all masked.
  • vdiv without mask: goes wrong even earlier than normal.
  • vmul without mask: better than no masking, but breaks within the second tunnel.
  • vsub without mask: very similar to vmul disabled.
  • vadd without mask: goes wrong much earlier than with all masked.

A few other instructions didn't seem to matter, like vmmul or vdot. That said, obviously this doesn't implicate any of the above instructions - it could be that rounding at vsub masks a problem that is really in vmul, or even in vdot.

The important bit here is that rounding/precision is almost definitely at issue here.

For clarity, changing the rounding mode doesn't help things, so it's more complex than that.

-[Unknown]

@hrydgard
Owner

hrydgard commented Jun 9, 2019

I think that indeed confirms that precision/rounding is the culprit. Masking like that is not likely to accurately simulate the issues though, of course.

I believe in the FTZ thing plus probably a slightly lower-precision dot product implemented in the VFPU hardware (in addition to approximations in vrot and similar). VTFM is very likely to use that hardware dot product.

I think the dot product precision issues could be shown by trying things like dotting a=(1.0, 1.0, 1.0, 1.0) and b = (0.000001, 0.000001, 0.000001, 1.0), and the reverse of b with 1.0 first. The 0.000001 constant should be adjusted so that the sum of three of them just breaks into the precision that's still available when the exponent is set to be able to represent 1.0. That way, if the dot product summing uses collective mantissa alignment and then sums up the mantissas, we'd get the same results whether the 1.0 was first or last or wherever, whereas if it's computed like we do, by simply summing up the products from left to right, we should get different results.

@unknownbrackets
Collaborator

unknownbrackets commented Jun 9, 2019

For posterity:

{ 0x3F800000, 0x33800000, 0x33800000, 0x33800000 }
{ 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 }
= 0x3f800001

{ 0x33800000, 0x33800000, 0x33800000, 0x3F800000 }
{ 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 }
= 0x3f800001

{ 0x3F800000, 0x34000000, 0x00000000, 0x00000000 }
{ 0x3F800000, 0x3F800000, 0x3F800000, 0x3F800000 }
= 0x3f800001

{ 0x100BF8FE, 0x581F4DA5, 0x00000000, 0x00000000 }
{ 0x3F800000, 0x0207F3ED, 0x00000000, 0x00000000 }
= 0x1aa9337c

Since order doesn't matter, potentially it's aligning the exponents first and then summing. It'll be interesting to find if vhdp, vfad, vavg, or other ops have similar behavior.
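For contrast, here is a small Python sketch of what a plain left-to-right sum in IEEE single precision (round to nearest even) gives for the first two inputs above. It does not model the VFPU, and the helper names are made up for this example. Because the second vector is all 1.0, the products are simply the listed values.

```python
import struct

def f32(x):
    """Round a Python float to single precision (round to nearest even)."""
    return struct.unpack("<f", struct.pack("<f", x))[0]

def to_bits(x):
    return struct.unpack("<I", struct.pack("<f", x))[0]

def from_bits(b):
    return struct.unpack("<f", struct.pack("<I", b))[0]

def sum_left_to_right(terms):
    """Add the terms one at a time, rounding to float32 after every add."""
    acc = 0.0
    for t in terms:
        acc = f32(acc + t)
    return acc

ONE = from_bits(0x3F800000)    # 1.0
TINY = from_bits(0x33800000)   # 2**-24, half an ulp of 1.0

big_first = to_bits(sum_left_to_right([ONE, TINY, TINY, TINY]))
big_last = to_bits(sum_left_to_right([TINY, TINY, TINY, ONE]))
print(hex(big_first), hex(big_last))   # 0x3f800000 0x3f800002
```

With 1.0 first, each tiny term is rounded away (0x3f800000); with 1.0 last, the three tiny terms accumulate and then round up (0x3f800002). Neither matches the 0x3f800001 measured in both orders, which is consistent with the hardware not summing sequentially with per-step rounding.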

For clarity on anyone reading this, the first two above sums are (base 2):

1.000000000000000000000000 * 1 +
0.000000000000000000000001 * 1 +
0.000000000000000000000001 * 1 +
0.000000000000000000000001 * 1 =
--------------------------
1.000000000000000000000011 = 0x3f800001

Which becomes 1.00000000000000000000001 because of limited mantissa, therefore 0x3f800001. I also tried:

1.000000000000000000000000 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000011 * 1 =
--------------------------
1.000000000000000000000111 = 0x3f800003

1.000000000000000000000000 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000001 * 1 =
--------------------------
1.000000000000000000000101 = 0x3f800002

1.000000000000000000000000 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 +
0.000000000000000000000010 * 1 =
--------------------------
1.000000000000000000000110 = 0x3f800003

Which all truncated as expected (was trying to verify any rounding behavior.)

Also confirmed the behavior is identical (just with a flipped sign) if I flip the sign of the first vector (meaning it doesn't truncate differently for negative.)
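A minimal Python sketch of the "align to the largest exponent, add, then truncate" idea, under stated assumptions: it takes already-multiplied terms (enough for the cases above, where one vector is all 1.0), ignores inf/NaN, treats subnormals as zero, and keeps an arbitrary number of guard bits. The name `aligned_sum_trunc` and the `GUARD_BITS` value are invented for illustration; this is neither the gist's code nor a confirmed description of the hardware.

```python
GUARD_BITS = 8  # arbitrary choice for this sketch, not a measured hardware value

def aligned_sum_trunc(term_bits, guard_bits=GUARD_BITS):
    """Sum float32 bit patterns with one shared exponent and a truncated result."""
    decoded = []
    for b in term_bits:
        exp = (b >> 23) & 0xFF
        if exp == 0:
            continue                      # zero or subnormal: treated as zero here
        if exp == 0xFF:
            raise ValueError("inf/NaN are not modelled in this sketch")
        mant = (b & 0x7FFFFF) | 0x800000  # restore the implicit leading 1
        sign = -1 if b & 0x80000000 else 1
        decoded.append((sign, exp, mant))
    if not decoded:
        return 0
    max_exp = max(exp for _, exp, _ in decoded)
    total = 0
    for sign, exp, mant in decoded:
        # Align every mantissa to the largest exponent; low bits fall off.
        total += sign * ((mant << guard_bits) >> (max_exp - exp))
    if total == 0:
        return 0
    sign_bit = 0x80000000 if total < 0 else 0
    mag = abs(total)
    # bit_length() does the job a BSR/CLZ instruction would do in native code.
    shift = mag.bit_length() - 24
    exp = max_exp + shift - guard_bits
    mag = mag >> shift if shift >= 0 else mag << -shift   # truncate, never round
    if exp <= 0:
        return sign_bit                   # underflow: flush to zero
    if exp >= 0xFF:
        return sign_bit | 0x7F800000      # overflow: infinity
    return sign_bit | (exp << 23) | (mag & 0x7FFFFF)
```

With these assumptions the sketch returns 0x3f800001 for `[0x3F800000, 0x33800000, 0x33800000, 0x33800000]` in either order, and 0x3f800003, 0x3f800002 and 0x3f800003 for the three follow-up sums, matching the values listed above. That only shows the model is consistent with these few samples.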

-[Unknown]

@unknownbrackets
Collaborator

unknownbrackets commented Jun 9, 2019

Okay, using this:
https://gist.github.com/unknownbrackets/e5bdd06cd8d85712fc51bd7b7707cfd1

Which gets pretty good results (note: multiplying to a temporary float[4] first):

  FMA error: CORRECT 1aa9337c / 0.000000
  1.0*1.0 + 1.0*1.0^-23: CORRECT 3f800001 / 1.000000
  1.0*1.0 + 1.0*1.0^-24 + 1.0*1.0^-24 + 1.0*1.0^-24: CORRECT 3f800001 / 1.000000
  1.0*1.0^-24 + 1.0*1.0^-24 + 1.0*1.0^-24 + 1.0*1.0: CORRECT 3f800001 / 1.000000
  1.0*1.0 + 1.0*1.0^-23 + 1.0*1.0^-23 + 1.0*1.0^-24: CORRECT 3f800002 / 1.000000
  1.0*1.0 + 1.0*1.0^-23 + 1.0*1.0^-23 + 1.0*1.0^-23: CORRECT 3f800003 / 1.000000
  1.0*1.0 + 1.0*1.0^-23 + 1.0*1.0^-23 + 1.1*1.0^-23: CORRECT 3f800003 / 1.000000
  1.0*-1.0 + 1.0*-1.0^-23 + 1.0*-1.0^-23 + 1.1*-1.0^-23: CORRECT bf800003 / -1.000000
  Simulate case 1: CORRECT c75864aa / -55396.664062
  Simulate case 2: CORRECT c7fb200f / -128576.117188
  Simulate case 3: CORRECT c5972dcb / -4837.724121
  Simulate case 4: CORRECT 42222309 / 40.534214
  Simulate case 5: WRONG 3d84e134 / 0.064883  vs  3d84e130 / 0.064883
  Simulate case 5 DEBUG: beb4194f + bdbb66eb + 3f0215ab + 00000000
  Simulate case 5 DEBUG: -0.351756 + -0.091505 + 0.508143 + 0.000000
  Simulate case 6: CORRECT 4136c004 / 11.421879

FWIW case 5 is (I sampled the most different results from Ridge Racer, and used them to debug the software float add):

	ScePspIVector4 dotsim5a = { 0x3f2dc5cb, 0x3e71855a, 0x3f3206af, 0x00000000 };
	ScePspIVector4 dotsim5b = { 0xbf04a8ed, 0xbec6a2ff, 0x3f3b0f83, 0x00000000 };
	testDot("  Simulate case 5", dotsim5a, dotsim5b);

This changes the results. It goes differently wrong right before the second tunnel, but doesn't work out from there. Pretty sure we're barking up the right tree, because everything up to where it goes crazy was right and the same, and the point where it goes crazy behaved differently.

-[Unknown]

@hrydgard
Owner

hrydgard commented Jun 10, 2019

Cool. It's possible though that this sequence is so sensitive that it won't work all the way through until we've fixed both the FTZ issue and gotten this even more accurate...

Please as always feel free to push even very rough code to a branch or PR, would be interesting to try this on Tekken 6.

Also by the way the BSR instruction (CLZ on ARM) will let us get rid of those annoying while loops in the software add.

Additionally, floating point multiplication in software is actually even easier than addition since there's no realignment needed, just multiply the mantissas, shift down by a fixed amount, and add the exponents (with a bias to account for the 127 base).
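A rough Python sketch of that recipe for normal single-precision inputs. Assumptions for this example only: the result is truncated rather than rounded, zero and subnormal inputs are treated as zero, and inf/NaN are rejected. The function name is made up and this is not code from the branch.

```python
def soft_mul_trunc(a_bits, b_bits):
    """Multiply two float32 bit patterns in integer arithmetic, truncating."""
    sign = (a_bits ^ b_bits) & 0x80000000
    ea = (a_bits >> 23) & 0xFF
    eb = (b_bits >> 23) & 0xFF
    if ea == 0xFF or eb == 0xFF:
        raise ValueError("inf/NaN are not handled in this sketch")
    if ea == 0 or eb == 0:
        return sign                        # zero/subnormal input: signed zero
    ma = (a_bits & 0x7FFFFF) | 0x800000    # 24-bit mantissas with implicit 1
    mb = (b_bits & 0x7FFFFF) | 0x800000
    prod = ma * mb                         # 47 or 48 significant bits
    exp = ea + eb - 127                    # add exponents, remove one bias
    if prod & (1 << 47):                   # product in [2, 4): one extra shift
        prod >>= 24
        exp += 1
    else:                                  # product in [1, 2)
        prod >>= 23
    if exp <= 0:
        return sign                        # underflow: flush to zero
    if exp >= 0xFF:
        return sign | 0x7F800000           # overflow: infinity
    return sign | (exp << 23) | (prod & 0x7FFFFF)
```

For instance, `soft_mul_trunc(0x3F800001, 0x3FC00000)` yields 0x3fc00001, whereas round-to-nearest-even would give 0x3fc00002, so the choice of rounding step is exactly the kind of detail that would have to be matched against hardware.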

Also it's very likely that vhdp, vfad, vavg have similar issues since they almost certainly are reusing the vdot hardware, kind of like the prefix hack ops.

@unknownbrackets
Collaborator

Here's the branch so far:
master...unknownbrackets:vfpu-dot

-[Unknown]

@unknownbrackets
Collaborator

It definitely is more accurate applying the same dot operation in vcrsp, though there's something odd happening with inf there. It affected Ridge Racer in probably a good way, but it still goes crazy a bit earlier than before.

-[Unknown]

@unknownbrackets
Collaborator

So, it's probably not sqrt.

I wrote a software sqrt, which matches vsqrt much better (sqrtf = exact match 3% of the time, vfpu_sqrt = exact match 84% of the time.) There was no change or improvement to the driving, though.

It could be hiding in the remaining 16% (seems to be a rounding issue, but I can't figure out the right logic for it), but I'd have expected some improvement if the accuracy mattered.

-[Unknown]

@unknownbrackets
Collaborator

Oops, had a stupid mistake disabling the sqrt. It does improve things. But it also mysteriously makes the game crash (well, it was before if it ran far enough without winning, but now it does it earlier...)

-[Unknown]

@unknownbrackets
Collaborator

Okay, sorry for the many comments. Found the bug (max_exp == 0 vs max_exp <= 0) causing the crash, so now this is the version that gets the farthest:

https://github.com/hrydgard/ppsspp/compare/master...unknownbrackets:vfpu-dot?expand=1

It still goes crazy eventually. Maybe it's the remaining 16% of sqrt - any ideas what might be wrong there? I tried rounding up or rounding even instead of masking, but maybe wrong...

-[Unknown]

@hrydgard
Owner

hrydgard commented Jul 9, 2019

Cool. But I don't think Ridge Racer is going to suddenly be fixed 100% after a single instruction is fixed - it's clear that its "physics" simulation uses a lot of different instructions and any of them can introduce a tiny error, which will get amplified over time and cause the simulation to fall out of sync with the replay data. It's not even certain that a single precision fix will cause the simulation results to be closer to the real thing (although as we fix more things, that does get more likely). And we still don't force FTZ on for VFPU instructions, which we really should if we don't just software emulate them all.

Anyway, this is very good progress already even if Ridge Racer isn't fixed. Who knows what other games might be helped. Unfortunately this stuff is not easy to enable globally, for fear of slowdowns...

@unknownbrackets
Collaborator

Sure, of course. But there aren't that many instructions left unless it's FPU too. See the list. It's not like it uses sin/cos/etc. I assume Dissidia replays are affected by the same problem, but iirc they use a lot more VFPU instructions.

Also, there's some masking already applying FTZ in that branch. But if you look above, Ridge Racer isn't really sending any subnormals through most of these instructions anyway.

-[Unknown]

@hrydgard
Owner

hrydgard commented Jul 9, 2019

Well there's vrot, vrsq and vdiv, and vsin and vcos are actually in the list you posted above? (actually never mind about the latter, I see you posted a revised list further down)

@ghost

ghost commented Jun 15, 2020

The same thing happens in Ridge Racer 7 when played on RPCS3: the autodrive is also buggy. I also found another bug: my saved replays are starting to bug out as well.

@hrydgard
Owner

Yeah, tiny, tiny math inaccuracies can result in this kind of thing, no surprise it happens on RPCS3 as well.

@ghost

ghost commented Jun 15, 2020

I noticed that when I use a cheat that alters the car's performance in Ridge Racer, the AV Player CPU car's performance also changes. So if someone makes a cheat code that alters the car's performance, we would probably have no Algorithm bug...

@hrydgard
Owner

Nah, you can't conclude that. Your cheat will just be another input that will throw the algorithm off even more, while it's already definitely broken in other ways....

@ghost

ghost commented Jun 15, 2020

I tried replicating the replays and I broke my fingers halfway on SR765...

@ghost

ghost commented Jun 21, 2020

Actually, I managed to replicate half of the Seaside Route 765 CPU replay, where you drive a blue Raggio while racing the Angelus. I messed up halfway, where I was supposed to trigger the 2nd NOS: the Raggio drifted on a turn where I'm not supposed to drift, and then the Angelus passed me. And since the Raggio is a Dynamic car, I can't control it properly. Also, when replicating the replays, you have to be precise on the turns or the A.I. opponents will mess up your rhythm. Anyway, here are the 6 tracks with no CPU bugs whatsoever:

Seaside Route 765: https://www.youtube.com/watch?v=kQyHEo4S4wg
Sunset Drive: https://www.youtube.com/watch?v=LsFrQ9JJ9T4
Union Hill District: https://www.youtube.com/watch?v=CgpGzMnA_54
Crimsonrock Pass: https://www.youtube.com/watch?v=RURjK13Odgk
Midtown Expressway: https://www.youtube.com/watch?v=_iOCyYokMco
Greenpeak Highlands: https://www.youtube.com/watch?v=kydwDBr9MoA&t

@ghost

ghost commented Jun 25, 2020

I tried to run Ridge Racer 6 on the Xenia emulator to test the AV Player, while hoping that it won't crash.. But despite Ridge Racer 6 just being an upgraded version of Ridge Racer PSP, I was surprised Ridge Racer 6 AV Player never bugged whatsoever... The course I played was called "Surfside Resort"...

@ghost

ghost commented Jul 7, 2020

When I played this on Android, I noticed that JIT, IR Interpreter, and Interpreter each execute the CPU autodrive differently, so the car misbehaves in a different way on each. Try listing the differences between those 3 CPU cores, and you might find that one mathematical error...

@ghost

ghost commented Jul 23, 2020

The fact that it desyncs makes me sad, because I kept watching those replays on my real PSP whenever the Wi-Fi died, there was a brownout, or I got bored after finishing the game. The desyncing kind of represents how this game series is being forgotten: Ridge Racer 8 never got released, and the fact that the bug is still here feels like the game got left unfinished and forgotten....

@unknownbrackets
Collaborator

The Xbox 360 and PSP have different CPUs, which is why they have different problems.

The heart of this problem is math. Games use what are called "vector" or "simd" instructions to calculate math in speed critical situations and 3D formulas. If you look here, the Xbox 360 CPU had special modifications to do dot products on the CPU faster:

https://en.wikipedia.org/wiki/Xbox_360_technical_specifications

To help you understand, let's say I was adding up these two numbers:

6628451234
984726456

Google says the result is 7613177690, which is probably accurate. But what if someone did it by hand, and got it wrong? What if they thought it was 7613177609? It's a small difference, but the small differences add up - like a "hyperspace jump" in slightly the wrong direction.

Some (but not all) of the PSP CPU's calculations were wrong - bad math. Crucially, unless we get the math wrong and get it wrong in exactly the same way - these replays won't play correctly.
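To illustrate how quickly a tiny difference can snowball when each step feeds into the next, here is a toy Python sketch. It has nothing to do with the game's actual physics: it uses the "doubling map" (x becomes 2x mod 1), chosen only because the error exactly doubles every step and can be tracked with exact fractions. All names are invented for this example.

```python
from fractions import Fraction

def doubling_map(x):
    """One step of a toy 'simulation': x -> 2x mod 1."""
    return (2 * x) % 1

def steps_until_desync(x0, error, threshold, max_steps=1000):
    """Run a reference run and a slightly-off run side by side.

    Returns the first step at which the two runs differ by more than
    `threshold` (distance measured on the unit circle), or None.
    """
    a, b = x0, x0 + error
    for step in range(1, max_steps + 1):
        a, b = doubling_map(a), doubling_map(b)
        d = abs(a - b)
        d = min(d, 1 - d)
        if d > threshold:
            return step
    return None

# An initial error of 2**-24 (about the size of float rounding near 1.0)
# grows past a quarter of the whole range after 23 steps.
print(steps_until_desync(Fraction(1, 3), Fraction(1, 2**24), Fraction(1, 4)))  # 23
```

Halving the initial error to 2**-25 only delays the divergence by one step (24 instead of 23). In this toy model, being more accurate postpones the point where things go wrong, and only an exact match avoids it.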

Xenia probably doesn't have this problem because the Xbox 360 got high marks on its maths. Just like a modern PC or a phone, it can add, multiply, divide, and subtract correctly. So there's no need to simulate the errors.

Notably, it's probably the same reason again for RPCS3. The 7 SPEs also use inaccurate maths.

The reason these calculations were wrong? Most likely speed, power, or cost. Doing math correctly might've required more silicon, more battery juice, or might've made games run slower. These errors are at the hardware level and we don't fully understand them. We don't know exactly how it calculates square roots, and what shortcuts it's using to get a close, but wrong, value.

It's not that anyone doesn't care or wants to see the series dwindle by any means. Several people have spent hours debugging, working on, and trying to fix this very issue.

-[Unknown]

@Back2Life888

PPSSPP 1.11.3. Issue still persists.

@Back2Life888

Back2Life888 commented Sep 22, 2022

I discovered something odd with the desyncing replays. It wouldn't just bug out pre-recorded replays, it could also bug out your own replays. If you save a replay and then update or downgrade PPSSPP to another version, that replay may bug out and desync like the pre-recorded ones. I have some replays saved on an old PPSSPP version and the car just desyncs. Backtracking to an older version fixes the bug on some replays and some of them get fixed somehow.

@unknownbrackets
Collaborator

This is because we've made some updates to improve accuracy in some CPU instructions. It hasn't been enough to make the pre-recorded missions play correctly, but it means that recordings from previous versions no longer play the way they used to.

This issue basically relies on specific and very accurate mathematical results, matching the same mathematical errors that the PSP CPU makes. Or at least, so we think.

-[Unknown]

@unknownbrackets
Collaborator

Just an idea that I thought of just now but have not pursued on this:

It could be that it isn't just accuracy, but that there's some actual bug in the math equation, and things work out as long as the replay replicates it because it's small. Specifically, I don't think anyone has ever checked whether there are any suspicious vector overlap cases. There's been evidence to suggest that, unlike PPSSPP's code, the actual VFPU doesn't guarantee overlap safety in all cases (and when it does, it seems to do so by performing operations in reverse order.)
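To show what an "overlap case" means in general terms, here is a toy Python sketch. It uses a plain list as a stand-in register file and an element-wise scale as the operation; it is not VFPU semantics, and the function names are hypothetical.

```python
def scale_via_temp(regs, src, dst, n, k):
    """Overlap-safe version: read all inputs first, then write all outputs."""
    out = list(regs)
    tmp = [regs[src + i] * k for i in range(n)]
    for i in range(n):
        out[dst + i] = tmp[i]
    return out

def scale_in_place(regs, src, dst, n, k, reverse=False):
    """Lane-by-lane version: each write may clobber a later lane's input."""
    out = list(regs)
    order = range(n - 1, -1, -1) if reverse else range(n)
    for i in order:
        out[dst + i] = out[src + i] * k
    return out

regs = [1, 2, 3, 4, 0]
print(scale_via_temp(regs, 0, 1, 4, 10))                 # [1, 10, 20, 30, 40]
print(scale_in_place(regs, 0, 1, 4, 10))                 # [1, 10, 100, 1000, 10000]
print(scale_in_place(regs, 0, 1, 4, 10, reverse=True))   # [1, 10, 20, 30, 40]
```

When the destination sits one slot after the source, the forward lane-by-lane loop feeds its own outputs back in, while the reverse-order loop happens to match the temporary-copy result. Reverse order is only safe for this direction of overlap, so an emulator that always copies through a temporary could differ from hardware that does not.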

Probably not likely, but I have already tried flushing everything to zero, adjusting rounding modes, forcing things to the decently accurate vdot, etc.

-[Unknown]
