Ridge Racers (USJS00001) - CPU autodrive was Algorithm buggy #2990
Comments
This is probably some sort of CPU or VFPU bug, I'd guess. A similar behaviour occurs in Dolphin with replays in Mario Kart Wii. One question though, does the bug occur immediately in every replay, like say you start the game and then launch a replay, or does it take say 10-20 minutes for it to appear? If yes, as a workaround, try using the Unlock CPU Speed option/hack since some games get buggy if the emulated PSP CPU speed is changed often. I'd check the debug log to see if it's using scePowerSetClockFrequency often(#2104). |
I had not considered the CPU hack. I had set the clock to 333 MHz. This behavior occurs right after startup: every time, it develops about 40 seconds after the start. |
Has this improved at all, or does it still do this? There were some timing fixes not that long ago. -[Unknown] |
I just tested this issue. All replays made by the CPU are buggy: the car does strange things during the race (like constantly hitting the wall), but personal replays work fine. Tested with the latest build, 0.9.8-676. |
Could this have possibly improved with the vrot fix? Does having jit off affect it? -[Unknown] |
Does this still happen in the latest git build? Make sure you don't have that GEB save compat thing changed from the default. -[Unknown] |
I have the US version of this game, but have not really played it much. What's the easiest and fastest way to reproduce this issue from scratch (e.g. no savedata / blank slate)? I want to try to see if I can at least cause the autodrive to be wrong in different ways. Edit: hmm, I think I can repro without savedata actually, n/m. -[Unknown] |
Excluding alu and lsu like
mul.s (54993288) // Small error = major driving glitches.
add.s (28356670) // Small error = major driving glitches.
c.le (13832384) // Change to lt = driving glitches happen differently.
sub.s (13597462) // Small error = major driving glitches.
vdot (12944555) // MAYBE: Introducing a small error makes glitches happen quicker.
vadd (9333422) // Small error = major driving glitches.
vscl (8506915) // Small error = major driving glitches.
vsub (5637652) // Small error = driving glitches happen differently.
trunc.w.s (5548058) // Small error = driving glitches happen differently.
cvt.s.w (4223431) // Small error = major driving glitches.
vsqrt (3891304) // MAYBE: Introducing a small error makes glitches MUCH worse.
div.s (3243544) // Small error = major driving glitches.
v(h)tfm4 (2253749) // MAYBE: Introducing a small error makes glitches MUCH worse.
vpfxt (2184294) // Ignore = no driving change, but major gfx glitches. Might still be wrong prefix handling.
vrsq (971862) // MAYBE: Introducing a small error makes glitches MUCH worse.
v(h)tfm3 (783310) // Small error = major driving glitches.
vdiv (572562) // Small error = driving glitches happen differently.
sqrt.s (533778) // Small error = driving glitches happen differently.
vcrsp.t (100890) // MAYBE: Introducing a small error makes glitches happen quicker.
c.lt (29268876) // Any change = breaks everything, but unlikely.
mov.s (15284356)
vone (9659339)
neg.s (3638275)
c.eq (1284924) // Any change = breaks everything, but unlikely.
abs.s (562281) // Small error = crash, unplayable... unlikely.
vmov (452786)
vneg (276140)
vmidt (180679)
vmmov (62622)
vzero (14834)
vmul (4225115) // Not so small error = no difference.
vi2f (1135038) // Makes no difference.
vi2uc (568000) // Makes no difference.
vabs (567519) // Not so small error = no difference.
cvt.w.s (379639) // Not so small error = no difference.
vrot (284752) // Not so small error = no difference.
vmmul (164305) // Not so small error = no difference.
vqmul.q (100890) // Not so small error = no difference.
vcos (21269) // Not so small error = no difference.
vrcp (15796) // Makes no difference.
vsin (8350) // Not so small error = no difference.
vf2iz (481) // Makes no difference, even if hardcoded (but graphical glitches yes.)
vrndf1 (245) // Makes no difference.
AFAICT, it does not ever change the rounding mode from the default. If it's not a CPU instruction, then maybe it's timing somehow. But man, almost every instruction I try has a major impact on driving, so it could be anything... -[Unknown] |
@unknownbrackets It's as simple as navigating to Settings > AV Player and selecting Accept; autodrive will then start. Tested with the latest build. Same status. |
Hm, vrndf1 seems like a suspicious candidate - IIRC we don't reseed the random number generator when a game would write directly to the random context registers of the VFPU. But if it doesn't make a difference if you modify it, then unless the game depends on a particular sequence (that we can't repro anyway as we don't know how the PSP's rndgen works) it's probably not it... |
I thought so too, but no matter what result I make it generate (I tried statically generating 0, 0.5, and I think one other number), it is the same exact incorrect driving, so it seems like it can't be that one... -[Unknown] |
Okay, well, I've eliminated as many instructions as I could: Still not guaranteed to be a CPU bug... -[Unknown] |
Some stats (not sure if useful) showing float usage of various instructions from game start until after the game has clearly gone wrong. Leftmost number is total floats processed. Then Infinity, NaN, negative zero, and subnormals/denormals. Since it really goes off a cliff at one point, I was thinking it's possible this is subnormal related... it doesn't ever set the flush to zero flag.
-[Unknown] |
I had tried some things before, but just wanted to note that I've tried forcing subnormal results to 0 (as always seems to happen with many vfpu ops) for vmul/vadd/vsub/vtfm3/vhtfm3/etc., as well as forcing nan to 0x7f800001. There was no change in the failure. I do think there's a good chance it's related to multiply accuracy. -[Unknown] |
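As a rough illustration of the experiment described in the comment above, here is a C sketch that post-processes the raw IEEE 754 single-precision bits of a result. The helper name `canonicalize_result_bits` is hypothetical, and keeping the sign bit when flushing to zero is an assumption; the comment only states that subnormal results were forced to 0 and NaNs to 0x7f800001.

```c
#include <stdint.h>

/* Hypothetical helper: post-process the raw bits of a float result.
 * - Exponent field 0 (zero or subnormal) becomes zero. Keeping the
 *   sign bit is an assumption, not confirmed hardware behavior.
 * - Any NaN becomes the single pattern 0x7f800001.
 * - Everything else, including infinity, passes through unchanged. */
uint32_t canonicalize_result_bits(uint32_t bits) {
    uint32_t exponent = bits & 0x7F800000u;
    uint32_t mantissa = bits & 0x007FFFFFu;

    if (exponent == 0) {
        return bits & 0x80000000u;   /* flush to signed zero */
    }
    if (exponent == 0x7F800000u && mantissa != 0) {
        return 0x7F800001u;          /* canonical NaN */
    }
    return bits;
}
```

Applying a helper like this to each op's output is one way to run that kind of test; as the comment notes, it made no difference to the failure.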
Update: as a very rough measure, I tried Normally, things go wrong right before the second tunnel. With this change, things go wrong before the third tunnel, and it looks right for longer. So this is promising. Trying to dig into which instruction gets tougher, though. Just disabling the masking for one op at once:
A few other instructions didn't seem to matter, like vmmul or vdot. That said, obviously this doesn't implicate any of the above instructions - it could be that rounding at vsub masks a problem that is really in vmul, or even in vdot. The important bit here is that rounding/precision is almost definitely at issue here. For clarity, changing the rounding mode doesn't help things, so it's more complex than that. -[Unknown] |
I think that indeed confirms that precision/rounding is the culprit. Masking like that is not likely to accurately simulate the issues though, of course. I believe in the FTZ thing plus probably a slightly lower-precision dot product implemented in the VFPU hardware (in addition to approximations in vrot and similar). VTFM is very likely to use that hardware dot product. I think the dot product precision issues could be shown by trying things like dotting a = (1.0, 1.0, 1.0, 1.0) and b = (0.000001, 0.000001, 0.000001, 1.0), and the reverse of b with 1.0 first. The 0.000001 constant should be adjusted so that the sum of three of them just breaks into the precision that's still available when the exponent is set to be able to represent 1.0. That way, if the dot product summing uses collective mantissa alignment and then sums up the mantissas, we'd get the same results whether the 1.0 was first or last or wherever, whereas if it's computed like we do by simply summing up the products from left to right, we should get different results. |
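The order-dependence test proposed above can be sketched in C as follows. The function names are hypothetical, and the small constant used in the usage note (2^-25) is my own choice of an "adjusted" value, not one from the thread. `dot4_wide` is only an order-independent stand-in (accumulate in double, round once); it is not a model of what the VFPU actually does.

```c
/* Left-to-right dot product, rounding to float after every step.
 * volatile forces each intermediate into a 32-bit float. */
float dot4_sequential(const float a[4], const float b[4]) {
    volatile float sum = 0.0f;
    for (int i = 0; i < 4; i++) {
        volatile float product = a[i] * b[i];
        sum = sum + product;
    }
    return sum;
}

/* Stand-in for "align first, then sum": accumulate in double and
 * round to float once at the end. For inputs like the ones in the
 * usage note, the double sum is exact, so element order cannot matter. */
float dot4_wide(const float a[4], const float b[4]) {
    double sum = 0.0;
    for (int i = 0; i < 4; i++) {
        sum += (double)a[i] * (double)b[i];
    }
    return (float)sum;
}
```

With a = (1, 1, 1, 1) and e = 2^-25, `dot4_sequential` returns 1.0 for b = (1, e, e, e), because each e is lost on its own, but 1.0 + 2^-23 for b = (e, e, e, 1), because the three e values accumulate before meeting the 1.0. `dot4_wide` returns the same value for both orders. Running the same pair of inputs through the real instruction would show which family the hardware belongs to.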
For posterity:
Since order doesn't matter, potentially it's aligning the exponents first and then summing. It'll be interesting to find if vhdp, vfad, vavg, or other ops have similar behavior. For clarity for anyone reading this, the first two above sums are (base 2):
Which becomes
Which all truncated as expected (was trying to verify any rounding behavior.) Also confirmed the behavior is identical (just with a flipped sign) if I flip the sign of the first vector (meaning it doesn't truncate differently for negative.) -[Unknown] |
Okay, using this: Which gets pretty good results (note: multiplying to a temporary
FWIW case 5 is (I sampled the most different results from Ridge Racer, and used them to debug the software float add):
ScePspIVector4 dotsim5a = { 0x3f2dc5cb, 0x3e71855a, 0x3f3206af, 0x00000000 };
ScePspIVector4 dotsim5b = { 0xbf04a8ed, 0xbec6a2ff, 0x3f3b0f83, 0x00000000 };
testDot(" Simulate case 5", dotsim5a, dotsim5b);
This changes the results. It goes differently wrong right before the second tunnel, but doesn't work out from there. Pretty sure we're barking up the right tree, because everything up to where it goes crazy was right and the same - and the goes crazy point acted differently. -[Unknown] |
Cool. It's possible though that this sequence is so sensitive that it won't work all the way through until we've fixed both the FTZ issue and gotten this even more accurate... Please as always feel free to push even very rough code to a branch or PR, would be interesting to try this on Tekken 6. Also by the way the BSR instruction (CLZ on ARM) will let us get rid of those annoying while loops in the software add. Additionally, floating point multiplication in software is actually even easier than addition since there's no realignment needed, just multiply the mantissas, shift down by a fixed amount, and add the exponents (with a bias to account for the 127 base). Also it's very likely that vhdp, vfad, vavg have similar issues since they almost certainly are reusing the vdot hardware, kind of like the prefix hack ops. |
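The software multiply outlined above (multiply the mantissas, shift down, add the exponents and remove one bias of 127) might look roughly like this in C. This is a minimal sketch under stated assumptions: the name `soft_mul_trunc` is hypothetical, inputs are assumed finite, zero and subnormal inputs are flushed to zero, and the result is truncated rather than rounded. None of that is confirmed hardware behavior.

```c
#include <stdint.h>
#include <string.h>

static uint32_t float_to_bits(float f) {
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

static float bits_to_float(uint32_t u) {
    float f;
    memcpy(&f, &u, sizeof f);
    return f;
}

/* Truncating single-precision multiply for finite inputs (assumption). */
float soft_mul_trunc(float a, float b) {
    uint32_t ua = float_to_bits(a), ub = float_to_bits(b);
    uint32_t sign = (ua ^ ub) & 0x80000000u;
    int32_t ea = (int32_t)((ua >> 23) & 0xFF);
    int32_t eb = (int32_t)((ub >> 23) & 0xFF);

    /* Zero or subnormal input: treat as zero (assumption). */
    if (ea == 0 || eb == 0) return bits_to_float(sign);

    /* 24-bit mantissas with the implicit leading 1 restored. */
    uint64_t ma = (ua & 0x007FFFFFu) | 0x00800000u;
    uint64_t mb = (ub & 0x007FFFFFu) | 0x00800000u;
    uint64_t product = ma * mb;      /* in [2^46, 2^48) */
    int32_t e = ea + eb - 127;       /* both inputs carry a bias */

    /* Shift down by a fixed amount; one extra bit if the product
     * reached 2^47. The dropped low bits are the truncation. */
    if (product & (1ULL << 47)) {
        product >>= 24;
        e += 1;
    } else {
        product >>= 23;
    }

    if (e <= 0) return bits_to_float(sign);                  /* underflow */
    if (e >= 255) return bits_to_float(sign | 0x7F800000u);  /* overflow */
    return bits_to_float(sign | ((uint32_t)e << 23)
                              | ((uint32_t)product & 0x007FFFFFu));
}
```

For example, multiplying 1.0 + 2^-23 by 1.5 yields 1.5 + 2^-23 here, whereas a round-to-nearest-even multiply gives 1.5 + 2^-22. No realignment loop is needed, which is why this is simpler than the software add.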
Here's the branch so far: -[Unknown] |
It definitely is more accurate applying the same dot operation in vcrsp, though there's something odd happening with inf there. It affected Ridge Racer in probably a good way, but it still goes crazy a bit earlier than before. -[Unknown] |
So, it's probably not sqrt. I wrote a software sqrt, which matches vsqrt much better (sqrtf = exact match 3% of the time, vfpu_sqrt = exact match 84% of the time.) There was no change or improvement to the driving, though. It could be hiding in the remaining 16% (seems to be a rounding issue, but I can't figure out the right logic for it), but I'd have expected some improvement if the accuracy mattered. -[Unknown] |
Oops, had a stupid mistake disabling the sqrt. It does improve things. But it also mysteriously makes the game crash (well, it was before if it ran far enough without winning, but now it does it earlier...) -[Unknown] |
Okay, sorry for the many comments. Found the bug (max_exp == 0 vs max_exp <= 0) causing the crash, so now this is the version that gets the farthest: https://github.com/hrydgard/ppsspp/compare/master...unknownbrackets:vfpu-dot?expand=1 It still goes crazy eventually. Maybe it's the remaining 16% of sqrt - any ideas what might be wrong there? I tried rounding up or rounding even instead of masking, but maybe wrong... -[Unknown] |
Cool. But I don't think Ridge Racer is going to suddenly be fixed 100% after a single instruction is used - it's clear that its "physics" simulation uses a lot of different instructions and any of them can introduce a tiny error, which will get amplified over time and cause the simulation to fall out of sync with the replay data. It's not even certain that a single precision fix will cause the simulation results to be closer to the real thing (although as we fix more things, that does get more likely). And we still don't force FTZ on for VFPU instructions, which we really should if we don't just software emulate them all. Anyway, this is very good progress already even if Ridge Racer isn't fixed. Who knows what other games might be helped. Unfortunately this stuff is not easy to enable globally, for fear of slowdowns... |
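To make the amplification argument concrete, here is a toy C sketch. It uses the logistic map with r = 3.9, a standard chaotic system, purely as a stand-in; it has nothing to do with the game's actual physics, and the function name and parameters are hypothetical.

```c
/* Runs two copies of a toy chaotic "simulation" (the logistic map,
 * x -> r * x * (1 - x) with r = 3.9) from starting points that differ
 * by tiny_error, and returns the first step at which they differ by
 * more than threshold, or -1 if that never happens in max_steps. */
int steps_until_divergence(double x0, double tiny_error,
                           double threshold, int max_steps) {
    const double r = 3.9;
    double reference = x0;
    double perturbed = x0 + tiny_error;

    for (int step = 1; step <= max_steps; step++) {
        reference = r * reference * (1.0 - reference);
        perturbed = r * perturbed * (1.0 - perturbed);

        double diff = reference - perturbed;
        if (diff < 0.0) diff = -diff;
        if (diff > threshold) return step;
    }
    return -1;
}
```

Calling `steps_until_divergence(0.3, 1e-12, 0.01, 1000)` shows a difference in the twelfth decimal place growing into a visible one well within the step budget, while a zero error never diverges. A replay that feeds recorded inputs into a sensitive simulation behaves similarly: the inputs are identical, but a last-bit difference in one instruction is enough to eventually put the car somewhere else.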
Sure, of course. But there aren't that many instructions left unless it's FPU too. See the list. It's not like it uses sin/cos/etc. I assume Dissidia replays are affected by the same problem, but iirc they use a lot more VFPU instructions. Also, there's some masking already applying FTZ in that branch. But if you look above, Ridge Racer isn't really sending any subnormals through most of these instructions anyway. -[Unknown] |
Well there's vrot, vrsq and vdiv, and vsin and vcos are actually in the list you posted above? (actually never mind about the latter, I see you posted a revised list further down) |
The same thing happens on Ridge Racer 7 when played on RPCS3... The autodrive is also buggy, and I also found another bug: my saved replays are starting to bug out as well... |
Yeah, tiny, tiny math inaccuracies can result in this kind of thing, no surprise it happens on RPCS3 as well. |
I noticed that when I use a cheat that alters the car's performance in Ridge Racer, the AV Player CPU car's performance also changes. So if someone makes a cheat code that alters the car's performance, we would probably have no algorithm bug... |
Nah, you can't conclude that. Your cheat will just be another input that will throw the algorithm off even more, while it's already definitely broken in other ways.... |
I tried replicating the replays and I broke my fingers halfway on SR765... |
Actually, I managed to replicate half of the Seaside Route 765 CPU replay where you drive a Blue Raggio while racing the Angelus... I actually screwed up halfway, when I'm supposed to trigger the 2nd NOS... The Raggio drifted on a turn where I'm not supposed to drift, then the Angelus passed me... And since the Raggio is a Dynamic car, I can't control it properly. Also, when replicating the replays, you have to be precise on the turns or the A.I. opponents will mess up your rhythm... Anyways, here are the 6 tracks with no CPU bugs whatsoever: Seaside Route 765: https://www.youtube.com/watch?v=kQyHEo4S4wg |
I tried to run Ridge Racer 6 on the Xenia emulator to test the AV Player, while hoping that it won't crash.. But despite Ridge Racer 6 just being an upgraded version of Ridge Racer PSP, I was surprised Ridge Racer 6 AV Player never bugged whatsoever... The course I played was called "Surfside Resort"... |
When I played this on Android, I noticed that the JIT, IR Interpreter, and Interpreter execute the CPU autodrive differently, so the car drives differently in each... Try listing the differences between those 3 CPU cores, and you might find that one mathematical error... |
The fact that it desyncs makes me sad because I kept watching those replays on my real PSP when the Wi-Fi died, during a brownout, or when I got bored after finishing the game... The desyncing kind of represents how this game series is getting forgotten: Ridge Racer 8 never got released, and the fact that the bug is still here represents that the game got left unfinished and forgotten.... |
The xbox 360 and PSP have different CPUs, which is why they have different problems. The heart of this problem is math. Games use what are called "vector" or "simd" instructions to calculate math in speed critical situations and 3D formulas. If you look here, the Xbox 360 CPU had special modifications to do dot products on the CPU faster: https://en.wikipedia.org/wiki/Xbox_360_technical_specifications To help you understand, let's say I was adding up these two numbers: 6628451234 Google says the result is 7613177690, which is probably accurate. But what if someone did it by hand, and got it wrong? What if they thought it was 7613177609? It's a small difference, but the small differences add up - like a "hyperspace jump" in slightly the wrong direction. Some (but not all) of the PSP CPU's calculations were wrong - bad math. Crucially, unless we get the math wrong and get it wrong in exactly the same way - these replays won't play correctly. Xenia probably doesn't have this problem because the Xbox 360 got high marks on its maths. Just like a modern PC or a phone, it can add, multiply, divide, and subtract correctly. So there's no need to simulate the errors. Notably, it's probably the same reason again for RPCS3. The 7 SPEs also use inaccurate maths. The reason these calculations were wrong? Most likely speed, power, or cost. Doing math correctly might've required more silicon, more battery juice, or might've made games run slower. These errors are at the hardware level and we don't fully understand them. We don't know exactly how it calculates square roots, and what shortcuts it's using to get a close, but wrong, value. It's not that anyone doesn't care or wants to see the series dwindle by any means. Several people have spent hours debugging, working on, and trying to fix this very issue. -[Unknown] |
PPSSPP 1.11.3. Issue still persists. |
I discovered something odd with the desyncing replays. It wouldn't just bug out pre-recorded replays, it could also bug out your own replays. If you save a replay and then update or downgrade PPSSPP to another version, that replay may bug out and desync like the pre-recorded ones. I have some replays saved on an old PPSSPP version and the car just desyncs. Backtracking to an older version fixes the bug on some replays and some of them get fixed somehow. |
This is because we've made some updates to improve accuracy in some CPU instructions. It hasn't been enough to make the pre-recorded missions play correctly, but it means that recordings from previous versions no longer play the way they used to. This issue basically relies on specific and very accurate mathematical results, matching the same mathematical errors that the PSP CPU makes. Or at least, so we think. -[Unknown] |
Just an idea that I thought of just now but have not pursued: It could be that it isn't just accuracy, but that there's some actual bug in the math equation, and things work out as long as the replay replicates it because it's small. Specifically, I don't think anyone has ever checked whether there are any suspicious vector overlap cases. There's been evidence to suggest that unlike PPSSPP's code, the actual VFPU doesn't guarantee overlap safety in all cases (and when it does, it seems to do so by performing operations in reverse order.) Probably not likely, but I have already tried flushing everything to zero, adjusting rounding modes, forcing things to the decently accurate vdot, etc. -[Unknown] |
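For readers unfamiliar with the overlap problem mentioned above, here is a generic C sketch of how an operation whose destination overlaps its source can give a different answer depending on whether results are buffered. The 2x2 transform and both function names are hypothetical illustrations; this is not how PPSSPP or the VFPU implements any particular instruction.

```c
/* Overlap-safe: compute every output into temporaries first,
 * then write them back. Works even when dst and src alias. */
void transform2_via_temp(const float m[4], const float *src, float *dst) {
    float t0 = m[0] * src[0] + m[1] * src[1];
    float t1 = m[2] * src[0] + m[3] * src[1];
    dst[0] = t0;
    dst[1] = t1;
}

/* Not overlap-safe: writes each output as soon as it is computed.
 * If dst aliases src, the second row reads an already overwritten
 * src[0]. */
void transform2_write_through(const float m[4], const float *src, float *dst) {
    dst[0] = m[0] * src[0] + m[1] * src[1];
    dst[1] = m[2] * src[0] + m[3] * src[1];
}
```

With the swap matrix m = {0, 1, 1, 0} and a vector (1, 2) transformed in place, the buffered version produces (2, 1) while the write-through version produces (2, 2). If the game ever relied on the hardware's unbuffered result in a case like this, an emulator that always buffers would compute a "more correct" but different value.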
Options -> AV player mode
https://www.youtube.com/watch?v=hRrVBM2-OWc
https://www.youtube.com/watch?v=XJkM729PeeE
https://www.youtube.com/watch?v=3eQ7BlocmUo
Ridge Racers - JP 1.01 / USA 1.00 / EUR 1.00 / HK 1.00 / Asia 1.00