Implement a few x86 vertexjit optimizations #9674
This was commented out, but it works fine and goes from 320% -> 450% of non-JIT speed for simple pos/col verts.
This was actually 270% -> 340% of non-JIT speed for pos-only verts.
This takes it from 150% to 390% of non-JIT speed for pos-only verts.
Does it improve performance on SSSE3 and SSE3?
Yes, except for the s16 pos throughmode change, which only works with SSE 4.1. The rest require only SSE2. Note that these improvements were measured for vertex decoding alone, which is only part of the overall rendering process, so don't expect a huge impact in most games. -[Unknown]
If you have Haswell or better (BMI2 extensions), it's possible to do an even faster 4444->8888 through PDEP:
I'm still rocking my Ivy; I'll probably upgrade at some point, but it's been serving me quite well (aside from lacking AVX2 and BMI). Definitely some cool things can be done there. -[Unknown]
Still rocking an Ivy as my main machine as well, but I've got a more modern CPU in my laptop. It's fast enough, though, so this isn't a very important optimization. PDEP/PEXT are really cool instructions, though; I'm sure they have more interesting uses.
Is this fix related to the iOS micro-stutter or not?
No. This only applies to x86 (Intel) CPUs, which are not used in any iOS devices. -[Unknown]
Oh... OK, thanks.
Micro-benchmark results from the unit test are in the commit messages; one of the optimizations is SSE 4.1-only.
-[Unknown]