Implement a few x86 vertexjit optimizations #9674

unknownbrackets · 2017-05-08T14:08:11Z

Micro-benchmark results from the unit test are in the commit messages. One of the changes applies only to SSE 4.1.

-[Unknown]

zminhquanz · 2017-05-08T14:44:26Z

Does this improve performance on SSSE3 and SSE3?

unknownbrackets · 2017-05-08T15:30:01Z

Yes, except for the s16 position through-mode change, which requires SSE 4.1. The rest only need SSE2 or later.

Note that these improvements are only measured for vertex decoding. This is only a percentage of the overall rendering process, so don't expect a huge impact in most games.

-[Unknown]
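
The point about decode-only measurements can be made concrete with a rough Amdahl's-law sketch. The share of frame time spent decoding vertices is not given anywhere in this thread, so the 10% used in the note below is purely an assumed figure, and `overall_speedup` is a hypothetical helper name rather than anything from the PPSSPP code base.

```c
/* Amdahl's law: overall speedup when only a fraction of the total work
 * (decode_fraction, between 0 and 1) becomes faster by a factor of
 * decode_speedup. Both inputs are assumptions supplied by the caller. */
static double overall_speedup(double decode_fraction, double decode_speedup) {
    double unchanged = 1.0 - decode_fraction;          /* work that stays the same */
    double improved = decode_fraction / decode_speedup; /* work that got faster */
    return 1.0 / (unchanged + improved);
}
```

For example, the s16 change moves decoding from 150% to 390% of non-jit speed, a factor of 390/150 = 2.6. If decoding were 10% of frame time, `overall_speedup(0.10, 2.6)` would come out at roughly 1.07, which is a gain of only about 6 to 7 percent overall.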

hrydgard · 2017-05-08T17:38:11Z

If you have Haswell or better (BMI2 extensions), it's possible to do an even faster 4444->8888 through PDEP:

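(color in eax; PDEP takes its mask from a register or memory, not an immediate)
mov edx, 0xF0F0F0F0
pdep ebx, eax, edx   ; each 4-bit channel -> high nibble of its byte
mov edx, 0x0F0F0F0F
pdep ecx, eax, edx   ; each 4-bit channel -> low nibble of its byte
or ebx, ecx          ; ebx = 8888 color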
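
For readers without BMI2 hardware, the nibble-spreading idea above can be sketched in portable C. The `pdep32` helper is a hypothetical software emulation of PDEP written for illustration (it is not an intrinsic and not PPSSPP code), and `expand4444` is likewise an assumed name that mirrors the instruction sequence above.

```c
#include <stdint.h>

/* Software emulation of PDEP (hypothetical helper): take the low bits of
 * src, in order, and deposit them into the positions set in mask. */
static uint32_t pdep32(uint32_t src, uint32_t mask) {
    uint32_t result = 0;
    for (uint32_t bit = 1; mask != 0; bit <<= 1) {
        uint32_t lowest = mask & (0u - mask); /* lowest set bit of mask */
        if (src & bit)
            result |= lowest;
        mask &= mask - 1;                     /* clear that mask bit */
    }
    return result;
}

/* Expand a 16-bit 4444 color to 32-bit 8888 by copying each 4-bit
 * channel into both nibbles of the corresponding output byte. */
static uint32_t expand4444(uint16_t color) {
    uint32_t hi = pdep32(color, 0xF0F0F0F0u); /* channels in high nibbles */
    uint32_t lo = pdep32(color, 0x0F0F0F0Fu); /* channels in low nibbles */
    return hi | lo;
}
```

With this sketch, `expand4444(0xABCD)` yields `0xAABBCCDD`. On a BMI2-capable CPU, the `_pdep_u32` intrinsic from `<immintrin.h>` could stand in for the emulated helper; the loop version here exists only to show what the instruction computes.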

unknownbrackets · 2017-05-08T18:47:12Z

I'm still rocking my Ivy Bridge. I'll probably upgrade at some point, but it's been serving me quite well (aside from lacking AVX2 and BMI). Definitely some cool things can be done there.

-[Unknown]

hrydgard · 2017-05-08T20:16:57Z

Still rocking an Ivy as a main machine as well, but I've got a more modern CPU in the laptop. It's fast enough though, so not a very important optimization. PDEP/PEXT are really cool instructions though, I'm sure they have more interesting uses.

iOS4all · 2017-05-08T20:26:12Z

Is this fix related to iOS micro stutter or not?
Thanks.

unknownbrackets · 2017-05-08T20:41:49Z

No. This only applies to x86 (Intel/AMD) CPUs, which are not used in any iOS devices.

-[Unknown]

iOS4all · 2017-05-09T03:26:56Z

Oh.... ok thanks

unknownbrackets added 3 commits May 7, 2017 12:14

x86: More optimal 4444 in vertexjit.

b06e271

This was commented out, but works fine and goes from 320% -> 450% the speed of non-jit for simple pos/col verts.

x86: Minor memory copy perf improvement.

0fe927a

This was actually 270% -> 340% non-jit for pos-only verts.

x86: Minor optimization for s16 through positions.

dacb776

This takes it from 150% to 390% non-jit for pos only verts.

hrydgard merged commit f06daba into hrydgard:master May 8, 2017

unknownbrackets deleted the vertexjit branch May 8, 2017 18:47
