Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework D3D9 float emulation #2294

Closed
wants to merge 8 commits into from
Closed

Rework D3D9 float emulation #2294

wants to merge 8 commits into from

Conversation

doitsujin
Copy link
Owner

@doitsujin doitsujin commented Sep 14, 2021

Replaces almost every occurence of a * b with (b == 0 ? 0 : a) * (a == 0 ? 0 : b) if we know that both operands may be zero. I've omitted the TexBem instructions for now since I don't know what those do exactly and if there are any NaNs to filter out there, but everythig else should be handled.

The existing d3d9.floatEmulation option still works, and setting it to false effectively disables this.

Fixes stuff like #2107, but will obviously need a fair bit of testing.

@DadSchoorse
Copy link
Contributor

This really needs some work in the drivers for low end hardware, I'm losing up to 20% performance on my renoir apu.

@pendingchaos
Copy link
Contributor

Not sure if it matters but (b == 0 ? 0 : a) * (a == 0 ? 0 : b) is -0.0 for a,b=-0.0,-1.0, unlike AMD's legacy multiplication opcode, which is 0.0 IIRC

@doitsujin
Copy link
Owner Author

doitsujin commented Sep 14, 2021

I don't think there's anything in D3D9 that relies on signed zeroes being correct. D3D9 does not follow IEEE rules.

Does the instruction always return positive zero or does it return the correct sign? That's not clear to me right now.

@doitsujin
Copy link
Owner Author

We can change this to (b == 0 ? b : a) * (a == 0 ? a : b) if that more closely emulates v_mul_legacy behaviour.

@misyltoad
Copy link
Collaborator

fwiw we do not enabled Signed Zero Preserve in D3D9

@pendingchaos
Copy link
Contributor

pendingchaos commented Sep 14, 2021

Does the instruction always return positive zero or does it return the correct sign? That's not clear to me right now.

I tested this a while ago and IIRC it's always positive zero if either source is negative or positive zero (EDIT: checked again, and -0.0 * +0.0 is +0.0)

We can change this to (b == 0 ? b : a) * (a == 0 ? a : b) if that more closely emulates v_mul_legacy behaviour.

Doesn't work with -0.0 * 0.0, but it probably doesn't matter if you don't enable Signed Zero Preserve. RADV should be able to optimize either if it's not enabled

@DadSchoorse
Copy link
Contributor

This is probably more a theoretical problem, but doesn't this need SignedZeroInfNanPreserve to ensure correctness? Without it it's allowed to optimize (b == 0 ? 0 : a) * (a == 0 ? 0 : b) to a * b, I think.

@pendingchaos
Copy link
Contributor

This is probably more a theoretical problem, but doesn't this need SignedZeroInfNanPreserve to ensure correctness? Without it it's allowed to optimize (b == 0 ? 0 : a) * (a == 0 ? 0 : b) to a * b, I think.

I don't think that's a valid optimization for the compiler to do. If one of the operands is 0 and the other is nan/inf, the expression has no multiplication by nan/inf

@DadSchoorse
Copy link
Contributor

Okay, you're the compiler expert not me. 😄 My understanding of the default vulkan floating point rules is that the compiler is allowed to assume that no arguments are NaN or Inf tho.

@K0bin K0bin added the d3d9 label Sep 17, 2021
@pendingchaos
Copy link
Contributor

nmin(abs(a), abs(b)) == 0.0 ? 0.0 : a * b might be faster on devices where min can take two abs modifiers (most devices?), though the size of the spirv might be larger. It should also match v_mul_legacy_f32 exactly

@pendingchaos
Copy link
Contributor

I think this (work in progress) branch should mostly restore RADV performance: https://gitlab.freedesktop.org/pendingchaos/mesa/-/commits/radv_zerowins_misc

I haven't actually tested this with DXVK though

break;
case DxsoOpcode::Rsq:
result.id = m_module.opFAbs(typeId,
emitRegisterLoad(src[0], mask).id);

result.id = m_module.opInverseSqrt(typeId,
result.id);

if (m_moduleInfo.options.d3d9FloatEmulation) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we keep the old path around as an option by default for now until we get support for this in a stable release of Mesa and on NV?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, I'd like a way to turn this off, we don't rely on this behaviour in the DXVK Native titles at all.

Copy link
Owner Author

@doitsujin doitsujin Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the way to turn this off is disabling d3d9FloatEmulation, just like before?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I really don't want to have to ship two different code paths and enable the broken one by default. It'll just lead to a maintenance shitshow where we have to keep track of every single game that runs into the problem, and if we enable this conditionally based on the driver version or whatever it'll just lead to weird bug reports.

We can wait until the mesa optimization lands I guess, but once that happens I'd rather just have things work out of the box.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then what do we do about Intel?

Copy link
Owner Author

@doitsujin doitsujin Oct 16, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ignore and move on?

If their hardware has no way to support mul_legacy other than some global flag that we can't enable then there's not much we can do on our end.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They support flushing NaN/INF to FLT_MAX like we emulate rn.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sigh, fine, maintenance shitshow it is. I still hate of having to default to an option that is literally broken though.

@DadSchoorse
Copy link
Contributor

I think this (work in progress) branch should mostly restore RADV performance: https://gitlab.freedesktop.org/pendingchaos/mesa/-/commits/radv_zerowins_misc

I haven't actually tested this with DXVK though

These patches fully restore performance in my handful of test cases.

@K0bin
Copy link
Collaborator

K0bin commented Dec 5, 2021

Merged with #2359

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants