Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

wasm: prefer pmin/pmax #361

Merged
merged 1 commit into from
Dec 2, 2023
Merged

wasm: prefer pmin/pmax #361

merged 1 commit into from
Dec 2, 2023

Conversation

myfreeer
Copy link
Contributor

@myfreeer myfreeer commented Dec 2, 2023

According to emscripten and v8, [f32x4|f64x2].[min|max] compiles to much more instructions than [f32x4|f64x2].[pmin|pmax]. It is defined in spec that the difference between pmin/pmax and min/max is NaN-propagating behavior, and the equivalent to the x86 _mm_min_ps/_mm_max_ps is pmin/pmax in v8. This should make functions with min/max faster on webassembly, and align with the existing behavior with x86 sse.

According to [emscripten](https://emscripten.org/docs/porting/simd.html) and [v8](https://github.com/v8/v8/blob/b6520eda5eafc3b007a5641b37136dfc9d92f63d/src/compiler/backend/x64/code-generator-x64.cc#L2661-L2699), `[f32x4|f64x2].[min|max]` compiles to much more instructions than `[f32x4|f64x2].[pmin|pmax]`.
It is defined in [spec](https://github.com/WebAssembly/spec/blob/main/proposals/simd/SIMD.md#floating-point-min-and-max) that the difference between pmin/pmax and min/max is NaN-propagating behavior, and the equivalent to the x86 `_mm_min_ps`/`_mm_max_ps` is pmin/pmax in [v8](https://github.com/v8/v8/blob/b6520eda5eafc3b007a5641b37136dfc9d92f63d/src/compiler/backend/x64/code-generator-x64.cc#L2740-L2747).
This should make functions with min/max faster on webassembly, and align with the existing behavior with x86 sse.
@recp recp merged commit 9b26aff into recp:master Dec 2, 2023
33 checks passed
@recp
Copy link
Owner

recp commented Dec 2, 2023

@myfreeer many thanks for contributions for WASM support and its stability, the PR is merged 🚀

@myfreeer myfreeer deleted the myfreeer-patch-4 branch December 2, 2023 09:22
@gottfriedleibniz
Copy link

gottfriedleibniz commented Dec 20, 2023

Should we aim for consistent Nan-propagating behaviour here? The Intel ISA states:

If only one value is a NaN (SNaN or QNaN) for this instruction, the second operand (source operand), either a NaN or a valid floating-point value, is written to the result.

From the V8 link above:

The maxps instruction doesn't propagate NaNs and +0's in its first operand

Therefore to match _mm_max_ps(a, b): wasm_f32x4_pmax(a, b) should be wasm_f32x4_pmax(b, a). For example, consider:

vec4 a, b, c;

glm_vec4_broadcast(NAN, a);
glm_vec4_broadcast(0.0, b);
glm_vec4_maxv(a, b, c);

printf("%f %f %f %f\n", c[0], c[1], c[2], c[3]);

// SSE: 0.000000 0.000000 0.000000 0.000000
// WASM: nan nan nan nan
// ARM: Another can of worms

Generalizing the above: should these functions follow x86 behaviour, IEEE behaviour, or just aim for maximum performance ignoring these edge-cases.

@recp
Copy link
Owner

recp commented Dec 22, 2023

@gottfriedleibniz many thanks for catching edge-cases like this and help to fix them.

I think we can wrap min/max by glmm_ e.g. glmm_min() and glmm_max() to make behaviors identical

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants