Autovectorization support #84

Open
penzn opened this issue Aug 9, 2022 · 6 comments
Comments

penzn (Contributor) commented Aug 9, 2022

We discussed this in the latest SIMD meeting and wanted to get @tlively's perspective.

Compilers have the -ffast-math flag, which at least partially fits the bill for supporting relaxed SIMD operations.

  • Fast math allows some variability in results, but would admitting both Arm and x86 outputs be too much variability? Maybe we can limit which instructions we support.
  • Or would it be necessary to generate platform detection code to reduce variability?
tlively (Member) commented Aug 10, 2022

I don't think there's any need to restrict the -ffast-math optimizations we provide for users who opt-in to using that flag. In other words, we should apply the same aggressive optimizations available to other targets. If a user is not getting the results they need, then their program was not well-specified enough and the fix should be in the user's code, not in the compiler.

sunfishcode (Member) commented

There is a fundamental difference between -ffast-math on native targets and -ffast-math on wasm. On native targets, one compiles with -ffast-math and can then test the output, and trust that it'll continue to behave as tested, because all the nondeterminism related to floating-point has been resolved. On wasm, developers may test their code on their local machine with a particular wasm engine, and it may work for them, while their users may have machines with different architectures and different wasm engines, where it may not work.

So it is worth considering restricting the -ffast-math optimizations.

penzn (Contributor, author) commented Aug 12, 2022

Sure, fast math would change the output even with MVP, but the change would at least be portable. Here we are moving away from that, and I want to understand the consequences, especially regarding what the producer can and cannot do.

There are broadly two types of instructions in this proposal:

  1. Most introduce non-determinism w.r.t. out-of-range values (what happens with out-of-bounds lane indices and the like)
  2. Instructions that actually affect precision, namely qfma, dot, and bfloat ops if added

fmin/fmax might be counted in the second category because their outputs differ drastically enough, thanks to the opposite approaches to NaN handling in the two major architectures.

Technically the first category doesn't affect precision, and in theory fast-math transformations should have the same effect as in core Wasm, as long as they are not dependent on out-of-range semantics.

For the second group it has been proposed to use platform detection as a mitigation, though I am not sure whether that would get us all the way back to stable results. Thoughts?

tlively (Member) commented Aug 12, 2022

There is a fundamental difference between -ffast-math on native targets and -ffast-math on wasm. On native targets, one compiles with -ffast-math and can then test the output, and trust that it'll continue to behave as tested, because all the nondeterminism related to floating-point has been resolved.

This is fundamentally different from native targets in the same way relaxed-simd itself is fundamentally different from native architectures, though. I would think that by opting into relaxed-simd, the user would be opting into accepting an additional testing burden, since the nondeterminism has explicitly not been resolved. I can see that this would be inconvenient for the user, but they can always choose not to use relaxed-simd or not to use -ffast-math, so I don't see that it's worth doing anything special in the tools here.

@penzn, I don't quite understand the distinction you're trying to draw between those two groups of instructions. Either way, the compiler should be free to perform instruction selection in any way that matches the specified semantics (no matter how loose or strict they might be) of the input program. Baking platform detection into compilers seems complicated and undesirable.

sunfishcode (Member) commented

This is fundamentally different from native targets in the same way relaxed-simd itself is fundamentally different from native architectures, though. I would think that by opting into relaxed-simd, the user would be opting into accepting additional testing burden since the nondeterminism has explicitly not been resolved. I can see that this would be inconvenient for the user, but they can always choose not to use relaxed-simd or not use -ffast-math, so I don't see that's it's worth doing anything special in the tools here.

Do you envision relaxed-simd will be automatically enabled by -ffast-math, or will it always remain a separate opt-in?

tlively (Member) commented Aug 15, 2022

I expect it would be a separate opt-in via -mrelaxed-simd to explicitly enable the target feature.
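For context (a sketch, not from the thread): the two opt-ins compose as independent clang flags when targeting wasm; the filename is hypothetical.

```shell
# -msimd128 / -mrelaxed-simd enable the SIMD target features;
# -ffast-math separately licenses value-changing FP optimizations.
# kernel.c is a placeholder source file for illustration.
clang --target=wasm32 -O2 -msimd128 -mrelaxed-simd -ffast-math \
      -c kernel.c -o kernel.o
```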
