Support for CPUs which do not have SSE2 extensions? #3118

strega-nil-ms · 2022-09-21T18:00:35Z

Currently, on x86, we support /arch:IA32, and build our separately compiled sources with SSE2 support disabled. Is this still necessary, or can we allow ourselves to assume SSE2 hardware?

Notes:

XP and Vista support has been dropped
Windows 7 supported CPUs without SSE2 in its initial release, but dropped support in a 2018 update
Windows 8 has never supported CPUs without SSE2
SSE2 doesn't work in 32-bit kernel mode (this remains a problem for as long as we support Windows 10)


barcharcraz · 2022-09-21T18:04:14Z

Given that /arch:IA32 is the default on x86, we probably have to support it. However, we may be able to get away with building the DLLs with SSE2.

Alcaro · 2022-09-21T18:05:26Z

No, /arch:IA32 is not the default on x86. Proof: This code gives different results on IA32 vs no flags. (It also proves that IA32 automatically promotes every float32 to float64 before doing any math.) https://godbolt.org/z/Pv7Go5Te8

The relevant 2018 update is https://support.microsoft.com/en-us/topic/may-8-2018-kb4103718-monthly-rollup-c4c01989-faca-af5f-46f4-2bdc2d0171fd.
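To illustrate why the two modes can disagree, here is a minimal sketch of the general effect: rounding to float after each operation versus carrying a wider intermediate. It is not the code behind the Godbolt link, and it does not reproduce the actual /arch:IA32 code generation; the helper names are hypothetical and the wider intermediate is imitated with `double`.

```cpp
#include <cassert>

// Hypothetical helpers for illustration only.

// Rounds to float after the addition, as scalar SSE code does for float operands.
inline float sum_rounded_each_step(float big, float small) {
    volatile float t = big + small;  // volatile forces a store, which rounds to float
    return t - big;
}

// Keeps the intermediate in a wider type, imitating evaluation in wider registers.
inline float sum_wide_intermediate(float big, float small) {
    double t = static_cast<double>(big) + static_cast<double>(small);
    return static_cast<float>(t - static_cast<double>(big));
}

// 2^24 + 1 is not representable as a float, so the two paths disagree.
inline void check_rounding_difference() {
    const float big = 16777216.0f;  // 2^24
    assert(sum_rounded_each_step(big, 1.0f) == 0.0f);
    assert(sum_wide_intermediate(big, 1.0f) == 1.0f);
}
```

Calling `check_rounding_difference()` exercises both paths on the same inputs; the only difference between them is where the rounding to float happens.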

AlexGuteniev · 2022-09-21T18:31:53Z

Might be an issue for 32-bit kernel mode usage.

barcharcraz · 2022-09-21T19:05:36Z

> No, /arch:IA32 is not the default on x86. Proof: This code gives different results on IA32 vs no flags. (It also proves that IA32 automatically promotes every float32 to float64 before doing any math.) https://godbolt.org/z/Pv7Go5Te8
>
> The relevant 2018 update is https://support.microsoft.com/en-us/topic/may-8-2018-kb4103718-monthly-rollup-c4c01989-faca-af5f-46f4-2bdc2d0171fd.

You're right, although the floating-point difference is that /arch:IA32 uses x87 floating-point instructions, which are 80-bit.

CaseyCarter · 2022-09-21T19:44:23Z

From https://learn.microsoft.com/en-us/cpp/build/reference/arch-x86?view=msvc-170:

/arch:SSE2
Enables the use of SSE2 instructions. This option is the default instruction set on x86 platforms if no /arch option is specified.
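For anyone who wants to confirm which baseline a given build actually targets, a small compile-time check is one option. This is a sketch, assuming C++17: `target_baseline` is a hypothetical helper name, `_M_IX86_FP` is the MSVC predefined macro for 32-bit x86 (0 for /arch:IA32, 1 for /arch:SSE, 2 for /arch:SSE2 and above), and `__SSE2__` is the GCC/Clang equivalent.

```cpp
#include <string_view>

// Reports which x86 floating-point baseline this translation unit targets.
// target_baseline is a hypothetical helper, not part of any library.
constexpr std::string_view target_baseline() {
#if defined(_M_IX86_FP)                       // MSVC, 32-bit x86 only
  #if _M_IX86_FP == 0
    return "IA32 (x87)";
  #elif _M_IX86_FP == 1
    return "SSE";
  #else
    return "SSE2 or later";
  #endif
#elif defined(_M_X64) || defined(__SSE2__)    // x64 always has SSE2
    return "SSE2 or later";
#elif defined(__i386__)                       // GCC/Clang 32-bit x86 without SSE2
    return "x86 without SSE2";
#else
    return "not an x86 target";
#endif
}
```

Printing `target_baseline()` from a 32-bit build compiled with no /arch option, and again with /arch:IA32, should show whether the default matches the documentation quoted above.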

StephanTLavavej · 2022-09-21T21:54:35Z

We talked about this at the weekly maintainer meeting - although the potentially affected set of users is extremely small, if installing an updated redist caused code to fail at runtime, that would be very severe. In general, we have very little code affected by /arch:IA32 / the availability of SSE2 (from a quick scan, it's Special Math, vectorized algorithms, and the __vectorcall calling convention), so the benefits of making such a general change would be relatively small (e.g. in comparison to dropping Vista support which allowed us to remove a massive amount of code and significant runtime logic for Win7+ users).

However, Special Math is a special case - that is implemented in a separate "satellite DLL", and @strega-nil-ms has found that the availability of SSE2 impacts its precision (and presumably its performance). @CaseyCarter noted that we could change just the Special Math satellite DLL to use SSE2, which would be an extremely safe change - only programs actually using Special Math would be affected, as it is a pure leaf of the STL, and this satellite DLL was added relatively recently (VS 2017) so it is extraordinarily unlikely that machines with ancient processors are running code that uses this.

Note: such a change would need to happen in both the GitHub/CMake and internal/MSBuild build systems.
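For context on the "fail at runtime" concern, this is roughly how a program could ask the CPU whether SSE2 is present before calling into SSE2-only code. It is a sketch, not something proposed in this thread: `cpu_has_sse2` and `kTargetsX64` are hypothetical names, and the check reads CPUID leaf 1, EDX bit 26, through the `__cpuid` intrinsic on MSVC or `__get_cpuid` from `<cpuid.h>` on GCC/Clang.

```cpp
#if defined(_MSC_VER) && (defined(_M_IX86) || defined(_M_X64))
  #include <intrin.h>
  #define SKETCH_X86 1
#elif (defined(__GNUC__) || defined(__clang__)) && \
      (defined(__i386__) || defined(__x86_64__))
  #include <cpuid.h>
  #define SKETCH_X86 1
#else
  #define SKETCH_X86 0
#endif

#if defined(_M_X64) || defined(__x86_64__)
constexpr bool kTargetsX64 = true;   // x64 guarantees SSE2 architecturally
#else
constexpr bool kTargetsX64 = false;
#endif

// Returns true if the running CPU reports SSE2 (CPUID leaf 1, EDX bit 26).
// Returns false on non-x86 targets or if CPUID leaf 1 is unavailable.
inline bool cpu_has_sse2() {
#if SKETCH_X86 && defined(_MSC_VER)
    int regs[4] = {};                // EAX, EBX, ECX, EDX
    __cpuid(regs, 1);
    return (regs[3] & (1 << 26)) != 0;
#elif SKETCH_X86
    unsigned eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
        return false;
    }
    return (edx & (1u << 26)) != 0;
#else
    return false;
#endif
}
```

A check like this only helps if the code performing it is itself compiled without SSE2 instructions; otherwise the process can fault before the check runs, which is the failure mode described above.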

AlexGuteniev · 2022-09-22T07:34:28Z

Does building Special Math with /fp:strict or /fp:precise fix the precision issue?

strega-nil-ms · 2022-10-03T21:39:58Z

@AlexGuteniev no, we already build with /fp:strict, and that doesn't really have anything to do with why the result is different on non-SSE2 chips. The implementation of the special math functions does quite a bit of logic, and that logic is (necessarily) different on machines without SSE2, and on machines with SSE2.

StephanTLavavej added the decision needed, affects redist, and performance labels and removed the decision needed label on Sep 21, 2022

StephanTLavavej mentioned this issue Feb 7, 2024

Should we require SSE2? #3922

Closed

StephanTLavavej mentioned this issue Jun 20, 2024

Build the x86 STL with /arch:SSE2 instead of /arch:IA32 #4741

Merged

StephanTLavavej closed this as completed in #4741 Jun 21, 2024

StephanTLavavej added the fixed Something works now, yay! label Jun 21, 2024
