Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable -fstrict-overflow #96821

Open
matthiasgoergens opened this issue Sep 14, 2022 · 13 comments
Open

Enable -fstrict-overflow #96821

matthiasgoergens opened this issue Sep 14, 2022 · 13 comments
Labels
build The build process and cross-build performance Performance or resource usage type-feature A feature request or enhancement

Comments

@matthiasgoergens
Copy link
Contributor

matthiasgoergens commented Sep 14, 2022

At the moment we compile releases with -fwrapv which makes the code a bit safer, but disables certain optimizations. From the GCC docs:

This option instructs the compiler to assume that signed arithmetic overflow of addition, subtraction and multiplication wraps around using twos-complement representation. This flag enables some optimizations and disables others.

My experiments with running sanitisers seem to suggest that we are nearly already ready for -fno-wrapv (or -fstrict-overflow in general). Doing so could lead to quite a few speedups, but we would need to be more careful with the code we write.

It might be worthwhile to get a few benchmarks.

(To be extra precise, we give -fwrapv for clang and gcc for any build that doesn't get --with-pydebug.)

Pitch

My plan right now is to adapt the build system so that only the modules that need it are build with -fwrapv, and the rest can be build with -fstrict-overflow.

We already have config machinery that can add specific CFLAGS for specific modules only.

Perhaps the whole thing can be gated behind a configure flag, like --with-strict-overflow.

If everything goes well, and this improves performance we can consider adding this functionality to one of the standard optimization options.

We can also work on making more modules -fstrict-overflow safe.

Previous discussion

@markshannon @ericsnowcurrently

Brought up on faster-cpython/ideas#458 and inspired by #96678

Some previous issues around -fwrapv:

I'm sure there are more.

Progress so far

As far as is currently known, the three remaining modules that rely on defined integer overflow are fixed by:

Linked PRs

@ericsnowcurrently
Copy link
Member

This is definitely worth exploring!

@ericsnowcurrently
Copy link
Member

but we would need to be more careful with the code we write.

What do you mean by that?

More concretely:

  • what would need to be done (or not done) specifically?
  • would this mean adding to PEP 7?
  • what would that text say?
  • would we add some explanation to the devguide?

@ericsnowcurrently
Copy link
Member

Also, do clang and msvc have something similar?

@ericsnowcurrently
Copy link
Member

In gh-96823 I asked about detecting compatibility for the compile flag. This is probably the better place to discuss that.

Basically, how can we detect that a C file can be built with -fstrict-overflow or that it can't?

@matthiasgoergens
Copy link
Contributor Author

I ran clang's undefined behaviour sanitiser and address sanitiser over the test suite to detect the modules that need defined overflow.

Of course, strictly speaking a detection this way can only show the presence of the need for defined overflow, but it cannot show the absence of such a need.

But that's no different from any other bug or undefined behaviour, and we just rely on the additional assumption that our test suite has enough coverage.

Small complicating detail: when you use the conpiler flag to make signed integer overflow into defined behaviour, the sanitisers no longer complain about it.

So if you want to check if the flag is still necessary for a module, you need to temporarily turn it off.

@matthiasgoergens
Copy link
Contributor Author

matthiasgoergens commented Sep 16, 2022

Clang definitely has this flag. We actually use clang for the sanitisers.

I don't know about msvc. There's no trace of anything like this in their docs https://docs.microsoft.com/en-us/cpp/build/reference/compiler-options-listed-by-category?view=msvc-170#code-generation

But we can just stick to the current situation here:

Only gcc and clang explicitly get flags about overflow, and for the other compilers we just hope and pray.

(The situation will actually improve for the other compilers, when we make all modules strict-overflow safe.)

For the actual implementation, I'll probably make configure check for the flags directly, instead of checking for the name and version of the compiler.

@matthiasgoergens
Copy link
Contributor Author

matthiasgoergens commented Sep 17, 2022

I had a look at PEP 7. The relevant section is https://peps.python.org/pep-0007/#c-dialect

The PEP says to use C11 and doesn't mention any deviation from that for overflow. So our code should already avoid signed integer overflow. At this point that's more of an aspiration than reality, though.

So to some extent, we could describe the effort here as part of implementing PEP 7.

Strictly speaking we don't need to change anything, but I think pragmatically we should make a note that up to 3.11 signed integer overflow was tacitly tolerated, but that beginning from 3.12 we are sticking to the standard. I created a draft PR python/peps#2796 for how that could look like. (Please suggest better wording, if you have ideas.)

(Additionally, I also notice that when I am building I am getting a few warnings here or there, but the PEP suggests that we shouldn't be getting any warnings. Perhaps I'll spend a bit of time fixing the code (or suppressing these warnings in known instances that we can't or won't fix.))

@matthiasgoergens
Copy link
Contributor Author

Another thing: whatever policy we decide on, we should probably fix that --with-pydebug seemingly arbitrarily gives you a different policy.

@matthiasgoergens
Copy link
Contributor Author

matthiasgoergens commented Sep 17, 2022

I just found some slightly bad news in the GCC docs:

Certain ABI-changing flags are required to match in all compilation units, and trying to override this at link time with a conflicting value is ignored. This includes options such as -freg-struct-return and -fpcc-struct-return.

Other options such as -ffp-contract, -fno-strict-overflow, -fwrapv, -fno-trapv or -fno-strict-aliasing are passed through to the link stage and merged conservatively for conflicting translation units. Specifically -fno-strict-overflow, -fwrapv and -fno-trapv take precedence; and for example -ffp-contract=off takes precedence over -ffp-contract=fast. You can override them at link time.

If I understand that right, specifying -fwrapv or -fno-strict-overflow for one C source file might essentially be enough to preclude optimizations for (almost?) everything.

So we might need to be completely clean before we can expect to get a performance benefit?

That's where we want to get to eventually anyway, but it would have been nice if a piecemeal approach worked.

Update: in experiments with clang and its undefined-behaviour sanitizer, the piecemeal approach seems to work. So at least for diagnostics it's good enough! I don't know about optimizations.

@matthiasgoergens
Copy link
Contributor Author

I extended #96823 to incorporate what I wrote above.

matthiasgoergens added a commit to matthiasgoergens/cpython that referenced this issue Sep 18, 2022
matthiasgoergens added a commit to matthiasgoergens/cpython that referenced this issue Sep 19, 2022
Left-shifting negative numbers is undefined behaviour.

Fortunately, multiplication works just as well, is defined behaviour,
and gets compiled to the same machine code as before by optimizing
compilers.
matthiasgoergens added a commit to matthiasgoergens/cpython that referenced this issue Sep 19, 2022
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 19, 2022
…nGH-96915)

* pythongh-96821: Assert for demonstrating undefined behaviour

* Fix UB

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
(cherry picked from commit cbdeda8)

Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
miss-islington pushed a commit to miss-islington/cpython that referenced this issue Sep 19, 2022
…nGH-96915)

* pythongh-96821: Assert for demonstrating undefined behaviour

* Fix UB

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
(cherry picked from commit cbdeda8)

Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
Fidget-Spinner pushed a commit that referenced this issue Sep 19, 2022
* gh-96821: Assert for demonstrating undefined behaviour

* Fix UB

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Fidget-Spinner pushed a commit that referenced this issue Sep 19, 2022
…H-96926)

* gh-96821: Assert for demonstrating undefined behaviour

* Fix UB

(cherry picked from commit cbdeda8)

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
Fidget-Spinner pushed a commit that referenced this issue Sep 19, 2022
…H-96927)

* gh-96821: Assert for demonstrating undefined behaviour

* Fix UB

(cherry picked from commit cbdeda8)

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
@matthiasgoergens
Copy link
Contributor Author

matthiasgoergens commented Sep 25, 2022

@mdickinson Could you please have a look at the other PRs mentioned in this issue?

There's only two left, and then we can turn on strict overflow for (hopefully) extra performance.

I went with implementation defined behaviour for those two PRs. But I'm happy to adopt your clever method for non-implementation defined behaviour, if you prefer that.

@matthiasgoergens
Copy link
Contributor Author

matthiasgoergens commented Sep 26, 2022

I ran pyperformance benchmarks on a branch that had all the patches applied and -fstrict-overflow enabled. Overall, we are getting about 1% speedup on GCC:

Benchmark 2022-09-25_09-55-main-f5f047aa628c 2022-09-25_10-17-strict_overflow_test-9e8007d7aae6
unpickle 12.2 us 11.2 us: 1.08x faster
fannkuch 346 ms 325 ms: 1.06x faster
pyflate 371 ms 353 ms: 1.05x faster
scimark_sor 95.3 ms 92.3 ms: 1.03x faster
nqueens 71.1 ms 68.9 ms: 1.03x faster
float 64.7 ms 62.7 ms: 1.03x faster
chaos 58.8 ms 57.3 ms: 1.03x faster
logging_format 6.47 us 6.30 us: 1.03x faster
json_loads 20.6 us 20.2 us: 1.02x faster
nbody 79.4 ms 77.7 ms: 1.02x faster
sqlalchemy_declarative 122 ms 120 ms: 1.02x faster
spectral_norm 86.2 ms 84.4 ms: 1.02x faster
logging_silent 83.3 ns 81.8 ns: 1.02x faster
genshi_text 19.1 ms 18.8 ms: 1.02x faster
telco 5.42 ms 5.33 ms: 1.02x faster
go 121 ms 119 ms: 1.02x faster
sympy_expand 429 ms 422 ms: 1.02x faster
logging_simple 5.73 us 5.65 us: 1.02x faster
scimark_fft 260 ms 256 ms: 1.02x faster
django_template 29.8 ms 29.4 ms: 1.01x faster
xml_etree_generate 73.1 ms 72.2 ms: 1.01x faster
xml_etree_process 50.2 ms 49.5 ms: 1.01x faster
pathlib 16.8 ms 16.6 ms: 1.01x faster
json_dumps 8.21 ms 8.11 ms: 1.01x faster
pickle_pure_python 253 us 250 us: 1.01x faster
raytrace 248 ms 245 ms: 1.01x faster
xml_etree_iterparse 87.9 ms 86.9 ms: 1.01x faster
tornado_http 105 ms 104 ms: 1.01x faster
scimark_monte_carlo 55.4 ms 54.8 ms: 1.01x faster
unpickle_pure_python 181 us 179 us: 1.01x faster
meteor_contest 90.3 ms 89.5 ms: 1.01x faster
regex_v8 17.9 ms 17.8 ms: 1.01x faster
dulwich_log 67.8 ms 67.3 ms: 1.01x faster
xml_etree_parse 129 ms 128 ms: 1.01x faster
sympy_integrate 18.9 ms 18.7 ms: 1.01x faster
mako 8.31 ms 8.27 ms: 1.01x faster
sympy_sum 158 ms 158 ms: 1.01x faster
pidigits 166 ms 165 ms: 1.01x faster
python_startup_no_site 6.72 ms 6.74 ms: 1.00x slower
python_startup 8.70 ms 8.75 ms: 1.00x slower
scimark_lu 96.9 ms 98.4 ms: 1.02x slower
regex_dna 147 ms 149 ms: 1.02x slower
crypto_pyaes 63.0 ms 64.4 ms: 1.02x slower
deltablue 2.93 ms 3.00 ms: 1.03x slower
regex_effbot 2.43 ms 2.51 ms: 1.04x slower
Geometric mean (ref) 1.01x faster

Benchmark hidden because not significant (16): scimark_sparse_mat_mult, regex_compile, sqlalchemy_imperative, html5lib, unpickle_list, pickle, chameleon, pickle_list, hexiom, genshi_xml, 2to3, richards, sympy_str, pickle_dict, unpack_sequence, sqlite_synth

mdickinson pushed a commit that referenced this issue Oct 10, 2022
* gh-96821: Fix undefined behaviour in `audioop.c`

Left-shifting negative numbers is undefined behaviour.

Fortunately, multiplication works just as well, is defined behaviour,
and gets compiled to the same machine code as before by optimizing
compilers.

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
mpage pushed a commit to mpage/cpython that referenced this issue Oct 11, 2022
* pythongh-96821: Fix undefined behaviour in `audioop.c`

Left-shifting negative numbers is undefined behaviour.

Fortunately, multiplication works just as well, is defined behaviour,
and gets compiled to the same machine code as before by optimizing
compilers.

Co-authored-by: blurb-it[bot] <43283697+blurb-it[bot]@users.noreply.github.com>
pablogsal pushed a commit that referenced this issue Oct 24, 2022
…H-96927)

* gh-96821: Assert for demonstrating undefined behaviour

* Fix UB

(cherry picked from commit cbdeda8)

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Matthias Görgens <matthias.goergens@gmail.com>
hauntsaninja added a commit that referenced this issue Mar 4, 2023
Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Shantanu <hauntsaninja@gmail.com>
@hauntsaninja
Copy link
Contributor

hauntsaninja commented Mar 4, 2023

I've fixed and merged #96823 . A follow up improvement could be to check for the compiler switch only if --with-strict-overflow was given, as suggested by Erlend

hugovk pushed a commit to hugovk/cpython that referenced this issue Mar 6, 2023
)

Co-authored-by: C.A.M. Gerlach <CAM.Gerlach@Gerlach.CAM>
Co-authored-by: Erlend E. Aasland <erlend.aasland@protonmail.com>
Co-authored-by: Adam Turner <9087854+AA-Turner@users.noreply.github.com>
Co-authored-by: Shantanu <hauntsaninja@gmail.com>
carljm added a commit to carljm/cpython that referenced this issue Mar 6, 2023
* main: (21 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
carljm added a commit to carljm/cpython that referenced this issue Mar 7, 2023
* main: (37 commits)
  pythongh-102192: Replace PyErr_Fetch/Restore etc by more efficient alternatives in sub interpreters module (python#102472)
  pythongh-95672: Fix versionadded indentation of get_pagesize in test.rst (pythongh-102455)
  pythongh-102416: Do not memoize incorrectly loop rules in the parser (python#102467)
  pythonGH-101362: Optimise PurePath(PurePath(...)) (pythonGH-101667)
  pythonGH-101362: Check pathlib.Path flavour compatibility at import time (pythonGH-101664)
  pythonGH-101362: Call join() only when >1 argument supplied to pathlib.PurePath() (python#101665)
  pythongh-102444: Fix minor bugs in `test_typing` highlighted by pyflakes (python#102445)
  pythonGH-102341: Improve the test function for pow (python#102342)
  Fix unused classes in a typing test (pythonGH-102437)
  pythongh-101979: argparse: fix a bug where parentheses in metavar argument of add_argument() were dropped (python#102318)
  pythongh-102356: Add thrashcan macros to filter object dealloc (python#102426)
  Move around example in to_bytes() to avoid confusion (python#101595)
  pythonGH-97546: fix flaky asyncio `test_wait_for_race_condition` test (python#102421)
  pythongh-96821: Add config option `--with-strict-overflow` (python#96823)
  pythongh-101992: update pstlib module documentation (python#102133)
  pythongh-63301: Set exit code when tabnanny CLI exits on error (python#7699)
  pythongh-101863: Fix wrong comments in EUC-KR codec (pythongh-102417)
  pythongh-102302 Micro-optimize `inspect.Parameter.__hash__` (python#102303)
  pythongh-102179: Fix `os.dup2` error reporting for negative fds (python#102180)
  pythongh-101892: Fix `SystemError` when a callable iterator call exhausts the iterator (python#101896)
  ...
@iritkatriel iritkatriel added the build The build process and cross-build label Nov 28, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
build The build process and cross-build performance Performance or resource usage type-feature A feature request or enhancement
Projects
None yet
Development

No branches or pull requests

5 participants