Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

gh-117657: Enable TSAN check on macOS with free-threads build #120502

Open
wants to merge 7 commits into
base: main
Choose a base branch
from

Conversation

aisk
Copy link
Member

@aisk aisk commented Jun 14, 2024

There are some macOS only races, for example #120500, so I think maybe we should enable the test for macOS on GHA.

Another thing is that two TSAN tests will hang: #120696, so we should skip them with TSAN on macOS.

@aisk aisk marked this pull request as draft June 14, 2024 13:12
@aisk aisk marked this pull request as ready for review June 18, 2024 14:35
@aisk aisk changed the title Enable TSAN check on macOS with free-threads build gh-117657: Enable TSAN check on macOS with free-threads build Jun 18, 2024
@aisk aisk added the skip news label Jun 18, 2024
@@ -500,6 +500,7 @@ jobs:
options: ./configure --config-cache --disable-gil --with-thread-sanitizer --with-pydebug
suppressions_path: Tools/tsan/suppressions_free_threading.txt
tsan_logs_artifact_name: tsan-logs-free-threading
os-matrix: '["ubuntu-22.04", "macos-14"]'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid queuing up too many jobs for the limited number of macOS builders, we'd want to use macos-14 for forks, but Cirrus for upstream builds.

See elsewhere in this file for examples, also needs changes in the reusable workflow.

Copy link
Contributor

@colesbury colesbury left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm ambivalent about adding another TSan job. I'm not sure it's currently worth the extra CI resources and #120500 doesn't really seem like it uncovered a new race -- it might be due to the different version of clang or just a slightly different call stack that wasn't covered by the suppressions file.

Comment on lines +78 to +80
if support.check_sanitizer(thread=True) and sys.platform == 'darwin':
# See: gh-120696
raise unittest.SkipTest("this test will hang on macOS with TSAN")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is hanging in faulthandler, that probably means the test is crashing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Faulthandler hanging is definitely not great, but even if faulthandler is fixed I expect there will still be a problem with this test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests are just test the mallocator without the GIL, and intented to be crashed, so I think they are ok for now?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, that makes sense -- I missed that.

@aisk
Copy link
Member Author

aisk commented Jun 19, 2024

I'm ambivalent about adding another TSan job. I'm not sure it's currently worth the extra CI resources and #120500 doesn't really seem like it uncovered a new race -- it might be due to the different version of clang or just a slightly different call stack that wasn't covered by the suppressions file.

I have another idea. We can just modified the code to temporarily skip these two problematic tests and run the TSAN checks on macOS locally. What do you think?

@colesbury
Copy link
Contributor

Yeah, I think we should be able to successfully run TSan tests locally on macOS.

There still seems to be some underlying bugs that we should fix. As you pointed out above, the test (in a subprocess) is supposed to crash because it calls PyMem_Malloc without holding the GIL / not having a valid thread state (the equivalent in the free-threaded build). The test runs:

PYTHONMALLOC=debug ./python -c "import _testcapi; _testcapi.pymem_malloc_without_gil()"

(I found it easier to run that directly under gdb or lldb than running the test case)

  • On Linux in the free-threaded build, the faulthandler itself crashes in _Py_DumpExtensionModules because it's calling PyDict_Next without a valid thread state. The test still passes because it's already printed out some of the error message.
  • On macOS, the faulthandler crash leads to a hang. I'm not sure why. I thought it might be re-entrantly triggering the faulthandler, but it doesn't look like that.

@aisk
Copy link
Member Author

aisk commented Jun 23, 2024

Hi @colesbury, I opened #120900 just to skip these two tests on macOS.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants