Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] gh-123358: Update LOAD_DEREF to use stackref and atomic incref #124463

Draft
wants to merge 16 commits into
base: main
Choose a base branch
from

Conversation

corona10
Copy link
Member

@corona10 corona10 commented Sep 24, 2024

@corona10 corona10 changed the title gh-123358: Update LOAD_DEREF to use stackref and atomic incref [WIP] gh-123358: Update LOAD_DEREF to use stackref and atomic incref Sep 24, 2024
@corona10
Copy link
Member Author

@Fidget-Spinner I am still working on :) I will ping you once I am ready!

@corona10 corona10 changed the title [WIP] gh-123358: Update LOAD_DEREF to use stackref and atomic incref gh-123358: Update LOAD_DEREF to use stackref and atomic incref Sep 24, 2024
@corona10 corona10 marked this pull request as ready for review September 24, 2024 22:45
{
PyObject *value;
#ifdef Py_GIL_DISABLED
value = _Py_atomic_load_ptr(&cell->ob_ref);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colesbury Let's extract the utility function with separated PRs.

@corona10 corona10 force-pushed the gh-123358 branch 2 times, most recently from ec7594a to f573fa2 Compare September 25, 2024 04:28
@corona10 corona10 marked this pull request as draft September 25, 2024 04:39
@corona10 corona10 changed the title gh-123358: Update LOAD_DEREF to use stackref and atomic incref [WIP] gh-123358: Update LOAD_DEREF to use stackref and atomic incref Sep 25, 2024
Copy link
Member Author

@corona10 corona10 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Fidget-Spinner
Ken, Would you like to take a look at overall approach for both stack effect and stackref side?

I am not sure that I do correctly so before checking why Windows free-threaded is failed, I would like to get rough review.
Thank you for understanding.

Comment on lines 22 to 23
FT_ATOMIC_STORE_PTR_RELEASE(old_value, cell->ob_ref);
FT_ATOMIC_STORE_PTR_RELEASE(cell->ob_ref, value);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little suspicious. Why do we need the atomics when there's the critical section?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me check,, yeah we may don't need this.

Copy link
Member Author

@corona10 corona10 Sep 25, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SUMMARY: ThreadSanitizer: data race /home/runner/work/cpython/cpython/./Include/internal/pycore_cell.h:23:18 in PyCell_SwapTakeRef

Maybe TSAN thought that they could be memory ordering issue.

@colesbury, Do you think that this would be false alarm?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need the second atomic write here (to cell->ob_ref), but not the first one. In other words, something like:

old_value = cell->ob_ref;
FT_ATOMIC_STORE_PTR_RELEASE(cell->ob_ref, value);

Although the write happens in the critical section, the read happens (with this PR) outside of the lock. This is the same pattern we use in dictionaries :

#define STORE_KEY(ep, key) FT_ATOMIC_STORE_PTR_RELEASE(ep->me_key, key)
#define STORE_VALUE(ep, value) FT_ATOMIC_STORE_PTR_RELEASE(ep->me_value, value)

@corona10
Copy link
Member Author

I suspect that there would be a deadlock issue at the Windows build. I am under investigation.
But before that, I would like to know there is any wrong code about stack effect and stack ref.

@Fidget-Spinner
Copy link
Member

But before that, I would like to know there is any wrong code about stack effect and stack ref.

Don't think so.

@@ -41,6 +41,8 @@ race_top:tstate_is_freed
race_top:type_modified_unlocked
race_top:write_thread_id
race_top:PyThreadState_Clear
# see: https://github.com/python/cpython/issues/117721
race_top:lock_PyThread_release_lock
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@colesbury found that there is a possible issue from

def wait_phase(self, phase, expected):
for _ in support.sleeping_retry(support.SHORT_TIMEOUT):
if len(phase) >= expected:
break
self.assertEqual(len(phase), expected)
which cause this TSAN error. We should fix it with a separate PR.

@corona10 corona10 force-pushed the gh-123358 branch 2 times, most recently from 5f4e42c to d527e0a Compare September 27, 2024 07:18
Copy link
Member

@markshannon markshannon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we need to use the extra indirection of passing a pointer to a stack ref

@@ -42,6 +43,31 @@ PyCell_GetRef(PyCellObject *cell)
return res;
}

static inline void
_PyCell_GetStackRef(PyCellObject *cell, _PyStackRef *value_addr)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
_PyCell_GetStackRef(PyCellObject *cell, _PyStackRef *value_addr)
_PyStackRef _PyCell_GetStackRef(PyCellObject *cell)

I think that this can return the new _PyStackRef directly instead of via pointer, since there can be no garbage collection in this function.

@@ -1647,14 +1647,13 @@ dummy_func(
value = PyStackRef_FromPyObjectSteal(value_o);
}

inst(LOAD_DEREF, ( -- value)) {
inst(LOAD_DEREF, ( -- value[1])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This can be left as scalar, for the reason given above.

@@ -16,7 +16,6 @@ extern "C" {
#include "pycore_pystate.h" // _PyInterpreterState_GET()
#include "pycore_typeid.h" // _PyType_IncrefSlow


Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if in _Py_TryIncRefShared we could just do an atomic add to the ob_ref_shared if the object is already marked as weakref'd? E.g:

Py_ssize_t shared = _Py_atomic_load_ssize_relaxed(&op->ob_ref_shared);
for (;;) {
    if ((shared & _Py_REF_SHARED_FLAG_MASK) == _Py_REF_MAYBE_WEAKREF) {
        _Py_atomic_add_ssize(&op->ob_ref_shared, (1 << _Py_REF_SHARED_SHIFT));
       return 1;
    }
}

That might avoid the threads spinning on updating the value when the value is in heavy contention and instead all of the atomic operations should just slowly proceed.

@corona10 corona10 force-pushed the gh-123358 branch 3 times, most recently from d56c823 to 25d3f9e Compare September 28, 2024 07:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants