-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
gh-115999: Add partial free-thread specialization for BINARY_SUBSCR #127227
Conversation
I am going to write test codes at this week. |
ht->_spec_cache.getitem = descriptor; | ||
ht->_spec_cache.getitem_version = version; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should make to be thread-safe by using LOCK_TYPE(), but it will be handled with separated PRs.
1dacf5d
to
1a901a3
Compare
@mpage Would you like to take a look? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this! I left one comment inline, but otherwise looks good to me. I kicked off benchmark runs for default/free-threaded builds and will post the results once they're finished.
@mpage Done thanks! |
FYI, aarch64 linux CI failure is not related to this PR, #127332 will fix the issue. |
I should have caught this last time, but I think we want to use the fast-path for list accesses in |
You mean that the fast-path that can avoid locking object? |
Yep |
What about checking |
Yes, those are needed for correctness. I think for now we probably want to have two slightly different code paths for the free-threaded and default builds. The default build should continue to use the existing logic to fetch the item from the list. The free-threaded builds should use |
Matt, I define a new |
Python/bytecodes.c
Outdated
@@ -790,11 +790,17 @@ dummy_func( | |||
// Deopt unless 0 <= sub < PyList_Size(list) | |||
DEOPT_IF(!_PyLong_IsNonNegativeCompact((PyLongObject *)sub)); | |||
Py_ssize_t index = ((PyLongObject*)sub)->long_value.ob_digit[0]; | |||
#ifdef Py_GIL_DISABLED | |||
STAT_INC(BINARY_SUBSCR, hit); | |||
PyObject *res_o = _PyList_GetItemRef(list, index); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The default build of list_get_item_ref is almost same as current implementation,
we might able to this implementation to the default build also.
Lines 352 to 361 in 328187c
#else | |
static inline PyObject* | |
list_get_item_ref(PyListObject *op, Py_ssize_t i) | |
{ | |
if (!valid_index(i, Py_SIZE(op))) { | |
return NULL; | |
} | |
return Py_NewRef(PyList_GET_ITEM(op, i)); | |
} | |
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would certainly be cleaner, but it might come with some performance cost in the default build.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good! I kicked off new benchmark runs and will post the results here once they're finished. They take a few hours to run and there are other runs queued ahead of them, so that will probably be sometime tomorrow.
There are a couple tests that should be enabled using requires_specialization_ft
now that we're specializing BINARY_SUBSCR
in free-threaded builds:
- test_opcache.TestRacesDoNotCrash.test_binary_subscr_list_int
- test_dis.DisTests.test_binary_subscr_specialize - This should probably be deleted. It's covered by the tests that you've added and we want to move the specialization tests out of
test_dis
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Most recent performance results look good:
- Neutral on the default build.
- ~1% improvement on free-threaded builds.
Include/internal/pycore_list.h
Outdated
@@ -10,6 +10,8 @@ extern "C" { | |||
|
|||
PyAPI_FUNC(PyObject*) _PyList_Extend(PyListObject *, PyObject *); | |||
extern void _PyList_DebugMallocStats(FILE *out); | |||
// _PyList_GetItemRef should be used only when the object is known to be a list. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would also be worth mentioning that this does not raise errors whereas PyList_GetItemRef
does.
It seems like the change introduced a regression: #127521. |
--disable-gil
builds #115999STORE_SUBSCR
#127169 would be merged.PyDict_GetItemRef
is thread-safe internally.CALL
instructions in free-threaded builds #127123 is merged, I should follow several strategies from here.