gh-121313: Limit the reading size from pipes to their default buffer size on Unix systems #121315
Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool. If this change has little impact on Python users, wait for a maintainer to apply the |
Following your chain of pieces, could this be made to be more targeted to the specific case potentially? Two thoughts
See also: gh-117151, which aims to increase the default size (albeit focused on write performance).
I've tried shifting the check to
Looking reasonable to me overall: Unlikely to break compatibility or reduce performance, improves default behavior. A couple smaller change requests from me.
It would be nice to add a test that will fail if something breaks / results in the "read too large on pipes resulting in bad behavior" again, although I don't see a straightforward way to do that (maybe mocking Connection._read in a new test in _test_multiprocessing and checking the size of the read when we know it is a pipe?).
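One possible shape for such a test, as a minimal sketch only. Here recv_exact and read_sizes_through_pipe are hypothetical stand-ins written for illustration, not the real Connection._recv or anything in _test_multiprocessing, and the pipe_limit parameter exists just so the check can use a small cap without filling the pipe. The idea is to wrap os.read in a mock and assert on the sizes it was asked for.

```python
import os
import stat
from unittest import mock


def recv_exact(fd, size, read=os.read, pipe_limit=65536):
    """Hypothetical stand-in for Connection._recv (not the real method)."""
    limit = size
    # Only pay for fstat() when the request is larger than the cap.
    if size > pipe_limit and stat.S_ISFIFO(os.fstat(fd).st_mode):
        limit = pipe_limit
    chunks = []
    remaining = size
    while remaining > 0:
        chunk = read(fd, min(limit, remaining))
        if not chunk:
            raise EOFError("pipe closed before all data arrived")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def read_sizes_through_pipe(total, pipe_limit):
    """Push total bytes through a real pipe; return (data, sizes asked of read)."""
    r, w = os.pipe()
    try:
        os.write(w, b"x" * total)  # keep total below the OS pipe capacity
        spy = mock.Mock(wraps=os.read)  # records calls, still performs the read
        data = recv_exact(r, total, read=spy, pipe_limit=pipe_limit)
        return data, [call.args[1] for call in spy.call_args_list]
    finally:
        os.close(r)
        os.close(w)
```

A test could then assert that every recorded size is at most the cap, which would fail again if the clamp were removed.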
Lib/multiprocessing/connection.py
Outdated
@@ -18,6 +18,7 @@
import time
import tempfile
import itertools
import stat
Personal nitpick: PEP 8 doesn't seem to specify an order (https://peps.python.org/pep-0008/#imports), but I like imports to be alphabetical. itertools, time, and tempfile, which were already in the code just above this, are also out of order (although time and tempfile only slightly). The rest are in order. Not sure if it matters for Python core developer acceptance.
Done
Lib/multiprocessing/connection.py
Outdated
is_pipe = False
page_size = 0
if not _winapi:
    page_size = os.sysconf(os.sysconf_names['SC_PAGESIZE'])
Rather than do the if not _winapi check here, which has to be run/interpreted per _recv call, can you add the "calculate max size for a fifo" step the way https://github.com/python/cpython/blob/main/Lib/multiprocessing/connection.py#L370-L379 chooses/defines the standard read function? Code here will still need to do the min logic + "is this a fifo", but at least that reduces the overhead a little further.
I've shifted fetching the base page size and calculating the default pipe size to the existing if _winapi block above. Is this what you meant?
Yep, looking good
@@ -0,0 +1 @@
Limit reading size in os.read for pipes to default pipe size in order to avoid memory overallocation
This should be updated from os.read -> multiprocessing to follow the logic location change.
Done
I think this is as far as I can review; it needs a Python core dev or someone with more project familiarity to look at the high-level things. Some lingering thoughts I have:
Hi @cmaloney, I wanted to check in and see if there are any additional steps I need to take for this pull request before it can be reviewed by a core developer. Thank you!
Re: core review, as far as I know no other steps are needed. Per https://devguide.python.org/getting-started/pull-request-lifecycle/#reviewing it's mainly just patience; that document suggests waiting a month before pinging other locations.
There's one potential further optimization, at least on Linux.
Thanks for taking this on!
This PR uses 256 KiB, not 64 KiB, on an M1 Mac (16 KiB pages).
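The arithmetic behind that observation can be sketched as follows, assuming the cap is derived as a fixed 16 pages (an assumption inferred from the 64 KiB and 256 KiB figures; pipe_limit is a hypothetical helper, not a function from the PR):

```python
KIB = 1024


def pipe_limit(page_size, pages=16):
    """Read cap in bytes, assuming the cap is a fixed number of pages."""
    return page_size * pages


print(pipe_limit(4 * KIB) // KIB)   # 64  (4 KiB pages)
print(pipe_limit(16 * KIB) // KIB)  # 256 (16 KiB pages)
```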
The changelog entry was added to the C API category instead of the Library category.
Nice catch. I will change the category in #123559.
Issue: #121313