
Always 100% cpu usage using _bg=True #339

Closed
rw opened this issue Dec 1, 2016 · 11 comments

rw commented Dec 1, 2016

If I run this script using the latest code from master (703657b):

import time

import sh

x = sh.sleep(60, _bg=True)
time.sleep(60)
x.wait()

CPU usage spikes to 100% and stays there for the duration of the command. Is there a way around this bug?

OSX 10.11.6, Pythons 2.7.12 and 3.5.2

@amoffat amoffat added the bug label Dec 1, 2016

amoffat commented Dec 1, 2016

shhh...mining bitcoins 💰

thanks for reporting. just pushed up 1.12.4 live with the regression fix. should be fine now. close the issue if it looks good


rw commented Dec 1, 2016

Much better! It's still eating up CPU though. About 5% CPU per waiting process, so when I run this it eats 20% CPU:

import time

import sh

procs = []
for i in range(4):
    procs.append(sh.sleep(60, _bg=True))
time.sleep(60)


amoffat commented Dec 1, 2016

hm..not as high on my machine. i just inserted a sleep into a freely-spinning background thread that has to run. let me take a deeper look tomorrow
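(The fix described here, inserting a sleep into a freely-spinning background thread, is a general technique. A minimal sketch of the idea, not sh's actual internals; the function name `poll_until` is hypothetical:)

```python
import threading
import time

def poll_until(done_event, interval=0.001):
    """Spin until done_event is set, sleeping briefly each iteration
    so the loop doesn't consume a full CPU core."""
    while not done_event.is_set():
        # Without this sleep the loop busy-waits at ~100% CPU;
        # with it, usage drops to a small fraction of a core.
        time.sleep(interval)

done = threading.Event()
t = threading.Thread(target=poll_until, args=(done,))
t.start()
time.sleep(0.05)  # simulate other work while the thread polls
done.set()
t.join()
```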


rw commented Dec 1, 2016

I'll close this issue since it's looking a lot better, but I do think someone should examine it further. Thanks!

@rw rw closed this as completed Dec 1, 2016

amoffat commented Dec 1, 2016

made some more improvements in release 1.12.5 and got 100 processes down to ~5% cpu on my machine.

import sh

procs = []
for i in range(100):
    p = sh.sleep(30, _bg=True)
    procs.append(p)
[p.wait() for p in procs]

want to have one more look and confirm @rw ?


rw commented Dec 1, 2016

Looks a lot better, about 10-20% on my machine for those 100 processes.

Is there an evented way to do wait? It seems like a waste to be polling so much.


amoffat commented Dec 1, 2016

I know what you mean. I was able to replace most of the internal polling with threading.Events, but in some places, mostly related to timeouts, we have to remain responsive. For example, there's a thread that runs select.select on the process's output pipe/tty. That thread also needs to check the timeout Event.is_set(), so we can't wait indefinitely on select.select...we can only select for a very short time. The speed at which that thread spins is directly tied to how quickly the process can finish and clean up, but that same spinning contributes to the CPU usage.
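(A hypothetical sketch of the reader-thread pattern described above, not sh's actual code: select with a short timeout so the loop can also notice a timeout Event, at the cost of waking up repeatedly. The name `read_output` and the `poll_interval` parameter are my own.)

```python
import os
import select
import threading

def read_output(fd, timeout_event, poll_interval=0.01):
    """Read from fd until EOF or until timeout_event is set."""
    chunks = []
    while not timeout_event.is_set():
        # We cannot block in select() indefinitely: a set
        # timeout_event would then go unnoticed until output
        # arrived. Each wakeup, however, costs CPU.
        readable, _, _ = select.select([fd], [], [], poll_interval)
        if readable:
            data = os.read(fd, 4096)
            if not data:  # EOF: the process closed its end
                break
            chunks.append(data)
    return b"".join(chunks)
```

Shortening `poll_interval` makes the loop more responsive to timeouts but raises CPU usage; lengthening it does the reverse.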


rw commented Dec 1, 2016

Ah okay. Is there an ergonomic way to let the user configure how much load this puts? E.g. if I'm running 100 procs that I know will take about 10 minutes to run, should I be able to increase the polling interval?


amoffat commented Dec 1, 2016

No, there's not. The closest you could get to something like that would be to use _fg=True, which spawns the process in such a way that there are no background threads, since the inputs/outputs are directly connected (via os.dup2) to sys.stdin/out/err. You would see much lower CPU usage there, but you would be unable to capture the output.


amoffat commented Dec 1, 2016

But you also won't be able to batch-launch processes with _fg=True, so that probably doesn't work for you.


rw commented Dec 1, 2016

I suggest adding a caveat about CPU usage to the documentation for _bg=True.

0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Dec 12, 2016
*   added `_out` and `_out_bufsize` validator [#346](amoffat/sh#346)
*   bugfix for internal stdout thread running when it shouldn't [#346](amoffat/sh#346)

*   regression bugfix on timeout [#344](amoffat/sh#344)
*   regression bugfix on `_ok_code=None`

*   further improvements on cpu usage

*   regression in cpu usage [#339](amoffat/sh#339)

*   fd leak regression and fix for flawed fd leak detection test [#337](amoffat/sh#337)

*   support for `io.StringIO` in python2

*   added support for using raw file descriptors for `_in`, `_out`, and `_err`
*   removed `.close()`ing `_out` handler if FIFO detected

*   composed commands no longer propagate `_bg`
*   better support for using `sys.stdin` and `sys.stdout` for `_in` and `_out`
*   bugfix where `which()` would not stop searching at the first valid executable found in PATH
*   added `_long_prefix` for programs whose long arguments start with something other than `--` [#278](amoffat/sh#278)
*   added `_log_msg` for advanced configuration of log message [#311](amoffat/sh#311)
*   added `sh.contrib.sudo`
*   added `_arg_preprocess` for advanced command wrapping
*   alter callable `_in` arguments to signify completion with falsy chunk
*   bugfix where pipes passed into `_out` or `_err` were not flushed on process end [#252](amoffat/sh#252)
*   deprecated `with sh.args(**kwargs)` in favor of `sh2 = sh(**kwargs)`
*   made `sh.pushd` thread safe
*   added `.kill_group()` and `.signal_group()` methods for better process control [#237](amoffat/sh#237)
*   added `new_session` special keyword argument for controlling spawned process session [#266](amoffat/sh#266)
*   bugfix better handling for EINTR on system calls [#292](amoffat/sh#292)
*   bugfix where with-contexts were not threadsafe [#247](amoffat/sh#195)
*   `_uid` new special keyword param for specifying the user id of the process [#133](amoffat/sh#133)
*   bugfix where exceptions were swallowed by processes that weren't waited on [#309](amoffat/sh#309)
*   bugfix where processes that dupd their stdout/stderr to a long running child process would cause sh to hang [#310](amoffat/sh#310)
*   improved logging output [#323](amoffat/sh#323)
*   bugfix for python3+ where binary data was passed into a process's stdin [#325](amoffat/sh#325)
*   Introduced execution contexts which allow baking of common special keyword arguments into all commands [#269](amoffat/sh#269)
*   `Command` and `which` now can take an optional `paths` parameter which specifies the search paths [#226](amoffat/sh#226)
*   `_preexec_fn` option for executing a function after the child process forks but before it execs [#260](amoffat/sh#260)
*   `_fg` reintroduced, with limited functionality.  hurrah! [#92](amoffat/sh#92)
*   bugfix where a command would block if passed a fd for stdin that wasn't yet ready to read [#253](amoffat/sh#253)
*   `_long_sep` can now take `None` which splits the long form arguments into individual arguments [#258](amoffat/sh#258)
*   making `_piped` perform "direct" piping by default (linking fds together).  this fixes memory problems [#270](amoffat/sh#270)
*   bugfix where calling `next()` on an iterable process that has raised `StopIteration`, hangs [#273](amoffat/sh#273)
*   `sh.cd` called with no arguments now changes into the user's home directory, like native `cd` [#275](amoffat/sh#275)
*   `sh.glob` removed entirely.  the rationale is correctness over hand-holding. [#279](amoffat/sh#279)
*   added `_truncate_exc`, defaulting to `True`, which tells our exceptions to truncate output.
*   bugfix for exceptions whose messages contained unicode
*   `_done` callback no longer assumes you want your command put in the background.
*   `_done` callback is now called asynchronously in a separate thread.
*   `_done` callback is called regardless of exception, which is necessary in order to release held resources, for example a process pool