
Setting default redirections for all commands #269

Closed
Lucas-C opened this issue Aug 20, 2015 · 23 comments
@Lucas-C
Contributor

Lucas-C commented Aug 20, 2015

Hello,

First, a big THANK YOU for your work on this project.
I find it extremely useful, simple and Pythonic.
It's also going to be very useful for convincing my coworkers to migrate their shell scripts to Python :)

Now, would you be open to allowing some kind of global configuration, specifically for the default redirection of stderr?

Currently, some "hardcoded" defaults are defined here:
https://github.com/amoffat/sh/blob/master/sh.py#L669
And by default, a command's stderr is simply discarded.

Could that default behaviour be made configurable somehow?
I would like commands' stderr to be written to the parent Python script's stderr by default.
In the same spirit, it could sometimes be handy to forward stdout the same way.

I'd love to work on a pull request if you are ok with this feature request.

Regards

@amoffat
Owner

amoffat commented Aug 20, 2015

Hi Lucas. A global configuration sounds useful.

In theory, a global configuration is currently possible:

import sh
sh.Command._call_args["timeout"] = 1
sh.sleep(3)

This is ugly though because it's manipulating a clearly private variable. Others have suggested in the past that we could do some kind of configuration object:

from sh import Config
sh2 = Config(timeout=1)
sh2.sleep(3)

I think both ways can be acceptable. In the case of the former, we should rename _call_args to something like config so it's clearer that it is OK to change.

What do you think about implementing a configuration class? We would also need to add a section to the docs about "Global Configuration" that demonstrates how to use it. And of course tests :)
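For illustration, here is a minimal sketch of what such a configuration class could look like (hypothetical and simplified; not sh's actual API, and command "execution" is stubbed out):

```python
class Config(object):
    # Hypothetical sketch of a configuration object that bakes default
    # special kwargs into every command launched from it; not sh's
    # actual API. Running a command is stubbed out to return what
    # *would* be executed.
    def __init__(self, **defaults):
        self._defaults = defaults

    def __getattr__(self, name):
        defaults = self._defaults

        def run(*args, **kwargs):
            merged = dict(defaults)
            merged.update(kwargs)  # per-call kwargs override the defaults
            return (name, args, merged)

        return run


sh2 = Config(timeout=1)
result = sh2.sleep(3)
# result == ("sleep", (3,), {"timeout": 1})
```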

@Lucas-C
Contributor Author

Lucas-C commented Aug 20, 2015

Sounds like a plan!

What about defining a module-level global cmd_defaults and assigning it to Command._call_args, to keep full backward compatibility here?
We could then do:

import sh, sys
sh.cmd_defaults['stdout'] = sys.stdout
sh.cmd_defaults['stderr'] = sys.stderr
from sh import bzcat, tar, unzip

We could implement a configuration class, but if its sole purpose is to provide access to this global dict of defaults, I'm not sure it's really necessary. I vote for KISS & YAGNI :)
Do you see another use for such a class?

@Lucas-C
Contributor Author

Lucas-C commented Aug 20, 2015

After a short code review, it looks to me like Command._call_args is only used by:

Hence it looks like the defaults are currently only taken into account at command invocation time, not at command import time.
That means that setting the defaults before or after importing a command would be strictly equivalent.
Is that behaviour OK with you?
To me it looks like a rare edge case, but it may seem counter-intuitive to witness:

import sh, sys
from sh import grep
...
def foo(bar):
    ...
    sh.defaults['stdout'] = sys.stdout
    # From now on, in any invocation of grep, its stdout will be forwarded to this script's stdout

One solution could be to make _call_args no longer a class attribute but an instance field, copied from sh.defaults at __init__ time.
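A minimal sketch of that idea (names are illustrative, not sh's actual internals):

```python
# Hypothetical, simplified sketch of the proposal: copy the shared
# defaults into each instance at construction time.
DEFAULTS = {"stdout": None, "stderr": None}


class Command(object):
    def __init__(self, name):
        self.name = name
        # Snapshot taken at __init__ time: later mutations of DEFAULTS
        # no longer reach commands that already exist.
        self._call_args = dict(DEFAULTS)


grep = Command("grep")
DEFAULTS["stdout"] = "redirected"  # changed after grep was created
later = Command("sed")             # picks up the new default
```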

@amoffat
Owner

amoffat commented Aug 20, 2015

Actually, the more I'm thinking about it, the less I like mutating the global config dictionary. I could see a case where this bites people:

import sh
sh.defaults["timeout"] = 1

# ... lots of lines later, in a different module ...

# why does this keep dying!?!?!?!?!
stuff = sh.cat("/tmp/some-large-file")

I'm not sure that the shared state drawbacks are worth the benefits.

A configuration class, on the other hand, is more explicit about the state it contains. When you use it, you know you are launching a command with modified defaults, and those defaults only apply to the commands that are launched from it, not globally.

@Lucas-C
Contributor Author

Lucas-C commented Aug 21, 2015

OK, then what would you think of something like this:

from sh import shell
shell.set_defaults(timeout=1)
from shell import bzcat, tar, unzip

I'd really prefer not to have to prefix all my commands with sh., and to only explicitly import the commands I need at the beginning of my script, so that their existence is checked right away.
It's interesting though: do you personally prefer using the sh. module prefix?

@amoffat
Owner

amoffat commented Aug 21, 2015

Looks good, Lucas. If we can make both from shell import something and shell.something work, it would have feature parity with the top-level sh itself. I personally don't prefer either one, but I use both :)

@Lucas-C
Contributor Author

Lucas-C commented Aug 21, 2015

Perfect! I'll start working on this ASAP.

@Lucas-C
Contributor Author

Lucas-C commented Aug 21, 2015

A few thoughts after looking at the code.

First, what we planned is actually already somewhat possible!

import sh, sys
shell = sh(_out=sys.stdout)
from shell import ls
ls() # prints to stdout and does not return anything
shell.ls() # ditto

This works thanks to SelfWrapper.__call__. But I really cannot guess for what purpose it was defined. Any idea?
When I comment out this method, only the test_shared_special_args test fails. And it looks like it's testing exactly what we want!
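For reference, the underlying trick can be sketched in isolation like this (simplified stand-in names, not sh's actual implementation):

```python
import sys
from types import ModuleType


class SelfWrapper(ModuleType):
    # Simplified stand-in for sh's SelfWrapper: a callable module
    # subclass, so calling the "module" returns a copy with baked-in
    # default special kwargs.
    def __init__(self, name, defaults=None):
        super(SelfWrapper, self).__init__(name)
        self.defaults = defaults or {}

    def __call__(self, **defaults):
        return SelfWrapper(self.__name__, defaults)


# Register a stand-in module under a hypothetical name; `import` will
# find it through the sys.modules cache.
sys.modules["fake_sh"] = SelfWrapper("fake_sh")

import fake_sh
shell = fake_sh(_out="some stream")
```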

Now, the only remaining issue is the matter of module isolation.
What I mean here is that we could have a file b.py:

from sh import ls
...

and another file a.py:

import sh, sys
sh.set_defaults(_out=sys.stdout)
import b
b.ls()
...

And when running python a.py nothing would get printed, because the sh module in b is properly isolated.

The main blocker to achieving this is Python's module management logic.
Currently, if sh is imported in 2 modules, both will have access to the same SelfWrapper object.
In fact, even sh.__env will be the same object, though Command instances will be different.

Do you think it's a problem worth investigating?

@amoffat
Owner

amoffat commented Aug 21, 2015

That's great news. I had forgotten all about that, and it's not in the docs.

Re: isolation. I think we will need to create a new module object in SelfWrapper.__call__, then insert it into sys.modules. This module will be wrapped with SelfWrapper too, and will need to be named something like "sh.shellN". I think this will allow us to import from it.

@Lucas-C
Contributor Author

Lucas-C commented Aug 22, 2015

The more I think about it, the more I'm convinced it's probably not worth adding this kind of isolation, if we stick to the current usage.

In the end this planned usage:

import sh, sys
sh.set_defaults(_out=sys.stdout)
from sh import ls

Is not really such an improvement over:

import sh, sys
shell_with_defaults = sh(_out=sys.stdout)
from shell_with_defaults import ls

If you are OK with it, I'm simply going to add this to the documentation.

@amoffat
Owner

amoffat commented Aug 22, 2015

sounds good

@Lucas-C
Contributor Author

Lucas-C commented Aug 24, 2015

Here is a first pull request implementing a solution to this issue.
I'm looking forward to hearing your opinion on it :)

@Lucas-C
Contributor Author

Lucas-C commented Sep 2, 2015

Hi. Any update on this?

@amoffat
Owner

amoffat commented Sep 2, 2015

Updated the PR. After reviewing this comment thread, I'm not sure we are on the same page with regards to state isolation. For any changes going forward, I would like to keep all state management explicit. For example, if you change global defaults, they should be global only in the context of an object that we launch commands from:

import sh, sys
shell_with_defaults = sh(_out=sys.stdout)
from shell_with_defaults import ls

Or, if a context manager was used...

import sh, sys
with sh.defaults(_out=sys.stdout) as sh2:
    from sh2 import ls

@Lucas-C
Contributor Author

Lucas-C commented Sep 2, 2015

I understand. But I think we simply cannot import from a module not named sh.
The name has to match a .py file for it to be added to sys.modules.

@amoffat
Owner

amoffat commented Sep 3, 2015

Modules don't need to be backed by any physical file. There are some cool things you can do with the import system, for example importing a module from over the network. We will just need to dynamically create a module object and register it correctly with sys.modules. I will give it a try.
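A minimal demonstration (the module name and attribute below are made up for illustration):

```python
import sys
from types import ModuleType

# A module needs no backing .py file: creating a ModuleType object and
# registering it in sys.modules is enough for `import` to find it,
# because the sys.modules cache is consulted before any file lookup.
mod = ModuleType("sh_dynamic")
mod.greeting = "hello from a file-less module"
sys.modules["sh_dynamic"] = mod

from sh_dynamic import greeting
```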

@Lucas-C
Contributor Author

Lucas-C commented Sep 4, 2015

OK, great!
On my side, I've experimented with frames & the inspect module to solve the problem, but without success so far.

@Lucas-C
Contributor Author

Lucas-C commented Nov 3, 2015

Any update on this? :)
I'd love it if you could provide a working example of importing from a non-existent module (like you do with shell_with_defaults and sh2 in your examples).
With that I could resume working on this and try to finalize a PR :)

@amoffat
Owner

amoffat commented Nov 5, 2015

import sys
from types import ModuleType
from contextlib import contextmanager
from uuid import uuid4
import inspect


def fake_module():
    name = "sh-" + uuid4().hex
    mod = ModuleType(name)
    mod.thing = "i'm dynamically generated in module %s" % mod.__name__
    return mod


@contextmanager
def module_context():
    mod = fake_module()
    sys.modules[mod.__name__] = mod

    try:
        yield mod
    finally:
        del sys.modules[mod.__name__]
        del mod


class ShModuleHook(object):
    # Meta-path finder/loader: resolves an imported name against the
    # outermost caller frame's variables, so `from sh2 import thing`
    # finds the `sh2` module object bound by the `with` statement.
    def find_module(self, fullname, path=None):
        parent_frame = inspect.stack()[-1][0]
        module = parent_frame.f_locals.get(fullname, None)

        loader = None
        if module:
            loader = self
        return loader

    def load_module(self, fullname):
        parent_frame = inspect.stack()[-1][0]
        module = parent_frame.f_locals.get(fullname, None)
        return module


# Insert our hook in front of the default finders rather than replacing
# sys.meta_path wholesale, so regular imports keep working.
sys.meta_path.insert(0, ShModuleHook())

with module_context() as sh2:
    from sh2 import thing
    print(thing)

@amoffat
Owner

amoffat commented Nov 5, 2015

I should probably add that it's a little crazy, but nothing that I'm not ok with.

@Lucas-C
Contributor Author

Lucas-C commented Nov 28, 2015

Thanks a lot for this code snippet!
I spent most of my daily train trips this week playing with it and trying to understand it.
I didn't know about Python's module import logic; it was fascinating.

Based on this trick, I pushed a new version of my PR.
Here is a basic usage example:

import sh, sys
_sh = sh(_out=sys.stdout.buffer if sys.version_info[0] == 3 else sys.stdout)
from _sh import echo
echo('Test OK')

Tell me what you think of it :)

@Lucas-C
Contributor Author

Lucas-C commented Dec 6, 2015

I have updated the PR.

The feature is fully implemented, with all the tests passing, and just one notable limitation:
the new module returned by sh(...) should not be stored in a variable named sh too.

Because of Python's module caching, using sh later on would result in accessing the original sh module.

I tried to find a way around this limitation, but del sys.modules["sh"] only causes more mayhem.
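A minimal illustration of the caching behaviour (module name made up):

```python
import sys
from types import ModuleType

# Why reusing the name `sh` bites: `import` consults the sys.modules
# cache first, so a later import of the same name restores the original
# module and silently discards the customized object bound to that name.
original = ModuleType("demo_sh")
sys.modules["demo_sh"] = original

import demo_sh            # cache hit: binds the original module
demo_sh = "customized"    # rebinding the name locally...
import demo_sh            # ...is undone by the next import (cache hit)
```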

Provided we document this limitation, I think this PR is ready to be merged.

@amoffat amoffat added the 1.2 label Oct 6, 2016
@amoffat
Owner

amoffat commented Oct 6, 2016

This is in the release-1.2 branch. Thanks for your work.

0-wiz-0 added a commit to NetBSD/pkgsrc-wip that referenced this issue Dec 12, 2016