
add asynchronous file io and path wrappers #180

Merged · 57 commits · Jun 13, 2017
Conversation

buhman
Member

@buhman buhman commented May 29, 2017

This is an attempt to implement #20.

Todo:

_file_io:

_path:


Currently supported usages:

# files

async with await trio.open_file(filename) as f:
    await f.read()

async_string = trio.wrap_file(StringIO('test'))
assert await async_string.read() == 'test'

# paths

path = trio.Path('foo')
await path.resolve()

@codecov

codecov bot commented May 29, 2017

Codecov Report

Merging #180 into master will increase coverage by 0.04%.
The diff coverage is 100%.


@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
+ Coverage   99.01%   99.05%   +0.04%     
==========================================
  Files          59       63       +4     
  Lines        8012     8397     +385     
  Branches      569      606      +37     
==========================================
+ Hits         7933     8318     +385     
  Misses         62       62              
  Partials       17       17
Impacted Files Coverage Δ
trio/_file_io.py 100% <100%> (ø)
trio/_path.py 100% <100%> (ø)
trio/__init__.py 100% <100%> (ø) ⬆️
trio/tests/test_path.py 100% <100%> (ø)
trio/_util.py 87.2% <100%> (+1.13%) ⬆️
trio/tests/test_file_io.py 100% <100%> (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 5e9e245...e74396c.

@njsmith
Member

njsmith commented Jun 4, 2017

Hey, sorry for being slow to get to this. Even though you can't see it here, I've been thinking about it (and the annoying problem of how to best wrap that IO class hierarchy) all week...

High level comments

I'm not a huge fan of the metaclass thing, but there aren't any really nice solutions here, and it's local and well-contained so meh, we can switch it out for something else later if we want. (I suspect in the long run we may want to define some delegation helpers that work as class decorators and inject methods at class definition time. But there's no reason to try and solve that design problem here; async file IO already raises enough tricky design problems :-)

Regarding the overall public API: So far the rule has been that if there's a standard library module X and a module trio.X, then the trio version's public API is a mirror of the stdlib version. That's not the case here. Nor should it be: it doesn't make sense for trio to be re-exporting all the stuff in io. (Also trio.io would be a super confusing name given that like, everything in trio is about io in one way or another. I don't know why Python's designers thought that "io" was a good short name for "file io". But this is just me being grumpy.) So, my suggestion is that we move the stuff in your io.py and io/types.py to a _file_io.py file, and have it define open_file, wrap_file, and Path objects to re-export from the top-level trio namespace.

I have mixed feelings about whether to make the AsyncIO* classes public, except via wrap_file; it's just asking for someone to subclass them or something. We definitely shouldn't make AsyncIOBase public, because as a concrete class it's useless – it doesn't even have read or write! I guess for the other ones I'll probably come around eventually...

Regarding the "async with await open(...) nonsense": I agree that it's nonsense, but it's nonsense that accurately reflects the standard Python semantics, so I actually think we should keep it that way :-). And the trio style is quite strict about enforcing the idea that async functions are just functions that you call with await – I don't want to have functions that support both await foo(...) and async with foo(..) as calling conventions. Obviously the way coroutines work under the hood makes it possible to get up to all sorts of shenanigans with them, but that's not unusual... generally in Python you can do anything, but most of it's in bad taste, so we don't :-).

Also, it should be totally legal to write code like:

file_obj = await open(...)
async with file_obj:
    ...

so we need __aenter__ and __aexit__ methods on AsyncIOBase in any case.
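A minimal stand-in for those dunders might look like the following (illustrative names only, not trio's actual class; asyncio.run is used purely to drive the coroutines, and a plain synchronous close stands in for trio's threaded, shielded one):

```python
import asyncio
import io

class AsyncFile:
    # Toy wrapper: only shows why __aenter__/__aexit__ belong on the
    # wrapper itself, independent of how the file was opened.
    def __init__(self, wrapped):
        self._wrapped = wrapped

    async def __aenter__(self):
        # The file is already open; entering the context is a no-op.
        return self

    async def __aexit__(self, *exc_info):
        # trio's real version awaits a threaded, cancellation-shielded close.
        self._wrapped.close()

async def demo():
    f = AsyncFile(io.StringIO("x"))
    async with f:          # legal without `await open(...)` on the same line
        pass
    return f._wrapped.closed

closed = asyncio.run(demo())
```

This is exactly the `file_obj = await open(...); async with file_obj:` shape from above, split across two statements.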

Regarding custom handling for Path.open: I think this is no big deal? We can use the generated wrappers for the easy methods, and for the methods that need special handling we can write the wrapper methods manually. There's no rule that you can't use both techniques within a single class definition :-)

Speaking of which, AsyncIOBase.close also needs to be written manually, because close methods have special semantics: they guarantee that even if they're cancelled, they still close the object before returning. This is a bit awkward given the way we're building on top of run_in_worker_thread, so I think the best solution is to write it like:

    async def close(self):
        # run_in_worker_thread will check for cancellation; make sure it can't be cancelled
        with _core.open_cancel_scope(shield=True):
            await run_in_worker_thread(self._wrapped.close)
        # but if there was a cancellation, we *should* raise `Cancelled`, now that the file is safely closed
        await _core.yield_if_cancelled()

As an optimization it might also make sense to check self._wrapped.closed before spawning a thread to call close. This whole async close thing is also pretty awkward given that e.g. socket objects in trio have synchronous close methods, but I don't know that there's much to be done about it. I guess we could call the file close methods aclose, like async generators have.

Regarding the threadpool thing: not sure what you're thinking of. Probably just some issue where I was mumbling incomprehensibly to myself :-). run_in_worker_thread is the right thing to use here.

@njsmith njsmith left a comment

A few comments on the patch itself.

trio/io/types.py Outdated
def __dir__(self):
return super().__dir__() + list(self._forward)

def __aiter__(self):
Member
This should be decorated with trio._util.aiter_compat, which will fix the 3.5.0 vs 3.5.2 issue.

trio/io/types.py Outdated
wrapper.__name__ = meth_name
wrapper.__qualname__ = '.'.join((__name__,
cls.__name__,
meth_name))
Member

I think the arguments to join are in the wrong order?

Member Author

I don't think so; __name__ is trio.io.types (the module name), so it becomes something like trio.io.types.*IOBase.close.

trio/io/io.py Outdated
return _file


@singledispatch
Member

I'd prefer an explicit if chain to the use of singledispatch, exactly because allowing random modules to change the behavior of a trio function is a great example of spooky action at a distance.

I get that you want to support pyfakefs.fake_filesystem.FakeFileWrapper, but if we're OK with mutating global state like singledispatch does then surely we can accomplish this just as easily by doing io.RawIOBase.register(pyfakefs.fake_filesystem.FakeFileWrapper) or whatever (and if pyfakefs doesn't already do this then that's surely a bug on their end?).

...actually looking more closely, I guess FakeFileWrapper actually changes which interface it implements depending on the specific object it's wrapping, which is a bit of a mess but also not one that @singledispatch can help with.

@njsmith
Member

njsmith commented Jun 4, 2017

Hmm, looking again at the docs, they actually say that "Even though IOBase does not declare read(), readinto(), or write() because their signatures will vary, implementations and clients should consider those methods part of the interface". So maybe we should move those onto AsyncIOBase, and then wrap_file can use that as the fallback for wrapping unrecognized but allegedly file-like objects.

@njsmith
Member

njsmith commented Jun 4, 2017

...and, more annoyingly, basically all the concrete classes in io add random extra attributes. FileIO objects have mode and name. (And empirically, BufferedReader and TextIOWrapper objects do too when wrapping a FileIO, even though this isn't documented.) BytesIO has getbuffer and getvalue; StringIO has getvalue. BufferedReader adds peek. TextIOWrapper adds line_buffering.

I really don't want to define our own file IO APIs but man these ones are a mess :-(.

@imrn

imrn commented Jun 4, 2017 via email

@njsmith
Member

njsmith commented Jun 4, 2017

@imrn

I don't think it is nonsense. Opening the file is an async operation. Entering its context is another async thing.

Yes, we're agreeing. It's nonsense in the sense that it's an awkward thing to deal with and "looks wrong". (Before this I would have said that async with await was always a bug because the blocking operation could be folded into the __aenter__, and I think it might even be useful to have a version of open that only works with async with). But it's not nonsense because, well, what you said.

In fact it might make sense to support something like

async with path_obj.mode("rb") as file_obj:
    ...

Or maybe that's too much a violation of There's Only One Way To Do It, not sure.

@buhman
Member Author

buhman commented Jun 4, 2017

I really don't want to define our own file IO APIs but man these ones are a mess :-(.

Do you think it would be useful to add more wrappers for the non-base io classes?

@buhman
Member Author

buhman commented Jun 4, 2017

Even though IOBase does not declare read(), readinto(), or write() because their signatures will vary, implementations and clients should consider those methods part of the interface

TextIOBase doesn't define readinto.

Do you think it makes sense to do this anyway?

@njsmith
Member

njsmith commented Jun 5, 2017

Do you think it would be useful to add more wrappers for the non-base io classes?

I'd really rather not. Probably the simplest option would be to make the attribute that holds the wrapped object public, so in the rare cases where people need to access quirky attributes they can do async_file_obj.wrapped.mode or whatever?

TextIOBase doesn't define readinto. Do you think it makes sense to do this anyway?

Ugh, good point. It might, since readinto is commonly available on random user-defined "file like" objects.

Well, here's an idea to consider: we could write the wrapper classes so that they only expose attributes that actually exist on the wrapped object. In Python, the only ways to tell whether a class has an attribute are (a) whether it shows up in __dir__, and (b) whether trying to access it raises AttributeError. (E.g. hasattr internally just tries to access the attribute, and returns False if it gets an AttributeError.) And notice that for the synchronous attributes, this PR's current __getattr__-based implementation will already raise AttributeError if the corresponding attribute is missing on the underlying object, so we're already half-way there. If we also used __getattr__ for the async methods (i.e. generating the wrapper on each call to __getattr__ instead of doing it at class initialization), and added a line like attrs = [a for a in attrs if hasattr(self, a)] to our __dir__ method, then I think that would do it.

If going this way, it's even tempting to just have a single wrapper class for all "vaguely file-like objects" in Python, since as we've learned, the "file-like" interface is pretty vague to start with. That would also solve the problem with pyfakefs.
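A synchronous sketch of that idea (attribute sets and class name are illustrative, and the threading is elided): the wrapper only "has" the attributes its wrapped object actually has, for both hasattr and dir.

```python
import io

# Hypothetical attribute sets for illustration; trio's real lists differ.
_SYNC_ATTRS = {"closed", "mode"}
_ASYNC_METHODS = {"read", "write", "readinto"}

class FileLikeWrapper:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        if name in _SYNC_ATTRS | _ASYNC_METHODS:
            # Raises AttributeError if the wrapped object lacks the
            # attribute, so hasattr() reports the truth.
            return getattr(self._wrapped, name)
        raise AttributeError(name)

    def __dir__(self):
        attrs = set(super().__dir__())
        attrs.update(a for a in _SYNC_ATTRS | _ASYNC_METHODS
                     if hasattr(self, a))
        return sorted(attrs)

w = FileLikeWrapper(io.StringIO("test"))
```

Since StringIO has no readinto, `hasattr(w, "readinto")` is False and tab completion won't offer it, while `w.read` delegates normally.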

@njsmith
Member

njsmith commented Jun 5, 2017

Oh, another point I just realized: detach returns a file-like, so our version needs a special case to wrap the return value, like:

async def detach(self):
    return wrap(await run_in_worker_thread(self._wrapped.detach))

@buhman
Member Author

buhman commented Jun 5, 2017

i.e. generating the wrapper on each call to __getattr__ instead of doing it at class initialization

Doesn't this mean each call to __getattr__ also returns a different instance of that method? Calling the method factory for each getattr seems a bit wasteful. Maybe we wrap __getattr__ in functools.lru_cache with infinite (bounded by the number of valid attributes in the wrapped file-like) size?

it's even tempting to just have a single wrapper class

I do like this idea.

@buhman
Member Author

buhman commented Jun 5, 2017

so our version needs a special case to wrap the return value, like:

I noticed this while I was playing with AsyncPath, which has things like resolve and with_name. Instead of special-casing, I was a bit more heavy-handed: 24dc536#diff-c4df1d85a0d10c3c2ebc8ae9f39dd617R31; if that looks ok, I'll figure out a DRY-ish way to do that in both modules.

I also thought it was a bit too tedious to sort out the numerous methods manually, so after reading pathlib#L606-L607, I decided to automatically compute _forward and _wrap based on whether this method is defined in PurePath or Path.

@njsmith
Member

njsmith commented Jun 5, 2017

Doesn't this mean each call to __getattr__ also returns a different instance of that method? Calling the method factory for each getattr seems a bit wasteful.

I'm not really worried about the overhead, given that each method call is also creating a thread :-). I guess it's a bit weird maybe if foo.meth is foo.meth returns False. If it matters then the simple solution is to have __getattr__ save the computed attribute the first time it's called, so the next time it gets pulled from the instance dict instead of calling __getattr__. (Maybe that's phrased confusingly. I'm talking about having __getattr__ do value = ...; setattr(self, attr, value); return value.) An LRU cache would be overkill here I think.
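In isolation, that setattr-based caching looks like this (toy method, purely illustrative):

```python
class Cached:
    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. the first access.
        if name == "meth":
            def meth():
                return 42
            # Store on the instance so later lookups never reach __getattr__.
            setattr(self, name, meth)
            return meth
        raise AttributeError(name)

c = Cached()
first = c.meth       # generated and cached by __getattr__
second = c.meth      # pulled straight from the instance dict
```

After the first access, `c.meth is c.meth` holds, addressing the identity concern without an LRU cache.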

I also thought it was a bit too tedious to sort out the methods manually, so after reading pathlib#L606-L607, I decided to automatically compute _forward and _wrap based on whether this method is defined in PurePath or Path.

Hmm. This makes me twitch a bit because it's extremely close to the sororicide pattern but I think maybe in this specific case it's OK?

From a glance at your work-in-progress it seems like you might be overcomplicating it a bit though – we only need to wrap pathlib.Path, nothing else. (I don't know why Python documents WindowsPath and PosixPath as always being available; in any given install only one of them is usable, and it's the one aliased to Path.) And we can compute the _forwards and _wraps sets like:

_forwards = [attr for attr in dir(pathlib.PurePath) if not attr.startswith("_")]
_wraps = [attr for attr in dir(pathlib.Path) if not attr.startswith("_") and attr not in _forwards]

and then the actual forwarding code can be identical to the code for files, except that we need the if isinstance(result, pathlib.Path): result = Path(result) check.

@njsmith
Member

njsmith commented Jun 5, 2017

And btw, thanks for your patience here – I'm picky about these things to start with, and then this is an extraordinarily tricky API design challenge on top of that. But I think we're getting there!

@buhman
Member Author

buhman commented Jun 5, 2017

thanks for your patience here

I'm just here to learn; I'm in no rush! Thanks for helping!

@njsmith njsmith mentioned this pull request Jun 5, 2017
@buhman
Member Author

buhman commented Jun 6, 2017

I thought about this all day, and came up with 93ccad2, which seems to make the most sense to me—if a wrapped file object doesn't support a particular method/property, AsyncIO will just end up throwing AttributeError, which is the same thing that would happen if you tried to do that on the underlying file object anyway.

If that looks ok, I'll sort out the other forwarded or wrapped attributes accordingly, and clean up the docs.

This change makes me glad I gave up on bothering to copy function signatures earlier, because we can't reliably compute that during class creation now anyway.

@njsmith
Member

njsmith commented Jun 6, 2017

It seems like this is maybe more complicated than it needs to be? Like I haven't tested this, but I think it has everything we need for the file-like wrapper, and I can almost fit the whole class on my screen at once:

# This class is "clever". These two globals define the attributes that it
# *can* delegate to the wrapped object, but any particular instance only
# *does* delegate the attributes that are actually found on the object its
# wrapping. So e.g. if you wrap an object that has no "readinto", then
# hasattr(wrapper, "readinto") returns False, tab completion doesn't offer
# readinto, etc.
_FILE_LIKE_SYNC_ATTRS = {...}
_FILE_LIKE_ASYNC_METHODS = {...}
class AsyncFileLikeWrapper:
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __getattr__(self, name):
        if name in _FILE_LIKE_SYNC_ATTRS:
            # May raise AttributeError which we let escape
            return getattr(self.wrapped, name)
        elif name in _FILE_LIKE_ASYNC_METHODS:
            # May raise AttributeError which we let escape
            meth = getattr(self.wrapped, name)
            async def async_wrapper(*args, **kwargs):
                return await run_in_worker_thread(partial(meth, *args, **kwargs))
            async_wrapper.__name__ = name
            async_wrapper.__qualname__ = self.__class__.__qualname__ + "." + name
            # cache the generated method so that `self.foo is self.foo`
            setattr(self, name, async_wrapper)
            return async_wrapper
        else:
            raise AttributeError(name)

    def __dir__(self):
        attrs = set(super().__dir__())
        attrs.update(a for a in _FILE_LIKE_SYNC_ATTRS if hasattr(self, a))
        attrs.update(a for a in _FILE_LIKE_ASYNC_METHODS if hasattr(self, a))
        return attrs

    def __repr__(self):
        return "trio.wrap_file({!r})".format(self.wrapped)

    async def detach(self):
        ret = await run_in_worker_thread(self.wrapped.detach)
        return wrap_file(ret)

    async def close(self):
        with _core.open_cancel_scope(shield=True):
            await run_in_worker_thread(self.wrapped.close)
        # run_in_worker_thread did a schedule point, but we carefully
        # prevented it from doing a cancel point, so do that now.
        await _core.yield_if_cancelled()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *_):
        await self.close()

    @_util.aiter_compat
    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return await run_in_worker_thread(next, self.wrapped)
        except StopIteration:
            raise StopAsyncIteration

def wrap_file(file_like):
    if not hasattr(file_like, "read") and not hasattr(file_like, "write"):
        raise TypeError(
            "{!r} object doesn't quack like a file (no read or write method)"
            .format(file_like.__class__))
    return AsyncFileLikeWrapper(file_like)

async def open_file(*args, **kwargs):
    do_open = partial(open, *args, **kwargs)
    return wrap_file(await run_in_worker_thread(do_open))

The Path wrapper is a little different, because it doesn't need the cleverness where the available attributes change each time, and it needs the special handling for methods that return Paths. I'm not sure it's worth trying to share code between them, because it's really only a few lines in common and they end up needing tweaking anyway because of these differences. But it doesn't need to be any more complicated, I don't think.
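A synchronous sketch of that Path-specific re-wrapping (thread dispatch elided, names illustrative, not trio's real class):

```python
import pathlib

class Path:
    # Toy wrapper: real trio.Path also dispatches IO methods to worker threads.
    def __init__(self, *args):
        self._wrapped = pathlib.Path(*args)

    def __getattr__(self, name):
        value = getattr(self._wrapped, name)
        if callable(value):
            def wrapper(*args, **kwargs):
                result = value(*args, **kwargs)
                # The Path-specific tweak: re-wrap any returned pathlib.Path
                # so callers stay inside the async wrapper type.
                if isinstance(result, pathlib.Path):
                    result = Path(result)
                return result
            return wrapper
        return value

p = Path("foo")
child = p.with_name("bar")   # pathlib returns a Path; we hand back a wrapper
```

Plain attributes like `p.name` pass straight through, while path-returning methods come back wrapped.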

@buhman
Member Author

buhman commented Jun 6, 2017

maybe more complicated than it needs to be

Probably.

*_sync_attrs
*_async_methods

This convention seems the clearest so far.

attrs.update(a for a in _FILE_LIKE_SYNC_ATTRS if hasattr(self, a))
attrs.update(a for a in _FILE_LIKE_ASYNC_METHODS if hasattr(self, a))

I thought about doing this too before I wrote 93ccad2; if we call hasattr for every supported attribute, I think we might as well just do it once during __init__:

def __init__(self, wrapped):
    …
    self._available_sync_methods = [attr for attr in _FILE_LIKE_SYNC_ATTRS if hasattr(self.wrapped, attr)]
    self._available_async_methods = [attr for attr in _FILE_LIKE_ASYNC_METHODS if hasattr(self.wrapped, attr)]

I'll back up a few commits and try it this way (but I do like the idea of a generic-ish wrapper-generator, which I guess is what I started writing).

@buhman
Member Author

buhman commented Jun 6, 2017

Also, is Closing(Async)ContextManager undesirable (making async with await the only supported context manager convention)?

@njsmith
Member

njsmith commented Jun 6, 2017

if we call hasattr for every supported attribute, I think we might as well just do it once during __init__

Does freezing/caching the attribute lists like this simplify the code, though? __dir__ is extremely not performance sensitive, and usually more state = bad.

Also, is Closing(Async)ContextManager undesirable (making async with await the only supported context manager convention)?

Yeah, this is what I was trying to say somewhere up above... the await should be mandatory.

Rationale: I don't think async with await open_file(...) is the prettiest, but I like it better than having the same function support being called like await open_file(...) and like async with open_file(...) at the same time. Trio functions should pick one of those and stick to it; anything else is too magic and breaks the simplifying illusion that the difference between async functions and regular functions is just that async functions require await when called (see e.g. the tutorial). And we kind of have to support await open_file(...). So async with open_file(...) loses. "Special cases aren't special enough to break the rules", as import this says. Plus since we have to support both await open_file(...) and async with file_obj, the async with await open_file(...) case falls out as supported automatically, whereas the async with open_file(...) case would be an extra thing people would have to memorize ("there should by one – and preferably only one – obvious way to do it").

If it makes you feel better, people will also be writing things like async with await open_tcp_stream(hostname, port) as soon as I finish implementing the high level stream code :-)

@buhman
Member Author

buhman commented Jun 6, 2017

Does freezing/caching the attribute lists like this simplify the code, though?

Not really, and __init__ is probably the more performance-sensitive thing anyway; I'll swap this around.

@buhman buhman changed the title [wip] asynchronous disk IO add asynchronous file io and path wrappers Jun 10, 2017
@buhman
Member Author

buhman commented Jun 10, 2017

@njsmith I think this is ready to merge!

@Akuli

Akuli commented Jun 11, 2017

I would expect type(asyncfile).method(asyncfile) to do the same thing as asyncfile.method(). It doesn't work if the methods are generated in the __getattr__ of type(asyncfile). (Edit: curio just defines the methods one by one.)

It's OK if instance.method is instance.method is False, but at least instance.method == instance.method should be True IMO. This is actually how normal methods work:

>>> class Thing:
...   def method(self): pass
... 
>>> t = Thing()
>>> t.method is t.method
False
>>> t.method == t.method
True
>>> 

@njsmith njsmith left a comment

It's great to see this coming together! Comments below.


"""

if isinstance(file, io.IOBase):
Member

I don't think we should enforce this type check, now that we've moved to a more duck-type-oriented way of doing the wrapping. There are lots of legacy objects out there that people think of as file-like but that predate the existence of IOBase, and may not implement all its methods.

So I think a looser check that's just designed to catch blatant accidents would be better, e.g. the hasattr(f, "read") or hasattr(f, "write") I used before. What do you think?

Member Author

I think all that a file really needs is close, because that's the only assumption AsyncIOWrapper makes.

'write', 'writelines',
# not defined in *IOBase:
'readinto1', 'peek',
]
Member

Let's make these {sets}; it's what they really are, and also miggggght make __getattr__ faster (though this probably doesn't matter)

setattr(cls, attr_name, wrapper)


class AsyncPath(metaclass=AsyncAutoWrapperType):
Member

Should be called Path. (Just like how it's trio.socket.socket(...), not trio.socket.async_socket(...).)

The __repr__ should probably be something like trio.Path(...) though just to avoid that bit of possible confusion with stdlib's Path.

(Probably the simplest way to do this is to make the import at the top be import pathlib instead of from pathlib import Path, PurePath.)

@Akuli Akuli Jun 11, 2017

Doesn't this just blindly add a thread wrapper around all methods? I don't think it makes sense to run things like with_name in threads.

Edit: nevermind, I just noticed that we're already aware of this.



# not documented upstream
delattr(AsyncPath.absolute, '__doc__')
Member

Why is this here? What does it do?

Member Author

The value of trio.AsyncPath.absolute.__doc__ makes a reference to :meth:~pathlib.Path.absolute, which does not exist. If this line were removed, building docs would fail.



# python3.5 compat
if hasattr(os, 'PathLike'): # pragma: no cover
Member

Why does this need pragma: no cover?

@buhman buhman Jun 11, 2017

Do we combine coverage between 3.5/3.6+ properly?

Member

Yep!

Member

Sounds like this # pragma: no cover still needs to go?

getattr(async_file, meth_name)

with pytest.raises(AttributeError):
getattr(wrapped, meth_name)
Member

Isn't this a different way of writing that test above with the any and the all?

@buhman buhman Jun 11, 2017

No; because this test is testing getattr while the any/all test is testing __dir__.



async def test_async_iter(async_file):
async_file._wrapped = io.StringIO('test\nfoo\nbar')
Member

Again, this would be clearer and more future-proof if you dropped the fixture and wrote `async_file = wrap_file(io.StringIO(...))`.

assert actual == next(expected)

with pytest.raises(StopIteration):
next(expected)
Member

A less error-prone way to write this might be:

got = []
async for line in async_file:
    got.append(line)
assert got == expected

with pytest.raises(_core.Cancelled):
await f.write('a')

await f.close()
Member

This line should also raise Cancelled, so we might as well add a with pytest.raises(...) to assert that.

detached = await async_file.detach()

assert isinstance(detached, trio.AsyncIOWrapper)
assert detached.wrapped == raw
Member

Use is, not == for testing object identity.

@njsmith
Member

njsmith commented Jun 11, 2017

@Akuli:

I would expect type(asyncfile).method(asyncfile) to do the same thing as asyncfile.method(). It doesn't work if the methods are generated in the __getattr__ of type(asyncfile). (Edit: curio just defines the methods one by one.)

The curio way also makes sense to me. I can certainly see arguments either way. The trade-off is that with the way in this PR, tab-completion works correctly (only offers options that actually make sense for the given object) but type(asyncfile).method(asyncfile) breaks (why would you ever do this?) and it's a bit more complicated to set up. The curio approach is the other way around. I think people use tab completion more often than they try to pull class functions off of an object, but it's a pretty small-scale problem in either case.

@buhman

What do you think about ripping all of this out, then making a python-trio/async-file-io or similar that supports creating both asyncio and trio threads?

Personally, I would prefer not to. Fundamentally for me trio is an experiment, and the question it's trying to answer is: "can we make an API that's so much nicer than asyncio-and-friends that it can overcome the massive gap in maturity/ecosystem/mindshare?" I'm OK with the answer being "no", but I'm not OK with answer being "yes we could have made such an API, but we failed to do so because we got distracted with other things and now no-one will ever know" :-/. So I want to try hard to avoid distractions that might compromise trio's progress, like writing a new file io library for asyncio and then having to support both versions. Plus I think there might be unexpected complexities here – in particular cancellation works pretty differently between them, so close would need special casing, and potentially some of the side-effect-free methods on Path could be improved by passing cancellable=True to run_in_worker_thread, and after #181 gets fixed I can imagine we might want to allow users to specify the limiter= argument on a per-file basis, etc. I'm dubious about the value of supporting OS-native AIO primitives, but if someone wants to experiment with that it's much easier if they don't have to worry about asyncio in the process. And so forth. It's generally much easier to make these kinds of changes if everything lives inside the same tree and only has to worry about trio's peculiarities. Of course, if you want to make such a library, you totally should :-). I just suspect it'll end up being easier to make it two libraries instead of trying to wedge them both together into a single library under the python-trio org.

@Akuli

Akuli commented Jun 11, 2017

Maybe the __getattr__ way isn't too bad after all. The converters argument of configparser.ConfigParser doesn't support type(thing).method(thing) either, it just adds a bunch of partials to the parser and its sections:

>>> p = configparser.ConfigParser(converters={'wololo': int})
>>> p.read_dict({'a': {'b': '123'}})
>>> p.getwololo('a', 'b')
123
>>> p['a'].getwololo('b')
123
>>> p.getwololo
functools.partial(<bound method RawConfigParser._get_conv of <configparser.ConfigParser object at 0x7f37f424c860>>, conv=<class 'int'>)
>>> p['a'].getwololo
functools.partial(<bound method SectionProxy.get of <Section: a>>, _impl=functools.partial(<bound method RawConfigParser._get_conv of <configparser.ConfigParser object at 0x7f37f424c860>>, conv=<class 'int'>))
>>> 

@buhman
Member Author

buhman commented Jun 11, 2017

@njsmith I fixed everything except #180 (comment) which depends on #180 (comment)

buhman added 28 commits June 13, 2017 01:56
This collapses the _file_io and _path modules into the trio package, and moves
_helpers to _util
This slightly duplicates _from_wrapped logic, but is probably cleaner overall.
This replaces incorrect use of ==/is, makes test_async_iter more obvious, and
adds a missing raises assertion in test_close_cancelled.
Previously attempts to compare trio.Path with itself would fail, due to the
wrapped object not supporting comparisons with trio.Path.
This also removes the _from_wrapped classmethod, and adds an additional
test for passing a trio path to trio open_file.
This removes the pathlib.Path(trio.Path()) test, unwraps args in trio.Path(),
and manually unwraps paths inside open_file.
duck-files are now required to define either write or read in addition to close
This removes references to _wrapped, and adds all possible Path comparisons that
affect trio.Path.
Instead of indiscriminately casting everything to str, including non-PathLikes,
just unwrap all trio.Paths inside args.
@njsmith njsmith merged commit 9e0df61 into python-trio:master Jun 13, 2017
@buhman
Member Author

buhman commented Jun 13, 2017

💯
