add asynchronous file io and path wrappers #180
Conversation
Codecov Report
@@            Coverage Diff             @@
##           master     #180      +/-   ##
==========================================
+ Coverage   99.01%   99.05%   +0.04%
==========================================
  Files          59       63       +4
  Lines        8012     8397     +385
  Branches      569      606      +37
==========================================
+ Hits         7933     8318     +385
  Misses         62       62
  Partials       17       17

Continue to review full report at Codecov.
Hey, sorry for being slow to get to this. Even though you can't see it here, I've been thinking about it (and the annoying problem of how to best wrap that IO class hierarchy) all week...

High level comments

I'm not a huge fan of the metaclass thing, but there aren't any really nice solutions here, and it's local and well-contained, so meh, we can switch it out for something else later if we want. (I suspect in the long run we may want to define some delegation helpers that work as class decorators and inject methods at class definition time. But there's no reason to try and solve that design problem here; async file IO already raises enough tricky design problems :-).)

Regarding the overall public API: so far the rule has been that if there's a standard library module, trio exposes an async version of it under the same name (like trio.socket). I have mixed feelings about whether to make the […].

Regarding the "`async with await open(...)` nonsense": I agree that it's nonsense. Also, it should be totally legal to write code like:

file_obj = await open(...)
async with file_obj:
    ...

so we need `__aenter__`/`__aexit__` on the wrapper object itself anyway.

Regarding custom handling for […]: speaking of which, close probably needs to look something like:

async def close(self):
    # run_in_worker_thread will check for cancellation; make sure it can't be cancelled
    with _core.open_cancel_scope(shield=True):
        await run_in_worker_thread(self._wrapped.close)
    # but if there was a cancellation, we *should* raise `Cancelled`, now that the file is safely closed
    await _core.yield_if_cancelled()

As an optimization it might also make sense to check […] first.

Regarding the threadpool thing: not sure what you're thinking of. Probably just some issue where I was mumbling incomprehensibly to myself :-).
A few comments on the patch itself.
trio/io/types.py
    def __dir__(self):
        return super().__dir__() + list(self._forward)

    def __aiter__(self):
This should be decorated with trio._util.aiter_compat, which will fix the 3.5.0 vs 3.5.2 issue.
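For context, the 3.5.0 vs 3.5.2 issue is that early 3.5 releases expected __aiter__ itself to be awaitable, while 3.5.2+ expects it to return the async iterator directly. A compat shim roughly amounts to the following (a sketch of the idea, not trio._util's actual code):

import sys

def aiter_compat(aiter_impl):
    if sys.version_info < (3, 5, 2):
        # old protocol: "async for" awaits the result of __aiter__
        async def __aiter__(self):
            return aiter_impl(self)
        return __aiter__
    else:
        # new protocol: __aiter__ returns the async iterator directly
        return aiter_impl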
trio/io/types.py
        wrapper.__name__ = meth_name
        wrapper.__qualname__ = '.'.join((__name__,
                                         cls.__name__,
                                         meth_name))
I think the arguments to join are in the wrong order?
I don't think so; __name__ is trio.io.types (the module name), so it becomes something like trio.io.types.*IOBase.close.
trio/io/io.py
    return _file


@singledispatch
I'd prefer an explicit if chain to the use of singledispatch, exactly because allowing random modules to change the behavior of a trio function is a great example of spooky action at a distance.

I get that you want to support pyfakefs.fake_filesystem.FakeFileWrapper, but if we're OK with mutating global state like singledispatch does then surely we can accomplish this just as easily by doing io.RawIOBase.register(pyfakefs.fake_filesystem.FakeFileWrapper) or whatever (and if pyfakefs doesn't already do this then that's surely a bug on their end?).

...actually looking more closely, I guess FakeFileWrapper actually changes which interface it implements depending on the specific object it's wrapping, which is a bit of a mess but also not one that @singledispatch can help with.
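For reference, the io ABCs already support that kind of registration; here's a quick illustration of the io.RawIOBase.register(...) alternative mentioned above (DuckFile is a made-up class, not anything from the PR):

import io

class DuckFile:
    def read(self, size=-1):
        return b""

# Before registration, isinstance() against the io ABC fails...
print(isinstance(DuckFile(), io.RawIOBase))   # False

# ...but registering the class as a virtual subclass makes it pass,
# with no singledispatch involved.
io.RawIOBase.register(DuckFile)
print(isinstance(DuckFile(), io.RawIOBase))   # True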
Hmm, looking again at the docs, they actually say that "Even though IOBase does not declare read(), readinto(), or write() because their signatures will vary, implementations and clients should consider those methods part of the interface". So maybe we should move those onto […].

...and, more annoyingly, basically all the concrete classes in io grow extra methods that aren't declared on the *IOBase ABCs (things like peek and readinto1). I really don't want to define our own file IO APIs, but man, these ones are a mess :-(.
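A quick way to see the mismatch being described (just an illustration; the exact attribute lists get worked out later in the PR):

import io

print(hasattr(io.IOBase, "read"))               # False: not declared on the ABC
print(hasattr(io.IOBase, "readline"))           # True: declared on the ABC
print(hasattr(io.BufferedReader, "peek"))       # True: concrete class adds extras
print(hasattr(io.BufferedReader, "readinto1"))  # True: another concrete-only method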
On 6/4/17, Nathaniel J. Smith <***@***.***> wrote:
> Regarding the "`async with await open(...)` nonsense": I agree that it's nonsense.
I don't think it is nonsense.

- Opening the file is an async operation.
- Entering its context is another async thing.

So what is the problem? (Other than the possibility of forgetting "await".) Users should be aware of the chain of operations and their conceptual meaning. Providing a shortcut like "async with open(...)" would mean moving the async lines in open(...) to __aenter__(). This would distort the orthogonality of both methods as well as their possible interactions with others.
Yes, we're agreeing. It's nonsense in the sense that it's an awkward thing to deal with and "looks wrong". (Before this I would have said that […].)

In fact it might make sense to support something like:

async with path_obj.mode("rb") as file_obj:
    ...

Or maybe that's too much of a violation of There's Only One Way To Do It, not sure.
Do you think it would be useful to add more wrappers for the non-base io classes?
TextIOBase doesn't define […]. Do you think it makes sense to do this anyway?
I'd really rather not. Probably the simplest option would be to make the attribute that holds the wrapped object public, so in the rare cases where people need to access quirky attributes they can do file_obj.wrapped.whatever directly.
Ugh, good point. It might, since […].

Well, here's an idea to consider: we could write the wrapper classes so that they only expose attributes that actually exist on the wrapped object. In Python, the only ways to tell whether a class has an attribute are (a) whether it shows up in dir(), and (b) whether accessing it raises AttributeError, so if we control __dir__ and __getattr__ we can make the wrapper track the wrapped object exactly.

If going this way, it's even tempting to just have a single wrapper class for all "vaguely file-like objects" in Python, since as we've learned, the "file-like" interface is pretty vague to start with. That would also solve the problem with objects like pyfakefs's FakeFileWrapper that change which interface they implement.
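As a tiny proof of concept of that idea (my sketch, not code from the PR): simply letting the AttributeError from getattr() escape out of __getattr__ already makes hasattr() on the wrapper track the wrapped object.

import io

class Wrapper:
    def __init__(self, wrapped):
        self._wrapped = wrapped

    def __getattr__(self, name):
        # only called for attributes not found on the wrapper itself;
        # the AttributeError from getattr() is allowed to escape
        return getattr(self._wrapped, name)

w = Wrapper(io.BytesIO())
print(hasattr(w, "readinto"))   # True: BytesIO provides readinto
print(hasattr(w, "peek"))       # False: BytesIO has no peek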
Oh, another point I just realized: detach needs to re-wrap the object it returns, something like:

async def detach(self):
    return wrap(await run_in_worker_thread(self._wrapped.detach))
Doesn't this mean each call to […] creates a new wrapper function every time?

I do like this idea.
I noticed this while I was playing with […]. I also thought it was a bit too tedious to sort out the numerous methods manually, so after reading pathlib#L606-L607, I decided to automatically compute the forwarded and wrapped attribute lists.
I'm not really worried about the overhead, given that each method call is also creating a thread :-). I guess it's a bit weird maybe if `f.read is f.read` comes out False.
Hmm. This makes me twitch a bit because it's extremely close to the sororicide pattern, but I think maybe in this specific case it's OK?

From a glance at your work-in-progress it seems like you might be overcomplicating it a bit though; we only need to wrap the non-PurePath methods, something like:

_forwards = [attr for attr in dir(pathlib.PurePath) if not attr.startswith("_")]
_wraps = [attr for attr in dir(pathlib.Path) if not attr.startswith("_") and attr not in _forwards]

and then the actual forwarding code can be identical to the code for files, except that we need the […].
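To make that concrete, here is a rough sketch (my illustration, not code from the PR) of how those two lists could drive the Path wrapper; it assumes run_in_worker_thread, the trio-internal helper used throughout this thread, is importable:

import pathlib
from functools import partial

_forwards = [attr for attr in dir(pathlib.PurePath) if not attr.startswith("_")]
_wraps = [attr for attr in dir(pathlib.Path)
          if not attr.startswith("_") and attr not in _forwards]

class Path:
    def __init__(self, *args):
        self._wrapped = pathlib.Path(*args)

    def __getattr__(self, name):
        if name in _forwards:
            # PurePath attributes are pure computation; forward them synchronously
            return getattr(self._wrapped, name)
        elif name in _wraps:
            # Path-only methods touch the filesystem; push them into a worker thread
            meth = getattr(self._wrapped, name)

            async def wrapper(*args, **kwargs):
                return await run_in_worker_thread(partial(meth, *args, **kwargs))

            wrapper.__name__ = name
            return wrapper
        raise AttributeError(name)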
And btw, thanks for your patience here – I'm picky about these things to start with, and then this is an extraordinarily tricky API design challenge on top of that. But I think we're getting there!

I'm just here to learn; I'm in no rush! Thanks for helping!
I thought about this all day, and came up with 93ccad2, which seems to make the most sense to me: if a wrapped file object doesn't support a particular method/property, the wrapper doesn't expose it either.

If that looks ok, I'll sort out the other forwarded or wrapped attributes accordingly, and clean up the docs. This change makes me glad I gave up on bothering to copy function signatures earlier, because we can't reliably compute that during class creation now anyway.
It seems like this is maybe more complicated than it needs to be? Like, I haven't tested this, but I think it has everything we need for the file-like wrapper, and I can almost fit the whole class on my screen at once:

# (assumes: from functools import partial, plus the trio-internal
#  run_in_worker_thread, _core, and _util used elsewhere in this thread)

# This class is "clever". These two globals define the attributes that it
# *can* delegate to the wrapped object, but any particular instance only
# *does* delegate the attributes that are actually found on the object it's
# wrapping. So e.g. if you wrap an object that has no "readinto", then
# hasattr(wrapper, "readinto") returns False, tab completion doesn't offer
# readinto, etc.
_FILE_LIKE_SYNC_ATTRS = {...}
_FILE_LIKE_ASYNC_METHODS = {...}

class AsyncFileLikeWrapper:
    def __init__(self, wrapped):
        self.wrapped = wrapped

    def __getattr__(self, name):
        if name in _FILE_LIKE_SYNC_ATTRS:
            # May raise AttributeError which we let escape
            return getattr(self.wrapped, name)
        elif name in _FILE_LIKE_ASYNC_METHODS:
            # May raise AttributeError which we let escape
            meth = getattr(self.wrapped, name)

            async def async_wrapper(*args, **kwargs):
                return await run_in_worker_thread(partial(meth, *args, **kwargs))

            async_wrapper.__name__ = name
            async_wrapper.__qualname__ = self.__class__.__qualname__ + "." + name
            # cache the generated method so that self.foo is self.foo == True
            setattr(self, name, async_wrapper)
            return async_wrapper
        else:
            raise AttributeError(name)

    def __dir__(self):
        attrs = set(super().__dir__())
        attrs.update(a for a in _FILE_LIKE_SYNC_ATTRS if hasattr(self, a))
        attrs.update(a for a in _FILE_LIKE_ASYNC_METHODS if hasattr(self, a))
        return attrs

    def __repr__(self):
        return "trio.wrap_file({!r})".format(self.wrapped)

    async def detach(self):
        ret = await run_in_worker_thread(self.wrapped.detach)
        return wrap_file(ret)

    async def close(self):
        with _core.open_cancel_scope(shield=True):
            await run_in_worker_thread(self.wrapped.close)
        # run_in_worker_thread did a schedule point, but we carefully
        # prevented it from doing a cancel point, so do that now.
        await _core.yield_if_cancelled()

    async def __aenter__(self):
        return self

    async def __aexit__(self, *_):
        await self.close()

    @_util.aiter_compat
    def __aiter__(self):
        return self

    async def __anext__(self):
        try:
            return await run_in_worker_thread(next, self.wrapped)
        except StopIteration:
            raise StopAsyncIteration

def wrap_file(file_like):
    if not hasattr(file_like, "read") and not hasattr(file_like, "write"):
        raise TypeError(
            "{!r} object doesn't quack like a file (no read or write method)"
            .format(file_like.__class__))
    return AsyncFileLikeWrapper(file_like)

async def open_file(*args, **kwargs):
    do_open = partial(open, *args, **kwargs)
    return wrap_file(await run_in_worker_thread(do_open))
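Purely for illustration, usage of a wrapper along these lines would look roughly like this (assuming the sketch above is exposed as trio.open_file / trio.wrap_file, and that "data.txt" exists; not code from the PR):

import io
import trio

async def main():
    # open() runs in a worker thread; the result supports async with / async for
    async with await trio.open_file("data.txt") as f:
        async for line in f:
            print(line.rstrip())

    # wrapping an already-open file-like object also works
    buf = trio.wrap_file(io.BytesIO(b"abc"))
    assert await buf.read() == b"abc"

trio.run(main)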
Probably.
This convention seems the clearest so far.

attrs.update(a for a in _FILE_LIKE_SYNC_ATTRS if hasattr(self, a))
attrs.update(a for a in _FILE_LIKE_ASYNC_METHODS if hasattr(self, a))

I thought about doing this too before I wrote 93ccad2; if we call hasattr every time anyway, we could instead compute the available methods once in __init__:

def __init__(self, wrapped):
    …
    self._available_sync_methods = [attr for attr in _FILE_LIKE_SYNC_ATTRS if hasattr(self.wrapped, attr)]
    self._available_async_methods = [attr for attr in _FILE_LIKE_ASYNC_METHODS if hasattr(self.wrapped, attr)]

I'll back up a few commits and try it this way (but I do like the idea of a generic-ish wrapper-generator, which I guess is what I started writing).
Also, is the `self.foo is self.foo` caching actually necessary?
Does freezing/caching the attribute lists like this simplify the code, though?

Yeah, this is what I was trying to say somewhere up above...

Rationale: I don't think […]. If it makes you feel better, people will also be writing things like […].
Not really, and […].
@njsmith I think this is ready to merge!
I would expect ordinary bound methods to behave the same way. It's OK if the generated methods aren't identical objects on every access, as long as they compare equal:

>>> class Thing:
...     def method(self): pass
...
>>> t = Thing()
>>> t.method is t.method
False
>>> t.method == t.method
True
>>>
It's great to see this coming together! Comments below.
trio/_file_io/_file_io.py
    """

    if isinstance(file, io.IOBase):
I don't think we should enforce this type check, now that we've moved to a more duck-type-oriented way of doing the wrapping. There are lots of legacy objects out there that people think of as file-like but that predate the existence of IOBase, and may not implement all its methods.

So I think a looser check that's just designed to catch blatant accidents would be better, e.g. the hasattr(f, "read") or hasattr(f, "write") I used before. What do you think?
I think all that a file really needs is close, because that's the only assumption AsyncIOWrapper makes.
trio/_file_io/_file_io.py
    'write', 'writelines',
    # not defined in *IOBase:
    'readinto1', 'peek',
]
Let's make these {sets}; it's what they really are, and also miggggght make __getattr__ faster (though this probably doesn't matter).
trio/_file_io/_path.py
        setattr(cls, attr_name, wrapper)


class AsyncPath(metaclass=AsyncAutoWrapperType):
Should be called Path. (Just like how it's trio.socket.socket(...), not trio.socket.async_socket(...).)

The __repr__ should probably be something like trio.Path(...) though, just to avoid that bit of possible confusion with stdlib's Path. (Probably the simplest way to do this is to make the import at the top be import pathlib instead of from pathlib import Path, PurePath.)
Doesn't this just blindly add a thread wrapper around all methods? I don't think it makes sense to run things like with_name in threads.

Edit: nevermind, I just noticed that we're already aware of this.
trio/_file_io/_path.py
# not documented upstream
delattr(AsyncPath.absolute, '__doc__')
Why is this here? What does it do?
The value of trio.AsyncPath.absolute.__doc__ makes a reference to :meth:`~pathlib.Path.absolute`, which does not exist. If this line were removed, building docs would fail.
trio/_file_io/_path.py
# python3.5 compat
if hasattr(os, 'PathLike'):  # pragma: no cover
Why does this need pragma: no cover?
Do we combine coverage between 3.5/3.6+ properly?
Yep!
Sounds like this # pragma: no cover still needs to go?
    getattr(async_file, meth_name)

    with pytest.raises(AttributeError):
        getattr(wrapped, meth_name)
Isn't this a different way of writing that test above with the any and the all?
No; because this test is testing getattr while the any/all test is testing __dir__.
trio/tests/test_file_io.py
async def test_async_iter(async_file):
    async_file._wrapped = io.StringIO('test\nfoo\nbar')
Again, this would be clearer and more future-proof if you dropped the fixture and wrote `async_file = wrap_file(io.StringIO(...))`.
trio/tests/test_file_io.py
        assert actual == next(expected)

    with pytest.raises(StopIteration):
        next(expected)
A less error-prone way to write this might be:
got = []
async for line in async_file:
    got.append(line)
assert got == expected
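For example, the whole test might end up looking something like this (a sketch; wrap_file and the suite's async test runner are assumed rather than shown in the snippet above):

import io

async def test_async_iter():
    expected = list(io.StringIO('test\nfoo\nbar'))
    async_file = wrap_file(io.StringIO('test\nfoo\nbar'))

    got = []
    async for line in async_file:
        got.append(line)

    assert got == expected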
trio/tests/test_file_io.py
    with pytest.raises(_core.Cancelled):
        await f.write('a')

    await f.close()
This line should also raise Cancelled, so we might as well add a with pytest.raises(...) to assert that.
trio/tests/test_file_io.py
    detached = await async_file.detach()

    assert isinstance(detached, trio.AsyncIOWrapper)
    assert detached.wrapped == raw
Use is, not ==, for testing object identity.
The curio way also makes sense to me. I can certainly see arguments either way. The trade-off is that with the way in this PR, tab-completion works correctly (only offers options that actually make sense for the given object) but […].
Personally, I would prefer not to. Fundamentally for me trio is an experiment, and the question it's trying to answer is: "can we make an API that's so much nicer than asyncio-and-friends that it can overcome the massive gap in maturity/ecosystem/mindshare?" I'm OK with the answer being "no", but I'm not OK with the answer being "yes we could have made such an API, but we failed to do so because we got distracted with other things and now no-one will ever know" :-/. So I want to try hard to avoid distractions that might compromise trio's progress, like writing a new file io library for asyncio and then having to support both versions. Plus I think there might be unexpected complexities here – in particular cancellation works pretty differently between them, so […].
Maybe the […]. For comparison, configparser already hands back dynamically generated methods as functools.partial objects rather than ordinary bound methods:

>>> p = configparser.ConfigParser(converters={'wololo': int})
>>> p.read_dict({'a': {'b': '123'}})
>>> p.getwololo('a', 'b')
123
>>> p['a'].getwololo('b')
123
>>> p.getwololo
functools.partial(<bound method RawConfigParser._get_conv of <configparser.ConfigParser object at 0x7f37f424c860>>, conv=<class 'int'>)
>>> p['a'].getwololo
functools.partial(<bound method SectionProxy.get of <Section: a>>, _impl=functools.partial(<bound method RawConfigParser._get_conv of <configparser.ConfigParser object at 0x7f37f424c860>>, conv=<class 'int'>))
>>>
@njsmith I fixed everything except #180 (comment) which depends on #180 (comment)
This collapses the _file_io and _path modules into the trio package, and moves _helpers to _util
This slightly duplicates _from_wrapped logic, but is probably cleaner overall.
This replaces incorrect use of ==/is, makes test_async_iter more obvious, and adds a missing raises assertion in test_close_cancelled.
Previously attempts to compare trio.Path with itself would fail, due to the wrapped object not supporting comparisons with trio.Path.
This also removes the _from_wrapped classmethod, and adds an additional test for passing a trio path to trio open_file.
This removes the pathlib.Path(trio.Path()) test, unwraps args in trio.Path(), and manually unwraps paths inside open_file.
duck-files are now required to define either write or read in addition to close
This removes references to _wrapped, and adds all possible Path comparisons that affect trio.Path.
Instead of indiscriminately casting everything to str, including non-PathLikes, just unwrap all trio.Paths inside args.
This fixes python3.5 compatibility.
💯
This is an attempt to implement #20.

Todo:

- _file_io: […]
- _path:
  - __div__
  - improve Path docstring (I was going to write some glossary entry on "asynchronous path object", then I realized there is no upstream concept of "path object", only PathLike)
  - […] pathlib.Path instances too (#180 (comment))

Currently supported usages: […]
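For illustration, the kind of usage these wrappers enable looks roughly like this (a sketch using the trio.Path / trio.open_file names settled on above; "example.txt" is a made-up file name):

import trio

async def main():
    path = trio.Path("example.txt")     # wraps a pathlib.Path
    print(path.name, path / "backup")   # PurePath-style operations stay synchronous
    if await path.exists():             # filesystem methods become awaitable
        async with await trio.open_file(path) as f:
            print(await f.read())

trio.run(main)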