-
-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add API to support tracking progress while reading from a file #1759
Conversation
Hi Martin! For a moment there I thought I was on a different repos. This looks like a great addition. Would you mind updating the docs for this? A single sub-section under progress should do it. Thanks. |
Rebased against latest |
Hi @willmcgugan , any update on this? 😄 |
Codecov Report
@@ Coverage Diff @@
## master #1759 +/- ##
==========================================
- Coverage 99.46% 99.07% -0.40%
==========================================
Files 72 72
Lines 7339 7551 +212
==========================================
+ Hits 7300 7481 +181
- Misses 39 70 +31
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Hi Martin, Sorry about the delay! It's been on the backburner for a while. Love the idea, but I think we can refine the API a bit to make it easier to grok. Am I right in saying it is overloaded to either open and return a file / context manager and also to read from an open file? I suspect that will be tricky to explain to users. Could we separate those so we have What do you think? |
No worries, I also have things that have been lying around for so long and I don't always get myself up to speed, so I can definitely understand. At the moment the code will either wrap a file through a context-manager that does need to be closed, or wrap a file-like object in a context-manager that does not need to be closed, and i can understand this may be confusing. I can totally split this into two functions. |
Okay, I've split the code into two methods:
I also allowed skipping specifiying the with Progress() as progress:
# specify a total in a task beforehand
task_id = progress.add_task("Reading...", total=100)
with progress.wrap_file(file_handle, task_id=task_id) as reader:
pass
# specify a total directly
with progress.wrap_file(file_handle, total=100) as reader:
pass |
Okay, I've updated In addition, I've fixed the type annotations, and added overloaded annotations to determine the return type of |
@darrenburns Could you give this one a look through. It's a feature to wrap a file so that reading updates a progress bar. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Made a few small comments, really liking it overall though. Nice idea 😎
Reading from a file | ||
~~~~~~~~~~~~~~~~~~~ | ||
|
||
You can obtain a progress-tracking reader using the :meth:`~rich.progress.Progress.open` method by giving it a path. You can specify the number of bytes to be read, but by default :meth:`~rich.progress.Progress.open` will query the size of the file with :func:`os.stat`. You are responsible for closing the file, and you should consider using a *context* to make sure it is closed :: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the Progress.open
API 😎
Thanks @darrenburns ! Idea is actually coming from my Rust experience with |
Thanks for making those changes @althonos! Could you run |
Okay this should be it! I tried running |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments for your consideration 😄
rich/progress.py
Outdated
def readall(self) -> bytes: | ||
block = self.handle.readall() # type: ignore | ||
self.progress.advance(self.task, advance=len(block)) | ||
return block # type: ignore | ||
|
||
def readinto(self, b: Union[bytearray, memoryview, mmap]): # type: ignore | ||
n = self.handle.readinto(b) # type: ignore | ||
self.progress.advance(self.task, advance=n) | ||
return n | ||
|
||
def readline(self, size: int = -1) -> bytes: # type: ignore |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's the thinking behind these type: ignore
comments?
As far as I can see,self.handle
which is typed as a BinaryIO
doesn't contain readinto
or readall
(those come from RawIOBase
). mypy
s warnings seem valid to me here.
Indeed, calling readall
on the _Reader
instance will result in AttributeError: '_io.BufferedReader' object has no attribute 'readall'. Did you mean: 'readable'?
I'm wondering if these methods are required?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So as far as I've seen while doing I/O related typing in Pyfilesystem2, the I/O typing is really inconsistent between the typing
and the io
modules.
Basically, when you open a file in binary mode with open
, you get either an io.BufferedReader
or an io.FileIO
(if buffering is disabled), both of which have a readinto
method, . However when mypy
cannot determine the buffering statically, it will have open
default to returning a BinaryIO
, which doesn't have these methods, although the actual objects being returned will always have readinto
:
That's why I decided to silence mypy
here. I can definitely remove the readall
implementation, but I'd leave the readinto
, in particular because it can really improve performance in some cases by avoiding useless copies of byte data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will defer to your experience here, Martin! Regarding the readall
, it is probably better to be defensive here even if it defies the typing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's actually safer here to remove readall
because I'll expect the majority of calls to progress.open
to be buffered, so the returned io.BufferedReader
won't have a readall
method. It's also much more marginal than readinto
in my experience, so I don't think you'll get a lot complaints that _Reader
misses a readall
method because there's very limited reason to use it instead of read()
without arguments.
@althonos We added a Mypy switch recently that require the error code in any |
Thanks, @althonos ! This is a fantastic feature. Will do a release v soon. |
Don't mention it! As usual, thanks a lot both of you for the review, and very excited to use |
Still new to this API, but following some of the guides on RTDs, it looks like See: https://rich.readthedocs.io/en/latest/progress.html#basic-usage |
Type of changes
Checklist
Description
Hi Will! I've started using
rich
more and more, so far it's a great experience 😉One of my common use case for the
rich.progress
module is tracking the progress made while reading from a file. In bioinformatics it's not rare to process gigabytes of files so I like to have an idea how long of a break I can take while processing my input data!Manually updating the progress bar works for certain things, but for instance, when I want to pass a file-like handle to an API that consumes it (e.g
json.load
), there's no way to update the progress bar between each chunk. To do so, the only way I've found was to wrap the file-like object itself so that it updates the progress bar whenever the file is read from. This is something you could partially do withtqdm.tqdm.wrapattr
.Since the
rich.progress
API has no straightforward support for this (like it does withtrack
for an iterator), I made my own snippet for this, which ends up working well. I've been copying around in every project that usesrich
, so I think it would make a good addition to the library!Changes
This PR adds the
rich.progress._Reader
class, which wraps a file-like object, and updates aProgress
instance while being read from. The class itself is private; to get a reader, you have to use the newProgress.read
method, which takes either a path or a file-like object, and returns a file reader with progress support.This lets you do something like this (e.g. loading JSON-serialized data from a file, using
os.stat
to get the total number of bytes to be read):You can also directly pass a file-like object to
progress.read
, in which case you must specify the total:In addition, I added a wrapper function
rich.progress.read
likerich.progress.track
which handles setting up a progress, so that you can get a progress reader in just two lines:Notes
rich.progress.read
andProgress.read
, I madetotal
anint
instead of afloat
, because it should be a number of bytes so a float would not make sense here.seek
the_Reader
, it will also reset the position of the progress bar to the new position.