Add API to support tracking progress while reading from a file #1759

althonos · 2021-12-15T16:18:18Z

Type of changes

Checklist

I've run the latest black with default args on new code.
I've updated CHANGELOG.md and CONTRIBUTORS.md where appropriate.
I've added tests for new code.
I accept that @willmcgugan may be pedantic in the code review.

Description

Hi Will! I've started using rich more and more, so far it's a great experience 😉

One of my common use case for the rich.progress module is tracking the progress made while reading from a file. In bioinformatics it's not rare to process gigabytes of files so I like to have an idea how long of a break I can take while processing my input data!

Manually updating the progress bar works for certain things, but for instance, when I want to pass a file-like handle to an API that consumes it (e.g json.load), there's no way to update the progress bar between each chunk. To do so, the only way I've found was to wrap the file-like object itself so that it updates the progress bar whenever the file is read from. This is something you could partially do with tqdm.tqdm.wrapattr.

Since the rich.progress API has no straightforward support for this (like it does with track for an iterator), I made my own snippet for this, which ends up working well. I've been copying around in every project that uses rich, so I think it would make a good addition to the library!

Changes

This PR adds the rich.progress._Reader class, which wraps a file-like object, and updates a Progress instance while being read from. The class itself is private; to get a reader, you have to use the new Progress.read method, which takes either a path or a file-like object, and returns a file reader with progress support.

This lets you do something like this (e.g. loading JSON-serialized data from a file, using os.stat to get the total number of bytes to be read):

import json
from rich.progress import *

with Progress(columns=[BarColumn(), DownloadColumn()]) as progress:
    with progress.read("data.json") as f:
        data = json.load(f)

You can also directly pass a file-like object to progress.read, in which case you must specify the total:

import json
from rich.progress import *

with Progress(columns=[BarColumn(), DownloadColumn()]) as progress:
    with open("data.json", "rb") as f:
        data = json.load(progress.read(f, total=2048))

In addition, I added a wrapper function rich.progress.read like rich.progress.track which handles setting up a progress, so that you can get a progress reader in just two lines:

with rich.progress.read("data.json") as f:
    data = json.load(f)

Notes

In rich.progress.read and Progress.read, I made total an int instead of a float, because it should be a number of bytes so a float would not make sense here.
If you seek the _Reader, it will also reset the position of the progress bar to the new position.

willmcgugan · 2021-12-16T15:19:58Z

Hi Martin! For a moment there I thought I was on a different repos.

This looks like a great addition. Would you mind updating the docs for this? A single sub-section under progress should do it.

Thanks.

althonos · 2022-01-23T15:50:39Z

Rebased against latest master.

…g a file

…erly

althonos · 2022-03-13T11:56:13Z

Hi @willmcgugan , any update on this? 😄

codecov-commenter · 2022-03-13T15:47:30Z

Codecov Report

Merging #1759 (5717d8d) into master (7105781) will decrease coverage by 0.39%.
The diff coverage is 85.44%.

@@            Coverage Diff             @@
##           master    #1759      +/-   ##
==========================================
- Coverage   99.46%   99.07%   -0.40%     
==========================================
  Files          72       72              
  Lines        7339     7551     +212     
==========================================
+ Hits         7300     7481     +181     
- Misses         39       70      +31

Flag	Coverage Δ
unittests	`99.07% <85.44%> (-0.40%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
rich/__main__.py	`100.00% <ø> (ø)`
rich/progress.py	`93.51% <77.69%> (-4.92%)`	⬇️
rich/console.py	`99.21% <100.00%> (+0.06%)`	⬆️
rich/terminal_theme.py	`100.00% <100.00%> (ø)`

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 9f43ccc...5717d8d. Read the comment docs.

willmcgugan · 2022-03-13T16:11:02Z

Hi Martin,

Sorry about the delay! It's been on the backburner for a while.

Love the idea, but I think we can refine the API a bit to make it easier to grok.

Am I right in saying it is overloaded to either open and return a file / context manager and also to read from an open file?

I suspect that will be tricky to explain to users. Could we separate those so we have rich.progress.open and a rich.progress.wrap_file method (plus equivalent on Progress). I'd also like to support text-files without additional wrapping, as I'm sure that will be a common use case.

What do you think?

althonos · 2022-03-13T16:16:11Z

No worries, I also have things that have been lying around for so long and I don't always get myself up to speed, so I can definitely understand.

At the moment the code will either wrap a file through a context-manager that does need to be closed, or wrap a file-like object in a context-manager that does not need to be closed, and i can understand this may be confusing. I can totally split this into two functions.

…ethods

althonos · 2022-03-13T17:41:20Z

Okay, I've split the code into two methods:

Progress.open, which takes a path and returns a file, it accepts a mode argument to open in text or binary mode, as well as an encoding argument like the builtin open method.
Progress.wrap_file which wraps a file-like object.

I also allowed skipping specifiying the total provided that a task_id is given which refers to a task that already had a total set; this way you can do both:

with Progress() as progress:
    # specify a total in a task beforehand
    task_id = progress.add_task("Reading...", total=100)
    with progress.wrap_file(file_handle, task_id=task_id) as reader:
        pass
    # specify a total directly
    with progress.wrap_file(file_handle, total=100) as reader:
        pass

…task

althonos · 2022-03-13T18:47:39Z

Okay, I've updated rich.progress.open and Progress.open to have a signature similar to open (supporting the buffering, newline, encoding and errors arguments).

In addition, I've fixed the type annotations, and added overloaded annotations to determine the return type of rich.progress.open and Progress.open based on the mode (TextIO if "r" or "rt", BinaryIO if "rb").

…file`

willmcgugan · 2022-03-15T11:09:41Z

@darrenburns Could you give this one a look through. It's a feature to wrap a file so that reading updates a progress bar.

darrenburns

Made a few small comments, really liking it overall though. Nice idea 😎

darrenburns · 2022-03-17T15:44:10Z

docs/source/progress.rst

+Reading from a file
+~~~~~~~~~~~~~~~~~~~
+
+You can obtain a progress-tracking reader using the :meth:`~rich.progress.Progress.open` method by giving it a path. You can specify the number of bytes to be read, but by default :meth:`~rich.progress.Progress.open` will query the size of the file with :func:`os.stat`. You are responsible for closing the file, and you should consider using a *context* to make sure it is closed ::


I like the Progress.open API 😎

rich/progress.py

althonos · 2022-03-21T12:36:10Z

Thanks @darrenburns ! Idea is actually coming from my Rust experience with indicatif, where you can a progress bar for reading a file with a single function call 😄

darrenburns · 2022-03-21T12:42:05Z

Thanks for making those changes @althonos! Could you run make format and make typecheck to ensure CI passes?

althonos · 2022-03-21T13:24:17Z

Okay this should be it! I tried running make typecheck but i get an error in rich/_win32_console.py, not in rich.progress, I don't know if that's caused by my PR or not.

darrenburns

A few more comments for your consideration 😄

rich/progress.py

docs/source/progress.rst

darrenburns · 2022-03-30T09:24:59Z

rich/progress.py

+    def readall(self) -> bytes:
+        block = self.handle.readall()  # type: ignore
+        self.progress.advance(self.task, advance=len(block))
+        return block  # type: ignore
+
+    def readinto(self, b: Union[bytearray, memoryview, mmap]):  # type: ignore
+        n = self.handle.readinto(b)  # type: ignore
+        self.progress.advance(self.task, advance=n)
+        return n
+
+    def readline(self, size: int = -1) -> bytes:  # type: ignore


What's the thinking behind these type: ignore comments?

As far as I can see,self.handle which is typed as a BinaryIO doesn't contain readinto or readall (those come from RawIOBase). mypys warnings seem valid to me here.

Indeed, calling readall on the _Reader instance will result in AttributeError: '_io.BufferedReader' object has no attribute 'readall'. Did you mean: 'readable'?

I'm wondering if these methods are required?

So as far as I've seen while doing I/O related typing in Pyfilesystem2, the I/O typing is really inconsistent between the typing and the io modules.

Basically, when you open a file in binary mode with open, you get either an io.BufferedReader or an io.FileIO (if buffering is disabled), both of which have a readinto method, . However when mypy cannot determine the buffering statically, it will have open default to returning a BinaryIO, which doesn't have these methods, although the actual objects being returned will always have readinto:

https://github.com/python/typeshed/blob/9687d53b65d055a990f0187b7baa1137cb7c717b/stdlib/builtins.pyi#L1331-L1341

That's why I decided to silence mypy here. I can definitely remove the readall implementation, but I'd leave the readinto, in particular because it can really improve performance in some cases by avoiding useless copies of byte data.

Will defer to your experience here, Martin! Regarding the readall, it is probably better to be defensive here even if it defies the typing.

I think it's actually safer here to remove readall because I'll expect the majority of calls to progress.open to be buffered, so the returned io.BufferedReader won't have a readall method. It's also much more marginal than readinto in my experience, so I don't think you'll get a lot complaints that _Reader misses a readall method because there's very limited reason to use it instead of read() without arguments.

…pen`

… one

willmcgugan · 2022-03-30T12:57:37Z

@althonos We added a Mypy switch recently that require the error code in any type: ignore, which explains the CI fail.

willmcgugan · 2022-03-30T14:18:57Z

Thanks, @althonos ! This is a fantastic feature. Will do a release v soon.

althonos · 2022-03-30T14:22:51Z

Don't mention it! As usual, thanks a lot both of you for the review, and very excited to use Progress.open as soon as it's out there 😉

astrochun · 2022-04-10T00:39:47Z

Still new to this API, but following some of the guides on RTDs, it looks like from rich.progress import read is no longer supported (use open instead). Is that right?

See: https://rich.readthedocs.io/en/latest/progress.html#basic-usage

althonos marked this pull request as ready for review December 16, 2021 14:11

willmcgugan self-assigned this Jan 5, 2022

althonos added 12 commits March 13, 2022 12:54

Add Progress.read and rich.progress.read to track progress readin…

cf91bd4

…g a file

Update documentation of rich.progress.Progress.read

5b2a4a7

Update Progress.read to close inner file only if given a path

a5fa231

Add tests to tests.test_progress to make sure files are closed prop…

d6f61d8

…erly

Add missing type annotations to rich.progress._Reader

5cfa203

Add an example showing how to copy a file with a progress bar

1432457

Ignore type hint in rich.progress._Reader.readall

46576a9

Reformat new progress code with black

97da885

Fix additional typing issues in rich.progress

5e8da24

Reformat rich.progress and document Progress.read in progress.rst

f3f1bca

Fix os.PathLike type annotations in rich.progress

a270e60

Remove unused import from rich.progress

3008780

Split Progress.read into Progress.open and Progress.wrap_file m…

1320046

…ethods

althonos added 3 commits March 13, 2022 18:45

Add test to ensure Progress.wrap_file can extract the total from a …

fc341f7

…task

Make Progress.open have a signature compatible with open

8663073

Add overloaded annotations for Progress.open and rich.progress.open

2b8ec3b

althonos added 2 commits March 13, 2022 19:48

Update CHANGELOG.md to show the Progress.open and `Progress.wrap_…

0406578

…file`

Merge branch 'master' into progress-reader

2e61862

Merge branch 'master' into progress-reader

b736f8a

darrenburns requested changes Mar 21, 2022

View reviewed changes

althonos added 2 commits March 21, 2022 13:29

Remove unused update_period argument from rich.progress

2e77302

Fix failing tests in tests.test_progress

aa5ca60

Reformat code in rich.progress

130f5b4

althonos requested a review from darrenburns March 22, 2022 10:41

darrenburns requested changes Mar 30, 2022

View reviewed changes

althonos and others added 5 commits March 30, 2022 13:22

Force keyword arguments for Progress.open options not in builtin `o…

5ea3032

…pen`

Update example for wrap_file in rich.progress API documentation

54d660d

Force keyword arguments for Progress.wrap_file arguments after total

bb9790e

Remove _Reader.readall method since wrapped object may not all have…

76953e2

… one

Merge branch 'master' into progress-reader

7a870e4

Add ignore codes for mypy in rich.progress

5717d8d

willmcgugan approved these changes Mar 30, 2022

View reviewed changes

willmcgugan merged commit 3f61975 into Textualize:master Mar 30, 2022

astrochun mentioned this pull request Apr 20, 2022

[BUG] from rich.progress import read should be removed from the docs and updated with rich.progress.open example #2204

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add API to support tracking progress while reading from a file #1759

Add API to support tracking progress while reading from a file #1759

althonos commented Dec 15, 2021 •

edited

Loading

willmcgugan commented Dec 16, 2021

althonos commented Jan 23, 2022

althonos commented Mar 13, 2022

codecov-commenter commented Mar 13, 2022 •

edited

Loading

willmcgugan commented Mar 13, 2022

althonos commented Mar 13, 2022

althonos commented Mar 13, 2022

althonos commented Mar 13, 2022

willmcgugan commented Mar 15, 2022

darrenburns left a comment •

edited

Loading

darrenburns Mar 17, 2022

althonos commented Mar 21, 2022

darrenburns commented Mar 21, 2022

althonos commented Mar 21, 2022

darrenburns left a comment •

edited

Loading

darrenburns Mar 30, 2022

althonos Mar 30, 2022

willmcgugan Mar 30, 2022

althonos Mar 30, 2022

willmcgugan commented Mar 30, 2022

willmcgugan commented Mar 30, 2022

althonos commented Mar 30, 2022

astrochun commented Apr 10, 2022

Add API to support tracking progress while reading from a file #1759

Add API to support tracking progress while reading from a file #1759

Conversation

althonos commented Dec 15, 2021 • edited Loading

Type of changes

Checklist

Description

Changes

Notes

willmcgugan commented Dec 16, 2021

althonos commented Jan 23, 2022

althonos commented Mar 13, 2022

codecov-commenter commented Mar 13, 2022 • edited Loading

Codecov Report

willmcgugan commented Mar 13, 2022

althonos commented Mar 13, 2022

althonos commented Mar 13, 2022

althonos commented Mar 13, 2022

willmcgugan commented Mar 15, 2022

darrenburns left a comment • edited Loading

Choose a reason for hiding this comment

darrenburns Mar 17, 2022

Choose a reason for hiding this comment

althonos commented Mar 21, 2022

darrenburns commented Mar 21, 2022

althonos commented Mar 21, 2022

darrenburns left a comment • edited Loading

Choose a reason for hiding this comment

darrenburns Mar 30, 2022

Choose a reason for hiding this comment

althonos Mar 30, 2022

Choose a reason for hiding this comment

willmcgugan Mar 30, 2022

Choose a reason for hiding this comment

althonos Mar 30, 2022

Choose a reason for hiding this comment

willmcgugan commented Mar 30, 2022

willmcgugan commented Mar 30, 2022

althonos commented Mar 30, 2022

astrochun commented Apr 10, 2022

althonos commented Dec 15, 2021 •

edited

Loading

codecov-commenter commented Mar 13, 2022 •

edited

Loading

darrenburns left a comment •

edited

Loading

darrenburns left a comment •

edited

Loading