Fix Page.close() regression + docs h/t @luketudge
See #1042

- Re-adds `Page.close()` method
    - Was accidentally removed in 9587cc7
- Makes `PDF.close()` close all pages as well
- Improves relevant documentation
jsvine committed Nov 9, 2023
1 parent 6437b72 commit ba58e16
Showing 4 changed files with 15 additions and 1 deletion.
CHANGELOG.md (1 addition, 0 deletions)
@@ -7,6 +7,7 @@ All notable changes to this project will be documented in this file. The format
 ### Added
 
 - Add "gswin64c" as another possible Ghostscript executable in `repair.py` (h/t @echedey-ls). ([#1032](https://github.com/jsvine/pdfplumber/issues/1030))
+- Re-add `Page.close()` method, have `PDF.close()` close all pages as well, and improve relevant documentation (h/t @luketudge). ([#1042](https://github.com/jsvine/pdfplumber/issues/1042))
 
 ### Fixed
 
README.md (7 additions, 1 deletion)
@@ -96,7 +96,7 @@ The top-level `pdfplumber.PDF` class represents a single PDF and has two main pr
 
 | Method | Description |
 |--------|-------------|
-|`.close()`| By default, `Page` objects cache their layout and object information to avoid having to reprocess it. When parsing large PDFs, however, these cached properties can require a lot of memory. You can use this method to flush the cache and release the memory. (In version `<= 0.5.25`, use `.flush_cache()`.)|
+|`.close()`| Calling this method calls `Page.close()` on each page, and also closes the file stream (except in cases when the stream is external, i.e., already opened and passed directly to `pdfplumber`). |
 
 ### The `pdfplumber.Page` class
 
@@ -118,6 +118,12 @@ The `pdfplumber.Page` class is at the core of `pdfplumber`. Most things you'll d
 |`.outside_bbox(bounding_box, relative=False, strict=True)`| Similar to `.crop` and `.within_bbox`, but only retains objects that fall *entirely outside* the bounding box.|
 |`.filter(test_function)`| Returns a version of the page with only the `.objects` for which `test_function(obj)` returns `True`.|
 
+... and also has the following method:
+
+| Method | Description |
+|--------|-------------|
+|`.close()`| By default, `Page` objects cache their layout and object information to avoid having to reprocess it. When parsing large PDFs, however, these cached properties can require a lot of memory. You can use this method to flush the cache and release the memory.|
+
 Additional methods are described in the sections below:
 
 - [Visual debugging](#visual-debugging)
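The `Page.close()` description suggests a pattern for large PDFs: process one page at a time and close it before moving on, so only one page's cached layout data is alive at once. The sketch below illustrates that pattern with a hypothetical `FakePage` class (not part of pdfplumber) whose cached `objects` property counts how many pages hold cached data at the same time. With the real library, the analogous step is calling `page.close()` inside `for page in pdf.pages:`.

```python
from functools import cached_property


class FakePage:
    """Hypothetical stand-in for pdfplumber.Page; not the real class."""

    # Class-wide counters: pages currently holding cached data, and the peak.
    live_caches = 0
    peak_caches = 0

    def __init__(self, number: int) -> None:
        self.number = number

    @cached_property
    def objects(self) -> list:
        # Simulates an expensive layout parse whose result gets cached.
        FakePage.live_caches += 1
        FakePage.peak_caches = max(FakePage.peak_caches, FakePage.live_caches)
        return [f"obj-{self.number}-{i}" for i in range(1000)]

    def close(self) -> None:
        # Drop the cached value, in the spirit of Page.close().
        if "objects" in self.__dict__:
            del self.__dict__["objects"]
            FakePage.live_caches -= 1


def count_objects(pages: list) -> int:
    total = 0
    for page in pages:
        total += len(page.objects)
        page.close()  # release this page's cache before moving on
    return total


pages = [FakePage(n) for n in range(1, 6)]
print(count_objects(pages))  # 5000
print(FakePage.peak_caches)  # 1
```

Without the `page.close()` call in the loop, `peak_caches` would equal the number of pages, because every page would keep its cached list.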
pdfplumber/page.py (3 additions, 0 deletions)
@@ -231,6 +231,9 @@ def __init__(
         # https://rednafi.com/python/lru_cache_on_methods/
         self.get_textmap = textmap_cacher(self._get_textmap)
 
+    def close(self) -> None:
+        self.flush_cache()
+
     @property
     def width(self) -> T_num:
         return self.bbox[2] - self.bbox[0]
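The restored `Page.close()` simply delegates to `flush_cache()`. As a rough illustration of what flushing such a cache can involve, the sketch below uses `functools.cached_property` on a hypothetical `CachingPage` class; this is an assumption for illustration, and pdfplumber's own caching internals may differ. Because `cached_property` stores each computed value in the instance `__dict__` under the property's name, deleting that entry forces recomputation on the next access and lets the old value be garbage-collected once nothing else references it.

```python
from functools import cached_property


class CachingPage:
    """Hypothetical page-like class; illustrates cache flushing only."""

    # Names of the cached attributes that flush_cache() should clear.
    cached_properties = ["layout", "objects"]

    def __init__(self) -> None:
        self.parse_count = 0

    @cached_property
    def layout(self) -> dict:
        # Stand-in for an expensive layout analysis.
        self.parse_count += 1
        return {"width": 612, "height": 792}

    @cached_property
    def objects(self) -> list:
        return ["char", "line", "rect"]

    def flush_cache(self) -> None:
        # cached_property keeps each value in the instance __dict__ under
        # the property's name; removing the entry invalidates the cache.
        for name in self.cached_properties:
            self.__dict__.pop(name, None)

    def close(self) -> None:
        # Mirrors the restored method: closing a page just flushes its cache.
        self.flush_cache()


page = CachingPage()
_ = page.layout
_ = page.layout  # served from the cache
print(page.parse_count)  # 1
page.close()
_ = page.layout  # recomputed after the flush
print(page.parse_count)  # 2
```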
pdfplumber/pdf.py (4 additions, 0 deletions)
@@ -108,6 +108,10 @@ def open(
 
     def close(self) -> None:
         self.flush_cache()
 
+        for page in self.pages:
+            page.close()
+
         if not self.stream_is_external:
             self.stream.close()
 
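The updated `PDF.close()` closes every page before (possibly) closing the underlying stream. The sketch below reproduces that ordering with hypothetical `StubPDF` and `StubPage` classes (not pdfplumber's) and an in-memory `io.BytesIO` stream, to show the ownership rule the README describes: a stream the object opened itself gets closed, while an external stream that the caller opened and passed in is left open. For brevity it omits the PDF-level `flush_cache()` call.

```python
import io


class StubPage:
    """Hypothetical page stand-in that records whether close() was called."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


class StubPDF:
    """Hypothetical, simplified version of the close() logic in pdf.py."""

    def __init__(self, stream, stream_is_external: bool, n_pages: int = 3) -> None:
        self.stream = stream
        self.stream_is_external = stream_is_external
        self.pages = [StubPage() for _ in range(n_pages)]

    def close(self) -> None:
        # Close every page first, so each one can release its cached data...
        for page in self.pages:
            page.close()
        # ...then close the stream only if this object owns it.
        if not self.stream_is_external:
            self.stream.close()


owned = StubPDF(io.BytesIO(b"%PDF-1.7"), stream_is_external=False)
owned.close()
print(owned.stream.closed)  # True
print(all(p.closed for p in owned.pages))  # True

external_stream = io.BytesIO(b"%PDF-1.7")
borrowed = StubPDF(external_stream, stream_is_external=True)
borrowed.close()
print(external_stream.closed)  # False: the caller still owns it
external_stream.close()
```

Leaving external streams open avoids surprising callers who still intend to use the file object after pdfplumber is done with it.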