Fix Page.close() regression + docs h/t @luketudge

See #1042

- Re-adds `Page.close()` method
  - Was accidentally removed in 9587cc7
- Makes `PDF.close()` close all pages as well
- Improves relevant documentation

jsvine · Nov 9, 2023 · ba58e16
1 parent 6437b72

Showing 4 changed files with 15 additions and 1 deletion.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -7,6 +7,7 @@ All notable changes to this project will be documented in this file. The format
 ### Added
 
 - Add "gswin64c" as another possible Ghostscript executable in `repair.py` (h/t @echedey-ls). ([#1032](https://github.com/jsvine/pdfplumber/issues/1030))
+- Re-add `Page.close()` method, have `PDF.close()` close all pages as well, and improve relevant documentation (h/t @luketudge). ([#1042](https://github.com/jsvine/pdfplumber/issues/1042))
 
 ### Fixed
 

diff --git a/README.md b/README.md
@@ -96,7 +96,7 @@ The top-level `pdfplumber.PDF` class represents a single PDF and has two main pr
 
 | Method | Description |
 |--------|-------------|
-|`.close()`| By default, `Page` objects cache their layout and object information to avoid having to reprocess it. When parsing large PDFs, however, these cached properties can require a lot of memory. You can use this method to flush the cache and release the memory. (In version `<= 0.5.25`, use `.flush_cache()`.)|
+|`.close()`| Calling this method calls `Page.close()` on each page, and also closes the file stream (except in cases when the stream is external, i.e., already opened and passed directly to `pdfplumber`). |
 
 ### The `pdfplumber.Page` class
 
@@ -118,6 +118,12 @@ The `pdfplumber.Page` class is at the core of `pdfplumber`. Most things you'll d
 |`.outside_bbox(bounding_box, relative=False, strict=True)`| Similar to `.crop` and `.within_bbox`, but only retains objects that fall *entirely outside* the bounding box.|
 |`.filter(test_function)`| Returns a version of the page with only the `.objects` for which `test_function(obj)` returns `True`.|
 
+... and also has the following method:
+
+| Method | Description |
+|--------|-------------|
+|`.close()`| By default, `Page` objects cache their layout and object information to avoid having to reprocess it. When parsing large PDFs, however, these cached properties can require a lot of memory. You can use this method to flush the cache and release the memory.|
+
 Additional methods are described in the sections below:
 
 - [Visual debugging](#visual-debugging)

diff --git a/pdfplumber/page.py b/pdfplumber/page.py
@@ -231,6 +231,9 @@ def __init__(
         # https://rednafi.com/python/lru_cache_on_methods/
         self.get_textmap = textmap_cacher(self._get_textmap)
 
+    def close(self) -> None:
+        self.flush_cache()
+
     @property
     def width(self) -> T_num:
         return self.bbox[2] - self.bbox[0]

diff --git a/pdfplumber/pdf.py b/pdfplumber/pdf.py
@@ -108,6 +108,10 @@ def open(
 
     def close(self) -> None:
         self.flush_cache()
+
+        for page in self.pages:
+            page.close()
+
         if not self.stream_is_external:
             self.stream.close()
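
The close cascade introduced by this commit can be illustrated with a minimal, self-contained sketch. The `Page` and `PDF` classes below are simplified stand-ins written for this example, not pdfplumber's real classes: the `_cache` dictionary, the `objects()` method, and the three-page constructor are assumptions made purely for illustration, and the PDF-level `flush_cache()` call is omitted. Only the control flow of `close()` mirrors the diff above.

```python
import io
from typing import Any, BinaryIO, Dict, List


class Page:
    """Stand-in for a page; only the caching behaviour is modelled."""

    def __init__(self, number: int) -> None:
        self.number = number
        self._cache: Dict[str, Any] = {}  # hypothetical cache store

    def objects(self) -> str:
        # Pretend this is expensive to compute, so keep the result around.
        if "objects" not in self._cache:
            self._cache["objects"] = f"parsed objects for page {self.number}"
        return self._cache["objects"]

    def flush_cache(self) -> None:
        self._cache.clear()

    def close(self) -> None:
        # As in the diff: closing a page simply flushes its cache.
        self.flush_cache()


class PDF:
    """Stand-in for a PDF that either owns its stream or borrows it."""

    def __init__(self, stream: BinaryIO, stream_is_external: bool = False) -> None:
        self.stream = stream
        self.stream_is_external = stream_is_external
        self.pages: List[Page] = [Page(n) for n in range(1, 4)]

    def close(self) -> None:
        # Close every page first, then the stream, but only if we own it.
        for page in self.pages:
            page.close()
        if not self.stream_is_external:
            self.stream.close()


# Case 1: the PDF owns its stream, so close() also closes the stream.
owned = PDF(io.BytesIO(b"%PDF-"))
owned.pages[0].objects()  # populate one page's cache
owned.close()

# Case 2: the caller passed in an already-open stream, which is left open.
external_stream = io.BytesIO(b"%PDF-")
borrowed = PDF(external_stream, stream_is_external=True)
borrowed.pages[1].objects()
borrowed.close()
```

After `owned.close()`, every page cache is empty and the stream is closed; after `borrowed.close()`, the page caches are likewise empty but `external_stream` remains usable by the caller, which matches the exception described in the updated README text.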