Some utility methods for logical structure #1095
Codecov Report: All modified and coverable lines are covered by tests ✅
Coverage diff, develop vs. #1095:
           develop   #1095     +/-
Coverage   100.00%   100.00%
Files      19        19
Lines      1928      1996      +68
Hits       1928      1996      +68
I haven't played around with this yet, but it seems like a reasonable idea and doesn't interfere with core
If I'm understanding correctly, this question pertains to flipping the vertical coordinates, so that Lines 399 to 402 in 1ad3905
... and then use
I think maybe I'll add a companion / convenience method
The issue is a bit more complicated because when you crop a page, all of the object coordinates go through the I can probably hack something up so this minimally works, but we might want to refactor it at some point.
Should be ready to merge now! I didn't realize that cropping a page doesn't actually translate the coordinates of the objects; it just clips them to the new bounding box. Nonetheless, this didn't work right for structure elements with BBox attributes, and now it does.
Thanks, now merged! (And correct re. the non-translation of coordinates.)
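To illustrate the "clip, don't translate" behaviour discussed above, here is a minimal sketch in plain Python. It is not pdfplumber's actual code: the helper name `clip_bbox` and the `(x0, top, x1, bottom)` tuple layout are assumptions made for this example only.

```python
from typing import Optional, Tuple

# Hypothetical box layout for this sketch: (x0, top, x1, bottom)
BBox = Tuple[float, float, float, float]


def clip_bbox(bbox: BBox, crop: BBox) -> Optional[BBox]:
    """Intersect bbox with crop, keeping the original page coordinates.

    Nothing is shifted relative to the crop's corner; the box is only
    trimmed. Returns None when the two boxes do not overlap.
    """
    x0 = max(bbox[0], crop[0])
    top = max(bbox[1], crop[1])
    x1 = min(bbox[2], crop[2])
    bottom = min(bbox[3], crop[3])
    if x0 >= x1 or top >= bottom:
        return None
    return (x0, top, x1, bottom)


# A box that straddles the crop's top-left corner is trimmed, not moved:
clipped = clip_bbox((50, 50, 200, 200), (100, 100, 300, 300))
# clipped == (100, 100, 200, 200)
```

Note that the result is still expressed in the uncropped page's coordinate frame, which is why a BBox attribute on a structure element needs the same treatment as any other object.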
It's useful to be able to search in the structure tree - this has to be done from the PDFStructTree object itself since we return a dictionary from structure_tree in keeping with the general way of pdfplumber.
Also to get a BBox from an element for visual debugging - note the FIXME, if you play games with cropped pages, this will fail, but in general that's unlikely, you would have to do something like:
and then try to get the BBox of an element where it is explicitly specified in the attributes of that element (usually this is only the case for Figure and Table).
Is there a general method to properly transform PDF BBoxes into pdfplumber ones for a page?
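As a rough illustration of the two ideas in this description (searching the tree and converting a BBox), here is a sketch in plain Python. It is not the code from this PR. It assumes a dictionary-shaped tree with "type", "attributes" and "children" keys, a MediaBox whose origin is at (0, 0), and an uncropped page; `find_all` and `flip_bbox` are hypothetical helper names.

```python
from typing import Any, Dict, Iterator, List, Tuple

BBox = Tuple[float, float, float, float]


def flip_bbox(pdf_bbox: BBox, page_height: float) -> BBox:
    """Convert a PDF-space box (x0, y0, x1, y1), origin at bottom-left,
    into a top-left-origin box (x0, top, x1, bottom).

    Assumes the MediaBox starts at (0, 0) and the page is not cropped.
    """
    x0, y0, x1, y1 = pdf_bbox
    return (x0, page_height - y1, x1, page_height - y0)


def find_all(nodes: List[Dict[str, Any]], wanted: str) -> Iterator[Dict[str, Any]]:
    """Yield every element of the given type, depth-first."""
    for node in nodes:
        if node.get("type") == wanted:
            yield node
        yield from find_all(node.get("children", []), wanted)


# Made-up tree in the assumed dictionary shape (not real pdfplumber output)
tree = [
    {
        "type": "Document",
        "children": [
            {"type": "P", "children": []},
            {
                "type": "Figure",
                "attributes": {"BBox": [72.0, 500.0, 300.0, 700.0]},
            },
        ],
    }
]

figures = list(find_all(tree, "Figure"))
# On a 792-point-tall page (US Letter), the vertical coordinates flip:
box = flip_bbox(tuple(figures[0]["attributes"]["BBox"]), 792.0)
# box == (72.0, 92.0, 300.0, 292.0)
```

A page with a non-zero MediaBox origin or a crop applied would need additional offsets, which is the complication raised in the comments above.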