The documentation for boxes_flow of LAParams (here) reads:
Specifies how much a horizontal and vertical position of a text matters when determining the order of text boxes. The value should be within the range of -1.0 (only horizontal position matters) to +1.0 (only vertical position matters).
whereas the code (here) does the following:

```python
if -1 <= laparams.boxes_flow and laparams.boxes_flow <= +1 \
        and textboxes:
    # Code to do the full layout analysis, adding groups etc
    ...
else:
    def getkey(box):
        if isinstance(box, LTTextBoxVertical):
            return (0, -box.x1, box.y0)
        else:
            return (1, box.y0, box.x0)
    textboxes.sort(key=getkey)
```
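The in-range branch is elided in the snippet above. As a rough illustration of what the documented range implies, here is a minimal sketch of a sort key that linearly weights horizontal and vertical position by boxes_flow. The Box class and the exact weighting formula are assumptions made for illustration, not a claim about what pdfminer.six does internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    # Hypothetical stand-in for a text box, in PDF coordinates
    # (origin at the bottom-left, y grows upwards).
    name: str
    x0: float
    y0: float
    x1: float
    y1: float


def flow_key(box: Box, boxes_flow: float) -> float:
    # Assumed linear weighting, for illustration only:
    #   boxes_flow = -1.0 -> key = 2 * x0          (only horizontal position matters)
    #   boxes_flow = +1.0 -> key = -2 * (y0 + y1)  (only vertical position matters)
    # The vertical term is negated so that boxes higher on the page sort first.
    return (1 - boxes_flow) * box.x0 - (1 + boxes_flow) * (box.y0 + box.y1)


def order_boxes(boxes: List[Box], boxes_flow: float) -> List[Box]:
    return sorted(boxes, key=lambda box: flow_key(box, boxes_flow))


if __name__ == "__main__":
    low_left = Box("low-left", 0, 0, 10, 10)
    high_right = Box("high-right", 100, 100, 110, 110)
    boxes = [low_left, high_right]
    print([b.name for b in order_boxes(boxes, -1.0)])  # ['low-left', 'high-right']
    print([b.name for b in order_boxes(boxes, +1.0)])  # ['high-right', 'low-left']
```

At -1.0 the boxes come out left to right; at +1.0 they come out top to bottom, which matches the two ends of the documented range.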
From the code, it seems that it is possible to set boxes_flow outside the range [-1.0, 1.0]. At the moment, any value outside this range skips the full layout analysis and instead returns the boxes sorted by the simple position-based key above. Unfortunately, you can't currently pass boxes_flow as None, because in Python 3 comparing None with a number (e.g. None <= 1) raises a TypeError. Therefore, to disable the layout analysis you essentially have to set boxes_flow to an out-of-range value such as 2.
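A minimal sketch of a None-aware version of the check, assuming None should mean "disabled" while numbers keep their current behaviour. The helper name full_layout_enabled is hypothetical.

```python
from typing import Optional


def full_layout_enabled(boxes_flow: Optional[float]) -> bool:
    # Hypothetical helper. Testing for None first short-circuits the `and`,
    # so the numeric comparison never sees None.
    return boxes_flow is not None and -1.0 <= boxes_flow <= 1.0


if __name__ == "__main__":
    # The current style of check fails outright for None:
    try:
        boxes_flow = None
        _ = -1 <= boxes_flow  # raises TypeError in Python 3
    except TypeError as exc:
        print("TypeError:", exc)

    print(full_layout_enabled(None))  # False: disabled
    print(full_layout_enabled(0.5))   # True: full layout analysis
    print(full_layout_enabled(2))     # False: out of range, same as today
```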
(Aside: This order is perhaps a bit strange when there are both horizontal and vertical text boxes on the page. I'd be tempted to choose a single scheme, perhaps depending on whether there are more horizontal or vertical boxes on the page, but that's a different issue and probably a breaking change.)
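The idea in the aside could be sketched as follows: instead of always putting vertical boxes before horizontal ones, pick one ordering for the whole page based on which orientation is in the majority. The Box class, its vertical flag, and the tie-breaking rule (ties go to horizontal) are assumptions for illustration; the per-orientation keys are reused from the quoted fallback code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Box:
    # Hypothetical stand-in for a text box; `vertical` plays the role of
    # isinstance(box, LTTextBoxVertical) in the quoted code.
    name: str
    x0: float
    y0: float
    x1: float
    y1: float
    vertical: bool = False


def order_by_majority(boxes: List[Box]) -> List[Box]:
    # Count orientations and choose a single scheme for every box on the page.
    n_vertical = sum(1 for b in boxes if b.vertical)
    if n_vertical > len(boxes) - n_vertical:
        def key(b: Box) -> Tuple[float, float]:
            return (-b.x1, b.y0)  # vertical key from the quoted fallback
    else:
        def key(b: Box) -> Tuple[float, float]:
            return (b.y0, b.x0)  # horizontal key from the quoted fallback
    return sorted(boxes, key=key)
```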
Suggestions:
1. The documentation should be updated to explain that this ordering can be disabled.
2. I think it would be nicer if the value for "disabled" were None, rather than any value outside [-1, 1].
3. If the valid range is [-1, 1], perhaps this should be validated, raising an exception for values outside it.
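The second and third suggestions could be combined along these lines: a minimal sketch of a validator that accepts None as "disabled", accepts numbers in [-1.0, 1.0], and raises otherwise. The name validate_boxes_flow is hypothetical and not part of pdfminer.six.

```python
from typing import Optional


def validate_boxes_flow(boxes_flow: Optional[float]) -> Optional[float]:
    """Hypothetical validator: None disables the ordering, otherwise require -1.0 <= value <= 1.0."""
    if boxes_flow is None:
        return None
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(boxes_flow, bool) or not isinstance(boxes_flow, (int, float)):
        raise TypeError(
            f"boxes_flow must be None or a number, got {type(boxes_flow).__name__}"
        )
    # Written as `not (...)` so that NaN, which fails every comparison, is rejected too.
    if not -1.0 <= boxes_flow <= 1.0:
        raise ValueError(
            f"boxes_flow must be within [-1.0, 1.0] or None, got {boxes_flow!r}"
        )
    return float(boxes_flow)
```

With a check like this in place, passing 2 to switch the ordering off would become an error, so None would be the single, explicit way to disable it.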