Fix text coming in reverse order with boxes flow disabled #399

Merged
pietermarsman merged 1 commit into pdfminer:develop from fix-reversed-text
Apr 1, 2020

jstockwin
Member

Closes #398

I've got a TODO in the code to change after #396 is merged, so I'll keep this as a draft PR for now.

Example PDF

Code to reproduce:

from pdfminer.high_level import extract_text
from pdfminer.layout import LAParams

laparams = LAParams(boxes_flow=3, line_margin=0.001)
print(extract_text("test.pdf", laparams=laparams))

(Note that once #396 is merged we'll be able to pass boxes_flow=None; I'm passing 3 here simply to disable it.)

Output before fix:

Text 4
Text 3
Long Text 1
Text 2
Text 1

Output after fix:

Text 1
Text 2
Long Text 1
Text 3
Text 4

Checklist

  • I have added tests that prove my fix is effective or that my feature
    works
  • I have added docstrings to newly created methods and classes
  • I have optimized the code at least one time after creating the initial
    version
  • I have updated the README.md or I have verified that this
    is not necessary
  • I have updated the readthedocs documentation or I have
    verified that this is not necessary
  • I have added a concise human-readable description of the change to
    CHANGELOG.md

Member

@pietermarsman pietermarsman left a comment

From the docs, section 4.2.1. Coordinate Space:

The positive x axis extends horizontally to the right and the positive y axis vertically upward, as in standard mathematical practice
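
To illustrate why this matters for reading order, here is a minimal sketch. It is not the code changed in this PR: the `Box` type, the `reading_order` helper, and the coordinates are all hypothetical. Because y grows upward, the topmost box has the largest y value, so top-to-bottom order means sorting by descending y; an ascending sort yields the reversed output shown above.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Hypothetical stand-in for a text box; not a pdfminer class."""
    text: str
    x0: float  # left edge
    y1: float  # top edge; a larger y is higher on the page


def reading_order(boxes: List[Box]) -> List[Box]:
    # y grows upward, so the topmost box has the largest y1.
    # Sort by descending y1, then left to right for boxes at the same height.
    return sorted(boxes, key=lambda b: (-b.y1, b.x0))


# Made-up coordinates: "Text 1" is at the top of the page.
boxes = [
    Box("Text 4", 50, 100),
    Box("Text 3", 50, 200),
    Box("Long Text 1", 50, 300),
    Box("Text 2", 50, 400),
    Box("Text 1", 50, 500),
]

ordered = [b.text for b in reading_order(boxes)]
# A naive ascending sort on y1 gives the bottom-to-top (reversed) order.
ascending = [b.text for b in sorted(boxes, key=lambda b: b.y1)]
print("\n".join(ordered))
```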

Let's hold off on merging this until #396 is merged.

@jstockwin jstockwin marked this pull request as ready for review March 27, 2020 08:55
@jstockwin
Member Author

@pietermarsman I've rebased this now that #396 is merged, so it should be good to go.

@pietermarsman pietermarsman merged commit 68e2ae8 into pdfminer:develop Apr 1, 2020
@jstockwin jstockwin deleted the fix-reversed-text branch April 1, 2020 11:54
Successfully merging this pull request may close these issues.

Disabling boxes_flow results in text coming in reverse order
2 participants