Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ordering of grouping textboxes. The first grouping of textboxes should be skipped if there are intermediate textboxes. #335

Merged
merged 6 commits into from
Nov 10, 2019

Conversation

pietermarsman
Copy link
Member

@pietermarsman pietermarsman commented Nov 9, 2019

Description

#315 introduces a bug by replacing 0 by True and 1 by False in the ordening of textbox distances. Consequently, all the text boxes that are checked once (for intermediate textboxes) have priority over those that are never checked. Therefore it merges the same text box over and over again.

@mikkkee, can you have a look at this?

Fixes #334

How Has This Been Tested?

Checklist

  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the README.md and other documentation, or I am sure that this is not necessary
  • I have added a consice human-readable description of the change to CHANGELOG.md
  • I have added docstrings to newly created methods and classes
  • I have optimized the code at least one time after creating the initial version

@mikkkee
Copy link
Contributor

mikkkee commented Nov 10, 2019

After figuring out what has happened, I think this PR is fine. I can confirm the speed up is still there.

The current develop branch failed using my test script in #315 with RecursionError: maximum recursion depth exceeded. Shall we add my 01.pdf test as an unit test?

@pietermarsman pietermarsman added type: bug component: converter Related to any PDFLayoutAnalyzer labels Nov 10, 2019
@pietermarsman pietermarsman merged commit 2bee7d8 into develop Nov 10, 2019
@pietermarsman pietermarsman deleted the fix-layout-merge-order branch November 17, 2019 18:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: converter Related to any PDFLayoutAnalyzer type: bug
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Version pdfminer.six 20191107 incorrectly orders some text
2 participants