Fix regression in page layout that sometimes returned text lines out of order #659

Merged: 10 commits into pdfminer:develop on Jan 26, 2022

Contributor

@0xabu 0xabu commented Aug 17, 2021

This fixes issue #658: a regression in text line layout (LTLayoutContainer.group_textboxes) that merged lines of text into text boxes out of order. It could merge two lines as if they were adjacent even when a third line lay between them.

PR #315 changed the first element of each distance tuple from an int (0 or 1, where 0 marked initial entries and 1 marked subsequently added groups) to a bool, is_first, where True marked initial entries. That change broke the intended sort order, because 0 < 1 but True > False. The follow-up commit 2bee7d8 fixed the ordering by inverting the meaning of the bool (now correctly called skip_isany), but it missed adding a not where the bool is used in an if expression. The fix is simply to add that not.
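
For anyone following along, here is a minimal sketch of both halves of the problem. It is not the pdfminer.six code: the tuples are cut down to a few fields, and pop_order, first_merge, the blocked set and the line labels are hypothetical names made up for this illustration. The first part shows how the flag type changes the heap order; the second shows how a missing not lets a pair be merged even though something sits between the two elements.

```python
import heapq


def pop_order(entries):
    """Return the labels of `entries` in the order a min-heap pops them."""
    heap = list(entries)
    heapq.heapify(heap)
    order = []
    while heap:
        order.append(heapq.heappop(heap)[-1])
    return order


# Simplified, hypothetical tuples: (flag, distance, label).
int_flag = [(1, 0.5, "later"), (0, 2.0, "initial")]                # 0 = initial
bool_is_first = [(False, 0.5, "later"), (True, 2.0, "initial")]    # True = initial
bool_skip_isany = [(True, 0.5, "later"), (False, 2.0, "initial")]  # False = initial


def first_merge(pairs, blocked, fixed=True):
    """Return the first (a, b) pair that a toy grouping loop would merge.

    pairs:   iterable of (distance, a, b)
    blocked: set of (a, b) pairs that have another element between them
    fixed:   True uses `not skip_isany`; False omits the `not` (the bug)
    """
    heap = [(False, d, a, b) for d, a, b in pairs]
    heapq.heapify(heap)
    while heap:
        skip_isany, d, a, b = heapq.heappop(heap)
        should_check = (not skip_isany) if fixed else skip_isany
        if should_check and (a, b) in blocked:
            # Defer this pair: push it back with the flag set so it
            # sorts after every entry that has not been deferred yet.
            heapq.heappush(heap, (True, d, a, b))
            continue
        return (a, b)
    return None


# Assumed toy data: line2 lies between line1 and line3, yet the
# line1/line3 pair has the smallest distance value.
pairs = [(1.0, "line1", "line3"), (2.0, "line1", "line2"), (3.0, "line2", "line3")]
blocked = {("line1", "line3")}

if __name__ == "__main__":
    print(pop_order(int_flag), pop_order(bool_is_first), pop_order(bool_skip_isany))
    print(first_merge(pairs, blocked, fixed=True))
    print(first_merge(pairs, blocked, fixed=False))
```

In this toy model the int flag and skip_isany both pop the initial entry first, while is_first pops it last. With the not in place the blocked line1/line3 pair is deferred and line1/line2 is merged first; without it the in-between check never runs for fresh entries, so line1/line3 is merged straight away.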

Contributor Author

0xabu commented Aug 17, 2021

@pietermarsman @mikkkee take a look, please.

Member

@jstockwin jstockwin left a comment

This LGTM, and thanks for the detailed description of the change - very helpful!

One small change needed before this will get merged:
Could you add an entry to the CHANGELOG.md to reflect your fix?

@0xabu 0xabu requested a review from jstockwin August 18, 2021 15:37
Member

@jstockwin jstockwin left a comment

LGTM

Contributor Author

0xabu commented Oct 10, 2021

The py36 build failure here is definitely not the fault of this PR. Comparing https://github.com/pdfminer/pdfminer.six/runs/3846904701 (successful build) with https://github.com/pdfminer/pdfminer.six/pull/659/checks?check_run_id=3847247616 (unsuccessful build), the two runs seem to fetch different versions of Python 3.6 and pip, which in turn causes cryptography to try to build Rust code.

@pietermarsman pietermarsman merged commit 95dee8d into pdfminer:develop Jan 26, 2022
@pietermarsman
Member

Thanks again!

@0xabu 0xabu deleted the issue658 branch January 26, 2022 18:56
Beants added a commit to HiTalentAlgorithms/pdfminer.six that referenced this pull request Feb 14, 2022
* develop:
  Check blackness in github actions (pdfminer#711)
  Changed `log.info` to  `log.debug` in six files (pdfminer#690)
  Update README.md batch for Continuous integration
  Update actions.yml so that it will run for all PR's
  Update development tools: travis ci to github actions, tox to nox, nose to pytest (pdfminer#704)
  Added feature: page labels (pdfminer#680)
  Remove obsolete returns (pdfminer#707)
  Revert "Remove obsolete returns"
  Remove obsolete returns
  Only use xref fallback if `PDFNoValidXRef` is raised and `fallback` is True (pdfminer#684)
  Use logger.warn instead of warnings.warn if warning cannot be prevented by user (pdfminer#673)
  Change log.info into log.debug to make pdfinterp.py less verbose
  Fix regression in page layout that sometimes returned text lines out of order (pdfminer#659)
  export type annotations in package (pdfminer#679)
  fix typos in PR template (pdfminer#681)
  pdf2txt: clean up construction of LAParams from arguments (pdfminer#682)
  Fixes jbig2 writer to write valid jb2 files
  Add support for JPEG2000 image encoding
  Added test case for CCITTFaxDecoder (pdfminer#700)
  Attempt to handle decompression error on some broken PDF files (pdfminer#637)