
Releases: pdfminer/pdfminer.six

20200517

17 May 15:56

Added

  • Python3 shebang line to script in tools (#408)

Fixed

  • Fix ordering of textlines within a textbox when boxes_flow=None (#411)

20200402

01 Apr 19:45

Added

  • Allow boxes_flow LAParam to be passed as None, validate the input, and update documentation (#395)
  • Also accept file-like objects in high level functions extract_text and extract_pages (#392)
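
The two additions above can be combined in a short sketch. It assumes pdfminer.six 20200402 or later is installed; the helper name extract_text_from_bytes is hypothetical and not part of the library. It passes an in-memory file-like object to extract_text and sets boxes_flow to None through LAParams.

```python
import io


def extract_text_from_bytes(pdf_bytes, boxes_flow=None):
    """Hypothetical helper: extract text from PDF data held in memory."""
    # Imported here so that defining the helper does not require
    # pdfminer.six; calling it does.
    from pdfminer.high_level import extract_text
    from pdfminer.layout import LAParams

    # boxes_flow=None is accepted as of release 20200402 (#395).
    laparams = LAParams(boxes_flow=boxes_flow)

    # A file-like object is accepted in place of a file path (#392).
    return extract_text(io.BytesIO(pdf_bytes), laparams=laparams)
```

A caller would read the PDF into bytes first, for example with open("example.pdf", "rb"), where example.pdf is a placeholder file name.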

Fixed

  • Text no longer comes in reverse order when advanced layout analysis is disabled (#398)
  • Updated misleading documentation for word_margin and char_margin (#407)
  • Ignore ValueError when converting font encoding differences (#389)
  • Grouping of text lines outside of parent container bounding box (#386)

Changed

  • Group text lines if they are centered (#382)

20200124

24 Jan 11:42

Security

  • Removed samples/issue-00152-embedded-pdf.pdf because it contains a possible security threat: a JavaScript-enabled object (#364)

20200121

21 Jan 21:11

Fixed

  • Interpret two's complement integer as unsigned integer (#352)
  • Fix font name in html output such that it is recognized by browser (#357)
  • Compute correct font height by removing scaling with font bounding box height (#348)
  • KeyError when extracting embedded files and a Unicode file specification is missing (#338)
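
As a generic illustration of the first fix in this list, the sketch below reinterprets a signed (two's complement) integer as an unsigned one. It is not the library's code; the function name as_unsigned and the 32-bit default width are assumptions made for the example.

```python
import struct


def as_unsigned(value, bits=32):
    """Reinterpret a two's complement integer as an unsigned integer."""
    # Masking keeps the low `bits` bits, which maps negative values
    # onto their unsigned equivalents and leaves others unchanged.
    return value & ((1 << bits) - 1)


# The same reinterpretation for 32-bit values using struct:
# pack as signed ('i'), unpack as unsigned ('I').
def as_unsigned32_struct(value):
    return struct.unpack("<I", struct.pack("<i", value))[0]
```

For example, as_unsigned(-4) gives 4294967292, and non-negative inputs are returned unchanged.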

Removed

  • The command-line utility latin2ascii.py (#360)

20200104

16 Jan 21:57

Removed

  • Support for Python 2 (#346)

Changed

  • Enforce pep8 coding style by adding flake8 to CI (#345)

20191110

16 Jan 21:56

Fixed

  • Wrong order of text box grouping introduced by PR #315 (#335)

20191107

16 Jan 21:54

Deprecated

  • The argument _py2_no_more_posargs because Python 2 support is removed in January 2020 (#328 and #307)

Added

  • Simple wrapper to easily extract text from a PDF file (#330)
  • Support for extracting JBIG2 encoded images (#311 and #46)
  • Sphinx documentation that is published on Read the Docs (#329)

Fixed

  • Unhandled AssertionError when dumping pdf containing reference to object id 0 (#318)
  • Debug flag actually changes logging level to debug for pdf2txt.py and dumppdf.py (#325)

Changed

  • Using argparse instead of getopt for command line interface of dumppdf.py (#321)
  • Refactor LTLayoutContainer.group_textboxes for a significant speed up in layout analysis (#315)

Removed

  • Files for external applications such as django, cgi and pyinstaller (#314)

20191020

20 Oct 13:02
a5a34d5

Deprecated

  • Support for Python 2 will be dropped on January 1st, 2020 (#307)

Added

  • Contribution guidelines in CONTRIBUTING.md (#259)
  • Support new encodings OneByteEncoding and DLIdent for CMaps (#283)

Fixed

  • Use six.iteritems() instead of dict().iteritems() to ensure Python2 and Python3 compatibility (#274)
  • Properly convert Adobe Glyph names to unicode characters (#263)
  • Allow CMap to be a content stream (#283)
  • Resolve indirect objects for width and bounding boxes for fonts (#273)
  • Actually updating stroke color in graphic state (#298)
  • Interpret (invalid) negative font descent as a positive descent (#203)
  • Correct colorspace comparison for images (#132)
  • Allow for bounding boxes with zero height or width by removing assertion (#246)

Changed

  • All dependencies are managed in setup.py (#306, #219)

20181108

08 Nov 17:50

Changed

  • Speedup layout analysis (#141)
  • Use argparse instead of the deprecated getopt (#173)
  • Allow pdfminer.six to be compiled with cython (#142)