ENH: Text Extraction improvements (#969)
* Improvements around /Encoding / /ToUnicode
* Extraction of CMaps improved
* Fallback for font def missing
* Support for /Identity-H and /Identity-V: utf-16-be
* Support for /GB-EUC-H / /GB-EUC-V / /GBpc-EUC-H / /GBpc-EUC-V (beta release for evaluation)
* Arabic (for evaluation)
* Whitespace extraction improvements
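
As a rough illustration of the /Identity-H and /Identity-V item above, the sketch below decodes a run of two-byte character codes as UTF-16-BE using only the Python standard library. It is not the PyPDF2 implementation; the helper name `decode_identity_text` and the sample bytes are hypothetical and exist only for this example.

```python
def decode_identity_text(raw: bytes) -> str:
    """Decode two-byte character codes as UTF-16-BE.

    Hypothetical helper for illustration; not part of PyPDF2.
    """
    return raw.decode("utf-16-be")


# Two big-endian codes, 0x0048 and 0x0069 (assumed sample data).
sample = bytes.fromhex("00480069")
print(decode_identity_text(sample))  # prints "Hi"
```

Treating the codes as UTF-16-BE only yields sensible text when the font's character codes happen to match Unicode code points, which is presumably why the commit lists it alongside the /ToUnicode improvements rather than as a replacement for them.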
pubpub-zz authored Jun 13, 2022
1 parent 9c4e7f5 commit 72fcaae
Showing 12 changed files with 2,720 additions and 1,980 deletions.
2 changes: 1 addition & 1 deletion PyPDF2/__init__.py
@@ -1,5 +1,5 @@
 from ._merger import PdfFileMerger, PdfMerger
-from ._page import Transformation, PageObject
+from ._page import PageObject, Transformation
 from ._reader import DocumentInformation, PdfFileReader, PdfReader
 from ._version import __version__
 from ._writer import PdfFileWriter, PdfWriter