Skip to content

Commit

Permalink
Only use xref fallback if PDFNoValidXRef is raised and fallback i…
Browse files Browse the repository at this point in the history
…s True (#684)

* check obj type

* update changelog

* Update CHANGELOG.md

* add changes

* update change

* update changelog

* Use fallback in except clause

* Update changelog.md

Co-authored-by: Pieter Marsman <pietermarsman@gmail.com>
Co-authored-by: Tony Tong <baojia.tong@kensho.com>
  • Loading branch information
3 people authored Feb 1, 2022
1 parent dc530f3 commit 4b138a6
Show file tree
Hide file tree
Showing 2 changed files with 11 additions and 8 deletions.
5 changes: 4 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,10 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
- Add handling of JPXDecode filter to enable extraction of images for some pdfs ([#645](https://github.com/pdfminer/pdfminer.six/pull/645))
- Fix extraction of jbig2 files, which was producing invalid files ([#652](https://github.com/pdfminer/pdfminer.six/pull/653))
- Crash in `pdf2txt.py --boxes-flow=disabled` ([#682](https://github.com/pdfminer/pdfminer.six/pull/682))
- Only use xref fallback if `PDFNoValidXRef` is raised and `fallback` is True ([#684](https://github.com/pdfminer/pdfminer.six/pull/684))

### Changed
- Replace warnings.warn with logging.Logger.warning in line with [recommended use](https://docs.python.org/3/howto/logging.html#when-to-use-logging) ([#673](https://github.com/pdfminer/pdfminer.six/pull/673))

## [20211012]

Expand All @@ -41,7 +45,6 @@ The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
- Support for Python 3.4 and 3.5 ([#522](https://github.com/pdfminer/pdfminer.six/pull/522))
- Unused dependency on `sortedcontainers` package ([#525](https://github.com/pdfminer/pdfminer.six/pull/525))
- Support for non-standard output streams that are not binary ([#523](https://github.com/pdfminer/pdfminer.six/pull/523))
- Replace warnings.warn with logging.Logger.warning in line with [recommended use](https://docs.python.org/3/howto/logging.html#when-to-use-logging) ([#673](https://github.com/pdfminer/pdfminer.six/pull/673))
- Dependency on typing-extensions introduced by [#661](https://github.com/pdfminer/pdfminer.six/pull/661) ([#677](https://github.com/pdfminer/pdfminer.six/pull/677))

## [20201018]
Expand Down
14 changes: 7 additions & 7 deletions pdfminer/pdfdocument.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,7 +11,7 @@
from . import settings
from .arcfour import Arcfour
from .pdfparser import PDFSyntaxError, PDFParser, PDFStreamParser
from .pdftypes import DecipherCallable, PDFException, PDFTypeError, PDFStream,\
from .pdftypes import DecipherCallable, PDFException, PDFTypeError, PDFStream, \
PDFObjectNotFound, decipher_all, int_value, str_value, list_value, \
uint_value, dict_value, stream_value
from .psparser import PSEOF, literal_name, LIT, KWD
Expand Down Expand Up @@ -706,12 +706,12 @@ def __init__(
pos = self.find_xref(parser)
self.read_xref_from(parser, pos, self.xrefs)
except PDFNoValidXRef:
pass # fallback = True
if fallback:
parser.fallback = True
newxref = PDFXRefFallback()
newxref.load(parser)
self.xrefs.append(newxref)
if fallback:
parser.fallback = True
newxref = PDFXRefFallback()
newxref.load(parser)
self.xrefs.append(newxref)

for xref in self.xrefs:
trailer = xref.get_trailer()
if not trailer:
Expand Down

0 comments on commit 4b138a6

Please sign in to comment.