Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor Objects::recover_stream_length #1304

Open
wants to merge 15 commits into
base: main
Choose a base branch
from
Open

Conversation

m-holger
Copy link
Contributor

@m-holger m-holger commented Nov 1, 2024

Change how the maximum stream length is calculated. For streams not in the xref table, instead of returning 0 length return the maximum length that does not overlap with a known object or xref table. For further detail see #1271.

PR is last 3 commits only, remainder is #1302 and earlier.

Make resolve resolve responsible to try again after xref reconstruction
rather than returning false to indicate failure. Rename to resolve_all.
Remove invalid objects introduced into the object table during xref table
parsing. At the moment, such invalid objects could override valid objects
if valid gen < invalid gen. Once the object table is indexed by object id
only, the invalid object will prevent valid objects with the same id
entering the object table.

The additional test case uses good10.pdf, with the dangling reference
in the trailer replaced with 3 1 R. Prior to this commit, this caused
the page object 3 0 to be masked and replaced with a null object.
Store last used id as a data member. Update when calling next_id /
inserting a new object (which only occurs in make_indirect and
get_for_json). Add new methods initialize to calculate the initial value
for last_id and last_id to query it.

Use initialize in QPDF::fixDanglingReferences, which is redundant.
Refactor make_indirect to use update_table.
Decompose into new methods update_entry and Entry::update.
Allow for the fact that when processing JSON input we cannot determine
whether references are dangling until the whole input has been processed.

We handle this by optimistically storing references in the object table.
When additional gens are encountered for the same object we store them
in the new map unconfirmed_objects. Once processing is complete we clean
up the object table and clear unconfirmed_objects.

New test cases are adapted from manual-qpdf-json.json etc.
Tests use a modified dangling-bad-xref.pdf.
Use start of next object or xref table to put an upper bound on object
length.

Tests reuse existing incremental-1.pdf. incremental-1-bad.pdf is
incremental-1.pdf with whitespace inserted after object 1 0.
Also, use break to terminate loop in Xref_table::read.
Leave input source correctly positioned to find 'endobj' next.
Copy link

codecov bot commented Nov 1, 2024

Codecov Report

Attention: Patch coverage is 92.56966% with 24 lines in your changes missing coverage. Please review.

Project coverage is 93.19%. Comparing base (54cf0e5) to head (6eb5d0d).

Files with missing lines Patch % Lines
libqpdf/QPDF_objects.cc 92.94% 18 Missing ⚠️
libtests/xref.cc 61.53% 5 Missing ⚠️
libqpdf/qpdf/QPDF_objects.hh 95.83% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1304      +/-   ##
==========================================
- Coverage   93.22%   93.19%   -0.03%     
==========================================
  Files         308      309       +1     
  Lines       33795    33933     +138     
  Branches     4117     4152      +35     
==========================================
+ Hits        31504    31625     +121     
- Misses       2291     2308      +17     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@m-holger m-holger force-pushed the recover branch 2 times, most recently from 09366bd to 640676a Compare November 1, 2024 16:22
Change how the maximum stream length is calculated. For streams not in the
xref table, instead of returning 0 length return the maximum length that
does not overlap with a known object or xref table.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant