Dangerzone may exclude the last page of a document #560
apyrgio added a commit that referenced this issue (Sep 26, 2023):
Do not read a line from the command output and then check if we are at EOF, because it's possible that the writer immediately exited after writing the last line of output. Instead, switch the order of actions. This is a very serious bug that can lead to Dangerzone excluding the last page of the document. It should have bit us right from the start (see aeeed41), but it seems that the small period of time it takes the kernel to close the file descriptors was hiding this bug. Fixes #560
apyrgio added a commit that referenced this issue (Sep 27, 2023):
Add a sanity check at the end of the conversion from doc to pixels, to ensure that the resulting document will have the same number of pages as the original one. Refs #560
apyrgio added three further commits that referenced this issue (Sep 28, 2023). Their messages repeat the two commit messages above verbatim: one copy of the EOF-ordering fix (Fixes #560) and two copies of the page-count sanity check (Refs #560).
Background
Dangerzone uses a tool called pdftoppm to convert a PDF document into pixels:
dangerzone/dangerzone/conversion/doc_to_pixels.py
Lines 313 to 319 in 18b73d9
The way we use pdftoppm is the following:
1. pdftoppm will write PPM files (one per page).
2. We ask pdftoppm to report progress in stderr. Progress reports are lines with the following format: <page> <num_pages> <file>
3. We attach a callback handler to the output of pdftoppm, which will be called with each line of the progress report as an argument.
4. The callback handler creates the necessary files for each page under /tmp/dangerzone, which is the directory that the second phase will use to convert pixels to PDF.
Bug 🐞
Let's dig deeper into how the callback handler is called. We have a generic function that reads lines from a command's output:
dangerzone/dangerzone/conversion/common.py
Lines 66 to 74 in 18b73d9
For each line, it appends it to a buffer, and then calls a callback function with the line as an argument.
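To make the callback side concrete, here is a minimal sketch of a handler for progress lines of the form <page> <num_pages> <file>. The names Progress, parse_progress and PageTracker are hypothetical and are not taken from the Dangerzone code. The sketch only records which pages were reported, so that a page-count check can run once the stream is done:

```python
from typing import List, NamedTuple, Optional


class Progress(NamedTuple):
    page: int
    num_pages: int
    path: str


def parse_progress(line: bytes) -> Progress:
    """Parse one '<page> <num_pages> <file>' progress line."""
    # maxsplit=2 keeps any spaces inside the file path intact.
    page, num_pages, path = line.decode().rstrip("\n").split(" ", 2)
    return Progress(int(page), int(num_pages), path)


class PageTracker:
    """Hypothetical callback: remembers every page that was reported."""

    def __init__(self) -> None:
        self.seen: List[int] = []
        self.expected: Optional[int] = None

    def __call__(self, line: bytes) -> None:
        progress = parse_progress(line)
        self.expected = progress.num_pages
        self.seen.append(progress.page)

    def is_complete(self) -> bool:
        # True only if a callback fired for every page of the document.
        return self.expected is not None and len(self.seen) == self.expected
```

If the reader loop skips the final progress line, is_complete() stays False. That is the kind of mismatch a page-count sanity check at the end of the conversion is meant to catch.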
What's the bug here? We first read the line, and then check if we reached EOF 🤦. So, it's possible that we read the last line of the stream, and then immediately discard it, because we have indeed reached EOF. This means that the callback handler will not be called, and we will not create the necessary files under /tmp/dangerzone for the last page.
Impact
This bug was introduced in version 0.4.1 (aeeed41). Users of affected versions may have documents with the last page missing. This bug should not have any impact on the security of the sanitization.
During our QA we never stumbled upon this bug, and we don't have any reports from our users hinting at such an issue. I only managed to trigger it today, while working on something that made the callback handler run twice as slow.
If you have been affected though, please let us know. In any case, we will fix this issue ASAP.
Remediation
Change the order of the checks: first check if we are at EOF, and then read the line. Note that we can't check the output of readline() for EOF (i.e., if line == ""), because it will detect empty lines as EOF.
The fix for this bug will be included in the upcoming 0.5.0 release.
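The difference between the two orderings can be reproduced in isolation. The sketch below is not the Dangerzone code (the referenced lines are not shown on this page). It assumes the reader is an asyncio.StreamReader, and the function names and PPM file names are made up for illustration. Feeding all data plus EOF before reading simulates a writer that exits immediately after its last line:

```python
import asyncio
from typing import Awaitable, Callable, List

Callback = Callable[[bytes], None]


async def read_stream_buggy(sr: asyncio.StreamReader, callback: Callback) -> bytes:
    """Read first, check EOF second."""
    buf = b""
    while True:
        line = await sr.readline()
        if sr.at_eof():
            # The buffer is drained and the writer is gone, so the line
            # we just read (the last real one) is dropped here.
            break
        callback(line)
        buf += line
    return buf


async def read_stream_fixed(sr: asyncio.StreamReader, callback: Callback) -> bytes:
    """Check EOF first, read second."""
    buf = b""
    while not sr.at_eof():
        line = await sr.readline()
        callback(line)
        buf += line
    return buf


async def replay(
    reader: Callable[[asyncio.StreamReader, Callback], Awaitable[bytes]],
    data: bytes,
) -> List[bytes]:
    """Feed data and EOF into a fresh StreamReader, then collect callback lines."""
    sr = asyncio.StreamReader()
    sr.feed_data(data)
    sr.feed_eof()  # the writer has already exited when we start reading
    seen: List[bytes] = []
    await reader(sr, seen.append)
    return seen


# Hypothetical progress output for a two-page document.
PROGRESS = b"1 2 page-1.ppm\n2 2 page-2.ppm\n"
buggy_lines = asyncio.run(replay(read_stream_buggy, PROGRESS))
fixed_lines = asyncio.run(replay(read_stream_fixed, PROGRESS))
print(len(buggy_lines), len(fixed_lines))  # prints "1 2"
```

With a live process, the EOF-first loop can still see one final empty read if the writer closes the stream between the at_eof() check and readline(), so a callback used with it should tolerate an empty line.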