Added support for Paeth PNG filter compression (predictor value = 4) #537

edugonza · 2020-10-27T11:30:44Z

Pull request

Fixes #339. Adds support for the Paeth PNG compression filter (predictor value = 4) and fixes bugs in the other filters.

How Has This Been Tested?

I have tested this with a private PDF file that I am not allowed to share.

Checklist

I have added tests that prove my fix is effective or that my feature works
I have added docstrings to newly created methods and classes
I have optimized the code at least one time after creating the initial version
I have updated the README.md or I verified that this is not necessary
I have updated the readthedocs documentation or I verified that this is not necessary
I have added a concise human-readable description of the change to CHANGELOG.md

cslotboom · 2020-11-26T00:00:59Z

Kind of random, but I was having issues with reading certain PDFs. Implementing your code in this patch fixed my problem completely and, as far as I can tell, I've had no errors since including it. Looking forward to this being integrated into the base package!

pietermarsman

Hi @edugonza,

Thanks for opening this PR and taking the time to improve pdfminer.six.

The code of the PR looks good. I did not check if the actual implementation of the type 4 predictor matches the libpng specification. But since you added the link to the spec yourself I think it's ok.

One request: can you add a test case with a PDF that uses this type of Paeth predictor? I know yours is proprietary, but perhaps you can find an example on the internet or even generate one.

pietermarsman · 2021-02-13T13:50:32Z

@cslotboom is it possible to share your PDF with a type 4 PNG Paeth predictor?

edugonza · 2021-02-22T15:58:02Z

Hi @pietermarsman,

Unfortunately, I was not able to create a PDF with a PNG encoded with this type of predictor. I do not have any open/free document to use. If you know or have documentation on how to create such a document, I could try again.

Best regards
Edu

cslotboom · 2021-02-23T09:32:22Z

I'm not 100% sure I have an example that uses this type of predictor explicitly. I just know that this code has fixed my errors.

I'm using pdfminer to read lines in PDF drawings. From what I've found, any PDF with lines in it has a risk of not working without this patch. I suspect that the scale/rounding of the PDF had something to do with it, as it seemed random which PDFs worked and which didn't.
I can do my best to dig something up this weekend.

fgregg · 2021-07-29T12:45:40Z

this also fixed issues for me. thanks @edugonza!

cslotboom · 2021-08-03T03:38:09Z

For reference, this is roughly the type of PDF that would fail without this patch. I cannot say for certain whether it uses a type 4 PNG Paeth predictor.
TestFloor.pdf

pietermarsman · 2021-08-15T16:08:58Z

@cslotboom The `TestFloor.pdf` also works with the development branch of pdfminer.six.

The `2018.pdf` shared in the linked issue (I attached it here again) does indeed show that this PR fixes the issue.

2018.pdf

pietermarsman · 2021-08-15T16:20:47Z

@edugonza Thanks for your work!

Do you have some references for the changes to predictor values of 1 and 3? This changes current behavior and I'm always very careful with that, but it also looks like it is an improvement. Anyway, I would like a comment that explains what the algorithm is based on because (I think) it is not part of the PNG specification you already linked. Let me know if you have something.
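For context, and assuming the values above refer to PNG filter types (1 is Sub and 3 is Average in the PNG specification), the sketch below shows how Sub and, for comparison, Up (type 2) are undone for one row. It is not the code from this PR: the names `unfilter_sub` and `unfilter_up` are hypothetical, and whole-byte pixels with `bpp` bytes per pixel are assumed.

```python
def unfilter_sub(filtered: bytes, bpp: int) -> bytes:
    """Undo the PNG Sub filter for one row (hypothetical helper).

    Each byte is reconstructed from the already reconstructed byte
    `bpp` positions to the left, which counts as 0 at the row start.
    """
    raw = bytearray()
    for i, x in enumerate(filtered):
        left = raw[i - bpp] if i >= bpp else 0
        raw.append((x + left) & 255)
    return bytes(raw)


def unfilter_up(filtered: bytes, prior: bytes) -> bytes:
    """Undo the PNG Up filter for one row (hypothetical helper).

    `prior` is the reconstructed previous row; for the first row it
    should be all zeros.
    """
    return bytes((x + above) & 255 for x, above in zip(filtered, prior))
```

The point to check in a review is that `left` comes from the reconstructed row, not from the filtered input, and that the offset is `bpp` rather than a fixed 1.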

edugonza · 2021-08-16T13:37:56Z

Hi @pietermarsman ,

If I remember correctly, the PNG specification in the link describes how to apply the filter to raw data. However, the piece of code at hand performs the inverse operation: it decodes the filtered data back to raw. That is why the code does not apply the filter as written in the specification, but undoes what the specification describes.
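To make the distinction concrete, here is a minimal sketch (not the code from this PR) of the Paeth filter as the PNG specification describes it, next to the reconstruction step a decoder has to perform. The names `paeth_predictor`, `paeth_filter` and `paeth_unfilter` are hypothetical, and whole-byte pixels with `bpp` bytes per pixel are assumed.

```python
def paeth_predictor(left: int, above: int, upper_left: int) -> int:
    """Return whichever neighbour is closest to left + above - upper_left."""
    p = left + above - upper_left
    pa = abs(p - left)
    pb = abs(p - above)
    pc = abs(p - upper_left)
    # Ties are broken in the order left, above, upper_left.
    if pa <= pb and pa <= pc:
        return left
    if pb <= pc:
        return above
    return upper_left


def paeth_filter(raw: bytes, prior: bytes, bpp: int) -> bytes:
    """Encoder side: subtract the prediction from each raw byte."""
    out = bytearray()
    for i, x in enumerate(raw):
        left = raw[i - bpp] if i >= bpp else 0
        upper_left = prior[i - bpp] if i >= bpp else 0
        out.append((x - paeth_predictor(left, prior[i], upper_left)) & 255)
    return bytes(out)


def paeth_unfilter(filtered: bytes, prior: bytes, bpp: int) -> bytes:
    """Decoder side: add the prediction back, using reconstructed bytes."""
    raw = bytearray()
    for i, x in enumerate(filtered):
        left = raw[i - bpp] if i >= bpp else 0
        upper_left = prior[i - bpp] if i >= bpp else 0
        raw.append((x + paeth_predictor(left, prior[i], upper_left)) & 255)
    return bytes(raw)
```

Under these assumptions, `paeth_unfilter(paeth_filter(row, prior, bpp), prior, bpp)` returns `row` again. The only difference between the two directions is the sign and the fact that the decoder reads its left neighbour from the bytes it has already reconstructed.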

I hope that explains the differences.
KR
Edu

pietermarsman · 2021-08-23T19:04:02Z

pdfminer/utils.py

+            for j, r in enumerate(line1):
+                left = int(line2[j-bpp]) if j-bpp >= 0 else 0
+                up = int(line0[j])
+                c = ((r + math.floor(left + up)) // 2) & 255


This line should be:

c = (r + math.floor(left + up) // 2) & 255

Only the sum of left and up should be halved; r is added to that average afterwards. This seems to be wrong in the current implementation as well.

I fixed it in the latest commit.
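The effect of the misplaced parenthesis can be checked with a small sketch. The function names below are hypothetical and not part of pdfminer.six; `apply_average` follows the PNG specification's definition of the Average filter, and the two `undo_*` functions mirror the two expressions discussed above (with `math.floor` dropped, since it is a no-op on integers).

```python
def apply_average(raw: int, left: int, above: int) -> int:
    """Encoder side of the PNG Average filter for a single byte."""
    return (raw - (left + above) // 2) & 255


def undo_average_buggy(filtered: int, left: int, above: int) -> int:
    """Halves the filtered byte together with the neighbours (wrong)."""
    return ((filtered + (left + above)) // 2) & 255


def undo_average(filtered: int, left: int, above: int) -> int:
    """Halves only the neighbour sum, then adds the filtered byte."""
    return (filtered + (left + above) // 2) & 255


# With left=20 and above=30 the neighbour average is 25, so a raw
# byte of 35 is stored as 10.
filtered = apply_average(35, 20, 30)
print(filtered, undo_average(filtered, 20, 30), undo_average_buggy(filtered, 20, 30))
```

In this example the corrected expression recovers 35, while the original expression yields 30.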

pietermarsman · 2021-08-23T19:22:48Z

@edugonza I went over all the filter types, refactored the names for better understanding and now I'm pretty confident that everything is according to the docs.

I want to check one last time before I merge. If you have time, can you validate/check/compare the current implementation against the docs as well?

I would like to get this right in one go :)

pietermarsman · 2021-08-26T18:53:34Z

@edugonza thanks for all the work!

edugonza · 2021-08-31T08:29:51Z

Hi @pietermarsman,

Sorry for my late reply. Thank you for all the fixes and corrections.

edugonza and others added 2 commits October 27, 2020 12:27

Added support for Paeth PNG filter compression (predictor value = 4)

dd3b827

Merge branch 'develop' into develop

4240c1d

edugonza mentioned this pull request Feb 9, 2021

Unsupported predictor value #339

Closed

pietermarsman reviewed Feb 13, 2021

Use above and upper_left as in the pseudo code

c90b5fa

pietermarsman reviewed Aug 23, 2021

Refactor: use variable names that are very close to the pseudo code and add pieces of the docs to show what is going on.

8716e56

pietermarsman added 6 commits August 23, 2021 21:27

Fix line length issues

779121c

Add line about compressions to README.md

f3e28a1

Fix merge conflict on readme

47e9ea6

Merge branch 'develop' into develop

5d26851

Fix bug in filter type Up

3909072

Make if-else consistent

0f6f968

pietermarsman merged commit ea00f56 into pdfminer:develop Aug 26, 2021
