Implement missing cm command #720
It seems that I will have to test it with the documents from the unit tests and see why the result differs. I will do that when I find the time.
So the problem was that I also needed to implement the q and Q operators, which save the graphics state to the stack and restore it from the stack. The values are getting closer, but because of floating-point inaccuracy they still differ:
The question is how I should handle this:
Can anyone give me some advice here?
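To illustrate why q and Q matter alongside cm, here is a minimal sketch, in Python rather than pdfparser's PHP, of a graphics state that tracks only the current transformation matrix (CTM). The names `GraphicsState`, `save`, `restore` and `concat` are hypothetical and not pdfparser's API; the concatenation order CTM' = M × CTM follows the PDF specification's description of cm.

```python
from typing import List, Tuple

# A PDF matrix [a b c d e f] stands for the 3x3 affine matrix
# | a b 0 |
# | c d 0 |
# | e f 1 |
Matrix = Tuple[float, float, float, float, float, float]

IDENTITY: Matrix = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)


def multiply(m1: Matrix, m2: Matrix) -> Matrix:
    """Return the product m1 x m2 of two PDF affine matrices."""
    a1, b1, c1, d1, e1, f1 = m1
    a2, b2, c2, d2, e2, f2 = m2
    return (
        a1 * a2 + b1 * c2,
        a1 * b2 + b1 * d2,
        c1 * a2 + d1 * c2,
        c1 * b2 + d1 * d2,
        e1 * a2 + f1 * c2 + e2,
        e1 * b2 + f1 * d2 + f2,
    )


class GraphicsState:
    """Hypothetical minimal graphics state that only tracks the CTM."""

    def __init__(self) -> None:
        self.ctm: Matrix = IDENTITY
        self._stack: List[Matrix] = []

    def save(self) -> None:
        """q operator: push the current state onto the stack."""
        self._stack.append(self.ctm)

    def restore(self) -> None:
        """Q operator: pop the most recently saved state."""
        if not self._stack:
            raise ValueError("Q without matching q")
        self.ctm = self._stack.pop()

    def concat(self, a: float, b: float, c: float, d: float, e: float, f: float) -> None:
        """cm operator: premultiply the new matrix onto the CTM (CTM' = M x CTM)."""
        self.ctm = multiply((a, b, c, d, e, f), self.ctm)


gs = GraphicsState()
gs.save()                          # q
gs.concat(0.75, 0, 0, 0.75, 0, 0)  # 0.75 0 0 0.75 0 0 cm
gs.concat(1, 0, 0, 1, 10, 20)      # 1 0 0 1 10 20 cm
print(gs.ctm)                      # translation is scaled: e = 7.5, f = 15.0
gs.restore()                       # Q
print(gs.ctm)                      # back to the identity matrix
```

Without the q/Q stack, every cm after a Q would be concatenated onto a matrix that should already have been discarded, which is one way positions can drift away from the expected values.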
@DominikDostal Thank you for your effort. I don't have much time right now, but I will try to get back to you soon. Please provide reference(s) to the PDF specification you are referring to or on which your code is based.
I am not sure I got your point. At first glance, having more precise float values instead of less precise strings seems like an improvement. Rounding should be avoided, though, because it may interfere with or alter people's results.
I based it on https://opensource.adobe.com/dc-acrobat-sdk-docs/pdfstandards/PDF32000_2008.pdf Chapter 8.4.4 Graphics State Operators.
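For reference, a short summary of how the cm operator is defined, as I read the specification: the six operands `a b c d e f cm` form a matrix M that is premultiplied onto the current transformation matrix, and points are treated as row vectors.

```latex
M = \begin{bmatrix} a & b & 0 \\ c & d & 0 \\ e & f & 1 \end{bmatrix},
\qquad
\mathrm{CTM}' = M \times \mathrm{CTM},
\qquad
\begin{bmatrix} x' & y' & 1 \end{bmatrix}
  = \begin{bmatrix} x & y & 1 \end{bmatrix} \times \mathrm{CTM}
```

For a single matrix this gives x' = a x + c y + e and y' = b x + d y + f.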
I guess the document used for some of the PageTests is a special case:
When you multiply them, the result is:
That's why I'm not sure whether not rounding is a good approach: it looks to me like the value was supposed to be a repeating decimal but was cut short by rounding. By the way, this is what the calculation in the first part of the document looks like:
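As a generic illustration of the rounding effect described in this comment (not the calculation from the test document, whose values are not reproduced here), the following Python sketch shows how a repeating decimal that a PDF producer truncated to six places no longer multiplies back to a round number. The numbers are made up for the example.

```python
# Illustrative values only; they are not taken from the test document.
exact = 1 / 3             # a repeating decimal: 0.3333...
stored = round(exact, 6)  # what a PDF producer might write: 0.333333

scale = 3.0
from_exact = exact * scale    # 1.0
from_stored = stored * scale  # close to, but not exactly, 1.0

print(from_exact, from_stored)
# Rounding the product hides the truncation, but it would also discard
# precision that may be genuine in other documents.
print(round(from_stored, 4))
```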
I already went through all of this with the PDFObject parsing to account for …
I really wonder whether in the future we could add a global document stream parser that generates the data required by both functions/objects and thus makes both of them obsolete.
Oh, I see: pdfparser/src/Smalot/PdfParser/PDFObject.php, line 769 (at commit a3e213d).
You're right, of course, and I should add that. But on my end, I didn't have to worry about also storing the exact positioning matrix along with each bit of text. It only needed to know whether the text moved far enough to warrant adding a line break. :)
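The simpler requirement described here (only detecting whether the text moved far enough for a line break) could look roughly like the following Python sketch. The function name, the use of the font size as the reference, and the 0.5 factor are all assumptions for illustration, not the actual logic in PDFObject.php.

```python
def needs_line_break(prev_y: float, curr_y: float, font_size: float, factor: float = 0.5) -> bool:
    """Hypothetical check: treat a vertical jump larger than a fraction of
    the font size as a new line. The factor is an assumed heuristic."""
    return abs(curr_y - prev_y) > font_size * factor


print(needs_line_break(700.0, 686.0, 12.0))  # moved a full line down
print(needs_line_break(700.0, 699.5, 12.0))  # tiny shift, same line
```

A check like this only compares relative movement, so small floating-point errors in the matrix rarely change its outcome, unlike reporting exact positions.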
@DominikDostal I am not sure where we stand here. Please tell me which points are still open for discussion and what you need to finalize the PR.
Well, in general I would keep as much of the provided data as possible. Sure, in this case rounding seems obvious, but it may have different ramifications in other cases. Hence my statement about keeping the raw data: developers can round the values themselves if they want. Can you provide a test that fails without your changes?
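Keeping raw values in the parser and rounding on the consumer side could look like this minimal Python sketch. The matrix values are invented for illustration and are not output from pdfparser.

```python
# Hypothetical raw matrix as a parser might return it (values are made up).
raw_matrix = [0.7500000001, 0.0, 0.0, 0.7499999999, 59.1549999, 500.40000001]

# The caller, not the parser, decides how much precision it needs.
display = [round(v, 2) for v in raw_matrix]
print(display)
```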
@DominikDostal Please give us an update on how you want to proceed here, and cover the remaining points.
@k00ni I'm sorry for not being active here. I was sick in the meantime and had to catch up with work (and then forgot 😞). I pushed a commit that adds a new document to test cm commands, but I also had to fix some existing tests, since we now get more decimal places than before. I wasn't sure whether rounding the results or changing the expected results was better. I also had to add an error margin of 0.01 to the …
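An error margin like the 0.01 mentioned here compares each matrix entry against an absolute tolerance instead of requiring exact equality. PHPUnit provides `assertEqualsWithDelta()` for this kind of check; the Python sketch below shows the same idea with `math.isclose`. The function name `matrices_close` is hypothetical.

```python
import math
from typing import Sequence


def matrices_close(expected: Sequence[float], actual: Sequence[float], tol: float = 0.01) -> bool:
    """Compare two matrices entry by entry with an absolute tolerance."""
    if len(expected) != len(actual):
        return False
    return all(
        math.isclose(e, a, rel_tol=0.0, abs_tol=tol)
        for e, a in zip(expected, actual)
    )


print(matrices_close([0.75, 0, 0, 0.75, 59.16, 500.4],
                     [0.75, 0, 0, 0.75, 59.155, 500.405]))
```

Setting `rel_tol=0.0` makes the margin purely absolute, which matches a fixed "0.01 units" rule regardless of how large the coordinate is.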
Branch updated from 111e945 to 1e51b55.
Can I assume that your new test fails when using a prior version without your changes?
Yes, the results will be "slightly" different:
There was 1 failure:
1) PHPUnitTests\Integration\PageTest::testCmCommandInPdfs
Failed asserting that two arrays are equal.
--- Expected
+++ Actual
@@ @@
Array (
- 0 => '0.75'
- 1 => '0.0'
- 2 => '0.0'
- 3 => '0.75'
- 4 => '59.16'
- 5 => '500.4'
+ 0 => 1.0
+ 1 => 0.0
+ 2 => 0.0
+ 3 => -1.0
+ 4 => 78.88
+ 5 => 126.56
)
D:\source\pdfparser\tests\PHPUnit\Integration\PageTest.php:957
About
When a landscape Word document is printed as a PDF, the file is created with a cm (concatenate matrix) operator, which modifies the current transformation matrix. pdfparser currently ignores this operator completely.
This PR aims to add support for it.
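Conceptually, honouring cm means mapping coordinates through the current transformation matrix before reporting them. A minimal Python sketch, assuming the matrix is given as the six cm operands `[a, b, c, d, e, f]`; the function name `transform_point` is hypothetical and not part of pdfparser.

```python
from typing import Sequence, Tuple


def transform_point(ctm: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    """Map a user-space point through a PDF matrix [a b c d e f]:
    x' = a*x + c*y + e, y' = b*x + d*y + f."""
    a, b, c, d, e, f = ctm
    return (a * x + c * y + e, b * x + d * y + f)


# Uniform 0.75 scale: positions shrink toward the origin.
print(transform_point([0.75, 0, 0, 0.75, 0, 0], 100, 200))
# 90-degree rotation: the x axis is mapped onto the y axis.
print(transform_point([0, 1, -1, 0, 0, 0], 10, 0))
```

A rotation like the second example is the kind of matrix a landscape page can introduce; ignoring it reports text coordinates in the wrong orientation.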
I tested this with this document:
This is just a test - Printed As.pdf
With this code:
Before my change, I got these results for the text:
After my change, I get the following result:
If I open the PDF in Adobe Acrobat Reader and measure the distances:
Converted to PDF units:
~20.02 mm * 2.83465 = ~56.75
~176.06 mm * 2.83465 = ~499
(The deviation is because I didn't measure precisely enough in Adobe Acrobat Reader.)
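The factor 2.83465 comes from the default PDF unit of 1/72 inch combined with 25.4 mm per inch (72 / 25.4 ≈ 2.83465). A small Python sketch of the conversion; the helper name `mm_to_pt` is hypothetical.

```python
MM_PER_INCH = 25.4
POINTS_PER_INCH = 72.0  # default PDF user space unit: 1/72 inch


def mm_to_pt(mm: float) -> float:
    """Convert millimetres to default PDF user space units."""
    return mm * POINTS_PER_INCH / MM_PER_INCH


for mm in (20.02, 176.06):
    print(f"{mm} mm = {mm_to_pt(mm):.2f} pt")
# 20.02 mm = 56.75 pt
# 176.06 mm = 499.07 pt
```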