Attachments in PDF failing to hyperlink #900
I've generated the PDF, and only one attachment is present in it. This attachment is encoded in the Presentation XML as:
<metanorma-extension>
...
<attachment name="READY-20230316-no-toc-iso-10303-49.pdf">data:application/pdf;base64,JVBERi
...
<p id="_bed0f9b3-394f-9910-dab9-8f46f0cb958b">Trial PDF document: <link target="_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf">10303-49/READY-20230316-no-toc-iso-10303-49.pdf</link>
...
<bibliography>
<references id="_bibliography" normative="false" obligation="informative" hidden="true" displayorder="9">
<title depth="1">Bibliography</title>
<bibitem id="attachment-10303-49-trial" hidden="true">
<formattedref format="application/x-isodoc+xml">[NO INFORMATION AVAILABLE]</formattedref>
<uri type="attachment">_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf</uri>
<uri type="citation">_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf</uri>
<docidentifier type="metanorma">[10303-49/READY-20230316-no-toc-iso-10303-49.pdf]</docidentifier>
</bibitem>
</references>
</bibliography>
Also, there are
<p id="_3c1b569d-6058-5228-5c17-0c06c39a7da7">PDF document comparison report: <link target="10303-49-comparison-report.pdf"/>
...
<p id="_a9f03ffe-d062-d97b-a425-e9e45692f302">Annotated EXPRESS schema: <link target="10303-49/method_definition_schema/method_definition_schema.exp"/>
I need to update the XSLT for this case, to differentiate a link to an external entity like
Also, there are
<p id="_be27e7cc-b2c2-f0d7-8ccb-e2d32357c97f">Trial PDF document: <xref target="attachment-10303-50-trial">[attachment-10303-50-trial]</xref>
@opoudjis how to process such
It is correct to have only 1 attachment. I can provide you with another file in which I have linked the attachments, but they are not attached. There are two types of links.
I think part of the problem is that not all the attachments that were supposed to be there were, so the links weren't properly generated. (That might even be the case in the large file I also sent.)
Since I am addressing both HTML and DOC, should link/target be the same as attachment/name, so that you know which attachment is which? Or is the current arrangement workable?
If you see an xref, it simply is not an attachment, because the attachment has not been loaded in: attachments are loaded in via the bibliography. If the attachment had been loaded in, it would be showing up as an eref => link. You can ignore xref as an error in the underlying markup.
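The distinction described above (attachments are declared through hidden bibliography items, and only a link whose target matches such an item is an attachment) can be sketched in Python. This is an illustration only, not the Metanorma implementation: the sample XML is a trimmed, namespace-free stand-in for the Presentation XML, and the function name `classify_links` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, namespace-free stand-in for the Presentation XML quoted in this thread.
SAMPLE = """<doc>
  <p>Trial PDF document: <link target="_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf">10303-49/READY-20230316-no-toc-iso-10303-49.pdf</link></p>
  <p>PDF document comparison report: <link target="10303-49-comparison-report.pdf"/></p>
  <p>Trial PDF document: <xref target="attachment-10303-50-trial">[attachment-10303-50-trial]</xref></p>
  <bibliography>
    <references hidden="true">
      <bibitem id="attachment-10303-49-trial" hidden="true">
        <uri type="attachment">_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf</uri>
      </bibitem>
    </references>
  </bibliography>
</doc>"""


def classify_links(xml_text):
    """Map each link target to 'attachment' or 'external' (hypothetical helper)."""
    root = ET.fromstring(xml_text)
    # Attachments are declared as <uri type="attachment"> inside bibliography items.
    attachment_uris = {
        uri.text for uri in root.findall(".//bibitem/uri[@type='attachment']")
    }
    result = {}
    # Only <link> elements are considered; <xref> elements are ignored,
    # since an xref means the attachment was never loaded in.
    for link in root.iter("link"):
        target = link.get("target")
        result[target] = "attachment" if target in attachment_uris else "external"
    return result


links = classify_links(SAMPLE)
```

Here the first link is classified as an attachment and the comparison report as an external link; the xref to `attachment-10303-50-trial` does not appear in the result at all.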
@ronaldtse yes, it would be helpful.
I'll investigate it.
How the attachment mechanism currently works in the XSLT. The Presentation XML contains:
<metanorma-extension>...
<attachment name="READY-20230316-no-toc-iso-10303-49.pdf">data:application/pdf;base64,JVBER...
<link target="_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf">
...
<bibliography>
<references id="_bibliography" normative="false" obligation="informative" hidden="true" displayorder="9">
<title depth="1">Bibliography</title>
<bibitem id="attachment-10303-49-trial" hidden="true">
<formattedref format="application/x-isodoc+xml">[NO INFORMATION AVAILABLE]</formattedref>
<uri type="attachment">_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf</uri>
<uri type="citation">_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf</uri>
<docidentifier type="metanorma">[10303-49/READY-20230316-no-toc-iso-10303-49.pdf]</docidentifier>
</bibitem>
</references>
</bibliography>
I.e. there isn't an explicit relationship between the attachment and the link. Therefore, the XSLT performs the following actions:
The code:
<xsl:template match="*[local-name()='link']" name="link">
...
<xsl:when test="contains(@target, concat('_', $inputxml_filename_prefix, '_attachments'))">
<!-- link to the PDF attachment -->
<xsl:variable name="target_" select="translate(@target, '\', '/')"/>
<xsl:variable name="target__" select="substring-after($target_, concat('_', $inputxml_filename_prefix, '_attachments', '/'))"/>
<xsl:value-of select="concat('url(embedded-file:', $target__, ')')"/>
</xsl:when>
But if the input XML filename isn't
@opoudjis the question -
I've found a second issue with links. If there is a comment note on the page, then none of the references work, i.e. they show as blue text without links (the mouse pointer doesn't change on mouse-over):
Can I get back to this query on Monday? I'm going out of town for the weekend. The prefix is indeed _{document-name}_attachments/{attachment-name}, which is why I suggested above that I make the name attribute in the attachment the same as the target attribute in the link, so that you do know they are the same. Looks like that is the right thing to do.
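As a rough illustration of the prefix convention, the XSLT branch quoted earlier can be mirrored in Python. This is a sketch, not the code in common.xsl: the function name `embedded_file_url` is hypothetical, `doc_prefix` stands in for `$inputxml_filename_prefix`, and returning `None` for non-attachment targets is an assumption, since the other branches of the template are not shown in this thread.

```python
def embedded_file_url(target, doc_prefix):
    """Mirror the quoted xsl:when branch for attachment links (hypothetical helper)."""
    marker = f"_{doc_prefix}_attachments"
    # Same condition as contains(@target, concat('_', $prefix, '_attachments')).
    if marker not in target:
        return None  # assumption: other link types are handled elsewhere
    # translate(@target, '\', '/'): normalise Windows-style separators.
    normalized = target.replace("\\", "/")
    # substring-after(..., '_<prefix>_attachments/'); empty string if not found.
    _, _, attachment_name = normalized.partition(marker + "/")
    return f"url(embedded-file:{attachment_name})"


url = embedded_file_url(
    "_document_attachments/READY-20230316-no-toc-iso-10303-49.pdf", "document"
)
```

The sketch makes the weakness visible: the match depends on the input XML filename prefix rather than on the attachment's own name, which is why aligning the attachment name attribute with the link target attribute removes the guesswork.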
@opoudjis ok.
Fixed in
common.xsl updated for PDF attachments, metanorma/metanorma-standoc#900
I've updated
I've found another bug. The attachments:
The content of both PDFs is truncated (doesn't end with
The reason - the text content of the element
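A truncated attachment of this kind can be detected before PDF generation. The marker the comment refers to is cut off in the text above; the sketch below assumes the standard PDF end-of-file marker `%%EOF`, and the function name `looks_truncated` and the 1024-byte tail window are hypothetical choices, not anything taken from this thread.

```python
import base64


def looks_truncated(data_uri, tail_bytes=1024):
    """Return True if a base64 data URI does not decode to a complete-looking PDF."""
    _, _, body = data_uri.partition("base64,")
    try:
        # Strip any whitespace or line breaks before decoding.
        pdf = base64.b64decode("".join(body.split()))
    except ValueError:
        # binascii.Error (a ValueError subclass) signals a cut-off base64 stream.
        return True
    # Assumption: a complete PDF carries the %%EOF marker near its end.
    return b"%%EOF" not in pdf[-tail_bytes:]


complete_uri = "data:application/pdf;base64," + base64.b64encode(
    b"%PDF-1.7\nbody\n%%EOF\n"
).decode("ascii")
```

Calling `looks_truncated(complete_uri)` returns `False`, while the same string with its last few characters removed is reported as truncated.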
Hm. I'm going to fix the attachment link anyway, though it may make life more complicated for HTML. The MB limit is a surprise to me, and I don't think it's my doing. I have recently imposed a 10 MB limit on images, but that should be resulting in crashes, and it should not be truncating. Will investigate.
The MB limit is indeed Nokogiri, even when I changed the code to append the string as a child. I am going to have to introduce linebreaks. Odd that Nokogiri does not have this issue with XML attributes...
... Still didn't work... Having to add it one line at a time in Nokogiri.
Works. Will generate entire document and pass it to you.
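The linebreak workaround can be illustrated independently of Nokogiri. The following Python sketch splits the base64 payload into fixed-width lines and shows that decoding still recovers the original bytes once whitespace is stripped. The function names and the 76-character line width are assumptions for illustration; the thread does not state the width actually used, and this is not the Ruby code from the fix.

```python
import base64


def wrap_data_uri(payload, mime="application/pdf", width=76):
    """Encode bytes as a data URI whose base64 body is broken into short lines."""
    encoded = base64.b64encode(payload).decode("ascii")
    # Slice the long base64 string into fixed-width chunks, one per line.
    chunks = [encoded[i:i + width] for i in range(0, len(encoded), width)]
    return f"data:{mime};base64," + "\n".join(chunks)


def unwrap_data_uri(text):
    """Recover the original bytes, ignoring the inserted line breaks."""
    _, _, body = text.partition(",")
    return base64.b64decode("".join(body.split()))


payload = b"%PDF-1.7\n" + bytes(range(256)) * 4 + b"\n%%EOF\n"
wrapped = wrap_data_uri(payload)
restored = unwrap_data_uri(wrapped)
```

Because base64 decoders can ignore line breaks, a consumer such as the XSLT only has to strip whitespace from the element's text before decoding.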
Very strange: Adobe Reader shows only 1 (the first) page for the 86 MB document.pdf. I'll investigate it.
The Presentation XML size is 141 MB.
common.xsl updated for PDF attachments, metanorma/metanorma-standoc#900
I don't understand why the PDF generated by @opoudjis contains only 1 page:
I've generated the PDF with the Java heap space increased to 5 GB, and can confirm that the PDF contains all the PDF attachments correctly.
The error occurs with the 141 MB Presentation XML, but the old 193 MB Presentation XML is processed correctly. So, currently the only issue is Java heap space.
common.xsl, error fix, metanorma/metanorma-standoc#900
After a few attempts I've generated a PDF (86 MB) with 1 page. The log contains
but the process didn't end abnormally, and the PDF was generated with 1 page. So this is exactly the Java heap space error.
FYI @Intelligent2013 it has just run out of heap space on my side again, but IMO 100 MB of PDF attachments are unreasonable to compile into a PDF to begin with...
@opoudjis could you share the Presentation XML to dropbox or similar? Thanks!
@opoudjis thank you! I have
@opoudjis issue
In #898 I have had to do some debugging of attachments, to make it possible to compile an Asciidoctor document with attachments outside of the working directory.
This has worked on HTML, with it finding the attachments now. But the PDF has stopped linking to attachments.
What is perplexing is
Which makes me suspect this is not a matter of my code, but of processing constraints on the PDF.
I am sending the 200 MB Presentation XML on Skype for you to look at. @ronaldtse will be able to send you different iterations of the document in question.