Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#508 add support for embedded file #509

Closed
wants to merge 4 commits into from
Closed

#508 add support for embedded file #509

wants to merge 4 commits into from

Conversation

syjer
Copy link
Contributor

@syjer syjer commented Jul 8, 2020

hi, this PR implement the support of embedding files in the pdf.

You can see in #508 (comment) how it work. Basically, adding the attribute data-embed-file="true" to a link will make the linked resource embedded.

As written in #508 (comment) , the support from the pdf viewer is, uh, variable. Especially the icon is quite ugly, and at the moment I'm not able to hide it. Edit: I was able to customize it. Seems to globally work, except for the viewer from IE/Edge. But at least adobe reader seems to work.

Note: there are security implication if a third party user has control over the html. Maybe we need to add an explicit list of embeddable uri resource in the builder?

I still need to add some kind of test.

WDYT?

PDEmbeddedFile embeddedFile = new PDEmbeddedFile(_od.getWriter(), new ByteArrayInputStream(file));
embeddedFile.setSubtype(elem.getAttribute("data-content-type") != null ? elem.getAttribute("data-content-type") : "application/octet-stream");
fs.setEmbeddedFile(embeddedFile);
String fileName = Paths.get(uri).getFileName().toString();
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won't work for custom protocol handlers reliably. I think it may be better to use another attribute, possibly download.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right, it may fail. Maybe data-file-name as an attribute is more telling?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree download isn't the best attribute name, but it is the standard html5 attribute for this purpose.

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/a#Attributes

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't know about the download attribute :D . So I could remove "data-embed-file" and use the download attribute with a name.

@danfickle
Copy link
Owner

In regard to security, we have the URL resolver which can return null for bad resource attempts. Do we think this is enough? In hindsight the uri resolver should have taken resource type as well as uri, I guess.

@syjer
Copy link
Contributor Author

syjer commented Jul 10, 2020

about the security implication, I think the current url resolver may not be fine grained enough for this specific case.

I'm not a fan of adding yet another allow list, but as we are embedding resources in the pdf, I think avoiding by default the possibility to let a user embed some arbitrary file from the file system (like the classic /etc/passwd) is more sane.

@danfickle
Copy link
Owner

I think we need to have a standard method for all resources (or at least newly added ones), something like:

builder.useAllowedProtocols(ResourceType.FileEmbed, "http://", "https://");

Then we could just add to the enum for each resource, rather than adding methods. The problem is backwards compatibility. Perhaps we could have defaults for each resource type (everything for xml, nothing for file embed, etc) and calling this method would override those defaults.

@syjer
Copy link
Contributor Author

syjer commented Jul 10, 2020

Agree about the using a ResourceType. I wonder if protocol is enough? Maybe adding the possibility to pass a custom Predicate<String> where the parameter is the uri?

For the method name, maybe defineExternalResourceAccess is more general? I would guess the more general method would receive 2 parameter: a ResourceType and a Predicate<String>. Then we can add an overload for protocols that convert them in predicates internally.

And as you said, we define a sane set of default :).

This way we can centralize the resource access and maybe remove the ad hoc constructor that I've added in https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-svg-support/src/main/java/com/openhtmltopdf/svgsupport/BatikSVGDrawer.java#L58

@danfickle
Copy link
Owner

Agree about the using a ResourceType. I wonder if protocol is enough? Maybe adding the possibility to pass a custom Predicate<String> where the parameter is the uri?

For the method name, maybe defineExternalResourceAccess is more general? I would guess the more general method would receive 2 parameter: a ResourceType and a Predicate<String>. Then we can add an overload for protocols that convert them in predicates internally.

And as you said, we define a sane set of default :).

This way we can centralize the resource access and maybe remove the ad hoc constructor that I've added in https://github.com/danfickle/openhtmltopdf/blob/open-dev-v1/openhtmltopdf-svg-support/src/main/java/com/openhtmltopdf/svgsupport/BatikSVGDrawer.java#L58

Agree with all of this. The only issue is whether the predicate should be run before or after the uri resolver.

@syjer
Copy link
Contributor Author

syjer commented Jul 16, 2020

The only issue is whether the predicate should be run before or after the uri resolver.

I would guess after? If you don't mind, I'll open a new PR with only the access control part so we can focus better on that specific feature :)

@danfickle
Copy link
Owner

@syjer, Just letting you know, I'd like to do a release, probably tomorrow. If you are busy, no worries, we can leave this PR until the next release.

Are you otherwise OK with me doing a release?

@syjer
Copy link
Contributor Author

syjer commented Jul 19, 2020

If you are busy, no worries, we can leave this PR until the next release.

yep, I prefer to let it rest for the next release.

Are you otherwise OK with me doing a release?

ok on my side :)

danfickle added a commit that referenced this pull request Jan 21, 2021
Based heavily on code by @syjer whom I'm indebted to. Thanks.

Still todo:
+ Prevent duplicate file embeds. Possibly create a map of uris to PDComplexFileSpecification and reuse if encountered again. I have to peruse the PDF spec to see if this is allowed.
+ Logging on fail.
@danfickle danfickle mentioned this pull request Jan 21, 2021
2 tasks
@syjer
Copy link
Contributor Author

syjer commented Jan 21, 2021

superseded by #640 :)

@syjer syjer closed this Jan 21, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants