Expand relative paths in href and src attributes via replace-match in sanitize-html function. #46

Open. c1-g wants to merge 5 commits into base: master.


c1-g
Contributor

@c1-g c1-g commented Nov 24, 2021

Addresses #41, and possibly #45 as well.

All the work is done in org-web-tools--sanitize-html.

Added a new variable, org-web-tools-expand-relative-path.
If it is nil, relative paths are not expanded.

This works by searching the temporary HTML buffer created by org-web-tools--sanitize-html
for all href and src attributes, passing each attribute's value to url-expand-file-name
with the URL argument (or the URL in the user's kill ring) as its base, and replacing the
attribute's value with the result.
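
For readers who want to see the shape of this approach, here is a minimal sketch. It is not the code from this PR: the helper name my/expand-relative-urls-in-html is hypothetical, the base URL is simply passed in as an argument, and only double-quoted attribute values are handled.

```elisp
(require 'url-expand)

(defun my/expand-relative-urls-in-html (html base-url)
  "Return HTML with href and src values expanded against BASE-URL.
Hypothetical helper for illustration only; it assumes that
attribute values are double-quoted."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    ;; Group 1 captures the attribute value between the quotes.
    (while (re-search-forward "\\b\\(?:href\\|src\\)=\"\\([^\"]*\\)\"" nil t)
      (let ((expanded (url-expand-file-name (match-string 1) base-url)))
        ;; FIXEDCASE and LITERAL are both t, so EXPANDED is inserted
        ;; verbatim in place of group 1.
        (replace-match expanded t t nil 1)))
    (buffer-string)))

;; Example call:
;; (my/expand-relative-urls-in-html
;;  "<a href=\"/wiki/Foo\">Foo</a>"
;;  "https://en.wikipedia.org/wiki/Binary_relation")
```

URLs that already carry a scheme should pass through url-expand-file-name unchanged, so the loop does not need to filter them out first.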

I tested this on a handful of Wikipedia articles and it works just fine.

c1-g added 4 commits November 24, 2021 19:18
Works by searching for the values of the href and src attributes
and replacing each value with an expanded URL.
This is in case somebody explicitly passes a URL to
‘org-web-tools--url-as-readable-org’.
@c1-g c1-g changed the title from "Expand relative path in href and src attributes via replace-match in sanitize-html function." to "Expand relative paths in href and src attributes via replace-match in sanitize-html function." Nov 24, 2021
Contributor Author

c1-g commented Nov 26, 2021

After some more testing, the expansion function raised the error
(error Invalid use of ‘\’ in replacement text)
when it tried to expand

<a href="#Preorder_R\R">

in https://en.wikipedia.org/wiki/Binary_relation

Maybe quoting the replacement text will resolve the issue.
I'll close this PR until the problem is resolved.

@c1-g c1-g closed this Nov 26, 2021
The function had trouble expanding

<a href="#Preorder_R\R">

because “\” is treated as special in replace-match.
With the LITERAL argument set to t, this is no longer a problem.
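
The difference can be reproduced in isolation. The following is a small sketch with a hypothetical helper, my/replace-first-x, that feeds the problematic string to replace-match with and without LITERAL:

```elisp
(defun my/replace-first-x (replacement literal)
  "Replace the \"x\" in a scratch buffer with REPLACEMENT.
LITERAL is passed through to `replace-match'.  Hypothetical
helper for illustration only."
  (with-temp-buffer
    (insert "x")
    (goto-char (point-min))
    (re-search-forward "x")
    (replace-match replacement t literal)
    (buffer-string)))

;; Without LITERAL, the backslash followed by "R" is not a valid
;; escape sequence, so `replace-match' signals an error.
(condition-case err
    (my/replace-first-x "#Preorder_R\\R" nil)
  (error (message "Signaled: %S" err)))

;; With LITERAL set to t, the replacement is inserted verbatim.
(my/replace-first-x "#Preorder_R\\R" t)
```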
@c1-g c1-g reopened this Nov 26, 2021
@alphapapa alphapapa self-assigned this Oct 29, 2023
@alphapapa alphapapa added this to the 1.3 milestone Oct 29, 2023
Owner

@alphapapa alphapapa left a comment

Hi,

Apologies for overlooking this PR for so long.

Looking at #41 again, and based on what I've learned since then, I think the best way to solve this issue would be to parse the HTML to a DOM object with libxml-parse-html-region, then walk the DOM using the dom library and modify any anchors' HREFs accordingly. Then the DOM can be serialized back to HTML using shr-dom-print. That should be more robust and reliable than using regexp matches on the HTML.
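
A rough sketch of what that could look like, under some assumptions: the helper name my/expand-links-via-dom is hypothetical, Emacs must be built with libxml2, and both href and src attributes are rewritten on any element that has them. shr-dom-print inserts into the current buffer, so the result is collected from a temporary buffer.

```elisp
(require 'dom)
(require 'shr)
(require 'url-expand)

(defun my/expand-links-via-dom (html base-url)
  "Return HTML with href and src attributes expanded against BASE-URL.
Hypothetical helper for illustration only.  Requires an Emacs
built with libxml2."
  (let ((dom (with-temp-buffer
               (insert html)
               (libxml-parse-html-region (point-min) (point-max)))))
    (dolist (attr '(href src))
      ;; `dom-search' returns every node for which the predicate is
      ;; non-nil, here every node that has the attribute ATTR.
      (dolist (node (dom-search dom (lambda (n) (dom-attr n attr))))
        (dom-set-attribute
         node attr
         (url-expand-file-name (dom-attr node attr) base-url))))
    ;; Serialize the modified DOM back to a string.
    (with-temp-buffer
      (shr-dom-print dom)
      (buffer-string))))
```

Note that the parser wraps fragments in html and body elements, so the output is a full document rather than the original fragment, and the serialized whitespace may differ from the input.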

Also, I think the --sanitize-html function is not the place to make this change; its purpose is to "sanitize" the HTML, i.e., to make it clean and safe, and adjusting links is a different purpose.

So, would you like to adjust this PR accordingly? If you're not interested anymore, that's fine, too. Just let me know.

Thanks.

@alphapapa alphapapa modified the milestones: 1.3, 1.4 Dec 20, 2023
@alphapapa alphapapa modified the milestones: 1.4, Future Dec 20, 2023
2 participants