Expand relative paths in `href` and `src` attributes via `replace-match` in sanitize-html function. #46
base: master
Works by searching for the values of the `href` and `src` attributes and replacing each value with an expanded URL.
This covers the case where somebody explicitly passes a URL to ‘org-web-tools--url-as-readable-org’.
After some more testing, the expanding function raised an error on `<a href="#Preorder_R\R">` in https://en.wikipedia.org/wiki/Binary_relation. Maybe quoting the replacement text will resolve the issue.
The function had trouble expanding `<a href="#Preorder_R\R">` because “\” is treated as special in `replace-match`. With the LITERAL argument set to t, this is no longer a problem.
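To illustrate the LITERAL fix described above, here is a minimal sketch. The helper name `my/replace-href-literally` and the sample strings are hypothetical and not taken from the PR; only `replace-match` and its argument order (NEWTEXT FIXEDCASE LITERAL STRING SUBEXP) come from Emacs itself.

```elisp
(defun my/replace-href-literally (html replacement)
  "Return HTML with the first double-quoted href value set to REPLACEMENT.
Hypothetical helper for illustration; not part of org-web-tools."
  (with-temp-buffer
    (insert html)
    (goto-char (point-min))
    (when (re-search-forward "href=\"\\([^\"]*\\)\"" nil t)
      ;; LITERAL (third argument) is t, so a backslash in REPLACEMENT is
      ;; inserted as-is instead of being parsed as an escape sequence.
      ;; SUBEXP is 1, so only the attribute value is replaced.
      (replace-match replacement t t nil 1))
    (buffer-string)))

;; The replacement contains the two characters "\R".
(my/replace-href-literally "<a href=\"x\">" "#Preorder_R\\R")
```

With LITERAL left as nil, `replace-match` tries to interpret `\R` as an escape in the replacement text, which matches the error reported above.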
Hi,
Apologies for overlooking this PR for so long.
Looking at #41 again, and based on what I've learned since then, I think the best way to solve this issue would be to parse the HTML to a DOM object with `libxml-html-parse-buffer`, then walk the DOM using the `dom` library and modify any anchors' HREFs accordingly. Then the DOM can be serialized back to HTML using `shr-dom-print`. That should be more robust and reliable than using regexp matches on the HTML.
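A rough sketch of that DOM-based approach might look like the following. It is an assumption-laden illustration, not code from this PR: the function name `my/expand-relative-links` is hypothetical, the parser call used is `libxml-parse-html-region` (the name Emacs provides for its libxml HTML parser, which requires an Emacs built with libxml2), and it handles `src` as well as `href` to match the PR's scope.

```elisp
(require 'dom)
(require 'shr)
(require 'url-expand)

(defun my/expand-relative-links (html base-url)
  "Return HTML with relative href and src values expanded against BASE-URL.
Hypothetical helper for illustration; not part of org-web-tools."
  (let ((dom (with-temp-buffer
               (insert html)
               ;; Parse the whole buffer into a DOM tree.
               (libxml-parse-html-region (point-min) (point-max)))))
    (dolist (attr '(href src))
      ;; Visit every element that carries ATTR, whatever its tag.
      (dolist (node (dom-search dom (lambda (n) (dom-attr n attr))))
        ;; Fragment-only values such as "#section" come back unchanged
        ;; from `url-expand-file-name'.
        (dom-set-attribute
         node attr (url-expand-file-name (dom-attr node attr) base-url))))
    ;; `shr-dom-print' inserts the serialized DOM into the current buffer.
    (with-temp-buffer
      (shr-dom-print dom)
      (buffer-string))))

(my/expand-relative-links
 "<a href=\"/wiki/Preorder\">Preorder</a>"
 "https://en.wikipedia.org/wiki/Binary_relation")
```

Because the attribute values are set on the DOM rather than spliced into text, no regexp escaping or `replace-match` quoting is involved. The serialized output is a full document (the parser wraps fragments in `html` and `body` elements), so its whitespace may differ from the input.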
Also, I think the `--sanitize-html` function is not the place to do this change; its purpose is to "sanitize" the HTML, i.e. to make it clean and safe, and adjusting links is a different purpose.
So, would you like to adjust this PR accordingly? If you're not interested anymore, that's fine, too. Just let me know.
Thanks.
Addressing #41 and maybe #45 too?
All the work is done in `org-web-tools--sanitize-html`. Added a new variable called `org-web-tools-expand-relative-path`. If it's nil, relative paths won't be expanded.
This works by searching the temporary HTML buffer created by `org-web-tools--sanitize-html` for all `href` and `src` attributes, passing the value of each attribute to `url-expand-file-name` with the URL argument (or the URL in the user's kill ring) as its base, and replacing the value of the attribute with the result.
I tested this on a handful of Wikipedia articles and it works just fine.
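The search-and-replace described above can be sketched roughly as follows. This is not the PR's actual code: `my/expand-relative-path` stands in for `org-web-tools-expand-relative-path`, `my/expand-attributes` is a hypothetical name, the base URL is passed in explicitly rather than read from the kill ring, and the regexp only handles double-quoted attribute values.

```elisp
(require 'url-expand)

(defvar my/expand-relative-path t
  "Stand-in for `org-web-tools-expand-relative-path'.
When nil, relative paths are left untouched.")

(defun my/expand-attributes (html base-url)
  "Return HTML with href and src values expanded against BASE-URL.
Hypothetical helper for illustration; not part of org-web-tools."
  (if (not my/expand-relative-path)
      html
    (with-temp-buffer
      (insert html)
      (goto-char (point-min))
      (while (re-search-forward
              "\\b\\(?:href\\|src\\)=\"\\([^\"]*\\)\"" nil t)
        (let ((expanded
               ;; `url-expand-file-name' does its own matching, so protect
               ;; the match data that `replace-match' still needs.
               (save-match-data
                 (url-expand-file-name (match-string 1) base-url))))
          ;; LITERAL is t so backslashes in EXPANDED are inserted as-is;
          ;; SUBEXP is 1 so only the attribute value changes.
          (replace-match expanded t t nil 1)))
      (buffer-string))))

(my/expand-attributes "<img src=\"/static/a.png\">"
                      "https://example.org/wiki/Page")
```

Setting `my/expand-relative-path` to nil makes the function return its input unchanged, mirroring the behavior described for the new variable.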