This repository has been archived by the owner on Jan 11, 2023. It is now read-only.

Sane default for no base element in an html file #1208

Merged: 5 commits, merged Mar 26, 2021

jamesjnadeau
Contributor

Any HTML file without a <base> element caused export to fail with an error unrelated to the actual problem.

This PR sets a sane default and adds a static test file to the export test app to verify the fix.
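A minimal sketch of the fix's idea, assuming the exporter looks for `<base>` with the same regular expression that appears in the `src/api/export.ts` diff. `getHref` is a hypothetical stand-in for Sapper's `get_href` helper (its real implementation is not shown in this thread), and `"/"` is the default this PR initially proposed:

```typescript
// Hypothetical stand-in for Sapper's get_href: read the href value from the
// attribute text captured inside a <base ...> tag (quoted or unquoted).
function getHref(attrs: string): string | null {
  const match = /href\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/.exec(attrs);
  if (!match) return null;
  return match[1] ?? match[2] ?? match[3] ?? null;
}

// Find the document's base href. When there is no <base> element (or it has
// no href), return a default instead of handing null to later export steps.
function findBaseHref(html: string, fallback: string = "/"): string {
  const baseMatch = /<base ([\s\S]+?)>/m.exec(html);
  const href = baseMatch ? getHref(baseMatch[1]) : null;
  return href ?? fallback;
}

console.log(findBaseHref('<head><base href="/app/"></head>')); // /app/
console.log(findBaseHref("<head></head>"));                    // /
```

Later in the thread the default moves from `/` to the page's own path, which in this sketch is only a different `fallback` argument.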

This is a fix for #1202 and a resubmission of #1118

Before submitting the PR, please make sure you do the following:

  • It's really useful if your PR relates to an outstanding issue, so please reference it in your PR, or create an explanatory one for discussion. In many cases features are absent for a reason.
  • This message body should clearly illustrate what problems it solves. If there are related issues, remember to reference them.
  • Ideally, include a test that fails without this PR but passes with it. PRs will only be merged once they pass CI. (Remember to npm run lint!)

Tests

  • Run the tests with npm test or yarn test

@jamesjnadeau
Contributor Author

@Conduitry @benmccann I have rebased and submitted this new PR as a cleanup of PR #1118. Please look it over and let me know if you have any feedback. Thanks for your time.

@benmccann
Member

benmccann commented May 17, 2020

There are a couple of other PRs dealing with base as well:

I think #984 and #866 may overlap, but the rest (including this one) are probably largely orthogonal to each other and could all be valuable.

@wighawag

wighawag commented May 24, 2020

#866 supports websites running on IPFS, where the base path is unknown, hence the need for a relative base href.

So from the look of it, neither #984 nor #600 will work. The latter actually mentions that a relative base path is not standard (and does not work in IE). This would mean that #866 would not support such browsers. I still think it is worth it for IPFS websites that would be broken otherwise.

To be clear, the use case is websites that want to be fully operational from different URLs like:

  1. <ipfs hash>.<gateway host>/
  2. <gateway host>/ipfs/<ipfs hash>/
  3. <domain name>

All 3 are valid URLs for an IPFS website. A user might prefer the easily memorable <domain name>, but another user might prefer to bookmark an unmodifiable version via its hash; and while 1) would be compatible, it is not available on all IPFS gateways.

Also, some browsers, when navigating to a domain name like wighawag.eth (which Opera now supports, for example), will redirect to a gateway IPFS URL like 2.
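To see why an absolute `/` base breaks the path-style gateway form while a relative base works for all three, the browser's resolution can be reproduced with the WHATWG `URL` constructor (available in browsers and Node.js). The gateway host, site domain, and `bafyexample` content ID are made-up placeholders, and the page is assumed to live at `about/` inside the site:

```typescript
// Resolve a relative link the way a browser does: first resolve the
// <base href> against the page URL, then resolve the link against that.
function resolveLink(pageUrl: string, baseHref: string, link: string): string {
  const base = new URL(baseHref, pageUrl);
  return new URL(link, base).href;
}

// Placeholder URLs for the three ways the same IPFS site can be reached.
const pages = [
  "https://bafyexample.ipfs.gateway.example/about/", // 1. subdomain gateway
  "https://gateway.example/ipfs/bafyexample/about/", // 2. path gateway
  "https://site.example/about/",                     // 3. own domain
];

for (const page of pages) {
  // Absolute base: always points at the host root.
  console.log("abs:", resolveLink(page, "/", "client/main.js"));
  // Relative base: climbs one level from about/ to the site root.
  console.log("rel:", resolveLink(page, "../", "client/main.js"));
}
```

With `/`, the path-style gateway (form 2) requests `https://gateway.example/client/main.js`, outside the `/ipfs/bafyexample/` prefix, while `../` keeps all three forms inside the site. The relative value depends on how deep each page sits, so it has to be computed per page.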

Examples:

  1. https://bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq.cf-ipfs.com
  2. https://cloudflare-ipfs.com/ipfs/bafybeiemxf5abjwjbikoz4mc3a3dla6ual3jsgpdr4cjr3oz3evfyavhwq

or

  1. https://cloudflare-ipfs.com/ipfs/QmNVTWGZ4qoW4DwiTze3icNJKhKBArR4HrCR1sapAFaPYg/
  2. https://wighawag.eth.link

If you are using a compatible browser, the latter can also be tested at wighawag.eth.

@benmccann
Member

@jamesjnadeau what if the file isn't at the base of the static directory? E.g. you use test/apps/export/static/test.html in your test, but what if the file is test/apps/export/static/abc/test.html? Then should the base be /abc/ instead of /?

@jamesjnadeau
Contributor Author

jamesjnadeau commented May 30, 2020

@benmccann, good question.
From https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base:

The HTML <base> element specifies the base URL to use for all relative URLs in a document.

Let's walk through your examples with a page served from https://example.com/ and a base set to /:

  1. /static/test.html - any relative links will be assumed to come from https://example.com/
  2. /static/abc/test.html - any relative links will be assumed to come from https://example.com/

With a base set to /abc/ (the trailing slash matters):

  1. /static/test.html - any relative links will be assumed to come from https://example.com/abc/
  2. /static/abc/test.html - any relative links will be assumed to come from https://example.com/abc/

So in the above example, any relative links would be resolved from /abc/. If you had an anchor with href="stuff", clicking that link would take you to https://example.com/abc/stuff.
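The walkthrough can be checked with the same two-step resolution a browser performs (the `<base href>` is itself resolved against the page URL first). This sketch uses the WHATWG `URL` constructor; it also shows why the trailing slash on the base matters, since `/abc` names a file called `abc` rather than a directory:

```typescript
// Two-step resolution: base href against the page, then link against the base.
function resolveAgainstBase(pageUrl: string, baseHref: string, link: string): string {
  return new URL(link, new URL(baseHref, pageUrl)).href;
}

const pagesToCheck = [
  "https://example.com/static/test.html",
  "https://example.com/static/abc/test.html",
];

for (const page of pagesToCheck) {
  console.log(resolveAgainstBase(page, "/", "stuff"));     // https://example.com/stuff
  console.log(resolveAgainstBase(page, "/abc/", "stuff")); // https://example.com/abc/stuff
  console.log(resolveAgainstBase(page, "/abc", "stuff"));  // https://example.com/stuff
}
```

The result is the same for both pages, because an absolute-path base makes the page's own folder irrelevant.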

The default, when no base element is supplied, is to use '/' for relative URLs.

I threw together an example that demonstrates how this works. The sub-pages use the base element to pull in different CSS/JS. It should be easy to clone and play with in your browser (see the top-right menu):
https://sapper-1208-example.glitch.me/

My code is really just there to handle the scenario where someone forgets to put the base element in the document at all. It's in the Sapper template by default, but if it is accidentally removed, Sapper crashes when exporting the site, with an error that's hard to interpret.

IMHO, if someone wants to use a base element, they can easily set it in their templates as they choose. I'm just adding a default in case someone forgets to set it, because the export command expects it to be set, and I'm defaulting to how a browser would handle this scenario.

It's on the developer to choose how they want to handle relative URLs, and with the base element they control this per page.

So for the examples @wighawag offered, they should be able to set the base element to whatever they want in their template. What he's discussing is adding a domain in addition to a file path for the base element, which is a whole other can of worms. If they need extra logic to set it properly, that's on them to figure out.

I hope this answers your question and clears up some confusion here. Happy to help more if you need anything else.

@benmccann
Member

It sounds to me like base will probably be made optional, so it would be nice if this worked in that case. And I think it'd be an easy update to use the page's path instead of always using /.

@benmccann benmccann mentioned this pull request May 30, 2020
Contributor

@thgh thgh left a comment

Defaulting to / would break if the current page is in a subfolder.
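A sketch of the subfolder problem, using the WHATWG `URL` constructor and a made-up page at `/sub/file.html` that links to `other.html`. When a document has no `<base>`, browsers resolve relative links against the document's own URL, so a `/` fallback changes where the link points:

```typescript
// Path that a relative link resolves to, given the base URL in effect.
function resolvedPath(baseUrl: string, link: string): string {
  return new URL(link, baseUrl).pathname;
}

const subPage = "https://example.com/sub/file.html";

// Exporter falls back to "/": the link escapes the subfolder.
const withRootBase = resolvedPath(new URL("/", subPage).href, "other.html");
// Browser with no <base>: the page URL itself is the base.
const withPageBase = resolvedPath(subPage, "other.html");

console.log(withRootBase); // /other.html
console.log(withPageBase); // /sub/other.html
```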

src/api/export.ts (outdated)
thanks @thgh for pointing this out

Co-authored-by: Thomas Ghysels <info@thomasg.be>
```diff
@@ -186,7 +186,7 @@ async function _export({
 const cleaned = clean_html(body);

 const base_match = /<base ([\s\S]+?)>/m.exec(cleaned);
-const base_href = base_match && get_href(base_match[1]);
+const base_href = base_match ? get_href(base_match[1]) : url.pathname;
```
Member

Doesn't pathname include the name of the HTML file? It wouldn't really be the base_href in that case.

Contributor Author

Good point, @benmccann.
I wrapped it in a call to path.dirname, which should do what we need.

Contributor

Actually, it should include the name of the html file. I'm pretty sure that using dirname will not work, so there should be a test for it.

And to make it even better, it should include the search params so that links like href="#test" work correctly (except that such a link probably won't output an extra page).
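On the `dirname` point: URL resolution treats everything after the last `/` of the base as a file name and drops it, so a directory path without a trailing slash loses a level. A sketch with made-up paths; `/sub` is what Node's `path.posix.dirname("/sub/file.html")` returns:

```typescript
const SITE = "https://example.com";

// Resolve a relative link against a base path on the same (made-up) site.
function resolvePath(basePath: string, link: string): string {
  return new URL(link, SITE + basePath).pathname;
}

// Candidate bases for a page served at /sub/file.html.
const dirnameBase = "/sub";            // dirname-style result, no trailing slash
const fullPathBase = "/sub/file.html"; // the page's own pathname

console.log(resolvePath(dirnameBase, "other.html"));  // /other.html (wrong folder)
console.log(resolvePath(fullPathBase, "other.html")); // /sub/other.html
```

`dirname` also fails the other way for directory-style URLs: `path.posix.dirname("/sub/")` returns `"/"`, so links on `/sub/` would resolve from the site root.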

Member

For anchor links (e.g. #test) you would want the name of the HTML file, but it'd also be redundant to visit those pages again, so the best approach would probably be to discard them. For normal links to other files, I think you'd still want just the directory name, right?

In either case, I agree that a couple more tests would be nice to make sure those cases clearly work.
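One way to follow both suggestions, sketched under the assumption that the exporter knows each page's full URL: resolve links against the full URL (path, file name, and query string), then strip the fragment, so that a same-page anchor like `#test` collapses back to the page itself and can be skipped. `nextUrlToVisit` is a hypothetical helper, not Sapper's code:

```typescript
// Returns the URL to crawl for a link, or null if it only points back
// at the current page (e.g. a bare "#fragment").
function nextUrlToVisit(pageHref: string, link: string): string | null {
  const page = new URL(pageHref);
  page.hash = "";
  const resolved = new URL(link, page); // keeps the page's query for "#..." links
  resolved.hash = "";                   // fragments never produce a separate file
  return resolved.href === page.href ? null : resolved.href;
}

const current = "https://example.com/sub/?search=stuff";
console.log(nextUrlToVisit(current, "#test"));      // null (same page)
console.log(nextUrlToVisit(current, "other.html")); // https://example.com/sub/other.html
```

Resolving against the full URL also covers the "just the directory name" case for ordinary links, since the file name and query are dropped when a relative path is resolved.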

@jamesjnadeau
Contributor Author

jamesjnadeau commented Jun 1, 2020

Yup, I'll add some tests so we can ensure we are doing the right thing, and the right thing keeps happening.

I'm going to write out the tests I think need to be made; please add a note if I missed any.

As a user running export, I should be able to have an html file without a base element that...

  • sets the correct base if the page is in a sub-folder
  • properly deals with file names (e.g. /sub/file.html should have a base of /sub/)
  • can deal with ID-based anchor links and search parameters properly (e.g. /sub/?search=stuff#things)

I plan to extend the existing plain old html files in the static folder so this is dead simple to reason about.
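The three cases could be written as plain assertions against a small resolver. This is a hypothetical sketch rather than the test that ended up in Sapper's export app: `effectiveBase` uses the `<base href>` (resolved against the page URL) when present and otherwise the page's full URL, including the file name and query string, as suggested in the review above.

```typescript
// Hypothetical resolver: the <base href> if present (double-quoted only in
// this sketch), otherwise the page's own URL.
function effectiveBase(pageUrl: string, html: string): URL {
  const m = /<base\s[^>]*href\s*=\s*"([^"]*)"/i.exec(html);
  return m ? new URL(m[1], pageUrl) : new URL(pageUrl);
}

function resolveIn(pageUrl: string, html: string, link: string): string {
  return new URL(link, effectiveBase(pageUrl, html)).href;
}

function check(actual: string, expected: string): void {
  if (actual !== expected) throw new Error(`expected ${expected}, got ${actual}`);
}

const noBase = "<html><head></head><body></body></html>";

// 1. Page in a sub-folder: relative links stay in that folder.
check(resolveIn("https://example.com/sub/", noBase, "a.html"), "https://example.com/sub/a.html");
// 2. File name in the path: /sub/file.html behaves like a base of /sub/.
check(resolveIn("https://example.com/sub/file.html", noBase, "a.html"), "https://example.com/sub/a.html");
// 3. Query and fragment: a same-page anchor keeps the query string.
check(
  resolveIn("https://example.com/sub/?search=stuff#things", noBase, "#other"),
  "https://example.com/sub/?search=stuff#other",
);
// An explicit <base> still wins over the page URL.
check(
  resolveIn("https://example.com/sub/file.html", '<head><base href="/abc/"></head>', "a.html"),
  "https://example.com/abc/a.html",
);
```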

@benmccann
Member

@jamesjnadeau sorry this hasn't been merged yet. Would you mind rebasing it?

@jamesjnadeau
Contributor Author

@benmccann I just resolved the merge conflict (it was easier than rebasing; it's just indentation).

I dropped this because life got in the way, but I was having trouble with the last check:

  • can deal with ID-based anchor links and search parameters properly (e.g. /sub/?search=stuff#things)

I don't remember why, but please check this before you merge.

@dhrp

dhrp commented Mar 4, 2021

Is anything still holding this up? It seems to be blocking the merge of #984.

@benmccann benmccann linked an issue Mar 26, 2021 that may be closed by this pull request
@benmccann benmccann merged commit ab2e256 into sveltejs:master Mar 26, 2021
Development

Successfully merging this pull request may close these issues.

Export fails if html has no <base>
5 participants