Allow export to take multiple entrypoints to use when crawling. #825

pngwn · 2019-07-26T22:11:54Z

I am PRing this again because I'm a genius and the original PR was from the master branch of my svelte fork. I was then too terrified to actually try to do anything on another branch in case I broke something.

Original PR: #756.

This PR allows you to pass a list of entry points to export, which allows users to fully export apps that are not entirely linked by anchor tags.

I'm not sure if this is the best way to do it, but it seems to work. I've added a test that provides a few unlinked entry points in the various styles people might try to pass in. I've done my best to handle unnecessary index.htmls and /s (I'm no regex expert, so that might need looking at). I also moved the oninfo call so users get a console message for each entry point, followed by the list of saved files for that 'crawl'.
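For illustration only, here is a minimal sketch of the kind of normalisation described above. It is not the code from this PR: normalizeEntry and normalizeEntries are hypothetical names, and the regexes are an assumption about what "unnecessary index.htmls and /s" means.

```typescript
// Hypothetical helper (not the PR's implementation): turns a user-supplied
// entry point into a root-relative path with no trailing index.html or slash.
function normalizeEntry(entry: string): string {
  // Drop a trailing "index.html" only when it is a whole path segment.
  const withoutIndex = entry.trim().replace(/(^|\/)index\.html$/, '$1');
  // Strip leading and trailing slashes, then add back a single leading one.
  const withoutSlashes = withoutIndex.replace(/^\/+|\/+$/g, '');
  return withoutSlashes === '' ? '/' : `/${withoutSlashes}`;
}

// Normalises every entry and removes duplicates, keeping first-seen order.
function normalizeEntries(entries: string[]): string[] {
  return Array.from(new Set(entries.map(normalizeEntry)));
}

const normalized = normalizeEntries(['/', 'index.html', 'about', '/about/', 'blog/a-post/index.html']);
console.log(normalized);
```

Deduplicating after normalisation means that "about" and "/about/" would start a single crawl rather than two.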

I added .vscode to the gitignore as I was having trouble disabling my autoformatting without it. I don't know what has happened with the package-lock.json; it just seems to add some stuff when I install.

Closes #749.

Conduitry · 2019-07-28T19:33:39Z

What does the currentRoot variable do that's different from the i in the loop? I'm confused about how this is implemented.

pngwn · 2019-07-28T19:42:03Z

From what I can tell, nothing. I probably just thought it was more descriptive. I don't really remember; I wrote this code quite a while ago.

In terms of implementation, it's just running handle in a loop instead of doing it just the once on /.
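As a rough sketch of that idea (not the actual export.ts code), the example below crawls a tiny in-memory site once per entry point. The Site type, the crawl function, and the page graph are all made up for illustration, and handle here is a simplified, synchronous stand-in for the real one.

```typescript
// Hypothetical in-memory site: each path maps to the paths it links to.
type Site = Map<string, string[]>;

function crawl(site: Site, entries: string[]): string[] {
  const seen = new Set<string>();

  // Simplified stand-in for `handle`: visit a page once, then follow its links.
  function handle(path: string): void {
    const links = site.get(path);
    if (links === undefined || seen.has(path)) return;
    seen.add(path);
    for (const link of links) handle(link);
  }

  // One crawl per entry point, instead of a single crawl from "/".
  for (const entry of entries) handle(entry);

  return Array.from(seen);
}

const site: Site = new Map([
  ['/', ['/about']],
  ['/about', ['/']],
  ['/products', ['/products/example']], // nothing links here from "/"
  ['/products/example', []],
]);

const fromRootOnly = crawl(site, ['/']);
const withExtraEntry = crawl(site, ['/', '/products']);
console.log(fromRootOnly, withExtraEntry);
```

With only "/" as the entry point, the two product pages are never reached; adding "/products" as a second entry point picks them up.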

Conduitry · 2019-07-28T19:49:55Z

src/api/export.ts

@@ -201,7 +212,7 @@ async function _export({
 			type = 'text/html';
 			body = `<script>window.location.href = "${location.replace(origin, '')}"</script>`;

-			tasks.push(handle(resolve(root.href, location)));
+			tasks.push(handle(resolve(entryPoints[currentRoot].href, location)));


What's the reason for resolving this relative to the current entry point? I think this should still always be relative to root. My understanding is that the multiple entry points are just there so that the crawler can find pages it wouldn't ordinarily find. All of them should still have the same base href value.

The test you wrote still passes when using root.href, fwiw

pngwn · 2019-07-28

Yeah, I think you're right. I don't think I looked too deeply into this, but I guess this clause is handling redirects (3xx responses)? I think it's using the Location header and building a new URL to handle from that. My tests definitely don't test this behaviour, and I don't know if any of the existing ones do either, but resolving with anything but the root URL would fail, I think.

pngwn · 2019-07-28

Yeah, looking back, this clears up some of the confusion I had at the time with the whole entryPoints[currentRoot] thing; I wondered why that was necessary. This correction removes the need for it entirely. I probably should have taken more time to understand the code before implementing this.
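To make the point about the base URL concrete, here is a small sketch using the standard URL constructor rather than the resolve helper in the diff. The host, the paths, and the resolveRedirect name are assumptions chosen for illustration.

```typescript
// Assumed values for illustration; the real exporter derives these at runtime.
const root = new URL('http://localhost:3000/');
const entryPoint = new URL('products/example', root);

// Hypothetical helper: resolve a redirect's Location value against a base URL.
function resolveRedirect(location: string, base: URL): string {
  return new URL(location, base).href;
}

// A relative Location resolves differently depending on the base...
const againstRoot = resolveRedirect('blog', root);
const againstEntry = resolveRedirect('blog', entryPoint);

// ...while an absolute path gives the same result for either base.
const absoluteAgainstRoot = resolveRedirect('/blog', root);
const absoluteAgainstEntry = resolveRedirect('/blog', entryPoint);

console.log(againstRoot, againstEntry, absoluteAgainstRoot, absoluteAgainstEntry);
```

In this sketch only the relative Location value is sensitive to the choice of base, which is the case the review comment is concerned with.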

onionhammer · 2020-06-17T19:53:45Z

It looks like this was broken recently.

kevlened · 2020-07-09T21:24:35Z

@onionhammer I also thought this was broken; I was using the wrong syntax. I tried comma-separation and multiple --entry flags, but entries should be separated by spaces: sapper export --entry="/ /products /products/example"
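A short sketch of why the separator matters, assuming the option value is split on whitespace. parseEntryOption is a hypothetical name, and this is not taken from the Sapper source.

```typescript
// Hypothetical parser: splits an --entry value on runs of whitespace.
function parseEntryOption(value: string): string[] {
  return value.split(/\s+/).filter(Boolean);
}

const spaceSeparated = parseEntryOption('/ /products /products/example');
const commaSeparated = parseEntryOption('/,/products');

console.log(spaceSeparated); // three separate entry points
console.log(commaSeparated); // a single, unintended entry "/,/products"
```

Under that assumption, a comma-separated value would be treated as one path, which would match the behaviour described in the comment above.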

pngwn and others added 2 commits July 26, 2019 22:57

Allow export to take multiple entrypoints to use when crawling.

5c292c0

tidy

5f14315

Conduitry reviewed Jul 28, 2019

Conduitry added 2 commits July 28, 2019 16:01

tidy

1cdb3c0

adjust cli option descriptions

8420983

Conduitry merged commit 910d28e into sveltejs:master Jul 28, 2019

Conduitry mentioned this pull request Dec 4, 2019

Export all pages, not just those linked to with <a> tags #1014

Closed
