file-system to domain remap example #2

Merged: 4 commits, Mar 22, 2024

tgaff (Contributor) commented Mar 19, 2024

Based on #1

mre (Member) commented Mar 19, 2024

Awesome, thanks for the PR!

I tried it on the lychee docs repo as suggested and it works.

lychee --base https://lychee.cli.rs  --remap '(.*).md $1.html' src/content/docs/**.md*

However, be aware that this uses your shell's glob handling: the shell expands the glob expression before calling lychee. That means that for a large directory, this call could fail with an "Argument list too long" error once the expanded file list exceeds the operating system's limit.
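To make "the shell expands the glob before calling lychee" concrete, here is a small POSIX sh sketch. The directory and file names are made up, standing in for src/content/docs, and the sketch only counts how many arguments a command would receive with and without quotes. It uses a single * instead of ** so that it behaves the same in any POSIX shell.

```shell
# Throwaway directory with hypothetical file names.
docs=$(mktemp -d)
touch "$docs/index.mdx" "$docs/guide.md" "$docs/faq.md"

# Unquoted: the shell expands the pattern itself, so the command gets one
# argument per matching file (guide.md and faq.md, but not index.mdx).
set -- "$docs"/*.md
unquoted_md=$#

# A trailing * also matches .mdx files.
set -- "$docs"/*.md*
unquoted_md_star=$#

# Quoted: the shell leaves the pattern alone and passes it as one argument,
# so the called program has to expand it on its own.
set -- "$docs/*.md"
quoted=$#
quoted_value=$1

printf 'unquoted *.md: %s args, unquoted *.md*: %s args, quoted: %s arg (%s)\n' \
  "$unquoted_md" "$unquoted_md_star" "$quoted" "$quoted_value"

rm -rf "$docs"
```

With thousands of matching files, it is the unquoted form that can hit the operating system's argument-size limit (ARG_MAX), because every path becomes a separate argument.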

To use lychee's own Rust-based glob matcher instead, you'd have to put the pattern in quotes, which tells the shell to leave the string alone.

lychee --base https://lychee.cli.rs  --remap '(.*).md $1.html' "src/content/docs/**.md*"

But this would lead to an error message:

Error: UNIX glob pattern is invalid

Caused by:
    Pattern syntax error near position 19: recursive wildcards must form a single path component

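The problem is the end of the pattern: ** has to be a path component of its own, so **.md* is rejected (position 19 is the . right after **). Rewritten as **/*.md, which also drops the final *, the command works as expected: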

lychee --base https://lychee.cli.rs --remap '(.*).md $1.html' "src/content/docs/**/*.md"

Can you change that in your version, or was there a reason for the * at the end?


As a side note, I think --remap '(.*).md $1.html' is also not 100% correct.

For example, (.*).md would match fremd.
That's because the dot before md matches any single character (the e in this case).
You'd have to escape it as (.*)\.md in order to match a literal dot only.

Additionally, the regex doesn't require the string to end at md either. 😅
So (.*)\.md would still match fre.mde.

A more accurate expression would be:

lychee --dump --remap '(.*)\.md$ $1.html' test.html
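The three cases above can be reproduced without lychee. This sketch uses sed -E as a stand-in for --remap, which is an assumption: sed uses POSIX extended regular expressions and writes the capture group as \1 where --remap uses $1, so it only approximates lychee's Rust regex behavior, although for these simple patterns and inputs it should give the same result. The remap helper, the sample strings, and the link https://lychee.cli.rs/guide.md are hypothetical.

```shell
# Hypothetical helper approximating --remap 'PATTERN $1.html' with sed -E:
# the part of $2 matched by pattern $1 is replaced with "<group 1>.html".
remap() {
  printf '%s\n' "$2" | sed -E "s/$1/\\1.html/"
}

# Unescaped dot: "." matches any character, so "fremd" is rewritten to fr.html.
loose=$(remap '(.*).md' 'fremd')

# Escaped dot, no end anchor: only "fre.md" is replaced and the trailing
# "e" is kept, giving fre.htmle.
unanchored=$(remap '(.*)\.md' 'fre.mde')

# Escaped dot plus $ anchor: "fre.mde" no longer matches and stays unchanged...
anchored_miss=$(remap '(.*)\.md$' 'fre.mde')

# ...while a link that really ends in .md is rewritten as intended.
anchored_hit=$(remap '(.*)\.md$' 'https://lychee.cli.rs/guide.md')

# Prints fr.html, fre.htmle, fre.mde and https://lychee.cli.rs/guide.html
printf '%s\n' "$loose" "$unanchored" "$anchored_miss" "$anchored_hit"
```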

I leave it up to you to decide whether you want to add a remark about this.
In my opinion, we should keep it simple; people will probably find it easier to use --exclude to get rid of false positives anyway.

tgaff (Contributor, Author) commented Mar 21, 2024

The problem is the final * at the end. If it's removed, the command works as expected:

lychee --base https://lychee.cli.rs --remap '(.*).md $1.html' "src/content/docs/**/*.md"
Can you change that in your version, or was there a reason for the * at the end?

The lychee docs include some .mdx files, including the index page. I'm not sure how the Rust-based glob matcher works, but I wasn't able to use my shell knowledge to find a pattern that works with it. The best I found was to repeat the whole pattern, once per extension:

lychee --base https://lychee.cli.rs --remap '(.*)\.md $1.html' "src/content/docs/**/*.mdx" "src/content/docs/**/*.md"

I think this may be getting a bit far-afield from this documentation topic though. Perhaps we should drop the lychee docs example anyway, particularly given that it has an intentional link to a Markdown file that unintentionally fails because of the conversion. (No idea how to solve that either.)

mre (Member) commented Mar 21, 2024

The lychee docs include some .mdx files including the index page.

Ah! Got it. That makes sense. 👍

I'm not sure how the Rust-based glob matcher works, but I wasn't able to use my shell knowledge to find a pattern that works with it.

Yeah, tricky one. The glob crate does not support all the typical glob patterns; its documentation lists the full supported syntax.
Typically, one would use a pattern like .{md,mdx}, but brace alternatives aren't supported. The globset crate does support them, so we might want to migrate to it.

I touched on that here:
lycheeverse/lychee#284 (comment)
lycheeverse/lychee#418 (comment)

I think this may be getting a bit far-afield from this documentation topic though.

Fully agree.

Perhaps we should drop the lychee docs example anyway, particularly given that it has an intentional link to a Markdown file that unintentionally fails because of the conversion. (No idea how to solve that either.)

I kinda liked the lychee docs example. It's a very practical use case.
One way to fix the unintentional remap is to exclude the path:

lychee --base https://lychee.cli.rs  --remap '(.*)\.md $1.html' --exclude-path src/content/docs/CONTRIBUTING.md "src/content/docs/**/*.mdx" "src/content/docs/**/*.md"

If you like, you can update the command, or alternatively we drop that part. A middle ground would be to change --remap '(.*).md $1.html' to --remap '(.*)\.md $1.html' and keep the rest as it is. I leave that up to you. 😃

tgaff (Contributor, Author) commented Mar 22, 2024

OK, I'm on board with keeping the second part. I think it adds a practical, more complex example to the initial, simpler one and might actually end up being helpful to someone. I've updated it as you suggested.

mre (Member) commented Mar 22, 2024

Looks good. Thanks for adding the example. 😃

mre merged commit 60392b0 into lycheeverse:main on Mar 22, 2024
tgaff deleted the patch-1 branch on March 25, 2024 03:35