Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

lycheeignore ergonomics #418

Closed
mre opened this issue Dec 4, 2021 · 12 comments · Fixed by #623
Closed

lycheeignore ergonomics #418

mre opened this issue Dec 4, 2021 · 12 comments · Fixed by #623
Labels
help wanted Extra attention is needed request-for-comments

Comments

@mre
Copy link
Member

mre commented Dec 4, 2021

In refined-github/refined-github#5115, @fregante ran into an interesting issue.
The lychee call used was as follows:

lychee --verbose --no-progress --exclude-file "safari/Shared (App)/Base.lproj/Main.html" --exclude-file "distribution/options.html" -- '**/*.md' '**/*.html' '**/*.css' '**/*.ts' '**/*.tsx'

and this caused the following error:

Error: Compiled regex exceeds size limit of 10485760 bytes.

It took me a while to figure out what was going on.

The --exclude-file, was not used as a newline-separated list of regular expressions for filtering out links.
Instead in this case it was meant as a way to exclude a single file from getting checked.
However lychee interpreted that file as one... huge... regex. 🙈💥

Given that there currently is no way to exclude individual files, I wonder if we should reinterpret the --exclude-file setting as a way to literally exclude files. It would probably avoid that sort of confusion in the future.
Our ignore file is called .lycheeignore so if anything the naming of the param should match that.
git doesn't have a way to specify an ignore file from the cli at all, so perhaps we just make it a convention that the file is called .lycheeignore and simply have no way to add more.

Any objections? Otherwise I'd cut out a PR for that.

@lebensterben
Copy link
Member

Something like:

--exclude-file for regex/glob pattern
--exclude-file-fixed for exact match

This won't break existing workflows. (But this is a breaking change in API so version number should increment.)

@mre
Copy link
Member Author

mre commented Dec 5, 2021

--exclude-file should really have been called --ignore-file or never have existed at all.
Not a big fan of naming an option --exclude-file-fixed either as it sounds cumbersome to me. 😕 There might be a better, shorter name.
I would like to fix that wart before 1.0. My plan is to remove the option entirely and send PRs to all (public) repos of that feature on Github. We obviously can't control external users, but I think most people use it from the CLI and not inside any build pipelines outside Github anyway. We could add a warning that the option got removed when people try to use it.

@untitaker
Copy link
Collaborator

untitaker commented Dec 5, 2021

I don't think exclude vs ignore clears anything up. To me both can be understood as either directive: "ignore/exclude this file" -- or as "load this ignore/exclude-file"

How about:

  • --use-exclude-file/--use-ignore-file: Loads a list of regexes from given filepath
  • --exclude/--ignore: Exclude the file/URL given on the argument line -- this already exists, does it cover OPs usecase?

--exclude-file can stay as deprecated alias then.

Just deleting the feature ofc also works, upgrade path could be:

lychee $(cat ./my_exclude_file.txt | xargs -d $'\n' -n1 echo --exclude)

@mre
Copy link
Member Author

mre commented Dec 5, 2021

--exclude/--ignore: Exclude the file/URL given on the argument line -- this already exists, does it cover OPs usecase?

Arguments are treated as URL patterns, so no. At least not yet. We could add support for file:// URIs but that would clash with actual files that occur within the inputs.
We have to find a solution for that. Technically we could exclude files via multiple globs, but it might be unergonomic for simple use-cases.

Apart from that I like the suggestions.

@mre
Copy link
Member Author

mre commented Feb 7, 2022

To make some progress on this, here is a proposal:

  • --use-ignore-file: Point to an ignore file with patterns as in .lycheeignore.
    I prefer that over --use-exclude-file because it's aligned with the naming of the ignore file.

  • --exclude-path: Exclude path to file or directory from getting checked.
    All inputs matching the path will be exluded. The naming is still not ideal, but I hope to clear up the situation for future users by strictly separating exclude for input exclusions from ignore for ignoring URL within inputs, similar to the behavior of .gitignore.

  • ⚠️ --exclude-file: deprecate and print warning.
    ("[Deprecated]: To exclude entire paths, use --exclude-path; to ignore URLs within inputs, use --exclude <url> or add a .lycheeignore with patterns.

  • Future work: Add globset support, so that a user can alternatively do lychee './**/*.md' '!foo.md'.
    This will be a separate issue and PR, though.

@mre
Copy link
Member Author

mre commented Feb 7, 2022

I would first get a lot of feedback before changing this, because I want to avoid future confusion. Will probably try to get more eyeballs on this.

@mre
Copy link
Member Author

mre commented Feb 14, 2022

I'd like to move forward with the proposal as this issue keeps biting people. If anyone has any comments, please post them here or react with 👍 or 👎 . If there are no further reactions I would go ahead and create a PR with the changes.

@choldgraf
Copy link

I think the proposal in #418 (comment) is reasonable! Though I feel like it is giving the user some unnecessary complexity to remember when it is exclude vs. ignore. It seems arbitrary to me that:

  • If you pass URLs to ignore directly to the CLI, it is --exclude
  • If you pass a file with a list of those URLs, it is --use-ignore-file.

Since in this case exclude and ignore are synonyms, I believe.

Would it be too much to use the above proposal AND rename --exclude to --ignore? Then ignore is the only verb users need to remember for ignoring URLs, and has a clear connection to .lycheeignore, and exclude is reserved for excluding an entire file or folder.

@lebensterben
Copy link
Member

in my opinion the most important use case of lychee-bin is in CI.

And the preferred way of configuration is via a .toml file.

Directly passing arguments via CLI is secondary in my opinion. My only recommendation is to keep it similar to the interface of ripgrep (as I mentioned earlier) and probably also fd-find.

@mre
Copy link
Member Author

mre commented Feb 14, 2022

There are a lot of --exclude* commands like --exclude-link-local and --exclude-mail, which handle URLs. Would we have to rename them, too, for consistency?

A lot of bigger pipelines pass arguments via lychee-action's args flag and many of them also use --exclude*. See for instance opensearch and awesome-kubernetes. I don't think we should break those pipelines.

Maybe the separation between exclude and ignore is unnecessary. I'd say 99% of users would use .lycheeignore instead of passing a file via --exclude-file. We could just opt for deprecating the option and not having a way to specify any different filename. It works for git, too. If someone really wants it, we could add an overly verbose --overwrite-lycheeignore-file or so in the future.

That leaves us with

  • deprecate --exclude-file and print warning.
  • add --exclude-path

@choldgraf
Copy link

choldgraf commented Feb 14, 2022

That sounds reasonable to me :-) maybe deprecate it for now, and once it's removed if a bunch of people start complaining that they want to call it something other than .lycheeignore, it could be selectively added back after discussion? But for now I agree that it doesn't feel necessary. (apologies if I am suggesting this without knowing about some gigantic debate from 2 years ago or something 😬).

For what it's worth, I likely assumed that exclude-file meant "exclude this file from the build" precisely because I thought that .lycheeignore was "the one way" to provide URL exclusion patterns vis a file on disk.

@mre
Copy link
Member Author

mre commented May 29, 2022

The changes got merged to master now and will be published with the next release. 🥳 Thanks for the collaboration.

  • Deprecated --exclude-file, which is still available for now but will print a warning when being used.
  • Added --exclude-path.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
help wanted Extra attention is needed request-for-comments
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants