Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase richness of glob syntax? #3

Closed
epage opened this issue Apr 13, 2018 · 17 comments
Closed

Increase richness of glob syntax? #3

epage opened this issue Apr 13, 2018 · 17 comments

Comments

@epage
Copy link
Contributor

epage commented Apr 13, 2018

I'm working on a file system stager to help make lay out packages. Currently I'm using globwalk for a way to grab files by pattern.

In stager, I want to add exclusion support. I can add exclusions as an additional field but I got thinking, why not use gitignore syntax?

Example: ["*.rs", "!/tests/**/*.rs"] would find all rust source except in a test directory.

@epage
Copy link
Contributor Author

epage commented Apr 13, 2018

The idea is to take all paths returned by WalkDir, make them relative to the WalkDir root, and if there is a match from ignore::GitIgnore, then return it to the user

If this isn't something globwalk is interested in expanding into, I'll probably create a parallel crate globtrotter that does. In some respects, I'm hoping you want to keep globwalk simple because I like the name globtrotter ;).

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 13, 2018

I'm hesitant to implement this as you suggest since I've been careful not to inspect the strings passed to globset.

This might be achieved by using the filteriterator adapter

@epage
Copy link
Contributor Author

epage commented Apr 13, 2018

My thought isn't to use this (EDIT: ignore) on top of globset but to use ignore instead of globset.

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 13, 2018

I'm not sure it is possible to manually add glob-patterns to ignore without using an ignore file.

It is possible to use ignore, but it would still require usingglobset, and unless I've misunderstood something, it won't give you the features you want

@epage
Copy link
Contributor Author

epage commented Apr 13, 2018

I'm not sure it is possible to manually add glob-patterns to ignore without using an ignore file.

While the top level API of ignore deals with files on disk, the gitignore module works entirely in-memory. I'm already using it in cobalt for advanced blacklisting

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 13, 2018

Oh, I see, I'll look into it

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 19, 2018

I can't seem to make gitignore work with the existing API, and tbh I don't have much time to work on it right now.
You're welcome to open a PR but I don't think I'll be able to work on it.

Good day, :)

@epage
Copy link
Contributor Author

epage commented Apr 19, 2018

Oh, I was assuming I would until it sounded like you were going to jump on it. It might be a few days.

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 19, 2018 via email

@epage
Copy link
Contributor Author

epage commented Apr 19, 2018

I see two general options for using ignore

  • switch the API to GlobWalk::new(path).add(pattern) / GlobWalk::from_patterns(path, patterns)
  • Delay the turning GitignoreBuilder into Gitignore until it is time to iterate, requiring returning an error somewhere, possibly as one of the items from walking.

How do you feel about changing the API? Any preference on route?

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 19, 2018

I'd rather take the second option.

Maybe it's possible to replace the IntoIter impl with a build(self)->Result<Iter> or something.

@killercup I'd love to hear your 0.016€ (as of 21:10 19/04/2018 UTC)

@epage
Copy link
Contributor Author

epage commented Apr 19, 2018

So in looking at the different APIs and considering the trade offs, here are some random thoughts:

  • globwalk approach is good if a primary use case is cloning GlobWalk to apply it to a different path
  • I'm assuming globwalk is trying to match people's expectations coming from glob.
    • Except I'm assuming globwalk is only matching globs relative behavior but not absolute behavior.

      Return an iterator that produces all the Paths that match the given pattern, which may be absolute or relative to the current working directory.

    • This suggests there is an impedance mismatch in the two APIs (whether root and glob are distinct or combined) and trying to mirror globs API might mislead people trying to use it.
  • ignore seems to see the base dir as a required argument. globwalk assumes . is a reasonable default.
    • Will people coming across globwalk realize . is the default?
    • Is . common enough to be worth defaulting?

I personally lean towards globwalks original API / Gitignores API (::new(base)) rather than globwalks current API (base is defaulted).

I didn't even realize globwalks API changed until I started implementing this. I got confused; I didn't remember the API looking like this :)

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 19, 2018

Yes, the GlobWalk API was changed to mirror that of glob's.

I agree in that the base case of "iterate from CWD" is unlikely and probably a bad practice.
Personally I don't see myself using it when writing a CLI application.

Saying that, I'd rather we keep enough of the current API to be mostly a drop in replacement for glob. I wouldn't mind for a change as long as the freestanding glob function remains the same.

@epage
Copy link
Contributor Author

epage commented Apr 19, 2018

I'd rather we keep enough of the current API to be mostly a drop in replacement for glob

Except a drop-in replacement for globs relative or absolute behavior? Or do you plan to find a way to attempt to do both?

@Gilnaa
Copy link
Owner

Gilnaa commented Apr 19, 2018

I've just realised I'm making no sense. It's possible the have glob implemented by means of the API you propose, so I have no objectiond really  ¯_(ツ)_/¯

@killercup
Copy link
Contributor

@Gilnaa, I think I can even spare 0.019 CHF, I still have some from RustFest.

I'm a huge fan of the pattern syntax used in .gitignore files, which is also becoming more popular in other areas like file search in editors.

I'm not sure I get your discussion on glob-crate-like-API instead of a base path argument? Having a new as well as a new_from_cwd sounds like no big deal to me. Having both also makes it obvious what the default is.

@epage
Copy link
Contributor Author

epage commented Apr 19, 2018

In preparing my change, do you mind if its re-formatted with rustfmt 1.25 (dedicated commit, of course)? Or should I disable my editors auto-formatting?

epage added a commit to epage/globwalk that referenced this issue Apr 20, 2018
This change switches `globwalk` from basic glob syntax to using
gitignore syntax.

Benefits include
- Excluding content from being walked using `!`
- Controlling for depth with a leading `/`

Fixes Gilnaa#3

BREAKING CHANGES:
- API: `GlobWalk::new` and `GlobWalk::from_patterns` now accept the
  `base_dir` parameter again.
- Syntax: Some characters are now reserved for gitignore syntax.
@Gilnaa Gilnaa closed this as completed in #4 Apr 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants