FOLLOW should control traversal of symlinks, not return of them (#223)
* `FOLLOW` should control traversal of symlinks, not return of them

Some fixes to globmatch symlinks are also included

* Add more symlink cases for globmatch

* Windows fix

* Bump version to 10 as symlink change may be surprising
facelessuser authored Sep 24, 2024
1 parent 2939d2a commit 35e9cac
Showing 7 changed files with 56 additions and 30 deletions.
7 changes: 7 additions & 0 deletions docs/src/markdown/about/changelog.md
@@ -1,5 +1,12 @@
# Changelog

## 10.0

- **NEW**: Symlinks should not be traversed when `GLOBSTAR` is enabled unless `FOLLOW` is also enabled, but they
should still be matched. Prior to this change, symlinks were not traversed _and_ they were ignored from matching
which contradicts how Bash works, which is our general target.
- **FIX**: Fix some inconsistencies with `globmatch` and symlink handling when `REALPATH` is enabled.

## 9.0

- **NEW**: Remove deprecated function `glob.raw_escape`.
12 changes: 6 additions & 6 deletions docs/src/markdown/glob.md
@@ -432,7 +432,7 @@ False
If you would like for `globmatch` (or [`globfilter`](#globfilter)) to operate on your current filesystem directly,
simply pass in the [`REALPATH`](#realpath) flag. When enabled, the path under consideration will be analyzed and
will use that context to determine if the file exists, if it is a directory, does its context make sense compared to
what the pattern is looking vs the current working directory, or if it has symlinks that should not be matched by
what the pattern is looking vs the current working directory, or if it has symlinks that should not be traversed by
[`GLOBSTAR`](#globstar).

Here we use [`REALPATH`](#realpath) and can see that `globmatch` now knows that `doc` is a directory.
Expand Down Expand Up @@ -529,7 +529,7 @@ Path-like object input support is only available in Python 3.6+ as the path-like
Like [`globmatch`](#globmatch), `globfilter` does not operate directly on the file system, with all the caveats
associated. But you can enable the [`REALPATH`](#realpath) flag and `globfilter` will use the filesystem to gain
context such as: whether the file exists, whether it is a directory or not, or whether it has symlinks that should not
be matched by `GLOBSTAR`. See [`globmatch`](#globmatch) for examples.
be traversed by `GLOBSTAR`. See [`globmatch`](#globmatch) for examples.

/// new | New 5.1
- `root_dir` was added in 5.1.0.
Expand Down Expand Up @@ -754,8 +754,8 @@ file matches the excluded pattern. Essentially, it means if you use a pattern su
patterns were given: `**` and `!*.md`, where `!*.md` is applied to the results of `**`, and `**` is specifically treated
as if [`GLOBSTAR`](#globstar) was enabled.

Dot files will not be returned unless [`DOTGLOB`](#dotglob) is enabled. Symlinks will also be ignored in the return
unless [`FOLLOW`](#follow) is enabled.
Dot files will not be returned unless [`DOTGLOB`](#dotglob) is enabled. Symlinks will also not be traversed unless
[`FOLLOW`](#follow) is enabled.

#### `glob.MINUSNEGATE, glob.M` {: #minusnegate}

@@ -768,7 +768,7 @@ When `MINUSNEGATE` is used with [`NEGATE`](#negate), exclusion patterns are reco

#### `glob.FOLLOW, glob.L` {: #follow}

`FOLLOW` will cause [`GLOBSTAR`](#globstar) patterns (`**`) to match and traverse symlink directories.
`FOLLOW` will cause [`GLOBSTAR`](#globstar) patterns (`**`) to traverse symlink directories.

#### `glob.REALPATH, glob.P` {: #realpath}

@@ -784,7 +784,7 @@ file path for the given system it is running on. It will augment the patterns us
logic so that the path must meet the following in order to match:

- Path must exist.
- Directories that are symlinks will not be matched by [`GLOBSTAR`](#globstar) patterns (`**`) unless the
- Directories that are symlinks will not be traversed by [`GLOBSTAR`](#globstar) patterns (`**`) unless the
[`FOLLOW`](#follow) flag is enabled.
- When presented with a pattern where the match must be a directory, but the file path being compared doesn't indicate
the file is a directory with a trailing slash, the command will look at the filesystem to determine if it is a
4 changes: 2 additions & 2 deletions tests/test_glob.py
@@ -510,7 +510,7 @@ class Testglob(_TestGlob):
] if not can_symlink() else [
('',), ('aab',), ('aab', 'F'), ('a',), ('a', 'bcd'), ('a', 'bcd', 'EF'), ('a', 'bcd', 'efg'),
('a', 'bcd', 'efg', 'ha'), ('a', 'D'), ('aaa',), ('aaa', 'zzzF'), ('EF',), ('ZZZ',),
('sym1',), ('sym2',)
('sym1',), ('sym2',), ('sym3',)
],
glob.L
],
Expand Down Expand Up @@ -553,7 +553,7 @@ class Testglob(_TestGlob):
[
('EF',), ('ZZZ',), ('',)
] if not can_symlink() else [
('EF',), ('ZZZ',), ('',), ('sym1',), ('sym2',)
('EF',), ('ZZZ',), ('',), ('sym1',), ('sym2',), ('sym3',)
],
glob.N | glob.L
],
2 changes: 2 additions & 0 deletions tests/test_globmatch.py
@@ -1606,12 +1606,14 @@ def test_globmatch_symlink(self):

self.assertFalse(glob.globmatch(self.tempdir + '/sym1/a.txt', '**/*.txt}', flags=self.default_flags))
self.assertTrue(glob.globmatch(self.tempdir + '/a.txt', '**/*.txt', flags=self.default_flags))
self.assertTrue(glob.globmatch(self.tempdir + '/sym1/', '**', flags=self.default_flags))

def test_globmatch_follow_symlink(self):
"""Test `globmatch` with symlinks that we follow."""

self.assertTrue(glob.globmatch(self.tempdir + '/sym1/a.txt', '**/*.txt', flags=self.default_flags | glob.L))
self.assertTrue(glob.globmatch(self.tempdir + '/a.txt', '**/*.txt', flags=self.default_flags | glob.L))
self.assertTrue(glob.globmatch(self.tempdir + '/sym1/', '**', flags=self.default_flags))

def test_globmatch_trigger_symlink_cache(self):
"""Use a pattern that exercises the symlink cache."""
2 changes: 1 addition & 1 deletion wcmatch/__meta__.py
@@ -193,5 +193,5 @@ def parse_version(ver: str) -> Version:
return Version(major, minor, micro, release, pre, post, dev)


__version_info__ = Version(9, 0, 0, "final")
__version_info__ = Version(10, 0, 0, "final")
__version__ = __version_info__._get_canonical()
57 changes: 37 additions & 20 deletions wcmatch/_wcmatch.py
@@ -12,15 +12,30 @@
# Right half can return an empty set if not supported
SUPPORT_DIR_FD = {os.open, os.stat} <= os.supports_dir_fd and os.scandir in os.supports_fd


RE_WIN_MOUNT = (
re.compile(r'\\|[a-z]:(?:\\|$)', re.I),
re.compile(br'\\|[a-z]:(?:\\|$)', re.I)
re.compile(r'\\|/|[a-z]:(?:\\|/|$)', re.I),
re.compile(br'\\|/|[a-z]:(?:\\|/|$)', re.I)
)
RE_MOUNT = (
re.compile(r'/'),
re.compile(br'/')
)
RE_WIN_SPLIT = (
re.compile(r'\\|/'),
re.compile(br'\\|/')
)
RE_SPLIT = (
re.compile(r'/'),
re.compile(br'/')
)
RE_WIN_STRIP = (
r'\\/',
br'\\/'
)
RE_STRIP = (
r'/',
br'/'
)


class _Match(Generic[AnyStr]):
@@ -49,8 +64,7 @@ def _fs_match(
self,
pattern: Pattern[AnyStr],
filename: AnyStr,
is_dir: bool,
sep: AnyStr,
is_win: bool,
follow: bool,
symlinks: dict[tuple[int | None, AnyStr], bool],
root: AnyStr,
@@ -65,36 +79,37 @@ def _fs_match(
We only check for the symlink if we know we are looking at a directory.
And we only call `lstat` if we can't find it in the cache.
We know it's a directory if:
We know we need to check the directory if:
1. If the base is a directory, all parts are directories.
2. If we are not the last part of the `globstar`, the part is a directory.
3. If the base is a file, but the part is not at the end, it is a directory.
1. If the match has not reached the end of the path and directory is in `globstar` match.
2. Or the match is at the end of the path and the directory is not the last part of `globstar` match.
"""

matched = False
split = (RE_WIN_SPLIT if is_win else RE_SPLIT)[self.ptype] # type: Any
strip = (RE_WIN_STRIP if is_win else RE_STRIP)[self.ptype] # type: Any

end = len(filename)
end = len(filename) - 1
base = None
m = pattern.fullmatch(filename)
if m:
matched = True
# Lets look at the captured `globstar` groups and see if that part of the path
# contains symlinks.
if not follow:
last = len(m.groups())
try:
for i, star in enumerate(m.groups(), 1):
if star:
at_end = m.end(i) == end
parts = star.strip(sep).split(sep)
parts = split.split(star.strip(strip))
if base is None:
base = os.path.join(root, filename[:m.start(i)])
for part in parts:
last_part = len(parts)
for j, part in enumerate(parts, 1):
base = os.path.join(base, part)
key = (dir_fd, base)
if is_dir or i != last or not at_end:
if not at_end or (at_end and j != last_part):
is_link = symlinks.get(key, None)
if is_link is None:
if dir_fd is None:
@@ -125,13 +140,15 @@ def _match_real(
) -> bool:
"""Match real filename includes and excludes."""

temp = '\\' if util.platform() == "windows" else '/'
is_win = util.platform() == "windows"

if isinstance(self.filename, bytes):
sep = os.fsencode(temp)
sep = b'/'
is_dir = (RE_WIN_SPLIT if is_win else RE_SPLIT)[1].match(self.filename[-1:]) is not None
else:
sep = temp
sep = '/'
is_dir = (RE_WIN_SPLIT if is_win else RE_SPLIT)[0].match(self.filename[-1:]) is not None

is_dir = self.filename.endswith(sep)
try:
if dir_fd is None:
is_file_dir = os.path.isdir(os.path.join(root, self.filename))
@@ -153,14 +170,14 @@ def _match_real(

matched = False
for pattern in self.include:
if self._fs_match(pattern, filename, is_dir, sep, self.follow, symlinks, root, dir_fd):
if self._fs_match(pattern, filename, is_win, self.follow, symlinks, root, dir_fd):
matched = True
break

if matched:
if self.exclude:
for pattern in self.exclude:
if self._fs_match(pattern, filename, is_dir, sep, True, symlinks, root, dir_fd):
if self._fs_match(pattern, filename, is_win, True, symlinks, root, dir_fd):
matched = False
break
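
The directory-selection rule from the updated `_fs_match` docstring can be sketched in isolation. This is a minimal illustration, not the library's implementation: `parts_to_check` is a hypothetical helper, it handles only `str` input, and it joins parts with `/` rather than `os.path.join`. The separator patterns are modeled on `RE_WIN_SPLIT` / `RE_SPLIT` and the strip sets on `RE_WIN_STRIP` / `RE_STRIP` from this diff. It returns the cumulative sub-paths of a single `**` capture that would need a symlink check when `FOLLOW` is not enabled.

```python
from __future__ import annotations

import re

# Modeled on RE_WIN_SPLIT / RE_SPLIT and RE_WIN_STRIP / RE_STRIP (str variants only).
WIN_SPLIT = re.compile(r'\\|/')
POSIX_SPLIT = re.compile(r'/')
WIN_STRIP = '\\/'
POSIX_STRIP = '/'


def parts_to_check(star: str, at_end: bool, is_win: bool = False) -> list[str]:
    """Hypothetical helper: list the sub-paths of a `**` capture to test for symlinks.

    `star` is the text captured by one `**` group and `at_end` says whether that
    capture reaches the end of the matched path. Callers are assumed to skip
    empty captures, as the `if star:` guard in the diff does.
    """
    split = WIN_SPLIT if is_win else POSIX_SPLIT
    strip = WIN_STRIP if is_win else POSIX_STRIP
    parts = split.split(star.strip(strip))
    last_part = len(parts)

    base = ''
    selected = []
    for j, part in enumerate(parts, 1):
        base = part if not base else base + '/' + part
        # A part is traversed (so must not be a symlink) unless it is the final
        # component of a capture that ends the path; that one is only returned.
        if not at_end or j != last_part:
            selected.append(base)
    return selected
```

Under this sketch, a capture of `sym1/` that ends the path yields nothing to check, which is consistent with the new assertion that `'/sym1/'` matches `**` without `FOLLOW`. The same capture followed by `a.txt` yields `sym1`, so a symlinked `sym1` would still reject `sym1/a.txt`.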

2 changes: 1 addition & 1 deletion wcmatch/glob.py
@@ -665,7 +665,7 @@ def _glob_dir(

path = os.path.join(curdir, file)
follow = not is_link or self.follow_links
if (matcher is None and not hidden and (follow or not deep)) or (matcher and matcher(file)):
if (matcher is None and not hidden) or (matcher and matcher(file)):
yield path, is_dir

if deep and not hidden and is_dir and follow:
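
The effect of separating "yield" from "descend" in `_glob_dir` can be illustrated with a small standard-library walker. This is a sketch under stated assumptions, not wcmatch code: `walk` and `demo` are hypothetical names, hidden-file handling and pattern matching are omitted, and the demo assumes a platform where the process is allowed to create symlinks.

```python
from __future__ import annotations

import os
import tempfile
from typing import Iterator


def walk(root: str, follow: bool = False, rel: str = '') -> Iterator[str]:
    """Yield every entry below `root`; enter symlinked directories only if `follow`."""
    with os.scandir(os.path.join(root, rel)) as it:
        entries = sorted(it, key=lambda e: e.name)
    for entry in entries:
        path = os.path.join(rel, entry.name)
        is_link = entry.is_symlink()
        is_dir = entry.is_dir()  # follows symlinks, so a link to a directory is a directory
        # Returning the entry no longer depends on whether it is a symlink.
        yield path
        # Traversal is the only step gated by `follow`.
        if is_dir and (follow or not is_link):
            yield from walk(root, follow, path)


def demo() -> tuple[list[str], list[str]]:
    """Build `real/a.txt` plus a `sym1 -> real` link and walk it both ways."""
    with tempfile.TemporaryDirectory() as tmp:
        os.mkdir(os.path.join(tmp, 'real'))
        open(os.path.join(tmp, 'real', 'a.txt'), 'w').close()
        os.symlink(os.path.join(tmp, 'real'), os.path.join(tmp, 'sym1'), target_is_directory=True)
        return list(walk(tmp)), list(walk(tmp, follow=True))
```

In `demo`, the walk without `follow` still lists `sym1` but nothing beneath it, while the walk with `follow=True` also lists `sym1/a.txt`. The sketch does not guard against symlink cycles, so `follow=True` should only be used on trees known to be acyclic.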
