-
-
Notifications
You must be signed in to change notification settings - Fork 30.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[doc] os.walk() consider some symlinks as dirs instead of non-dirs #57179
Comments
Consider code: for (root, dirs, nondirs) in os.walk(path, followlinks=False):
print (nondirs) This code will not print symlinks that refer to some dir. I think it is the bug. In other words: If followlinks is True, we should consider some symlinks as dirs. If not, any symlink is the non-dir. Patch included. Also, please fix documentation about this nuance. |
Also, there is some mis-optimisation for followlinks=False: stat() and then lstat() will be called. Instead of one lstat(). Code may be rewritten as (but I don't know about cross-platform issues): if followlinks:
mode = os.stat(path).st_mode
else:
mode = os.lstat(path).st_mode
if stat.S_ISDIR(mode):
dirs.append(path)
else:
nondir.append(path) It will be much cleaner than current (or patched with my patch) implementation |
This behaviour came up recently when implementing os.fwalk() [1]. There are problems with all 3 possible approaches (list as dirs, list as files, don't list at all) when followlinks is False. Since all alternatives are potentially surprising, the current behaviour wins by default (as people will already have written their code to cope with that behaviour and there's no net gain in changing the default, since the desired treatment of such links will vary according to the task at hand). As a result, I'm converting this to a pure documentation issue - the os.walk() docs should definitely mention this subtlety. The behaviour won't be changing, though. |
I like the function as it is documented, i.e. "filenames is a list of the names of the non-directory files in dirpath.". This includes all symlinks (in the followlinks=False cast at least). I'd say not including symlinks to directories but symlinks to files is a magnitude more surprising than treating a symlink to a directory as a file. And if you consider this as a short comming of the documentation this isn't (IMHO) a subtlety. The (my?) intuition says: all entries of a root (apart from . and .. as documented) are included in either dirnames or filenames. Yes, changing behaviour here might break some code, but this applies to all changes. For some usecases it might be right to just skip over symlinks-to-dirs, but if it's not you have to opendir + read all root entries again in the loop to find all symlinks which effectively means reimplementing os.walk. |
I've updated os.walk() documentation to mention that *dirnames* list To imitate the other two cases:
for dirpath, dirnames, files in os.walk(top):
dirs = []
for name in dirnames:
(files if islink(join(dirpath, name)) else dirs).append(name)
dirnames = dirs
for dirpath, dirnames, files in os.walk(top):
dirnames[:] = [name for name in dirnames
if not islink(join(dirpath, name))] where islink = os.path.islink and join = os.path.join. I've uploaded the documentation patch. Please, review. |
(cherry picked from commit 0f498f1) Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
(cherry picked from commit 0f498f1) Co-authored-by: Stanley <46876382+slateny@users.noreply.github.com>
* main: (38 commits) pythongh-92886: make test_ast pass with -O (assertions off) (pythonGH-98058) pythongh-92886: make test_coroutines pass with -O (assertions off) (pythonGH-98060) pythongh-57179: Add note on symlinks for os.walk (python#94799) pythongh-94808: Fix regex on exotic platforms (python#98036) pythongh-90085: Remove vestigial -t and -c timeit options (python#94941) pythonGH-83901: Improve Signature.bind error message for missing keyword-only params (python#95347) pythongh-61105: Add default param, note on using cookiejar subclass (python#95427) pythongh-96288: Add a sentence to `os.mkdir`'s docstring. (python#96271) pythongh-96073: fix backticks in NEWS entry (pythonGH-98056) pythongh-92886: [clinic.py] raise exception on invalid input instead of assertion (pythonGH-98051) pythongh-97997: Add col_offset field to tokenizer and use that for AST nodes (python#98000) pythonGH-88968: Reject socket that is already used as a transport (python#98010) pythongh-96346: Use double caching for re._compile() (python#96347) pythongh-91708: Revert params note in urllib.parse.urlparse table (python#96699) pythongh-96265: Fix some formatting in faq/design.rst (python#96924) pythongh-73196: Add namespace/scope clarification for inheritance section (python#92840) pythongh-97646: Change `.js` and `.mjs` files mimetype to conform to RFC 9239 (python#97934) pythongh-97923: Always run Ubuntu SSL tests with others in CI (python#97940) pythongh-97956: Mention `generate_global_objects.py` in `AC How-To` (python#97957) pythongh-96959: Update HTTP links which are redirected to HTTPS (python#98039) ...
* main: (5519 commits) Minor edits to the Descriptor HowTo Guide (pythonGH-24901) Fix link to Lifecycle of a Pull Request in CONTRIBUTING (python#98102) pythonGH-94597: deprecate `SafeChildWatcher`, `FastChildWatcher` and `MultiLoopChildWatcher` child watchers (python#98089) Auto-cancel old builds when new commit pushed to branch (python#98009) pythongh-95011: Migrate syslog module to Argument Clinic (pythonGH-95012) pythongh-68686: Retire eptag ptag scripts (python#98064) pythongh-97922: Run the GC only on eval breaker (python#97920) GitHub Workflows security hardening (python#96492) Add `@ezio-melotti` as codeowner for `.github/`. (python#98079) pythongh-97913 Docs: Add walrus operator to the index (python#97921) [doc] Fix broken links to C extensions accelerating stdlib modules (python#96914) pythongh-97822: Fix http.server documentation reference to test() function (python#98027) pythongh-91052: Add PyDict_Unwatch for unwatching a dictionary (python#98055) pythonGH-98023: Change default child watcher to PidfdChildWatcher on supported systems (python#98024) pythonGH-94182: Run the PidfdChildWatcher on the running loop (python#94184) pythongh-92886: make test_ast pass with -O (assertions off) (pythonGH-98058) pythongh-92886: make test_coroutines pass with -O (assertions off) (pythonGH-98060) pythongh-57179: Add note on symlinks for os.walk (python#94799) pythongh-94808: Fix regex on exotic platforms (python#98036) pythongh-90085: Remove vestigial -t and -c timeit options (python#94941) ...
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: