gh-88569: add ntpath.isreserved()
#95486
In
Also, a name that ends with a space or dot should be reserved. The system's path normalization strips trailing spaces and dots from the final path component (e.g. "spam. . ." -> "spam"). Path normalization applies to all cases except opening "\\?\" extended paths. Also, the characters In file systems that support file streams, |
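As a rough illustration of the trailing-character rule described above, the following sketch emulates only the stripping step in plain Python. The helper names `strip_trailing_dots_and_spaces` and `has_reserved_ending` are hypothetical, and this is not how Windows or `ntpath` implements normalization; it only mirrors the "spam. . ." -> "spam" example.

```python
def strip_trailing_dots_and_spaces(name: str) -> str:
    """Emulate the stripping of trailing spaces and dots from a final component.

    Hypothetical helper: a plain-Python approximation, not a Windows API call.
    """
    return name.rstrip(". ")


def has_reserved_ending(name: str) -> bool:
    """Return True if the name ends with a space or a dot.

    "." and ".." are path components with their own meaning, so this sketch
    leaves them out rather than reporting them as having a reserved ending.
    """
    return name not in (".", "..") and name[-1:] in (".", " ")
```

A name for which `has_reserved_ending()` returns true would not round-trip through normal path handling, because the name that ends up being opened is the stripped one.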
Not sure if it's relevant to this function but Windows 11 has greatly simplified how reserved names work. Now |
In Windows 11, path normalization no longer special cases a DOS device name if it has an extension (e.g. "con.txt"). Also a DOS device name isn't special cased if it's the leaf component of a path -- except for the "NUL" device. For example:
However, since DOS device names are still special cased as unqualified names and still reserved by the SMB server, they should still be avoided as file names. For example, creating a file named "con" in the current working directory would have to use the path "./con". |
@eryksun what should |
Isn't that a cross-platform question? The names "." and ".." are reserved in POSIX and Windows. For ".." it always has to be exact. Otherwise trailing dots and spaces are stripped in Windows. For example:
|
That would make |
I'm referring to the base name, i.e. |
I like the idea, but I think there's an extraneous function call to take out. I would also like @eryksun to make sure we are copying over the right implementation details, or make any tweaks as necessary now if possible.
A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated. Once you have made the requested changes, please leave a comment on this pull request containing the phrase |
Thinking out loud: There's probably a distinction between reserved names and invalid names, analogous to the distinction between reserved Python keywords (
|
Does this mean that you don't want to reserve names that contain the wildcard characters What about names that are illegal to create because they're reserved in other contexts? That's the case when creating a file with a DOS device name on an SMB share. SMB disallows creating a file named "nul", for example, because it causes problems with accessing the file directly on the server (e.g. as "C:\share\nul"). What about base names that contain colons? For example, in an NTFS filesystem, "spam:00" creates a file named "spam" that contains a data stream named "00". That can surprise even experienced developers, particularly if they come from a POSIX background. In a FAT filesystem, "spam:00" is an invalid name. In a VboxSharedFolderFS (VirtualBox) filesystem with a Linux host, "spam:00" is allowed as the literal name. What about a name that changes when accessed because trailing dots and spaces are stripped? For example, "spam ." -> "spam". What about "." and ".." components in "\\?\" extended paths? In Windows, "." and ".." components are handled in the user-mode API. However, normalization is skipped for a "\\?\" extended path, so the filesystem is passed a path that contains "." and ".." components. The results can be dysfunctional and surprising. For example, FAT filesystems (volume "E:" in this example) allow creating regular files named "." and "..":
The first two are the required "." and ".." entries, which are directories (16) and have no short name. The last two are regular files with the archive attribute set (32) and legacy short names. Otherwise, FAT filesystems don't support opening paths with literal "." and ".." entries. The open fails with
NTFS fails all three of the above cases with |
@eryksun what's your view on the existing |
The As to reserved characters and "." and ".." components in extended paths, I'm just putting it out there for discussion. I'm less concerned about the five wildcard characters. They're always disallowed in base file names (but not stream names). A filesystem that didn't exclude them would be broken. Creating a file named "spam?.txt" isn't going to magically succeed in a surprising way. But a filesystem might allow |
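To make the character-level cases in this discussion concrete, here is a minimal sketch that flags the five wildcard characters (taken here to be `*`, `?`, `"`, `<` and `>`) and splits a base name at the first colon, as in the "spam:00" stream example. `WILDCARD_CHARS`, `find_wildcards` and `split_stream` are hypothetical names, and the split ignores the fuller "name:stream:type" syntax.

```python
# Assumption for illustration: the five wildcard characters are * ? " < >.
WILDCARD_CHARS = frozenset('*?"<>')


def find_wildcards(name: str) -> list:
    """Return the wildcard characters present in a base file name, sorted."""
    return sorted(WILDCARD_CHARS.intersection(name))


def split_stream(name: str) -> tuple:
    """Split "file:stream" at the first colon.

    Returns (file, stream), with stream set to None when there is no colon.
    This only models the naming convention; it does not touch a filesystem.
    """
    base, sep, stream = name.partition(":")
    return base, (stream if sep else None)
```

For example, `split_stream("spam:00")` yields `("spam", "00")`, which matches the NTFS interpretation described earlier, whereas a FAT volume would reject the same name outright.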
Co-authored-by: Eryk Sun <eryksun@gmail.com>
The following check would need to be removed from

```python
# UNC paths are never reserved.
self.assertIs(False, P('//my/share/nul/con/aux').is_reserved())
```

Though this entire pathlib test is redundant now. As you consolidate the implementations, you might want to remove redundant tests, and only keep tests that are specific to how |
Considering the rules about which paths are reserved have changed in recent Windows releases (e.g. With a suitable warning in the docs (e.g. "this is an approximation of Windows rules as of this Python release; actual paths may differ, and this function may be updated as changed rules become more widely used"), I think it's okay to have in ntpath, and would deprecate PurePath. |
Co-authored-by: Steve Dower <steve.dower@microsoft.com>
Even in Windows 11, the SMB redirector still reserves names of legacy DOS devices in the filename part of an open/create (e.g. NUL, CON, PRN, AUX, LPT1-LPT9, COM1-COM9). At least SMB has never reserved DOS device names with a file extension (e.g. "con.txt"). And at least it just denies access instead of trying to access a remote device. It would be nice if the next version of the SMB protocol allowed legacy DOS device names in the filename part of an open/create. |
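The SMB rule described in this comment can be sketched as a simple set lookup. This is only an illustration of the stated rule (a bare device name is reserved, a device name with an extension is not); `DOS_DEVICE_NAMES` and `is_bare_dos_device_name` are hypothetical names, and the sketch does not model trailing spaces, superscript digits, or anything else an SMB server might do.

```python
# The legacy DOS device names listed above: NUL, CON, PRN, AUX, LPT1-LPT9, COM1-COM9.
DOS_DEVICE_NAMES = frozenset(
    {"NUL", "CON", "PRN", "AUX"}
    | {f"LPT{n}" for n in range(1, 10)}
    | {f"COM{n}" for n in range(1, 10)}
)


def is_bare_dos_device_name(filename: str) -> bool:
    """Return True if filename is exactly a DOS device name, ignoring case.

    A name with an extension, such as "con.txt", is not matched, which
    mirrors the SMB behaviour described in the comment above.
    """
    return filename.upper() in DOS_DEVICE_NAMES
```

Under these assumptions, `is_bare_dos_device_name("nul")` is true while `is_bare_dos_device_name("con.txt")` is false.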
I don't have any other concerns with this. However, I think we should refer to |
@brettcannon Your objection was a while ago and I believe has been addressed. You don't need to re-review unless you want to, but I think we'd like an ack that you aren't blocking the PR anymore |
@barneygale I refreshed my review and I'm not blocking this. 🙂 |
os.path.isreserved()
ntpath.isreserved()
@zooba just checking you're happy for me to merge? |
Yeah, go ahead |
Thanks everyone for the help with this |
Add `ntpath.isreserved()`, which identifies reserved pathnames such as "NUL", "AUX" and "CON". Deprecate `pathlib.PurePath.is_reserved()`. --------- Co-authored-by: Eryk Sun <eryksun@gmail.com> Co-authored-by: Brett Cannon <brett@python.org> Co-authored-by: Steve Dower <steve.dower@microsoft.com>
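For callers that need to work both before and after this change, one possible approach is to prefer `ntpath.isreserved()` when it exists and fall back to the older pathlib method otherwise. The wrapper name `is_reserved_windows_path` is hypothetical, and the two underlying functions do not apply identical rules, so this is a compatibility sketch rather than a guarantee of equal results.

```python
import ntpath
import pathlib


def is_reserved_windows_path(path: str) -> bool:
    """Check a Windows-style path for reserved names on old and new Pythons."""
    checker = getattr(ntpath, "isreserved", None)
    if checker is not None:
        # Newer Pythons: use the function added by this pull request.
        return bool(checker(path))
    # Older Pythons: fall back to the pathlib method that this change deprecates.
    return pathlib.PureWindowsPath(path).is_reserved()
```

Because `ntpath` and `PureWindowsPath` are pure-path tools, the check can run on any platform, not only on Windows.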
When invoking `git` to find the configuration file path associated with the `git` installation itself, this sets `GIT_DIR` to a path that cannot be a `.git` directory for any repository, to keep `git config -l` from including any local scope entries in the output of the `git config -l ...` command that is used to find the origin for the first Git configuration variable.

Specifically, a path to the null device is used. This is `/dev/null` on Unix and `NUL` on Windows. This is not a directory, and when treated as a file it is always treated as empty: reading from it, if successful, reaches end-of-file immediately.

This problem is unlikely since GitoxideLabs#1523, which caused this `git` invocation to use a `/tmp`-like location (varying by system and environment) as its current working directory. Although the goal of that change was just to improve performance, it pretty much fixed the bug where local-scope configuration could be treated as installation-level configuration when no configuration variables are available from higher scopes.

This change further hardens against two edge cases:

- If the `git` command is an old and unpatched vulnerable version in which `safe.directory` is not yet implemented, or in which GHSA-j342-m5hw-rr3v or other vulnerabilities where `git` would perform operations on untrusted local repositories owned by other users are unpatched, then a `.git` subdirectory of a shared `/tmp` or `/tmp`-like directory could be created by another account, and its local configuration would still have been used. (This is not a bug in gitoxide per se; having vulnerable software installed that other software may use is inherently insecure. But it is nice to offer a small amount of protection against this when readily feasible.)
- If the `/tmp`-like location is a Git repository owned by the current user, then its local configuration would have been used.

Any path guaranteed to point to a nonexistent entry or one that is guaranteed to be (or to be treated as) an empty file or directory should be sufficient here. Using the null device, even though it is not directory-like, seems like a reasonably intuitive way to do it.

A note for Windows: There is more than one reasonable path to the null device. One is DOS-style relative path `NUL`, as used here. One of the others, which `NUL` in effect resolves to when opened, is the fully qualified Windows device namespace path `\\.\NUL`. I used the former here to ensure we avoid any situation where `git` would misinterpret a `\` in `\\.\NUL` in a POSIX-like fashion. This seems unlikely, and it could be looked into further if reasons surface to prefer `\\.\NUL`.

One possible reason to prefer `\\.\NUL` is that which names are treated as reserved legacy DOS device names changes from version to version of Windows, with Windows 11 treating some of them as ordinary filenames. However, while this affects names such as `CON`, it does not affect `NUL`, at least written unsuffixed. I'm not sure if any Microsoft documentation has yet been updated to explain this in detail, but see:

- dotnet/docs#41193
- python/cpython#95486 (comment)
- python/cpython#95486 (comment)

At least historically, it has been possible on Windows, though rare, for the null device to be absent. This was the case on Windows Fundamentals for Legacy PCs (WinFPE). Even if that somehow were ever to happen today, this usage should be okay, because attempting to open the device would still fail rather than open some other file (as may even be happening in Git for Windows already), the name `NUL` would still presumably be reserved (much as the names `COM?` where `?` is replaced with a Unicode superscript 1, 2, or 3 are reserved even though those devices don't really exist), and I think `git config -l` commands should still shrug off the error opening the file and give non-local-scope configuration, as it does when `GIT_DIR` is set to a nonexistent location.
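
The technique in this commit message can be sketched in Python, although the change it describes is not written in Python. The sketch sets `GIT_DIR` to the platform's null device via `os.devnull` and runs a plain `git config -l`. The function names are hypothetical and the exact arguments used by the real invocation are not reproduced here.

```python
import os
import subprocess


def env_with_null_git_dir(base_env=None):
    """Return a copy of the environment with GIT_DIR set to the null device.

    os.devnull is "/dev/null" on Unix and "nul" on Windows, which matches
    the relative DOS-style device name discussed above.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["GIT_DIR"] = os.devnull
    return env


def list_config_without_local_scope(git="git"):
    """Run `git config -l` with GIT_DIR pointing at the null device.

    Hypothetical helper: returns whatever git prints on stdout and does
    not raise if git exits with a nonzero status.
    """
    result = subprocess.run(
        [git, "config", "-l"],
        env=env_with_null_git_dir(),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

Copying the environment rather than mutating `os.environ` keeps the override local to the one child process.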