This is the ideal FileType on Windows. You may not like it, but this is what peak performance looks like. #47956
(rust_highfive has picked a reviewer for you, use r? to override) |
cc @roblourens Could you test this to determine whether this actually fixes it? |
src/libstd/sys/windows/fs.rs
Outdated
@@ -38,8 +38,9 @@ pub struct FileAttr { | |||
} | |||
|
|||
#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)] | |||
pub enum FileType { | |||
Dir, File, SymlinkFile, SymlinkDir, ReparsePoint, MountPoint, | |||
pub stuct FileType { |
struct
The fact that this had to be rewritten does not bode well
r? @BurntSushi (some random libs team member) |
Closed by accident because of github silliness. |
This commit fixes a bug on Windows where directory traversals were completely broken when attempting to scan OneDrive directories that use the "file on demand" strategy.

The specific problem was that Rust's standard library treats OneDrive directories as reparse points instead of directories, which causes methods like `FileType::is_file` and `FileType::is_dir` to always return false, even when retrieved via methods like `metadata` that purport to follow symbolic links.

We fix this by peppering our code with checks on the underlying file attributes exposed by Windows. We consider an entry a directory if and only if the directory bit is set on the attributes. We are careful to make sure that the code remains the same on non-Windows platforms.

Note that we also bump the dependency on `walkdir`, which contains a similar fix for its traversals.

This bug is recorded upstream: rust-lang/rust#46484
Upstream also has a pending PR: rust-lang/rust#47956
Fixes #705
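The attribute check described in this commit message could look roughly like the following. This is a minimal sketch and not the actual ripgrep or `walkdir` code: the helper name `is_dir` is hypothetical, and the only Windows API it relies on is `std::os::windows::fs::MetadataExt::file_attributes`.

```rust
use std::fs::Metadata;

/// Value of FILE_ATTRIBUTE_DIRECTORY from the Windows SDK headers.
#[cfg(windows)]
const FILE_ATTRIBUTE_DIRECTORY: u32 = 0x10;

/// Hypothetical helper: on Windows, an entry is a directory if and only if
/// the directory bit is set, regardless of whether it is a reparse point.
#[cfg(windows)]
pub fn is_dir(md: &Metadata) -> bool {
    use std::os::windows::fs::MetadataExt;
    md.file_attributes() & FILE_ATTRIBUTE_DIRECTORY != 0
}

/// On every other platform the behavior stays exactly as before.
#[cfg(not(windows))]
pub fn is_dir(md: &Metadata) -> bool {
    md.is_dir()
}
```

Callers would then use `is_dir(&metadata)` wherever they previously called `metadata.is_dir()`, which is why the work-around touches so many call sites.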
cc @rust-lang/libs This PR fixes a pretty bad bug that impacts any program on Windows that interacts with the file system and OneDrive directories. This was originally reported to me via the VS code folks on ripgrep: BurntSushi/ripgrep#705 --- Basically, the issue here is that OneDrive directories are reported as reparse points, and I managed to work around this bug, but the work-around is gnarly. You basically need to replace all With all that said, I am by no means a Windows person, so I don't actually know whether this PR is correct. But, it is consistent with my own fix to the problem (which was helpfully informed by @retep998 and @roblourens). |
I'm running short on time, but I'm now doubting this solution. I think @retep998 mentioned this to me on IRC, but the problem here is that this change will cause any entry that is a directory to return true for This logic change, for example, ended up requiring changes like this in |
The problem is a fundamental difference in how Windows and Unix handle these sorts of attributes. On Windows, whether a given path is a file or a directory is a completely separate question from whether it is a reparse point, whereas on Unix, files, directories, and symbolic links are all mutually exclusive. Normally this wouldn't matter so much because you'd typically only have to deal with something being a reparse point or symbolic link when you specifically ask for |
I like how matching the attributes is folded into the functions here. But I don't agree this is the best solution, as it pretty heavily changes the existing behavior. Yes, it matches better with the Windows model of things, but makes it harder to write something cross-platform. What do you think about only changing the final match condition, so that it reports a reparse point as a directory or file if the reparse point is not recognized as a symlink or directory junction? I think that would also solve the bug with OneDrive, and not be a breaking change. |
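The alternative suggested here (keep the enum, change only the fallback arm) might be sketched as follows. This is a hypothetical reconstruction for illustration, not the code in `src/libstd/sys/windows/fs.rs`: the function name `classify` and the trimmed-down variant list are assumptions, while the constant values are the documented Windows SDK ones.

```rust
const FILE_ATTRIBUTE_DIRECTORY: u32 = 0x10;
const FILE_ATTRIBUTE_REPARSE_POINT: u32 = 0x400;
const IO_REPARSE_TAG_MOUNT_POINT: u32 = 0xA000_0003;
const IO_REPARSE_TAG_SYMLINK: u32 = 0xA000_000C;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
pub enum FileType {
    Dir,
    File,
    SymlinkFile,
    SymlinkDir,
    MountPoint,
}

/// Hypothetical classifier: recognized link-like reparse tags keep their
/// special variants; anything else falls back to plain Dir or File.
pub fn classify(attrs: u32, reparse_tag: u32) -> FileType {
    let is_dir = attrs & FILE_ATTRIBUTE_DIRECTORY != 0;
    let is_reparse = attrs & FILE_ATTRIBUTE_REPARSE_POINT != 0;
    match (is_dir, is_reparse, reparse_tag) {
        (false, true, IO_REPARSE_TAG_SYMLINK) => FileType::SymlinkFile,
        (true, true, IO_REPARSE_TAG_SYMLINK) => FileType::SymlinkDir,
        (true, true, IO_REPARSE_TAG_MOUNT_POINT) => FileType::MountPoint,
        // Fallback: no reparse point, or an unrecognized one. Only the
        // directory bit decides the result.
        (true, _, _) => FileType::Dir,
        (false, _, _) => FileType::File,
    }
}
```

Under this sketch a directory carrying an unrecognized reparse tag is reported as `Dir`, while existing symlink and junction handling is left alone.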
@pitdicker Interesting. I also believe that would solve the OneDrive bug. |
What if there is a reparse point which does act like a symbolic link, but is not recognized as a symlink or directory junction? What would happen if someone called I think the best we could do is change how we construct |
@retep998 So I still feel like we're missing something here. If a reparse point is considered a symlink, and OneDrive directories are reparse points, then that means we're treating OneDrive directories as symlinks. But that doesn't seem right to me? They aren't necessarily symlinked to something else on the file system. For example, if I were removing a OneDrive directory, I would expect it to recursively descend and remove it, whereas I wouldn't expect that behavior with symlinks (unless the routine was specifically instructed to follow symlinks). What else are reparse points used for on Windows? |
(This is the first time I've read about this but I believe this summary is correct) @BurntSushi The good news is that the surrogate bit in the reparse tag specifies if "the file or directory represents another named entity in the system". That sounds like symlink behavior to me! I made a program to test this behavior. Here are the results on my computer with Files On-Demand enabled:
So the flag works as I'd expect. Therefore, I'd argue the correct way to implement

fn IsReparseTagNameSurrogate(reparse_tag: ULONG) -> bool {
    (reparse_tag & 0x20000000) == 0x20000000
}
pub fn is_symlink(&self) -> bool {
    self.is_reparse_point() && IsReparseTagNameSurrogate(self.reparse_tag)
}

Edit: |
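To make the surrogate-bit test concrete, here is a small standalone sketch applied to three well-known tag values from the Windows SDK. The function and constant names `is_name_surrogate` and `NAME_SURROGATE_BIT` are illustrative, and treating `IO_REPARSE_TAG_CLOUD` as representative of what OneDrive Files On-Demand uses is an assumption.

```rust
/// Bit 29 of a reparse tag: set when the entry stands in for another
/// named entity (the "name surrogate" bit).
const NAME_SURROGATE_BIT: u32 = 0x2000_0000;

// Tag values as defined in the Windows SDK headers.
const IO_REPARSE_TAG_MOUNT_POINT: u32 = 0xA000_0003;
const IO_REPARSE_TAG_SYMLINK: u32 = 0xA000_000C;
// Assumed to be representative of cloud-backed (OneDrive-style) entries.
const IO_REPARSE_TAG_CLOUD: u32 = 0x9000_001A;

/// Returns true if the reparse tag marks a name surrogate.
pub fn is_name_surrogate(reparse_tag: u32) -> bool {
    reparse_tag & NAME_SURROGATE_BIT != 0
}
```

The symlink and mount-point tags both start with `0xA`, so bit 29 is set and they count as surrogates. The cloud tag starts with `0x9`, so the bit is clear and the entry is treated as an ordinary file or directory.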
@mattico Nice find! If that is indeed the case then perhaps that is the secret sauce we need to handle reparse points. @roblourens @retep998 What do you think? |
@roblourens I disagree that mount points need to be handled differently than symlinks. Mount Points are basically identical to Junctions, except that mount points can point into volumes that aren't mounted with a drive letter. The documentation indicates that they were added to aid users who are using more volumes than can be assigned to 26 drive letters. I'd argue, therefore, that the purpose of Mount Points is not to be "a junction except that applications will pretend it is a regular directory", but to be "a junction, but I ran out of drive letters". Second, I believe that (anyway, I'm not super concerned about this because NTFS mount points are rarely used so as long as it's documented... meh)
Note that, despite being |
According to POSIX, to get the file type, you can mask In addition, such posixy platforms have a single function for creating symlinks. There is no distinction between a directory symlink and a file symlink in the API (unlike Windows where |
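For comparison, the POSIX-style masking mentioned in this comment can be sketched like this. The numeric constants are the values used on Linux (POSIX names the macros but does not fix their values), and `PosixKind` and `posix_kind` are hypothetical names.

```rust
// File-type bits of st_mode, with the values used on Linux.
const S_IFMT: u32 = 0o170000; // mask selecting the file-type field
const S_IFDIR: u32 = 0o040000;
const S_IFREG: u32 = 0o100000;
const S_IFLNK: u32 = 0o120000;

#[derive(Debug, PartialEq, Eq)]
pub enum PosixKind {
    Dir,
    Regular,
    Symlink,
    Other,
}

/// The file type is a single field, so exactly one kind can match:
/// a symlink can never also be reported as a directory or a regular file.
pub fn posix_kind(st_mode: u32) -> PosixKind {
    match st_mode & S_IFMT {
        S_IFDIR => PosixKind::Dir,
        S_IFREG => PosixKind::Regular,
        S_IFLNK => PosixKind::Symlink,
        _ => PosixKind::Other,
    }
}
```

This mutual exclusivity is the property that Windows attributes lack, since the directory bit and the reparse-point bit are independent flags.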
Ping from triage, @BurntSushi ! |
@rust-lang/libs I'd like to nominate this PR for merging, since I think it is in a good place. Allow me to briefly summarize this PR. An issue was reported recently on Windows where So what does this PR do? It effectively rewrites The reason why this issue is coming up now is because it seems like there was a relatively recent change to cause OneDrive's "files on demand" feature to use reparse points. You can see that others are also grappling with a similar issue:
In light of the above I think we are on very solid ground when we say that the existing behavior is incorrect and should be fixed. Applications and libraries can work around it (like I have), but it is a monstrous pain. :-) |
Thanks for the extensive report @BurntSushi! Out of curiosity, do you have an example of situations that don't want this sort of logic? Are there example programs which will break as a result of this change, for example relying on the fact that I'm inclined to merge given the fallout here, but would just like to understand the potential fallout here! |
@alexcrichton I actually can't think of any case (and haven't explicitly come across one), but I think someone more familiar with Windows should probably answer that. cc @roblourens @retep998 @mattico |
(I'm going to equate "use of a non-surrogate reparse point" with "the OneDrive folder" for expediency) I expect these functions are mostly used for recursive directory traversal. Users may have noticed that the OneDrive folder is special and isn't affected by these methods. I can imagine some cases where this behavior could be relied upon:
The deletion scenario is slightly scary because it opens the door to data loss. I don't think it's an issue in practice because:
|
I added an unstable This might need an FCP to be added, along with a tracking issue. |
@retep998 If there is any controversy around |
In this case I argue software should be paying attention to attributes like |
Ok thanks for the info @mattico! This seems fine to land by me |
@alexcrichton I think an unstable API was added to this since I last commented. Should we get tickboxes from the std team to land it? (I think the new unstable API is probably uncontroversial and is a Windows specific API.) |
@BurntSushi nah unstable APIs can land at any time, I think this may want to hold off on waiting for beta to branch (to help give it maximal time to bake on nightly), but other than that I think this can be r+'d whenever. |
@bors r+ |
📌 Commit 9269e83 has been approved by |
Thanks again @retep998! 🎉 |
This is the ideal FileType on Windows. You may not like it, but this is what peak performance looks like.

Theoretically this would fix #46484

The current iteration of this PR should not cause existing code to break, but instead merely improves handling around reparse points. Specifically...

* Reparse points are considered to be symbolic links if they have the name surrogate bit set. Name surrogates are reparse points that effectively act like symbolic links, redirecting you to a different directory/file. By checking for this bit instead of specific tags, we become much more general in our handling of reparse points, including those added by third parties.
* If something is a reparse point but does not have the name surrogate bit set, then we ignore the fact that it is a reparse point because it is actually a file or directory directly there, despite having additional handling by drivers due to the reparse point.
* For everything which is not a symbolic link (including non-surrogate reparse points) we report whether it is a directory or a file based on the presence of the directory attribute bit.
* Notably this still preserves the invariant that when `is_symlink` returns `true`, both `is_dir` and `is_file` will return `false`. The potential for breakage was far too high.
* Adds an unstable `FileTypeExt` to allow users to determine whether a symbolic link is a directory or a file, since `FileType` by design is incapable of reporting this information.
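The rules listed in this description can be put together in a small platform-independent sketch. It is an illustration under stated assumptions rather than the code merged into libstd: the constructor `new` and the private helpers are hypothetical, and `is_symlink_dir` / `is_symlink_file` are written as inherent methods here, whereas the PR exposes that information through the `FileTypeExt` trait.

```rust
const FILE_ATTRIBUTE_DIRECTORY: u32 = 0x10;
const FILE_ATTRIBUTE_REPARSE_POINT: u32 = 0x400;
const NAME_SURROGATE_BIT: u32 = 0x2000_0000;

#[derive(Copy, Clone, PartialEq, Eq, Hash, Debug)]
pub struct FileType {
    attributes: u32,
    reparse_tag: u32,
}

impl FileType {
    /// Hypothetical constructor taking the raw attribute bits and reparse tag.
    pub fn new(attributes: u32, reparse_tag: u32) -> FileType {
        FileType { attributes, reparse_tag }
    }

    fn is_directory(&self) -> bool {
        self.attributes & FILE_ATTRIBUTE_DIRECTORY != 0
    }

    fn is_reparse_point(&self) -> bool {
        self.attributes & FILE_ATTRIBUTE_REPARSE_POINT != 0
    }

    /// A symlink is a reparse point whose tag has the name surrogate bit set.
    pub fn is_symlink(&self) -> bool {
        self.is_reparse_point() && self.reparse_tag & NAME_SURROGATE_BIT != 0
    }

    /// Non-symlinks are classified by the directory bit alone.
    pub fn is_dir(&self) -> bool {
        !self.is_symlink() && self.is_directory()
    }

    pub fn is_file(&self) -> bool {
        !self.is_symlink() && !self.is_directory()
    }

    /// Stand-ins for the extension-trait queries on symlink kind.
    pub fn is_symlink_dir(&self) -> bool {
        self.is_symlink() && self.is_directory()
    }

    pub fn is_symlink_file(&self) -> bool {
        self.is_symlink() && !self.is_directory()
    }
}
```

With this shape, a directory carrying a non-surrogate tag (such as a cloud-files tag) answers `is_dir() == true`, and for any symlink both `is_dir()` and `is_file()` stay `false`, which is the invariant the description calls out.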
☀️ Test successful - status-appveyor, status-travis |
Can this be marked stable? I need it when copying around Windows directory trees, to be able to correctly re-create symlinks. The alternative approach is to check the type of the link target, but that fails or gives the wrong result if:
|