Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

path.push() should work as expected on windows verbatim paths #89270

Merged
merged 1 commit into from
Oct 5, 2021

Conversation

seanyoung
Copy link
Contributor

On Windows, std::fs::canonicalize() returns an so-called UNC path. UNC paths differ with regular paths because:

  • This type of path can much longer than a non-UNC path (32k vs 260 characters).
  • The prefix for a UNC path is Component::Prefix(Prefix::DiskVerbatim(..)))
  • No / is allowed
  • No . is allowed
  • No .. is allowed

Rust has poor handling of such paths. If you join a UNC path with a path with any of the above, then this will not work.

I've implemented a new method fn join_fold() which joins paths and also removes any . and .. from it, and replaces / with \ on Windows. Using this function it is possible to use UNC paths without issue. In addition, this function is useful on Linux too; paths can be appended without having to call canonicalize() to remove the . and ...

This PR needs test cases, which can I add. I hope this will a start of a discussion.

@rust-log-analyzer

This comment has been minimized.

@the8472
Copy link
Member

the8472 commented Sep 26, 2021

This mixes two orthogonal functions join and an often-proposed but currently not implemented normalize. What's the benefit of putting them into a single method?

  • No . is allowed
  • No .. is allowed

The microsoft docs say that those actually are allowed in paths with the verbatim prefix and will access files named . and .. instead of traversing directories. So interpreting those differently can have the opposite of the intended effect of using the verbatim prefix.

@ChrisDenton
Copy link
Member

I think it would be great to have something that addresses this but there I there should perhaps be a holistic look at Windows path handling in Rust before adding adhoc methods as there are currently numerous open issues with paths. Discussions like this are probably best had on the internals forum or Zulip.

Personally speaking, my thinking at the moment is that push should "just work" for verbatim paths, with the path being pushed normalized according to platform conventions as this is what's wanted 99.999999% of the time (approx). There might have to be a push_verbatim for pushing paths that you want to include ., .. or / components. Though it would be very rarely used because Windows filesystems do not allow these as file names (and indeed they would screw up non-verbatim paths if they were allowed).

I guess path.push(normalize(subpath)) would also work but I'd prefer the common case to be simpler to do than the exceedingly rare case. Otherwise it's too easy to do the wrong thing unless you have knowledge of the arcane corners of Windows.

@rust-log-analyzer

This comment has been minimized.

@seanyoung seanyoung force-pushed the join_fold branch 2 times, most recently from 6bacc73 to 8575dbb Compare September 27, 2021 07:13
@rust-log-analyzer

This comment has been minimized.

@seanyoung
Copy link
Contributor Author

I think it would be great to have something that addresses this but there I there should perhaps be a holistic look at Windows path handling in Rust before adding adhoc methods as there are currently numerous open issues with paths. Discussions like this are probably best had on the internals forum or Zulip.

That is probably a good idea. In order to facilitate that discussion, I'm updating this PR.

Personally speaking, my thinking at the moment is that push should "just work" for verbatim paths, with the path being pushed normalized according to platform conventions as this is what's wanted 99.999999% of the time (approx). There might have to be a push_verbatim for pushing paths that you want to include ., .. or / components. Though it would be very rarely used because Windows filesystems do not allow these as file names (and indeed they would screw up non-verbatim paths if they were allowed).

This is a good point and I totally agree this. I've updated this PR to do exactly that.

I guess path.push(normalize(subpath)) would also work but I'd prefer the common case to be simpler to do than the exceedingly rare case. Otherwise it's too easy to do the wrong thing unless you have knowledge of the arcane corners of Windows.

path.push(normalize(subpath)) would not work. If subpath is ../foo then path.push() still needs to have special handling for verbatim paths.

@seanyoung
Copy link
Contributor Author

This mixes two orthogonal functions join and an often-proposed but currently not implemented normalize. What's the benefit of putting them into a single method?

normalize() would not solve the issue. I have changed the PR so now join() works with verbatim paths.

  • No . is allowed
  • No .. is allowed

The microsoft docs say that those actually are allowed in paths with the verbatim prefix and will access files named . and .. instead of traversing directories. So interpreting those differently can have the opposite of the intended effect of using the verbatim prefix.

I don't think this is true. I've tried creating . and .. files using unc paths and it does not work. Besides, even the folks at microsoft are not insane. This would be a terrible idea.

@the8472
Copy link
Member

the8472 commented Sep 27, 2021

normalize() would not solve the issue.

Can you explain how it wouldn't solve it?

I don't think this is true. I've tried creating . and .. files using unc paths and it does not work.

That might depend on the filesystem. I think it's similar on linux. On most filesystems . point will do the expected thing but with FUSE you can create funky directories where it behaves differently or doesn't exist at all.

@seanyoung
Copy link
Contributor Author

normalize() would not solve the issue.

Can you explain how it wouldn't solve it?

Considerpath.join(normalize("../foo")):

  • normalize() would return ../foo or ..\foo on Windows
  • path.join() would still have to process .. and remove the last directory from self before appending foo.

I don't think this is true. I've tried creating . and .. files using unc paths and it does not work.

That might depend on the filesystem. I think it's similar on linux. On most filesystems . point will do the expected thing but with FUSE you can create funky directories where it behaves differently or doesn't exist at all.

Your original point was about Windows, so I tried this on Windows:

use std::fs::File;
use std::io::Write;

fn main() {
    let mut file =
        File::create(r"\\?\..").expect("create dot file");
    file.write_all(b"Hello, world!").expect("write dot file");
}

(I tried ., .. with relative and absolute paths, does not work).

On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise. See https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/fs/namei.c#n2215

(BTW I'm a kernel maintainer)

@the8472
Copy link
Member

the8472 commented Sep 27, 2021

normalize() would not solve the issue.

Can you explain how it wouldn't solve it?

Considerpath.join(normalize("../foo")):

  • normalize() would return ../foo or ..\foo on Windows
  • path.join() would still have to process .. and remove the last directory from self before appending foo.

Then why not normalize(path.join("../foo"))?

On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise.

Maybe it's limited to dir entry enumeration, but I do recall getting surprising results when using FUSE

use std::fs::File;
use std::io::Write;

fn main() {
    let mut file =
        File::create(r"\\?\..").expect("create dot file");
    file.write_all(b"Hello, world!").expect("write dot file");
}

(I tried ., .. with relative and absolute paths, does not work).

But did you try with different filesystems? E.g. FAT? I don't have a windows machine here, otherwise I'd poke around myself.

I assume there is a reason why the documentation says:

Because it turns off automatic expansion of the path string, the "\?" prefix also allows the use of ".." and "." in the path names, which can be useful if you are attempting to perform operations on a file with these otherwise reserved relative path specifiers as part of the fully qualified path.

@seanyoung
Copy link
Contributor Author

Considerpath.join(normalize("../foo")):

  • normalize() would return ../foo or ..\foo on Windows
  • path.join() would still have to process .. and remove the last directory from self before appending foo.

Then why not normalize(path.join("../foo"))?

Because I assumed that path.join("../foo")) returns garbage, and you're using normalize to clean up the mess. I'm not so sure now, see below.

On Linux . and .. are special names which are handled in kernel space. You cannot create . or .. files, it does not depend on the filesystem, fuse or otherwise.

Maybe it's limited to dir entry enumeration, but I do recall getting surprising results when using FUSE

path walks are done for file creation too. It would be interesting to hear more about "surprising results".

use std::fs::File;
use std::io::Write;

fn main() {
    let mut file =
        File::create(r"\\?\..").expect("create dot file");
    file.write_all(b"Hello, world!").expect("write dot file");
}

(I tried ., .. with relative and absolute paths, does not work).

But did you try with different filesystems? E.g. FAT? I don't have a windows machine here, otherwise I'd poke around myself.

I assume there is a reason why the documentation says:

Because it turns off automatic expansion of the path string, the "?" prefix also allows the use of ".." and "." in the path names, which can be useful if you are attempting to perform operations on a file with these otherwise reserved relative path specifiers as part of the fully qualified path.

Now this is interesting - and I didn't expect this. With FAT and exFAT you can create files called . and ... They're only accessible using UNC paths, but they show up with dir. Normally dir does not show . or ...

If you then mount the file system on Linux, it cannot deal with the . or .. files. This is must be a bug in the linux exfat driver. It actually enumerates them as directories.

So now the question is, if verbatim paths permit .. and . on Windows and (ex)fat file systems, then what do we do? What does PathBuf::from("\\?\bar").join("../foo") mean?

  • \\?\bar\../foo
  • \\?\foo

I can see the appeal of the fn normalize() as you proposed.

@ChrisDenton
Copy link
Member

Btw, I wrote a thing about NT and Win32 paths, in case it helps.

The short version is NT paths can include anything except \ (including NUL). What I didn't mention there is that the NT kernel does only the minimum to resolve the relevant device, the rest of the path it simply passes to that device to resolve. The device can do anything with that sub-path.

So these NT paths are essentially impossible to handle sensibly with Rust's Path/PathBuf without special knowledge of the device being used. Maybe the device uses $ as the path separator? In that case push is already broken. Maybe . is a filename? In that case components is broken. Etc, etc.

I think Path methods should assume a "normal" filesystem path as the default and work as expected without the programmer needing special knowledge of the arcane workings of the NT kernel. Special methods could be added to address the special cases but understanding them should not be necessary for writing cross-platform software.

@seanyoung
Copy link
Contributor Author

https://lore.kernel.org/linux-fsdevel/20210927111948.GA16257@gofer.mess.org/T/

@seanyoung
Copy link
Contributor Author

Btw, I wrote a thing about NT and Win32 paths, in case it helps.

The short version is NT paths can include anything except \ (including NUL). What I didn't mention there is that the NT kernel does only the minimum to resolve the relevant device, the rest of the path it simply passes to that device to resolve. The device can do anything with that sub-path.

Very interesting.

So these NT paths are essentially impossible to handle sensibly with Rust's Path/PathBuf without special knowledge of the device being used. Maybe the device uses $ as the path separator? In that case push is already broken. Maybe . is a filename? In that case components is broken. Etc, etc.

I think Path methods should assume a "normal" filesystem path as the default and work as expected without the programmer needing special knowledge of the arcane workings of the NT kernel. Special methods could be added to address the special cases but understanding them should not be necessary for writing cross-platform software.

I concur completely. I would much rather see path join handling of . and .. do what you expect, and do what the rest of the path handling does. The fact that some windows filesystem drivers can do files called . and .. is an oddity which rust Path/PathBuf can do without.

If anyone wants to create a path like \\?\D:\foo\.., then that is still possible. I don't think it would surprise anyone that a Path/PathBuf join thinks this is a walk to the parent of foo.

@seanyoung seanyoung changed the title Add variant of std::path::Path join() method which removes .. and . path.join() should work as expected on windows verbatim paths Sep 29, 2021
@seanyoung seanyoung changed the title path.join() should work as expected on windows verbatim paths path.push() should work as expected on windows verbatim paths Sep 29, 2021
@rust-log-analyzer

This comment has been minimized.

@the8472
Copy link
Member

the8472 commented Sep 29, 2021

r? rust-lang/libs

@seanyoung seanyoung force-pushed the join_fold branch 2 times, most recently from 1c61cde to ddddc1a Compare September 29, 2021 18:25
Manishearth added a commit to Manishearth/rust that referenced this pull request Oct 5, 2021
path.push() should work as expected on windows verbatim paths

On Windows, std::fs::canonicalize() returns an so-called UNC path.  UNC paths differ with regular paths because:

- This type of path can much longer than a non-UNC path (32k vs 260 characters).
- The prefix for a UNC path is ``Component::Prefix(Prefix::DiskVerbatim(..)))``
- No `/` is allowed
- No `.` is allowed
- No `..` is allowed

Rust has poor handling of such paths. If you join a UNC path with a path with any of the above, then this will not work.

I've implemented a new method `fn join_fold()` which joins paths and also removes any `.` and `..` from it, and replaces `/` with `\` on Windows. Using this function it is possible to use UNC paths without issue. In addition, this function is useful on Linux too; paths can be appended without having to call `canonicalize()` to remove the `.` and `..`.

This PR needs test cases, which can I add. I hope this will a start of a discussion.
Manishearth added a commit to Manishearth/rust that referenced this pull request Oct 5, 2021
path.push() should work as expected on windows verbatim paths

On Windows, std::fs::canonicalize() returns an so-called UNC path.  UNC paths differ with regular paths because:

- This type of path can much longer than a non-UNC path (32k vs 260 characters).
- The prefix for a UNC path is ``Component::Prefix(Prefix::DiskVerbatim(..)))``
- No `/` is allowed
- No `.` is allowed
- No `..` is allowed

Rust has poor handling of such paths. If you join a UNC path with a path with any of the above, then this will not work.

I've implemented a new method `fn join_fold()` which joins paths and also removes any `.` and `..` from it, and replaces `/` with `\` on Windows. Using this function it is possible to use UNC paths without issue. In addition, this function is useful on Linux too; paths can be appended without having to call `canonicalize()` to remove the `.` and `..`.

This PR needs test cases, which can I add. I hope this will a start of a discussion.
Manishearth added a commit to Manishearth/rust that referenced this pull request Oct 5, 2021
path.push() should work as expected on windows verbatim paths

On Windows, std::fs::canonicalize() returns an so-called UNC path.  UNC paths differ with regular paths because:

- This type of path can much longer than a non-UNC path (32k vs 260 characters).
- The prefix for a UNC path is ``Component::Prefix(Prefix::DiskVerbatim(..)))``
- No `/` is allowed
- No `.` is allowed
- No `..` is allowed

Rust has poor handling of such paths. If you join a UNC path with a path with any of the above, then this will not work.

I've implemented a new method `fn join_fold()` which joins paths and also removes any `.` and `..` from it, and replaces `/` with `\` on Windows. Using this function it is possible to use UNC paths without issue. In addition, this function is useful on Linux too; paths can be appended without having to call `canonicalize()` to remove the `.` and `..`.

This PR needs test cases, which can I add. I hope this will a start of a discussion.
bors added a commit to rust-lang-ci/rust that referenced this pull request Oct 5, 2021
…ingjubilee

Rollup of 15 pull requests

Successful merges:

 - rust-lang#87993 (Stabilize try_reserve)
 - rust-lang#88090 (Perform type inference in range pattern)
 - rust-lang#88780 (Added abs_diff for integer types.)
 - rust-lang#89270 (path.push() should work as expected on windows verbatim paths)
 - rust-lang#89413 (Correctly handle supertraits for min_specialization)
 - rust-lang#89456 (Update to the final LLVM 13.0.0 release)
 - rust-lang#89466 (Fix bug with query modifier parsing)
 - rust-lang#89473 (Fix extra `non_snake_case` warning for shorthand field bindings)
 - rust-lang#89474 (rustdoc: Improve doctest pass's name and module's name)
 - rust-lang#89478 (Fixed numerus of error message)
 - rust-lang#89480 (Add test for issue 89118.)
 - rust-lang#89487 (Try to recover from a `=>` -> `=` or `->` typo in a match arm)
 - rust-lang#89494 (Deny `where` clauses on `auto` traits)
 - rust-lang#89511 (:arrow_up: rust-analyzer)
 - rust-lang#89536 (update Miri)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit 7aa9ce5 into rust-lang:master Oct 5, 2021
@rustbot rustbot added this to the 1.57.0 milestone Oct 5, 2021
@seanyoung seanyoung deleted the join_fold branch October 5, 2021 07:12
seanyoung added a commit to seanyoung/normpath that referenced this pull request Oct 8, 2021
Since rust-lang/rust#89270 rust's pathbuf
push() normalizes the path if it is a verbatim path, so that the result
is still a valid verbatim path. This involves parsing reconstructing the
path.

Appending a path separator using push("") will no longer work, and if
it did, it would be very inefficient.

rust-lang/rust#89658

Signed-off-by: Sean Young <sean@mess.org>
@rustbot rustbot added the P-high High priority label Oct 8, 2021
@apiraino apiraino removed the P-high High priority label Oct 8, 2021
@Eh2406
Copy link
Contributor

Eh2406 commented Oct 15, 2021

I am aware of these issues but not knowledgeable. Thank you for the good discussion, both here and on zulip! I think I am convinced that this is a positive change.

But, I am very surprised about the process here! Given this is a subtle change to windows behavior I would have expected knowledgeable parties to be cc'ed, like @ehuss or @retep998 maybe @kennykerr or @rylev.

Given that this is a (invisible to crater) breaking change to a stable API I would expect some team to need to do a FCP, maybe @rust-lang/libs-api. And for there to be a discussion of how we are going to let people know about the breaking change.

@retep998
Copy link
Member

I agree this is the right change to make, but yes there should have been an FCP given this was a behavioral change to a rather fundamental stable API.

kazatsuyu added a commit to kazatsuyu/cargo-mobile that referenced this pull request Nov 5, 2021
On Windows, Path::join may makes invalid UNC path.
It was fixed on nightly (rust-lang/rust#89270), but not yet on stable.
seanyoung added a commit to seanyoung/solang that referenced this pull request Dec 2, 2021
Since rust 1.57.0, path.push() works correctly on verbatim paths on
Windows. See rust-lang/rust#89270

Signed-off-by: Sean Young <sean@mess.org>
seanyoung added a commit to seanyoung/solang that referenced this pull request Dec 2, 2021
Since rust 1.57.0, path.push() works correctly on verbatim paths on
Windows. See rust-lang/rust#89270

Signed-off-by: Sean Young <sean@mess.org>
seanyoung added a commit to seanyoung/solang that referenced this pull request Dec 2, 2021
Since rust 1.57.0, path.push() works correctly on verbatim paths on
Windows. See rust-lang/rust#89270

Signed-off-by: Sean Young <sean@mess.org>
seanyoung added a commit to hyperledger-solang/solang that referenced this pull request Dec 2, 2021
Since rust 1.57.0, path.push() works correctly on verbatim paths on
Windows. See rust-lang/rust#89270

Signed-off-by: Sean Young <sean@mess.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants