proposal: path/filepath: add Resolve, replacing EvalSymlinks #42201

Closed
ianlancetaylor opened this issue Oct 25, 2020 · 43 comments

Comments

@ianlancetaylor
Contributor

ianlancetaylor commented Oct 25, 2020

This is a new proposal to replace #37113, which was closed for non-technical reasons.

Paraphrasing @rsc, the proposal is a new function in the path/filepath package:

// Resolve returns the path name as an absolute path that does not contain any symlinks.
// Resolve calls Clean on the result.
func Resolve(path string) string

The expectation is that on Unix systems this will be essentially filepath.Abs(filepath.EvalSymlinks(path)) and on Windows it will essentially acquire a handle for the path and call GetFinalPathNameByHandle.
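As a minimal sketch of the Unix side of that expectation: both filepath.EvalSymlinks and filepath.Abs return (string, error), so the composition needs explicit error handling. The helper name resolveUnixLike is hypothetical and is not the proposed API.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveUnixLike is a hypothetical helper, not the proposed filepath.Resolve.
// It evaluates symlinks first and then makes the result absolute; Abs also
// calls Clean on its result.
// Note: for a relative input, the working directory that Abs prepends is not
// itself passed through EvalSymlinks in this order.
func resolveUnixLike(path string) (string, error) {
	p, err := filepath.EvalSymlinks(path)
	if err != nil {
		return "", err
	}
	return filepath.Abs(p)
}

func main() {
	p, err := resolveUnixLike(".")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println(p)
}
```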

Objections to this approach (in my own words, apologies if I misrepresent some position):

  • We should instead make EvalSymlinks work better on Windows, such that filepath.Abs(filepath.EvalSymlinks(path)) will suffice on both Unix and Windows systems. This may involve changing EvalSymlinks to call GetFinalPathNameByHandle. However, any such change to EvalSymlinks on Windows may break programs that currently work on Windows.
  • It will be tempting to think that the proposed Resolve function will return a canonical path, but it will not, neither on Unix nor Windows (on Unix it will not be canonical due to hard links and multiple mounts). Therefore this function will mislead people into writing buggy programs. In particular, os.SameFile can return true for two different paths returned by Resolve.
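The hard-link case in the second objection can be reproduced with today's API. The sketch below assumes a filesystem that supports hard links in the temporary directory; hardLinkDemo is a hypothetical name, and filepath.EvalSymlinks stands in for the proposed Resolve.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// hardLinkDemo creates a file plus a hard link to it in a temporary directory,
// passes both names through filepath.EvalSymlinks, and reports whether the
// resulting paths differ and whether os.SameFile still treats them as one file.
func hardLinkDemo() (pathsDiffer, sameFile bool, err error) {
	dir, err := os.MkdirTemp("", "hardlink-demo")
	if err != nil {
		return false, false, err
	}
	defer os.RemoveAll(dir)

	a := filepath.Join(dir, "a.txt")
	b := filepath.Join(dir, "b.txt")
	if err := os.WriteFile(a, []byte("x"), 0o600); err != nil {
		return false, false, err
	}
	if err := os.Link(a, b); err != nil { // second name for the same file
		return false, false, err
	}

	ra, err := filepath.EvalSymlinks(a)
	if err != nil {
		return false, false, err
	}
	rb, err := filepath.EvalSymlinks(b)
	if err != nil {
		return false, false, err
	}
	fa, err := os.Stat(ra)
	if err != nil {
		return false, false, err
	}
	fb, err := os.Stat(rb)
	if err != nil {
		return false, false, err
	}
	return ra != rb, os.SameFile(fa, fb), nil
}

func main() {
	differ, same, err := hardLinkDemo()
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("resolved paths differ:", differ)
	fmt.Println("os.SameFile:", same)
}
```

Two distinct resolved names for one file is what makes "canonical" a misleading description of the result.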
@rasky
Member

rasky commented Oct 25, 2020

I have a couple of other reasons to vote against this proposal:

  • GetFinalPathNameByHandle has 4 different modes in which it produces different "canonical pathnames". The previous proposal originated from git-lfs, so the author needed the Go function to do exactly what git happens to do (in fact, even on Linux, git calls realpath(3), which I think should return mostly the same result as Abs(EvalSymlinks(path)), but if it doesn't, we're back to square one). I disagree that filepath.Resolve should have an implementation that is matched against what another project does; if somebody needs absolute compatibility with something else, they should probably reimplement it outside std.
  • Resolve and EvalSymlinks would basically be duplicates. It would be very hard to explain in the documentation why we need two similar functions, and it would confuse users. If Abs(EvalSymlinks(path)) on Windows is different from Resolve, we should document how and why, which would basically be an implementation detail with no real semantic meaning.

@mvdan
Member

mvdan commented Oct 25, 2020

It would be very hard to explain in the documentation why we need two similar functions, and it would confuse users.

To me, this is probably the most important drawback of the proposal.

@ericwj

ericwj commented Oct 25, 2020

I can add to @rasky's comment that some modes do not produce any result, depending on the link type and mode used, e.g. see #39786.

I also have objections, which I have documented in #40180, and those objections carry over to this proposal for the most part because EvalSymlinks is implemented with GetFinalPathNameByHandle. All links will be resolved using that for the implementation.

For these reasons I would like to see a proposal which allows application code to determine which links in a path get resolved, and for that to be ported to Unix instead of attempting to sort of port realpath(3) to Windows, such that go on Unix gains functionality that adheres to the objections listed in #40180, too.

@networkimprov

We need a new API because EvalSymlinks is broken on Windows (how is described in #37113 & #40180), and cannot be fixed without breaking some existing callers. EvalSymlinks would be deprecated.

I agree with Eric that it's not correct to always resolve all symlinks in a path, but I also think we must rely on the WinApi to implement this on Windows. I don't see a way to do both. We should include in the Resolve docs Eric's suggestions from #40180 re best practices for handling paths not created by the application which resolves them.

As to the GetFinalPathNameByHandle mode, we'd use VOLUME_NAME_NT "Return the path with the volume device path" (since drive letters are mount points). This produces \\?\Device\device_name\path\file.ext (pls correct me if not).

https://docs.microsoft.com/en-us/windows/win32/api/fileapi/nf-fileapi-getfinalpathnamebyhandlew
https://docs.microsoft.com/en-us/windows/win32/fileio/displaying-volume-paths

@ericwj

ericwj commented Oct 26, 2020

we must rely on the WinApi to implement this on Windows. I don't see a way to do both.

There are other ways besides GetFinalPathNameByHandle to resolve links, for example there is WNetGetUniversalName to translate paths with drive letters to UNC syntax. I'm sure research into the topic will yield similar APIs for reparse points, and the example above already hints at volume mount points and mounted folders. I don't think there are many other types of links at all, but I'm not sure.

This produces \\?\Device\device_name\path\file.ext (pls correct me if not).

I didn't get the prefix \\?, but that doesn't mean that it cannot appear. But such paths are useless or almost useless for applications and probably very confusing, definitely without additional APIs that accept such a path. Also the same file yields \Device\Mup\ComputerName\ShareName\file.ext when accessed over the SMB share \\ComputerName\ShareName, so it still doesn't canonicalize.

@networkimprov

What code did you find that doesn't prepend \\? to \Device\...?

Are you saying that some WinApis can't accept paths starting \\?\Device? If not, why would they be useless?

I assume a Volume GUID prefix wouldn't be helpful since that differs on two hosts with otherwise identical configuration.

I think we've concluded that there's no such thing as a "canonical" path.

@rasky
Member

rasky commented Oct 26, 2020

It would be very hard to explain in the documentation why we need two similar functions, and it would confuse users.

To me, this is probably the most important drawback of the proposal.

I note that this concern could be prevented if the proposal would also include a plan to eventually deprecate EvalSymlinks. I now notice that the issue title mentions "replacing EvalSymlinks", but there is no mention of "replacing" in the proposal text. @ianlancetaylor, can you please clarify this, by making the title and the description match one way or the other?

@ericwj

ericwj commented Oct 26, 2020

What code did you find that doesn't prepend \\? to \Device\...?

Are you saying that some WinApis can't accept paths starting \\?\Device? If not, why would they be useless?

They are not filesystem paths in the first place. They work with the APIs used in the example (volume management APIs), and perhaps they also work with other kernel APIs that deal with devices or with namespaces, but you cannot use them with regular file system APIs like Chdir and so forth. I ran the API and didn't get \\?\. I don't know whether it can ever appear.

I assume a Volume GUID prefix wouldn't be helpful since that differs on two hosts with otherwise identical configuration.

On GPT disks they are the unique identifier listed in the partition entry. They should be stable and unique unless we're talking about cloned disks. But there are paths for which asking for the volume GUID does not yield a result. The one case I am aware of is accessing files over SMB. I don't have a computer around that doesn't have/use UEFI/GPT and I haven't tested with external disks which are usually not GPT, or any disks formatted with the old school MBR partitioning scheme. If Windows offers a volume GUID at all, it'll be a constructed one which isn't unique and maybe not stable.

@networkimprov

So are you suggesting VOLUME_NAME_GUID? It seems to me that VOLUME_NAME_DOS is more generally useful.

@ericwj

ericwj commented Oct 26, 2020

So are you suggesting VOLUME_NAME_GUID? It seems to me that VOLUME_NAME_DOS is more generally useful.

No, I agree with your previous statement.

I think we've concluded that there's no such thing as a "canonical" path.

VOLUME_NAME_GUID and VOLUME_NAME_DOS produce errors depending on the link type and the configuration of the target volume or share (whether it is a share or has a DOS device name associated with it) and none of them can be compared with each other. VOLUME_NAME_NT I think always returns a result, but still doesn't work for files on the network - not even if the host is localhost (I tested that) - since the result reveals that the target was accessed over SMB, not its original location.

@networkimprov

Well if VOLUME_NAME_DOS returns an error, can't we fall back to VOLUME_NAME_GUID?

@ericwj

ericwj commented Oct 26, 2020

And if that returns an error too, we fall back to - what? WNetGetUniversalName?

@ericwj

ericwj commented Oct 26, 2020

Sure there will certainly be ways to make the implementation complex enough such that it can resolve any and all links that it can encounter. I would say write it down, see how it can be fitted in a universally useful API. I haven't seen anyone comment on how the Unix API's behave in the presence of Samba or other file systems - would be interesting to see a discussion about that. Whether to cater to the objections listed in #40180 is a mere choice imho. I am obviously in favour of doing that. Canonicalization will not be the end result either way however. Not on Windows.

@networkimprov

networkimprov commented Oct 26, 2020

Hm, can we use VOLUME_NAME_NT and either construct a \\?\UNC\... path for a network object, or look up the device to get a drive letter or volume GUID?

EDIT: Or try VOLUME_NAME_DOS, then try VOLUME_NAME_NT and construct UNC path, then try VOLUME_NAME_GUID.

Again, we're not concerned with "canonical" names, there's no such thing.
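The "construct a \\?\UNC\... path" step could be sketched as a plain string rewrite. This assumes that network objects always show up under \Device\Mup\ in VOLUME_NAME_NT results, which matches the results reported in this thread but is not verified beyond them; uncFromNT is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// uncFromNT rewrites an NT device path for a network file into \\?\UNC\ form.
// Assumption: network paths appear under \Device\Mup\, as in the results
// reported in this thread. The second result is false for any other input.
func uncFromNT(nt string) (string, bool) {
	const mup = `\Device\Mup\`
	if !strings.HasPrefix(nt, mup) {
		return "", false
	}
	return `\\?\UNC\` + nt[len(mup):], true
}

func main() {
	for _, p := range []string{
		`\Device\Mup\ComputerName\ShareName\file.ext`,
		`\Device\HarddiskVolume15\Source\Source.ico`,
	} {
		unc, ok := uncFromNT(p)
		fmt.Printf("%s -> %q (network: %v)\n", p, unc, ok)
	}
}
```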

@ericwj

ericwj commented Oct 26, 2020

I think it depends on the use case how links should be resolved. I get a different drive letter using VOLUME_NAME_DOS for some paths, for example. Drive letters are local to a session - or a desktop - such that already just if elevation is required for (part of) an app, they might in some situations first need to obtain a path without drive letter to be sure they can access the files their non-elevated copy refers to. Store apps can use this pattern of shipping a separate full-privilege executable which can do work that requires elevation, but they cannot themselves outright elevate and terminate the non-elevated copy like a desktop app can do. This is especially essential for SMB drive mappings. One more reason to make the API slightly less trivial in terms of formal declaration.

Apps shouldn't elevate and terminate the non-elevated copy, even if it is technically possible and very handy. This is a Certification requirement for Windows Desktop Apps: 9.2 Your app's main process must be run as a standard user (asInvoker).

@networkimprov

I don't think we'll get agreement here on a more sophisticated API, but feel free to suggest something. Otherwise...

Try VOLUME_NAME_GUID, then VOLUME_NAME_NT and construct UNC path.

If the app wants to separately lookup the volume GUID for a drive letter, so be it.

@ericwj

ericwj commented Oct 26, 2020

Well, without having done the research or testing for it, nor considering all situations where there might be different requirements for resolving links, I just make up these formal declarations as I go right now, with whatever names are descriptive, regardless of whether they are suitable or consistent:

// Returns the longest part of path that is a link, if there are links on path,
// or a suitable error if path is not valid, or does not contain any links.
func GetDeepestLinkPath(path string) (result string, err error)

// Returns the target of the link pointed to by path, if it points directly at a link,
// subject to the mode requested, or a suitable error if path is not valid, or is not itself a link.
func ResolveExactLinkPath(path string, mode ResolveMode) (result string, err error)

These could be primitives upon which an easier API could be built, something like:

// A callback function that allows applications to determine whether
// the target of the link pointed to by path should be resolved or not
// and if so, how.
type ShouldResolveFunc func(path string) bool/ResolveMode

// Resolves links on path if there exist links on path below root,
// subject to the mode requested and the return value of
// shouldResolve for each link that is encountered on path below root.
// Returns the target path obtained, or a suitable error if either root 
// or path are invalid, path is not within root, or link resolution fails.
// If the callback function shouldResolve is nil, path is returned.
// Applications should conservatively and consciously decide for each link
// whether it is to be resolved and implement a suitable shouldResolve 
// callback accordingly.
// The callback will receive the exact path to links on path
// in the order GetDeepestLinkPath encounters them, recursively.
// Its return value will determine whether and how links are resolved.
// Links may have been created to solve administrative problems 
// of which most applications should remain unaware.
// Most applications should only resolve specific links that they 
// require to resolve, use the result immediately, forget the result
// and never show the result to the user, unless they have specific
// information about the link that was resolved and whether their
// resolved target is stable and can be cached.
func Resolve(root, path string, mode ResolveMode, shouldResolve ShouldResolveFunc) (result string, err error)

Obviously I just list stuff for which I am not sure where it should be declared everywhere it makes sense instead of making any choices. ResolveMode could be some enumeration of flags or discrete values, depending on what emerges as being required, at the very least with something resembling None which would be the default return value for ShouldResolveFunc if it doesn't return bool.
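To make the callback idea concrete, here is a simplified, portable sketch built only on os.Lstat and os.Readlink. It uses the bool variant of ShouldResolveFunc, drops ResolveMode, and handles symlinks only; the names resolveSelective and demo are hypothetical, and the sketch neither follows chains of links nor checks that a target stays within root.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// ShouldResolveFunc is the bool variant of the callback described above.
type ShouldResolveFunc func(linkPath string) bool

// resolveSelective walks path below root one component at a time and replaces
// a symlink with its target only when shouldResolve approves. Simplifications:
// the target is not re-examined for further links, and nothing checks that
// the target remains inside root.
func resolveSelective(root, path string, shouldResolve ShouldResolveFunc) (string, error) {
	rel, err := filepath.Rel(root, path)
	if err != nil {
		return "", err
	}
	sep := string(filepath.Separator)
	if rel == ".." || strings.HasPrefix(rel, ".."+sep) {
		return "", fmt.Errorf("%q is not within %q", path, root)
	}
	if shouldResolve == nil {
		return path, nil // a nil callback returns path unchanged
	}
	cur := root
	for _, part := range strings.Split(rel, sep) {
		cur = filepath.Join(cur, part)
		fi, err := os.Lstat(cur)
		if err != nil {
			return "", err
		}
		if fi.Mode()&os.ModeSymlink == 0 || !shouldResolve(cur) {
			continue
		}
		target, err := os.Readlink(cur)
		if err != nil {
			return "", err
		}
		if !filepath.IsAbs(target) {
			target = filepath.Join(filepath.Dir(cur), target)
		}
		cur = target
	}
	return cur, nil
}

// demo builds root/real/f and root/link -> real, then resolves root/link/f
// twice: once approving every link and once refusing every link. The results
// are returned relative to root, with forward slashes.
func demo() (resolved, kept string, err error) {
	root, err := os.MkdirTemp("", "selective")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(root)
	if err := os.Mkdir(filepath.Join(root, "real"), 0o700); err != nil {
		return "", "", err
	}
	if err := os.WriteFile(filepath.Join(root, "real", "f"), nil, 0o600); err != nil {
		return "", "", err
	}
	if err := os.Symlink("real", filepath.Join(root, "link")); err != nil {
		return "", "", err
	}
	in := filepath.Join(root, "link", "f")
	yes, err := resolveSelective(root, in, func(string) bool { return true })
	if err != nil {
		return "", "", err
	}
	no, err := resolveSelective(root, in, func(string) bool { return false })
	if err != nil {
		return "", "", err
	}
	relYes, _ := filepath.Rel(root, yes)
	relNo, _ := filepath.Rel(root, no)
	return filepath.ToSlash(relYes), filepath.ToSlash(relNo), nil
}

func main() {
	resolved, kept, err := demo()
	fmt.Println(resolved, kept, err)
}
```

Passing the decision to the caller keeps the policy (which links to follow) out of the library, which is the point of the proposal above; a real implementation would also need the Windows link types discussed in this thread.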

@networkimprov

Can you add comments describing the behaviors of your API concept? And suggestions for implementing them on Windows?

@ericwj

ericwj commented Oct 27, 2020

It gets to be a bit wordy, but yeah, the world is a messy place. It doesn't give any ins and outs just yet even.

My earlier comment about performance applies here - strings don't carry (much) context or (any) proof of work done. Perhaps other types of arguments allow better performance and IDE's to help write code faster at the cost of (some) API 'complexity'. Types are good for both.

Suggestions for how to implement them are exact Windows APIs to resolve links. FindFirstFile first of all to get the reparse point tag if any. I haven't done research into how to use that subsequently in the cases where this indicates the path is a symbolic link, junction, or other reparse point that should be supported for resolution here. If the path has a drive letter, GetDriveType would be needed, followed by APIs such as WNetGetUniversalName, GetVolumeNameForVolumeMountPoint, and similar functions for other types of links. The latter works for reparse points that are mounted folders and for mapping drive letters, volume paths or mounted folders to the volume path; the former yields the path in UNC syntax for paths that start with an SMB drive mapping.

The shouldResolve callback may use (new proposal) new public API to great effect that classifies links in a portable way but with enough detail that it can discriminate how to resolve them as well. What that looks like depends on how things work on *nix and how e.g. can be determined that a path refers to a remote file over SMB or in the cloud.

How precisely this all works would have to be written down in code and tested, preferably reviewed by Windows experts, before deciding what the API actually will exactly have to look like.

@ericwj

ericwj commented Oct 27, 2020

I would even say before it is actually included in any standard library, it'll have to be battle tested, especially the part that tries to summarize application scenarios for resolving links and makes the right choices for each of those.

@networkimprov

The ShouldResolveFunc callback is an interesting idea! But I think the right place to prototype it is probably a more widely used language than Go (e.g. Python, Java, C#, C) where it would see more users and use cases, esp on Windows.

Re a simple filepath.Resolve API, what do you think of my last suggestion for VOLUME_NAME_x?

@ianlancetaylor
Contributor Author

Good point about looking at other languages.

Python has os.path.realpath(path) (https://docs.python.org/2/library/os.path.html#os.path.realpath).

Java has java.io.File.getCanonicalPath() (https://docs.oracle.com/javase/7/docs/api/java/io/File.html#getCanonicalPath()).

C++ has std::filesystem::canonical (https://en.cppreference.com/w/cpp/filesystem/canonical).

What do those functions do on Windows?

@ericwj

ericwj commented Oct 27, 2020

I think those have been ported from *nix, or in the case of C++ have taken POSIX semantics (hearsay), without the considerations we are having here, and will have the issues we identified.

Certainly the C++ version, which has been linked before, with the implementation actually here. It is pretty mangled code, but appears to use GetFinalPathNameByHandle and request VOLUME_NAME_DOS and if that fails VOLUME_NAME_NT, then strips both \\?\ and \\?\UNC\ prefixes and then adds the prefix \\?\GLOBALROOT if the result was obtained from VOLUME_NAME_NT. I'm not exactly sure what the consequences are or what LR"(\\?\GLOBALROOT)"sv exactly does - correct me if I am wrong.

Re a simple filepath.Resolve API, what do you think of my last suggestion for VOLUME_NAME_x?

If any attempt at canonicalizing is deemed necessary for code compatibility reasons, I would port the C++ version of it. It won't work to canonicalize files accessed over SMB, nor will it yield, afaik, paths that can be used with other file system APIs. It might almost always return some result that can at least be compared between local files and between remote files, but not between a remote file and the same file accessed locally, and it won't substitute drive letters with the imho better alternative of volume GUID paths. So I would actually have the implementation also try VOLUME_NAME_GUID.

PowerShell's Get-ChildItem aka dir shows link targets as well. It has the problem that it doesn't indicate in any way that hardlinks are links. The list in #40180 uses this to show what the result should be, but for hardlinks I believe I hardcoded it. This could be an issue with GetFinalPathNameByHandle itself, in which case this is a persistent problem with this approach.

@ericwj

ericwj commented Oct 27, 2020

I think the right place to prototype it is probably a more widely used language than Go (e.g. Python, Java, C#, C)

Perhaps I would be able to create a C# version, even to propose such a thing to become part of .NET, which would be awesome because it'd have to be reviewed by Microsoft themselves. If such a proposal would be deemed useful or recommended to expose to the general public in the first place. And if it is accepted, the cadence is one release per year in November - it could take 1 or 2 years before they get to it.

ADDITION: It would also have to be portable to a wide array of Linux versions, MacOS, probably Android, with that work also reviewed by Microsoft.

@networkimprov

Actually, it prepends \\?\GLOBALROOT to result of VOLUME_NAME_NT, and strips \\?\ otherwise, so the prefix is inconsistent. I guess long paths fail for *_NAME_DOS? From https://github.com/microsoft/STL/blob/master/stl/inc/filesystem#L3023

I believe we can yield a consistent, general-purpose result via \\?\GLOBALROOT\Device\device_name\path\file.ext -- which is most likely to be useful across hosts with identical configuration, unlike drive letters and volume GUIDs. An app can look up either of the latter from that value.

BTW hardlinks are not at issue here (but are one reason the term "canonical" isn't appropriate).

@tandr

tandr commented Oct 28, 2020

(very meta)
Folks, I think there is a bit of irony(?) hiding in there somewhere - you are discussing things about internals of Microsoft's software on platform that belongs to Microsoft. Why won't we ask someone from Microsoft? Would it be possible to tag some @microsoft folks to pipe in? Like, I don't know, @markrussinovich who might know a bit about Windows internals, or @oldnewthing (Raymond Chen) about what "canonical" might mean on Windows and what is the best way to go forward? (Sorry Mark and Raymond, your names came first to my mind - if you know anyone who might help please direct them here.)

@networkimprov

cc @jstarks who often comments here...

@networkimprov

networkimprov commented Oct 28, 2020

Eric, I believe the device name is configurable, and it would be ideal to get the same result for Resolve() when you move an application to a mirror or failover host with identical configuration. At the very least the device name is more readable than the volume GUID, and potentially more meaningful.

My understanding is that \\?\GLOBALROOT\Device\device_name\... is a valid path name for WinAPIs.

And we should consistently use the \\?\ prefix to be long-path safe.

https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-setvolumelabela

@tandr

tandr commented Oct 28, 2020

To use \\?\ as a path prefix I think (but I might be wrong) an executable needs to pack a manifest file to indicate that it can handle long paths.

(Re: #17835)

@ericwj

ericwj commented Oct 28, 2020

Well the STL strips it. It means 'do not parse' which means most API's shouldn't perform normalization or use forward slashes etc - which go is adept at ignoring, but it is what it is defined to mean. So unless the prefix is required it is prudent to remove them, such that a) go flakes less often and b) paths will be processed more smoothly and look nicer.

@ericwj

ericwj commented Oct 28, 2020

Eric, I believe

I just don't know. I would prefer volume GUID paths, technically and esthetically.

Wrt involving Microsoft... YEAH. Hire them if that's what it takes. Lots of work to be done.

@tandr

tandr commented Oct 28, 2020

Maybe of some help - https://github.com/golang/go/blob/master/src/os/path_windows.go#L131 has a very interesting function fixLongPath (that I had to reexport for our code using //go:linkname trickery...) that is kind of trying to do some (minor) long path normalization.

@networkimprov

Eric and I looked at the MS C++ STL canonical() implementation, and he ran tests for a variety of paths with each of the VOLUME_NAME_* options. I expect he'll post them below.

We agree that returning a path starting with a drive letter isn't at all "canonical" as a storage volume may not be mapped to one, and if it is, that can change anytime. The other options are...

  • a volume GUID \\?\Volume{...}\path\... (not applicable to SMB paths)
  • a UNC path \\?\UNC\host\path\... (only applicable to SMB paths)
  • an NT device name \\?\GLOBALROOT\Device\device_name\path\... (applicable to anything)

Eric and I agree that the Volume GUID is the most "canonical" for resources that have one, and that a UNC path should be returned otherwise. The implementation entails first trying GetFinalPathNameByHandle with VOLUME_NAME_GUID, and if that fails with ERROR_PATH_NOT_FOUND (which it does for SMB paths), trying again with VOLUME_NAME_DOS.
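The fallback order described above can be sketched without touching the Windows API by injecting the lookup. In this sketch finalPathFunc is a hypothetical stand-in for opening a handle and calling GetFinalPathNameByHandle, errPathNotFound stands in for ERROR_PATH_NOT_FOUND, and fakeLookup replays two results reported in this thread; only the flag values are taken from the Microsoft documentation.

```go
package main

import (
	"errors"
	"fmt"
)

// Flag values as documented for GetFinalPathNameByHandle.
const (
	volumeNameDOS  uint32 = 0x0
	volumeNameGUID uint32 = 0x1
)

// errPathNotFound stands in for ERROR_PATH_NOT_FOUND in this sketch.
var errPathNotFound = errors.New("the system cannot find the path specified")

// finalPathFunc is a hypothetical stand-in for opening a handle to path and
// calling GetFinalPathNameByHandle with the given flag.
type finalPathFunc func(path string, flag uint32) (string, error)

// resolveWith tries VOLUME_NAME_GUID first and retries with VOLUME_NAME_DOS
// only when the first attempt fails with errPathNotFound.
func resolveWith(lookup finalPathFunc, path string) (string, error) {
	p, err := lookup(path, volumeNameGUID)
	if errors.Is(err, errPathNotFound) {
		return lookup(path, volumeNameDOS)
	}
	return p, err
}

// fakeLookup replays two results reported in this thread; every other
// combination of path and flag yields errPathNotFound.
func fakeLookup(path string, flag uint32) (string, error) {
	type key struct {
		path string
		flag uint32
	}
	table := map[key]string{
		{`n:\SOURCE.ICO`, volumeNameDOS}:                     `\\?\UNC\Z68\UncShare\Source.ico`,
		{`c:\USERS\ERIC\SOURCE\SOURCE.ICO`, volumeNameGUID}: `\\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\Source\Source.ico`,
	}
	if p, ok := table[key{path, flag}]; ok {
		return p, nil
	}
	return "", errPathNotFound
}

func main() {
	for _, p := range []string{`n:\SOURCE.ICO`, `c:\USERS\ERIC\SOURCE\SOURCE.ICO`} {
		out, err := resolveWith(fakeLookup, p)
		fmt.Println(p, "->", out, err)
	}
}
```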

He has also discovered that some path/filepath APIs choke on paths prefixed with \\?\ so the above results are incompatible with them, but that is a bug that should be resolved separately. That's not a problem with the os APIs.

@rasky
Member

rasky commented Oct 30, 2020

He has also discovered that some path/filepath APIs choke on paths prefixed with \\?\ so the above results are incompatible with them, but that is a bug that should be resolved separately. That's not a problem with the os APIs.

I'm happy to try to fix them, please ping me on that.

@ericwj

ericwj commented Oct 30, 2020

GetFinalPathNameByHandle modes

Results for plain calls to CreateFile to get a file handle (using OPEN_EXISTING and FILE_FLAG_BACKUP_SEMANTICS), then GetFinalPathNameByHandle using the flag specified in the tables below, without any processing before or after. FILE_NAME_NORMALIZED is implicit (0x0), i.e. not specifying FILE_NAME_OPENED (0x8) as this also shows - casing is consistently fixed against the names on disks or conventions for paths (uppercase drive letters).

  • S:\ - when present - is the drive letter assigned to \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\
  • M:\ is the root of a volume on a virtual disk initialized with the MBR partitioning scheme (note the zeroes in the volume GUID).
  • N:\ is a mapping to SMB share \\Z68\UncShare which shares \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\Source
    (sharing S:\Source obviously doesn't work after I remove that mapping)

First without having a drive letter mounted to \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\.

  • C:\Users\Eric\Source is a directory junction to \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\Source

| Path | Flag | Result |
|---|---|---|
| n:\SOURCE.ICO | VOLUME_NAME_DOS | \\?\UNC\Z68\UncShare\Source.ico |
| n:\SOURCE.ICO | VOLUME_NAME_GUID | The system cannot find the path specified. (0x80070003) |
| n:\SOURCE.ICO | VOLUME_NAME_NONE | \Z68\UncShare\Source.ico |
| n:\SOURCE.ICO | VOLUME_NAME_NT | \Device\Mup\Z68\UncShare\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_DOS | The system cannot find the path specified. (0x80070003) |
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_GUID | \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_NONE | \Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_NT | \Device\HarddiskVolume15\Source\Source.ico |
| m:\PATH\FILE.EXT | VOLUME_NAME_DOS | \\?\M:\path\file.ext |
| m:\PATH\FILE.EXT | VOLUME_NAME_GUID | \\?\Volume{ac96f27a-0000-0000-0000-010000000000}\path\file.ext |
| m:\PATH\FILE.EXT | VOLUME_NAME_NONE | \path\file.ext |
| m:\PATH\FILE.EXT | VOLUME_NAME_NT | \Device\HarddiskVolume21\path\file.ext |

Now I add S:\ to the volume.

| Path | Flag | Result |
|---|---|---|
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_DOS | \\?\S:\Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_GUID | \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_NONE | \Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE.ICO | VOLUME_NAME_NT | \Device\HarddiskVolume15\Source\Source.ico |

Using a mounted folder instead of a junction.

  • C:\Users\Eric\Source is a mounted folder to \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\
    Note the additional \SOURCE segment I now need since the mounted folder points to the root of the volume, not to the Source folder in the root of it.

| Path | Flag | Result |
|---|---|---|
| c:\USERS\ERIC\SOURCE\SOURCE\SOURCE.ICO | VOLUME_NAME_DOS | \\?\C:\Users\Eric\Source\Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE\SOURCE.ICO | VOLUME_NAME_GUID | \\?\Volume{607932d3-78af-41c0-9786-dd0177e78a39}\Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE\SOURCE.ICO | VOLUME_NAME_NONE | \Source\Source.ico |
| c:\USERS\ERIC\SOURCE\SOURCE\SOURCE.ICO | VOLUME_NAME_NT | \Device\HarddiskVolume15\Source\Source.ico |

@ericwj

ericwj commented Oct 30, 2020

A summary of our analysis in addition to @networkimprov's comment above and the list of results for each mode:

  • C++ canonicalize prefixes the NT device name with \\?\GLOBALROOT which counts as DOS to NT namespace prefix, such that the result is usable in place of normal file paths.
  • C++ canonicalize removes prefixes when they are not systematically required, unconditionally, so even for long paths that might usually need them and for paths that contain special or even invalid names that cannot work without prefix. That is, UNC paths are returned in UNC notation with just \\ before the server name and any DOS paths are returned without prefix.
  • Both Rust and some version of the Java OpenJDK use VOLUME_NAME_DOS only and unconditionally and perform no processing at all, so they introduce prefixes even for paths with drive letters that are short enough not to need them and will return only errors for a whole variety of paths.
  • Volume GUID's and paths count as familiar albeit lengthy and hard to memorize (but I find them recognizable), whereas NT device paths are considered alien and impractical to translate.
  • Volume GUID's are persisted on GPT volumes and hence independent of system configuration, whereas NT device names are not. Volume GUIDs are stable on multiboot systems, across normal Windows installations, Windows RE and Windows PE, or when disks travel between systems.
  • I tested whether volume GUID's are persistent with MBR disk M:\; two systems show the same volume GUID for the partition.
  • proposal: path/filepath: Add SameFile and InTree #42202 is about functionality with which applications can determine that path A and B are actually guaranteed to refer to the same thing if either one is a local path and the other a remote SMB path and the server name can be established to refer to the local system, or both are remote SMB paths and the server name matches, or the differing names can be established to refer to the same SMB server through other means.
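The prefix handling attributed to the C++ implementation in the first two bullets can be sketched as string rewrites. This follows the description in this thread, not the STL source itself; stripVerbatimPrefix and withGlobalRoot are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// stripVerbatimPrefix removes the \\?\ prefix from a VOLUME_NAME_DOS style
// result: \\?\UNC\server\share\... becomes \\server\share\..., and
// \\?\C:\... becomes C:\.... Other inputs are returned unchanged.
func stripVerbatimPrefix(p string) string {
	const (
		uncPrefix      = `\\?\UNC\`
		verbatimPrefix = `\\?\`
	)
	switch {
	case strings.HasPrefix(p, uncPrefix):
		return `\\` + p[len(uncPrefix):]
	case strings.HasPrefix(p, verbatimPrefix):
		return p[len(verbatimPrefix):]
	}
	return p
}

// withGlobalRoot prefixes a VOLUME_NAME_NT style result (\Device\...) with
// \\?\GLOBALROOT, as described above for the NT fallback.
func withGlobalRoot(nt string) string {
	return `\\?\GLOBALROOT` + nt
}

func main() {
	fmt.Println(stripVerbatimPrefix(`\\?\UNC\Z68\UncShare\Source.ico`))
	fmt.Println(stripVerbatimPrefix(`\\?\M:\path\file.ext`))
	fmt.Println(withGlobalRoot(`\Device\HarddiskVolume15\Source\Source.ico`))
}
```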

@ericwj

ericwj commented Nov 4, 2020

Canonicalization will work just fine for files on the local machine and links pointing at the local machine, but for remote paths there is, as far as I have been able to determine, no way around getting just errors when trying to translate paths that contain links. The only ways around this are:

  • Not translating remote paths any further if GetFinalPathNameByHandle returns errors.
  • Running an application service on the remote machine and talking to it to canonicalize paths local to it.
  • Running elevated and against a remote machine that has remote management enabled and use that to canonicalize paths.

In both latter cases, canonicalization will not yield a path usable with normal file API, but must either have a syntax that includes the machine name but also a volume GUID, or be a pair of values - a path and a machine name or even some unique, canonical machine identifier.

Since this conclusion is largely based on behavior of SMB, I don't think this is unique to Windows, but similar problems as described will also affect other operating systems that use SMB, or other remote file systems that are designed similarly.

Symlink evaluation policy

There is a global system policy in Windows called SymlinkEvaluation, described in fair detail in this StackOverflow question, which defaults to the following settings:

$P$G> fsutil behavior query SymlinkEvaluation
Local to local symbolic links are enabled.
Local to remote symbolic links are enabled.
Remote to local symbolic links are disabled.
Remote to remote symbolic links are disabled.

This means that for paths that resolve to a UNC location on a remote machine that contain links declared on the remote machine, GetFinalPathNameByHandle will return:

The symbolic link cannot be followed because its type is disabled. (0x800705B7)
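
For readers who want to recognize this failure programmatically, the value above is an HRESULT that wraps Win32 error 1463 (ERROR_SYMLINK_CLASS_DISABLED). A minimal sketch of the decoding, using only arithmetic so it runs on any platform; the helper names are hypothetical, and Go code on Windows would more typically see the bare Win32 code as a `syscall.Errno` rather than an HRESULT:

```go
package main

import "fmt"

// facilityWin32 is the HRESULT facility used to wrap plain Win32 errors.
const facilityWin32 = 7

// errorSymlinkClassDisabled is the Win32 code ERROR_SYMLINK_CLASS_DISABLED.
const errorSymlinkClassDisabled = 1463

// win32FromHRESULT extracts the Win32 error code from an HRESULT.
// It reports false if the value is not a failure in the Win32 facility.
func win32FromHRESULT(hr uint32) (uint32, bool) {
	failed := hr&0x80000000 != 0
	facility := (hr >> 16) & 0x7FF
	if !failed || facility != facilityWin32 {
		return 0, false
	}
	return hr & 0xFFFF, true
}

// isSymlinkClassDisabled reports whether hr wraps ERROR_SYMLINK_CLASS_DISABLED.
func isSymlinkClassDisabled(hr uint32) bool {
	code, ok := win32FromHRESULT(hr)
	return ok && code == errorSymlinkClassDisabled
}

func main() {
	code, ok := win32FromHRESULT(0x800705B7)
	fmt.Println(code, ok, isSymlinkClassDisabled(0x800705B7)) // prints: 1463 true true
}
```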

I have found no mention of any way in which a particular process can circumvent this policy. The global policy should not be modified by any individual application, since that might break other apps. Good citizenship even mandates adhering to it. Even enabling the policy for all link types still doesn't quite offer canonicalization. Links might still fail to resolve, with access denied or path not found errors, depending on the path declared as target in the reparse point:

  • a file or a directory and
  • a relative path or an absolute path which is
  • rooted at a drive letter, a volume GUID or a UNC location,
  • the latter of which might again be local or remote to the machine resolving the links.
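
The distinctions in the list above can be made concrete with a small classifier over the target string. This is a sketch under stated assumptions, not how any Windows API behaves: `classifyTarget` and its result labels are hypothetical, it works on the string alone, and it cannot tell a file from a directory or a local UNC server from a remote one.

```go
package main

import (
	"fmt"
	"strings"
)

// isDriveLetter reports whether c is an ASCII letter.
func isDriveLetter(c byte) bool {
	c |= 0x20 // fold to lower case
	return c >= 'a' && c <= 'z'
}

// classifyTarget labels a link target as "relative", "drive",
// "volume GUID", "UNC", or "other" based purely on its prefix.
// Both the Win32 `\\?\` prefix and the NT `\??\` prefix are accepted.
func classifyTarget(target string) string {
	t := strings.ReplaceAll(target, "/", `\`)
	upper := strings.ToUpper(t)
	switch {
	case strings.HasPrefix(upper, `\\?\VOLUME{`), strings.HasPrefix(upper, `\??\VOLUME{`):
		return "volume GUID"
	case strings.HasPrefix(upper, `\\?\UNC\`), strings.HasPrefix(upper, `\??\UNC\`):
		return "UNC"
	case strings.HasPrefix(t, `\\?\`), strings.HasPrefix(t, `\??\`):
		return classifyTarget(t[4:]) // strip the prefix and look again
	case strings.HasPrefix(t, `\\.\`):
		return "other" // device namespace, out of scope here
	case strings.HasPrefix(t, `\\`):
		return "UNC"
	case len(t) >= 3 && isDriveLetter(t[0]) && t[1] == ':' && t[2] == '\\':
		return "drive"
	default:
		// Includes drive-relative and root-relative forms in this sketch.
		return "relative"
	}
}

func main() {
	for _, p := range []string{`..\data`, `C:\data`, `\??\C:\data`, `\\server\share\x`} {
		fmt.Printf("%s -> %s\n", p, classifyTarget(p))
	}
}
```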

Going around the Symlink evaluation policy

The only way around this policy as far as I know is to use DeviceIoControl with the control code FSCTL_GET_REPARSE_POINT, at least if the SMB server runs on Windows. But this returns remote paths as declared in the reparse point, meaning that they could be local, rooted paths with e.g. drive letters which are valid only for use on the remote machine.

This is one of the bugs in EvalSymlinks, documented in #40180. It may return paths obtained with FSCTL_GET_REPARSE_POINT as if they were proper local paths, without even checking whether the reparse point is on a remote machine. EvalSymlinks should be broken further, not just to adhere to the default SymlinkEvaluation policy, but to refuse to translate these paths regardless of the policy, because it loses track of the fact that they do not come from the local machine. Hence, an implementation of the proposed API based on FSCTL_GET_REPARSE_POINT would have to return a result that may be canonical but is not usable with the normal file API. In that case, Resolve is a really confusing name.

As far as I know, SMB does not provide a way to translate local paths on a remote machine to volume GUIDs, or to expose whether, and through which share, the remote machine might host the translated path. Obviously, such a translated path might not be shared over SMB at all and be completely inaccessible from a remote machine, except through the path that was translated. I don't think this is any different on *nix with SMB. I have no idea whether alternatives like NFS are designed any differently.

SMB server and share names are not normalized

Also, SMB server and share names are never normalized, even if the FILE_NAME_OPENED flag is not specified in the call to GetFinalPathNameByHandle. They are returned as opened (if present in the requested path) or, for example when the requested path is a drive mapping, in some seemingly random casing that appears to be stable but matches neither the conventions for computer names (uppercase for NetBIOS names, lowercase for DNS) nor the share name as it was defined on the remote machine.
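
One consequence is that comparing two such paths byte for byte is unreliable. A minimal sketch of a case-insensitive comparison of the server and share components follows; `splitUNC` and `sameServerAndShare` are hypothetical helpers, and case folding only addresses the casing problem. It does not establish that two different names (a NetBIOS name and a DNS name, say) refer to the same server.

```go
package main

import (
	"fmt"
	"strings"
)

// splitUNC splits a plain UNC path of the form \\server\share\rest.
// Extended-length (\\?\) and device (\\.\) paths are rejected in this sketch.
func splitUNC(p string) (server, share, rest string, ok bool) {
	if !strings.HasPrefix(p, `\\`) {
		return "", "", "", false
	}
	parts := strings.SplitN(p[2:], `\`, 3)
	if len(parts) < 2 || parts[0] == "" || parts[1] == "" {
		return "", "", "", false
	}
	if parts[0] == "?" || parts[0] == "." {
		return "", "", "", false
	}
	if len(parts) == 3 {
		rest = parts[2]
	}
	return parts[0], parts[1], rest, true
}

// sameServerAndShare reports whether a and b name the same server and share
// when letter case is ignored.
func sameServerAndShare(a, b string) bool {
	sa, ha, _, okA := splitUNC(a)
	sb, hb, _, okB := splitUNC(b)
	return okA && okB && strings.EqualFold(sa, sb) && strings.EqualFold(ha, hb)
}

func main() {
	fmt.Println(sameServerAndShare(`\\FILESRV\Data\reports`, `\\filesrv\data\archive`))
}
```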

@rsc
Contributor

rsc commented Nov 11, 2020

It seems like we are not really headed for a consensus. Maybe this would be better to do in an external package to start?

@rsc
Contributor

rsc commented Nov 18, 2020

It seems clear there is no consensus here. This seems like a likely decline.

@networkimprov

Let's put this on hold, as more ppl are likely to raise this, and the only answer at present is, "Sorry, filepath.EvalSymlinks is broken beyond repair on Windows; call it on unix, but call x/sys/windows.GetFinalPathNameByHandle on windows"
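
The Unix half of that advice can be sketched with the standard library alone, following the proposal's description of Resolve as essentially filepath.Abs(filepath.EvalSymlinks(path)). The names `resolveUnix` and `demo` are hypothetical, and returning an error is a choice made for this sketch. The Windows half would call GetFinalPathNameByHandle from golang.org/x/sys/windows and is not shown here.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// resolveUnix evaluates symlinks in path and makes the result absolute.
// Intended for Unix-like systems only.
func resolveUnix(path string) (string, error) {
	resolved, err := filepath.EvalSymlinks(path)
	if err != nil {
		return "", err
	}
	return filepath.Abs(resolved)
}

// demo creates a file and a symlink to it in a temporary directory and
// resolves both, so the two results can be compared.
func demo() (fromLink, fromTarget string, err error) {
	dir, err := os.MkdirTemp("", "resolve-demo")
	if err != nil {
		return "", "", err
	}
	defer os.RemoveAll(dir)

	target := filepath.Join(dir, "target.txt")
	if err := os.WriteFile(target, []byte("x"), 0o600); err != nil {
		return "", "", err
	}
	link := filepath.Join(dir, "link.txt")
	if err := os.Symlink(target, link); err != nil {
		return "", "", err
	}
	if fromLink, err = resolveUnix(link); err != nil {
		return "", "", err
	}
	fromTarget, err = resolveUnix(target)
	return fromLink, fromTarget, err
}

func main() {
	a, b, err := demo()
	fmt.Println(a == b, err)
}
```

As noted at the top of this issue, the result is not canonical: hard links and multiple mounts can still give two different resolved paths for the same file.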

@ericwj

ericwj commented Nov 24, 2020

Yes, for local files. Combine that with #42202 for remote files. The hard part is determining IsSameSmbServer (non-existent API) in that case and figuring out which of the problems described in this thread also affect *nix and/or other remote file systems like NFS.

@rsc
Contributor

rsc commented Dec 2, 2020

No change in consensus, so declined.
