Add symbolic link APIs #54253
Note: This serves as a reminder that when your PR modifies a ref *.cs file and adds or modifies public APIs, you should make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off on that change.
Tagging subscribers to this area: @dotnet/area-system-io
Issue Details
Fixes #24271
Co-authored with @jozkee
PTAL @iSazonov @mklement0 @tmds @carlreinke
/// <exception cref="ArgumentNullException"><paramref name="path"/> or <paramref name="pathToTarget"/> is <see langword="null"/>.</exception>
/// <exception cref="ArgumentException"><paramref name="path"/> or <paramref name="pathToTarget"/> is empty.
/// -or-
/// <paramref name="path"/> is not an absolute path.
This is different from File.Create, which allows path to be relative. I'm not necessarily opposed to requiring an absolute path, but it seems like it could be a gotcha.
Implementation-wise, we do support relative paths, just like any other API in System.IO. This is just a mistake in the documentation. Thanks for pointing this out.
My bad. I initially wrote the empty APIs with docs on top of them assuming some things, but didn't adjust them after writing the code. Thanks for catching it, @carlreinke. We'll make sure to double check the docs.
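To illustrate the point that relative paths are accepted, here is a minimal sketch using the `File.CreateSymbolicLink(path, pathToTarget)` API this PR adds. The helper name `TryCreateRelativeLink` and the file names are made up for the example, and the catch block is an assumption about how a caller might handle a refused link creation (for instance, a non-elevated Windows process without Developer Mode).

```csharp
#nullable enable
using System;
using System.IO;

static class RelativeLinkDemo
{
    // Hypothetical helper: creates "link.txt" -> "target.txt" inside workDir
    // using only relative paths. Returns null if the OS refuses to create the link.
    public static FileSystemInfo? TryCreateRelativeLink(string workDir)
    {
        string previous = Directory.GetCurrentDirectory();
        try
        {
            Directory.SetCurrentDirectory(workDir);
            File.WriteAllText("target.txt", "hello");

            // Both arguments are relative. The link path is resolved against the
            // current directory; the target string is stored in the link as given.
            return File.CreateSymbolicLink("link.txt", "target.txt");
        }
        catch (Exception e) when (e is IOException or UnauthorizedAccessException)
        {
            return null;
        }
        finally
        {
            Directory.SetCurrentDirectory(previous);
        }
    }

    public static void Main()
    {
        string workDir = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName());
        Directory.CreateDirectory(workDir);

        FileSystemInfo? link = TryCreateRelativeLink(workDir);
        Console.WriteLine(link is null
            ? "The link could not be created on this machine."
            : $"{link.FullName} -> {link.LinkTarget}");

        Directory.Delete(workDir, recursive: true);
    }
}
```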
/// -or-
/// <paramref name="path"/> is not an absolute path.
/// -or-
/// <paramref name="path"/> or <paramref name="pathToTarget"/> contains invalid path characters.</exception>
Grouping path and pathToTarget here makes it sound like they will both be validated against the same set of invalid characters. pathToTarget allows almost anything, including most of Path.InvalidPathChars.
I wonder if the target is not the regular path?
@carlreinke, a symlink points to another file-system entry using a regular path, so it seems sensible to validate the target path formally like any other path (not in terms of existence; although, given that symlinks can span file-systems, conceivably they have different sets of valid characters - how does FileSystem.VerifyValidPath handle this currently?). Why do you think pathToTarget allows (should allow) almost anything?
@mklement0 pathToTarget allows almost anything because mklink (and thus the underlying OS API) allows almost anything in the target on Windows.
C:\temp>mklink foo "<>:/\|?*"
symbolic link created for foo <<===>> <>:/\|?*
C:\temp>dir | findstr foo
2021-06-16 23:08 <SYMLINK> foo [<>:/\|?*]
(On Linux the only invalid path character is '\0', so it's obvious that it will allow almost anything there.)
See this comment for a use case for storing arbitrary data in a symlink.
Wow - I certainly did not expect mklink foo "<>:/\|?*" to work, and you have to wonder why that is permitted. The implementation via reparse points means that arbitrary data can be stored, so one can see how it's technically possible, but to what end? Probably not in anticipation of "off-label" uses such as a lock on Unix, per the comment you link to.
I guess that disallowing '\0' is then the only check we can perform without making underlying capabilities inaccessible (I haven't tried the APIs directly, but at least mklink / ln don't support '\0'.)
The only caveat is that we must make sure that other .NET APIs then don't run into trouble when they encounter a FileSystemInfo instance with such a misshapen path obtained via .ResolveLinkTarget() - note that new FileInfo("<>:/\|?*") does not work, for instance.
And, of course, this permissiveness should be clearly documented.
De facto, only the OS file system driver can do full path validation. In the library it makes sense to do only simple checks for null/\0.
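The minimal validation this thread converges on can be sketched as follows. `VerifyLinkTarget` is a hypothetical helper written for illustration, not the runtime's actual code; it only rejects null, empty, and strings containing '\0', and leaves everything else to the OS.

```csharp
#nullable enable
using System;

static class LinkTargetValidation
{
    // Hypothetical helper: the only checks performed are null, empty and '\0'.
    public static void VerifyLinkTarget(string? pathToTarget)
    {
        if (pathToTarget is null)
            throw new ArgumentNullException(nameof(pathToTarget));
        if (pathToTarget.Length == 0)
            throw new ArgumentException("The path is empty.", nameof(pathToTarget));
        if (pathToTarget.IndexOf('\0') >= 0)
            throw new ArgumentException("The path contains a null character.", nameof(pathToTarget));
    }

    public static void Main()
    {
        // The unusual target from the mklink experiment passes these checks.
        VerifyLinkTarget("<>:/\\|?*");
        Console.WriteLine("Accepted the unusual target.");

        try
        {
            VerifyLinkTarget("a\0b");
        }
        catch (ArgumentException)
        {
            Console.WriteLine("Rejected the target with an embedded null character.");
        }
    }
}
```

Keeping the check this small means the library never rejects a target that the underlying file system would have accepted.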
/// <param name="linkPath">The path of the file link.</param>
/// <param name="returnFinalTarget"><see langword="true"/> to follow links to the final target; <see langword="false"/> to return the immediate next link.</param>
/// <returns>A <see cref="FileInfo"/> instance if <paramref name="linkPath"/> exists, independently if the target exists or not. <see langword="null"/> if <paramref name="linkPath"/> does not exist.</returns>
public static System.IO.FileSystemInfo? ResolveLinkTarget(string linkPath, bool returnFinalTarget = false)
There are at least a couple <exception/> cases that should be listed here.
+ /// <exception cref="ArgumentNullException"><paramref name="linkPath"/> is <see langword="null"/>.</exception>
+ /// <exception cref="ArgumentException"><paramref name="linkPath"/> is empty.
+ /// -or-
+ /// <paramref name="linkPath"/> contains a null character.</exception>
{
    // Unix max paths are typically 1K or 4K UTF-8 bytes, 256 should handle the majority of paths
    // without putting too much pressure on the stack.
    internal const int DefaultPathBufferSize = 256;
I wonder if the team has taken any measurements on this? What is the typical path size?
I would choose half of the maximum, i.e. 512, so that there is never more than one fallback (bufferSize *= 2).
Good question, I don't know if we have data around this. Since this number is used in some other APIs (and this const is now consumed in those places) I would rather not make that change now. Feel free to open an issue though.
nit: This will give us a new const for Unix, while we already have Interop.Kernel32.MAX_PATH for Windows. It would be nice to have one const for both platforms.
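The trade-off behind the initial buffer size can be seen in a small simulation of the doubling fallback (bufferSize *= 2). This is only a sketch: `FakeReadLink` is a made-up stand-in for a readlink-style call that silently truncates, and `ReadWithGrowingBuffer` is a hypothetical name, not the runtime's implementation.

```csharp
using System;

static class GrowingBufferDemo
{
    public const int DefaultPathBufferSize = 256;

    // Stand-in for a readlink-style call: copies as much of 'value' as fits
    // into 'buffer' and returns the number of characters copied.
    static int FakeReadLink(string value, char[] buffer)
    {
        int count = Math.Min(value.Length, buffer.Length);
        value.CopyTo(0, buffer, 0, count);
        return count;
    }

    // Retries with a buffer twice as large whenever the previous one was filled
    // completely, because a full buffer may mean the result was truncated.
    public static string ReadWithGrowingBuffer(string value, int initialSize, out int attempts)
    {
        int size = initialSize;
        attempts = 0;
        while (true)
        {
            attempts++;
            char[] buffer = new char[size];
            int written = FakeReadLink(value, buffer);
            if (written < size)
            {
                return new string(buffer, 0, written);
            }
            size *= 2;
        }
    }

    public static void Main()
    {
        string longTarget = new string('x', 1000);

        ReadWithGrowingBuffer(longTarget, DefaultPathBufferSize, out int from256);
        ReadWithGrowingBuffer(longTarget, 512, out int from512);

        Console.WriteLine($"Attempts starting at 256: {from256}");
        Console.WriteLine($"Attempts starting at 512: {from512}");
    }
}
```

A larger starting size saves retries for long targets at the cost of more stack or heap pressure for the common short ones, which is why measurements would be needed to pick a value.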
/// Allows creation of symbolic links when the process is not elevated. Starting with Windows 10 Insiders build 14972.
/// Developer Mode must first be enabled on the machine before this option will function.
Is the comment correct? The docs no longer mention Developer Mode:
https://docs.microsoft.com/en-us/windows/win32/fileio/maximum-file-path-limitation?tabs=cmd#enable-long-paths-in-windows-10-version-1607-and-later
- It's mentioned in the blog post under the "Background" section: https://blogs.windows.com/windowsdeveloper/2016/12/02/symlinks-windows-10/
- It's also documented in the dwFlags parameter of the API: https://docs.microsoft.com/en-us/windows/win32/api/winbase/nf-winbase-createsymboliclinka#parameters
- This site is unofficial but is usually very well documented and also mentions it: https://ss64.com/nt/mklink.html
// https://msdn.microsoft.com/library/windows/hardware/ff552012.aspx
// We don't need all the struct fields; omitting the rest.
[StructLayout(LayoutKind.Sequential)]
public unsafe struct REPARSE_DATA_BUFFER
{
    public uint ReparseTag;
    public ushort ReparseDataLength;
    public ushort Reserved;
    public SymbolicLinkReparseBuffer ReparseBufferSymbolicLink;

    [StructLayout(LayoutKind.Sequential)]
    public struct SymbolicLinkReparseBuffer
    {
        public ushort SubstituteNameOffset;
        public ushort SubstituteNameLength;
        public ushort PrintNameOffset;
        public ushort PrintNameLength;
        public uint Flags;
Since C# doesn't support C++ unions, I think it is better to define a flat struct named REPARSE_DATA_BUFFER_SYMBOLICLINK.
- // https://msdn.microsoft.com/library/windows/hardware/ff552012.aspx
- // We don't need all the struct fields; omitting the rest.
- [StructLayout(LayoutKind.Sequential)]
- public unsafe struct REPARSE_DATA_BUFFER
- {
-     public uint ReparseTag;
-     public ushort ReparseDataLength;
-     public ushort Reserved;
-     public SymbolicLinkReparseBuffer ReparseBufferSymbolicLink;
-     [StructLayout(LayoutKind.Sequential)]
-     public struct SymbolicLinkReparseBuffer
-     {
-         public ushort SubstituteNameOffset;
-         public ushort SubstituteNameLength;
-         public ushort PrintNameOffset;
-         public ushort PrintNameLength;
-         public uint Flags;
+ // https://msdn.microsoft.com/library/windows/hardware/ff552012.aspx
+ // We don't need all the struct fields; omitting the rest.
+ [StructLayout(LayoutKind.Sequential)]
+ public unsafe struct REPARSE_DATA_BUFFER
+ {
+     public uint ReparseTag;
+     public ushort ReparseDataLength;
+     public ushort Reserved;
+     public ushort SubstituteNameOffset;
+     public ushort SubstituteNameLength;
+     public ushort PrintNameOffset;
+     public ushort PrintNameLength;
+     public uint Flags;
The nested struct was added for clarity and easier manipulation.
Do you have any concerns around having a nested struct, as opposed to having it all flatly defined in the top struct?
My only concern is that we will have to change this and define more structs if we want to support other reparse points (mount points and others). This is what we have in the PowerShell code.
According to the documentation, the C++ struct containing nested union structs can define the data for a symbolic link, for a mount point and the generic type:
typedef struct _REPARSE_DATA_BUFFER {
  ULONG  ReparseTag;
  USHORT ReparseDataLength;
  USHORT Reserved;
  union {
    struct {
      USHORT SubstituteNameOffset;
      USHORT SubstituteNameLength;
      USHORT PrintNameOffset;
      USHORT PrintNameLength;
      ULONG  Flags;
      WCHAR  PathBuffer[1];
    } SymbolicLinkReparseBuffer;
    struct {
      USHORT SubstituteNameOffset;
      USHORT SubstituteNameLength;
      USHORT PrintNameOffset;
      USHORT PrintNameLength;
      WCHAR  PathBuffer[1];
    } MountPointReparseBuffer;
    struct {
      UCHAR DataBuffer[1];
    } GenericReparseBuffer;
  } DUMMYUNIONNAME;
} REPARSE_DATA_BUFFER, *PREPARSE_DATA_BUFFER;
Correct me if I'm wrong, but I think the code can be preserved as it is right now, and when the time comes to support junctions and others, we add the extra structs as needed.
> we add the extra structs as needed.
I mean we cannot use the same name REPARSE_DATA_BUFFER for all the structs.
> I mean we can not use the same name REPARSE_DATA_BUFFER for all the structs.
Why not? My idea was to use REPARSE_DATA_BUFFER for all the pinvokes related to reparse points. Similar to what we have for symbolic links here:
https://github.com/dotnet/runtime/blob/55c1c67ad9653e82ca5d63f6bc47721321891af8/src/libraries/System.Private.CoreLib/src/System/IO/FileSystem.Windows.cs#L503
For mount points we would do rdb.MountPointReparseBuffer.SubstituteNameOffset.
Using the same name for different purposes will be misleading. For example, there will be references with the same name in searches. This will make it difficult to understand and research the code.
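For reference, one way a single type could expose both layouts is to overlay the nested structs with `LayoutKind.Explicit` and `FieldOffset`, which is how a C union is commonly emulated in C#. The sketch below is not what the PR does; the type name `ReparseDataBufferSketch` is hypothetical, the offsets are derived from the field sizes in the C definition above, and the trailing PathBuffer is omitted.

```csharp
using System;
using System.Runtime.InteropServices;

[StructLayout(LayoutKind.Sequential)]
struct SymbolicLinkReparseBuffer
{
    public ushort SubstituteNameOffset;
    public ushort SubstituteNameLength;
    public ushort PrintNameOffset;
    public ushort PrintNameLength;
    public uint Flags;
}

[StructLayout(LayoutKind.Sequential)]
struct MountPointReparseBuffer
{
    public ushort SubstituteNameOffset;
    public ushort SubstituteNameLength;
    public ushort PrintNameOffset;
    public ushort PrintNameLength;
}

// Hypothetical name. Both nested buffers start at offset 8, right after the
// 8-byte header, so they share the same memory like the members of the C union.
[StructLayout(LayoutKind.Explicit)]
struct ReparseDataBufferSketch
{
    [FieldOffset(0)] public uint ReparseTag;
    [FieldOffset(4)] public ushort ReparseDataLength;
    [FieldOffset(6)] public ushort Reserved;
    [FieldOffset(8)] public SymbolicLinkReparseBuffer SymbolicLink;
    [FieldOffset(8)] public MountPointReparseBuffer MountPoint;
}

static class UnionLayoutDemo
{
    public static void Main()
    {
        var rdb = new ReparseDataBufferSketch();
        rdb.SymbolicLink.SubstituteNameOffset = 4;

        // The mount point view reads the same bytes as the symbolic link view.
        Console.WriteLine(rdb.MountPoint.SubstituteNameOffset);
        Console.WriteLine(Marshal.SizeOf<ReparseDataBufferSketch>());
    }
}
```

With this shape, call sites would pick the view that matches ReparseTag, which keeps one struct name while still making each use searchable by the nested field name.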
GetFinalLinkTarget(linkPath, isDirectory) :
GetImmediateLinkTarget(linkPath, isDirectory, throwOnNotFound: true, normalize: true);

return targetPath == null ? null :
- return targetPath == null ? null :
+ return targetPath is null ? null :
Win32Marshal.GetExceptionForLastWin32Error(linkPath);
}

char* buffer = stackalloc char[Interop.Kernel32.MAX_PATH];
Can it be a long path?
I discussed this with @jozkee. We should verify what happens if we pass a path that's larger than MAX_PATH. If the API can fail and tell us that the buffer was not large enough, then we could apply similar logic to what we added on the Unix side.
@iSazonov : It can.
@dotnet/runtime-infrastructure the failed CI legs had a problem with Docker:
I tried restarting them but they failed again. Any clues why this is happening?
This is only for the linker tests. It looks like the container failed to initialize: it seems to be unable to download the image, although I can fetch it locally. It seems like a timeout, because it's a huge image (3.4 GB) and it timed out during link verification. The other legs are seeing some instability when using AzDO NuGet feeds.
Overall LGTM, but:
- The System.IO.Tests.Directory_SymbolicLinks.ResolveLinkTarget_ReturnFinalTarget_MaxFollowedLinks test is failing in one of the Windows CI legs. Please fix it before merging.
- If we don't know what the default value of bool returnFinalTarget should be (Add symbolic link APIs #54253 (comment)), IMO we should not provide a default value, and users should provide it explicitly.
That sounds like a good approach to me.
This was due to Windows not being consistent with the reparse point limit specified in the docs. I've loosened the validation, so we now check that it completes successfully with small chains of links (1, 10, 40) and that a nice exception is thrown with large ones (100).
Done but this is an API change. cc @bartonjs @terrajobst
I'm not opposed to removing the default but I generally don't like "we can't agree so we make the user choose"-line of thinking. The reason being that we're the experts and most users won't be. There are cases where the purpose of the parameter is immediately clear (e.g. Curious what @bartonjs thinks.
If we would have
My standard starting position is to ask the FDG what they have to say about it:
OK, so if it's possible we should provide a good default. So, what makes a good default in this case? I think "predictability".
Let's imagine that we didn't have the parameter at all... so it's just
So now let's just collapse that and look at it with a default parameter. The name of the method is ResolveTarget. Resolve is, admittedly, slightly ambiguous, so let's pretend it's named GetTarget. Well, GetTarget is obviously "give me the target value of this symlink", so
The problem with recursive being the default is that it has unexpected behaviors. If we have no work limiter, then mutually referential symlinks cause an infinite loop:
If we do have a work limiter then mutually referential symlinks will... do something. Throw, return the last one it saw when it exhausted the work factor, something. Whatever it does, you probably didn't expect it. Sure, the docs'll mention it, but no one reads the docs until something goes wrong.
To me, from a logical perspective (of how I think about things), "once" is the correct default. To me, from a security perspective (not "hiding" unexpected behaviors in server products), "once" is the correct default. So, to me, we should just figure out what makes us happy with that choice (e.g. renaming Resolve to Get).
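The difference between resolving "once" and resolving recursively with a work limiter can be sketched without touching the file system. In this illustration a dictionary stands in for the links on disk; `ResolveOnce`, `ResolveFinal`, and the limit of 40 are all assumptions made for the example, and throwing IOException is just one of the possible behaviors mentioned above.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;

static class LinkChainDemo
{
    // Hypothetical limit for this sketch; real limits are platform-specific.
    public const int MaxFollowedLinks = 40;

    // "Once": returns the immediate target, or null if 'path' is not a link.
    public static string? ResolveOnce(Dictionary<string, string> links, string path)
        => links.TryGetValue(path, out string? target) ? target : null;

    // Recursive: follows links until a non-link is reached, giving up with an
    // exception after MaxFollowedLinks hops so that cycles cannot loop forever.
    public static string ResolveFinal(Dictionary<string, string> links, string path)
    {
        string current = path;
        int followed = 0;
        while (links.TryGetValue(current, out string? next))
        {
            if (++followed > MaxFollowedLinks)
            {
                throw new IOException($"Too many levels of links while resolving '{path}'.");
            }
            current = next;
        }
        return current;
    }

    public static void Main()
    {
        var chain = new Dictionary<string, string> { ["a"] = "b", ["b"] = "c" };
        Console.WriteLine(ResolveOnce(chain, "a"));
        Console.WriteLine(ResolveFinal(chain, "a"));

        // Mutually referential links: without the limiter this would never end.
        var cycle = new Dictionary<string, string> { ["x"] = "y", ["y"] = "x" };
        try
        {
            ResolveFinal(cycle, "x");
        }
        catch (IOException e)
        {
            Console.WriteLine(e.Message);
        }
    }
}
```

The "once" path has no failure mode beyond the link not existing, which is the predictability argument made in the comment; the recursive path needs a documented policy for what happens at the limit.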
From a logical perspective, wouldn't it make more sense to have `public class SymbolicLink`?
@iSazonov it might be, but being so close to the release, I don't think it's a good idea to try to do another API from scratch. We can evaluate the
… real temp path
Bring back previous InlineData since it wasn't the cause of the CI issue
@jozkee @carlossanlop @adamsitnik Thanks for the great work! @jozkee I fully trust the MSFT team's experience. Nevertheless, I am a bit nervous about the link APIs, based on the experience we got in this PR from discussing the details. :-) I hope we will continue the discussions in the proper issue - what is the issue?
@iSazonov Thanks for all of your input along the way, and we certainly want and invite more input and feedback. By getting this merged in, we will be able to include these APIs in .NET 6.0 Preview 7, and solicit feedback during RC & RC2.