Address some System.Formats.Tar TODOs (infra and syscalls) #69107
… the test project.
…e the Unix specific ArchivingUtils file too.
…w if destination file exists when creating archive.
…jor and devminor get their correct values.
…s. Simplify FileStreamOptions objects too.
Tagging subscribers to this area: @dotnet/area-system-io
Issue Details
Contributes to #68230
Here's the first batch of pending tasks and feedback from the original PR.
entry._header._uName = "";// Interop.Sys.GetUName();
entry._header._gName = "";// Interop.Sys.GetGName();
entry._header._uName = Interop.Sys.GetUName(status.Uid) ?? string.Empty;
entry._header._gName = Interop.Sys.GetGName(status.Gid) ?? string.Empty;
You probably want to cache these in a Dictionary to avoid looking them up each time over?
And, I think the user may want to be able to control the user id and name that get stored in the tar file.
I think it would be meaningful if the API had option classes that control the creation behavior, and extraction behavior. These classes can be extended later on, rather than adding new overloads to the existing methods.
> You probably want to cache these in a Dictionary to avoid looking them up each time over?
Good point. If I already know the uname and gname associated to a uid and gid, there's no need to retrieve them again.
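The caching idea can be sketched on the native side too. The following is only an illustration in C, not the PR's code (which keeps the cache in managed dictionaries): `name_cache`, `name_cache_get`, and `demo_lookup` are hypothetical names, and the fixed-size table with clear-on-full eviction is an assumption made to keep the sketch short.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for strdup under strict -std=c11 */
#endif
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define NAME_CACHE_CAPACITY 64

/* Hypothetical lookup signature: returns a malloc'd name, or NULL if unknown. */
typedef char *(*name_lookup_fn)(unsigned int id);

typedef struct {
    unsigned int ids[NAME_CACHE_CAPACITY];
    char *names[NAME_CACHE_CAPACITY];
    size_t count;
    name_lookup_fn lookup;
} name_cache;

void name_cache_init(name_cache *cache, name_lookup_fn lookup)
{
    cache->count = 0;
    cache->lookup = lookup;
}

/* Frees every cached name; pointers returned earlier become invalid. */
void name_cache_clear(name_cache *cache)
{
    for (size_t i = 0; i < cache->count; i++)
        free(cache->names[i]);
    cache->count = 0;
}

/* Returns the cached name for id, calling the lookup only on a miss. */
const char *name_cache_get(name_cache *cache, unsigned int id)
{
    for (size_t i = 0; i < cache->count; i++) {
        if (cache->ids[i] == id)
            return cache->names[i];      /* hit: no system lookup */
    }

    char *name = cache->lookup(id);      /* miss: ask once */
    if (name == NULL)
        name = strdup("");               /* unknown id: cache an empty name */
    if (name == NULL)
        return "";                       /* out of memory: do not cache */

    if (cache->count == NAME_CACHE_CAPACITY)
        name_cache_clear(cache);         /* crude eviction, assumption of this sketch */

    cache->ids[cache->count] = id;
    cache->names[cache->count] = name;
    cache->count++;
    return name;
}

/* Stand-in lookup for demonstration; it counts how often it is called. */
int demo_lookup_calls = 0;

char *demo_lookup(unsigned int id)
{
    demo_lookup_calls++;
    return id == 0 ? strdup("root") : NULL;
}
```

Caching the empty result for unknown ids mirrors the `?? string.Empty` fallback in the diff above and avoids repeating a failed lookup for every entry owned by the same id.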
> And, I think the user may want to be able to control the user id and name that get stored in the tar file.
This code path is only reached if the user calls TarWriter.WriteEntry(string fileName, string? entryName). If they wish to set the uname, gname, uid, and gid to values other than the defaults, they can create the TarEntry instance manually, fill out its fields, and use the other overload, TarWriter.WriteEntry(TarEntry entry).
> I think it would be meaningful if the API had option classes that control the creation behavior, and extraction behavior. These classes can be extended later on, rather than adding new overloads to the existing methods.
I would welcome that 🙂. We can discuss it in a separate issue.
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarWriter.Unix.cs
LGTM, thank you for addressing my comments @carlossanlop !
src/libraries/Common/src/Interop/Unix/System.Native/Interop.GNameUName.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarFile.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarWriter.Unix.cs
src/libraries/Common/src/Interop/Unix/System.Native/Interop.GNameUName.cs
src/libraries/Common/src/Interop/Unix/System.Native/Interop.GNameUName.cs
// Output buffer was too small, loop around again and try with a larger buffer.
outputBufferSize = buffer.Length * 2;

if (outputBufferSize > 256) // Upper limit allowed for login name in kernel
If we are limiting to 256 anyway, can't we just stackalloc byte[256] and use that? Why go through the pain of a loop and ArrayPool Rent/Return?
Unless the limit is explicitly documented (typically through a define, e.g. NI_MAXHOST for getnameinfo()), I recommend that we use the stackalloc-then-resize loop, as in this example:
runtime/src/libraries/Common/src/Interop/Unix/System.Native/Interop.GetCwd.cs
Lines 18 to 45 in 43dd0a7
byte* stackBuf = stackalloc byte[DefaultPathBufferSize];
string? result = GetCwdHelper(stackBuf, DefaultPathBufferSize);
if (result != null)
{
    return result;
}

// If that was too small, try increasing large buffer sizes
int bufferSize = DefaultPathBufferSize;
while (true)
{
    checked { bufferSize *= 2; }
    byte[] buf = ArrayPool<byte>.Shared.Rent(bufferSize);
    try
    {
        fixed (byte* ptr = &buf[0])
        {
            result = GetCwdHelper(ptr, buf.Length);
            if (result != null)
            {
                return result;
            }
        }
    }
    finally
    {
        ArrayPool<byte>.Shared.Return(buf);
    }
I did find at least one case in the PAL where it mallocs the buffer and resizes it for you, but it seems our pattern is to do all this in managed code.
Ideally the native call returns the actual length it needs, or you only loop on ERANGE. I'm not sure which pattern we follow.
We use a similar loop in ReadLink:
runtime/src/libraries/Common/src/Interop/Unix/System.Native/Interop.ReadLink.cs
Lines 31 to 68 in c2ec86b
internal static string? ReadLink(ReadOnlySpan<char> path)
{
    int outputBufferSize = 1024;

    // Use an initial buffer size that prevents disposing and renting
    // a second time when calling ConvertAndTerminateString.
    using var converter = new ValueUtf8Converter(stackalloc byte[1024]);

    while (true)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(outputBufferSize);
        try
        {
            int resultLength = Interop.Sys.ReadLink(
                ref MemoryMarshal.GetReference(converter.ConvertAndTerminateString(path)),
                buffer,
                buffer.Length);

            if (resultLength < 0)
            {
                // error
                return null;
            }
            else if (resultLength < buffer.Length)
            {
                // success
                return Encoding.UTF8.GetString(buffer, 0, resultLength);
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }

        // Output buffer was too small, loop around again and try with a larger buffer.
        outputBufferSize = buffer.Length * 2;
    }
}
Unlike the GetCwd and ReadLink examples, which do not have an upper bound on the length of the buffer, the uname and gname cases do have a max length. So I am now more inclined to just use the 256 byte limit.
Where is that length documented?
The above page says that:
- man useradd indicates a max length of 32. Confirmed in the useradd manual https://linux.die.net/man/8/useradd and the groupadd manual: https://linux.die.net/man/8/groupadd
- The centos kernel max login length is 256, indicated in the local_lim.h file as LOGIN_NAME_MAX. I found this version of that file that confirms this: https://android.googlesource.com/platform/prebuilts/gcc/linux-x86/host/x86_64-linux-glibc2.7-4.6/+/02075080d51c371ae87b9898bf84a085e436ee27/sysroot/usr/include/bits/local_lim.h#83
- The /etc/passwd file takes usernames with a max length of 32. This illumOS man page confirms this: https://illumos.org/man/5/passwd
The limit is implementation specific, so there should be a resize loop.
256 is a good value for the stackalloc to start with because that is the Linux value.
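To make the suggestion concrete, here is a minimal sketch in C of "start with a 256-byte stack buffer, then grow on ERANGE", using the thread-safe getpwuid_r. It is not the PR's pal_uid.c code; `get_user_name` is a hypothetical name, and returning a malloc'd copy that the caller frees is an assumption of this sketch.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for getpwuid_r and strdup under strict -std=c11 */
#endif
#include <errno.h>
#include <pwd.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: returns a malloc'd copy of the user name for uid,
   or NULL if the uid has no entry or an error occurred. Caller frees. */
char *get_user_name(uid_t uid)
{
    char stack_buf[256];            /* first attempt, no heap allocation */
    char *heap_buf = NULL;
    char *buf = stack_buf;
    size_t len = sizeof stack_buf;
    char *name = NULL;

    for (;;) {
        struct passwd pw;
        struct passwd *result = NULL;
        int err = getpwuid_r(uid, &pw, buf, len, &result);

        if (err == 0) {
            if (result != NULL)     /* result stays NULL when uid is unknown */
                name = strdup(result->pw_name);
            break;
        }
        if (err == EINTR)
            continue;               /* interrupted: retry with the same buffer */
        if (err != ERANGE || len > SIZE_MAX / 2)
            break;                  /* real error, or doubling would overflow */

        len *= 2;                   /* buffer too small: double and retry */
        char *bigger = realloc(heap_buf, len);
        if (bigger == NULL)
            break;
        heap_buf = bigger;
        buf = heap_buf;
    }

    free(heap_buf);
    return name;
}
```

Note that the buffer passed to getpwuid_r holds every string field of the passwd entry (home directory, shell, and so on), not only the login name, which is one more reason a fixed 256 bytes may not be enough even when names are short.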
Note that the Linux kernel only cares about the numeric ids. libc handles the names.
https://github.com/bminor/glibc/blob/master/sysdeps/unix/sysv/linux/bits/local_lim.h#L89-L90 defines this (implementation specific) limit for Linux:
/* Maximum login name length. This is arbitrary. */
#define LOGIN_NAME_MAX 256
There's another upper limit. It's in the code example in the manual: https://pubs.opengroup.org/onlinepubs/9699919799/functions/getgrgid.html
long int initlen = sysconf(_SC_GETGR_R_SIZE_MAX);
size_t len;
if (initlen == -1)
    /* Default initial length. */
    len = 1024;
else
    len = (size_t) initlen;
I can use 1024 as the absolute upper limit that would stop the loop, I think that's large enough (that's the value I got when printing _SC_GETGR_R_SIZE_MAX in Ubuntu).
> I can use 1024 as the absolute upper limit that would stop the loop, I think that's large enough (that's the value I got when printing _SC_GETGR_R_SIZE_MAX in Ubuntu).
I don't think that's necessarily the upper limit; e.g., this man page for getgrnam_r says:
The call
sysconf(_SC_GETGR_R_SIZE_MAX)
returns either -1, without changing errno, or an initial
suggested size for buf. (If this size is too small, the call
fails with ERANGE, in which case the caller can retry with a
larger buffer.)
In the man page for getgrgid it says it in different words, again referring to getgrnam_r
A call to sysconf(_SC_GETGR_R_SIZE_MAX) returns either -1
without changing errno or an initial value suggested for the size
of this buffer.
Although I see Mono just used it (or 1024) with no provision for resizing, here is a bug report where it was not large enough: https://bugs.ruby-lang.org/issues/9600
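The man page wording suggests treating the sysconf value as a starting size only. A hedged sketch in C of that reading, for the group side: `get_group_name` is a hypothetical name, the 1024 fallback is taken from the POSIX example quoted above, and the real pal_uid.c implementation may differ.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L /* for getgrgid_r and strdup under strict -std=c11 */
#endif
#include <errno.h>
#include <grp.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: returns a malloc'd copy of the group name for gid,
   or NULL if the gid has no entry or an error occurred. Caller frees. */
char *get_group_name(gid_t gid)
{
    /* The sysconf value is only a suggested initial size, not an upper bound. */
    long hint = sysconf(_SC_GETGR_R_SIZE_MAX);
    size_t len = (hint > 0) ? (size_t)hint : 1024;
    char *buf = NULL;
    char *name = NULL;

    for (;;) {
        char *bigger = realloc(buf, len);
        if (bigger == NULL)
            break;
        buf = bigger;

        struct group grp;
        struct group *result = NULL;
        int err = getgrgid_r(gid, &grp, buf, len, &result);

        if (err == 0) {
            if (result != NULL)     /* result stays NULL when gid is unknown */
                name = strdup(result->gr_name);
            break;
        }
        if (err == EINTR)
            continue;               /* interrupted: retry at the same size */
        if (err != ERANGE || len > SIZE_MAX / 2)
            break;                  /* real error, or doubling would overflow */
        len *= 2;                   /* too small (e.g. a group with many members) */
    }

    free(buf);
    return name;
}
```

Looping only on ERANGE means a group with a very long member list still resolves, whereas a hard stop at 1024 would silently yield no name, which appears to be the failure mode in the bug report linked above.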
src/libraries/Common/src/Interop/Unix/System.Native/Interop.GNameUName.cs
}

[LibraryImport(Libraries.SystemNative, EntryPoint = "SystemNative_GetGName", StringMarshalling = StringMarshalling.Utf8, SetLastError = true)]
private static unsafe partial string GetGNameInternal(uint uid);
Does the StringMarshalling know that it should free the char* that comes back from the native method?
From looking at the generated code in ILSpy, it looks like it does:
[LibraryImport(/*Could not decode attribute arguments.*/)]
[GeneratedCode("Microsoft.Interop.LibraryImportGenerator", "42.42.42.42")]
private unsafe static string GetGroupNameInternal(uint uid)
{
    //IL_000b: Unknown result type (might be due to invalid IL or missing references)
    byte* __retVal_gen_native = default(byte*);
    Utf8StringMarshaller __retVal_gen_native__marshaller = default(Utf8StringMarshaller);
    int __lastError;
    string __retVal;
    try
    {
        Marshal.SetLastSystemError(0);
        __retVal_gen_native = __PInvoke__(uid);
        __lastError = Marshal.GetLastSystemError();
        ((Utf8StringMarshaller)(ref __retVal_gen_native__marshaller)).FromNativeValue(__retVal_gen_native);
        __retVal = ((Utf8StringMarshaller)(ref __retVal_gen_native__marshaller)).ToManaged();
    }
    finally
    {
        ((Utf8StringMarshaller)(ref __retVal_gen_native__marshaller)).FreeNative(); // <-- HERE
    }
    Marshal.SetLastPInvokeError(__lastError);
    return __retVal;

    [DllImport("libSystem.Native", EntryPoint = "SystemNative_GetGroupName", ExactSpelling = true)]
    [CompilerGenerated]
    static extern unsafe byte* __PInvoke__(uint uid);
}
@AaronRobinsonMSFT @elinor-fung @jkoritzinsky - for my knowledge, if I have a P/Invoke where the returned string is shared and shouldn't be free'd, how would I write that? Just return the byte* pointer in my method and handle it myself? Or is there some built-in way to tell the marshalling code to not free the buffer?
> if I have a P/Invoke where the returned string is shared and shouldn't be free'd, how would I write that?
That would be a special case and I'm dubious if it is all that common. There is no support for that with the source generator marshallers nor in the built-in marshallers. Feel free to file an issue on that and we can consider it. Full-disclosure, if there isn't a lot of traction on it we'd likely close it at the end of the .NET 8 cycle.
> Just return the byte* pointer in my method and handle it myself?
That would be the easiest way to handle this case. Another way would be to copy the appropriate "StringMarshaller" type from source into the project and modify it as desired.
src/libraries/Common/src/Interop/Unix/System.Native/Interop.GNameUName.cs
src/libraries/System.Formats.Tar/src/System/Formats/Tar/TarWriter.Unix.cs
src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.WriteEntry.File.Tests.Unix.cs
…teEntry.File.Tests.Unix.cs
LGTM. Nice work here, @carlossanlop.
* Fix Design time build errors by removing src project as dependency of the test project.
* Add Browser to target platform identifiers. Ensure Browser can consume the Unix specific ArchivingUtils file too.
* Remove nullable enable from csproj since it's now default
* Use FileStream constructor with FileMode.CreateNew to detect and throw if destination file exists when creating archive.
* No error checking on syscalls that do not set errno.
* Add RDev field in FileStatus and retrieve it with stat/lstat so devmajor and devminor get their correct values.
* Simplify some File.Open calls with direct FileStream constructor calls. Simplify FileStreamOptions objects too.
* Implement and consume p/invokes to retrieve uname from uid and gname from gid.
* size_t variables should not be checked for negative values
* FileStream calls simplified to File.OpenRead
* Remove check for RDev > 0
* Use dictionary to preserve repeated unames and gnames mapped to uids and gids
* Missing documentation in pal_uid.h new PALEXPORT methods.
* Adjust syscalls to thread-safe ones. Start with stackalloc, then use loop.
* Make dicts readonly and non-nullable, use TryGetValue
* Reuse 'GetNameFromUid' from pal_networking.c, move to pal_uid.c, use similar logic for Gid method. Simplify Interop.Sys method.
* Remove unnecessary comment
Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
* Put TargetFrameworks back in the first position of the PropertyGroup.
* Address eerhardt suggestions
* Update src/libraries/System.Formats.Tar/tests/TarWriter/TarWriter.WriteEntry.File.Tests.Unix.cs
* Clarify in pal_uid.h methods comments that new memory is returned
Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>
Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>
* Address some System.Formats.Tar TODOs (infra and syscalls) (#69107)
Re-submitting the changes approved in PR: #69107
Addresses: #68230
Includes an extra change to prevent the Android build failure with the addition of the new native call to getgrgid_r.
Co-authored-by: carlossanlop <carlossanlop@users.noreply.github.com>
Co-authored-by: Adam Sitnik <adam.sitnik@gmail.com>