Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Fix File.ReadAllBytes{Async} for virtual file system files #28388

Merged
merged 1 commit into from
Mar 23, 2018

Conversation

stephentoub
Copy link
Member

Some file systems, like procfs, can return 0 for a file's length, even though it actually contains on-demand computed contents. This breaks File.ReadAllBytes{Async}, which currently assumes that a length of 0 means the file is empty. This commit fixes it by changing the assumption to mean that a length of 0 means we can't depend on the Length and instead need to read until EOF.

This brings File.ReadAllBytes inline with the behavior of File.ReadAllText. Note that this does mean that it'll also start OOM'ing in similar situations, e.g. if you try to do File.ReadAllBytes("/dev/urandom"), it'll eventually OOM whereas today it returns an empty array.

Fixes https://github.com/dotnet/corefx/issues/28343
cc: @JeremyKuhne, @tmds, @VasiliyNovikov

}

[Theory]
[PlatformSpecific(TestPlatforms.Linux)]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar test for OSX maybe?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you know of any file paths that present this behavior on macOS?

Some file systems, like procfs, can return 0 for a file's length, even though it actually contains on-demand computed contents.  This breaks File.ReadAllBytes{Async}, which currently assumes that a length of 0 means the file is empty.  This commit fixes it by changing the assumption to mean that a length of 0 means we can't depend on the Length and instead need to read until EOF.
@stephentoub stephentoub merged commit 77bf407 into dotnet:master Mar 23, 2018
@stephentoub stephentoub deleted the readallbytes branch March 23, 2018 03:11
@tmds
Copy link
Member

tmds commented Mar 23, 2018

Thanks @stephentoub!

@@ -341,6 +350,50 @@ public static byte[] ReadAllBytes(string path)
}
}

private static byte[] ReadAllBytesUnknownLength(FileStream fs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I missed something but it doesn't seem to throw new IOException(SR.IO_FileTooLong2GB) in case of newLength > int.MaxValue like it is done in ReadAllBytes for the case where length is known. And it seems like DefaultArrayPool.Rent doesn't throw either

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this stops when Rent throws out-of-memory which will happen when you ask it to allocate more than MaxByteArrayLength (or when there is no more memory).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't see such limit in DefaultArrayPool:


So it will throw only on some runtime limit or out of memory which seem to be inconsistent with known length case

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

MaxByteArrayLength is the max array length supported by .NET.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Thanks! That's what I actually didn't realize

uint newLength = (uint)buffer.Length * 2;
if (newLength > MaxByteArrayLength)
{
newLength = (uint)Math.Max(MaxByteArrayLength, buffer.Length + 1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't there be signed integer overflow in buffer.Length + 1 at some point? We don't seem to have any hard limit for this whole loop. Maybe you actually meant Math.Min here?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For buffer.Length + 1 > MaxByteArrayLength case, Rent will end up throwing (I think).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do I understand it correctly - when buffer.Length becomes MaxByteArrayLength on all further iterations it grows by one byte?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tmds That's the thing - DefaultArrayPool.Rent implementation doesn't seem have any limits (or maybe I missed one)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I see this is how it behaves:
On each iteration size doubles;
Then it tries MaxByteArrayLength;
And on the next iteration (buffer.Length + 1), Rent throws.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I'm looking into DefaultArrayPool.Rent implementation and in case requested length is greater than supported by pool it just creates an array of requested length:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So newermind. I didn't realize that MaxArrayLength is the actual maximum supported by .NET

@VasiliyNovikov
Copy link
Contributor

Q. Didn't we suppose to limit this fix only for Unixes?

@stephentoub
Copy link
Member Author

Didn't we suppose to limit this fix only for Unixes?

I considered that but decided not to special case because a) this only applies to the minor case where the file length is 0, b) it adds minimal cost to that case (one read operation), c) it actually saves some cost in that case as well by using Array.Empty<byte>() rather than allocating a new byte[0], and d) with the advent of the Windows Subsystem for Linux it seems feasible that at some point we could find ourselves in a similar situation on Windows, or potentially with other Windows file system drivers in general, though I don't have a concrete example.

If others disagree, though, it's trivial to change it to just be for non-Windows. @JeremyKuhne?

@VasiliyNovikov
Copy link
Contributor

VasiliyNovikov commented Mar 23, 2018

d) with the advent of the Windows Subsystem for Linux it seems feasible that at some point we could find ourselves in a similar situation on Windows

Strictly speaking when it runs under WSL it recognizes itself running on Linux so Windows-only code is not executed there

@stephentoub
Copy link
Member Author

When it runs under WSL it recognizes itself running on Linux so Windows-only code is not executed there

I understand that. I'm saying that this forward-proofs it against a potential future where you can access those paths from outside of the subsystem. Or against other arbitrary file system drivers that behave similarly.

@VasiliyNovikov
Copy link
Contributor

I see. Thanks

@JeremyKuhne
Copy link
Member

it's trivial to change it to just be for non-Windows. @JeremyKuhne?

I'm fine with it being included on Windows for the purposes you mentioned.

@karelz karelz added this to the 2.1.0 milestone Mar 27, 2018
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…refx#28388)

Some file systems, like procfs, can return 0 for a file's length, even though it actually contains on-demand computed contents.  This breaks File.ReadAllBytes{Async}, which currently assumes that a length of 0 means the file is empty.  This commit fixes it by changing the assumption to mean that a length of 0 means we can't depend on the Length and instead need to read until EOF.

Commit migrated from dotnet/corefx@77bf407
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Linux - FileStream.Length, FileStream.Seek & File.ReadAllBytes misbehaves with files in procfs
6 participants