Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linux - FileStream.Length, FileStream.Seek & File.ReadAllBytes misbehaves with files in procfs #25571

Closed
VasiliyNovikov opened this issue Mar 21, 2018 · 18 comments · Fixed by dotnet/corefx#28388
Assignees
Labels
area-System.IO bug os-linux Linux OS (any supported distro)
Milestone

Comments

@VasiliyNovikov
Copy link
Contributor

I did the following experiment:

using System;
using System.IO;

namespace TestProcStatFile
{
    class Program
    {
        static void Main(string[] args)
        {
            const string procStat = "/proc/stat";
            using (var file = File.OpenRead(procStat))
            {
                Console.WriteLine($"Stream length: {file.Length}");
                Console.WriteLine($"Actual length (via File.ReadAllBytes): {File.ReadAllBytes(procStat).Length}");
                Console.WriteLine($"Actual length (via File.ReadAllText): {File.ReadAllText(procStat).Length}");

                Console.WriteLine($"Can seek: {file.CanSeek}");
                Console.WriteLine($"Seek(8192), Position: {file.Seek(8192, SeekOrigin.Begin)}");
                try
                {
                    file.Seek(0, SeekOrigin.End);
                }
                catch (Exception e)
                {
                    Console.WriteLine($"Seek from end: {e}");
                }
            }
        }
    }
}

Which reports the following:

Stream length: 0
Actual length (via File.ReadAllBytes): 0
Actual length (via File.ReadAllText): 5085
Can seek: True
Seek(8192), Position: 8192
Seek from end: System.IO.IOException: Invalid argument
   at System.IO.FileStream.CheckFileCall(Int64 result, Boolean ignoreNotSupported)
   at System.IO.FileStream.Seek(Int64 offset, SeekOrigin origin)
   at TestProcStatFile.Program.Main(String[] args) in /home/netem/RiderProjects/TestProcStatFile/Program.cs:line 21

https://github.com/dotnet/corefx/blob/a8cfc6a498f0455c59b0030f74cba4dabeee86e1/src/System.IO.FileSystem/src/System/IO/File.cs#L320
This guy relays on FileStream.Length. I'm thinking if FileStream.Length should actually throw for files in procfs and CanSeek should return false for them and File.ReadAllBytes should check CanSeek and if it is false then don't pre-allocate buffer of the final size but allocate it dynamically while reading

@tmds
Copy link
Member

tmds commented Mar 22, 2018

Almost all files in proc have zero size and should be read till EOF.
These files represent some live state of the system, and they don't have a content until you request the state (i.e. read).
You can seek these files. Seeking to the start is a way of refreshing the content of the file.

Since this is a special file system, I suggest to not make FileStream more complex to support this.
We could do some internal implementation that could be shared between various readers of proc in the corefx repo.

@VasiliyNovikov
Copy link
Contributor Author

VasiliyNovikov commented Mar 22, 2018

Shouldn't we at least fix File.ReadAllBytes to not return empty result?

@tmds
Copy link
Member

tmds commented Mar 22, 2018

The challenge is that for proc (and similar) files we need to allocate a buffer up-front (and deal with it maybe being too small). Whereas for real empty files we don't want to be allocating a buffer at all.
And there isn't a good way afaik to distinguish between these files.

@VasiliyNovikov
Copy link
Contributor Author

Maybe there is some API for that... E.g. stat -f /proc/stat shows it:

  File: "/proc/stat"
    ID: 0        Namelen: 255     Type: proc
Block size: 4096       Fundamental block size: 4096
Blocks: Total: 0          Free: 0          Available: 0
Inodes: Total: 0          Free: 0

@tmds
Copy link
Member

tmds commented Mar 22, 2018

You mean, stat shows the Type 'proc'? This behavior isn't limited to the proc file system, any file system can do this.

@VasiliyNovikov
Copy link
Contributor Author

You mean, stat shows the Type 'proc'?

Yes

This behavior isn't limited to the proc file system, any file system can do this.

What else can behave this way? Here is only proc mentioned:

For most files under the /proc directory, stat() does not return the file size in the st_size field; instead the field is returned with the value 0.

And I also tried to get size of files under /sys - it returns non-0 size

@tmds
Copy link
Member

tmds commented Mar 22, 2018

What else can behave this way? Here is only proc mentioned:

Any file system can. I have seen application that export a file system via fuse (https://github.com/libfuse/libfuse) which behave the same way.

@VasiliyNovikov
Copy link
Contributor Author

VasiliyNovikov commented Mar 22, 2018

I see. Thanks
What if we have some hardcoded list of well-know file systems - for all of them we know the behavior of file size. If file system is not in that list then assume file size is not reliable .... or vice versa?

@VasiliyNovikov
Copy link
Contributor Author

Would it make sense to fix it not 100% of cases but 99 - only for known file systems?

@svick
Copy link
Contributor

svick commented Mar 22, 2018

@tmds

Whereas for real empty files we don't want to be allocating a buffer at all.
And there isn't a good way afaik to distinguish between these files.

Would it work if you tried to read a single byte (or read into a small stack-allocated buffer)? If that succeeds, you know the file size is lying and you have to deal with it in a special way.

This would mean some seemingly unnecessary allocations and copying for proc and similar cases, but I think that's still better than returning the wrong result. And for actual empty files, there would be no heap allocations, and the total overhead should be fairly small.

@VasiliyNovikov
Copy link
Contributor Author

Another thing we could try is to use the fact that seeking from the end fails for such files (but this needs to be tested with other file systems that return 0 size to make sure they behave the same way)

@tmds
Copy link
Member

tmds commented Mar 22, 2018

@svick it depends on the file system, but I think it should work for special file systems from the kernel. A small stack-allocated buffer would be good. Then we may have the full data already and we only need to copy it into a byte array.

@VasiliyNovikov
Copy link
Contributor Author

And we can combine 2 approaches - for well-known file systems use hardcoded behavior, for unknown - test the file itself

@VasiliyNovikov
Copy link
Contributor Author

VasiliyNovikov commented Mar 22, 2018

Maybe I'm going too crazy with checking file system type. Probably reading just small buffer is even cheaper than determining file system type and it is definitely much simpler in terms of implementation
Q. Can Windows & OSX have the same behavior for some special file systems/files?

@tmds
Copy link
Member

tmds commented Mar 22, 2018

Probably reading just small buffer is even cheaper than determining file system type and it is definitely much simpler in terms of implementation

Yes, a stackalloced read should be enough.
If you implement this, please verify it gives the same result for every file in proc (when using a tiny stackalloced buffer).

Q. Can Windows & OSX have the same behavior for some special file systems/files?

I don't know for sure, but I'd assume this is Linux specific behavior.

@VasiliyNovikov
Copy link
Contributor Author

B.T.W. You mentioned FUSE and I found this: https://osxfuse.github.io/ . Could this mean that OSX can also have the same behavior for those file systems?

@tmds
Copy link
Member

tmds commented Mar 22, 2018

Probably it can. It's fine if the change applies to all Unixes.

@VasiliyNovikov
Copy link
Contributor Author

Thanks!

@stephentoub stephentoub self-assigned this Mar 22, 2018
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 17, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO bug os-linux Linux OS (any supported distro)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants