Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Win32 FileStream will issue a seek on every ReadAsync call #16354

Closed
ayende opened this issue Feb 11, 2016 · 15 comments
Closed

Win32 FileStream will issue a seek on every ReadAsync call #16354

ayende opened this issue Feb 11, 2016 · 15 comments
Assignees
Labels
area-System.IO enhancement Product code improvement that does NOT require public API changes/additions os-windows tenet-performance Performance related issue
Milestone

Comments

@ayende
Copy link
Contributor

ayende commented Feb 11, 2016

Full description is here:
https://ayende.com/blog/173345/the-cost-of-async-i-o?key=227b21e219594be48ca03bc8312b08b9

But the gist of it is that we are trying to call ReadAsync on a file that is located on a remote share.
Using ReadAsync, we saw a really horrible performance degradation.

We tracked it down to this issue:

image

And here is the problematic code:
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/Win32FileStream.cs#L1201

Given that all async calls are always passing the position they use in the overlap, and how expensive this call is, it make for a very bad combination.

@stephentoub
Copy link
Member

cc; @ericstj, @ianhays. @JeremyKuhne

@ericstj
Copy link
Member

ericstj commented Feb 11, 2016

The uber problem here is that FileStream's contract is to update the Length and Position as part completing the synchronous portion of the Read. Folks are seeing the same sorts of problems on Writes where we call setlength before the write.

The fix here is to change that contract. If FileStream kept track of it's own Length and Position in the case of an async handle and was less aggressive about making sure the handle was in sync we'd fix these perf issues. The downside is that this changes the externally observable behavior of the file/handle (if shared) so we'd need to quirk it.

@ayende
Copy link
Contributor Author

ayende commented Feb 11, 2016

Why do you need to change the contract?
You can do that only if the handle is shared. If the handle isn't shared, you know that no one is going to look at its state, so you don't need to update it.

The code is already tracking this:
https://github.com/dotnet/corefx/blob/master/src/System.IO.FileSystem/src/System/IO/Win32FileStream.cs#L54

If _exposedHandle is false, there is no need to set the handle's values.

@ericstj
Copy link
Member

ericstj commented Feb 11, 2016

I was referring more to the write side of this bug. The length of the file is observable from an external process.

@ayende
Copy link
Contributor Author

ayende commented Feb 11, 2016

Still the same thing, though. It is the OS responsibility to update the file length if the file length changed.

What is the scenario where an external process require you to set the file pointer if the handle isn't available externally?

@ericstj
Copy link
Member

ericstj commented Feb 11, 2016

It is the OS responsibility

Ideally yes, but we set a precedent in desktop otherwise (set length before write), thus the need to be careful about this fix. I'm not saying we don't do this, just saying we need to be careful about existing externally observable behavior.

@omariom
Copy link
Contributor

omariom commented Feb 12, 2016

seek() + write() vs pwrite()
though not on Windows.

@ayende
Copy link
Contributor Author

ayende commented Feb 13, 2016

My test was run on Windows.
Both the source machine and the network drive are windows machines.

*Hibernating Rhinos Ltd *

Oren Eini* l CEO l *Mobile: + 972-52-548-6969

Office: +972-4-622-7811 *l *Fax: +972-153-4-622-7811

On Fri, Feb 12, 2016 at 7:03 PM, OmariO notifications@github.com wrote:

seek() + write() vs pwrite()
https://www.lmax.com/blog/staff-blogs/2015/07/03/seek-write-vs-pwrite/
though not on Windows.


Reply to this email directly or view it on GitHub
https://github.com/dotnet/corefx/issues/6039#issuecomment-183413646.

@jgowdy
Copy link

jgowdy commented Feb 19, 2016

@omariom You can simulate pwrite on Windows by using WriteFile with an OVERLAPPED structure with hEvent set to 0, and the 64bit file pointer set via Offset and OffsetHigh.

@JeremyKuhne JeremyKuhne changed the title Win32FileStream will issue a seek on every ReadAsync call Win32 FileStream will issue a seek on every ReadAsync call Jan 15, 2020
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 23, 2020
@JeremyKuhne JeremyKuhne removed the untriaged New issue has not been triaged by the area owner label Mar 3, 2020
@adamsitnik adamsitnik self-assigned this Jan 15, 2021
@adamsitnik adamsitnik modified the milestones: Future, 6.0.0 Jan 15, 2021
@adamsitnik
Copy link
Member

adamsitnik commented Jan 15, 2021

Another problem with seek and read is that it's not a single operation, so it's not atomic (and not thread-safe as at least on Linux the file offset is shared between all threads and processes).

An excellent read for Linux: https://learning.oreilly.com/library/view/The+Linux+Programming+Interface/9781593272203/xhtml/ch05.xhtml#ch05lev1sec06

I am going to try to figure a way to use pread instead (and find a similar solution for Windows implementation)

@stephentoub
Copy link
Member

it's not a single operation, so it's not atomic

Why is that a problem?

@adamsitnik
Copy link
Member

Why is that a problem?

We perform the extra seek to verify that nobody else has changed the file offset in the meantime (because we want to throw in such situations). Despite doing it and paying the perf price for it we can still miss the situation when it got modified after the call to seek and before the call to read?

@stephentoub
Copy link
Member

because we want to throw in such situation

Personally, I think this existing attempt at protection is inherently flawed in purpose. The implementation doesn't try to make it work, but rather simply tries to detect misuse, and as you say, such protection is broken, anyway. Maybe I'm misunderstanding the suggestion, but switching to pread suggests to me you're not just trying to be more accurate in the misuse detection, but actually aim to make the scenario work (rather than detect misuse). I don't think that needs to be a goal; personally I think we can just delete the flawed attempt at protection. What did you have in mind for pread?

@ayende
Copy link
Contributor Author

ayende commented Jan 17, 2021

It is allowed to have multiple concurrent ReadAsync ?

On Unix, it looks like there is a semaphore for it:

// The options available on Unix for writing asynchronously to an arbitrary file
// handle typically amount to just using another thread to do the synchronous write,
// which is exactly what this implementation does. This does mean there are subtle
// differences in certain FileStream behaviors between Windows and Unix when multiple
// asynchronous operations are issued against the stream to execute concurrently; on
// Unix the operations will be serialized due to the usage of a semaphore, but the
// position /length information won't be updated until after the write has completed,
// whereas on Windows it may happen before the write has completed.

So that will ensure that the position is changed in step. That make me think that on Linux, you can just drop this (assuming that the semaphore applies to write as well).

But on Windows, there is this comment that is interesting:

// WriteFile should not update the file pointer when writing
// in overlapped mode, according to MSDN. But it does update
// the file pointer when writing to a UNC path!
// So changed the code below to seek to an absolute
// location, not a relative one. ReadFile seems consistent though.

Seems to indicate that the position may change unexpected with writes.
However, this is already handled because you are using overlapped I/O.

It looks like the Seek is there because it wants to set the position for the next call. But all the calls on the FileStream are actually using the overlapped, no? So the scenario in question is trying to keep it in sync for someone who passed a handle and is also using that in conjunction with async reads?

Seems far fetched.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 4, 2021
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 17, 2021
@adamsitnik
Copy link
Member

Fixed by #49975, please expect a blog post with all the details and perf numbers in April (before we ship preview 4)

@ghost ghost locked as resolved and limited conversation to collaborators Apr 22, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO enhancement Product code improvement that does NOT require public API changes/additions os-windows tenet-performance Performance related issue
Projects
None yet
10 participants