os: "async" file IO #6817
I don't think there is an asynchronous stat syscall available, and IIRC most event-based web servers take great pains to work around the stat(2)-takes-up-a-thread problem (e.g. dedicated stat(2) thread pools). Similarly for readdir: is there a pollable version available? I don't know whether readdir/stat is contributing to the godoc problem, but I think they might become a problem if the GOPATH is large enough.
This continually bites me. I have an interface that has both network and filesystem implementations. The network one works great (firing off a bounded number of goroutines: say, 1000), but the filesystem implementation of the same interface kills the OS, and my code has to limit itself manually, which feels like a layering violation. The runtime should do the right thing for me. runtime/debug.SetMaxThreads sets the maximum number of threads Go uses before it blows up. If we can't do async filesystem IO everywhere (and I don't think we can), then I think we should have something like runtime/debug.SetMaxFSThreads that controls the size of the pool of threads doing filesystem operations, but blocks instead of crashing. That means that for the read/write syscalls we'd have to know which fds are marked non-blocking (network stuff) and which aren't. Or we put all this capping logic in pkg/os, perhaps opt-in: pkg os, func SetMaxFSThreads(n int)
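For readers following along, here is a minimal sketch of the application-level workaround described above: bounding the number of goroutines that perform blocking filesystem calls with a counting semaphore. The names (`fsSem`, `readFileLimited`) and the limit of 64 are illustrative, not part of any proposed API.

```go
package main

import (
	"fmt"
	"os"
)

// fsSem limits concurrent filesystem operations so the runtime does not
// accumulate an unbounded number of OS threads blocked in syscalls.
var fsSem = make(chan struct{}, 64)

func readFileLimited(name string) ([]byte, error) {
	fsSem <- struct{}{}        // acquire a slot (blocks instead of crashing)
	defer func() { <-fsSem }() // release the slot
	return os.ReadFile(name)
}

func main() {
	data, err := readFileLimited("/etc/hosts")
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	fmt.Printf("read %d bytes\n", len(data))
}
```

This is exactly the kind of capping logic the comment argues should live in the runtime or in pkg/os rather than in every application.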
(This came up in a conversation today and I wanted to make sure people don't start off with incorrect assumptions.) I want to be really clear on this: there is no such thing as regular file or dir I/O that wouldn't wait for disk on a cache miss. I am not talking about serial ports or such here, but files and directories. Regular files are always "ready" as far as select/poll/epoll are concerned, so the network poller / epoll has nothing to contribute here. There is no version of read/stat/readdir that avoids waiting for disk. People have been talking about extending Linux to implement non-waiting file IO (e.g. http://lwn.net/Articles/612483/ ), but that's not realistic today. I don't see Go having much choice beyond threads doing syscalls. The real question in my mind is whether there is a way to limit syscall concurrency to avoid swamping the CPU/OS with too many threads, while still avoiding deadlocks. And just to minimize the chances of confusion: file AIO ("Async I/O") is something very different, and not applicable to this conversation. It's a very restrictive API (actually, multiple APIs), bypasses useful features like caching, and doesn't necessarily perform well at all.
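A small sketch, assuming golang.org/x/sys/unix on Linux and an arbitrary existing file, that illustrates the point above: poll(2) reports a regular file as "ready" immediately, even though the subsequent read may still wait on disk.

```go
package main

import (
	"fmt"
	"os"

	"golang.org/x/sys/unix"
)

func main() {
	f, err := os.Open("/etc/hosts") // any regular file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	fds := []unix.PollFd{{Fd: int32(f.Fd()), Events: unix.POLLIN}}
	n, err := unix.Poll(fds, 0) // zero timeout: do not wait at all
	if err != nil {
		panic(err)
	}
	// For a regular file this reports "ready" (POLLIN set) right away,
	// regardless of whether the data is in the page cache.
	fmt.Printf("poll returned %d, revents=%#x\n", n, fds[0].Revents)
}
```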
What is wrong with io_submit?
@dvyukov io_submit is the Linux AIO API (as opposed to POSIX AIO). It's a separate codepath, dependent on the filesystem doing the right thing; the implementations have been problematic, and using AIO introduces a whole bunch of risk. The original implementation assumed O_DIRECT and that legacy remains; non-O_DIRECT operation is even more problematic. O_DIRECT is not safe to use for generic file operations, because others accessing the file will use the buffer cache. Without O_DIRECT, for example, the generic version of io_submit falls back to synchronous processing. Some filesystems don't handle unaligned accesses well. In some circumstances (e.g. journaling details, file space not preallocated, etc.), io_submit has to wait until the operation completes instead of just submitting an async request; this tends to be more typical without O_DIRECT. The default limit for pending requests is only 128; after that, io_submit starts blocking. Finally, io_submit only helps with the basic read/write workload; it does nothing for stat, readdir, and the like. I'm not saying it won't work, but I also would not be surprised if a change moving file IO to io_submit got reverted within a few months.
OK, then everybody should switch to Windows :)
Why are we talking about POSIX AIO here? Go's syscall package needs to use raw syscalls and shouldn't depend on any interfaces partly implemented in userspace (e.g. glibc's POSIX AIO is partly implemented in user space).
CL https://golang.org/cl/36799 mentions this issue.
CL https://golang.org/cl/36800 mentions this issue.
This will make it possible to use the poller with the os package. This is a lot of code movement but the behavior is intended to be unchanged.

Update #6817. Update #7903. Update #15021. Update #18507.

Change-Id: I1413685928017c32df5654ded73a2643820977ae
Reviewed-on: https://go-review.googlesource.com/36799
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: David Crawshaw <crawshaw@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
This changes the os package to use the runtime poller for file I/O where possible. When a system call blocks on a pollable descriptor, the goroutine will be blocked on the poller but the thread will be released to run other goroutines. When using a non-pollable descriptor, the os package will continue to use thread-blocking system calls as before.

For example, on GNU/Linux, the runtime poller uses epoll. epoll does not support ordinary disk files, so they will continue to use blocking I/O as before. The poller will be used for pipes.

Since this means that the poller is used for many more programs, this modifies the runtime to only block waiting for the poller if there is some goroutine that is waiting on the poller. Otherwise, there is no point, as the poller will never make any goroutine ready. This preserves the runtime's current simple deadlock detection.

This seems to crash FreeBSD systems, so it is disabled on FreeBSD. This is issue 19093.

Using the poller on Windows requires opening the file with FILE_FLAG_OVERLAPPED. We should only do that if we can remove that flag if the program calls the Fd method. This is issue 19098.

Update #6817. Update #7903. Update #15021. Update #18507. Update #19093. Update #19098.

Change-Id: Ia5197dcefa7c6fbcca97d19a6f8621b2abcbb1fe
Reviewed-on: https://go-review.googlesource.com/36800
Run-TryBot: Ian Lance Taylor <iant@golang.org>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Russ Cox <rsc@golang.org>
Just to add some numbers to the discussion: we're facing this problem as well. Running fio on an Amazon EC2 i3.large instance, we're able to get 64K IOPS at a 4K block size for random reads, using 8 jobs and a 4G file. (Other times I've seen it go close to 100K IOPS.) We wrote a small Go program to do the exact same thing using goroutines, and it doesn't budge above 20K IOPS. In fact, the throughput won't increase any further once the number of goroutines reaches the number of cores. This strongly indicates that Go is paying the cost of context switching, because it does a blocking read in every iteration of the loop. Full Go code here: https://github.com/dgraph-io/badger-bench/blob/master/randread/main.go It seems like using async IO is the only way to achieve IO throughput in Go. SSDs are able to push more and more throughput with every release, so there has to be a way in Go to realize that.
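The full program is at the link above; the following is only a stripped-down sketch of the same pattern for readers who don't want to follow the link. The file name, goroutine count, and duration are assumptions. Each ReadAt is a blocking pread, so each in-flight read pins an OS thread, which is where the context-switch cost comes from.

```go
package main

import (
	"fmt"
	"math/rand"
	"os"
	"sync"
	"sync/atomic"
	"time"
)

func main() {
	const (
		blockSize  = 4 << 10          // 4 KB reads
		fileSize   = int64(4) << 30   // assumes a pre-created 4 GB test file
		goroutines = 16
		duration   = 10 * time.Second
	)
	f, err := os.Open("testfile") // hypothetical test file
	if err != nil {
		panic(err)
	}
	defer f.Close()

	var ops int64
	var wg sync.WaitGroup
	deadline := time.Now().Add(duration)
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			r := rand.New(rand.NewSource(seed)) // per-goroutine source, no global lock
			buf := make([]byte, blockSize)
			for time.Now().Before(deadline) {
				off := r.Int63n(fileSize-blockSize) &^ (blockSize - 1) // aligned random offset
				if _, err := f.ReadAt(buf, off); err != nil {
					panic(err)
				}
				atomic.AddInt64(&ops, 1)
			}
		}(int64(i))
	}
	wg.Wait()
	fmt.Printf("%.0f IOPS\n", float64(ops)/duration.Seconds())
}
```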
@manishrjain, what fio command line are you running? Btw, your benchmark has a global mutex (don't use the package-level math/rand functions in a hot loop; give each goroutine its own rand.Rand).
This is the fio command on my computer, and the output:
And the corresponding Go program: I switched from using the global rand to a local rand, and it no longer shows up in the block profiler or the CPU profiler. Fio is getting 43.8K IOPS. My program in Go is giving me ~25K.
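As an aside for readers, here is a minimal illustration of the math/rand point raised in this exchange: the package-level functions share one source guarded by a mutex, so hot loops in many goroutines should each use their own *rand.Rand.

```go
package main

import (
	"math/rand"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := int64(0); i < 4; i++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			// Contended: the package-level rand.Int63n takes a global lock on every call.
			_ = rand.Int63n(1 << 20)
			// Uncontended: a per-goroutine source with no shared state.
			r := rand.New(rand.NewSource(seed))
			_ = r.Int63n(1 << 20)
		}(i)
	}
	wg.Wait()
}
```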
@tv42 @rsc Sorry for jumping into this old discussion so late.
Would it be acceptable to expose the file IO semantics (DIO/AIO) and let programmers decide? It is a hard decision for Go, because the compiler/runtime can't know the speed of the underlying storage media. But programmers, especially those who write purpose-built storage components in Go, should know their target storage better. As a concrete example, it would then be possible to write a program similar to fio in Go.
@bergwolf In a sense the semantics are already exposed via the golang.org/x/sys/unix package, which lets you do anything the system supports. I don't see how it would make sense to expose these semantics in the os package; that would add a lot of complexity for the benefit of very few users. I've got nothing against rewriting the os package to use a different underlying mechanism, such as the asynchronous interfaces discussed earlier in this thread.
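As a sketch of what "already exposed via golang.org/x/sys/unix" can look like in practice, here is a Linux-only example that opens a hypothetical file with O_DIRECT. Note that O_DIRECT generally requires block-aligned buffers, offsets, and lengths (the exact alignment is device/filesystem dependent), which is one reason it is not a drop-in replacement for ordinary os.File reads.

```go
package main

import (
	"fmt"
	"os"
	"unsafe"

	"golang.org/x/sys/unix"
)

func main() {
	// Hypothetical path; O_DIRECT bypasses the page cache on Linux.
	fd, err := unix.Open("/tmp/testfile", unix.O_RDONLY|unix.O_DIRECT, 0)
	if err != nil {
		fmt.Println("open:", err)
		return
	}
	f := os.NewFile(uintptr(fd), "/tmp/testfile")
	defer f.Close()

	// Over-allocate and slice to a 4 KB boundary to satisfy O_DIRECT's
	// typical alignment requirement.
	raw := make([]byte, 8<<10)
	off := 0
	for uintptr(unsafe.Pointer(&raw[off]))%4096 != 0 {
		off++
	}
	buf := raw[off : off+4096]

	n, err := f.ReadAt(buf, 0)
	fmt.Println("read", n, "bytes, err =", err)
}
```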
New development: io_uring is a pair of ring buffers used to pass I/O requests and completions between user space and the kernel. It might be promising. Only filesystem files are supported right now; it can't be used on sockets, pipes, etc. at this time.
It's probably dated, but there's a nice little paper at https://pdfs.semanticscholar.org/d4a6/852f0f4cda6cf0431e04b81771eea08f88e2.pdf: "An Attempt at Reducing Costs of Disk I/O in Go"