std: add io_uring library #6356
This brings io_uring helper methods to Zig for kernels >= 5.4. We follow liburing's design decisions so that anyone who is comfortable with liburing (https://unixism.net/loti/ref-liburing/index.html) will feel at home. Thanks to @daurnimator for the first draft. Refs: #3083 Signed-off-by: Joran Dirk Greef <joran@coil.com>
This is my first time coding in Zig, and it's been great. Would appreciate as many eyes on this as possible.
Code coverage should be close to 100%, with the exception of
The build is failing at
I'm pretty sure that access through unnamed unions can be solved with compile-time reflection or, if that's not enough, with a builtin. By the way, C-style `x->y` can be translated into Zig simply as `x.y` instead of `x.*.y`.
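A small illustration of the `x->y` tip: Zig dereferences single-item pointers (`*T`) automatically on field access, for both reads and writes. The `Cq` struct and the `readHead`/`bumpHead` helpers are hypothetical and exist only for this example.

```zig
const std = @import("std");

// Hypothetical struct standing in for any C struct reached through a pointer.
const Cq = struct {
    head: u32,
    tail: u32,
};

// C: `return cq->head;`. Field access through a single-item pointer
// dereferences automatically, so `cq.head` means the same as `cq.*.head`.
fn readHead(cq: *const Cq) u32 {
    return cq.head;
}

// C: `cq->head += n;`. Assignment through the pointer works the same way.
fn bumpHead(cq: *Cq, n: u32) void {
    cq.head += n;
}
```

Only one level is dereferenced. A pointer to a pointer still needs an explicit `.*` for the outer level.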
Thanks for the tip @Rocknest, done! Another by the way, both
This looks very useful, but it also looks like a pretty apt candidate for being a third-party package. It's quite clean, depending only on the std lib, and the std lib has no dependencies on it. (The addition of IORING_SQ_CQ_OVERFLOW alone would be merged immediately.) I didn't look too closely, but I get the impression that it makes some implementation decisions on behalf of the API user, which makes it both (1) useful and (2) a good candidate for being a third-party package. For example, I think when we rework the std lib event loop implementation to additionally support io_uring, it will likely duplicate parts of this code rather than using it strictly as an API user. Any objections to maintaining this outside the std lib?
Well, if you consider that the Linux kernel is moving a lot of its syscalls to this new io_uring interface, then it should be part of the Zig std lib, since this is also the case for the current syscall interface.
Can you explain that a little bit more? We already have the syscalls in the Zig std lib (lines 1194 to 1204 in 3672a18).
Oh, I see: you're saying that newer syscalls are being added that are only exposed via io_uring, so a convenient way to call those syscalls would be needed.
Yeah, so basically it lets you put syscalls like read, write, etc. in the io_uring queue; the kernel then processes this queue and executes the syscalls while the process does other work (asynchronous syscalls, basically). There is an article about this; @daurnimator might know what I'm referring to. Edit: I think it is this link: https://lwn.net/Articles/810414/
Thanks @andrewrk!
No, in fact, the idea was to follow the interface and implementation decisions taken by liburing. liburing is not just any third-party package but the de facto userland implementation of io_uring, maintained by Jens Axboe, and it also serves as the test suite for the kernel. This PR contains only the core of what you would need to use io_uring safely, with correct memory barriers and consideration for SQ and CQ overflow and different poll modes, but without exposing the entire surface area of liburing. This is the bare minimum; the io_uring syscalls alone are not enough. I literally worked through the kernel source and liburing's source full time over three weeks, so I don't think we make any implementation decisions beyond liburing, except for copy_cqes(), which has an open issue in liburing already and which I plan to submit there. But if you look closely and think we do, please let me know and I can always unravel them!
Again, this is almost exactly what the event loop would need and nothing more. For example, with this, you could drop the ugly std lib code needed for the I/O threadpool on Linux and make single-threaded mode event loops non-blocking to fix #1908, also solving #5962.
No, but io_uring is the future of I/O in Linux. I believe it makes sense to have a first-class io_uring implementation in the std lib, and furthermore one that follows liburing's design decisions.
@jorangreef, make every field in the struct a union. That would be future-proof. Be creative instead of waiting for some feature to be added to Zig so you can do it the C way.
It does not, but it is ugly, and it does not accomplish what you claim it can.
Ensures that the wakeup flag is read after the tail pointer has been written. It's important to use memory load acquire semantics for the flags read, otherwise the application and the kernel might not agree on the consistency of the wakeup flag, leading to I/O starvation. Refs: axboe/liburing@6768ddc Refs: axboe/liburing#219
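The ordering this commit describes can be sketched as follows. The sketch assumes Zig 0.12 or later, where atomic orderings are lowercase (`.release`, not the PR's `.Release`). `SqRing` and `flushAndCheckWakeup` are hypothetical names, not the std lib API, and the ring is simulated with plain variables instead of mmap'ed kernel memory. The `sq_need_wakeup` bit matches the kernel's IORING_SQ_NEED_WAKEUP.

```zig
const std = @import("std");

// Same bit as the kernel's IORING_SQ_NEED_WAKEUP (1 << 0) in the SQ ring flags.
const sq_need_wakeup: u32 = 1 << 0;

// Hypothetical, simplified view of the shared SQ ring. In a real ring these
// pointers refer to memory mmap'ed from the kernel.
const SqRing = struct {
    tail: *u32, // written only by the application, read by the kernel
    flags: *u32, // written only by the kernel, read by the application
};

/// Publishes `count` already-prepared SQEs by advancing the tail, then reports
/// whether an SQPOLL kernel thread has asked to be woken via io_uring_enter().
fn flushAndCheckWakeup(sq: SqRing, count: u32) bool {
    // Plain read is fine: only the application ever writes the SQ tail.
    const new_tail = sq.tail.* +% count;
    // Release: the kernel must not observe the new tail before the SQE
    // contents written earlier by the application.
    @atomicStore(u32, sq.tail, new_tail, .release);
    // Acquire load of the flags, as the commit message describes. Caveat: a
    // release store followed by an acquire load of a different address may
    // still be reordered (store-load), so a stricter version would place a
    // full barrier between the two.
    const flags = @atomicLoad(u32, sq.flags, .acquire);
    return (flags & sq_need_wakeup) != 0;
}
```

The wakeup check only matters when the ring was set up with IORING_SETUP_SQPOLL. Without SQPOLL, the application always has to call io_uring_enter() to submit.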
Decouples SQE queueing and SQE prepping methods to allow for non-sequential SQE allocation schemes as suggested by @daurnimator. Adds essential SQE prepping methods from liburing to reduce boilerplate. Removes non-essential .link_with_next_sqe() and .use_registered_fd().
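Separating SQE allocation from prepping could look roughly like this. The sketch assumes a toy fixed-size array in place of a real mmap'ed SQE ring with index masking. `Sqe`, `Ring`, `getSqe`, `prepNop`, `prepFsync` and `queueFsync` are hypothetical names. The opcode values 0 and 3 follow the kernel's IORING_OP_NOP and IORING_OP_FSYNC.

```zig
const std = @import("std");

// Opcode values as in the kernel's IORING_OP_NOP and IORING_OP_FSYNC.
const op_nop: u8 = 0;
const op_fsync: u8 = 3;

// Hypothetical, heavily simplified SQE with only three of the real fields.
const Sqe = struct {
    opcode: u8 = 0,
    fd: i32 = 0,
    user_data: u64 = 0,
};

// Hypothetical ring: a fixed array with no wrap-around. A real SQ ring lives in
// mmap'ed memory and masks its indices.
const Ring = struct {
    sqes: [4]Sqe = [_]Sqe{.{}} ** 4,
    sqe_tail: u32 = 0,

    /// Allocation only: hands out the next free SQE slot, or null if full.
    fn getSqe(self: *Ring) ?*Sqe {
        if (self.sqe_tail == self.sqes.len) return null;
        const sqe = &self.sqes[self.sqe_tail];
        self.sqe_tail += 1;
        sqe.* = .{}; // clear stale contents from any previous use
        return sqe;
    }
};

/// Prepping only: fills in an SQE, however it was allocated.
fn prepNop(sqe: *Sqe, user_data: u64) void {
    sqe.* = .{ .opcode = op_nop, .user_data = user_data };
}

fn prepFsync(sqe: *Sqe, fd: i32, user_data: u64) void {
    sqe.* = .{ .opcode = op_fsync, .fd = fd, .user_data = user_data };
}

/// Convenience wrapper that allocates and preps in one call.
fn queueFsync(ring: *Ring, fd: i32, user_data: u64) error{SubmissionQueueFull}!*Sqe {
    const sqe = ring.getSqe() orelse return error.SubmissionQueueFull;
    prepFsync(sqe, fd, user_data);
    return sqe;
}
```

Because the prep functions take any `*Sqe`, a caller can grab several slots up front and fill them in any order. The one-call wrapper stays available for the common case.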
Removes non-essential .hardlink_with_next_sqe() and .drain_previous_sqes().
I'm looking forward to reviewing this within a couple of days, now that I've finished the big branch I was focusing on :-)
For anyone interested in how this performs, the Coil team put together a range of file system and networking benchmarks, comparing syscalls through io_uring with blocking syscalls or epoll, and specifically benchmarking various Zig implementations as well as C contenders: https://github.com/coilhq/tiger-beetle/tree/master/demos/io_uring Some highlights:
Thanks to @masterQ32 for writing the blocking networking echo server candidate. Please take these benchmarks with a pinch of salt and let us know what can be improved!
This now supports the
With the exception of But we also go further with networking syscalls that no longer need epoll thanks to
And tests for everything.
pub fn cq_advance(self: *IO_Uring, count: u32) void {
    if (count > 0) {
        // Ensure the kernel only sees the new head value after the CQEs have been read.
        @atomicStore(u32, self.cq.head, self.cq.head.* +% count, .Release);
    }
}
Might want an atomic rmw here?
I believe this is already exactly what @axboe does in liburing? Could you explain why you would want something else? If so, then that's probably a bug in liburing.
To understand why we use `@atomicStore` here:
The `cq.head` pointer is owned by the application, not the kernel.
The way this works is that:
- the kernel only ever pushes to the end of the CQ ring by incrementing the `cq.tail` pointer, which is owned by the kernel, and
- the application only ever shifts from the front of the CQ ring by incrementing the `cq.head` pointer, again owned by the application.
Thus, the kernel only reads `cq.head` (and never writes it), and the application only reads `cq.tail` (and never writes it). It's symmetric, and the same logic holds for the SQ ring, but inverted.
This means that the application is free to read and then increment `cq.head` here at any time without an atomic read-modify-write, since the application is the only party that will write to `cq.head` when it shifts from the queue.
The reason we use `@atomicStore` here is that the CPU can reorder memory accesses: the kernel might read the newly written `cq.head` and then overwrite CQEs whose memory we are still reading.
What we are saying is that the kernel should only see the store to `cq.head` after the CQEs involved have been read.
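The consumer side described above can be sketched with a ring simulated in plain memory. The sketch assumes Zig 0.12 or later (lowercase atomic orderings). `CqRing`, `copyCqes` and `advance` are hypothetical names, loosely modeled on the PR's `copy_cqes()` and `cq_advance()`. The acquire load of `tail` is this sketch's addition, mirroring liburing's acquire load of the CQ tail, so that CQE contents posted by the kernel are visible before they are read.

```zig
const std = @import("std");

// Hypothetical, simplified CQE with two of the real fields.
const Cqe = struct {
    user_data: u64,
    res: i32,
};

// Hypothetical view of the shared CQ ring. In a real ring, head, tail and
// cqes live in memory mmap'ed from the kernel.
const CqRing = struct {
    head: *u32, // written only by the application
    tail: *u32, // written only by the kernel
    mask: u32, // entries - 1; the entry count is a power of two
    cqes: []const Cqe,

    /// Copies up to `out.len` ready CQEs into `out` and returns how many were copied.
    fn copyCqes(self: CqRing, out: []Cqe) u32 {
        // Plain read: only the application ever writes head.
        const head = self.head.*;
        // Acquire: pairs with the kernel publishing new CQEs via tail.
        const tail = @atomicLoad(u32, self.tail, .acquire);
        const ready = tail -% head;
        const n: u32 = if (ready < out.len) ready else @intCast(out.len);
        var i: u32 = 0;
        while (i < n) : (i += 1) {
            out[i] = self.cqes[(head +% i) & self.mask];
        }
        self.advance(n);
        return n;
    }

    /// Hands `count` consumed CQE slots back to the kernel.
    fn advance(self: CqRing, count: u32) void {
        if (count > 0) {
            // Release: the kernel must not see the new head until the CQEs
            // have been read, or it could overwrite slots still being read.
            @atomicStore(u32, self.head, self.head.* +% count, .release);
        }
    }
};
```

Because the application is the only writer of `head`, `advance` uses a plain read plus an `@atomicStore` rather than an atomic read-modify-write, which is the point made in the review comment.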
If an older kernel fails the `openat` test because of `AT_FDCWD`, then we don't want to skip the `close` test.
This is super important for writing a web server that can take first place in the TechEmpower benchmarks.
Thanks @jorangreef and everyone who helped review. My goal is to do whatever fixups are needed to this today and get it merged into the master branch.
Nice, this is already mergeable. Great work, everyone.
Thanks @andrewrk. Awesome.
As per: lib/libc/musl/arch/mips/bits/syscall.h.in ...and as promised: ziglang#6356 (comment) Thanks @daurnimator again for the help with ziglang#6356.