Event Loop #14996
base: master
Conversation
While trying to integrate the evloop with the experimental shard for RFC 2, I noticed a race condition. I solved it with the requirement that, to resume an IO event with a timeout, we must successfully dequeue the event from both queues (poll descriptor waiters and timers), plus a bias on timers: they always win. Notes:
This pull request has been mentioned on Crystal Forum. There might be relevant details there: https://forum.crystal-lang.org/t/new-event-loop-unix-call-for-reviews-tests/7207/1
I just noticed one difference; I'm not sure if it has any impact on this evloop.
I'm unable to build this on Debian Sid (full log).
FWIW, I can build up to 2821fbb just fine, and I have plenty of available memory while building this branch.
edit: it builds successfully with the `evloop_libevent` fallback.
Sadly the log doesn't tell much, because this is a runtime error in a macro run (to generate Crystal code from ECR).
Can't remember if I set it that high or if it was the default :/ I did some debugging of the macros if it helps at all: new ev, evloop_libevent. edit: here's the diff (though it'd be easier to view with Meld)
The large number of fds is the problem. The new evloop is trying to virtually allocate ~40GB of memory, which the system refuses. What's weird is that this is only reserved memory, not allocated memory 🤔 It's also working on macOS despite the hard limit being infinite (capped to Int32::MAX). What's the soft limit on your system?
@GeopJr I made a change to select the soft limit instead of the hard limit. That should work around the mmap issue, until we improve the arena to allocate individual blocks dynamically instead of a single region at once.
Can confirm that it compiles successfully, thanks! (Can also confirm that I didn't face any issues with the new ev + GTK)
While integrating with the execution_context shard I identified a race condition.
Keeps information about the event that a fiber is waiting on; it can be a time event and/or an IO event.
Keeps the waiting reader and writer events for an IO. The events themselves keep the information about the event and the associated fiber.
A simple, unoptimized data structure to keep a list of timed events (IO timeout, sleep, or select timeout).
The foundation for the system-specific epoll and kqueue event loops.
Specific to Linux and Android. It might work on Solaris too, through its Linux compatibility layer.
For BSD and Darwin.
This is now only required for the libevent event loop and the wasi pseudo event loop.
Also includes the `:evloop_libevent` flag to fall back to libevent.
We can merely return the dequeued variable, which is the intent of the method: to report whether the timer was dequeued (and is thus owned) or not.
While trying the evloop with execution contexts on FreeBSD I noticed an issue. Now this is working, on FreeBSD at least.
We got a report from LavinMQ that an evloop run can crash. There's a race condition to understand: it is likely that T1 gets events or a timer for an fd while another thread closes it in parallel.
Fixes race conditions in the evloop when another thread closes the fd, and thus frees the poll descriptor (waitlists):
- After waiting for events and processing them: processing events and timers would fail to get the index and raise an exception.
- After trying to read or write but before we could actually get the poll descriptor: we'd fail to get the index and raise an exception.
- After trying to read/write and after we allocated the poll descriptor, but before we could wait: we'd be adding to a freed poll descriptor.
- Resetting the arena index on IO (`__evloop_data`) allowed a parallel wait to re-allocate the poll descriptor (despite the IO being closed).
- ...

Preventing parallel accesses to an arena object may be a bit drastic, but at least we know for sure that we don't have any parallelism issue. It also allows dropping the lock on the poll descriptor as well as on each waiting list: we don't need them anymore. A disadvantage is that parallel read/write/close and evloop process event/timeout may conflict... though they already competed for the three aforementioned locks.
The last issue was fixed in 6985080. Please see the commit description for details. The patch looks more complex than it really is: it's mostly indentation changes, because all the public arena methods now yield instead of returning a value, plus the locks got removed in `Waiters` and `PollDescriptor`.
Avoids an issue with the timecop shard that overrides `Time.monotonic`.
We must cast to the actual LibEvent types because the type signatures return the abstract `Crystal::EventLoop` interface, which doesn't implement the necessary methods.
@straight-shoota the check format CI action fails, but I can't reproduce locally (neither with Crystal 1.14.0 nor with a compiler from this branch) 😕
The failure happens on the
Almost identical to #14959, but with a cleaner history and more documentation, which led me to identify issues in the arena.
Overall design
The logic of the event loop doesn't change much from libevent: we try to execute an operation (e.g. read, write, connect, ...) on nonblocking file descriptors; if the operation would block (EAGAIN) we create an event that references the operation along with the current fiber; we eventually rely on the polling system (epoll or kqueue) to report when an fd is ready, which will dequeue a pending event and resume its associated fiber (one at a time).
Unlike libevent, which adds and removes the fd to and from the polling system every time we'd block, this event loop adds it once and only removes it when closing the fd. The theory is that being notified of readiness (once, thanks to edge-triggered mode) is less expensive than constantly modifying the polling system. In practice, in a purely IO-bound benchmark with long-running sockets, we noticed up to a 20% performance improvement. Actual applications should see smaller improvements.
Implementation details
Unlike the previous attempts to integrate epoll and kqueue directly, which required a global events object and a global lock to protect it, this PR only needs fine-grained locks on each IO object and operation (read, write) to add the events to the waiting lists. In practice the locks should never be contended (unless you share a fd for the same read or write operation across multiple fibers).
Caveat: timers are still global to the event loop, and we need a global lock to protect them. This means that IO with timeouts will still see contention. Improving the timers data structure will come later (e.g. lock-free, more precise, faster operations).
To avoid keeping pointers to the IO object that could prevent the GC from collecting lost IO objects, this PR introduces "poll descriptor" objects (the name comes from Go's netpoll) that keep the lists of readers and writers and don't point back to the IO object. The GC collecting an IO object is fine: the finalizer will close the fd and tell the event loop to clean up the associated poll descriptor (so we can safely reuse the fd).
To avoid pushing raw pointers into the kernel data structures, to quickly retrieve the poll descriptor from a mere fd, and to avoid programming errors that would segfault the program, this PR introduces a generational arena to store the poll descriptors, so we only push an index into the polling system. Another benefit is that we can reuse the existing allocation when a fd is reused. If we try to retrieve an outdated index (the allocation was freed or reallocated), the arena will raise an explicit exception.
The poll descriptors associate a fd to an event loop instance, so we can still have multiple event loops per process, yet make sure that a fd is only ever in one event loop. When a fd would block on another event loop instance, the fd is transferred automatically (i.e. removed from the old one and added to the new one). The benefits are numerous: this avoids having multiple event loops notified at the same time; it avoids having to close/remove the fd from each event loop instance; and it avoids cross event loop enqueues, which are much slower than local enqueues in RFC 2.
A limitation is that trying to move a fd from one evloop to another while there are pending waiters will raise an exception. We can't move timeout events along with the fd from one event loop instance to another, and doing so would also break the "always local enqueues" benefit.
Most applications shouldn't notice any impact from this design choice, since a fd is usually not shared across fibers (concurrency issues), except maybe a server socket with multiple accepting fibers? In that case you'll need to make sure the fibers are on the same thread (`preview_mt`) or the same context (RFC 2).
Availability
We may want to have a compile-time flag to enable the new event loop before we merge? For the time being this PR uses the shiny new evloop by default (to have CI runs) and introduces the `:evloop_libevent` flag to fall back to libevent.
TODO
A couple of changes before merge (but not blocking review):
- `-Devloop=libevent` to return to libevent (in case of regressions or to test/benchmark);
- `-Devloop=epoll` to use epoll (e.g. Solaris);
- `-Devloop=kqueue` to use kqueue (on *BSD);
- `epoll_wait` or `kevent` with infinite timeout;
- the `#try_run?` and `#try_lock?` methods that are no longer needed.
REGRESSIONS/ISSUES
- Timers are noticeably slower than libevent (especially `Fiber.yield`): we should consider a minheap (4-heap) or a skiplist;
- DragonFlyBSD: running `std_spec` is eons slower than libevent; it regularly hangs on evloop.run until the stack pool collector timeout kicks in (i.e. loops on 5s pauses);
- OpenBSD: running `std_spec` is noticeably slower (4:15 minutes) compared to libevent (1:16 minutes); it appears that the main fiber keeps re-enqueueing itself from the evloop run (10us on each run);
- NetBSD: the evloop doesn't work, with `kevent` returning ENOENT for the signal loop `fd` and EINVAL when trying to set an `EVFILT_TIMER`.

For the reasons above, the kqueue evloop is disabled by default on DragonFly(BSD), OpenBSD and NetBSD.