WIP: Win32 event loop #11776
One thing I've noticed can create invalid fibers is if a fiber is enqueued that has already run to completion. Are you properly cleaning up all pending events?
Crystal::Scheduler.reschedule

Crystal::EventLoop.dequeue(timeout_event)
Won't run_once already dequeue? If not, it seems weird that it dequeues sometimes, but not always.
run_once dequeues when a timeout happens. This operation handles the case when there's no timeout, i.e. the operation completes in time and the fiber resumes. Then we need to dequeue the timeout event ourselves.
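The two dequeue paths described above can be sketched with a tiny stand-alone timer queue. This is only an illustration: `TimeoutEvent`, `TimeoutQueue`, and `expire` are hypothetical names, not the types or methods from this PR. The loop removes events that expired, and the waiter removes its own event when the operation finished in time.

```crystal
# Hypothetical stand-ins for illustration; these are not the types from the PR.
class TimeoutEvent
  getter wake_at : Time::Span
  getter label : String

  def initialize(@wake_at : Time::Span, @label : String)
  end
end

class TimeoutQueue
  def initialize
    @events = [] of TimeoutEvent
  end

  def enqueue(event : TimeoutEvent) : Nil
    @events << event
  end

  # Called by the waiter itself when its operation finished in time.
  def dequeue(event : TimeoutEvent) : Nil
    @events.delete(event)
  end

  # Called by the loop: removes and returns every event whose deadline passed.
  def expire(now : Time::Span) : Array(TimeoutEvent)
    expired, @events = @events.partition { |event| event.wake_at <= now }
    expired
  end

  def size : Int32
    @events.size
  end
end

queue = TimeoutQueue.new
in_time = TimeoutEvent.new(1.seconds, "operation completes in time")
too_slow = TimeoutEvent.new(2.seconds, "operation times out")
queue.enqueue(in_time)
queue.enqueue(too_slow)

# The first operation completed before its deadline, so its waiter
# removes the timeout event. Otherwise it would fire later for a fiber
# that has already resumed.
queue.dequeue(in_time)

expired = queue.expire(3.seconds)
```

If the explicit `dequeue` call is skipped, `expire` returns both events, which is the kind of stale wake-up the thread is worried about.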
overlapped_entries = uninitialized LibC::OVERLAPPED_ENTRY[1]

if timeout > UInt64::MAX
  timeout = LibC::INFINITE
This is probably not relevant right now, as this is the single-threaded version of Crystal (right?), but at some point this should probably be clamped downwards to a short duration rather than to infinity. Otherwise, if a fiber is woken up from waiting on a channel and is assigned to a thread that is in an infinite wait, that thread won't process the woken fiber until the completion wait returns. Alternatively, there could be some way to wake a specific waiter. When I looked up how to do Thread#notify on win32, I saw a way to break out of the completion wait, but only for an arbitrary waiter, not for a specific thread. But perhaps there is some better way.
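A minimal sketch of the clamping idea from this comment, assuming a hypothetical upper bound `MAX_WAIT` and a hypothetical helper `wait_timeout_ms`. Neither name exists in the PR, and the 100 ms bound is an arbitrary value chosen for illustration.

```crystal
# Hypothetical upper bound for a single completion wait.
MAX_WAIT = 100.milliseconds

# Converts an optional timeout into a millisecond count that never
# exceeds max_wait, so a waiting thread wakes up periodically instead
# of blocking forever.
def wait_timeout_ms(timeout : Time::Span?, max_wait : Time::Span = MAX_WAIT) : UInt32
  span = timeout || max_wait
  span = max_wait if span > max_wait
  span = Time::Span.zero if span < Time::Span.zero
  span.total_milliseconds.to_u32
end

no_timeout = wait_timeout_ms(nil)
long_timeout = wait_timeout_ms(10.seconds)
short_timeout = wait_timeout_ms(20.milliseconds)
```

The trade-off is extra wake-ups while idle. Waking a specific waiter, as the comment suggests, would avoid that but needs a different mechanism.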
end

# Frees the event
def free : Nil
  Crystal::EventLoop.dequeue(self)
end

def delete
  free
And here is another dequeue. From where is this called?
That's called from Fiber#cancel_timeout, which is used for the select implementation.
I could get all the specs passing by allocating
diff --git a/src/io/overlapped.cr b/src/io/overlapped.cr
index 0d0ea2cc0..da4fb6c83 100644
--- a/src/io/overlapped.cr
+++ b/src/io/overlapped.cr
@@ -81,7 +81,6 @@ module IO::Overlapped
false
end
- @[Extern]
struct OverlappedOperation
@overlapped : LibC::WSAOVERLAPPED
@fiber : Void*
@@ -98,12 +97,12 @@ module IO::Overlapped
def create_operation
overlapped = LibC::WSAOVERLAPPED.new
- OverlappedOperation.new(overlapped, Fiber.current)
+ Box.new(OverlappedOperation.new(overlapped, Fiber.current))
end
def get_overlapped_result(socket, operation)
flags = 0_u32
- result = LibC.WSAGetOverlappedResult(socket, pointerof(operation).as(LibC::OVERLAPPED*), out bytes, false, pointerof(flags))
+ result = LibC.WSAGetOverlappedResult(socket, pointerof(operation.object.@overlapped), out bytes, false, pointerof(flags))
if result.zero?
error = WinError.wsa_value
yield error
@@ -132,7 +131,7 @@ module IO::Overlapped
def overlapped_operation(socket, method, timeout, connreset_is_error = true)
operation = create_operation
- result = yield pointerof(operation).as(LibC::OVERLAPPED*)
+ result = yield pointerof(operation.object.@overlapped)
if result == LibC::SOCKET_ERROR
error = WinError.wsa_value
@@ -157,9 +156,9 @@ module IO::Overlapped
def overlapped_connect(socket, method)
operation = create_operation
- yield pointerof(operation).as(LibC::OVERLAPPED*)
+ yield pointerof(operation.object.@overlapped)
- schedule_overlapped(read_timeout || 1.seconds)
+ schedule_overlapped(read_timeout || 5.seconds)
get_overlapped_result(socket, operation) do |error|
case error
@@ -177,7 +176,7 @@ module IO::Overlapped
def overlapped_accept(socket, method)
operation = create_operation
- yield pointerof(operation).as(LibC::OVERLAPPED*)
+ yield pointerof(operation.object.@overlapped)
unless schedule_overlapped(read_timeout)
raise IO::TimeoutError.new("accept timed out")
Though I don't know if making
Now if I run the specs with
For some reason the stack trace doesn't show here, hence the |
I just wanted to share some insights that I gained during debugging this PR.

LibC::OVERLAPPED

On the Heap

On the Stack

Even allocating If the current fiber is leaving

Proposal

See the latest commit on my branch win32-event_loop, which is based on feature/win32-event_loop-wip with a rebase to master. I introduced a global hash
For multi threaded environments

Dead fibers

After these changes, I had a lot of problems to run

Hopefully my insights will be helpful in moving the event implementation on Windows forward.
@wonderix Good to see someone looking into this. However, the whole point of So Dead fibers being queued up sounds like the runtime is trying to wake up a fiber that has already been woken up. That absolutely must never happen. My guess is timeouts (or sleeps?). Are perhaps not all fibers removed from the event loop timeout queue when they are activated? If not, they would get woken up again once the timeout hits, and that would trigger all the described symptoms. Figuring out where this happens can be really challenging though.
The normal case looks like this: But I'm quite sure that That's the point where things get complicated. In this case the In the meantime, I realized that my implementation is also not safe at this point. Perhaps it's necessary to allocate The dead fiber problem is probably caused by my rebase on master. I will try to describe this in more detail. The fiber is started in channel_spec.cr with a timeout condition
I had a look at the Ruby code. They are cancelling requests using After this, I used a different approach, which is closer to the original PR. I'm now quite happy with this approach. The only thing that needs to be fixed is the The changes are available as the last commit on win32-event-loop-2. Unfortunately, I'm not able to run all specs on Windows.
> .\bin\crystal build --verbose spec/win32_std_spec.cr
Error: while requiring "./std/llvm/aarch64_spec.cr"
The 2 specs mentioned in the comments are working fine. @straight-shoota, how should we proceed? Would you like to cherry-pick my changes? Should I open a PR including your changes?
That's awesome! Great to see this basically resolved. We might still need some enhancements, but now we have a working condition to start from. At this point I'd prefer to open a fresh PR.
Just to avoid confusion, it's not clear who should open the PR. (Or maybe it is? Does the "I'd" mean you are going to do it, straight-shoota?)
Good point 😆
That's absolutely OK for me.
This patch adds an implementation of the event loop for Windows.
It seems to be working in general, but there is some bug which I haven't figured out yet. I've had this branch sitting around for a long time but have not been able to finalize it. I'm posting it here to maybe get a set of fresh eyes on it. Maybe someone else would be able to spot what's going on.
More detailed history and some debug output is available in the branch on my fork: https://github.com/straight-shoota/crystal/tree/feature/win32-event_loop-wip
This is the example debug output from tcp_socket_spec. Execution stops while printing a fiber's inspect. The reason seems to be that the fiber value is invalid. It is boxed in OverlappedOperation#@fiber for passing around to the Win32 API. For some reason, it returns an invalid value.