-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix hang in KqueueExecutorScheduler
#65
Conversation
if (!noCallbacks) { | ||
val timeoutSpec = | ||
if (timeoutIsInfinite || timeoutIsZero) null | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What happens if there are no callbacks, but there is 0 < timeout < infinity
? In that case poll
should still sleep for the requested time. This is not necessary for correctness, but it is important for performance (if we don't sleep, we are busy-looping). That is why the test suite currently doesn't catch this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the timely review and tag-team development.
re: 0 < timeout < infinity
& busy looping
Isn't prevention of busy looping the responsibility of the caller? I do not know
that scheduler code. What is the usually timeout? Zero, something else.
The hang that I saw was large for a scheduler. I know I let it run two or
three minutes whilst I did code fixes in another window. In a debug
branch I can put some code in to see if it is 0 (which can mean either
return immediately (do not wait), or wait-forever) or a largish number.
I thought that poll() was supposed to minimize wait and return to
the caller, so the caller could schedule something else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, perhaps the documented contract for poll
needs to be clarified.
I thought that poll() was supposed to minimize wait and return to
the caller, so the caller could schedule something else.
When the caller invokes poll(...)
with a timeout of x
seconds, it means that the caller has no work scheduled for x
seconds and thus it is fully delegating to poll(...)
. If poll
returns before timeout
(and no new work was scheduled due to an I/O event), then the caller will simply invoke poll(...)
again for the remaining time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The event-loop algorithm is implemented here.
https://github.com/typelevel/cats-effect/blob/7ca03db50342773a79a01ecf137d953408ac6b1d/core/native/src/main/scala/cats/effect/unsafe/PollingExecutorScheduler.scala#L77-L123
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That info helps. The poll is being called with INFINITE timeout, 2 events change, and 1 callback.
There was a open & close which came and went before poll() got called. I suspect the math
is not working out and an extra event-change is sneaking in.
I am going to move this to WIP, do more tracing, and read the URL you posted.
I'll post the provoking code in the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You are right. I believe your general approach is correct, we just need to adjust the conditional to account for the case when 0 < timeout < infinity
in which case it should proceed with the kevent64()
call.
I was able to replicate this in a very simple test. Give it a try, and if it reproduces the issue for you as well, I propose we add it to the test suite. test("immediately closing a socket does not hang") {
IOSocketChannel.open.use_
} |
Skipping over for the moment, but not ignoring. Please see next reply.
Thank you for the succinct test. I added it to I had gotten 90% of the way towards submitting the |
To synchronize: Should I proceed with this WIP and add the code which you suggested, still as a WIP? Should I put this on the backburner, or out to feed the dog because another approach Please advise. I am invested in overall progress, not any one particular WIP/PR. Given what you have said, I think I can work around this for the "IPv6 initialization" work I will read the "event-loop algorithm" URL, hopefully this weekend. |
This is a legitimate issue, I'd like to fix it and get a patch out. If you don't mind I'll take the steering wheel on this one since it's interacting with the event loop. Do you mind if I push to your branch or shall I open a new PR? |
Btw thanks for manually testing against the matrix, that was really helpful. Always good to have a test in place. |
Understood & agreed, you have the conn (phew!). Time for the A team. I will end this recursive edit and return to "IPv6 initialization".
Since this is WIP and scratch work, I will stop pushing to it and hand it over to you. We probably concur that the final submission probably wants to be a "clean" PR. |
A "grasshopper" question, if you know of the top of your head. Not worth you spending time else-wise. Where do I find the documentation for ".use" and ".use_". I have been searching the "cats effect 3.0" and "fs2" You had given me the location for ".async_". I stumbled across it today in plain sight and bold as brass in |
I added the test for ServerSocket because it causes a |
They are methods on
No problem, great addition!
Eh, I don't really care. I think it's good to preserve work-product so we can retrace our steps more easily. |
re: kernel/Resource.scala URLs Thank you. That is what I needed to get unstuck. Helps a number of pieces |
KqueueExecutorScheduler
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
😏
Arman, Thank you for sorting this out, and so quickly. |
Thank you for catching this bug and your initial investigation! I'm glad we could get a patch out. |
Fix #64
KqueueExecutorScheduler#poll no longer hangs when there is no work to do.
The fix was actually to check for the existence of callbacks, rather than of
new events.
Manually tested on macOS M1, Monterrey 12.6 with test-cond-in-development
for Issue #52. I will be exercising this fix as I continue to work on Issue #52.