-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Setting event priorities to work around level based events #11829
Comments
cc @antoniovicente to advise. My general preference would be to not make things more complicated for the rest of the system because windows doesn't support edge triggered, so would love to see if we can keep this within the windows code until windows does support edge triggered. |
I'm wondering is the actual problem is ConnectionImpl::onWriteReady() pushing back the delayed_close_timer_ on every out event processed even in cases where onWriteReady didn't make forward progress. I'm not sure that the addition of priorities would help and I'm worried that it would open up a big can of worms. I guess we could use a high priority for all timers so timers always run before fd events, but that's likely to expose some bugs in libevent library internals since I think that the event loop does not go back to invoke high priority events added by lower priority events in the current event loop iteration. |
Makes sense to me
That is my understanding of the problem. You can also replicate the issue by setting the event as level on Unix
I did a quick and dirty implementation using priorities and the particular scenario worked. I did not go through seeing how this would work throughout Envoy as I felt this was a "risky" change and I wanted feedback before implementing it. Would any particular tests or validation make you feel more comfortable in that change? One question that I still have is the importance of the scenario here. Do customers or major flows rely on this feature to work? |
See if this patch helps: $ git diff |
Thanks for the suggestion, I just tried it out and it does not seem to fix the issue. |
I think the use of priorities would serve as a work around for this specific test, but not address the underlying issue. Real fd Write events which are delivered in LEVEL mode past the point of the final flush refreshing the delayed close timer forever, resulting in a leak of server connections if delayed write timeouts are configured. Not resetting the timer in cases where doWrite did no useful work should help. Changing the fd registration to stop listening for Write events after flush in DelayedCloseState::CloseAfterFlushAndWait could also help and be more performant in LEVEL mode. But fd LEVEL mode performance may never be amazing when handling large numbers of connections. I worry that some future possible changes may improve edge-trigger performance at the expense of level-trigger performance (example draft change: master...antoniovicente:optimize_read_disable ). Is edge-trigger support a possible long term option?
The introduction of priorities would add another layer of complexity to Envoy's use of libevent. I haven't looked at libevent closely enough to describe all implications of priorities or the scheduling order of high-priority callbacks vs low-priority callbacks vs fds vs timers. I worry about the potential for a low-priority callback scheduling a high-priority callback resulting in the high-priority callback not executing until the next iteration of the event loop even though execution in the current iteration is necessary. |
After thinking about this some more, I realized that there is an inconsistency in behavior of real timers and simulated timers. SimulatedTimeSystem::setMonotonicTime adds expired timers directly to the work list, so they end up executing before fd events in the event loop. Real timers execute after fd events. I think that the PR that made this test start misbehaving on Windows was the fix to some edge case behavior of simulated time in #11563; after that PR both timer and fd orderings would break in the presence of spurious Write events. Ideally simulated time timers would execute after fd callbacks. I think I can achieve that by delaying the computation of which timers have expired until after fd events are added to the work list. This can be done by scheduling a current iteration callback to add expired timers instead of adding expired timers to the event loop directly in SimulatedTimeSystem::setMonotonicTime |
Thanks for the answers, all of them make sense to me.
My understanding is that it will be in the future but we would have to work with level based events for some time.
This part of the code (before then patch) also does not work due to the level based triggers. In this case the timeout event gets activated before the read event. As a result
I could try this out and report back if it fixes the original issue. |
It would be good to know what change made the test start failing. I'm guessing it was #11563 which fixed an issue with simulated timer re-scheduling. |
…ytes from the output buffer. (#11833) Commit Message: connection: Do not reset delayed closed timer if doWrite consumes 0 bytes from the output buffer. Additional Description: Only reset the delayed close timer if the write attempt made progress. This works around spurious fd Write events which are delivered to a connection even after it manages to fully drain the output buffer and should be waiting for client FIN or the delay close timer to expire. This is known to happen when listening for level events instead of edge-trigger fd events. Risk Level: medium, changes to timeout behavior. Testing: unit Docs Changes: n/a Release Notes: n/a Fixes #11829 Signed-off-by: Antonio Vicente <avd@google.com>
…ytes from the output buffer. (envoyproxy#11833) Commit Message: connection: Do not reset delayed closed timer if doWrite consumes 0 bytes from the output buffer. Additional Description: Only reset the delayed close timer if the write attempt made progress. This works around spurious fd Write events which are delivered to a connection even after it manages to fully drain the output buffer and should be waiting for client FIN or the delay close timer to expire. This is known to happen when listening for level events instead of edge-trigger fd events. Risk Level: medium, changes to timeout behavior. Testing: unit Docs Changes: n/a Release Notes: n/a Fixes envoyproxy#11829 Signed-off-by: Antonio Vicente <avd@google.com> Signed-off-by: scheler <santosh.cheler@appdynamics.com>
Description
Currently Windows does not support edge based events. As a result we rely on level based events which causes various issues.
For example in testlist //test/common/network:connection_impl_test
FlushWriteAndDelayCloseTimerTriggerTest
the scenario that depends on setting asetDelayedCloseTimeout
on the server.This event on Windows is added to the queue in Windows but it gets buried by the Level write events. Some logs that indicate this behavior are here:
This causes the test to hang and exposes a limitation for envoy on windows, to set
Scheduled Callbacks
andTimer
events.Proposed Solution: Set event priorities
We could utilize libevent
event_base_priority_init
to create more granular priorities on jobs.Create an enumeration with priorities and use it when we create new
SchedulableCallbackImpl
This behavior could be limited only to Windows (or to both if we would think it has wider value)
Questions
Timers
andScheduled Callbacks
?@nigriMSFT @sunjayBhatia @wrowe
The text was updated successfully, but these errors were encountered: