`HashedWheelTimer` startup crash on .NET 6+ #7174
More issues with the `PeriodicTimer` - this time it's a bit of a spooky heisenbug. Added a spec that can reproduce it but it must be run continuously in order to catch it.
I believe our problems were related to the combination of `Interlocked` and `volatile` both trying to access the same field across multiple cores - but I don't think I can prove it other than to say that since I made this change to the `HashedWheelTimerScheduler` I've run this test continuously for over 30 minutes (it runs about once a second) without any errors. Previously, it would fail in about 5 minutes.
I'll keep the thing running for a while longer and report back if the problem isn't fixed, but I think this should resolve it.
Took a while but was able to make this crash even without the edit: came up with a safer, less complicated solution
Fixed this change by doing only one `volatile` read at the start of the `if` statement - been running my test case at a rate of once per second for about 2 hours and haven't seen the problem occur since.
// </copyright>
// -----------------------------------------------------------------------

#nullable enable
Enabled `nullable` for just this test - on my quest to ensure that we start doing that generally everywhere.
@@ -147,25 +147,27 @@ private static int NormalizeTicksPerWheel(int ticksPerWheel)

private void Start()
{
- if (_workerState == WORKER_STATE_STARTED)
+ // only read the worker state once so it can't be a moving target for else-branch
+ var workerStateRead = _workerState;
This is the key fix - cache the value once so we're not doing a `volatile` read on each branch of the `if`..`else` - that's what led to the problems.
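To illustrate the pattern described above, here is a minimal sketch, not the actual Akka.NET source. Only `_workerState`, `WORKER_STATE_STARTED`, `workerStateRead` and `Start()` come from the diff; the class name `WorkerStarter`, the other state constants and their values, `LaunchCount`, and `Shutdown()` are hypothetical. One way repeated reads could plausibly go wrong: a thread sees "not started" on the first check, another thread flips the state to started, and the first thread then matches none of the later checks and lands in the final `else`. With a single read, the worst case is a lost compare-and-swap, which is harmless.

```csharp
using System;
using System.Threading;

// Hypothetical stand-in for the scheduler's start-up logic; not the Akka.NET source.
public sealed class WorkerStarter
{
    // State values are assumptions made for this sketch.
    public const int WORKER_STATE_INIT = 0;
    public const int WORKER_STATE_STARTED = 1;
    public const int WORKER_STATE_SHUTDOWN = 2;

    private volatile int _workerState = WORKER_STATE_INIT;
    private int _launchCount;

    // Number of times the (placeholder) worker was launched.
    public int LaunchCount => Volatile.Read(ref _launchCount);

    public void Start()
    {
        // Single volatile read: every branch below compares against the same snapshot.
        var workerStateRead = _workerState;
        if (workerStateRead == WORKER_STATE_STARTED)
        {
            // already running, nothing to do
        }
        else if (workerStateRead == WORKER_STATE_INIT)
        {
            // Only the thread that wins the compare-and-swap launches the worker.
            if (Interlocked.CompareExchange(ref _workerState, WORKER_STATE_STARTED, WORKER_STATE_INIT)
                == WORKER_STATE_INIT)
            {
                Interlocked.Increment(ref _launchCount); // placeholder for starting the timer loop
            }
        }
        else if (workerStateRead == WORKER_STATE_SHUTDOWN)
        {
            throw new InvalidOperationException("cannot start after shutdown");
        }
        else
        {
            throw new InvalidOperationException($"Worker in invalid state: {workerStateRead}");
        }
    }

    public void Shutdown() => _workerState = WORKER_STATE_SHUTDOWN;
}
```

Calling `Start()` from many threads at once (for example via `Parallel.For`) should leave `LaunchCount` at 1 and never reach the "invalid state" branch, since a thread that loses the race simply skips the launch.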
LGTM
The solution is good for merge now
LGTM
LGTM
Can't check the box from mobile but looks good
…On Fri, Apr 26, 2024, 5:50 PM Gregorius Soedharmo ***@***.***> wrote:
***@***.**** approved this pull request.
LGTM
------------------------------
In src/core/Akka/Actor/Scheduler/HashedWheelTimerScheduler.cs
<#7174 (comment)>:
> @@ -147,25 +147,27 @@ private static int NormalizeTicksPerWheel(int ticksPerWheel)
private void Start()
{
- if (_workerState == WORKER_STATE_STARTED)
+ // only read the worker state once so it can't be a moving target for else-branch
+ var workerStateRead = _workerState;
LGTM
this was wrong, btw - looks like it was the concurrent read / write access to volatile causing us to jump into the
Not related to my changes - I have an open PR for investigating this one at home.
Port akkadotnet#7174 to .NET Standard 2.0
Changes
More issues with the `PeriodicTimer` - this time it's a bit of a spooky heisenbug. Added a spec that can reproduce it but it must be run continuously in order to catch it.
Originally reproduced at petabridge/TurboMqtt#55