fix: race condition between heartbeat and releasing lock #35
base: master
Releasing of a lock could fail (putItem would fail the condition) when there was a heartbeat running at the same time. The release of a lock now waits until the heartbeat is finished.
Thank you @bfraterman-tkhsecurity. What you described does look like a valid race condition for the FailOpen lock, which can cause longer wait times (maximum …). There is no problem with the FailClosed lock because there are no heartbeats on those. I think what we need is a semaphore. Now that you've pointed out the race condition, it is insufficient for release to wait for the heartbeat: release and heartbeat must only happen one at a time. Failure case interleaving with release only waiting for heartbeat to start:
Basically, the … The failure scenario that you describe won't happen in this PR. Node is single-threaded, so no two pieces of code will run simultaneously. But whenever a … So the problem in the original code was:
In the PR this is fixed by the following:
But maybe the code would become more readable when using something like https://www.npmjs.com/package/async-mutex. What do you think? And do you want me to refactor the code so that this can be tested in a unit test? I think this will grow the PR considerably, since all heartbeat and releasing code will have to be split up and moved around.
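To make the mutual-exclusion idea from this exchange concrete, here is a minimal sketch of a promise-chain mutex that lets heartbeat and release run only one at a time. It is hand-rolled for illustration and is not the code in this PR; the `Mutex` class, `demoMutex`, and the fake operations are hypothetical names. The async-mutex package linked above provides a `Mutex` with a comparable `runExclusive` method, which would likely replace the hand-rolled class in practice.

```javascript
// Hypothetical sketch: serialize heartbeat and release through one mutex.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

class Mutex {
  constructor() {
    // Tail of the queue: every new task is chained behind it.
    this.tail = Promise.resolve();
  }

  runExclusive(task) {
    const run = this.tail.then(() => task());
    // Keep the chain usable even if a task rejects.
    this.tail = run.catch(() => {});
    return run;
  }
}

async function demoMutex() {
  const mutex = new Mutex();
  const events = [];

  // Stand-in for a DynamoDB call: records when it starts and ends.
  const fakeCall = (name) => async () => {
    events.push(`${name}:start`);
    await delay(5);
    events.push(`${name}:end`);
  };

  // Both operations are requested at the same time, but run one after another.
  await Promise.all([
    mutex.runExclusive(fakeCall('heartbeat')),
    mutex.runExclusive(fakeCall('release')),
  ]);
  return events;
}
```

With this shape, a release requested during a heartbeat starts only after the heartbeat has finished, and a heartbeat requested during a release waits in the same queue.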
Now that AWS is deprecating the V2 SDK, we're looking at migrating to the V3 SDK. I wonder if/how this PR can be merged? Otherwise we'll have to keep using our fork for our project.
I got distracted and haven't come back to this repo since my last comment. With a migration to v3, it probably makes sense to use your fork. I don't intend to spend more time on this module in the near future, so it may be worth publishing your v3 migrated fork for the community. Thank you again for the PR. Cheers
Releasing of a lock could fail (putItem would fail the condition) when
there was a heartbeat running at the same time.
When the two calls to DynamoDB happen at the same time, the GUID used by the heartbeat would be put into DynamoDB, while the previous GUID would be used to release the lock. Releasing of the lock would therefore fail. I've only tested this for FailOpen locks, where there were longer waits because of this.
Perhaps with FailClosed locks the problem would be even worse.
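The race described above can be reproduced with a small simulation. This is a sketch under stated assumptions, not the library's code: the in-memory `table`, `conditionalPut`, `conditionalDelete`, and the GUID strings are hypothetical stand-ins for the conditional DynamoDB calls.

```javascript
// Hypothetical in-memory stand-in for the DynamoDB lock table.
const table = new Map();
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Simulated conditional write: succeeds only if the stored GUID matches.
async function conditionalPut(key, expectedGuid, newGuid, latencyMs) {
  await delay(latencyMs);
  if (table.get(key) !== expectedGuid) throw new Error('ConditionalCheckFailed');
  table.set(key, newGuid);
}

// Simulated conditional delete with the same GUID check.
async function conditionalDelete(key, expectedGuid, latencyMs) {
  await delay(latencyMs);
  if (table.get(key) !== expectedGuid) throw new Error('ConditionalCheckFailed');
  table.delete(key);
}

async function demonstrateRace() {
  table.set('lock', 'guid-1');
  const lock = { key: 'lock', guid: 'guid-1' };

  // Heartbeat writes a fresh GUID; local state is updated only when it returns.
  const heartbeat = conditionalPut(lock.key, lock.guid, 'guid-2', 5).then(() => {
    lock.guid = 'guid-2';
  });

  // Release starts while the heartbeat is in flight, so it still sees 'guid-1'.
  const release = conditionalDelete(lock.key, lock.guid, 10);

  const results = await Promise.allSettled([heartbeat, release]);
  return results.map((r) => r.status);
}
```

In this simulation the heartbeat succeeds and the release is rejected, because the table already holds the new GUID when the release's condition is checked. The lock row therefore stays in the table until it expires.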
The release of a lock now waits until the heartbeat is finished, using a Promise.
Since it relies on very specific timing it's hard to write a unit test for this.
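The approach in this commit message, where release waits for the running heartbeat through a Promise, might look roughly like the sketch below. All names (`createStore`, `Lock`, `heartbeatPromise`, `released`, `demoRelease`) are hypothetical and the store is an in-memory stand-in, so this illustrates the idea and is not the actual diff. It also assumes heartbeats are scheduled one at a time, never overlapping each other.

```javascript
// Hypothetical in-memory store with conditional operations and latency.
const delay = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function createStore() {
  const items = new Map();
  const check = (key, guid) => {
    if (items.get(key) !== guid) throw new Error('ConditionalCheckFailed');
  };
  return {
    items,
    async put(key, expectedGuid, newGuid) {
      await delay(5);
      check(key, expectedGuid);
      items.set(key, newGuid);
    },
    async remove(key, expectedGuid) {
      await delay(5);
      check(key, expectedGuid);
      items.delete(key);
    },
  };
}

let counter = 1; // stand-in for real GUID generation

class Lock {
  constructor(store, key, guid) {
    this.store = store;
    this.key = key;
    this.guid = guid;
    this.released = false;
    this.heartbeatPromise = Promise.resolve(); // no heartbeat in flight yet
  }

  heartbeat() {
    if (this.released) return this.heartbeatPromise; // no heartbeats after release
    const nextGuid = `guid-${++counter}`;
    this.heartbeatPromise = this.store
      .put(this.key, this.guid, nextGuid)
      .then(() => { this.guid = nextGuid; });
    return this.heartbeatPromise;
  }

  async release() {
    this.released = true;                        // block new heartbeats
    await this.heartbeatPromise.catch(() => {}); // wait for the in-flight one
    await this.store.remove(this.key, this.guid); // uses the refreshed GUID
  }
}

async function demoRelease() {
  const store = createStore();
  store.items.set('lock', 'guid-1');
  const lock = new Lock(store, 'lock', 'guid-1');
  lock.heartbeat();     // heartbeat is now in flight
  await lock.release(); // waits for it, then deletes the row
  return store.items.has('lock');
}
```

Because release reads `this.guid` only after the heartbeat has settled, the conditional delete is issued with the GUID that is actually stored, and the `released` flag keeps a new heartbeat from starting in between.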