SqlDistributedLock timeout issue? #5
Comments
@clement911 Thanks for your interest in the library! I have used this technique successfully for relatively long-running operations. I've even seen cases where a bug caused a lock to be held for days (to my dismay!). Obviously, if the database were shut down then you would lose the lock. In your case, the example code looks fine (although note that you can use …). This article lists some of the Azure limitations that could be related:
In this case, it seems like you might be running into the "idle for 30 minutes" limitation, although I'm not sure what the exact definition of "idle" is. The 24-hour limitation also seems potentially problematic because of .NET connection pooling (although maybe the Azure pooling works around this for you under the hood somehow).

One approach would be to try out some of the different connection management options (https://github.com/madelson/DistributedLock#connection-management, introduced in 1.2) and see if the problem is specific to the transaction approach that was the default in 1.1. You might be able to avoid the idle issue by using a lock scoped to an explicit SqlConnection object and launching an async task which pings the connection periodically to prevent it from going idle. The thing to be careful of is to make sure the polling task finishes BEFORE we attempt to dispose the lock, since SqlConnection is not thread-safe. Something like:
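(A sketch only: the namespace, the lock overload taking an existing connection, the ten-minute ping interval, and the long-running-operation delegate are illustrative assumptions.)

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;
using Medallion.Threading.Sql;

static async Task RunWithKeepaliveAsync(string connectionString, Func<Task> doLongRunningOperationAsync)
{
    using (var connection = new SqlConnection(connectionString))
    {
        await connection.OpenAsync();

        var @lock = new SqlDistributedLock("MyLockName", connection);
        using (@lock.Acquire())
        using (var keepaliveCancellation = new CancellationTokenSource())
        {
            // ping periodically so the connection never counts as "idle"
            var keepaliveTask = Task.Run(async () =>
            {
                while (true)
                {
                    await Task.Delay(TimeSpan.FromMinutes(10), keepaliveCancellation.Token);
                    using (var command = connection.CreateCommand())
                    {
                        command.CommandText = "SELECT 1";
                        await command.ExecuteScalarAsync();
                    }
                }
            });

            try
            {
                await doLongRunningOperationAsync();
            }
            finally
            {
                // make sure the polling task finishes BEFORE the lock handle
                // (and its connection) are disposed: SqlConnection is not thread-safe
                keepaliveCancellation.Cancel();
                try { await keepaliveTask; }
                catch (OperationCanceledException) { /* expected on cancellation */ }
            }
        }
    }
}
```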
Please let me know if you try any of these and what you learn. For example, if the polling solution fixes the problem then we could easily add a new Azure-specific connection strategy as a built-in library feature.
You put me on the right track and I found this: This pretty much confirms your theory of the SQL Azure 30-minute idle limit, and they fixed it in Hangfire in much the same way that you suggested. I guess DistributedLock could use the same approach. The trippy thing is this comment:
If the connection is broken, then the lock is released and there is not much that can be done, but I think there should at least be an event raised so that, as a user of the app, I can trace the error. If we wanted to be smarter we could try to re-acquire the lock automatically when it is lost, but I think that's probably a can of worms to open later. What do you think?
@clement911 it's great to have that Hangfire example to confirm the theory. Were you able to try out the workaround code in your codebase to see if it fixed the issue for you (assuming it is reproducible)?

I definitely think it makes sense to support this at least via a connection strategy. I'm more hesitant to support it by default, since running extra queries and background threads adds overhead that, until your use case, I hadn't seen the need for. I think it probably makes sense to start by making this an option, with the possibility of making it the default over time (I'm taking a similar approach with the new ConnectionMultiplexing strategy, which is probably a performance win most of the time). At minimum, locks created with explicit connections or transactions won't be able to do keepalive without violating thread-safety (since presumably the caller is continuing to use their connection as well). FWIW, this is the approach MSFT took for EF's Azure connection resiliency.

An alternative idea would be to turn this behavior on based on detecting Azure connection strings. As someone who hasn't worked with Azure, I'm not sure whether this is possible. Is there any identifying characteristic of Azure connection strings?

As to your comment about wanting to know if the lock is dropped out from under you, I agree that this is a concern and it is something I have thought about in the past. As you say, re-acquisition is pretty risky: we may already have lost the lock, and even if we do successfully re-acquire, there's no guarantee that someone else wasn't holding it in the meantime. Even with pure notification, there are challenges:
No, I haven't tried this workaround yet, but I'd say it is likely to work. You may be right, let's have it as an opt-in strategy for now. Regarding detecting Azure connection strings, I'm not too sure.
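(For what it's worth, one possible heuristic, an assumption rather than a documented guarantee: Azure SQL Database servers are conventionally addressed as `<server>.database.windows.net`, so the connection string's Data Source could be inspected. The helper below is hypothetical.)

```csharp
using System;
using System.Data.SqlClient;

// Heuristic only: Azure SQL Database servers conventionally use the
// *.database.windows.net suffix; custom DNS names would defeat this check.
static bool LooksLikeAzureSql(string connectionString)
{
    var dataSource = new SqlConnectionStringBuilder(connectionString).DataSource ?? string.Empty;
    return dataSource.IndexOf(".database.windows.net", StringComparison.OrdinalIgnoreCase) >= 0;
}
```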
Unfortunately, the StateChange event doesn't do what you'd want it to: it just fires when you call Open() or Close() on the connection object itself (see http://stackoverflow.com/questions/37442983/when-is-dbconnection-statechange-called). Pretty frustrating. I'll try to get a new version out with an option for keepalive. As mentioned, supporting connection monitoring would require a breaking API change and thus has to wait for the next major version (hopefully not too far in the future!). In the meantime, it would be possible to implement a monitoring layer outside the library by passing in a custom SQL connection (similar to the keepalive workaround I showed above).
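(A rough sketch of what such an external layer might look like; this ConnectionMonitor class is hypothetical, not a library API, and the probe interval and callback shape are assumptions.)

```csharp
using System;
using System.Data.SqlClient;
using System.Threading;
using System.Threading.Tasks;

// Probes the connection that owns the lock and invokes a callback if a probe
// fails, signaling that the lock may have been silently lost. The caller must
// dispose the monitor BEFORE disposing the lock, since the probe shares the
// (non-thread-safe) connection.
public sealed class ConnectionMonitor : IDisposable
{
    private readonly SqlConnection connection;
    private readonly Action<Exception> onConnectionLost;
    private readonly CancellationTokenSource cancellation = new CancellationTokenSource();
    private readonly Task monitorTask;

    public ConnectionMonitor(SqlConnection connection, Action<Exception> onConnectionLost, TimeSpan probeInterval)
    {
        this.connection = connection;
        this.onConnectionLost = onConnectionLost;
        this.monitorTask = Task.Run(() => this.MonitorLoopAsync(probeInterval));
    }

    private async Task MonitorLoopAsync(TimeSpan probeInterval)
    {
        while (true)
        {
            try
            {
                await Task.Delay(probeInterval, this.cancellation.Token);
                using (var command = this.connection.CreateCommand())
                {
                    command.CommandText = "SELECT 1"; // doubles as a keepalive
                    await command.ExecuteScalarAsync();
                }
            }
            catch (OperationCanceledException) { return; } // monitor was disposed
            catch (Exception ex)
            {
                this.onConnectionLost(ex); // the lock may no longer be held
                return;
            }
        }
    }

    public void Dispose()
    {
        this.cancellation.Cancel();
        this.monitorTask.Wait(); // stop probing before the connection is used elsewhere
    }
}
```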
Shame about the StateChange event...
@clement911 I've released a new 1.3 version containing a keepalive implementation. This can be used by passing in the new Azure-specific connection strategy option. For the connection monitoring API, I've created a separate issue to track it (#6).
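(Assuming the option is surfaced through the connection-management strategy enum linked earlier, usage would look roughly like this; the exact enum value name here is a guess.)

```csharp
// Illustrative only: the Azure strategy value is assumed, following the
// README's connection-management options.
var @lock = new SqlDistributedLock(
    "MyLockName",
    connectionString,
    SqlDistributedLockConnectionStrategy.Azure);

using (@lock.Acquire())
{
    await DoLongRunningOperationAsync(); // connection kept alive in the background
}
```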
You're a legend! The code looks really good too.
I haven't hit that issue since I deployed to prod, so I guess it works well.
Glad it's working and thanks for letting me know. |
Hi there.
Thank you for creating a cool library!
Good code too...
I use it in my web app with SQL Azure to make sure that certain long-running operations run only one at a time.
It's been working fine, but I just hit a weird case and thought you might know something about it.
I have a block like this:
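(A sketch of the pattern described; the lock name is illustrative, and the connection-string constructor matches the 1.1 behavior discussed below.)

```csharp
var @lock = new SqlDistributedLock("MyOperationLock", connectionString);
using (@lock.Acquire())
{
    await doLongRunningOperationAsync();
}
```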
doLongRunningOperationAsync can take quite a long time, and I had a case in production where two requests managed to both acquire the exclusive lock!
The second one acquired the lock about 45 minutes after the first one had (the operation can take over an hour to complete...).
I use version 1.1.0.0, and as I understand it, when providing a connection string, a new connection and transaction will be created and the transaction will be the owner of the lock.
So it got me thinking: maybe transactions have a timeout after which they are automatically closed? Maybe even the connection? In particular, SQL Azure can suffer transient issues, and it is always a best practice to retry operations since they may fail because of temporary internal Azure issues.
So, long story short, is it OK to use SqlDistributedLock for long-running operations?