DISK IO wait due to millions of update statements #1962
Comments
There were no changes in this area since 1.7.16, so the problem isn't related to the upgrade itself. Theoretically, I also see you've enabled the … option. Can you please tell me more about the millions of updates – how did you diagnose this number, and what's the number of updates per second? Also, how many background job servers do you have, and how many workers do they run?
Thank you for providing the screenshots! First of all, Hangfire is write-heavy, because its primary task is to update background job state data. However, it should cause millions of updates only when you have millions of background jobs. Can you tell me whether it is possible to get a "Query → Executions" breakdown to understand which queries are executed most of the time? Also, I see that the heartbeat of your servers was executed 7 hours ago. Is that due to stale data (e.g. you didn't refresh that page for 7 hours), or does even a refreshed page show 7-hour-old heartbeats?
Thank you a lot for your help and all the screenshots. The new query you are talking about appeared due to the configuration changes you've mentioned in the first comment, and the fetching implementation switch was caused by enabling the following options:

SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
QueuePollInterval = TimeSpan.Zero,

When both of them are enabled, the new fetching implementation is used. However, I haven't seen before that in some cases it can cause billions of queries; this is a bug I'm not aware of, and I can't even tell you yet what happened. I will investigate this issue and will let you know once it's resolved, and I would be happy if you are able to answer some more questions.

As a workaround, you can roll back to the previous polling behaviour:

SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
QueuePollInterval = TimeSpan.FromSeconds(queuePollInterval),

Regarding the investigation, I see three symptoms – waits caused by the new query, high execution counts of the new query, and a much higher transaction rate. Waits and PAGE IO are likely caused by the … I'd also like to see the bigger picture, because some queries can affect other ones. Of course those queries can't increase the number of executions, but anyway. Could you answer the following questions?
Ah, thank you for spotting the "Requeue Failed Hangfire Jobs" query. That one will need to be deprecated in our system – it was an update statement used to re-queue failed jobs. Since then, we have developed a JobFilter to do that, so I will contact the right developers to look at removing it.

SQL: 4619364175 – similar, for deleting duplicate jobs. Also looks like one I will follow up on with developers.

SQL: 4863681399 – hmm, strange. It's a simple SELECT statement from a Province table. Will need to investigate on my side.

Instance type is x1e.2xlarge.

Okay, thanks Sergey. We're going to try removing UsePageLocksOnDequeue and set SlidingInvisibilityTimeout/QueuePollInterval as recommended. We probably won't know for a couple of weeks, as our production releases are scheduled, but we will report back if we see improvements.
Thank you so much for the update, great that the issue is resolved for you! However, it's not fully resolved on my side, because I still don't understand what happened with …
Hi Sergey, unfortunately, after a VM migration we're experiencing the same behavior in our SQL Server. Running version … We're detecting more than 600 processes in the SQL Server like this: … Those processes seem to be executing: … Do you have a workaround or suggestion to avoid this issue? Thanks in advance for your help!
Thanks for reporting this @arielbvargas! Could you run the stdump utility to observe the stack traces of managed threads while your application is struggling with this issue, and post the output here? This utility uses ClrMD to connect to your application (or parse a mini dump file) with a debugging session and dump the stack traces of all the running threads, printing them to the console output. With its help I will be able to understand how many worker threads issue that query and decide what to do based on the output.
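For readers who can't run stdump directly, below is a minimal sketch of the same idea using the Microsoft.Diagnostics.Runtime (ClrMD) package; the command-line argument handling and the 2.x API usage are assumptions for illustration, not part of stdump itself:

using System;
using Microsoft.Diagnostics.Runtime;

class DumpManagedStacks
{
    static void Main(string[] args)
    {
        // Assumed: the target process id is passed as the first command-line argument.
        int pid = int.Parse(args[0]);

        // Attach a non-invasive debugging session and create a runtime for the first CLR found.
        using DataTarget target = DataTarget.AttachToProcess(pid, suspend: false);
        ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

        // Print the managed stack trace of every live thread, similar to what stdump outputs.
        foreach (ClrThread thread in runtime.Threads)
        {
            if (!thread.IsAlive)
                continue;

            Console.WriteLine($"Thread {thread.OSThreadId:x}:");
            foreach (ClrStackFrame frame in thread.EnumerateStackTrace())
                Console.WriteLine("    " + frame);
        }
    }
}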
Thanks a lot! How many concurrent queries in the sleeping state do you see at the moment? I'd like to compare their number to the number of managed threads executing that query in the process.
Hi again. More than 770 – I'm attaching the list.

I forgot to mention that we have two IIS servers running Hangfire. The first one is the source of the dump previously sent. The second one has several application pools (all pointing to the same code). Do you also need that dump? Thanks!
Sure, it would be very useful to see the whole picture. Appreciate your assistance!
Ok, I'm attaching two more dumps from the second IIS. Thanks!
Previous implementation was too relaxed, and the actual number of concurrent executions was much higher than the expected "1". Relates to #1962
When hosted in IIS, when deploying a new app version, when using autostart providers (probably), and when the AppDomain unload was delayed (probably), an OperationCanceledException thrown in a while (true) loop, or a while loop with some external condition, for some reason doesn't lead to a loop exit and instead leads to a new iteration without any delay. This behavior was observed when the BackgroundExecution class was written and looked really weird; it's better to avoid calling ThrowIfCancellationRequested in such loops to avoid bad things. Relates to #1962
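As an illustration of the pattern that change describes, here is a hypothetical sketch (not Hangfire's actual code) of how a broad catch can turn a cancelled loop into a tight spin, and a safer variant that checks the token in the loop condition:

using System;
using System.Threading;

static class CancellationLoopSketch
{
    // Problematic pattern: the broad catch inside the loop also swallows the
    // OperationCanceledException thrown by ThrowIfCancellationRequested, so the
    // while (true) loop never exits and starts a new iteration with no delay.
    static void ProblematicLoop(CancellationToken token, Action doWork)
    {
        while (true)
        {
            try
            {
                token.ThrowIfCancellationRequested();
                doWork();
            }
            catch (Exception)
            {
                // Swallowed cancellation -> tight spin until the AppDomain finally unloads.
            }
        }
    }

    // Safer pattern: check the flag in the loop condition, so cancellation always
    // terminates the loop regardless of how exceptions are handled inside it.
    static void SaferLoop(CancellationToken token, Action doWork)
    {
        while (!token.IsCancellationRequested)
        {
            try
            {
                doWork();
            }
            catch (Exception)
            {
                // Log and retry on the next iteration.
            }
        }
    }
}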
@arielbvargas thank you so much for the dumps. I've fixed two possible reasons for this strange behaviour and released version 1.7.28. Please upgrade to it and perform an application pool recycle after the upgrade to prevent old instances from continuing to run in an AppDomain. If the issue persists, feel free to re-open this issue.
We are seeing the same issue in 1.7.25. I am going to upgrade to the latest version to see if it resolves the issue. Should I keep my config like this, or change it as per this ticket?
Please try installing the latest version of the Microsoft.Data.SqlClient package (the latest stable version at the moment is 5.0.0) and use the following connection factory in your Hangfire configuration logic in order to use it:

.UseSqlServerStorage(
    () => new Microsoft.Data.SqlClient.SqlConnection(connectionString));

All the recent strange issues with queries and connectivity were resolved by using the new package from Microsoft. Please let me know if it solves your issue!
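For context, a configuration sketch combining the connection factory above with the storage options discussed earlier in this thread; this is an assumption about how the pieces fit together on 1.7.x rather than an official snippet, and connectionString is a placeholder:

using System;
using Hangfire;
using Hangfire.SqlServer;

// ...
GlobalConfiguration.Configuration
    .UseSqlServerStorage(
        // The factory makes Hangfire open connections via Microsoft.Data.SqlClient
        // instead of System.Data.SqlClient.
        () => new Microsoft.Data.SqlClient.SqlConnection(connectionString),
        new SqlServerStorageOptions
        {
            CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
            SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
            QueuePollInterval = TimeSpan.Zero,
            UseRecommendedIsolationLevel = true,
            DisableGlobalLocks = true
            // UsePageLocksOnDequeue intentionally omitted, as suggested earlier in the thread.
        });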
But please note there may be problems with authentication: Microsoft.Data.SqlClient changed defaults in 4.0.0, and sometimes new connection string keywords like … are required.
Thanks - sadly, changing the connection string is pretty tricky for us - thousands of on-premises installations. What are your thoughts on just upgrading Hangfire as a solution until we can change the connection strings? Will that work?
@odinserj - I have put the latest Hangfire onto my machine and monitored SQL. It does not appear to resolve the issue. I will try changing the settings as per above.
@gregpakes it is also possible to use the Microsoft.Data.SqlClient package of version 3.1.1; in this case no connection string changes are required, because the breaking changes were introduced only in version 4.0.0 (confirmed both on Windows and Linux).

.UseSqlServerStorage(
    () => new Microsoft.Data.SqlClient.SqlConnection(connectionString));

Unfortunately it's not possible to provide a general fix for this issue in Hangfire itself. The upcoming version 1.8.0 will contain a breaking change for this and will use Microsoft.Data.SqlClient by default when that package is referenced, to avoid issues with millions of queries or connection pool problems. It is possible that some runtimes will be broken after this, but at least the problem will be documented in the upgrade guide, and in most cases it will surface in testing environments. System.Data.SqlClient can't be used by default anymore, because it's difficult to monitor this issue and difficult to understand what happens. But unfortunately I can't modify the patch versions (1.7.X) to use Microsoft.Data.SqlClient by default, because no one expects breaking changes when upgrading between patch versions. Please try to install and use the Microsoft.Data.SqlClient package of version 3.1.1, because this package resolved all the recent problems with SQL Server-based storage.
Thanks @odinserj, I've resolved the issue with: …
Is this OK?
I'm afraid in this case the probability of a problem is reduced, but the problem itself doesn't go away – I've seen the same symptoms with other queries as well, like with the … Every customer who changed to Microsoft.Data.SqlClient reported that all their problems were gone.
@odinserj - sadly, when I use Microsoft.Data.SqlClient 3.1.1 I get the following issue:
@gregpakes an access violation, on .NET 😩 May I ask you to post the full exception with stack trace here – maybe you saved it somewhere? It would be very useful, because it looks like the referenced SO question relates to System.Data.SqlClient.
Stacktrace attached:
I have gone back to Microsoft.Data.SqlClient 5.0.0 and the issue has returned. I am back to 60,000 requests per hour for this query.
I am definitely using Microsoft.Data.SqlClient.
60,000 requests per hour (not millions) is an expected metric for that configuration. You can tune the …
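As a rough illustration of why tens of thousands of fetch requests per hour is plausible, here is a back-of-envelope sketch; the worker count and attempt rate are assumed example numbers, not values reported in this issue:

using System;

// Estimate: with semi-polling fetching, each idle worker repeatedly retries the
// dequeue UPDATE, so the hourly statement count is roughly
// workers * attemptsPerSecond * 3600.
int workers = 20;              // assumption: number of Hangfire workers
double attemptsPerSecond = 1;  // assumption: fetch attempts per second per idle worker
double perHour = workers * attemptsPerSecond * 3600;
Console.WriteLine(perHour);    // ~72,000 per hour, the same order of magnitude as the observed 60,000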
The plot thickens. We can't use Microsoft.Data.SqlClient 5.0.0 due to this bug: dotnet/SqlClient#1418. They say a fix will be out by the end of the year, but the fix is included in 4.1.1. Correct me if I'm wrong, but 4.X still contains the Hangfire runaway issues? So I would need to go to 3.1?
We updated from 1.7.16 to 1.7.25 and have noticed high waits on IO. Any ideas?
We updated our SQL storage options per the recommendations to these settings:
CommandBatchMaxTimeout = TimeSpan.FromMinutes(5),
SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
QueuePollInterval = TimeSpan.Zero,
PrepareSchemaIfNecessary = prepareSchema,
UseRecommendedIsolationLevel = true,
UsePageLocksOnDequeue = true,
DisableGlobalLocks = true
Previously, the settings were:
QueuePollInterval = TimeSpan.FromSeconds(queuePollInterval),
PrepareSchemaIfNecessary = prepareSchema,
UseRecommendedIsolationLevel = true,
UsePageLocksOnDequeue = true,
DisableGlobalLocks = true
Our DPA has captured this as the cause of high waits on IO:
/* (comment inserted by DPA)
Character Range: 308 to 595
Waiting on statement:
UPDATE top (1) JQ
SET FetchedAt = GETUTCDATE() output INSERTED.Id,
INSERTED.JobId,
INSERTED.Queue,
INSERTED.FetchedAt
FROM [HangFire].JobQueue JQ
WITH
(
forceseek,
paglock,
xlock
)
WHERE Queue in (@queues1)
AND (FetchedAt is null
OR FetchedAt < DATEADD(second, @Timeoutss, GETUTCDATE()))
*/
(@queues1 nvarchar(4000),@Timeoutss int,@delayms int,@ENDms int)
set nocount on;
set xact_abort on;
set tran isolation level read committed;
declare
@EnD datetime2 = DATEADD(ms, @ENDms, SYSUTCDATETIME()),
@delay datetime = DATEADD(ms, @delayms, convert(DATETIME, 0));
WHILE (SYSUTCDATETIME() < @EnD)
BEGIN
/* BEGIN ACTIVE SECTION (comment inserted by DPA) */
UPDATE top (1) JQ
SET FetchedAt = GETUTCDATE() output INSERTED.Id,
INSERTED.JobId,
INSERTED.Queue,
INSERTED.FetchedAt
FROM [HangFire].JobQueue JQ
WITH
(
forceseek,
paglock,
xlock
)
WHERE Queue in (@queues1)
AND (FetchedAt is null
OR FetchedAt < DATEADD(second, @Timeoutss, GETUTCDATE()))
/* END ACTIVE SECTION (comment inserted by DPA) */
;
IF @@rowcount > 0
RETURN;
WAITFOR DELAY @delay;
END