
Millions/Billions of HangFire queries being highlighted in SSMS Activity Monitor #2120

Open
djdd87 opened this issue Nov 9, 2022 · 19 comments


@djdd87

djdd87 commented Nov 9, 2022

Hi, I'm trying to get to the bottom of some performance issues on our production environment and SSMS is highlighting the following query in the activity monitor under expensive queries:

<?query --
update top (1) JQ set FetchedAt = GETUTCDATE()
    output INSERTED.Id, INSERTED.JobId, INSERTED.Queue, INSERTED.FetchedAt
    from [HangFire].JobQueue JQ with (forceseek, readpast, updlock, rowlock)
    where Queue in (@queues1) and (FetchedAt is null or FetchedAt < DATEADD(second, @timeoutSs, GETUTCDATE()))
--?>

[screenshot: SSMS Activity Monitor, Expensive Queries pane showing the execution count for this query]

We are using the following settings, which are suggested in similar issues:

GlobalConfiguration.Configuration.UseSqlServerStorage(Globals.ConnectionString, new SqlServerStorageOptions
{
   SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
   QueuePollInterval = TimeSpan.Zero
});

The query count seems to fluctuate between about 23,000 and more than 1.2 billion (as per the above screenshot). Why is this, and have I misconfigured something? Here's a screenshot of all our recurring jobs (we only use HangFire for these recurring jobs):

[screenshot: list of our recurring jobs]

@odinserj
Member

Please try installing the latest version of the Microsoft.Data.SqlClient package (the latest stable version at the moment is 5.0.1) and use the following connection factory in your Hangfire configuration logic, so that the SqlConnection class from the newly installed package is used.

.UseSqlServerStorage(
    () => new Microsoft.Data.SqlClient.SqlConnection(connectionString));

Please note that the Microsoft.Data.SqlClient package uses encryption by default since version 4.0.0, so in some cases you might need to add the TrustServerCertificate=True option to your connection string or disable encryption with Encrypt=False. If nothing works, try downgrading the Microsoft.Data.SqlClient package to version 3.1.1, which doesn't contain the changes related to the new defaults.
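
For illustration, a minimal sketch combining the connection factory with the options you already use (the connection string below is just an example and would need to be adapted to your environment):

// Sketch only: the connection string is illustrative. TrustServerCertificate=True
// skips certificate validation, which may be needed with the 4.0+ encryption
// defaults; alternatively, encryption can be disabled with Encrypt=False.
var connectionString =
    "Server=.;Database=MyDb;Integrated Security=True;TrustServerCertificate=True";

GlobalConfiguration.Configuration.UseSqlServerStorage(
    () => new Microsoft.Data.SqlClient.SqlConnection(connectionString),
    new SqlServerStorageOptions
    {
        SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
        QueuePollInterval = TimeSpan.Zero
    });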

@djdd87
Author

djdd87 commented Nov 10, 2022

Thanks for the reply @odinserj. However, before I make this change, could you please explain what it's doing and why it'll fix the issue? Also, why do I need to remove the SlidingInvisibilityTimeout and QueuePollInterval that other posts mention should be added?

@odinserj
Member

Thanks for the reply @odinserj. However, before I make this change, could you please explain what it's doing and why it'll fix the issue?

I have been pursuing this issue for years, as it happens from time to time in specific environments under specific conditions. I've added protection to almost every layer of retries in Hangfire to avoid running them in an uncontrolled fashion, e.g. without any delay, but nothing worked. For such a high number of queries there must be a retry loop with no delay between attempts for some reason.

This summer I noticed that with Microsoft.Data.SqlClient installed this problem goes away, and every customer reported the same. So now I recommend that everyone with this problem use Microsoft.Data.SqlClient instead, and I gather feedback. I saw a retry loop in its source code that could potentially cause this, but I haven't been able to reproduce the issue yet.

Also, why do I need to remove the SlidingInvisibilityTimeout and QueuePollInterval that other posts mention should be added?

No, you don't need to remove them at all.

@djdd87
Author

djdd87 commented Nov 10, 2022

OK, I've left in SlidingInvisibilityTimeout and QueuePollInterval and changed it to Microsoft.Data.SqlClient. It'll take a week and a bit until it's in production, but I'll report back either way.

@djdd87
Author

djdd87 commented Dec 13, 2022

Unfortunately, the same issue is still present. With the suggested changes, it's still hitting billions of calls. We've been monitoring it for a few weeks; occasionally it's around 15,000, but it peaks to greater than one billion again:

GlobalConfiguration.Configuration.UseSqlServerStorage(() => new Microsoft.Data.SqlClient.SqlConnection(RP_Globals.ConnectionString), new SqlServerStorageOptions()
{
   SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
   QueuePollInterval = TimeSpan.Zero
});

@odinserj
Member

Thanks for your feedback @djdd87, working on it. By the way, do you notice high incoming network traffic to your SQL Server when facing this issue?

@odinserj
Member

Also, does the CPU consumption shown by SSMS match the same metric in the Task Manager dialog?

@djdd87
Author

djdd87 commented Dec 15, 2022

Thanks for getting back to me! I'll speak to our production team and see if I can get that information.

What does setting SlidingInvisibilityTimeout and QueuePollInterval actually do? It's not business critical for us that our CRON tasks run exactly at 06:00:00, i.e. they could run at 06:00:15 and it'll still work perfectly.

Am I right in thinking that QueuePollInterval set to 0 just constantly spams SQL Server as fast as it can, and that I could just change it to 15 seconds if I'm not bothered whether it starts at 06:00:00 or 06:00:15?

@odinserj
Member

Thank you, will appreciate any additional information regarding this problem.

The SlidingInvisibilityTimeout option defines how long a background job is considered "active" when no heartbeats are sent for it, so it serves as a fault detector. QueuePollInterval affects how often queues are polled: for sub-second values only a single worker is able to poll the empty queue, while for values above 1 second all the workers do it. TimeSpan.Zero causes the minimum value to be used, which is 50 ms (a maximum of 20 queries per second) in 1.7 and will be increased to 200 ms (a maximum of 5 queries per second) in 1.8.

You can increase the value to 15 seconds; in this case I expect the probability of such an issue to decrease. Nevertheless, something strange sometimes happens with the connection pool itself: on one side a lot of open connections (much more than the configured maximum) are observed, and on the other side timeout exceptions with an inability to get a connection from the pool.
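
For example, a minimal sketch of the relaxed configuration, based on the settings posted above (values are illustrative):

// Sketch only: keep the sliding invisibility timeout, but poll at most once every
// 15 seconds instead of the sub-second minimum, accepting up to ~15 seconds of
// job start latency in exchange for far fewer queries against SQL Server.
GlobalConfiguration.Configuration.UseSqlServerStorage(Globals.ConnectionString, new SqlServerStorageOptions
{
    SlidingInvisibilityTimeout = TimeSpan.FromMinutes(5),
    QueuePollInterval = TimeSpan.FromSeconds(15)
});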

The current fetching implementation can be simplified and waits on SQL Server's side can be avoided, so I will make these changes in 1.7.33.

@rzurlo

rzurlo commented Apr 21, 2024

I have installed version 1.8.12, the latest one, and the issue persists. By setting QueuePollInterval = TimeSpan.MaxValue, the queries are executed every 7-8 seconds. I don't have any queues installed and only have one recurring job. I would like to completely disable polling because it's unnecessary. Is it possible?

@fiidim

fiidim commented Jun 6, 2024

I'm in a similar boat. Regardless of the QueuePollInterval value, I can see at least 3 polls per second. I've tried many different combinations of intervals and values, and it seems to ignore them and poll every 200 ms. I've tried Microsoft.Data.SqlClient and there's no relief there either. My setup is an ASP.NET (MVC5) application with SimpleInjector IoC containers. I'm not sure what version of Hangfire first introduced the issue, but it still exists in current releases. I've ensured the schemas are on the latest version. No other issues are being experienced, other than being unable to control the frequency of database polling (SqlServer).

@djdd87
Author

djdd87 commented Jun 6, 2024

We've just moved to redis instead.

@fiidim

fiidim commented Jun 6, 2024

Ironic that the interval is exactly 200ms, regardless of what I set it to. Not sure why the value is ignored, even with primarily default settings.

@odinserj
Member

odinserj commented Jun 7, 2024

@fiidim can you post here your settings related to Hangfire and Hangfire.SqlServer?

odinserj added a commit that referenced this issue Jun 7, 2024
Queries used for sliding invisibility timeout-based fetching are unified now, and we can use the same implementation for sub- and over second polling delay configured.

Possibly relates to #2120.
@fiidim

fiidim commented Jun 7, 2024

Sure... Relevant packages I'm using are:

<PackageVersion Include="Hangfire.Console" Version="1.4.3" />
<PackageVersion Include="Hangfire.Core" Version="1.8.12" />
<PackageVersion Include="Hangfire.SqlServer" Version="1.8.12" />
<PackageVersion Include="Microsoft.Data.SqlClient" Version="5.2.1" />
<PackageVersion Include="SimpleInjector" Version="5.4.6" />

BackgroundJobServerOptions are registered first (I've tried leaving them all at default with no change in behaviour):

            container.Register<BackgroundJobServerOptions>(() => new BackgroundJobServerOptions()
            {
                StopTimeout = TimeSpan.FromSeconds(10),
                Queues = queues.Select(j => j.Name).ToArray(),
                HeartbeatInterval = new System.TimeSpan(0, 0, 30),
                ServerCheckInterval = new System.TimeSpan(0, 0, 30),
                SchedulePollingInterval = new System.TimeSpan(0, 0, 30)
            });

JobStorage registered next:

container.Register<JobStorage>(() => new SqlServerStorage(<connectionString>));

GlobalConfiguration set to mostly default (but have tried various combinations with no success):

            GlobalConfiguration.Configuration
                .SetDataCompatibilityLevel(CompatibilityLevel.Version_180) 
                .UseSimpleAssemblyNameTypeSerializer()
                .UseRecommendedSerializerSettings()
                .UseSqlServerStorage(<connectionStringName>, new SqlServerStorageOptions
                {
                    QueuePollInterval = TimeSpan.FromSeconds(5),
                    SlidingInvisibilityTimeout = null
                })
                .UseSimpleInjectorJobActivator(jobContainer);

Then server registered as such:

            container.Register<BackgroundJobServer>(() =>
            {
                var jobServer = new BackgroundJobServer(
                    container.GetInstance<BackgroundJobServerOptions>(),
                    container.GetInstance<JobStorage>());
                return jobServer;
            });

My result, regardless of how I set any of the variables, is a poll every 200 ms, as can be seen in SQL Server Activity Monitor:

[screenshot: SQL Server Activity Monitor showing the fetch query executing every 200 ms]

Let me know if you need any additional information that may help. Thanks.

@fiidim

fiidim commented Jun 9, 2024

I made some headway on this. It seems that the SqlServerStorageOptions defined in GlobalConfiguration get overridden by the ones defined on the JobStorage instance (which weren't defined at all) when the BackgroundJobServer is registered with the JobStorage parameter. It feels like it's my fault, but it is peculiar behaviour to have to declare a JobStorage that points to SQL Server and also configure GlobalConfiguration to use SQL Server, passing in the connection string twice.
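
As a rough sketch of what appears to resolve it for me, the options are now defined on the JobStorage registration itself, so the server picks them up (placeholders as in the earlier snippets):

// Sketch only: define the storage options on the JobStorage instance, since this
// is the instance the BackgroundJobServer is constructed with and actually uses.
container.Register<JobStorage>(() => new SqlServerStorage(
    <connectionString>,
    new SqlServerStorageOptions
    {
        QueuePollInterval = TimeSpan.FromSeconds(5),
        SlidingInvisibilityTimeout = null
    }));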

@fiidim

fiidim commented Jun 10, 2024

Further strange behaviour occurs when using a non-zero QueuePollInterval (for example, 5 seconds). Instead of one worker polling 5 times per second as before, all 20 workers now poll simultaneously at the 5-second interval.

@odinserj
Member

Yes, historically the semaphore that limits polling to a single worker was applied only to sub-second intervals due to their different implementation. Since the implementation is now the same for sub-second and above-second intervals, I have changed this logic, and as of 1.8.14, released today, the semaphore is applied in both cases. Please upgrade to the newest version and let me know if it solves the issue.

@fiidim
Copy link

fiidim commented Jun 11, 2024

Yes! That worked perfectly (1.8.14). I am very pleased with the behaviour! Thank you for taking a look and resolving my issue.
