Triggers are getting blocked permanently #145

shelmling · 2017-05-22T12:08:27Z

Dear Quartz Team,

We are using Quartz 2.2.1 in clustered-mode with JDBC job store to schedule jobs marked as @DisallowConcurrentExecution.

We have observed that occasionally triggers are getting stuck in trigger state BLOCKED without ever recovering automatically. Looking into the job store DB tables, the pattern is always the same:

The TRIGGER_STATE on <PREFIX>_TRIGGERS is in state BLOCKED
There is no corresponding record in <PREFIX>_FIRED_TRIGGERS

Obviously org.quartz.impl.jdbcjobstore.JobStoreSupport.clusterRecover(Connection, List<SchedulerStateRecord>) will not recover such triggers, so the only way to get out of this inconsistent state is to manually set the TRIGGER_STATE back to WAITING.
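The manual reset described above could be scripted roughly as follows. This is only a sketch: it assumes the standard Quartz 2.x column names (SCHED_NAME, TRIGGER_STATE, JOB_NAME, JOB_GROUP), the class and method names are made up for illustration, and it treats a BLOCKED trigger as orphaned when no FIRED_TRIGGERS row exists for the same job, which is an interpretation of the pattern reported here rather than anything Quartz itself provides.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch (hypothetical helper, not part of Quartz): reset orphaned BLOCKED triggers. */
final class OrphanedBlockedTriggerReset {

    /** Builds the UPDATE statement for a given table prefix, e.g. "QRTZ_". */
    static String resetSql(String prefix) {
        String triggers = prefix + "TRIGGERS";
        String fired = prefix + "FIRED_TRIGGERS";
        // A trigger counts as orphaned here if it is BLOCKED although no
        // fired-trigger row exists for the same job (assumed column names).
        return "UPDATE " + triggers + " SET TRIGGER_STATE = 'WAITING'"
             + " WHERE SCHED_NAME = ? AND TRIGGER_STATE = 'BLOCKED'"
             + " AND NOT EXISTS (SELECT 1 FROM " + fired + " F"
             + " WHERE F.SCHED_NAME = " + triggers + ".SCHED_NAME"
             + " AND F.JOB_NAME = " + triggers + ".JOB_NAME"
             + " AND F.JOB_GROUP = " + triggers + ".JOB_GROUP)";
    }

    /** Runs the reset and returns how many triggers were moved back to WAITING. */
    static int reset(Connection conn, String prefix, String schedName) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(resetSql(prefix))) {
            ps.setString(1, schedName);
            return ps.executeUpdate();
        }
    }
}
```

Running something like this against a live cluster can race with the scheduler, so it is probably safest to run it while the scheduler is paused or in standby.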

It is not yet clear under which circumstances this error occurs. However, our log files indicate that jobs getting stuck coincides with temporary database problems.

Below you can find an example of a NullPointerException in org.quartz.impl.jdbcjobstore.JobStoreSupport.triggersFired(List<OperableTrigger>). The exception itself was caused somewhere in the JDBC driver (Sybase jConnect) when trying to invoke rollback() on a JDBC connection. The log entry’s timestamp correlates exactly with the time the trigger got stuck.

2017 05 01 20:20:02#+00#ERROR#org.quartz.core.QuartzSchedulerThread##anonymous#ItOpScheduler_Clustered_QuartzSchedulerThread#Runtime error occurred in main trigger firing loop.java.lang.NullPointerException: while trying to invoke the method com.sybase.jdbc4.tds.TdsCursor.setRowNum(int) of a null object loaded from field com.sybase.jdbc4.tds.CurInfo3Token._cursor of an object loaded from local variable 'this'
	at com.sybase.jdbc4.tds.CurInfo3Token.getMetaInformation(CurInfo3Token.java:85)
	at com.sybase.jdbc4.tds.CurInfoToken.<init>(CurInfoToken.java:130)
	at com.sybase.jdbc4.tds.CurInfo3Token.<init>(CurInfo3Token.java:45)
	at com.sybase.jdbc4.tds.Tds.nextResult(Tds.java:3239)
	at com.sybase.jdbc4.tds.Tds.readCommandResults(Tds.java:4459)
	at com.sybase.jdbc4.tds.Tds.doCommand(Tds.java:4444)
	at com.sybase.jdbc4.tds.Tds.endTransaction(Tds.java:2602)
	at com.sybase.jdbc4.jdbc.SybConnection.rollback(SybConnection.java:1953)
	at sun.reflect.GeneratedMethodAccessor492.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at com.sap.core.persistence.jdbc.trace.TraceableBase$1.invoke(TraceableBase.java:44)
	at com.sun.proxy.$Proxy17.rollback(Unknown Source)
	at com.sap.core.persistence.jdbc.trace.TraceableConnection.rollback(TraceableConnection.java:239)
	at org.apache.commons.dbcp.DelegatingConnection.rollback(DelegatingConnection.java:368)
	at org.apache.commons.dbcp.DelegatingConnection.rollback(DelegatingConnection.java:368)
	at org.apache.commons.dbcp.PoolingDataSource$PoolGuardConnectionWrapper.rollback(PoolingDataSource.java:323)
	at sun.reflect.GeneratedMethodAccessor492.invoke(Unknown Source)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.quartz.impl.jdbcjobstore.AttributeRestoringConnectionInvocationHandler.invoke(AttributeRestoringConnectionInvocationHandler.java:73)
	at com.sun.proxy.$Proxy143.rollback(Unknown Source)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.rollbackConnection(JobStoreSupport.java:3658)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.executeInNonManagedTXLock(JobStoreSupport.java:3817)
	at org.quartz.impl.jdbcjobstore.JobStoreSupport.triggersFired(JobStoreSupport.java:2908)
	at org.quartz.core.QuartzSchedulerThread.run(QuartzSchedulerThread.java:336)

Please let me know if you need additional details.

Thanks for your support,
Sebastian


shelmling · 2017-05-22T15:09:11Z

I was able to reproduce the issue in the debugger.

If org.quartz.impl.jdbcjobstore.JobStoreSupport.triggerFired(Connection, OperableTrigger) throws a RuntimeException after the trigger state has been set to BLOCKED, the trigger will get stuck. The reason is that QuartzSchedulerThread.run() calls org.quartz.spi.JobStore.releaseAcquiredTrigger(OperableTrigger) when a RuntimeException occurs, which deletes the record from <PREFIX>_FIRED_TRIGGERS but does not set the trigger state back from BLOCKED to WAITING.

Probably org.quartz.impl.jdbcjobstore.JobStoreSupport.releaseAcquiredTrigger(Connection, OperableTrigger) should set back the trigger state to WAITING from both ACQUIRED (which it already does) and from BLOCKED.

Best Regards,
Sebastian
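The failure mode described in this comment can be illustrated with a toy in-memory model. This is not Quartz code: the class, its methods, and the map/set standing in for the two tables are all hypothetical, and the sketch only mirrors the sequence of state changes as reported above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

/** Toy model of the trigger tables; all names here are illustrative only. */
final class BlockedTriggerModel {
    enum State { WAITING, ACQUIRED, BLOCKED }

    final Map<String, State> triggers = new HashMap<>(); // stands in for <PREFIX>_TRIGGERS
    final Set<String> firedTriggers = new HashSet<>();   // stands in for <PREFIX>_FIRED_TRIGGERS

    void acquire(String key) {
        triggers.put(key, State.ACQUIRED);
        firedTriggers.add(key);
    }

    /** Simulates triggerFired() failing after BLOCKED has been written. */
    void fireThenFail(String key) {
        triggers.put(key, State.BLOCKED);
        throw new IllegalStateException("simulated failure after BLOCKED was written");
    }

    /** Release as described for 2.2.1: only ACQUIRED goes back to WAITING. */
    void releaseOld(String key) {
        triggers.replace(key, State.ACQUIRED, State.WAITING);
        firedTriggers.remove(key);
    }

    /** Release as proposed: BLOCKED goes back to WAITING as well. */
    void releaseProposed(String key) {
        triggers.replace(key, State.ACQUIRED, State.WAITING);
        triggers.replace(key, State.BLOCKED, State.WAITING);
        firedTriggers.remove(key);
    }

    /** Plays the scenario once and returns the final trigger state. */
    static State run(boolean proposed) {
        BlockedTriggerModel m = new BlockedTriggerModel();
        m.acquire("t1");
        try {
            m.fireThenFail("t1");
        } catch (RuntimeException e) {
            if (proposed) m.releaseProposed("t1"); else m.releaseOld("t1");
        }
        return m.triggers.get("t1");
    }
}
```

With the old release logic the model ends in BLOCKED with no fired-trigger entry, which matches the pattern seen in the tables; with the proposed logic it ends in WAITING.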

shaoxt · 2017-10-12T04:44:54Z

Is there anyone who can verify and release this fix?
I'm also using 2.2.1. The job got stuck for the same reason.

mstead · 2017-11-06T16:13:11Z

I would also like to see this bug addressed. Any update on when the fix will get merged and built?

AntonSemenovKazan · 2017-11-07T09:44:26Z

This bug affects us too.
And there is no visible solution.

We are going to replace Quartz.

pbuckley · 2017-12-01T18:55:14Z

👍 looks like there's a PR to fix it

sschwenker · 2017-12-13T16:54:47Z

We're having this same issue. Any idea when this will get approved and get a patch out? It seems pretty significant.

yagarwals · 2018-04-27T04:00:11Z

This has been around for a while now, and the bug is already fixed in the .NET libraries of Quartz. It would be helpful for us to know when this will be fixed, as we are affected by it.

(Pull request #146 was raised on 22 May 2017 to address this.)

MSudheer87 · 2018-10-15T07:41:54Z

I am facing the same issue with 2.2.1. Is there a fix provided by Quartz for this?
@shelmling @dx-pbuckley

AntonSemenovKazan · 2018-10-15T12:00:02Z

@MSudheer87 we reduced the frequency of this problem when we created a separate DB.
Before that, we had only one DB for both the app and Quartz.

MSudheer87 · 2018-10-15T14:08:14Z

@AntonSemenovKazan Thanks for your reply.
You mean you created a separate schema for Quartz alone, correct?

Let me give you a few more details about my issue:

I have integrated Quartz with Spring Batch (the actual batch processing happens in Spring Batch; only the scheduling is handled by Quartz).
I have a single schema that holds both the Spring Batch tables and the Quartz-specific tables.
I am facing this issue only in a clustered environment (4-node cluster); on a single-node or two-node cluster it works very well.

Based on the above facts, do you suspect anything other than the shared DB schema? Please share your suggestions. Thank you.

AntonSemenovKazan · 2018-10-16T20:32:48Z

@MSudheer87
Our case:
We have only one DB schema, where both our application tables and the Quartz tables live.
We noticed that triggers become blocked when an SQLException (such as an SQL timeout) occurs or the transaction gets disconnected.
These problems happen when SQL Server is running long, heavy operations on the app data.

So we decided to move the Quartz tables to their own separate schema.
We also considered moving them to a separate node (SQL Server).

You could also take the sources and extend the logging.
Maybe you will notice something specific.

But my whole team thinks that Quartz is a little bit strange, and we quite often have some trouble with it.
But sometimes everything is OK.

We have written a monitoring system to keep an eye on Quartz behaviour.

Unfortunately I cannot say anything about Spring Batch.

sunildabburi · 2018-11-22T00:20:01Z

Going by @shelmling's fix, here is a workaround for a Spring-managed datasource until his PR is merged:

1. Create a CustomJobStore class that extends LocalDataSourceJobStore and overrides releaseAcquiredTrigger(Connection conn, OperableTrigger trigger)
2. Create a CustomSchedulerFactory class that extends StdSchedulerFactory and overrides initialize(Properties props)
3. Set schedulerFactoryClass during SchedulerFactoryBean creation

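import java.sql.Connection;
import java.sql.SQLException;
import java.util.Properties;

import org.quartz.JobPersistenceException;
import org.quartz.SchedulerException;
import org.quartz.impl.StdSchedulerFactory;
import org.quartz.spi.OperableTrigger;
import org.springframework.scheduling.quartz.LocalDataSourceJobStore;

// Forces Quartz to use CustomJobStore, overriding any configured job store class.
public class CustomSchedulerFactory extends StdSchedulerFactory {

	@Override
	public void initialize(Properties props) throws SchedulerException {
		props.put(StdSchedulerFactory.PROP_JOB_STORE_CLASS, CustomJobStore.class.getName());
		super.initialize(props);
	}
}

public class CustomJobStore extends LocalDataSourceJobStore {

	public CustomJobStore() {
		super();
	}

	@Override
	protected void releaseAcquiredTrigger(Connection conn, OperableTrigger trigger) throws JobPersistenceException {
		try {
			// Stock behaviour: ACQUIRED goes back to WAITING.
			getDelegate().updateTriggerStateFromOtherState(conn, trigger.getKey(), STATE_WAITING, STATE_ACQUIRED);
			// Added for this workaround: BLOCKED goes back to WAITING as well.
			getDelegate().updateTriggerStateFromOtherState(conn, trigger.getKey(), STATE_WAITING, STATE_BLOCKED);
			getDelegate().deleteFiredTrigger(conn, trigger.getFireInstanceId());
		} catch (SQLException e) {
			throw new JobPersistenceException("Couldn't release acquired trigger: " + e.getMessage(), e);
		}
	}
}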

SchedulerFactoryBean scheduler = new SchedulerFactoryBean();
...
scheduler.setSchedulerFactoryClass(CustomSchedulerFactory.class);

This should unblock you for now 👍

zemian · 2019-02-12T16:25:41Z

Thank you @shelmling for the PR! It's now merged!

* relates to quartz-scheduler/quartz#146 , quartz-scheduler/quartz#145 * relates to #741 #800

carstenartur · 2020-07-17T12:05:56Z

I cannot see this issue 145 to be fixed in any release at https://github.com/quartz-scheduler/quartz/releases .
Does this mean it is not fixed or there has never been a bug?

AntonSemenovKazan · 2020-07-17T14:01:01Z

I should say that we eventually replaced Quartz with Hangfire, and now we live happily.

We found out that Quartz got stuck when the HDD was under high I/O load.
We could easily reproduce that case.

We then tested Hangfire + MS SQL under the same high I/O load, and it worked without problems.
So we replaced Quartz, and we have now been using Hangfire for more than a year.

jnehlmeier · 2020-07-17T14:07:22Z

I cannot see this issue 145 to be fixed in any release at https://github.com/quartz-scheduler/quartz/releases .
Does this mean it is not fixed or there has never been a bug?

@carstenartur It is fixed in 2.3.1+ as part of pull request #146

see commit: 3f65b28

sunildabburi · 2020-12-11T14:08:42Z

For those who are still seeing this issue and if you implemented the JobListener interface, make sure you handle the exception yourself within jobWasExecuted method as quartz does not handle exception thrown in that method and that could leave your job state in BLOCKED and never get recovered. We experienced it with Quartz version 2.3.0
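One way to follow this advice is to route the listener body through a small guard that catches and logs runtime exceptions. The sketch below uses only the JDK; ListenerGuard and runSafely are hypothetical names, and the Quartz side is shown only as a comment because it depends on your own listener code.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

/** Hypothetical helper for keeping exceptions inside listener callbacks. */
final class ListenerGuard {
    private static final Logger LOG = Logger.getLogger(ListenerGuard.class.getName());

    /**
     * Runs the callback and logs any RuntimeException instead of rethrowing it.
     * Returns true if the callback completed normally, false if it threw.
     */
    static boolean runSafely(String description, Runnable body) {
        try {
            body.run();
            return true;
        } catch (RuntimeException e) {
            LOG.log(Level.WARNING, "Listener callback failed: " + description, e);
            return false;
        }
    }
}

// Possible use inside a JobListener implementation (audit(...) is a placeholder):
//
// public void jobWasExecuted(JobExecutionContext context, JobExecutionException jobException) {
//     ListenerGuard.runSafely("jobWasExecuted", () -> audit(context, jobException));
// }
```

Only RuntimeException is caught here, on the assumption that Errors such as OutOfMemoryError should still propagate.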

fernandoRSS · 2021-04-13T15:33:46Z

For those who are still seeing this issue and if you implemented the JobListener interface, make sure you handle the exception yourself within jobWasExecuted method as quartz does not handle exception thrown in that method and that could leave your job state in BLOCKED and never get recovered. We experienced it with Quartz version 2.3.0

After 6 hours trying to find a solution I bumped into your answer and it was exactly what was happening in my code.
Thank you

sww0825521xy · 2021-09-29T03:02:55Z

For those who are still seeing this issue and if you implemented the JobListener interface, make sure you handle the exception yourself within jobWasExecuted method as quartz does not handle exception thrown in that method and that could leave your job state in BLOCKED and never get recovered. We experienced it with Quartz version 2.3.0

Most of these blocked triggers now change from BLOCKED to WAITING automatically since we added the JobListener and upgraded Quartz from 2.3.0 to the latest version, 2.3.2.
But one or two BLOCKED triggers still remain in my case.

kevinamasur · 2022-04-13T15:38:59Z

@AntonSemenovKazan and @MSudheer87 our team was noticing issues similar to yours. The quartz queries on SQL Server would slow down significantly to the point of timing out when the database was under heavy load and waits started to increase (either CPU, Memory, or Disk).

One of the things we noticed was the query plans for quartz queries were very inefficient. When we deep dived we found something like the following query was being executed on the DB:
exec sp_executesql N'SELECT TRIGGER_NAME FROM test_localhost_TRIGGERS WHERE SCHED_NAME = ''QuartzScheduler_test_localhost'' AND TRIGGER_NAME = @P0 AND TRIGGER_GROUP = @P1',N'@P0 nvarchar(4000),@P1 nvarchar(4000)',N'Trigger.AnExampleTrigger',N'test_localhost'

The issue with the above is that parameters @P0 and @P1 are being passed to SQL Server as nvarchar, but the columns on the tables are actually varchar. This can cause the database to use very inefficient query plans when running queries.

We found we were using the SQL Server JDBC driver, and it has a setting for setSendStringParametersAsUnicode which defaults to on. This causes all string parameters to be sent as nvarchar, even if the column is varchar.

The quartz tables don't have any nvarchar columns, and based on Microsoft's own documentation:

For optimal performance with the CHAR, VARCHAR, and LONGVARCHAR JDBC data types, an application should set the sendStringParametersAsUnicode property to "false"

We only recently found this out and deployed it to our production environment. I don't know if it will fully fix the quartz issues we have seen, but we haven't had any issues since we made this switch last month, so I thought I would post it in case it helps anyone else. The fix was simply setting sendStringParametersAsUnicode on the JDBC URL for the quartz connection pool.
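As a rough illustration of that change, the property can be appended to the SQL Server JDBC URL used by the Quartz connection pool. The helper below is a sketch with a made-up class name, and the host, port, and database name in the usage note are placeholders.

```java
import java.util.Locale;

/** Hypothetical helper that adds sendStringParametersAsUnicode=false to a JDBC URL. */
final class SqlServerUrl {
    static final String PROP = "sendStringParametersAsUnicode";

    /** Appends the property unless the URL already sets it explicitly. */
    static String withNonUnicodeStrings(String jdbcUrl) {
        String lower = jdbcUrl.toLowerCase(Locale.ROOT);
        if (lower.contains(PROP.toLowerCase(Locale.ROOT) + "=")) {
            return jdbcUrl; // leave an explicit setting untouched
        }
        String separator = jdbcUrl.endsWith(";") ? "" : ";";
        return jdbcUrl + separator + PROP + "=false";
    }
}
```

For example, withNonUnicodeStrings("jdbc:sqlserver://localhost:1433;databaseName=quartz") yields jdbc:sqlserver://localhost:1433;databaseName=quartz;sendStringParametersAsUnicode=false. Since the setting affects every string parameter on that connection, it is probably best limited to the pool that only talks to the Quartz tables.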

* applying commit quartznet/quartznet@05fd35c * related to quartz-scheduler/quartz#146 , quartz-scheduler/quartz#145 , #741, #800

asookazian · 2023-08-28T16:17:32Z

Our team has deployed latest version 2.3.2 quartz JAR last Friday in prod server. But we are still immediately experiencing BLOCKED triggers in the qrtz_triggers table for our email notification after catalina restart. Is there any followup/advice for this behavior? Would enabling TRACE logging on "org.quartz" package help? We have a clustered option set in the quartz.properties and the quartz tables are in the same db schema as our app tables.

Has anyone enabled JMX remote access to mbeans as a potential workaround to reset trigger state to WAITING and then immediately firing trigger? https://dzone.com/articles/how-manage-quartz-remotely

Herman1998CHAN · 2023-09-25T10:22:26Z

I encountered a similar issue in our team's new project. We have two app servers, but we only deployed the war file, which includes the Quartz job, on app A. In the related cluster configuration on the Quartz job XML, we set it to false.

Initially, the scheduled jobs functioned properly, but at certain times, they would become "blocked" and fail to resume.

We observed that only the jobs related to updating the status in the database would rerun and return to normal functioning. However, we couldn't find any relevant error logs on the app server. Does anyone have any insights into this issue?

On the other hand, we previously implemented the same configuration (two app servers, war file hosted only on app A) in another project without encountering similar issues. It's worth noting that the previous project used MSSQL, while the new project uses MYSQL.

shelmling mentioned this issue May 22, 2017

Release BLOCKED triggers in releaseAcquiredTrigger #146

Merged

icyerasor mentioned this issue Aug 14, 2018

Trigger getting BLOCKED - prevents the scheduler to run the job again #134

Closed

zemian closed this as completed Feb 12, 2019

bcokca mentioned this issue May 21, 2019

Simple trigger is not triggered due to job persistence exception: Couldn't acquire next trigger: ERROR: prepared statement "S_1" does not exist #447

Closed

lahma mentioned this issue Nov 6, 2019

Trigger not fires in v.3.0.7 quartznet/quartznet#741

Closed

lahma added a commit to quartznet/quartznet that referenced this issue Nov 7, 2019

release BLOCKED triggers in releaseAcquiredTrigger

05fd35c

* relates to quartz-scheduler/quartz#146 , quartz-scheduler/quartz#145 * relates to #741 #800

manjush mentioned this issue Oct 15, 2020

Scheduler doesn't trigger the Job on time #603

Closed

SilviaDGregorio mentioned this issue Nov 7, 2022

release BLOCKED triggers in releaseAcquiredTrigger Oriflame/cosmosdb-quartznet#25

Merged

qnkhuat mentioned this issue Jun 14, 2024

Data Sync Issue - Scheduled Job Sync Fails - 'Triggers for metabase.task.sync-and-analyse.job' metabase/metabase#44194

Closed
