Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Quartz - dependent bean jobs should reuse instance for interruption #42096

Closed
wants to merge 1 commit into from

Conversation

manovotn
Copy link
Contributor

Follows up on #42072
Related to #42034

This comment has been minimized.

Comment on lines 40 to 43
public void interrupt() throws UnableToInterruptJobException {
Instance.Handle<? extends Job> handle = jobInstance.getHandle();
// delegate if possible; throw an exception in other cases
if (InterruptableJob.class.isAssignableFrom(handle.getBean().getBeanClass())) {
try {
((InterruptableJob) handle.get()).interrupt();
} finally {
if (handle.getBean().getScope().equals(Dependent.class)) {
handle.destroy();
}
}
((InterruptableJob) handle.get()).interrupt();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may call interrupt() on already destroyed instance. Though, not sure if this should be handled somehow...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is something the bean (the job) should take into consideration. I don't think we can prevent it because the way interrupt and execute interact depend on the the job implementation anyway.
ATM we are delaying the destruction as far as we can - after the execute() is done.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is something the bean (the job) should take into consideration. I don't think we can prevent it because the way interrupt and execute interact depend on the the job implementation anyway.

Job implementation can control only internals of CustomInterruptableJob.execute(context), and CustomInterruptableJob.interrupt(). It cannot control work of the CdiAwareJob wrapper. For example,

  1. CustomInterruptableJob.execute(context) is finished and goes out of control of the CustomInterruptableJob.
  2. Instance.Handle.destroy() is finished.
  3. Accidentally, CdiAwareJob.interrupt() is invoked and fails with "Instance already destroyed". The control is not even passed to the CustomInterruptableJob.interrupt().

So this is not a matter of custom interrupt and execute interaction, it is a matter of CdiAwareJob's interrupt and execute interaction.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hm, Quartz impl only calls interrupt() if it finds a running job but it doesn't seem to care about race conditions which is what you imply IIUIC.

In that case, for a dependent bean that has been destroyed prior to interrupt we could:

  • Not invoke interrupt at all
  • Keep contextual reference and invoke it anyway even though bean's @PreDestroy was already triggered

I assume the latter would be more user friendly as there can be logging or general logic not related to this bean that you want to happen when interruption is invoked.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I was wrong, this is more complex because of how re-firing jobs work.
A job throwing a proper exception can trigger a re-fire (see here) which will use the same job instance and we want a new bean for that.

@manovotn
Copy link
Contributor Author

Ok, second attempt.

After some digging I realized that we cannot just pass Instance.Handle into CdiAwareJob - we must have Instance as jobs can potentially re-fire (as shown in DependentBeanJobTest) which will use the same CdiAwareJob instance.

So, the current solution:

  • Passes an Instance into CdiAwareJob
  • Keeps track of Instance.Handle for a single job execution so that we can destroy the bean after the execute() method is done
  • Keeps track of contextual instance as we need that for interrupt() method which can, in a rare race condition, occur after `execute()
  • Creates new Handle and contextual ref. in the execute() method if not present and nullifies the Handle it afterwards. For most jobs, this does nothing but for re-fired jobs backed by dep. beans it makes sure we get a fresh bean on every invocation.

@mkouba @ilia1243 feedback welcome :)

handle.destroy();
}
}
((InterruptableJob) contextualReference).interrupt();
Copy link

@ilia1243 ilia1243 Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approximately the same rare race condition: the job can be interrupted before contextualReference is set...

  • I suppose custom interrupt() should be called anyway, so need to initialize bean.
  • Need not destroy the bean because execute() should be run on the same instance.
  • The only left is to believe that execute() will really be called to have a chance to destroy the bean...

Some alternative thought about interrupt() after execute().

Since CustomInterruptableJob.execute(context) is already finished in this case, there is seemingly nothing that can control whether CustomInterruptableJob.interrupt() will be called or not. I mean, just a moment later the job will leave the list of currently running job, Scheduler.interrupt() will not find it and return false.

If considering this as an undefined behavior, one could stick to one more option - always return false, but this requires support by Quartz inside QuartzScheduler.interrupt(), like this:

if (job instanceof InterruptableJob) {
    try {
        ((InterruptableJob)job).interrupt();
        interrupted = true;
    } catch (JobWasNotReallyInterrupted) {
        interrupted = false;
    }
}

and raise JobWasNotReallyInterrupted in CdiAwareJob if the job is (being) destroyed.

EDIT the same logic for not yet started to execute() jobs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approximately the same rare race condition: the job can be interrupted before contextualReference is set...

Hm, I was trying to avoid using locks here as it would affect all the jobs and make them more expensive for the sake of a corner case...
We could initialize the first state in constructor which would eliminate this issue for anything but re-firing jobs where this slim chance still exists.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was trying to avoid using locks here as it would affect all the jobs and make them more expensive

Not dramatically more expensive as locks are already used, e.g. in LazyInstanceHandle.

We could initialize the first state in constructor

You mean to call Instance.getHandle() inside constructor? This will in particular rely on if execute() is called eventually to be able to destroy it. I am not sure if this is always the case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EDIT the same logic for not yet started to execute() jobs.

I am not sure that's the case, quartz wouldn't find that job as running to invoke interrupt on it, would it?

Copy link

@ilia1243 ilia1243 Jul 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I see from JobRunShell.run(), it

  1. Calls notifyListenersBeginning() that in particular registers the job as running.
  2. Calls execute()
  3. Calls notifyJobListenersComplete() that in particular unregisters the job.

The problem in what should happen with interrupt() in periods between 1 and 2, and between 2 and 3 (plus some time after 3, because Quartz iterates over copy of running jobs out of lock to find the job for interrupting).

In comment I thought about if behavior in these periods can be considered undefined, so that it would be possible to choose the specific one mentioned in that comment.

Copy link

quarkus-bot bot commented Jul 24, 2024

Status for workflow Quarkus CI

This is the status report for running Quarkus CI on commit 44c8cfa.

✅ The latest workflow run for the pull request has completed successfully.

It should be safe to merge provided you have a look at the other checks in the summary.

You can consult the Develocity build scans.

@manovotn
Copy link
Contributor Author

Superseeded by #42223
I will move backport labels there as well.

@manovotn manovotn closed this Jul 30, 2024
@quarkus-bot quarkus-bot bot added triage/invalid This doesn't seem right and removed triage/backport-3.8 triage/backport labels Jul 30, 2024
@manovotn manovotn deleted the 42034_improve branch July 30, 2024 13:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area/scheduler triage/invalid This doesn't seem right
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants