workflow recovery

Understanding Workflow Recovery Mechanism

DISCLAIMER: This guide does not apply to workflow recovery in cluster mode.

This guide explains how workflows recover after the workflow server crashes or is restarted.

At startup, a boot script finds all Workflow Instances that are still pending and then checks the state of their tokens that are also in the pending (running) state. Each such token is handled according to the type of BPMN element it belongs to, and the process continues from that point.
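The boot-time scan can be pictured as a filter over persisted state followed by a per-element dispatch. The sketch below is a simplified illustration, not the engine's actual code: the `WorkflowInstance` and `Token` classes, the `"pending"` status string, the `element_type` field and the `recover_on_boot` function are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical, simplified model of persisted workflow state; the real
# engine's instance and token records have different names and more fields.
@dataclass
class Token:
    token_id: str
    element_type: str  # e.g. "ServiceTask", "TimerEvent", "SubProcess"
    status: str        # "pending" = still running when the server went down


@dataclass
class WorkflowInstance:
    instance_id: str
    status: str        # "pending" = instance had not finished
    tokens: List[Token] = field(default_factory=list)


Handler = Callable[[WorkflowInstance, Token], None]


def recover_on_boot(instances: List[WorkflowInstance],
                    handlers: Dict[str, Handler],
                    default: Handler) -> int:
    """Hand every pending token of every pending instance to a handler.

    Returns the number of tokens that were re-dispatched.
    """
    recovered = 0
    for instance in instances:
        if instance.status != "pending":
            continue  # finished instances are left untouched
        for token in instance.tokens:
            if token.status != "pending":
                continue  # only tokens that were still running need recovery
            # Pick the handler registered for this element type, else the default.
            handler = handlers.get(token.element_type, default)
            handler(instance, token)
            recovered += 1
    return recovered


# Usage: only token t1 is pending inside a pending instance.
log = []
instances = [
    WorkflowInstance("wf-1", "pending", [Token("t1", "ServiceTask", "pending"),
                                         Token("t2", "UserTask", "complete")]),
    WorkflowInstance("wf-2", "complete", [Token("t3", "ServiceTask", "pending")]),
]
count = recover_on_boot(
    instances, {}, lambda inst, tok: log.append((inst.instance_id, tok.token_id)))
# count == 1, log == [("wf-1", "t1")]
```

Keeping the handlers in a registry keyed by element type mirrors the idea that each kind of token is recovered in its own way, while every other element falls back to a single default path.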

Recovery does not rely on safe checkpoints. Because the exact point of failure cannot be known, it re-runs the interrupted activity regardless. The only case where this can cause problems is a Service Task, since re-running it may call the external service a second time; it is therefore always advisable to make the services called from Service Tasks idempotent.
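One common way to make a service idempotent is an idempotency key: the caller sends a key that stays the same across retries, and the service returns its stored result instead of repeating the side effect. The sketch below assumes a hypothetical `IdempotentTransferService` and assumes the key can be built from the instance id and token id (i.e. that a recovered token keeps its id); a real service would persist the key-to-result map durably rather than in memory.

```python
from typing import Dict


class IdempotentTransferService:
    """Hypothetical external service called from a Service Task.

    A repeated idempotency key returns the stored result instead of
    debiting the account a second time.
    """

    def __init__(self) -> None:
        self.balance = 100
        # In-memory only for illustration; a real service needs durable storage.
        self._results: Dict[str, int] = {}

    def debit(self, idempotency_key: str, amount: int) -> int:
        if idempotency_key in self._results:
            return self._results[idempotency_key]  # replay: no second debit
        self.balance -= amount
        self._results[idempotency_key] = self.balance
        return self.balance


def run_service_task(service: IdempotentTransferService,
                     instance_id: str, token_id: str, amount: int) -> int:
    # Assumption: the token id is stable across recovery, so a re-run
    # produces the same key as the original run.
    key = f"{instance_id}:{token_id}"
    return service.debit(key, amount)


service = IdempotentTransferService()
run_service_task(service, "wf-1", "t1", 30)  # original run: balance becomes 70
run_service_task(service, "wf-1", "t1", 30)  # re-run after a crash: balance stays 70
```

With this design, recovery is free to re-run the Service Task without any knowledge of whether the first call reached the service.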

Special scenarios covered by workflow recovery:

In case of a SubProcess/CallActivity, the child process may have completed without the parent token being informed (a corner case). Recovery therefore checks whether the sub-process is already complete, rather than relying solely on the sub-process to trigger the parent process.
In case of a Timer Event with a 'time duration' defined, the token's creation time is taken into account: if the duration has already elapsed, the token completion event is emitted immediately.
For a Multi-Instance Parallel Activity, the token has a property 'number of active instances', and the token is emitted that many times on boot.
For all other BPMN elements, the 'token arrived event' is executed, which re-runs the logic for that node and continues the flow.
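The four cases above can be summarized as a function that decides which events to emit for one pending token. This is a hedged sketch under several assumptions: the `PendingToken` fields, the element type strings and the event names (`token_completed`, `token_emitted`, `token_arrived`) are hypothetical. Two branches go beyond what is described here and are marked as assumptions: a sub-process whose child is still running emits nothing, and a timer whose duration has not yet elapsed is rescheduled for the remaining time.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class PendingToken:
    element_type: str                          # hypothetical type names, see below
    created_at: datetime
    time_duration: Optional[timedelta] = None  # TimerEvent only
    number_of_active_instances: int = 0        # multi-instance parallel only
    child_completed: bool = False              # SubProcess/CallActivity only


def recovery_actions(token: PendingToken, now: datetime) -> List[str]:
    """Return the events to emit for one pending token on boot."""
    if token.element_type in ("SubProcess", "CallActivity"):
        # Corner case: the child finished but never informed the parent token.
        if token.child_completed:
            return ["token_completed"]
        # Assumption: a still-running child notifies the parent when it ends.
        return []
    if token.element_type == "TimerEvent" and token.time_duration is not None:
        elapsed = now - token.created_at
        if elapsed >= token.time_duration:
            return ["token_completed"]  # duration already elapsed: fire now
        # Assumption: otherwise wait only for the remaining time.
        remaining = token.time_duration - elapsed
        return [f"schedule_timer:{int(remaining.total_seconds())}"]
    if token.element_type == "MultiInstanceParallel":
        # Emit the token once per active instance.
        return ["token_emitted"] * token.number_of_active_instances
    # Every other element (including timers without a duration in this
    # sketch): re-run the node's logic via the 'token arrived' event.
    return ["token_arrived"]


start = datetime(2024, 1, 1, 12, 0, 0)
now = start + timedelta(minutes=10)
recovery_actions(PendingToken("TimerEvent", start, time_duration=timedelta(minutes=5)), now)
# ['token_completed']
recovery_actions(PendingToken("TimerEvent", start, time_duration=timedelta(minutes=15)), now)
# ['schedule_timer:300']
recovery_actions(PendingToken("MultiInstanceParallel", start, number_of_active_instances=3), now)
# ['token_emitted', 'token_emitted', 'token_emitted']
```

Computing the timer decision from the persisted creation time, rather than restarting the timer from zero, is what keeps a restart from silently extending a timer's duration.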