This repository has been archived by the owner on Apr 24, 2023. It is now read-only.

Cook Concepts

wyegelwel edited this page Dec 2, 2016 · 1 revision

Job / instance life cycle

In Cook, a job signifies the intent to run a command on a machine, and an instance is an instantiation of that job on a specific machine. A job may have multiple instances if earlier instances failed. This section describes the states that jobs and instances move through.

A job in Cook can exist in one of three states: "waiting", "running" or "complete". A state of "complete" means that the job is done running; whether it succeeded or failed depends on the state of the job's instance(s). An instance can exist in one of four states: "unknown", "running", "success" or "failed". If any instance of a job is "success", then the job is considered to have completed successfully. If all of its instances are "failed", then the job is considered to have failed.
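The rule for deciding whether a completed job succeeded can be sketched as follows. This is an illustrative Python sketch, not Cook's actual code; the names `InstanceState` and `job_outcome` are hypothetical, and the function assumes it is called on a job that is already "complete".

```python
from enum import Enum
from typing import Iterable, Optional


class InstanceState(Enum):
    """The four instance states described above."""
    UNKNOWN = "unknown"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"


def job_outcome(instances: Iterable[InstanceState]) -> Optional[str]:
    """Return "success" or "failed" for a completed job (hypothetical helper).

    Returns None when the outcome cannot be determined from the
    instance states, e.g. when an instance is still running.
    """
    states = list(instances)
    # Any successful instance makes the whole job successful.
    if any(s is InstanceState.SUCCESS for s in states):
        return "success"
    # The job failed only if it has instances and every one of them failed.
    if states and all(s is InstanceState.FAILED for s in states):
        return "failed"
    return None
```

For example, a job whose first instance failed and whose second instance succeeded is treated as successful, because a single "success" instance is enough.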

A job is "waiting" if it has no instances running. Jobs that are "waiting" will be considered for scheduling by Cook. When Cook schedules a job to a machine, an instance of the job is said to run on that machine. At the point that Cook schedules the instance to the machine, the job is set to "running" and the instance is set to "unknown". The instance is "unknown" because Cook has not yet received confirmation from Mesos that the instance is in fact running. Once Cook receives confirmation that the instance is running, the instance transitions to "running". Alternatively, Cook may learn from Mesos that the instance was never started, or that it started and quickly failed, in which case the instance transitions directly to "failed".

After the instance is "running", it may transition to "success" or "failed" depending on the outcome of the process running on the machine. If the instance transitions to "success", then the job transitions to "complete". If the instance transitions to "failed", then the job transitions to "waiting" if it has retries remaining, or to "complete" if all of the job's retries have been exhausted.
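The life cycle described above can be modeled as a small state machine. The following Python sketch is illustrative only and does not reflect Cook's implementation. The `Job` class, its `max_attempts` field, and the method names are hypothetical, and it assumes that a job has at most one active instance at a time and that each scheduled instance consumes one attempt.

```python
from dataclasses import dataclass, field
from typing import List

# Allowed instance transitions, as described in the text.
INSTANCE_TRANSITIONS = {
    "unknown": {"running", "failed"},
    "running": {"success", "failed"},
    "success": set(),  # terminal
    "failed": set(),   # terminal
}


@dataclass
class Job:
    max_attempts: int  # assumption: total number of instances the job may use
    state: str = "waiting"
    instances: List[str] = field(default_factory=list)

    def schedule(self) -> None:
        """Cook places the job on a machine, creating a new instance."""
        if self.state != "waiting":
            raise ValueError(f"cannot schedule a job in state {self.state!r}")
        self.instances.append("unknown")  # not yet confirmed by Mesos
        self.state = "running"

    def update_instance(self, new_state: str) -> None:
        """Apply a status update for the most recent instance."""
        if not self.instances:
            raise ValueError("job has no instances")
        current = self.instances[-1]
        if new_state not in INSTANCE_TRANSITIONS[current]:
            raise ValueError(f"invalid transition {current!r} -> {new_state!r}")
        self.instances[-1] = new_state
        if new_state == "success":
            self.state = "complete"
        elif new_state == "failed":
            # Back to "waiting" while attempts remain, otherwise done.
            attempts_left = self.max_attempts - len(self.instances)
            self.state = "waiting" if attempts_left > 0 else "complete"
```

A job allowed two attempts would then move through the states like this:

```python
job = Job(max_attempts=2)
job.schedule()                  # job: running, instance: unknown
job.update_instance("failed")   # instance failed before it was confirmed running
# job is "waiting" again because one attempt remains
job.schedule()
job.update_instance("running")  # confirmation received
job.update_instance("success")  # job becomes "complete"
```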
