Atomizer is built with the expectation that the Atoms it executes will fail. This approach allows the system to adapt, within defined limits, to non-critical failures such as panics generated by Atoms or Conductors. For example, a system with only one Conductor that fails has nothing left to relay processing requests and should panic, whereas an instance with multiple Conductors should fail only the single affected Conductor and continue processing on the other registered Conductors.
There are two primary failure modes: Critical and Non-Critical.
- Critical areas of the Atomizer cause a panic and crash the application with information-rich errors in order to alert the user to issues in the Atomizer.
- Non-Critical areas of the Atomizer capture the panic and push the error along the event channel so that anything monitoring for events can pick it up. Rather than crashing the Atomizer, the system attempts to re-initialize the element that failed (self-heal).
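
The non-critical path described above can be sketched in Go roughly as follows. This is a minimal illustration, not Atomizer's actual API: `runGuarded`, `supervise`, the `events` channel, and the restart limit are all hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// runGuarded runs fn and converts a panic into an error instead of
// crashing the process. Hypothetical helper, not part of Atomizer.
func runGuarded(fn func()) (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered panic: %v", r)
		}
	}()
	fn()
	return nil
}

// supervise runs fn, publishes every captured panic on events, and
// re-initializes (re-runs) fn up to maxRestarts times. Once the limit is
// exceeded the failure is handed back to the caller, which may treat it
// as critical. Note that the send blocks if nobody drains events.
func supervise(fn func(), maxRestarts int, events chan<- error) error {
	for attempt := 0; ; attempt++ {
		err := runGuarded(fn)
		if err == nil {
			return nil
		}
		events <- err
		if attempt >= maxRestarts {
			return errors.New("restart limit exceeded: " + err.Error())
		}
	}
}

func main() {
	events := make(chan error, 8)
	calls := 0
	// Assumed workload: panics on the first two runs, then succeeds.
	flaky := func() {
		calls++
		if calls < 3 {
			panic("conductor connection lost")
		}
	}
	err := supervise(flaky, 5, events)
	close(events)
	for e := range events {
		fmt.Println("event:", e)
	}
	fmt.Println("final error:", err)
}
```

In this sketch, the "defined limits" mentioned earlier are modelled as a simple restart count; a real system might use time windows or per-element policies instead.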
If a message from the message queue is not acknowledged and the connection between the message queue and a node in the cluster is terminated, the message queue should attempt to deliver the message to a different node in the cluster for processing (likely round robin with current RabbitMQ implementations).
Atoms have several different processing types: Singleton, Spawner (Wait and Free), and Atomic.
- Singleton Atoms are Atoms of which only a single instance exists on a node at any one time. Singleton Atoms will generally be maintenance Atoms for the distributed system or long-running processes for the system's users. An example of a Singleton Atom would be the plugin system, which will monitor for new plugins or live Atom updates from the message queue and deploy those plugins live in the running environment. Singleton Atoms will also likely maintain an internal static state. Singleton Atoms can complete their processing and close down, but there should never be more than one running instance on a node at a time.
- Spawner Atoms come in two forms:
  - The Wait spawner initiates one to N instances of different Atoms in the cluster for processing and awaits their results in order to complete the algorithm. Monte Carlo pi estimation, where the results are returned and aggregated to calculate the final result, is an example of this kind of spawner.
  - The Free spawner initiates one to N additional Atoms but does not wait for their responses. Instead, the results of those additional Atoms are monitored somewhere else by another process, or not monitored at all.
- Atomic is the third form of an Atom and, since it is created for a singular purpose, is the lowest level form of an Atom. It receives an Electron, executes the Atom using the Electron information, and returns the result back to the Conductor.
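
To make the Wait spawner concrete, here is a rough Go sketch of the Monte Carlo pi example. It is only an analogy: goroutines on one machine stand in for Atoms distributed across the cluster, and `countHits` and `estimatePi` are hypothetical names, not Atomizer functions.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// countHits stands in for an Atomic Atom: it performs one unit of work
// (sampling n points in the unit square) and returns a single result.
func countHits(n int, seed int64) int {
	rng := rand.New(rand.NewSource(seed))
	hits := 0
	for i := 0; i < n; i++ {
		x, y := rng.Float64(), rng.Float64()
		if x*x+y*y <= 1 {
			hits++
		}
	}
	return hits
}

// estimatePi behaves like a Wait spawner: it starts `workers` child
// tasks, waits for all of them, and aggregates their results.
func estimatePi(workers, samplesPerWorker int) (float64, error) {
	if workers <= 0 || samplesPerWorker <= 0 {
		return 0, fmt.Errorf("workers and samplesPerWorker must be positive, got %d and %d",
			workers, samplesPerWorker)
	}
	results := make(chan int, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(seed int64) {
			defer wg.Done()
			results <- countHits(samplesPerWorker, seed)
		}(int64(w + 1))
	}
	wg.Wait() // the "Wait" in Wait spawner
	close(results)

	total := 0
	for h := range results {
		total += h
	}
	return 4 * float64(total) / float64(workers*samplesPerWorker), nil
}

func main() {
	pi, err := estimatePi(8, 100000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("pi is approximately %.3f\n", pi)
}
```

A Free spawner would start the same child tasks but return immediately, without the `wg.Wait()` and aggregation steps, leaving any results to be observed elsewhere.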
Individual Atom instances have several failure modes that need to be configured as part of the Atom’s Electron.
These failure modes are: Self-Heal / Retry, Log / Fail, and Re-Queue.
- Self-Heal / Retry will push the Electron for the Atom back onto the Atomizer and attempt to re-run the Atom’s processing method. Self-Heal will also pass the failure information along with the Electron for reprocessing, so that the Atomizer can identify systemic failures and an Atom that has already been self-healed is not retried too many times or caught in an infinite loop. The retry configuration would be part of the Electron information and would tell the Atomizer how many times to heal a failing Atom.
- Log / Fail will log the failed Atom information and return the failure state back to the message queue as the Electron’s response.
- Re-Queue will follow a similar path to Self-Heal / Retry. However, instead of re-attempting the processing internally on the node, it will either fail the acknowledgement back to the message queue or push the Electron back to the message queue for distribution to a different node in the cluster. If there are multiple Conductors on the node, this process will use the original Conductor from which the Electron was received.
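
The three failure modes could be dispatched along the following lines. This is a sketch under stated assumptions: the `Electron` fields, the `FailureMode` and `Outcome` types, and the `process` function are all hypothetical and simplified, and the actual hand-off to a Conductor or message queue is reduced to a returned outcome.

```go
package main

import (
	"errors"
	"fmt"
)

// FailureMode is a hypothetical enum for the three modes described above.
type FailureMode int

const (
	Retry   FailureMode = iota // Self-Heal / Retry
	LogFail                    // Log / Fail
	Requeue                    // Re-Queue
)

// Electron is a simplified stand-in that carries only failure configuration.
type Electron struct {
	ID         string
	Mode       FailureMode
	MaxRetries int     // how many times to heal a failing Atom
	Failures   []error // failure history passed along on each retry
}

// Outcome describes what the node does with the Electron after processing.
type Outcome string

const (
	Done     Outcome = "done"
	Failed   Outcome = "failed"   // failure returned as the Electron's response
	Requeued Outcome = "requeued" // handed back to the queue for another node
)

// errTransient is an assumed error value used only by the example in main.
var errTransient = errors.New("transient failure")

// process runs the atom and applies the Electron's configured failure mode.
func process(e *Electron, atom func(*Electron) error) Outcome {
	for {
		err := atom(e)
		if err == nil {
			return Done
		}
		e.Failures = append(e.Failures, err)
		switch e.Mode {
		case Retry:
			// The recorded history bounds the retries and prevents a loop.
			if len(e.Failures) > e.MaxRetries {
				return Failed
			}
		case Requeue:
			return Requeued
		default: // LogFail
			fmt.Printf("electron %s failed: %v\n", e.ID, err)
			return Failed
		}
	}
}

func main() {
	attempts := 0
	flaky := func(*Electron) error {
		attempts++
		if attempts < 3 {
			return errTransient
		}
		return nil
	}
	e := &Electron{ID: "e-1", Mode: Retry, MaxRetries: 3}
	fmt.Println(process(e, flaky), "after", len(e.Failures), "failures")
}
```

Keeping the failure history on the Electron itself, rather than on the node, is what would let a different node see earlier failures if the Electron is later re-queued; whether Atomizer does this is not stated in the text above.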