Attach metrics to ProcessExecutor instead of the simulation Coordinator #5

Closed
PetrosPapapa opened this issue Sep 22, 2018 · 2 comments

@PetrosPapapa

It would be useful to be able to track metrics for any executed workflow, not only simulated ones.

  • We can have ProcessExecutors emit events as things happen: workflow started, process called, process returned, workflow completed.
  • We can then register handlers (such as a MetricsAggregator) to record data from these events.

This would allow a real-time timeline of the workflows that are executing, as well as other visualisations and monitoring capabilities.
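
A minimal sketch of what this could look like, in Scala (the event names mirror the bullets above; everything else here is illustrative, not an agreed design):

sealed trait ExecutorEvent { def workflowId: String }
case class WorkflowStarted(workflowId: String) extends ExecutorEvent
case class ProcessCalled(workflowId: String, process: String) extends ExecutorEvent
case class ProcessReturned(workflowId: String, process: String) extends ExecutorEvent
case class WorkflowCompleted(workflowId: String) extends ExecutorEvent

trait EventHandler { def handle(event: ExecutorEvent): Unit }

// A MetricsAggregator as in the second bullet: record wall-clock
// durations per workflow from start/complete events.
class MetricsAggregator extends EventHandler {
  private val started = scala.collection.mutable.Map[String, Long]()
  val durations = scala.collection.mutable.Map[String, Long]()

  def handle(event: ExecutorEvent): Unit = event match {
    case WorkflowStarted(id)   => started(id) = System.currentTimeMillis()
    case WorkflowCompleted(id) =>
      started.remove(id).foreach(t0 => durations(id) = System.currentTimeMillis() - t0)
    case _                     => () // process-level events could be timed similarly
  }
}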

This is quite a significant refactoring that should not break the existing executors (Akka, Mongo, Kafka).

@PetrosPapapa PetrosPapapa added the feature New feature or request label Sep 22, 2018
@PetrosPapapa

This is much more complicated than it sounds. Generic workflows don't have any notion of Task or TaskResource or Workflow in them. Some implementations may just not need this!

One idea is to create a subclass of AtomicProcess (and perhaps CompositeProcess as well?) that is aware of persistent resources. The Executors will be responsible for reporting their activity to the metrics trackers.

If these are a subclass of AtomicProcess, we will preserve compatibility for execution of generic workflows. However, this will make the Executor implementation more complex and potentially slower. We definitely don't want multiple implementations of the same Executor.

We could have Executors emit more generic events that make sense in any workflow, and then introduce special handlers to deal with resource-aware processes. This might be the way to go: it keeps executors nice and modular without redundant implementations, and we can still provide some metrics for generic workflows as well.
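
For instance (a sketch of that direction; the metadata map and the "resource" key are one hypothetical mechanism, not an agreed design):

// Executors only ever emit generic events; resource-aware behaviour
// lives entirely in a handler.
sealed trait GenericEvent { def metadata: Map[String, String] }
case class ProcessCalledEvent(process: String,
                              metadata: Map[String, String] = Map.empty)
    extends GenericEvent

class ResourceMetricsHandler extends (GenericEvent => Unit) {
  def apply(e: GenericEvent): Unit =
    // Plain workflows never set this key, so the handler is a no-op for
    // them; resource-aware processes would attach it when they run.
    e.metadata.get("resource").foreach { r =>
      println(s"resource in use: $r")
    }
}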

@PetrosPapapa commented Oct 8, 2018

This naturally evolved into a much more generic problem of workflow monitoring. Here's an excerpt from the relevant team email:

In terms of how the system will be used:

I) Sometimes you run a workflow as a procedure and want to simply wait for a result. A listener will have to listen to events about a particular ID and fulfil a promise when done. You don't need persistence.

II) Sometimes you want to monitor a simulation, so everything that the executor is doing. You want to use custom listeners that capture additional metadata from the simulation. You don't need persistence, but it could help in the future.

III) Other times you want to monitor an executor over long periods of time, keep a log of all the events, but also be able to react to things happening. Here you might want both of the above types of listeners:

  • Some listeners just log what is happening. A user should be able to open a browser, connect to the executor via a listener and get a log and/or a visualisation in real time (or even back in history through the logs).
  • Other listeners capture results and perhaps make additional decisions (e.g. a business rule that runs a "maintenance" workflow after 5 "machine usage" workflows have executed).

You need persistence for all of this!
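
For instance, that business rule might look roughly like this (a sketch using the PiEvent/PiEventHandler types introduced below; the String IDs and runMaintenance are purely illustrative):

// Sketch of the "maintenance after 5 workflows" rule. Filtering to
// "machine usage" workflows specifically would need instance metadata.
class MaintenanceRule(runMaintenance: () => Unit) extends PiEventHandler[String] {
  private var completed = 0

  override def apply(e: PiEvent[String]): Unit = e match {
    case PiEventResult(_, _) =>           // a workflow produced a result
      completed += 1
      if (completed % 5 == 0) runMaintenance()
    case _ => ()                          // ignore non-terminal events
  }
}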
I was hoping to make this platform-independent. If you decide to use the AkkaExecutor for simulation, maybe Akka Streams (or simply the eventStream) is ideal. If you are using the KafkaExecutor, you might as well use Kafka Streams, which are made for this.

What I'm going for now is a pub/sub API included in the ProcessExecutor trait. Each executor can decide how to implement this and what type of stream (or simple observer pattern) internally.
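
Roughly the shape it could take (the method names here are guesses at the API, not the final design):

trait ProcessExecutor[KeyT] {
  // ... existing execution API elided ...

  /** Register a handler to receive all events from this executor. */
  def subscribe(handler: PiEventHandler[KeyT]): Unit

  /** Implementations call this internally as execution progresses,
    * backed by whatever mechanism suits them (Akka eventStream,
    * a Kafka topic, a plain observer list, ...). */
  protected def publish(event: PiEvent[KeyT]): Unit
}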

I then have events:

sealed trait PiEvent[KeyT] { ... }

(with a bunch of case classes extending this), and handlers/listeners:

trait PiEventHandler[KeyT] extends (PiEvent[KeyT] => Unit) { ... }

The user can decide how to implement a listener without worrying about the underlying infrastructure.

Here's a promise handler that can be used for scenario (I):
T is the type of ID for a PiInstance (also platform-dependent).

import scala.concurrent.Promise

class PromiseHandler[T](val id: T) extends PiEventHandler[T] {
  val promise = Promise[Any]()
  def future = promise.future

  // Only react to events for our own instance ID: complete the promise
  // on a result, fail it on any kind of failure, ignore everything else.
  override def apply(e: PiEvent[T]): Unit = if (e.id == this.id) e match {
    case PiEventResult(_, res)                 => promise.success(res)
    case PiEventFailure(_, reason)             => promise.failure(reason)
    case PiEventException(_, reason)           => promise.failure(reason)
    case PiEventProcessException(_, _, reason) => promise.failure(reason)
    case _ => () // non-terminal events (process calls, returns, ...) are ignored
  }
}
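
It could then be used along these lines for scenario (I) (assuming the subscribe method sketched above; executor and workflowId are placeholders):

import scala.concurrent.Await
import scala.concurrent.duration._

val handler = new PromiseHandler(workflowId) // workflowId: some PiInstance ID
executor.subscribe(handler)                  // registration as sketched above
val result = Await.result(handler.future, 1.minute)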

@JeVaughan raised the question of persistence for observers and the event stream. This is still under discussion.

PetrosPapapa added a commit that referenced this issue Oct 8, 2018
This is work in progress - code is broken

AkkaExecutor is now complying, but there is more to do and the tests are
broken.

See issue #5 for more information
@PetrosPapapa PetrosPapapa self-assigned this Nov 7, 2018