Feature Highlight: Dataflow engine
Minerva's design goal is to offer users more flexibility while still preserving runtime efficiency. Therefore, Minerva provides a numpy-like `NArray` interface that lets users write (hopefully) any kind of algorithm they wish. We directly map these `NArray` operators to efficient CPU and GPU kernels, which meets the basic speed requirement just as many other tools do. But this is far from perfect. In fact, there is a lot of parallelism within the algorithm structure that a tool could exploit to further speed up the algorithm.

Consider the back propagation of a multi-layer perceptron below (written with Minerva's `owl` package; the complete example is in `mnist_mlp.py`).
```python
# bp
s2 = self.w2.trans() * s3
# grad
gw2 = s3 * a2.trans() / num_samples
gb2 = s3.sum(1) / num_samples
```
The line `s2 = self.w2.trans() * s3` calculates the error of the hidden layer by backpropagating from the classifier layer. The lines `gw2 = s3 * a2.trans() / num_samples` and `gb2 = s3.sum(1) / num_samples` calculate the gradients of the weight and bias, respectively.
Note that the data dependencies among these `NArray`s are as follows:

- `s2` -> {`w2`, `s3`}
- `gw2` -> {`s3`, `a2`}
- `gb2` -> {`s3`}

Therefore, `s2`, `gw2`, and `gb2` are independent of each other and thus could be executed in parallel without any worry about data races.
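The independence argument above can be checked mechanically. The following is a minimal sketch in plain Python (not Minerva's API): each result lists the arrays it reads, and two results can run in parallel when neither reads the other's output.

```python
# Read-sets of the three results from the backprop example above.
deps = {
    "s2":  {"w2", "s3"},   # s2  = w2.trans() * s3
    "gw2": {"s3", "a2"},   # gw2 = s3 * a2.trans() / num_samples
    "gb2": {"s3"},         # gb2 = s3.sum(1) / num_samples
}

def independent(a, b):
    """Two ops can run in parallel if neither reads the other's output."""
    return a not in deps[b] and b not in deps[a]

print(independent("s2", "gw2"))   # True: they only share the input s3
print(independent("s2", "gb2"))   # True
print(independent("gw2", "gb2"))  # True
```

Sharing an input (here, `s3`) is harmless; only a read-after-write relation would force sequential execution.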
Based on this observation, Minerva tries to extract these data dependencies automatically, and we try to keep this as transparent as possible so that you can write your program as usual.

The idea is to peek into the future. For the above example, if, when `s2 = self.w2.trans() * s3` is executed, we knew in advance that two independent operations would follow, then they could be executed at the same time. Such fortune-telling technology is called lazy evaluation.
Basically, when executing code like `s2 = self.w2.trans() * s3`, Minerva does not evaluate the concrete value, but instead records the operator in a dataflow graph.
[dataflow graph]
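To make the recording idea concrete, here is a toy illustration (not Minerva's actual implementation): each operation returns a new graph node that remembers its operator and inputs, and no computation happens at assignment time.

```python
# Toy lazy array: operators build graph nodes instead of computing values.
class LazyArray:
    def __init__(self, op, inputs=()):
        self.op = op          # name of the operator producing this node
        self.inputs = inputs  # upstream LazyArray nodes

    def __mul__(self, other):
        return LazyArray("mul", (self, other))

    def trans(self):
        return LazyArray("trans", (self,))

w2 = LazyArray("w2")
s3 = LazyArray("s3")
s2 = w2.trans() * s3      # nothing is computed; a graph node is created

print(s2.op)              # mul
print(s2.inputs[0].op)    # trans
```

An engine walking this graph sees all recorded operators at once, so it can schedule independent subgraphs on different threads or GPUs.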
Minerva's underlying engine is a dataflow engine that executes such a dataflow graph using multiple threads on multiple GPUs. The user thread (the Python thread when using `owl`, or the main thread when using C++) is in fact generating the dataflow graph for Minerva's dag engine. However, such laziness cannot last forever. When the concrete value needs to be extracted from the `NArray` data structure for printing, checkpointing, debugging, and so on, the user thread should wait until the dag engine finishes computing. Minerva provides a `wait` member function of `NArray` for this. When `some_array.wait()` is called, the user thread blocks until `some_array` is concretely evaluated. A similar `get`/`to_numpy` (C++/Python) function is provided, except that it also returns the content to the user.
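The blocking behavior of `wait`/`get` can be sketched with a future: the user thread submits work to an engine thread and blocks only when it asks for the concrete value. The names `Future` and `submit` here are illustrative, not Minerva's API.

```python
import threading

class Future:
    """Placeholder for a value an engine thread will produce later."""
    def __init__(self):
        self._done = threading.Event()
        self._value = None

    def set(self, value):
        self._value = value
        self._done.set()

    def wait(self):
        self._done.wait()   # like NArray.wait(): block until evaluated

    def get(self):
        self.wait()         # like get()/to_numpy(): wait, then return
        return self._value

def submit(fn, *args):
    """Run fn on an 'engine' thread and hand back a Future immediately."""
    fut = Future()
    threading.Thread(target=lambda: fut.set(fn(*args))).start()
    return fut

fut = submit(sum, [1, 2, 3])  # user thread returns right away
print(fut.get())              # 6: blocks here until the engine finishes
```

The key point is that the user thread only pays for synchronization at the moment it inspects a value, so the engine is free to reorder and parallelize everything recorded before that point.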
A complete work-flow is shown in the following figure.
[work flow]
UNDER CONSTRUCTION