Homework assignments and final project for the Parallel and Distributed Systems: Paradigms and Models course of the Master's Degree in Computer Science and Networking at the University of Pisa.
Homework number | Language/Framework/Tool | Description |
---|---|---|
1 | C++ | A program that, given a stream of `std::vector<double>` elements, applies two functions (f1(x) = x + 1 and f2(x) = 2 * x) to every item of each vector in the stream. The implementation uses only basic C++ mechanisms and libraries. The computation is implemented both sequentially and in parallel. The parallel version is a four-stage pipeline on a shared-memory multicore: STAGE1 generates a stream of m vectors of n randomly filled items each, STAGE2 increments every item of each input vector, STAGE3 doubles every item of each input vector, and STAGE4 prints the contents of each input vector on screen. Performance is measured in terms of scalability, speedup, and efficiency. |
2 | C++ | A program that computes a set of independent tasks in parallel: the tasks are initially stored in a shared data structure, and the results are delivered through a second shared data structure. The program is implemented using only standard C++ mechanisms and threads. Each input task is an integer N, and the result to compute is the number of primes in the range [1, N]. The initial set of tasks is drawn at random from the range [1, 10K]. Two different implementations of the workers have been provided:
Load balancing of the workers has been implemented as well. |
3 | C++, FastFlow | A program that finds the prime numbers in a range [n1, n2]. Four implementations have been provided:
|
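
The four-stage pipeline described for homework 1 can be sketched roughly as follows. This is a minimal illustration and not the repository's code: the `Channel` class, `map_stage`, `random_stream`, and `run_pipeline` are hypothetical names, the end of the stream is signalled with an empty `std::optional` (which assumes C++17), and the last stage collects the vectors instead of printing them so that the result can be inspected.

```cpp
#include <condition_variable>
#include <cstddef>
#include <mutex>
#include <optional>
#include <queue>
#include <random>
#include <thread>
#include <utility>
#include <vector>

using Vec = std::vector<double>;

// Hypothetical blocking queue linking two adjacent stages.
// An empty std::optional marks the end of the stream.
template <typename T>
class Channel {
public:
    void push(std::optional<T> item) {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            items_.push(std::move(item));
        }
        ready_.notify_one();
    }

    std::optional<T> pop() {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return !items_.empty(); });
        std::optional<T> item = std::move(items_.front());
        items_.pop();
        return item;
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<std::optional<T>> items_;
};

// Middle stage: applies f to every item of each vector, then forwards it.
template <typename F>
void map_stage(Channel<Vec>& in, Channel<Vec>& out, F f) {
    while (auto v = in.pop()) {
        for (double& x : *v) x = f(x);
        out.push(std::move(v));
    }
    out.push(std::nullopt);  // propagate the end-of-stream marker
}

// Builds m vectors of n pseudo-random items each (input for STAGE1).
std::vector<Vec> random_stream(std::size_t m, std::size_t n, unsigned seed) {
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.0, 1.0);
    std::vector<Vec> stream(m, Vec(n));
    for (Vec& v : stream)
        for (double& x : v) x = dist(gen);
    return stream;
}

// Runs the four stages, one thread each, and returns what STAGE4 received.
std::vector<Vec> run_pipeline(std::vector<Vec> inputs) {
    Channel<Vec> c12, c23, c34;
    std::vector<Vec> results;

    std::thread stage1([&] {  // emits the stream
        for (Vec& v : inputs) c12.push(std::move(v));
        c12.push(std::nullopt);
    });
    std::thread stage2([&] { map_stage(c12, c23, [](double x) { return x + 1; }); });
    std::thread stage3([&] { map_stage(c23, c34, [](double x) { return 2 * x; }); });
    std::thread stage4([&] {  // collects instead of printing (assumption)
        while (auto v = c34.pop()) results.push_back(std::move(*v));
    });

    stage1.join();
    stage2.join();
    stage3.join();
    stage4.join();
    return results;
}
```

For instance, feeding the two vectors {1, 2} and {3} through `run_pipeline` would yield {4, 6} and {8}, in that order, because every stage is a single thread reading from a FIFO queue. Building the sketch needs thread support enabled (for example `-pthread` with GCC or Clang).
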
The application implements the parallel scan algorithm developed by Guy E. Blelloch. The parallel implementation adopts the master-worker schema, in which a distributor module acts as both scatter and gather and several worker modules perform the computation in parallel. A performance model has been derived by analyzing the complexity of the computation phases; the tests use it to estimate the time needed to complete the computation. Actual performance is measured on a Xeon Phi KNL machine with 64 cores (256 threads) and compared with the expected values derived from the model.
Two implementations are provided: one uses only C++ threads and standard mechanisms, while the other exploits low-level FastFlow building blocks. Both share the structure of a class that, given an input vector, an associative operation, and its identity value, computes the output vector containing the result of the scan.
Three tests are provided:
- The first test runs the parallel prefix class on a vector of integers, using addition as the associative operation.
- The second test runs the parallel prefix class on a vector of integers, using multiplication as the associative operation.
- The third test runs the parallel prefix class on a vector of strings, using concatenation as the associative operation.
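
The expected results for these three cases can be produced with a sequential reference such as `std::partial_sum` from the standard library. The sketch below is an assumption about how such a check could be set up, not the repository's test code; the helper names and the sample inputs are hypothetical, and the inclusive variant of the scan is assumed.

```cpp
#include <functional>
#include <numeric>
#include <string>
#include <vector>

// Sequential inclusive scan, usable as the expected result for a parallel run.
template <typename T, typename Op>
std::vector<T> reference_scan(const std::vector<T>& in, Op op) {
    std::vector<T> out(in.size());
    std::partial_sum(in.begin(), in.end(), out.begin(), op);
    return out;
}

// Case 1: integers with addition, yields {1, 3, 6, 10}.
std::vector<int> addition_case() {
    const std::vector<int> in{1, 2, 3, 4};
    return reference_scan(in, std::plus<int>{});
}

// Case 2: integers with multiplication, yields {1, 2, 6, 24}.
std::vector<int> multiplication_case() {
    const std::vector<int> in{1, 2, 3, 4};
    return reference_scan(in, std::multiplies<int>{});
}

// Case 3: strings with concatenation, yields {"par", "parall", "parallel"}.
std::vector<std::string> concatenation_case() {
    const std::vector<std::string> in{"par", "all", "el"};
    return reference_scan(in, std::plus<std::string>{});
}
```

The string case is the one most likely to expose a bug, since concatenation is associative but not commutative: a parallel scan that combines partial results in the wrong order passes the first two tests and fails the third.
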