Student Name | Student ID |
---|---|
Kieu Huy Hai | BI12-149 |
Vu Duc Hieu | BI12-162 |
Bui Cong Hoang | BI12-169 |
Ngo Quang Hung | BI12-191 |
Pham Khuong Cuong | BI12-070 |
Hoang The Duy | BI12-129 |
- Install `openjdk-17-jre` and `openjdk-17-jdk` using `apt`: `sudo apt-get install openjdk-17-jre openjdk-17-jdk -y`
- Download the OpenMPI source code (`.tar.gz` file) from its homepage.
- In the extracted source folder, configure the source code with `./configure --enable-mpi-java`, then build and install MPI with `make && make install`.
- Add the environment variable: `export LD_LIBRARY_PATH=/usr/local/lib`
- Run this project using `make`.
Our system uses 5 Docker containers that connect and communicate over a Docker network. Each container uses MPI to communicate over the network and applies the MapReduce model to achieve the goal: calculating the average score of each subject from a large input dataset.
In this setup, the MapReduce work is distributed across the Docker containers, each handling a portion of the data, and the MPI processes in the containers communicate with each other to carry out the map and reduce steps, ensuring that the data is correctly processed and aggregated.
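The text does not say how the input records are split among the MPI processes. One common choice, shown here purely as an assumption, is a block distribution: each rank takes one contiguous slice, and slice sizes differ by at most one record. The `Partitioner` class and its methods are hypothetical; in an MPI program, `rank` and `size` would typically come from `MPI.COMM_WORLD.getRank()` and `MPI.COMM_WORLD.getSize()` in the OpenMPI Java bindings.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical helper (not from the project): splits n records into `size`
 * contiguous chunks, one per MPI rank, with sizes differing by at most one.
 */
class Partitioner {
    /** Returns {start, end} (end exclusive) of the chunk owned by `rank`. */
    static int[] chunkFor(int rank, int size, int n) {
        int base = n / size;   // records every rank receives
        int extra = n % size;  // the first `extra` ranks receive one more
        int start = rank * base + Math.min(rank, extra);
        int end = start + base + (rank < extra ? 1 : 0);
        return new int[] {start, end};
    }

    /** Copies out the sublist of `records` that `rank` should process. */
    static <T> List<T> chunkOf(List<T> records, int rank, int size) {
        int[] range = chunkFor(rank, size, records.size());
        return new ArrayList<>(records.subList(range[0], range[1]));
    }
}
```

With 5 processes and 12 records, ranks 0 and 1 would each get 3 records and ranks 2 to 4 would each get 2. Alternatively, rank 0 could read the whole file and send each slice to the other ranks; which approach the project uses is not stated.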
Our system uses 2 phases of MapReduce:
- Phase 1: Map from `<SID, Pts>` to `<SID, List(pts)>` and reduce to an array of triplets `[SID, numPtsTillNow, totalPtsTillNow]`.
- Phase 2: Map from the array `[SID, numPtsTillNow, totalPtsTillNow]` to `<SID, List(numPtsTillNow, totalPtsTillNow)>` and reduce to `[SID, avgPts]`.
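The two phases above can be sketched sequentially in plain Java, without any MPI communication. In a distributed version, each process would run `phase1` on its own chunk, and the triplets from all processes would be collected before `phase2`. The class, method, and record names are hypothetical. The sketch also assumes SID is a `String` subject identifier and points are `double` values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sequential sketch of the two MapReduce phases (no MPI calls). */
class TwoPhaseAverage {
    /** Triplet [SID, numPtsTillNow, totalPtsTillNow] emitted by phase 1. */
    record Partial(String sid, long numPts, double totalPts) {}

    /** Phase 1 on one chunk: map <SID, Pts> to <SID, List(pts)>, reduce each list to a triplet. */
    static List<Partial> phase1(List<Map.Entry<String, Double>> chunk) {
        Map<String, List<Double>> grouped = new TreeMap<>();
        for (Map.Entry<String, Double> e : chunk) {
            grouped.computeIfAbsent(e.getKey(), k -> new ArrayList<>()).add(e.getValue());
        }
        List<Partial> out = new ArrayList<>();
        for (Map.Entry<String, List<Double>> g : grouped.entrySet()) {
            double total = 0;
            for (double p : g.getValue()) total += p;
            out.add(new Partial(g.getKey(), g.getValue().size(), total));
        }
        return out;
    }

    /** Phase 2 over all triplets: map to <SID, List(count, total)>, reduce to [SID, avgPts]. */
    static Map<String, Double> phase2(List<Partial> partials) {
        Map<String, List<Partial>> grouped = new TreeMap<>();
        for (Partial p : partials) {
            grouped.computeIfAbsent(p.sid(), k -> new ArrayList<>()).add(p);
        }
        Map<String, Double> averages = new TreeMap<>();
        for (Map.Entry<String, List<Partial>> g : grouped.entrySet()) {
            long count = 0;
            double total = 0;
            for (Partial p : g.getValue()) {
                count += p.numPts();
                total += p.totalPts();
            }
            averages.put(g.getKey(), total / count);
        }
        return averages;
    }
}
```

Carrying the count and the running total from phase 1, instead of a local average, keeps the final result exact. Suppose one chunk holds Math scores 8 and 6 and another holds 10. Phase 2 then yields 24 / 3 = 8.0, while averaging the two local averages would give (7 + 10) / 2 = 8.5.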
Calculate the average score for 4 subjects (Math, English, French, and Literature) in the National High School Graduation Examination, using the scores from the CSV input file.
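The CSV layout is not described, so the parser below is only a hypothetical sketch of how one input row could feed the Phase 1 map step. It assumes each line reads `studentId,Math,English,French,Literature` and that a cell is empty when a score is missing. Both the column order and the `ScoreCsv` name are assumptions, and the real file may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical parser: assumes lines like "studentId,Math,English,French,Literature"
 * with an empty cell for a missing score. The real CSV layout may differ.
 */
class ScoreCsv {
    static final String[] SUBJECTS = {"Math", "English", "French", "Literature"};

    /** Emits one <SID, Pts> pair per non-empty subject cell in the line. */
    static List<Map.Entry<String, Double>> toPairs(String line) {
        String[] cells = line.split(",", -1); // limit -1 keeps trailing empty cells
        List<Map.Entry<String, Double>> pairs = new ArrayList<>();
        for (int i = 0; i < SUBJECTS.length && i + 1 < cells.length; i++) {
            String cell = cells[i + 1].trim(); // cells[0] is the student id
            if (!cell.isEmpty()) {
                pairs.add(Map.entry(SUBJECTS[i], Double.parseDouble(cell)));
            }
        }
        return pairs;
    }
}
```

Under this assumption, a row such as `BI001,8.5,,6.0,7.25` produces three pairs (Math, French, and Literature) and skips the empty English cell.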