This is a Lambda Architecture project to ingest, store and analyse tweets. The tweets are saved in a data lake for later batch processing. Meanwhile, they are aggregated into a NoSQL database in real time using streaming technologies.
The VMs are created on a bare-metal server (2 NICs) with two 12-core CPUs, 256 GB of memory and a 2 TB internal SSD. The VM disks are located on a NAS (4 NICs) with eight 8 TB disks in a RAID 6 volume group (39 TB of space), mounted over iSCSI via DM-Multipath. The iSCSI LUN is configured with multiple sessions enabled so that all I/O is load-balanced across the 6 NICs. All 6 NICs are connected to a high-speed Netgear switch.
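
As a rough sanity check on the storage figures, the raw RAID 6 capacity can be worked out from the disk count and disk size. This is only a sketch: the function names are hypothetical, and it models nothing beyond the two parity disks that RAID 6 reserves. The quoted 39 TB is lower than the raw result; the gap presumably comes from binary versus decimal units and from volume and filesystem overhead, which are not modelled here.

```python
def raid6_usable_tb(disk_count: int, disk_size_tb: float) -> float:
    """Raw usable capacity of a RAID 6 group: two disks' worth of space holds parity."""
    if disk_count < 4:
        raise ValueError("RAID 6 needs at least four disks")
    return (disk_count - 2) * disk_size_tb


def tb_to_tib(tb: float) -> float:
    """Convert decimal terabytes (10^12 bytes) to binary tebibytes (2^40 bytes)."""
    return tb * 10**12 / 2**40


# Figures taken from the description: eight 8 TB disks in one RAID 6 group.
usable_tb = raid6_usable_tb(disk_count=8, disk_size_tb=8)
usable_tib = tb_to_tib(usable_tb)

print(f"Raw RAID 6 capacity: {usable_tb:.0f} TB ({usable_tib:.2f} TiB)")
```
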
Role | Host | Alias | CPU# | Memory | Disk |
---|---|---|---|---|---|
Schema Registry | k8s-node01 | schema | 1 | 15 GB | 2 TB |
Kafka Connect | k8s-node02 | connect | 1 | 15 GB | 2 TB |
Zookeeper | k8s-node03 | zookeeper | 2 | 15 GB | 2 TB |
Kafka | k8s-node04 | kafka | 2 | 15 GB | 2 TB |
Namenode/Resource manager | k8s-node05 | namenode/resourcemgr | 2 | 15 GB | 2 TB |
Datanode/Node manager | k8s-node06 | datanode1/nodemgr1 | 4 | 15 GB | 2 TB |
Datanode/Node manager | k8s-node07 | datanode2/nodemgr2 | 4 | 15 GB | 2 TB |
Datanode/Node manager | k8s-node08 | datanode3/nodemgr3 | 4 | 15 GB | 2 TB |
Cassandra | k8s-node09 | cassandra | 2 | 15 GB | 2 TB |
UI node | k8s-node10 | uinode | 1 | 15 GB | 2 TB |
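
The allocations in the table can be totalled and compared with the hardware described earlier. The sketch below is illustrative only: the `Vm` class and the constant names are hypothetical, and it assumes that each VM CPU maps to one physical core and that all VM disks live on the 39 TB NAS volume. Neither assumption is stated by this project.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vm:
    host: str
    role: str
    cpus: int
    memory_gb: int
    disk_tb: int


# Values copied from the table above.
VMS = [
    Vm("k8s-node01", "Schema Registry", 1, 15, 2),
    Vm("k8s-node02", "Kafka Connect", 1, 15, 2),
    Vm("k8s-node03", "Zookeeper", 2, 15, 2),
    Vm("k8s-node04", "Kafka", 2, 15, 2),
    Vm("k8s-node05", "Namenode/Resource manager", 2, 15, 2),
    Vm("k8s-node06", "Datanode/Node manager", 4, 15, 2),
    Vm("k8s-node07", "Datanode/Node manager", 4, 15, 2),
    Vm("k8s-node08", "Datanode/Node manager", 4, 15, 2),
    Vm("k8s-node09", "Cassandra", 2, 15, 2),
    Vm("k8s-node10", "UI node", 1, 15, 2),
]

# Hardware figures from the description (assumption: 1 VM CPU = 1 physical core).
HOST_CORES = 2 * 12
HOST_MEMORY_GB = 256
NAS_SPACE_TB = 39

total_cpus = sum(vm.cpus for vm in VMS)
total_memory_gb = sum(vm.memory_gb for vm in VMS)
total_disk_tb = sum(vm.disk_tb for vm in VMS)

print(f"CPUs:   {total_cpus} of {HOST_CORES}")
print(f"Memory: {total_memory_gb} GB of {HOST_MEMORY_GB} GB")
print(f"Disk:   {total_disk_tb} TB of {NAS_SPACE_TB} TB")
```

Under these assumptions the ten VMs fit within the host and NAS, with most of the headroom in memory and disk rather than CPU.
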
Spark is configured on the client machines used to start the jobs. The configuration procedure is available here.