Lambda Architecture to analyze top 10 trends twitter subjects

plawson/lambda-arch

lambda-arch

This is a Lambda Architecture project to ingest, store and analyse tweets. The tweets are saved in a data lake for later batch processing. Meanwhile, they are aggregated into a NoSQL database in real time using streaming technologies.
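The two paths described above can be illustrated with a minimal, in-memory Python sketch. It is not the project's code: the function and class names (`extract_hashtags`, `batch_view`, `SpeedLayer`, `top_trends`) are hypothetical, a plain list stands in for the data lake, and a `Counter` stands in for the NoSQL aggregates. It also assumes that "trends" means hashtag counts.

```python
from collections import Counter


def extract_hashtags(text):
    """Return the lowercase hashtags found in a tweet's text."""
    words = (w.rstrip(".,!?").lower() for w in text.split())
    return [w for w in words if w.startswith("#") and len(w) > 1]


def batch_view(data_lake):
    """Batch layer stand-in: recompute counts from every stored tweet."""
    counts = Counter()
    for tweet in data_lake:
        counts.update(extract_hashtags(tweet))
    return counts


class SpeedLayer:
    """Speed layer stand-in: counts updated one tweet at a time."""

    def __init__(self):
        self.counts = Counter()

    def ingest(self, tweet):
        self.counts.update(extract_hashtags(tweet))


def top_trends(batch_counts, speed_counts, n=10):
    """Serving step: merge both views and return the n most frequent hashtags."""
    return (batch_counts + speed_counts).most_common(n)


if __name__ == "__main__":
    # Tweets already covered by the last batch run (hypothetical sample data).
    data_lake = ["I love #Spark and #Kafka", "#spark rocks"]
    # Tweets that arrived after that batch run.
    speed = SpeedLayer()
    speed.ingest("New #Kafka release!")
    print(top_trends(batch_view(data_lake), speed.counts))
```

The design point is that the batch view is recomputed from the full history, while the speed layer only covers tweets that arrived since the last batch run, so a query merges the two.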

On-Premises setup

The VMs are created on a bare-metal server (2 NICs) with two 12-core CPUs, 256 GB of memory and a 2 TB internal SSD. The VM disks are located on a NAS (4 NICs) with eight 8 TB disks in a RAID 6 volume group (39 TB of space), mounted over iSCSI via DM-Multipath. The iSCSI LUN is configured with multiple sessions enabled so that all I/O is load-balanced across the 6 NICs. All 6 NICs are connected to a high-speed Netgear switch.

| Role | Host | Alias | CPU# | Memory | Disk |
|---|---|---|---|---|---|
| Schema Registry | k8s-node01 | schema | 1 | 15 GB | 2 TB |
| Kafka Connect | k8s-node02 | connect | 1 | 15 GB | 2 TB |
| Zookeeper | k8s-node03 | zookeeper | 2 | 15 GB | 2 TB |
| Kafka | k8s-node04 | kafka | 2 | 15 GB | 2 TB |
| Namenode/Resource manager | k8s-node05 | namenode/resourcemgr | 2 | 15 GB | 2 TB |
| Datanode/Node manager | k8s-node06 | datanode1/nodemgr1 | 4 | 15 GB | 2 TB |
| Datanode/Node manager | k8s-node07 | datanode2/nodemgr2 | 4 | 15 GB | 2 TB |
| Datanode/Node manager | k8s-node08 | datanode3/nodemgr3 | 4 | 15 GB | 2 TB |
| Cassandra | k8s-node09 | cassandra | 2 | 15 GB | 2 TB |
| UI node | k8s-node10 | uinode | 1 | 15 GB | 2 TB |
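As a quick sanity check on the sizing, the allocations in the table can be summed and compared with the host and NAS figures quoted above. The sketch below assumes the CPU# column counts virtual CPUs, that memory is in GB and disk in TB, and that the listed disk sizes are the sizes presented to each VM; the variable and function names are illustrative only.

```python
# (alias, cpus, memory_gb, disk_tb) copied from the table above.
VMS = [
    ("schema", 1, 15, 2),
    ("connect", 1, 15, 2),
    ("zookeeper", 2, 15, 2),
    ("kafka", 2, 15, 2),
    ("namenode/resourcemgr", 2, 15, 2),
    ("datanode1/nodemgr1", 4, 15, 2),
    ("datanode2/nodemgr2", 4, 15, 2),
    ("datanode3/nodemgr3", 4, 15, 2),
    ("cassandra", 2, 15, 2),
    ("uinode", 1, 15, 2),
]

HOST_CORES = 2 * 12      # two 12-core CPUs
HOST_MEMORY_GB = 256     # host memory
NAS_SPACE_TB = 39        # space in the RAID 6 volume group


def totals(vms):
    """Return the summed (cpus, memory_gb, disk_tb) over all VMs."""
    cpus = sum(vm[1] for vm in vms)
    memory_gb = sum(vm[2] for vm in vms)
    disk_tb = sum(vm[3] for vm in vms)
    return cpus, memory_gb, disk_tb


if __name__ == "__main__":
    cpus, memory_gb, disk_tb = totals(VMS)
    print(f"CPUs:   {cpus} of {HOST_CORES} cores")
    print(f"Memory: {memory_gb} of {HOST_MEMORY_GB} GB")
    print(f"Disk:   {disk_tb} of {NAS_SPACE_TB} TB")
```

Under these assumptions the ten VMs use 23 CPUs, 150 GB of memory and 20 TB of disk, all of which fit within the hardware described.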

Spark

Spark must be configured on the client machines used to start the jobs. The configuration procedure is available here
