polyglotDataNerd/poly-kinesis-consumer

poly-kinesis-consumer

This project uses a multithreaded, parallel processing framework to listen for and process event-level data from an Amazon Kinesis data stream, and to transform those events so they can be put into the Data Lake. The stream currently ingests only events used in the Kinesis service, but the consumer is generic enough to apply to any Kinesis stream process.

Dependencies:

Consumer

  • Maven is the build tool, so you need Maven installed and a local repository on your machine to resolve the dependent libraries declared in the pom.xml file.

      Follow these guides to set up quickly:
      install: 
       1. manual: https://maven.apache.org/install.html
       2. homebrew (preferred): http://brewformulas.org/Maven
      quick guide: https://maven.apache.org/guides/getting-started/maven-in-five-minutes.html
    
      To rebuild the Maven project, run:
      1. mvn clean
      2. mvn package
      This compiles the code and generates the package (.jar).
    

Notable Classes:

  1. KinConsumer: The entry class that runs the pipeline. It uses the ExecutorService interface to maintain thread health, and it creates the worker and the worker factories that poll and process records from the stream.
  2. KinRecordProcess: The processor. It has an internal queue that manages the polling and processing of data so that the consumer does not exhaust resources.
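
The pattern described above, an ExecutorService managing worker threads that drain an internal queue, can be sketched roughly as follows. This is a minimal illustration using only the Java standard library, not the repository's actual KinConsumer or KinRecordProcess code: the class name BoundedQueueSketch, the process method, the stop sentinel, and the queue capacity are all assumptions made for the example, and the "processing" is just a counter.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; not the repository's KinConsumer or KinRecordProcess.
final class BoundedQueueSketch {
    // Unique sentinel instance telling a worker to exit; compared by identity.
    private static final String STOP = new String("stop");

    // Pushes events through a bounded queue drained by `workers` threads and
    // returns how many events were processed.
    static int process(List<String> events, int workers, int capacity)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        AtomicInteger processed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(workers);

        for (int i = 0; i < workers; i++) {
            pool.submit(() -> {
                try {
                    while (true) {
                        String event = queue.take();   // blocks while the queue is empty
                        if (event == STOP) {
                            return;                     // sentinel reached: stop this worker
                        }
                        processed.incrementAndGet();    // stand-in for the real transform
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                }
            });
        }

        for (String event : events) {
            queue.put(event);    // blocks while the queue is full (back-pressure)
        }
        for (int i = 0; i < workers; i++) {
            queue.put(STOP);     // one sentinel per worker
        }

        pool.shutdown();         // accept no new tasks, let workers finish
        if (!pool.awaitTermination(10, TimeUnit.SECONDS)) {
            pool.shutdownNow();  // interrupt workers that did not finish in time
        }
        return processed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        int count = process(List.of("event-1", "event-2", "event-3"), 2, 2);
        System.out.println("processed " + count + " records");
    }
}
```

The bounded queue is what keeps resource use in check in this sketch: when the workers fall behind, put blocks the producer instead of letting the backlog grow without limit.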

Application Arguments:

| Argument    | Sample         | Required               |
|-------------|----------------|------------------------|
| environment | us-west-2-prod | YES: CI pipeline build |

Once the Java app is compiled and built, a sample call to run the application in Fargate looks something like this, with environment replaced by a value such as us-west-2-prod:

java -jar poly-kinesis-consumer-1.0-ecs.jar environment
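
Because the environment argument is required, an entry point would typically check for it before starting any workers. The sketch below shows one way to do that; the class EnvironmentArgSketch and the method requireEnvironment are hypothetical names for illustration, and the repository's own argument handling may differ.

```java
// Hypothetical sketch of validating the required `environment` argument.
final class EnvironmentArgSketch {
    // Returns the environment name from the first argument, or fails fast
    // with a usage message when it is missing or empty.
    static String requireEnvironment(String[] args) {
        if (args.length < 1 || args[0].trim().isEmpty()) {
            throw new IllegalArgumentException(
                "usage: java -jar poly-kinesis-consumer-1.0-ecs.jar <environment>");
        }
        return args[0].trim();
    }

    public static void main(String[] args) {
        // Uses the sample value from the arguments table rather than real input.
        String environment = requireEnvironment(new String[] {"us-west-2-prod"});
        System.out.println("environment=" + environment);
    }
}
```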

Infrastructure

Terraform Modules:

  • infra: Builds the ECS and CloudWatch infrastructure that houses the container service.

  • app: Builds the container service. It references the Docker image in ECR and the ECS cluster that the service runs in.

Build:

  • apply: This shell script takes the environment name as its argument and builds the end-to-end service.

      source ~/poly-kinesis-consumer/infrastructure/apply.sh "us-west-2-prod"
    
  • destroy: This shell script takes the environment name as its argument and destroys the end-to-end service.

      source ~/poly-kinesis-consumer/infrastructure/destroy.sh "us-west-2-prod"