GitHub - tiburssio/chombo: Big Data ETL and Utilities for Hadoop Map Reduce, Spark and Storm

Introduction

Hadoop-based ETL and various utility classes for Hadoop and Storm

Philosophy

Simple to use
Input and output in CSV format
Metadata defined in a simple JSON file
Extremely configurable with many configuration knobs
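
As a rough illustration of the "CSV in, CSV out, driven by configuration" idea, the sketch below reads one delimited record and writes back only the configured fields. The class `CsvFieldSelector` and its two settings (delimiter and field ordinals) are hypothetical names made up for this example; they are not chombo classes or chombo configuration parameters. It also assumes plain delimited text with no quoted fields.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical illustration only; not a class from chombo. */
final class CsvFieldSelector {
    private final String delimiter; // assumed setting: field delimiter
    private final int[] ordinals;   // assumed setting: zero-based fields to keep

    CsvFieldSelector(String delimiter, int[] ordinals) {
        this.delimiter = delimiter;
        this.ordinals = ordinals.clone();
    }

    /** Splits one delimited record and re-joins only the configured fields. */
    String apply(String line) {
        // Quote the delimiter so characters such as '|' are not treated as regex;
        // the limit of -1 keeps trailing empty fields.
        String[] fields = line.split(Pattern.quote(delimiter), -1);
        List<String> kept = new ArrayList<>();
        for (int ord : ordinals) {
            if (ord < 0 || ord >= fields.length) {
                throw new IllegalArgumentException("no field " + ord + " in: " + line);
            }
            kept.add(fields[ord]);
        }
        return String.join(delimiter, kept);
    }

    public static void main(String[] args) {
        CsvFieldSelector selector = new CsvFieldSelector(",", new int[] {0, 2});
        // Keeps the first and third fields of the record.
        System.out.println(selector.apply("c1001,2015-03-02,49.90,NY"));
    }
}
```

In a real job the two settings would come from a configuration or metadata file rather than being hard coded.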

Solution

Various relational algebra operations, including Projection, Join, etc.
Data extraction ETL to extract structured records from unstructured data
Data extraction ETL to extract structured records from JSON data
Data validation ETL with configurable rules and statistical parameters
Data profiling ETL with various techniques
Data transformation ETL with configurable transformation rules
Various statistical data exploration solutions
Data normalization
Seasonal data analysis
Various statistical parameter calculation
Various long term statistical parameter calculation with incremental data
Bulk insert, update and delete of Hadoop data
Base classes for Storm Spout and Bolt
Utility classes for strings and configuration
Utility classes for Storm and Redis
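
To make the item on long term statistical parameters with incremental data more concrete, here is a minimal sketch of one common way to do it: keep a small running state (count, mean, and sum of squared deviations) and merge the state of each new batch into the stored state, so old data never has to be re-read. The class `RunningStats` is a hypothetical name for this example and is not taken from chombo; the update and merge formulas are the standard Welford and pairwise-merge updates, not necessarily what the chombo jobs implement.

```java
/** Hypothetical illustration only; not a class from chombo. */
final class RunningStats {
    private long count;  // number of values seen so far
    private double mean; // running mean
    private double m2;   // running sum of squared deviations from the mean

    /** Folds a single value into the running state (Welford update). */
    void add(double value) {
        count++;
        double delta = value - mean;
        mean += delta / count;
        m2 += delta * (value - mean);
    }

    /** Folds the state of another batch into this one without re-reading its data. */
    void merge(RunningStats other) {
        if (other.count == 0) {
            return;
        }
        if (count == 0) {
            count = other.count;
            mean = other.mean;
            m2 = other.m2;
            return;
        }
        long total = count + other.count;
        double delta = other.mean - mean;
        m2 += other.m2 + delta * delta * ((double) count * other.count / total);
        mean += delta * other.count / total;
        count = total;
    }

    long count() {
        return count;
    }

    double mean() {
        return count == 0 ? Double.NaN : mean;
    }

    /** Population variance of everything seen so far. */
    double variance() {
        return count == 0 ? Double.NaN : m2 / count;
    }

    public static void main(String[] args) {
        RunningStats history = new RunningStats();
        for (double v : new double[] {2, 4, 4, 4}) {
            history.add(v);
        }
        RunningStats newBatch = new RunningStats();
        for (double v : new double[] {5, 5, 7, 9}) {
            newBatch.add(v);
        }
        history.merge(newBatch);
        System.out.println(history.count() + " " + history.mean() + " " + history.variance());
    }
}
```

Only the three state values need to be persisted between runs, which is what makes the calculation incremental.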

Blogs

The following blogs of mine are a good source of details. They are the only source of detailed documentation. MapReduce jobs in this project are used in other projects, including sifarish and avenir. Blogs related to those projects are also relevant.

Build

For Hadoop 1

mvn clean install

For Hadoop 2 (non-YARN)

git checkout nuovo
mvn clean install

For Hadoop 2 (YARN)

git checkout nuovo
mvn clean install -P yarn

For Spark

Build chombo first in the master branch with
- mvn clean install
- sbt publishLocal
Then build chombo-spark in the chombo/spark directory with
- sbt clean package

Need help?

Please feel free to email me at pkghosh99@gmail.com

Contribution

Contributors are welcome. Please email me at pkghosh99@gmail.com

