Makefile
Makefile.meta
README.md
compute-stats.sql
imports
test-rowcount.sh
test.sh
type-mapping.yml

README.md

Sqoop to Kudu Pipeline

This pipeline uses Sqoop to import data and store it as Parquet.

Artifacts Created

Impala Parquet table
Impala Kudu table
Sqoop job
Parquet file

Running the pipeline

make first-run:

create Impala tables
create a Sqoop job
execute the Sqoop job, inserting data into Parquet

make update:

execute the Sqoop job

make clean:

Removes all data on HDFS and the Impala DDL. This is irreversible.
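
Because make clean cannot be undone, it can help to guard it behind an explicit opt-in. The following is a minimal sketch of such a wrapper, not part of this repository: the function name run_pipeline, the --yes flag, and the MAKE_CMD variable are all hypothetical. Only the three target names (first-run, update, clean) come from the section above.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the three make targets described above.
# MAKE_CMD is an assumption of this sketch: it defaults to `make` and can be
# overridden (for example with "echo make") to preview the command instead
# of running it.
MAKE_CMD="${MAKE_CMD:-make}"

run_pipeline() {
  local target="${1:-}"
  case "$target" in
    first-run|update)
      # Safe targets are passed straight through to make.
      $MAKE_CMD "$target"
      ;;
    clean)
      # clean is irreversible, so require an explicit --yes.
      if [ "${2:-}" = "--yes" ]; then
        $MAKE_CMD clean
      else
        echo "refusing to run 'clean' without --yes (irreversible)" >&2
        return 1
      fi
      ;;
    *)
      echo "unknown target: $target" >&2
      return 2
      ;;
  esac
}
```

After sourcing the file from the pipeline directory, run_pipeline first-run and run_pipeline update behave like the plain make calls, while run_pipeline clean does nothing unless --yes is given. Setting MAKE_CMD="echo make" prints the command that would run.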

Dependency graphs for all targets can be found here