This pipeline uses Sqoop to load data into Parquet. It creates and manages the following artifacts:
- Impala Parquet table
- Impala Kudu table
- Sqoop job
- Parquet file
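
Since the pipeline is driven by Make, a minimal sketch of the table-creation step is shown below. Everything in it is assumed for illustration: the IMPALA_HOST variable, the db.events_parquet / db.events_kudu names, and the column layout are placeholders, not taken from this repo.

```make
# Hypothetical helper target: create the two Impala tables via impala-shell.
# All names (IMPALA_HOST, db.events_*, columns) are placeholders.
IMPALA_HOST ?= impala.example.com

create-tables:
	impala-shell -i $(IMPALA_HOST) -q "CREATE TABLE IF NOT EXISTS db.events_parquet (id BIGINT, payload STRING, updated_at TIMESTAMP) STORED AS PARQUET"
	impala-shell -i $(IMPALA_HOST) -q "CREATE TABLE IF NOT EXISTS db.events_kudu (id BIGINT, payload STRING, updated_at TIMESTAMP, PRIMARY KEY (id)) PARTITION BY HASH (id) PARTITIONS 4 STORED AS KUDU"
```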
make first-run:

- create Impala tables
- create a Sqoop job
- execute the Sqoop job, inserting data into Parquet
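
One way first-run might chain those steps is sketched below, reusing the hypothetical create-tables target from above; the job name ingest_events, the JDBC URL, the source table, and the HDFS path are all placeholders.

```make
# Assumed sketch: define the Sqoop job once, then execute it.
# --incremental append makes later runs import only rows whose id
# exceeds the saved --last-value.
first-run: create-tables
	sqoop job --create ingest_events -- import \
	  --connect jdbc:mysql://source-db.example.com/app \
	  --table events \
	  --as-parquetfile \
	  --target-dir /data/events_parquet \
	  --incremental append --check-column id --last-value 0
	sqoop job --exec ingest_events
```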
make update:

- execute the Sqoop job
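
Because the job definition lives in Sqoop's metastore along with the last imported value, update only needs to re-execute it; the job name below is the placeholder from the earlier sketch.

```make
# Re-run the saved job; Sqoop remembers --last-value between runs,
# so only rows added since the previous import are fetched.
update:
	sqoop job --exec ingest_events
```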
make clean:

- Delete all pipeline data on HDFS and drop the Impala tables (DDL). This is not reversible.
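
A sketch of what that teardown could involve, under the same assumed names as above; note that -skipTrash and DROP TABLE make this unrecoverable.

```make
# DANGER: irreversible. Drops the (placeholder-named) Impala tables,
# deletes the HDFS data, and removes the saved Sqoop job definition.
clean:
	impala-shell -i $(IMPALA_HOST) -q "DROP TABLE IF EXISTS db.events_parquet"
	impala-shell -i $(IMPALA_HOST) -q "DROP TABLE IF EXISTS db.events_kudu"
	hdfs dfs -rm -r -skipTrash /data/events_parquet
	sqoop job --delete ingest_events
```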
Dependency graphs for all targets can be found here.