# spark-dataflow

Spark-dataflow allows users to execute Dataflow pipelines with Apache Spark. Running a pipeline on a Spark cluster is straightforward: add a dependency on spark-dataflow to your project, then execute your pipeline from a program by calling `SparkPipelineRunner.run`.

The Maven coordinates of the current version of this project are `com.cloudera.dataflow.spark:dataflow-spark:0.0.1`.
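
As a sketch of how those coordinates translate into a Maven build, the dependency entry would look roughly like the following (where the artifact is hosted, and any repository configuration you need, is not specified in this README):

```xml
<dependency>
  <groupId>com.cloudera.dataflow.spark</groupId>
  <artifactId>dataflow-spark</artifactId>
  <version>0.0.1</version>
</dependency>
```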

An example of running a pipeline against a Spark cluster in local mode with 2 threads:

```java
Pipeline p = Pipeline.create(PipelineOptionsFactory.create());
// logic for building your pipeline
EvaluationResult result = new SparkPipelineRunner("local[2]").run(p);
```
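
To show where the pipeline-building logic would go, here is a minimal self-contained sketch. It assumes the Google Cloud Dataflow SDK for Java (`com.google.cloud.dataflow.sdk.*`) and assumes `SparkPipelineRunner` and `EvaluationResult` live under `com.cloudera.dataflow.spark`; the import paths, the upper-casing transform, and the element values are illustrative assumptions, not taken from this README:

```java
import com.cloudera.dataflow.spark.EvaluationResult;
import com.cloudera.dataflow.spark.SparkPipelineRunner;
import com.google.cloud.dataflow.sdk.Pipeline;
import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
import com.google.cloud.dataflow.sdk.transforms.Create;
import com.google.cloud.dataflow.sdk.transforms.DoFn;
import com.google.cloud.dataflow.sdk.transforms.ParDo;
import com.google.cloud.dataflow.sdk.values.PCollection;

public class SparkDataflowExample {
  public static void main(String[] args) {
    Pipeline p = Pipeline.create(PipelineOptionsFactory.create());

    // Build a trivial pipeline: create a few elements and upper-case each one.
    PCollection<String> upperCased = p
        .apply(Create.of("hello", "spark", "dataflow"))
        .apply(ParDo.of(new DoFn<String, String>() {
          @Override
          public void processElement(ProcessContext c) {
            c.output(c.element().toUpperCase());
          }
        }));

    // Execute on a local Spark master with two worker threads.
    EvaluationResult result = new SparkPipelineRunner("local[2]").run(p);
  }
}
```

The `"local[2]"` argument is the standard Spark master URL for local mode with two threads; pointing the runner at a real cluster would use the corresponding Spark master URL instead.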