The following steps execute the application in the Hadoop ecosystem.
Step 1: Create a directory in the Hadoop Distributed File System (HDFS) to hold the input file, using the command below.
hadoop fs -mkdir <your_hdfs_path>/input/
Step 2: Copy the input_data.csv file into HDFS using the command below.
hadoop fs -put /root/java-spark/input/input_data.csv <your_hdfs_path>/input/
Step 3: Build the project with the following commands.
cd <project_location>/java-spark/
mvn clean install
Step 4: Run the application by issuing the spark-submit command (a sketch of such a driver class appears after these steps).
spark-submit --class com.khodeprasad.AppExecutor <project_location>/java-spark/target/java-spark-jar-with-dependencies.jar
Step 5: Inspect the output under <your_hdfs_path>/output/ with the command below.
hadoop fs -ls <your_hdfs_path>/output/
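
For orientation, the sketch below shows roughly what a driver class such as com.khodeprasad.AppExecutor might look like for these steps. Only the class name and the input/output locations come from the commands above; the transformation itself is a placeholder assumption, not the project's actual logic.

    package com.khodeprasad;

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class AppExecutor {
        public static void main(String[] args) {
            SparkConf conf = new SparkConf().setAppName("java-spark");
            try (JavaSparkContext sc = new JavaSparkContext(conf)) {
                // Read the CSV copied into HDFS in Step 2 (exact path is an assumption)
                JavaRDD<String> lines = sc.textFile("<your_hdfs_path>/input/input_data.csv");

                // Placeholder transformation: drop empty lines; the real project
                // presumably does its own processing here
                JavaRDD<String> resultRDD = lines.filter(line -> !line.isEmpty());

                // Writes one part-* file per partition under the "output" directory
                resultRDD.saveAsTextFile("output");
            }
        }
    }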
The same application can also be run as a plain Java application by following the instructions below.
Step 1: Build the project with the following commands.
cd <project_location>/java-spark/
mvn clean install
Step 2: Run the application directly with the java command (see the note on the Spark master after these steps).
java -cp <project_location>/java-spark/target/java-spark-jar-with-dependencies.jar com.khodeprasad.AppExecutor
Step 3: The output is printed to the console.
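
One detail worth noting for this mode: spark-submit normally supplies the Spark master, so when the jar is launched directly with java the master has to be set in code. A minimal sketch, assuming local mode (this configuration is an assumption, not confirmed by the project):

    SparkConf conf = new SparkConf()
            .setAppName("java-spark")
            .setMaster("local[*]");  // run Spark locally on all available cores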
Note: To execute the application as a plain Java application, you need to comment out the line
resultRDD.saveAsTextFile("output");
as it cannot save the processed output to HDFS in this mode; one way to print the results to the console instead is sketched below.
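
A hedged illustration of that console alternative (the print loop is an example, not the project's code):

    // resultRDD.saveAsTextFile("output");  // comment out for plain Java runs
    // collect() pulls all results into driver memory, so this suits small outputs
    for (String line : resultRDD.collect()) {
        System.out.println(line);
    }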
You can download the complete dataset from
https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv?accessType=DOWNLOAD
or
https://data.cityofchicago.org/Public-Safety/Crimes-2001-to-present/ijzp-q8t2