Spark MapReduce Lab

Description:

This is an introductory lab on using PySpark to run basic MapReduce jobs. It assumes prior knowledge of Python and of the MapReduce concept. It also assumes that Spark is already running on a Hadoop cluster, so installation details have been omitted.

This was written for the class CSSE434 as a part of our research project.
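
As a quick refresher on the MapReduce pattern this lab assumes, here is a minimal pure-Python sketch of the map, shuffle, and reduce stages. It is not taken from the lab itself and does not need Spark. The function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `run_job`) and the sample data are made up for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_phase(lines):
    """Emit a (first_letter, 1) pair for every whitespace-separated token."""
    for line in lines:
        for token in line.split():
            yield (token[0].lower(), 1)


def shuffle_phase(pairs):
    """Group values by key, as a framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Fold each key's list of values into one result with a binary function."""
    return {
        key: reduce(lambda a, b: a + b, values)
        for key, values in groups.items()
    }


def run_job(lines):
    """Chain the three stages: map -> shuffle -> reduce."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


# Hypothetical sample input: count how many tokens start with each letter.
sample_lines = ["spark runs on hadoop", "spark runs map reduce jobs"]
counts = run_job(sample_lines)
print(counts)
```

In PySpark, the map stage is typically written with RDD transformations such as `flatMap` and `map`, and the shuffle and reduce stages are handled together by `reduceByKey`; the lab's introduction is the place to see how that looks in practice.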

Instructions:

To work through this lab, please clone the repo. To do so on the command line, execute the following:

  $ git clone https://github.com/lamdaV/SparkMapReduceLab.git

Once the project has been cloned, read through the introduction and work through its example. Then attempt the wordCountTask and friendsListTask.

What's Next?

If you would like to learn more about Spark and what it is capable of, check out the Spark Machine Learning Lab.