jsnowacki/SPARK


SPARK

Materials for course: Introduction to Big Data with Apache Spark

Structure

  • core - Apache Spark core examples
  • data - data for the exercises
  • docker - Docker used in training
  • exercises - exercise questions
  • notebooks - Jupyter notebooks
  • sql - Apache Spark SQL examples
  • streaming - Apache Spark Streaming examples

Setup

Required software

The following software packages are needed for this course:

  • Git
  • Python 3.4+, installed via Anaconda (contains the majority of necessary packages)
  • PySpark (1.6.0+)
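
The requirements above can be checked with a short script. This is a minimal sketch, not part of the course materials: the function names `python_ok` and `pyspark_available` are hypothetical, and the check only confirms that a `pyspark` module can be found, not that it is version 1.6.0 or later.

```python
import importlib.util
import sys

MIN_PYTHON = (3, 4)  # minimum Python version from the list above


def python_ok(version_info=sys.version_info):
    """Return True if the given interpreter version is at least Python 3.4."""
    return tuple(version_info[:2]) >= MIN_PYTHON


def pyspark_available():
    """Return True if a module named pyspark can be located (version not checked)."""
    return importlib.util.find_spec("pyspark") is not None


if __name__ == "__main__":
    print("Python 3.4+:", python_ok())
    print("PySpark importable:", pyspark_available())
```

Run it with the interpreter you intend to use for the course; if either line reports False, revisit the corresponding item in the list.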

Docker setup

The Docker setup requires moderate resources but ensures that everyone has a working environment for the training.

Setup steps:

  • Download and install Git https://git-scm.com/downloads
  • Download and install Docker following the instructions:
  • (OS X / Win) Open Docker Quickstart Terminal (use Terminal, not iTerm)
  • Change into this repository's directory
  • Build the Docker images: docker-compose build
  • Start the containers: docker-compose up
    • If one of the above docker commands fails, run eval "$(docker-machine env default)" and then repeat the command, e.g. docker-compose build
    • Jupyter runs on port 8888: on localhost on Linux, and on the Docker VM IP (available from docker-machine ip) on Mac OS X and Windows
    • data and notebooks directories are mounted directly from the host file system
    • Note that the container stops when the current terminal session is closed
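
The rule for where Jupyter is reachable can be sketched in a few lines of Python. This is an illustration under assumptions, not part of the repository: the helper names `docker_machine_ip` and `jupyter_url` are hypothetical, the machine name `default` is taken from the `docker-machine env default` command above, and plain `http` is assumed as the scheme.

```python
import platform
import subprocess

JUPYTER_PORT = 8888  # port stated in the setup notes above


def docker_machine_ip(machine="default"):
    """Ask docker-machine for the VM IP (assumes docker-machine is on PATH)."""
    out = subprocess.check_output(["docker-machine", "ip", machine])
    return out.decode().strip()


def jupyter_url(system=None, vm_ip=None):
    """Build the Jupyter URL: localhost on Linux, the Docker VM IP elsewhere."""
    system = system or platform.system()
    if system == "Linux":
        host = "localhost"
    else:
        # Mac OS X and Windows: Jupyter is served from the Docker VM
        host = vm_ip or docker_machine_ip()
    return "http://{}:{}".format(host, JUPYTER_PORT)
```

Calling `jupyter_url()` with no arguments detects the platform and, outside Linux, queries `docker-machine`; passing `vm_ip` explicitly skips that call.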

Potential issues:

  • Setup can take some time as Docker pulls a number of images from the network
  • Docker Toolbox with VirtualBox does not work well with Microsoft Hyper-V, which is used by the newer Docker; remove Hyper-V before installing Docker Toolbox
  • Docker sometimes has problems obtaining IP addresses on restrictive networks
  • Put this repository into your home directory, as Docker can have issues mounting folders that are placed outside of the home directory

Manual setup

This setup requires the least resources but can be difficult on Windows machines.

Setup steps:

Building Java

Most of the examples are written in Java 8, apart from the Word Count examples, which are written in Java 7, Java 8, and Scala; see the file suffixes.

The project is built with Apache Maven (http://maven.apache.org).

mvn clean
mvn install
