PySpark ML Crashcourse

This repository contains exercises and solutions for a one-day crash course on PySpark and Spark ML. It contains only Jupyter notebooks, which assume a working PySpark kernel with Python 3.5 and Spark 2.1.

Author

All notebooks were created by Kaya Kupferschmidt @ dimajix. If you have any questions, feel free to contact me at k.kupferschmidt@dimajix.de

01 - PySpark DataFrame Introduction

This notebook contains some simple snippets that give a basic understanding of how to interact with Spark DataFrames in Python.

02 - PySpark Word Count (exercise + solution)

These notebooks contain the classic word count, implemented with DataFrames.

03 - Linear Regression (skeleton + solution)

These notebooks contain a simple linear regression exercise as an introduction to machine learning with Spark.

04 - Text Classification (exercise + solution)

Following the simple linear regression, these notebooks contain an exercise in simple statistical text classification.

05 - Hyper Parameter Tuning (exercise + solution)

Like many complex algorithms and ML pipelines, the text classification pipeline has many hyperparameters. These notebooks show how to perform hyperparameter tuning with PySpark.
