nogibjj/fan_xu_pyspark

PySpark

This project demonstrates PySpark functionality on a dataset of NBA player statistics. The dataset is queried with Spark SQL, a transformation is applied, and the output is written to a markdown file (output.md).

Requirements

  • Use PySpark to perform data processing on a large dataset

  • Include at least one Spark SQL query and one data transformation

Project Structure

📦 fan_xu_pyspark
├── .github
│   └── workflows
│       └── cicd.yml
├── Makefile
├── NBA_24_stats.csv
├── README.md
├── __pycache__
│   └── script.cpython-312.pyc
├── gitignore
├── lib.py
├── output.md
├── requirements.txt
├── script.py
└── test_lib.py

©generated by Project Tree Generator

Highlights

  • EDA

The first three rows of the dataset are displayed, along with summary statistics for the age, assists, and steals columns.

  • Query

A Spark SQL query returns the 10 highest-scoring players.

  • Transformation

A column is added showing each player's assist-to-turnover ratio.

Installation

Requirements:

  • Python
  • PySpark
  • Java
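As a quick sanity check before running the project, something like the following could confirm that the three requirements are present. This helper is hypothetical and not part of the repository; the minimum Python version it tests for is an assumption.

```python
import importlib.util
import shutil
import sys


def check_prerequisites():
    """Return a dict mapping each requirement to whether it was found."""
    return {
        # Minimum version is an assumption, not stated by the project.
        "python": sys.version_info >= (3, 8),
        # find_spec returns None when the package is not importable.
        "pyspark": importlib.util.find_spec("pyspark") is not None,
        # PySpark needs a Java runtime; look for `java` on the PATH.
        "java": shutil.which("java") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'found' if ok else 'missing'}")
```

Checking for `java` on the PATH is only a heuristic: a Java installation referenced solely through `JAVA_HOME` would be reported as missing here.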
