# NBA stats

## 1. 💬 Project description

This project aims to build a local database for retrieving NBA data through SQL queries. It consists of two main parts:

- Scraping: to extract the raw data
- Data engineering: to transform the data with dbt and DuckDB

Data warehouse documentation: link

## 2. 📟 Prerequisites

The project uses uv (v0.5.10) to manage the Python version and the dependencies.

## 3. 🔌 Quickstart

To set up the project locally, run the following steps (a quick sanity check is sketched after the list):

1. `curl -LsSf https://astral.sh/uv/0.5.10/install.sh | sh` (install uv v0.5.10; see doc)
2. `uv sync` (create the virtual environment and install the dependencies)
3. `uv run pre-commit install -t commit-msg -t pre-commit` (set up the pre-commit hooks)
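
To confirm the environment is ready, the following commands should all succeed (a minimal check; the exact versions reported depend on the project's pins):

```sh
uv --version          # should report 0.5.10
uv run python -V      # Python from the project-managed virtual environment
uv run dbt --version  # dbt installed inside the same environment
```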

## 4. 🚀 Run

### 4.1. ⚙️ Scraping scripts

It is not necessary to run these scripts again, as the data has already been extracted:

- `cd ./scraping`
- Generate `game_schedule.csv`: `uv run python get_games_schedule.py`
- Generate `game_boxscore.csv`: `uv run python get_games_boxscore.py`

The generated data is then copied to the sources of the dbt project: `cp ./scraping/data/*.parquet ./transform/nba_dwh/local_source/`
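
End to end, the scraping pass therefore looks like this (a sketch; it assumes the scripts write their output under `./scraping/data/`, as the copy command above implies):

```sh
cd ./scraping
uv run python get_games_schedule.py   # produces the game schedule data
uv run python get_games_boxscore.py   # produces the game boxscore data
cd ..
cp ./scraping/data/*.parquet ./transform/nba_dwh/local_source/
```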

### 4.2. ⚙️ Create database

The following steps create the local DuckDB database with dbt (a combined shortcut is sketched after the list):

1. `cd ./transform/nba_dwh`
2. `uv run dbt deps` (install the dbt dependencies)
3. `uv run dbt run` (run the transformations)
4. `uv run dbt test` (test the pipeline)
5. `uv run dbt docs generate` (generate the documentation)
6. `uv run dbt docs serve` (serve the documentation locally)
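
As a shortcut, `dbt build` runs the models and their tests in a single pass (assuming the project's dbt version supports it; `deps` still has to run first):

```sh
cd ./transform/nba_dwh
uv run dbt deps
uv run dbt build   # roughly `dbt run` followed by `dbt test`
```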

### 4.3. ⚙️ Interact with database

Once the database is created:

- Open the local db: `uv run duckcli ./nba_dwh.duckdb`
- Query the data, for example:
```sql
-- Career statistics of Rajon Rondo
select p.player_name, s.years, ps.nb_games, ps.avg_points, ps.avg_assists
from player_season ps
inner join player p on p.id = ps.player_id
inner join season s on s.id = ps.season_id
where p.player_name like 'Rajon Rondo'
order by s.years
```
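
The same models support aggregate queries; for instance, a hypothetical single-season leaderboard (the `'2015-16'` label format for `s.years` is an assumption, not confirmed by the schema):

```sql
-- Hypothetical example: top 10 scorers of one season
-- (assumes s.years holds labels such as '2015-16')
select p.player_name, ps.avg_points
from player_season ps
inner join player p on p.id = ps.player_id
inner join season s on s.id = ps.season_id
where s.years = '2015-16'
order by ps.avg_points desc
limit 10
```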

## 5. 🔗 Internal Architecture

- Folder `/scraping`: contains the scripts that generate the raw data
- Folder `/transform`: contains the dbt project that builds the database
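
Put together, the paths referenced in this README form roughly the following layout (a sketch; the database file location is inferred from the `duckcli` command above):

```
.
├── scraping/
│   ├── get_games_schedule.py
│   ├── get_games_boxscore.py
│   └── data/                  # generated raw files
└── transform/
    └── nba_dwh/               # dbt project
        ├── local_source/      # sources copied from scraping/data
        └── nba_dwh.duckdb     # built database
```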

## 6. 🏆 Code Quality and Formatting

- The Python files are linted and formatted with ruff; see the configuration in `pyproject.toml`
- The dbt SQL model files are formatted with sqlfmt
- A pre-commit configuration is available to trigger the quality checks (e.g. the linter) on each commit
- Commit messages follow the Conventional Commits convention
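
These checks can also be run by hand (a sketch using the standard ruff and pre-commit entry points; the hooks actually executed depend on this project's configuration):

```sh
uv run ruff check .                 # lint the Python files
uv run ruff format .                # format the Python files
uv run pre-commit run --all-files   # run every configured hook at once
```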

## 7. 📚 Complementary documentation