
Data project using NBA stats. Model a local datawarehouse using dbt-duckdb.


pdgarden/nba-stats


NBA stats

1. 💬 Project description

This project builds a local database from which NBA data can be retrieved with SQL queries. It consists of two main parts:

  • Scraping: collects the raw data
  • Data engineering: transforms the data using dbt and DuckDB

Datawarehouse documentation: link

2. 📟 Prerequisites

The project uses uv (v0.5.10) to manage the Python version and dependencies.

3. 🔌 Quickstart

To set up and use the project locally, run the following steps:

  1. curl -LsSf https://astral.sh/uv/0.5.10/install.sh | sh (Install uv v0.5.10. See doc.)
  2. uv sync (Create the virtual environment and install the dependencies)
  3. uv run pre-commit install -t commit-msg -t pre-commit (Set up the pre-commit hooks)
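Since the project pins uv to v0.5.10, it can help to confirm the installed version before running uv sync. The sketch below is illustrative only: the function names are hypothetical, and it assumes that `uv --version` prints a line beginning with `uv X.Y.Z`.

```python
import re
import subprocess

PINNED_UV = (0, 5, 10)  # version required by this README


def parse_uv_version(output: str) -> tuple[int, int, int]:
    """Extract (major, minor, patch) from text such as 'uv 0.5.10 (...)'.

    Assumption: the version output starts with 'uv ' followed by X.Y.Z.
    """
    match = re.match(r"uv (\d+)\.(\d+)\.(\d+)", output.strip())
    if match is None:
        raise ValueError(f"unrecognised uv version output: {output!r}")
    return tuple(int(part) for part in match.groups())


def installed_uv_version() -> tuple[int, int, int]:
    """Run `uv --version` and parse its output (requires uv on PATH)."""
    out = subprocess.run(
        ["uv", "--version"], capture_output=True, text=True, check=True
    ).stdout
    return parse_uv_version(out)
```

A check such as `installed_uv_version() == PINNED_UV` could then be used to warn early if a different uv version is active.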

4. 🚀 Run

4.1. βš™οΈ Scraping scripts

It is not necessary to run these scripts again, as the data has already been extracted.
  • cd ./scraping
  • Generate game_schedule.csv: uv run python get_games_schedule.py
  • Generate game_boxscore.csv: uv run python get_games_boxscore.py

From the repository root, copy the generated data into the sources of the dbt project: cp ./scraping/data/*.parquet ./transform/nba_dwh/local_source/
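On systems without a POSIX `cp` (for example Windows), the same copy step can be sketched with the Python standard library. This is a hypothetical helper, not part of the repository; unlike `cp`, it also creates the destination folder if it does not exist yet.

```python
import shutil
from pathlib import Path


def copy_sources(src_dir: Path, dest_dir: Path, pattern: str = "*.parquet") -> list[Path]:
    """Copy every file matching `pattern` from src_dir into dest_dir.

    Equivalent in spirit to:
        cp ./scraping/data/*.parquet ./transform/nba_dwh/local_source/
    Returns the destination paths, in sorted order.
    """
    dest_dir.mkdir(parents=True, exist_ok=True)  # create the target folder if missing
    copied = []
    for src in sorted(src_dir.glob(pattern)):
        if src.is_file():
            # copy2 also preserves metadata such as the modification time
            copied.append(Path(shutil.copy2(src, dest_dir / src.name)))
    return copied
```

Run from the repository root, the call would be `copy_sources(Path("scraping/data"), Path("transform/nba_dwh/local_source"))`.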

4.2. βš™οΈ Create database

The following steps create the local DuckDB database with dbt:

  1. cd ./transform/nba_dwh
  2. uv run dbt deps (Install dbt dependencies)
  3. uv run dbt run (Run transformations)
  4. uv run dbt test (Run the data tests)
  5. uv run dbt docs generate (Generate the documentation)
  6. uv run dbt docs serve (Serve the documentation locally)
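The build steps above run in a fixed order, and each one only makes sense if the previous one succeeded. The following is a minimal sketch of a hypothetical wrapper script (not part of the project) that chains them and stops at the first failing command. `dbt docs serve` is left out because it starts a long-running web server instead of finishing.

```python
import subprocess
from typing import Callable, Sequence

# Build steps from the list above, in order (`dbt docs serve` omitted on purpose).
DBT_STEPS: list[list[str]] = [
    ["uv", "run", "dbt", "deps"],
    ["uv", "run", "dbt", "run"],
    ["uv", "run", "dbt", "test"],
    ["uv", "run", "dbt", "docs", "generate"],
]


def run_steps(
    steps: Sequence[Sequence[str]],
    cwd: str = ".",
    runner: Callable[..., subprocess.CompletedProcess] = subprocess.run,
) -> list[str]:
    """Run each command in order and raise at the first non-zero exit code.

    Returns the commands that completed successfully. `runner` can be
    replaced (e.g. in tests) so that nothing is actually executed.
    """
    done = []
    for cmd in steps:
        result = runner(list(cmd), cwd=cwd)
        if result.returncode != 0:
            raise RuntimeError(f"step failed ({result.returncode}): {' '.join(cmd)}")
        done.append(" ".join(cmd))
    return done
```

From the repository root, it would be invoked as `run_steps(DBT_STEPS, cwd="transform/nba_dwh")`.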

4.3. βš™οΈ Interact with database

Once the database is created:

  • Open the local db: uv run duckcli ./nba_dwh.duckdb
  • Query the data, for example:
-- Career statistics of Rajon Rondo
select p.player_name, s.years, ps.nb_games, ps.avg_points, ps.avg_assists
from player_season ps
inner join player p on p.id = ps.player_id
inner join season s on s.id = ps.season_id
where p.player_name = 'Rajon Rondo'
order by s.years
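The career query above can also be run from Python instead of duckcli. The sketch below is a hypothetical helper that reuses the same tables and columns, but binds the player name as a parameter and matches it exactly with `=`. It only relies on `conn.execute(sql, params).fetchall()`, which both a `duckdb.connect(...)` connection and the standard-library `sqlite3` module provide.

```python
# Career query from above, with the player name passed as a bound parameter.
CAREER_SQL = """
select p.player_name, s.years, ps.nb_games, ps.avg_points, ps.avg_assists
from player_season ps
inner join player p on p.id = ps.player_id
inner join season s on s.id = ps.season_id
where p.player_name = ?
order by s.years
"""


def career_stats(conn, player_name: str) -> list[tuple]:
    """Return one row per season for `player_name`, ordered by season.

    `conn` is any connection whose execute(sql, params) result has
    fetchall(), e.g. a duckdb or sqlite3 connection.
    """
    return conn.execute(CAREER_SQL, [player_name]).fetchall()
```

With the project's environment, a call could look like `career_stats(duckdb.connect("nba_dwh.duckdb", read_only=True), "Rajon Rondo")`. Opening the file read-only lets the script read the database without taking a write lock on it.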

5. 🔗 Internal Architecture

  • Folder /scraping: Contains scripts to generate the raw data
  • Folder /transform: Contains dbt project to generate the database

6. πŸ† Code Quality and Formatting

  • Python files are linted and formatted with ruff (see the configuration in pyproject.toml)
  • dbt SQL model files are formatted with sqlfmt
  • A pre-commit configuration triggers the quality checks (e.g. the linter) automatically on each commit
  • Commit messages follow the Conventional Commits specification
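To make the commit convention concrete, the sketch below checks whether a message's first line has the Conventional Commits header shape `type(optional scope)!: description`. It is not the project's actual commit-msg hook, which is whatever the pre-commit configuration installs. The list of allowed types is an assumption borrowed from the common Angular/commitlint set, since the specification itself only requires `feat` and `fix`.

```python
import re

# Assumed type list (Angular/commitlint style); the spec only mandates feat and fix.
ALLOWED_TYPES = ("feat", "fix", "docs", "style", "refactor", "perf",
                 "test", "build", "ci", "chore", "revert")

HEADER_RE = re.compile(
    r"^(?P<type>[a-z]+)"              # commit type, e.g. feat
    r"(?:\((?P<scope>[^()\s]+)\))?"   # optional scope in parentheses
    r"(?P<breaking>!)?"               # optional '!' marking a breaking change
    r": (?P<description>\S.*)$"       # colon, one space, non-empty description
)


def check_header(message: str) -> bool:
    """Return True if the first line of `message` looks like a Conventional Commits header."""
    header = message.splitlines()[0] if message else ""
    match = HEADER_RE.match(header)
    return bool(match) and match.group("type") in ALLOWED_TYPES
```

For example, `feat(scraping): add boxscore script` passes, while `Update README` is rejected.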

7. 📚 Complementary documentation
