Featuring data engineering/analytics workflows developed with open source tools ✨

sayantikabanik/DataJourney

🚌 DataJourney

A tutorial featuring data engineering workflows built with open source tools and technologies. The example datasets are openly available online, and their metadata is recorded in the intake catalog.

🛠 Current workflows covered (✨ represents: experimental)

✅ Packaging framework added
✅ Conda environment added
✅ GitHub actions configured
✅ Pre-commit hooks configured for code linting/formatting
✅ Reading data from online sources using intake
✅ Sample pipeline built using Dagster
✅ Building Dashboard using holoviews + panel
✅ Exploratory data analysis (EDA) using mito
✅ Analysing source code complexity using Wily
✅ Web UI built on Flask
✅ Web UI re-done and expanded with FastHTML
✨ [WIP]: Deployment of FastHTML application

📊 Repository stats

⚙️ Managed by GitHub Action: https://github.com/jgehrcke/github-repo-stats
⏳ Configured to run daily at 23:55:00 IST
📬 Check out the daily generated reports: PDF Report
🗳️ Supplementary details about the generated stats/reports are available here

Dataset metadata/citations

van Woesik, R., Burkepile, D. (2022) Bleaching and environmental data for global coral reef sites from 1980-2020. Biological and Chemical Oceanography Data Management Office (BCO-DMO). (Version 2) Version Date 2022-10-14 [if applicable, indicate subset used]. doi:10.26008/1912/bco-dmo.773466.2 [access date]
Terms of Use
This dataset is licensed under Creative Commons Attribution 4.0 (https://creativecommons.org/licenses/by/4.0/)

Codespaces configured

New pre-build images are currently disabled due to limited storage.


Environment setup using conda:

Installing miniconda

Create a conda environment

conda env create -f environment.yml
conda activate journey
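
To confirm from Python that the environment is active, one option is to read the `CONDA_DEFAULT_ENV` variable that `conda activate` sets. This is a minimal sketch; the helper name `active_conda_env` is illustrative and not part of the repository.

```python
import os


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None."""
    env = os.environ if environ is None else environ
    # `conda activate <name>` exports CONDA_DEFAULT_ENV=<name>
    return env.get("CONDA_DEFAULT_ENV") or None


name = active_conda_env()
if name == "journey":
    print("journey environment is active")
else:
    print(f"expected 'journey', found {name!r}")
```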

Install the package locally

pip install -e .
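
A quick way to check that the editable install worked is to ask Python whether the package can be found. The sketch below assumes the package is importable as `analytics_framework` (inferred from the directory name used in this README, not confirmed by the packaging config); `is_importable` is a hypothetical helper.

```python
import importlib.util


def is_importable(module_name):
    """Return True if `module_name` can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# Assumption: the installed package is importable as `analytics_framework`,
# matching the directory name used elsewhere in this README.
print(is_importable("analytics_framework"))
```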

🔌 About pre-commit-hooks and activating

As the name suggests, pre-commit hooks run before each commit. Here they lint and format the code according to PEP standards. More details 🗒

pre-commit install
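
`pre-commit install` writes a hook script to `.git/hooks/pre-commit`. The sketch below checks for that file; it assumes a regular clone with the default hooks location (no custom `core.hooksPath`), and `pre_commit_hook_installed` is a hypothetical helper name.

```python
from pathlib import Path


def pre_commit_hook_installed(repo_root="."):
    """Return True if a pre-commit hook script exists in the repository."""
    # `pre-commit install` writes its hook to .git/hooks/pre-commit
    return (Path(repo_root) / ".git" / "hooks" / "pre-commit").is_file()


# Run from the repository root
print(pre_commit_hook_installed())
```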

How to run the applications?

Dagster UI

cd analytics_framework/pipeline
dagit -f process.py

Dagit UI output

Panel app

cd analytics_framework/dashboard
python simple_app.py

NOTE: The generated dashboard is exported to HTML and saved as stock_price_dashboard.html
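
To view the exported file, it can be opened in the default browser. A minimal sketch, assuming `stock_price_dashboard.html` was written to the current working directory (the save location is an assumption); both helper names are illustrative.

```python
from pathlib import Path
import webbrowser


def dashboard_uri(path="stock_price_dashboard.html"):
    """Return a file:// URI for the exported dashboard, or None if it is missing."""
    html = Path(path).resolve()
    return html.as_uri() if html.is_file() else None


def open_dashboard(path="stock_price_dashboard.html"):
    """Open the exported dashboard in the default browser, if it exists."""
    uri = dashboard_uri(path)
    if uri is None:
        return False
    return webbrowser.open(uri)
```

Calling `open_dashboard()` after `python simple_app.py` has finished should bring up the dashboard; it returns `False` when the file is not found.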

Panel app output

Mito

Before running the Jupyter notebook doc/mito_exp.ipynb, run the command below in your terminal to install Mito. It might take some time to complete.

python -m mitoinstaller install

To explore further, visit trymito.io

Mito output, Mito output operation

Display all data sources present via web UI

# Instructions specific to FastHTML app
cd analytics_framework/intake/web_ui_fasthtml
python app.py
Link: http://localhost:5001
INFO:     Will watch for changes in these directories: ['../DataJourney/analytics_framework/intake/web_ui_fasthtml']
INFO:     Uvicorn running on http://0.0.0.0:5001 (Press CTRL+C to quit)
INFO:     Started reloader process [20071] using WatchFiles
INFO:     Started server process [20075]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
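
Once the log shows "Application startup complete", the server can be probed from a second terminal. The sketch below uses only the standard library and the URL from the log above; `is_serving` is a hypothetical helper, not part of the app.

```python
import urllib.error
import urllib.request


def is_serving(url="http://localhost:5001", timeout=2.0):
    """Return True if an HTTP server answers at `url`."""
    # Bypass any configured proxy, since the target is a local server
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or name resolution failure
        return False
```

`is_serving()` returns `True` while the app is running and `False` after it is stopped with CTRL+C.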
