HOME SOLAR PANEL DATA FROM ETL TO VIZ

MOTIVATION AND CHALLENGE

This is a personal web scraping project that emulates an end-to-end solution, applying data engineering fundamentals and Python development best practices. The goal is to extract, transform, and visualize solar panel data from my family's home system. It is not designed to be replicable.

The overall challenge is to get the historical and current data from the home solar system, which does not offer a proper API. The scraper has to log in to the website and navigate to a specific page that enables the API endpoint.
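A common way to handle a site without a public API is to let Selenium perform the login and page navigation, then reuse the browser's session cookies for plain HTTP requests to the endpoint. The sketch below shows that idea. It is an assumed approach, not the project's actual code. All URLs, element IDs and the `date` query parameter are hypothetical placeholders, because the real EmaApp details are not given here.

```python
import json
import urllib.parse
import urllib.request
from typing import Any

# Hypothetical placeholders: the real EmaApp URLs, element IDs and query
# parameter names are not documented in this README.
LOGIN_URL = "https://example.com/login"
DATA_PAGE_URL = "https://example.com/production"
ENDPOINT_URL = "https://example.com/api/hourly"


def cookie_header(selenium_cookies: list[dict]) -> str:
    """Join the cookie dicts returned by driver.get_cookies() into a Cookie header."""
    return "; ".join(f"{c['name']}={c['value']}" for c in selenium_cookies)


def login_and_get_cookies(username: str, password: str) -> list[dict]:
    """Log in with a real browser and open the page that enables the endpoint."""
    # Imported here so cookie_header() and fetch_day() work without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.support.ui import WebDriverWait

    driver = webdriver.Chrome()
    try:
        driver.get(LOGIN_URL)
        driver.find_element(By.ID, "username").send_keys(username)  # hypothetical IDs
        driver.find_element(By.ID, "password").send_keys(password)
        driver.find_element(By.ID, "login-button").click()
        # Wait until the login redirects away from the login page.
        WebDriverWait(driver, 20).until(EC.url_changes(LOGIN_URL))
        driver.get(DATA_PAGE_URL)  # visiting this page is what enables the endpoint
        return driver.get_cookies()
    finally:
        driver.quit()


def fetch_day(cookies: list[dict], day: str) -> Any:
    """GET one day of hourly production, reusing the browser session's cookies."""
    query = urllib.parse.urlencode({"date": day})  # hypothetical parameter name
    request = urllib.request.Request(
        f"{ENDPOINT_URL}?{query}", headers={"Cookie": cookie_header(cookies)}
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Limiting Selenium to the login step keeps each per-date request lightweight. If the endpoint also checks headers or tokens besides cookies, this shortcut would not be enough on its own.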

TOOLS

Selenium: Automates the web interaction required for data extraction
Postgres: Database for storing and managing the collected data

TECHNIQUES APPLIED:

Object-Oriented Programming (OOP);
Data Warehousing and ETL concepts;
Pytest for testing functions;
Logging;

FILE DESCRIPTIONS

The structure is designed to resemble a data warehouse (DW) environment and prioritizes recording each execution:
bronze/json_files: Directory that stores the extracted JSON files as a landing zone;
hourly24_production_2024-08-15.json: Sample JSON file;
missing_dates.csv: CSV file that lists the dates missing from the json_files directory, i.e. the dates for which data was not collected;
silver/csv_files: Directory where the transformed JSON data is stored as CSV;
transformation_status.csv: Log file recording whether each transformation succeeded or failed;
silver/sql_table_done: Directory to which files are moved once they are uploaded to Postgres
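The missing_dates.csv file can be built by comparing the dates already present in bronze/json_files with the calendar. Below is a minimal sketch. It assumes every landing file follows the sample name pattern `hourly24_production_<YYYY-MM-DD>.json`. The function names and the single `date` column are hypothetical.

```python
import csv
from datetime import date, timedelta
from pathlib import Path

# File-name prefix taken from the sample file hourly24_production_2024-08-15.json.
PREFIX = "hourly24_production_"


def collected_dates(json_dir: Path) -> set[date]:
    """Dates already in the landing zone, parsed from the JSON file names."""
    return {
        date.fromisoformat(p.stem.removeprefix(PREFIX))
        for p in json_dir.glob(f"{PREFIX}*.json")
    }


def missing_dates(start: date, end: date, have: set[date]) -> list[date]:
    """Every day from start to end (inclusive) that has no JSON file yet."""
    days = (start + timedelta(days=n) for n in range((end - start).days + 1))
    return [d for d in days if d not in have]


def write_missing(path: Path, dates: list[date]) -> None:
    """Write the missing days as a one-column CSV (hypothetical 'date' header)."""
    with path.open("w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["date"])
        writer.writerows([d.isoformat()] for d in dates)
```

For example, `write_missing(Path("missing_dates.csv"), missing_dates(start, date.today() - timedelta(days=1), collected_dates(Path("bronze/json_files"))))` would list every uncollected day up to yesterday. Here `start` is the first day of available data, which this README does not state. Stopping at yesterday is an assumption, because the current day's production is still incomplete.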

It consists of four steps, plus a gathering script for the visualization:

Missing.py
Cross-checks the collected dates against the current date to identify missing dates and lists them in a CSV file;
Extraction.py
Starts a WebDriver instance, logs into the EmaApp system, navigates to the desired page, and makes a GET request for the daily hourly energy production of each date listed in the "missing_dates.csv" file;
Transformation.py
Converts the JSON files to CSV, removes unnecessary columns, parses the data, and writes the CSV files to a processed directory;
Loading.py
Uploads the contents of the CSV files in the processed directory to a Postgres staging table, calls a function to insert them into the final table, and moves the loaded files to a subfolder;
Gather.py
Creates a single CSV with all the content, for data visualization in Tableau
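Below is a minimal sketch of the Transformation.py step. It writes one status row per file, similar to the log kept in transformation_status.csv. This README does not show the JSON schema, so the sketch assumes each file holds a list of flat records. The kept column names (`timestamp`, `energy_kwh`) and the log's columns are hypothetical.

```python
import csv
import json
from datetime import datetime
from pathlib import Path

# Hypothetical: the real response schema is not shown in this README, so each
# JSON file is assumed to hold a list of flat records with these invented keys.
KEEP = ("timestamp", "energy_kwh")


def json_to_csv(json_path: Path, out_dir: Path, keep: tuple[str, ...] = KEEP) -> Path:
    """Write only the kept columns of one JSON file to a CSV with the same stem."""
    records = json.loads(json_path.read_text())
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / f"{json_path.stem}.csv"
    with out_path.open("w", newline="") as f:
        # extrasaction="ignore" drops every key that is not listed in `keep`.
        writer = csv.DictWriter(f, fieldnames=keep, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(records)
    return out_path


def record_status(status_file: Path, source: Path, success: bool) -> None:
    """Append one row per attempt, writing the header the first time."""
    is_new = not status_file.exists()
    status_file.parent.mkdir(parents=True, exist_ok=True)
    with status_file.open("a", newline="") as f:
        writer = csv.writer(f)
        if is_new:
            writer.writerow(["timestamp", "file", "status"])
        writer.writerow([datetime.now().isoformat(timespec="seconds"),
                         source.name, "success" if success else "failed"])


def transform(json_path: Path, out_dir: Path, status_file: Path) -> bool:
    """Convert one file and log the outcome instead of aborting the whole run."""
    try:
        json_to_csv(json_path, out_dir)
    except Exception:  # any failure is logged; the caller moves on to the next file
        record_status(status_file, json_path, success=False)
        return False
    record_status(status_file, json_path, success=True)
    return True
```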
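The Loading.py step follows a staging pattern: copy the raw CSV into a staging table, let a database function merge it into the final table, and archive the file only after the commit succeeds. The sketch below assumes a psycopg2 connection, since `copy_expert` is specific to psycopg2. The table name `staging_production`, the function `insert_into_final()` and the truncation of staging before each load are all hypothetical.

```python
import shutil
from pathlib import Path

# Hypothetical names: the README does not give the staging table or the
# Postgres function that copies staging rows into the final table.
# They are fixed constants, not user input, so f-string SQL is acceptable here.
STAGING_TABLE = "staging_production"
MERGE_FUNCTION = "insert_into_final"


def load_csv(conn, csv_path: Path, done_dir: Path) -> Path:
    """Stage one CSV in Postgres, merge it into the final table, then archive it.

    `conn` is expected to be a psycopg2 connection (copy_expert is psycopg2-specific).
    """
    try:
        with conn.cursor() as cur, csv_path.open(newline="") as f:
            cur.execute(f"TRUNCATE {STAGING_TABLE}")  # staging holds one file at a time
            cur.copy_expert(
                f"COPY {STAGING_TABLE} FROM STDIN WITH (FORMAT csv, HEADER true)", f
            )
            cur.execute(f"SELECT {MERGE_FUNCTION}()")
        conn.commit()
    except Exception:
        conn.rollback()  # leave neither table half-loaded
        raise
    # Move the file to the "done" folder only once the transaction has committed.
    done_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.move(str(csv_path), str(done_dir / csv_path.name)))
```

Because archiving happens after the commit, a failed load leaves the CSV in the processed directory, and the next run retries it.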

WORKFLOW

ANALYSIS

The viz can be accessed at the following link: https://public.tableau.com/app/profile/lucas8230/viz/HOMESOLARPANELPRODUCTION2021-2024/Painel1
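Gather.py produces the single CSV used by this Tableau viz. Below is a minimal sketch of that concatenation. It assumes all per-day CSVs share the same header. The README does not say which folder Gather.py reads from, and the function name is hypothetical.

```python
import csv
from pathlib import Path


def gather(csv_dir: Path, out_path: Path) -> int:
    """Concatenate every CSV in csv_dir into one file with a single header row.

    Returns the number of data rows written. out_path must live outside
    csv_dir, otherwise the output file would be picked up as an input.
    """
    rows_written = 0
    header = None
    with out_path.open("w", newline="") as out:
        writer = csv.writer(out)
        for path in sorted(csv_dir.glob("*.csv")):
            with path.open(newline="") as f:
                reader = csv.reader(f)
                file_header = next(reader, None)
                if file_header is None:
                    continue  # skip empty files
                if header is None:
                    header = file_header
                    writer.writerow(header)
                elif file_header != header:
                    raise ValueError(f"{path.name} has a different header")
                for row in reader:
                    writer.writerow(row)
                    rows_written += 1
    return rows_written
```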

lksprado/Solar
