This project automates the extraction, transformation, and loading (ETL) of weather data from the OpenWeatherMap API into a MySQL database using Python, leveraging Apache Airflow to schedule data fetches and Docker for orchestration. It retrieves weather updates for user-configured locations at regular intervals, processes the data into a structured format, and refreshes the MySQL database every 15 minutes with an updated weather forecast for the 24 hours ahead. A user-friendly interface built on the questionary Python module abstracts away the technical side of the implementation. By incorporating modern technologies and practices such as Docker, Apache Airflow, and programmatic MySQL access, the project stands as a robust example of a practical ETL pipeline in data engineering.
- Automated weather data fetch from OpenWeatherMap API.
- Data transformation and normalization for database storage.
- Scheduled data updates using Apache Airflow.
- MySQL database integration for data persistence.
- Interactive menu for managing weather data locations.
- (more to come?)
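The transformation step listed above can be sketched as follows. This is a hedged illustration only: the payload shape assumed here (a `"list"` of entries with `"dt"`, `"main"`, `"weather"`, and `"wind"` keys) matches OpenWeatherMap's public 5-day/3-hour forecast endpoint, and the columns kept may differ from what this project actually stores.

```python
from datetime import datetime, timezone
from typing import Dict, List


def transform_forecast(payload: Dict) -> List[Dict]:
    """Flatten an OpenWeatherMap forecast response into rows suitable for MySQL.

    A sketch under assumptions: field names follow the public 5-day/3-hour
    forecast response, not necessarily this project's real schema.
    """
    rows = []
    for entry in payload.get("list", []):
        rows.append({
            # "dt" is a Unix timestamp; normalize it to an aware UTC datetime.
            "forecast_time": datetime.fromtimestamp(entry["dt"], tz=timezone.utc),
            "temperature": entry["main"]["temp"],
            "humidity": entry["main"]["humidity"],
            "description": entry["weather"][0]["description"],
            "wind_speed": entry["wind"]["speed"],
        })
    return rows
```

Normalizing the nested JSON into flat rows like this is what makes the per-location tables straightforward to insert into.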
- Docker
- Docker Compose
- Python 3.8+
- Clone the repository:
  ```
  git clone https://github.com/JohanTokarskij/OpenWeatherMapAPI-ETL-Docker-Airflow-MySQL-Python
  ```
- Navigate to the project directory:
  ```
  cd [project directory]
  ```
- To ensure proper isolation of project dependencies, create and activate a virtual environment before running this app. Follow these steps:
- Create a new virtual environment:
  ```
  python -m venv venv
  ```
- Activate the virtual environment:
  ```
  venv\Scripts\activate
  ```
  on Windows, or
  ```
  source venv/bin/activate
  ```
  on Linux/macOS.
- Install the required dependencies:
  ```
  pip install -r requirements.txt
  ```
- Run
  ```
  docker-compose up -d
  ```
  in a terminal to download and start the containers.
- Run
  ```
  python setup_mysql.py
  ```
  to set up MySQL. Proceed to the next step after getting the confirmation 'Initialization of MySQL is complete'.
- Run
  ```
  python setup_airflow.py
  ```
  to set up Apache Airflow. After getting the confirmations 'Initialization of airflow-webserver is complete' and 'Initialization of airflow-webscheduler is complete', the initial setup is done.
- Navigate to http://localhost:8080/.
- Log in with `admin:password`.
- Start the `update_weather_tables.py` DAG to enable scheduled fetching (Airflow is configured to fetch new data for each table every 15 minutes; adjust the `schedule_interval` parameter in `update_weather_tables.py` for other needs).
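For orientation, a 15-minute schedule is typically declared in a DAG file along these lines. This is a configuration sketch, not the project's actual `update_weather_tables.py`: the task name and callable below are placeholders, and only `schedule_interval` corresponds to the parameter mentioned above.

```python
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG(
    dag_id="update_weather_tables",
    start_date=datetime(2024, 1, 1),
    schedule_interval=timedelta(minutes=15),  # change this for other cadences
    catchup=False,
) as dag:
    # Placeholder task; the real DAG would call the project's fetch/transform/load code.
    fetch_and_load = PythonOperator(
        task_id="fetch_and_load",
        python_callable=lambda: None,
    )
```

`schedule_interval` also accepts a cron expression such as `"*/15 * * * *"` if a cron-style definition is preferred.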
You need an API key to use OpenWeatherMap. Register at https://openweathermap.org/ to obtain one. Add the API key to a file named `.env` in the project's root folder, using the following format:
```
API_KEY='your-api-key'
```
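Reading that key back at runtime can be done with a few lines of standard-library code. A minimal sketch (the project itself may well use a library such as python-dotenv instead; that is an assumption):

```python
from pathlib import Path


def load_api_key(env_path: str = ".env") -> str:
    """Return the API_KEY value from a .env file.

    Stdlib-only sketch: scans the file line by line for an API_KEY= entry
    and strips surrounding quotes from the value.
    """
    for line in Path(env_path).read_text().splitlines():
        line = line.strip()
        if line.startswith("API_KEY="):
            return line.split("=", 1)[1].strip().strip("'\"")
    raise KeyError("API_KEY not found in " + env_path)
```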
Run the application by executing the `app.py` script:
```
python app.py
```
Follow the on-screen prompts to interact with the application. You can:
- Add new locations & Schedule Airflow: Provide the name of the location you want to track.
- View registered locations in the Database: See a list of all locations currently being monitored.
- Remove location(s) from the Database: Delete locations you no longer wish to track.
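The menu loop behind those prompts can be sketched as a small dispatch function. This is an illustration, not the real `app.py`: the handler names are hypothetical, and `select` stands in for a questionary prompt such as `questionary.select(...).ask()`.

```python
def run_menu(select, handlers):
    """Drive an interactive menu until the user picks Exit.

    `select` is any callable that takes a list of option labels and returns
    the chosen one -- in the real app that role is played by the questionary
    library. `handlers` maps option labels to zero-argument functions.
    """
    while True:
        choice = select(list(handlers) + ["Exit"])
        if choice == "Exit":
            break
        handlers[choice]()
```

Wired up with questionary, `select` would be something like `lambda opts: questionary.select("Choose an action:", choices=opts).ask()`; keeping the prompt behind a plain callable also makes the loop easy to test.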
All fetched and transformed weather data is stored in the `weather_db` database. Each location's data is stored in a distinct table named after the location. The `location_id_table` is a key table in the database, uniquely mapping each location to its ID, name, and geographical coordinates (latitude and longitude).
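A schema along those lines might look like the sketch below. The column names are assumptions based on the description above (ID, name, latitude, longitude), and sqlite3 is used here only so the example is self-contained; the project itself uses MySQL.

```python
import sqlite3

# Hypothetical shape of location_id_table, inferred from this README.
SCHEMA = """
CREATE TABLE IF NOT EXISTS location_id_table (
    location_id INTEGER PRIMARY KEY,
    location_name TEXT NOT NULL UNIQUE,
    latitude REAL NOT NULL,
    longitude REAL NOT NULL
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
conn.execute(
    "INSERT INTO location_id_table (location_name, latitude, longitude) "
    "VALUES (?, ?, ?)",
    ("Stockholm", 59.3293, 18.0686),
)
row = conn.execute(
    "SELECT location_name, latitude FROM location_id_table"
).fetchone()
```

The `UNIQUE` constraint on the name is one way to keep each tracked location mapped to exactly one ID.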
The database is accessible via MySQL:
- Host: `localhost`
- Port: `3306`
- User: `root`
- Password: `password`
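Those parameters can be collected into a single config for ad-hoc inspection from Python. A small sketch (the DSN's driver name assumes `mysql-connector-python`, which is an assumption about this project's dependencies):

```python
# Connection parameters for the dockerized MySQL service (values from this README).
MYSQL_CONFIG = {
    "host": "localhost",
    "port": 3306,
    "user": "root",
    "password": "password",
    "database": "weather_db",
}


def connection_url(cfg):
    """Build a SQLAlchemy-style DSN string from the config dict.

    Useful for tools that want a URL rather than keyword arguments; the
    mysql+mysqlconnector scheme presumes the mysql-connector-python driver.
    """
    return (
        "mysql+mysqlconnector://{user}:{password}@{host}:{port}/{database}"
    ).format(**cfg)
```

The same dict can be passed directly as keyword arguments to a MySQL client's `connect()` call.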
This project serves as a practical demonstration of ETL (Extract, Transform, Load) processes, Python scripting, Airflow scheduling, Docker orchestration, and programmatic database access. Developed primarily for educational purposes, it illustrates key data engineering techniques through the automated fetching, processing, and storage of weather data in a MySQL database.