Automating Data Scrapers and Analytical Processes using Apache Airflow

Overview

This project automates the scraping of house listing data from buyrentkenya.com (or any other website of your choice) using Apache Airflow. It focuses on workflow automation and scheduling, using the following tools:

  • Apache Airflow
  • Python
  • Pandas
  • NumPy

Installation

  1. Install Apache Airflow following the official installation instructions.
  2. Clone this repository:

     git clone https://github.com/Mutai-Gilbert/Automating-Data-Scrapers-Apache-Airflow

  3. Install the required Python packages:

     pip install pandas numpy

Usage

  1. Configure Apache Airflow for your environment.
  2. Copy the DAG (Directed Acyclic Graph) file from the dags directory to your Apache Airflow DAGs folder.
  3. Start the Apache Airflow scheduler and web server.
  4. Access the Apache Airflow web interface and enable the DAG.
  5. Trigger the DAG to start the data scraping and analytical processes.

DAG Structure

The DAG is structured to perform the following tasks (a minimal sketch follows this list):

  • Scrape house listing data from buyrentkenya.com (or another website).
  • Preprocess the scraped data using Pandas and NumPy.
  • Perform analytical processes on the preprocessed data.
  • Store the results in a desired format (e.g., CSV, database).
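
Below is a minimal sketch of such a DAG, assuming an Airflow 2.x environment. The dag_id, task ids, helper functions, and /tmp file paths are illustrative placeholders rather than code taken from this repository; the real DAG in the dags directory may differ.

    # Minimal sketch of the scrape -> preprocess -> analyze-and-store pipeline.
    # All names and paths here are illustrative placeholders.
    from datetime import datetime

    import pandas as pd
    from airflow import DAG
    from airflow.operators.python import PythonOperator


    def scrape_listings():
        # Placeholder scrape: in the real DAG this would fetch house listings
        # from buyrentkenya.com (or another website) and save the raw rows.
        raw = pd.DataFrame([{"title": "2BR Apartment, Kilimani", "price": "KSh 45,000"}])
        raw.to_csv("/tmp/raw_listings.csv", index=False)


    def preprocess_listings():
        # Clean the scraped data with Pandas, e.g. turn "KSh 45,000" into 45000.0.
        df = pd.read_csv("/tmp/raw_listings.csv")
        df["price"] = df["price"].str.replace(r"[^\d]", "", regex=True).astype(float)
        df.to_csv("/tmp/clean_listings.csv", index=False)


    def analyze_and_store():
        # Run a simple analysis and store the results (here: summary statistics to CSV).
        df = pd.read_csv("/tmp/clean_listings.csv")
        df.describe().to_csv("/tmp/listing_summary.csv")


    with DAG(
        dag_id="house_listing_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule_interval="@daily",
        catchup=False,
    ) as dag:
        scrape = PythonOperator(task_id="scrape", python_callable=scrape_listings)
        preprocess = PythonOperator(task_id="preprocess", python_callable=preprocess_listings)
        analyze = PythonOperator(task_id="analyze_and_store", python_callable=analyze_and_store)

        scrape >> preprocess >> analyze

Once a file like this sits in the Airflow DAGs folder, the pipeline appears in the web interface under its dag_id and can be enabled and triggered as described in the Usage section.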
Customization

  • You can customize the scraping process by modifying the URL or the website to scrape.
  • Modify the preprocessing and analytical processes according to your requirements.
  • Adjust the schedule and frequency of the DAG to meet your needs (one way to externalize these settings is sketched below).
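
As one way to externalize these settings, the sketch below assumes Airflow Variables hold the target URL and schedule so they can be changed from the Airflow UI; the variable names scrape_url and scrape_schedule are illustrative and are not defined by this repository.

    # Hypothetical configuration via Airflow Variables, so the target URL and
    # schedule can be changed from the Airflow UI without editing the DAG file.
    from airflow.models import Variable

    # Fall back to sensible defaults when the Variables have not been created.
    scrape_url = Variable.get("scrape_url", default_var="https://www.buyrentkenya.com")
    dag_schedule = Variable.get("scrape_schedule", default_var="@daily")

    # These values would then feed the DAG definition and the scrape task, e.g.
    # DAG(..., schedule_interval=dag_schedule) and a scrape callable that takes scrape_url.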
Contributions

Contributions to this project are welcome. Feel free to fork the repository and submit pull requests with your improvements.

License

This project is licensed under the MIT License. See the LICENSE file for more information.

