
A Python script extracts data from Zillow and stores it in an initial S3 bucket. Lambda functions then handle the flow: one copies the data to a processing bucket, and another transforms it from JSON to CSV format. The final CSV data resides in another S3 bucket, ready to be loaded into Amazon Redshift for in-depth analysis, with Amazon QuickSight used for visualization.


bhavanachitragar/zillow-data-analytics


Zillow Data Analytics using AWS

[Figure: Architecture diagram]

This architecture leverages:

  • Airflow: For scheduling and orchestration of the data pipeline tasks.
  • EC2: For running the Python scripts for data extraction and transformation.
  • Lambda Functions: For serverless, event-triggered processing: copying data between S3 buckets and converting it from JSON to CSV.
  • S3: For storing data at various stages of the pipeline.
  • Redshift: For efficient data warehousing and analytics.
  • QuickSight: For data visualization and exploration.
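The event-triggered S3-to-S3 transfer handled by the first Lambda function can be sketched as follows. This is a minimal illustration, not the repository's actual code: it assumes the function is subscribed to `s3:ObjectCreated` notifications on the staging bucket, and the destination bucket name `zillow-processing-bucket` is hypothetical. The extra `s3` parameter exists only so the handler can be tested locally; inside Lambda it falls back to a boto3 client.

```python
from urllib.parse import unquote_plus

# Hypothetical bucket name: the README does not give the real one.
PROCESSING_BUCKET = "zillow-processing-bucket"


def lambda_handler(event, context, s3=None):
    """Copy each newly created object from the staging bucket to the processing bucket.

    `s3` is injectable for local testing; in Lambda it defaults to a boto3 S3 client.
    """
    if s3 is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        s3 = boto3.client("s3")

    copied = []
    for record in event.get("Records", []):
        source_bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded (spaces arrive as '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        s3.copy_object(
            Bucket=PROCESSING_BUCKET,
            Key=key,
            CopySource={"Bucket": source_bucket, "Key": key},
        )
        copied.append(key)
    return {"statusCode": 200, "copied": copied}
```

Keeping the same key in both buckets means the processing bucket mirrors the staging layout, which keeps the next Lambda's input predictable.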

Steps included:

  1. Python Script: Extracts data from Zillow in JSON format and stores it in an S3 bucket.
  2. S3 Bucket (Staging): Stores the initial extracted JSON data.
  3. AWS Lambda Function 1 (Data Transfer): Triggered when new data arrives in the staging S3 bucket; copies the JSON data to the processing S3 bucket.
  4. S3 Bucket (Processing): Holds the JSON data ready for further processing.
  5. AWS Lambda Function 2 (Data Transformation): Triggered when new data arrives in the processing S3 bucket; reads the JSON data, converts it to CSV format, and writes the CSV file to the transformed-data S3 bucket.
  6. S3 Bucket (Transformed Data): Stores the final processed data in CSV format.
  7. Amazon Redshift: Loads the CSV data from the transformed-data S3 bucket for efficient data warehousing and analytics.
  8. Amazon QuickSight: Connects to the Redshift data warehouse to visualize and analyze the Zillow data.
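Step 5 (JSON to CSV) could look roughly like the sketch below. It assumes each JSON file holds either a list of flat property records or a dict with a `results` list; that layout, the column list, and the bucket name `zillow-transformed-bucket` are all hypothetical, since the README does not describe the Zillow payload or the bucket names.

```python
import csv
import io
import json
import posixpath
from urllib.parse import unquote_plus

# Hypothetical values: the README names neither the target bucket nor the columns kept.
TRANSFORMED_BUCKET = "zillow-transformed-bucket"
COLUMNS = ["zpid", "city", "state", "price", "bedrooms", "bathrooms", "livingArea"]


def json_to_csv(raw_json, columns=COLUMNS):
    """Convert a JSON payload into CSV text with a header row.

    Assumes the payload is a list of flat records, or a dict whose
    (hypothetical) "results" key holds that list.
    """
    data = json.loads(raw_json)
    records = data["results"] if isinstance(data, dict) else data
    buffer = io.StringIO()
    # Missing fields become empty cells; fields not listed in `columns` are dropped.
    writer = csv.DictWriter(buffer, fieldnames=columns, restval="", extrasaction="ignore")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def lambda_handler(event, context, s3=None):
    """Read each new JSON object from the processing bucket and write a CSV version."""
    if s3 is None:
        import boto3  # bundled with the AWS Lambda Python runtime
        s3 = boto3.client("s3")

    written = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        # e.g. zillow/data.json -> zillow/data.csv
        csv_key = posixpath.splitext(key)[0] + ".csv"
        s3.put_object(
            Bucket=TRANSFORMED_BUCKET,
            Key=csv_key,
            Body=json_to_csv(raw).encode("utf-8"),
        )
        written.append(csv_key)
    return {"statusCode": 200, "written": written}
```

Keeping `json_to_csv` separate from the handler means the conversion logic can be tested locally without any AWS access.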

Airflow

DAG View

[Screenshot (2024-06-10): Airflow DAG view]

Redshift

The transformed CSV data is loaded into Amazon Redshift.

[Screenshot (2024-06-10): transformed data in Amazon Redshift]
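One common way to load a CSV file from S3 into Redshift is the `COPY` command. The helper below only builds the SQL text; the table name, bucket, key, and IAM role ARN used with it are placeholders, and the README does not say which loading method the project actually uses.

```python
def build_copy_sql(table, bucket, key, iam_role_arn):
    """Build a Redshift COPY statement that loads a headered CSV file from S3.

    All arguments are interpolated as-is, so they must come from trusted
    configuration, never from user input.
    """
    return (
        f"COPY {table}\n"
        f"FROM 's3://{bucket}/{key}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"  # skip the header row written by the CSV transform
    )


# Example with hypothetical identifiers:
print(build_copy_sql(
    "zillow_data",
    "zillow-transformed-bucket",
    "zillow/data.csv",
    "arn:aws:iam::123456789012:role/RedshiftS3Read",
))
```

The resulting statement could then be run from the Redshift query editor or any SQL client connected to the cluster.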

QuickSight

Creating visualizations and dashboards from the Redshift data source.

[Screenshot (2024-06-10): QuickSight visualizations and dashboards]


Guided by: Opeyemi Olanipekun
