
This project collects daily temperature data for Los Angeles from the Open-Meteo API for April through May 2024


bsrikanth24/la-temperature-collection

 
 


Los Angeles Daily Temperature Collection Project

Description

This project collects daily temperature data for Los Angeles from the Open-Meteo API for April through May 2024 and ingests it into an AWS data pipeline. The raw data is stored in AWS S3, transformed and cleaned using AWS Glue, and made available for querying in AWS Athena. A Grafana dashboard visualizes the cleaned data.

Table of Contents

  1. Prerequisites
  2. Architecture
  3. Data Flow
  4. AWS Services Used
  5. Setup
  6. Pipeline
  7. Visualization
  8. Troubleshooting
  9. Acknowledgements

1. Prerequisites

2. Architecture

[Image: LA_Temperature_de_project architecture diagram]

3. Data Flow

  1. Data Ingestion: A Lambda function ingests weather data from the Open-Meteo API and sends it to a Kinesis Data Firehose stream.
  2. Data Storage: Kinesis Data Firehose delivers the data to an S3 bucket.
  3. Data Crawling: AWS Glue crawls the data in S3 to create a table in the AWS Glue Data Catalog.
  4. Data Transformation: AWS Glue jobs transform the data, perform data quality checks, and save the cleaned data as Parquet files in S3.
  5. Data Querying: The transformed data is available for querying in AWS Athena.
  6. Data Visualization: Grafana is used to build a dashboard for visualizing the data.
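The first two stages above can be sketched roughly as follows. This is not the repository's actual Lambda code: the Open-Meteo archive endpoint, the coordinates, the chosen daily variables, the output field names, and the stream name `la-temperature-stream` are all assumptions made for illustration.

```python
import json
import urllib.parse
import urllib.request

# Assumed values for illustration; the repository's Lambda may use different ones.
ARCHIVE_URL = "https://archive-api.open-meteo.com/v1/archive"
STREAM_NAME = "la-temperature-stream"  # hypothetical Firehose stream name


def build_url(start_date, end_date):
    """Build an Open-Meteo request URL for daily Los Angeles temperatures."""
    params = {
        "latitude": 34.05,  # approximate Los Angeles coordinates
        "longitude": -118.24,
        "start_date": start_date,
        "end_date": end_date,
        "daily": "temperature_2m_max,temperature_2m_min",
        "timezone": "America/Los_Angeles",
    }
    return ARCHIVE_URL + "?" + urllib.parse.urlencode(params)


def to_records(payload):
    """Turn the column-oriented 'daily' block into one Firehose record per day."""
    daily = payload["daily"]
    records = []
    for i, day in enumerate(daily["time"]):
        row = {
            "date": day,
            "temp_max": daily["temperature_2m_max"][i],
            "temp_min": daily["temperature_2m_min"][i],
        }
        # A trailing newline keeps rows separable after Firehose concatenates them in S3.
        records.append({"Data": (json.dumps(row) + "\n").encode("utf-8")})
    return records


def lambda_handler(event, context):
    import boto3  # imported here so the helpers above can be tested without AWS

    with urllib.request.urlopen(build_url("2024-04-01", "2024-05-31")) as resp:
        payload = json.load(resp)
    records = to_records(payload)

    firehose = boto3.client("firehose")
    failed = 0
    # PutRecordBatch accepts at most 500 records per call.
    for start in range(0, len(records), 500):
        result = firehose.put_record_batch(
            DeliveryStreamName=STREAM_NAME,
            Records=records[start:start + 500],
        )
        failed += result["FailedPutCount"]
    return {"sent": len(records), "failed": failed}
```

Writing one JSON object per line is a common choice here because it lets the Glue crawler infer a row-per-day schema from the delivered files.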

4. AWS Services Used

  • AWS Lambda: To run the function that ingests data from the Open-Meteo API.
  • AWS Kinesis Data Firehose: To deliver the ingested data to S3.
  • AWS S3: To store raw and transformed data.
  • AWS Glue: To crawl, transform, and clean the data.
  • AWS Athena: To query the transformed data.
  • Grafana: To visualize the data.

5. Setup

  1. AWS Lambda: Deploy the Lambda function LA_weather_lambda_put_record_batch.py, located in the lambda/ directory, using the AWS Lambda Console or CLI.
  2. AWS Kinesis Data Firehose: Create a Kinesis Data Firehose delivery stream to deliver data to your S3 bucket.
    • Example configuration:
      • Source: Direct PUT (the Lambda function writes records to the stream directly)
      • Destination: S3 bucket
      • S3 bucket ARN: arn:aws:s3:::your-bucket-name
  3. AWS Glue:
    • Create a Glue Crawler to crawl the data in your S3 bucket and create a Glue Data Catalog table.
    • Create and run Glue jobs using the scripts in the glue/ directory to transform data and perform data quality checks.
  4. AWS Athena: Configure Athena to query the data stored in your S3 bucket.
  5. Grafana: Set up Grafana to visualize the data.
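If you prefer to script steps 2 and 3 instead of using the console, something along these lines may work. It is only a sketch: the stream, crawler, and database names, the `raw/` prefix, the buffering values, and the two IAM role ARNs are placeholders that do not come from this repository.

```python
def firehose_stream_config(stream_name, bucket_name, role_arn):
    """Keyword arguments for firehose.create_delivery_stream (Direct PUT to S3)."""
    return {
        "DeliveryStreamName": stream_name,
        "DeliveryStreamType": "DirectPut",
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": f"arn:aws:s3:::{bucket_name}",
            "Prefix": "raw/",  # assumed prefix for raw data
            "BufferingHints": {"SizeInMBs": 5, "IntervalInSeconds": 60},
        },
    }


def crawler_config(crawler_name, bucket_name, role_arn, database):
    """Keyword arguments for glue.create_crawler pointed at the raw S3 prefix."""
    return {
        "Name": crawler_name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": f"s3://{bucket_name}/raw/"}]},
    }


def create_pipeline_resources(bucket_name, firehose_role_arn, glue_role_arn):
    """Create the delivery stream and the crawler (requires AWS credentials)."""
    import boto3  # imported lazily so the builders above run without AWS access

    boto3.client("firehose").create_delivery_stream(
        **firehose_stream_config("la-temperature-stream", bucket_name, firehose_role_arn)
    )
    boto3.client("glue").create_crawler(
        **crawler_config("la-temperature-crawler", bucket_name, glue_role_arn, "la_temperature_db")
    )
```

Keeping the configuration in plain functions makes it easy to inspect the exact request before anything is created in your account.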

6. Pipeline

  1. Trigger the Lambda function to start data ingestion.
  2. Verify that the data is being delivered to your S3 bucket via Kinesis Data Firehose.
  3. Run the Glue crawler to update the Glue Data Catalog.
  4. Execute Glue jobs to transform and clean the data.
    • [Image: glue_orchestration]
  5. Query the transformed data in Athena to verify the data quality and structure.
    • [Image: query]
    • [Image: result]
  6. Use Grafana to visualize the data.
    • [Image: grafana query]
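The Athena verification step can also be run from Python. The sketch below assumes a table named `la_temperature_parquet` with `temp_max` and `temp_min` columns; those names, and the quality rule itself, are hypothetical and should be adapted to whatever your Glue jobs actually produce.

```python
import time

# Hypothetical quality check: count rows with missing or inconsistent temperatures.
QUALITY_SQL = """
SELECT COUNT(*) AS bad_rows
FROM la_temperature_parquet
WHERE temp_max IS NULL OR temp_min IS NULL OR temp_max < temp_min
"""


def run_athena_query(athena, sql, database, output_location, poll_seconds=1.0):
    """Start an Athena query and wait until it reaches a terminal state."""
    started = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = started["QueryExecutionId"]
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            return query_id, state
        time.sleep(poll_seconds)


def count_bad_rows(athena, query_id):
    """Read the single COUNT(*) value from a finished quality-check query."""
    result = athena.get_query_results(QueryExecutionId=query_id)
    # Row 0 is the header row; row 1 holds the count.
    return int(result["ResultSet"]["Rows"][1]["Data"][0]["VarCharValue"])
```

Here `athena` would be `boto3.client("athena")`, and `output_location` an S3 path such as `s3://your-bucket-name/athena-results/` that Athena is allowed to write to.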

7. Visualization

[Image: visualization (Grafana dashboard)]

8. Troubleshooting

  • Lambda Function Errors:
    • Check CloudWatch logs for detailed error messages.
    • Verify the IAM role has the necessary permissions.
  • Kinesis Data Firehose Issues:
    • Ensure the Firehose stream is properly configured with the correct S3 bucket.
    • Check Firehose monitoring metrics for delivery failures.
  • Glue Job Failures:
    • Review Glue job logs for errors.
    • Ensure the Glue job script paths and S3 bucket permissions are correct.
  • Athena Query Problems:
    • Verify the Glue Data Catalog table is correctly configured.
    • Check for syntax errors in your SQL queries.
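For the Firehose delivery check, one possible approach is to read the `DeliveryToS3.Success` metric from CloudWatch. The sketch below assumes hourly periods and a 24-hour window; the helper names are made up for this example.

```python
from datetime import datetime, timedelta, timezone


def delivery_metric_query(stream_name, hours=24, now=None):
    """Keyword arguments for cloudwatch.get_metric_statistics on S3 delivery success."""
    end = now or datetime.now(timezone.utc)
    return {
        "Namespace": "AWS/Firehose",
        "MetricName": "DeliveryToS3.Success",
        "Dimensions": [{"Name": "DeliveryStreamName", "Value": stream_name}],
        "StartTime": end - timedelta(hours=hours),
        "EndTime": end,
        "Period": 3600,  # one datapoint per hour
        "Statistics": ["Average"],
    }


def failing_periods(datapoints):
    """Return timestamps of periods where some S3 deliveries failed (average below 1)."""
    return sorted(p["Timestamp"] for p in datapoints if p["Average"] < 1.0)


def check_stream(stream_name):
    """Query CloudWatch and list the hours with failed deliveries (requires AWS credentials)."""
    import boto3  # imported lazily so the helpers above run without AWS access

    cloudwatch = boto3.client("cloudwatch")
    response = cloudwatch.get_metric_statistics(**delivery_metric_query(stream_name))
    return failing_periods(response["Datapoints"])
```

An empty list from `check_stream` only means no failed deliveries were recorded in the window; if no data arrived at all, check the Lambda logs first.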

9. Acknowledgements

Special thanks to David Freitag for his Maven course, Build Your First Serverless Data Engineering Project.

Data source: Open-Meteo weather data API
