jacobeturpin/aws-nlp-data-pipeline

AWS NLP Data Pipeline

Ingest real-time streaming text data with automatic appending of NLP metadata

[Figures: Architecture diagram; Kibana dashboard]

Overview

This project is a mostly serverless data engineering architecture that ingests real-time streaming data and automatically appends NLP metadata using managed AWS services. It can serve as a baseline for more complex ingestion pipelines that power NLP services.

The following AWS services are leveraged:

Deployment

This project uses GitHub Actions for its CI/CD pipeline. If you fork the repository, you can deploy through your own Actions by adding the following Secrets to your repository:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_REGION_ID
  • IP_ADDRESS
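
A missing secret usually only shows up as a failure partway through a deployment, so an early check can save a round trip. The following is a minimal sketch, not part of this repository: it assumes the workflow exposes the four secrets above as environment variables under the same names, and the `missing_secrets` helper is a hypothetical name introduced for illustration.

```python
from typing import List, Mapping

# Secret names taken from the list above. Exposing them to the process as
# environment variables with identical names is an assumption of this sketch.
REQUIRED_SECRETS = (
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "AWS_REGION_ID",
    "IP_ADDRESS",
)


def missing_secrets(env: Mapping[str, str]) -> List[str]:
    """Return the required names that are absent or empty in `env`."""
    return [name for name in REQUIRED_SECRETS if not env.get(name)]
```

Calling `missing_secrets(os.environ)` at the start of a deploy step and failing when the result is non-empty would surface a misconfigured fork before any AWS call is made.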

Example

A dataset for demonstration purposes has been provided. Use the following script to send example data to the Ingest Lambda for processing.

python stream.py
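
For readers adapting the example to their own data, here is a rough sketch of how a script like this could push records to a Lambda function. It is not the contents of `stream.py`: the `{"text": ...}` payload shape, the helper names, and the function name passed to `make_lambda_invoker` are all assumptions made for illustration. The only library call used is boto3's `Lambda.Client.invoke`.

```python
import json
from typing import Callable, Iterable, Iterator


def build_payloads(lines: Iterable[str]) -> Iterator[bytes]:
    """Turn raw text lines into JSON payloads, skipping blank lines.

    The {"text": ...} shape is hypothetical; match it to whatever the
    Ingest Lambda actually expects.
    """
    for line in lines:
        text = line.strip()
        if text:
            yield json.dumps({"text": text}).encode("utf-8")


def send_all(payloads: Iterable[bytes], invoke: Callable[[bytes], None]) -> int:
    """Pass each payload to `invoke` and return how many were sent."""
    count = 0
    for payload in payloads:
        invoke(payload)
        count += 1
    return count


def make_lambda_invoker(function_name: str) -> Callable[[bytes], None]:
    """Return a callable that asynchronously invokes the named Lambda."""
    import boto3  # imported lazily so the helpers above work without AWS access

    client = boto3.client("lambda")

    def invoke(payload: bytes) -> None:
        client.invoke(
            FunctionName=function_name,
            InvocationType="Event",  # asynchronous: do not wait for the result
            Payload=payload,
        )

    return invoke
```

Keeping the sender as a plain callable means the payload logic can be exercised locally, for example with `send_all(build_payloads(lines), print)`, before pointing it at a deployed function through `make_lambda_invoker`.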
