This project connects to the OpenSea Stream (WebSocket, non-REST) API to stream NFT transaction data to a Kafka topic.
- OpenSea Kafka Producer (Node.js) [This repo]
- Kafka and Jupyter-Spark Docker Stack
- Kafka Streams App (Java)
- Azure Databricks Data Lakehouse
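A minimal sketch of what this repo's producer does: subscribe to OpenSea Stream events and forward each payload to a Kafka topic. The topic name `nft-transactions`, the broker address, the env var `OPENSEA_API_KEY`, and the `toKafkaMessage` helper are illustrative assumptions, not this repo's actual configuration; the exact `OpenSeaStreamClient` options can differ between SDK versions, so check the Stream API docs.

```javascript
// Flatten an OpenSea stream event into the Kafka message we produce.
// Keying by the NFT id keeps events for the same item in one partition.
function toKafkaMessage(event) {
  return {
    key: event.payload?.item?.nft_id ?? 'unknown',
    value: JSON.stringify(event),
  };
}

// Wire the OpenSea stream to a kafkajs producer. Requires `kafkajs`,
// `@opensea/stream-js`, and `ws` to be installed, plus an OpenSea API key.
async function run() {
  const { Kafka } = require('kafkajs');
  const { OpenSeaStreamClient } = require('@opensea/stream-js');
  const WebSocket = require('ws');

  const kafka = new Kafka({ clientId: 'opensea-producer', brokers: ['localhost:9092'] });
  const producer = kafka.producer();
  await producer.connect();

  const client = new OpenSeaStreamClient({
    token: process.env.OPENSEA_API_KEY,
    // In Node, the SDK needs an explicit WebSocket transport.
    connectOptions: { transport: WebSocket },
  });

  // Subscribe to sale events across all collections ('*' wildcard) and
  // forward each one to the topic.
  client.onItemSold('*', async (event) => {
    await producer.send({ topic: 'nft-transactions', messages: [toKafkaMessage(event)] });
  });
}
```

The transform is kept as a separate pure function so it can be unit-tested without a broker or an API key.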
Setup of goals, basic project structure, timeline, virtual environments and version control.
Get all components connected and the entire pipeline running, in whatever form, as quickly as possible.
All main features implemented. Design Freeze.
Begin refactoring toward OOP and modular code. Build infrastructure for health checks, security, exception handling, and logging where not already in place.
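One shape the exception-handling and logging infrastructure could take is a wrapper around producer sends with bounded retries and structured logs. `sendWithRetry` and its parameters are hypothetical names for illustration, not this repo's API:

```javascript
// Send one message, retrying on failure with exponential backoff and
// emitting a structured JSON log line per failed attempt.
async function sendWithRetry(producer, topic, message, maxRetries = 3) {
  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    try {
      await producer.send({ topic, messages: [message] });
      return true; // delivered
    } catch (err) {
      console.error(JSON.stringify({ level: 'error', topic, attempt, error: err.message }));
      if (attempt === maxRetries) return false; // give up; caller decides (e.g. dead-letter)
      await new Promise((resolve) => setTimeout(resolve, 100 * 2 ** attempt)); // backoff
    }
  }
}
```

Returning `false` instead of throwing keeps the stream callback alive on broker hiccups; whether to drop, buffer, or dead-letter undeliverable events is a design decision for this phase.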
Assessment of 'Data Product' requirements; design and planning of the data transformations needed at each point of the pipeline to deliver the required 'Data Product' to the consumer.
Exploration of other data sources and new data products derived from them, e.g. X/TikTok sentiment data sources feeding a sentiment-analysis front-end.
Patches and red-teaming of the data pipeline's integrity. Cost-benefit analysis of version upgrades.
- Joe Reis and Matt Housley (2022), Fundamentals of Data Engineering. O'Reilly Media, Inc.
- OpenSea Stream API: https://docs.opensea.io/reference/stream-api-overview
- kafkajs: https://github.com/tulios/kafkajs
- ChatGPT: https://chatgpt.com/share/5e2508d4-cece-4ec4-a80a-df81a7251435