Pyflink-poc enables real-time data streaming and processing, seamlessly integrating Apache Flink and Apache Kafka. It empowers users to handle data efficiently with Pandas and asyncio, emphasizing scalability and performance. Ideal for developers seeking lightweight, responsive architectures for large dataset management and concurrent operations in their projects.
| | Feature | Summary |
| :--- | :--- | :--- |
| ⚙️ | **Architecture** | Lightweight, responsive architecture integrating Apache Flink and Apache Kafka for real-time data streaming and processing. |
| 🔩 | **Code Quality** | |
| 📄 | **Documentation** | |
| 🔌 | **Integrations** | Apache Flink, Apache Kafka, aiohttp for asynchronous HTTP, and Apache Avro for serialization. |
| 🧩 | **Modularity** | Source code split into focused modules: `consumer.py`, `alerts_handler.py`, and `logger.py`. |
| 🧪 | **Testing** | Test suite run with `pytest`. |
| ⚡️ | **Performance** | Concurrent operations with asyncio and efficient data handling with Pandas, emphasizing scalability on large datasets. |
| 🛡️ | **Security** | |
| 📦 | **Dependencies** | Python packages managed with pip via `requirements.txt`. |
└── pyflink-poc/
├── README.md
├── conf
│ ├── conf.toml
│ └── flink-config.yaml
├── data
│ └── data.csv
├── requirements.txt
├── scripts
│ ├── clean.sh
│ └── run.sh
├── setup
│ └── setup.sh
├── setup.py
└── src
├── alerts_handler.py
├── consumer.py
└── logger.py
PYFLINK-POC/

**`__root__`**
- `requirements.txt`: Enables real-time data streaming and processing by integrating Apache Flink and Apache Kafka with asynchronous HTTP requests. Facilitates efficient data handling and manipulation using the Pandas library while leveraging asyncio for concurrent operations. The codebase emphasizes scalability and performance in handling large datasets through its lightweight, responsive architecture.
- `setup.py`: Configures project dependencies and packages for STREAM-ON. Sets up packages for documentation, style checking, and testing, enhancing the overall project structure and management.
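Given the libraries named in these summaries, a plausible sketch of what `requirements.txt` might contain; this is an inference, not the actual file, which may list different packages and pinned versions:

```text
# Hypothetical contents inferred from the file summaries in this README;
# the real file may pin different packages and versions.
apache-flink
pandas
aiohttp
avro
pytest
```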
**setup/**
- `setup.sh`: Facilitates setup and configuration of project dependencies and environment variables. Installs Java 11, Python 3.7, and PyFlink, sets environment variables, and creates aliases for zsh, enabling seamless integration and execution of PyFlink within the development environment.
**scripts/**
- `run.sh`: Initiates the Flink cluster, executes the PyFlink job, and terminates the Flink cluster.
- `clean.sh`: Removes backup files, Python caches, build artifacts, Jupyter notebook checkpoints, and the pytest cache from the project directory, keeping the project clutter-free and streamlining development and maintenance.
**conf/**
- `flink-config.yaml`: Defines Flink cluster configuration parameters to ensure optimal resource allocation, fault tolerance, and scalability for distributed data processing.
- `conf.toml`: Defines project-wide configuration constants for the Kafka and Flink services.
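As an illustration of how such constants might be consumed from Python, here is a minimal sketch; the table and key names are assumptions, since the actual contents of `conf.toml` are not shown here:

```python
# Hypothetical sketch: the table and key names below are assumptions,
# not taken from the project's actual conf.toml.
import toml  # third-party package: pip install toml

config = toml.load("conf/conf.toml")

# Assumed layout: one TOML table per service.
KAFKA_BOOTSTRAP_SERVERS = config["kafka"]["bootstrap_servers"]
KAFKA_TOPIC = config["kafka"]["topic"]
FLINK_CHECKPOINT_INTERVAL_MS = config["flink"]["checkpoint_interval_ms"]
```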
**src/**
- `alerts_handler.py`: Handles sending alerts to an API in batches using asyncio and aiohttp. Serializes alerts using Apache Avro before sending them to the designated API endpoint, and buffers alerts so they are sent in batches once a threshold is reached.
- `logger.py`: The Logger class provides structured logging for the project, enabling different log levels and colored output. It ensures effective logging of critical information, warnings, and errors, facilitating debugging and monitoring across the system.
- `consumer.py`: Implements data stream processing with Apache Flink and Python, orchestrating streaming and batch data comparisons for anomaly detection. Manages state and fault tolerance through checkpointing and processes flagged records to trigger alerts, enhancing the real-time monitoring system.
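To make the buffering pattern in `alerts_handler.py` concrete, here is a minimal sketch of batch-then-flush dispatch with Avro serialization; the endpoint, threshold, and schema are illustrative assumptions, not the module's actual values:

```python
# Minimal sketch of batched alert dispatch; API_ENDPOINT, BATCH_SIZE,
# and the Avro schema are assumptions, not the module's actual values.
import io

import aiohttp
from avro.io import BinaryEncoder, DatumWriter
from avro.schema import parse

API_ENDPOINT = "https://example.com/alerts"  # hypothetical endpoint
BATCH_SIZE = 10                              # hypothetical flush threshold

ALERT_SCHEMA = parse(
    '{"type": "record", "name": "Alert",'
    ' "fields": [{"name": "message", "type": "string"}]}'
)

_buffer = []


def serialize(alert: dict) -> bytes:
    """Encode one alert record to Avro binary."""
    out = io.BytesIO()
    DatumWriter(ALERT_SCHEMA).write(alert, BinaryEncoder(out))
    return out.getvalue()


async def send_batch(session: aiohttp.ClientSession, batch: list) -> None:
    """POST a serialized batch to the alerts API."""
    payload = b"".join(serialize(alert) for alert in batch)
    async with session.post(API_ENDPOINT, data=payload) as resp:
        resp.raise_for_status()


async def buffer_alert(session: aiohttp.ClientSession, alert: dict) -> None:
    """Buffer an alert and flush the batch once the threshold is reached."""
    _buffer.append(alert)
    if len(_buffer) >= BATCH_SIZE:
        batch = _buffer.copy()
        _buffer.clear()
        await send_batch(session, batch)
```

Similarly, a rough sketch of the checkpointed PyFlink pipeline that `consumer.py` describes, with an in-memory stand-in source and a hypothetical threshold rule for flagging anomalies:

```python
# Rough sketch of a checkpointed PyFlink job; the in-memory source and
# the value > THRESHOLD rule stand in for the real Kafka source and
# anomaly-detection logic.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

THRESHOLD = 100.0  # hypothetical anomaly cutoff

env = StreamExecutionEnvironment.get_execution_environment()
env.enable_checkpointing(10_000)  # checkpoint every 10 s for fault tolerance

# Stand-in source; the real job consumes records from Kafka.
records = env.from_collection(
    [("sensor-1", 42.0), ("sensor-2", 140.0)],
    type_info=Types.TUPLE([Types.STRING(), Types.FLOAT()]),
)

# Flag records that breach the threshold; the real job would hand these
# to alerts_handler for batched dispatch.
flagged = records.filter(lambda record: record[1] > THRESHOLD)
flagged.print()

env.execute("anomaly-detection-sketch")
```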
Before getting started with pyflink-poc, ensure your runtime environment meets the following requirements:
- Programming Language: Python
- Package Manager: Pip
Install pyflink-poc using one of the following methods:
Build from source:
- Clone the pyflink-poc repository:
❯ git clone https://github.com/eli64s/pyflink-poc
- Navigate to the project directory:
❯ cd pyflink-poc
- Install the project dependencies:
❯ pip install -r requirements.txt
Run pyflink-poc using the following command:
Using pip
❯ python {entrypoint}
Run the test suite using the following command:
Using pip
❯ pytest
- **Task 1**: Implement feature one.
- **Task 2**: Implement feature two.
- **Task 3**: Implement feature three.
- 💬 **Join the Discussions**: Share your insights, provide feedback, or ask questions.
- 🐛 **Report Issues**: Submit bugs found or log feature requests for the `pyflink-poc` project.
- 💡 **Submit Pull Requests**: Review open PRs, and submit your own PRs.
Contributing Guidelines
- Fork the Repository: Start by forking the project repository to your GitHub account.
- Clone Locally: Clone the forked repository to your local machine using a git client.
git clone https://github.com/eli64s/pyflink-poc
- Create a New Branch: Always work on a new branch, giving it a descriptive name.
git checkout -b new-feature-x
- Make Your Changes: Develop and test your changes locally.
- Commit Your Changes: Commit with a clear message describing your updates.
git commit -m 'Implemented new feature x.'
- Push to GitHub: Push the changes to your forked repository.
git push origin new-feature-x
- Submit a Pull Request: Create a PR against the original project repository. Clearly describe the changes and their motivations.
- Review: Once your PR is reviewed and approved, it will be merged into the main branch. Congratulations on your contribution!
This project is protected under the SELECT-A-LICENSE License. For more details, refer to the LICENSE file.
- List any resources, contributors, inspiration, etc. here.