Watchdog is an ETL pipeline that tracks the subdomains of specified domains in real time. Its goal is to detect new subdomains as soon as they appear and alert the user immediately. To do this, it combines multiprocessing for fast subdomain generation, Kafka for reliable data streaming, MongoDB for flexible and scalable subdomain storage, PySpark for subdomain processing, and Airflow for workflow management and task coordination. The Telegram notification feature delivers real-time alerts, so potential security threats can be handled quickly. The project is aimed at security professionals, system administrators, and anyone else who needs to monitor the subdomains of specified domains in real time.
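Detecting a "new" subdomain comes down to comparing each freshly discovered batch with the set already tracked (which Watchdog keeps in MongoDB). Below is a minimal sketch of that comparison. It assumes subdomains arrive as plain hostname strings, and `find_new_subdomains` is a hypothetical helper, not a function from the Watchdog codebase:

```python
def find_new_subdomains(discovered, known):
    """Return the subdomains in `discovered` that are not already in `known`.

    Hypothetical helper. Hostnames are compared case-insensitively (DNS names
    are not case-sensitive) and a trailing dot is ignored. The result is sorted
    so that alerts go out in a stable order.
    """
    def normalize(name):
        return name.strip().rstrip(".").lower()

    known_set = {normalize(n) for n in known}
    return sorted({normalize(n) for n in discovered} - known_set)


if __name__ == "__main__":
    tracked = ["www.example.com", "api.example.com"]
    batch = ["WWW.example.com", "dev.example.com.", "api.example.com", "mail.example.com"]
    print(find_new_subdomains(batch, tracked))
    # prints ['dev.example.com', 'mail.example.com']
```

Using a set difference keeps each comparison proportional to the batch size. In a real deployment, the "known" side would be read from the datastore rather than passed in as a list.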
- Efficient Subdomain Generation: Watchdog leverages multiprocessing to generate subdomains quickly and accurately, optimizing performance.
- Real-time Streaming: The pipeline integrates Kafka to provide seamless and reliable data streaming, ensuring up-to-date information.
- Scalable Storage: Watchdog utilizes MongoDB as its storage solution, enabling flexible and scalable management of subdomains.
- Advanced Subdomain Processing and Security Scanning: Watchdog uses PySpark to process and analyze subdomains efficiently, which allows for sophisticated data manipulation. Watchdog also provides subdomain scanning, which gives a more complete picture of the subdomains and their associated IP addresses and can help identify potential security threats.
- Robust Orchestration: Watchdog employs Airflow for effective workflow management and task coordination, ensuring smooth execution.
- Telegram Notification: Watchdog supports sending notifications to a Telegram channel or group when a new subdomain is found. This feature allows for real-time alerts and quick response to potential security threats.
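Sending an alert takes one call to the Telegram Bot API's `sendMessage` method, which needs a `chat_id` and the message `text`. The sketch below shows that call using only the standard library. The bot token, chat ID, message wording, and the function names `build_alert_request` and `send_alert` are placeholders for illustration, not Watchdog's actual notifier:

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

TELEGRAM_API = "https://api.telegram.org"


def build_alert_request(bot_token, chat_id, subdomain, domain):
    """Build (but do not send) a Bot API sendMessage request for one new subdomain.

    Hypothetical helper: the message format is an illustrative choice.
    """
    text = f"New subdomain found for {domain}: {subdomain}"
    body = urlencode({"chat_id": chat_id, "text": text}).encode("utf-8")
    return Request(
        f"{TELEGRAM_API}/bot{bot_token}/sendMessage",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def send_alert(bot_token, chat_id, subdomain, domain, timeout=10):
    """Send the alert over the network and return Telegram's decoded JSON reply."""
    req = build_alert_request(bot_token, chat_id, subdomain, domain)
    with urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

The request is built separately from sending it, so the message can be inspected or tested without a real bot token or network access.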
- Apache Airflow - Workflow management and task scheduling.
- Apache Spark - Fast and distributed data processing.
- Apache Kafka - Distributed streaming platform.
- MongoDB - Scalable NoSQL database.
- Kafka producer: Sends subdomains to the specified Kafka topic.
- Kafka consumer: A Spark Streaming consumer reads subdomains from Kafka and stores them in MongoDB.
- MongoDB: A snapshot of the MongoDB collection shows the subdomains that have been tracked.
- Airflow: The Airflow DAG logs show the status and progress of the ETL pipeline.
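The producer and consumer must agree on a message format. The sketch below uses a hypothetical JSON schema (`domain`, `subdomain`, `discovered_at`) encoded as UTF-8 bytes. Watchdog's real messages may be structured differently, and `make_record`, `serialize_record`, and `deserialize_record` are illustrative names:

```python
import json
from datetime import datetime, timezone


def make_record(domain, subdomain, discovered_at=None):
    """Build one discovery record (hypothetical schema) with an ISO-8601 UTC timestamp."""
    if discovered_at is None:
        discovered_at = datetime.now(timezone.utc)
    return {
        "domain": domain,
        "subdomain": subdomain,
        "discovered_at": discovered_at.isoformat(),
    }


def serialize_record(record):
    """Encode a record dict as UTF-8 JSON bytes, the form a Kafka message value takes."""
    return json.dumps(record, sort_keys=True).encode("utf-8")


def deserialize_record(raw):
    """Decode bytes produced by serialize_record back into a dict."""
    return json.loads(raw.decode("utf-8"))
```

With kafka-python, for example, `KafkaProducer(value_serializer=serialize_record)` would apply the encoder to every value passed to `send()`. A Spark Structured Streaming consumer would instead parse the same JSON from the Kafka source's binary `value` column.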
Follow these steps to get a local copy of Watchdog up and running.
Before you can use this project, you'll need to have the following installed on your machine:
- Python 3.10 or later
- Docker
- Docker Compose
- Airflow
If any of these are missing, follow the official installation instructions for each tool.
Once you have these tools installed, you'll be ready to use this project.
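You can run a quick check before setup to confirm the prerequisites are in place. The sketch below treats Python 3.10 as the minimum version and looks for the `docker` and `airflow` executables on `PATH`. Docker Compose v2 ships as the `docker compose` subcommand rather than a separate binary, so the sketch does not check for it. The function name `missing_prerequisites` is hypothetical:

```python
import shutil
import sys

REQUIRED_PYTHON = (3, 10)
# Executables expected on PATH (assumed list; Compose v2 is a `docker` subcommand).
REQUIRED_TOOLS = ("docker", "airflow")


def missing_prerequisites(version_info=sys.version_info, which=shutil.which):
    """Return human-readable problems; an empty list means everything was found.

    `version_info` and `which` are parameters so the check can be exercised
    without changing the real interpreter or PATH.
    """
    problems = []
    major, minor = tuple(version_info[:2])
    if (major, minor) < REQUIRED_PYTHON:
        problems.append(
            f"Python {REQUIRED_PYTHON[0]}.{REQUIRED_PYTHON[1]}+ required, found {major}.{minor}"
        )
    for tool in REQUIRED_TOOLS:
        if which(tool) is None:
            problems.append(f"'{tool}' not found on PATH")
    return problems


if __name__ == "__main__":
    for issue in missing_prerequisites():
        print("missing:", issue)
```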
- Clone the repository:
git clone https://github.com/AmirAflak/WatchDog.git
- Navigate to the project directory:
cd WatchDog/
- Set targets in configs.py:
TARGETS=['caterpillar.com', 'url.com']
- Install the required packages:
make install
- Initialize Docker Compose:
make docker
- Initialize the Spark streaming consumer:
make consumer
- Initialize the Airflow scheduler:
make scheduler
- Initialize the Airflow webserver GUI:
make webserver
- To stop the Docker Compose containers, run:
make stop
That's it! You should now be able to use the project.
Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.
If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!
- Fork the Project
- Create your Feature Branch:
git checkout -b feature/AmazingFeature
- Commit your Changes:
git commit -m 'Add some AmazingFeature'
- Push to the Branch:
git push origin feature/AmazingFeature
- Open a Pull Request
Distributed under the MIT License. See LICENSE.txt for more information.