Effective log management is essential to operating a microservices architecture. This project streamlines the collection, storage, and analysis of logs generated by multiple services, making it faster to trace application behavior and identify errors. Each log entry carries relevant metadata, and logs are ingested and queryable in real time, which improves operational visibility and supports proactive responses to potential issues. The resulting distributed logging framework improves resilience and maintainability in a dynamic application landscape.
- Microservices (Processes): Distributed nodes that independently generate logs and send heartbeat signals so that their status can be monitored.
- Log Accumulator: Collects log data from each node, structures it, and forwards it to the Pub-Sub model for centralized log management.
- Pub-Sub Model: Acts as a communication layer, facilitating reliable, asynchronous distribution of logs.
- Log Storage: A system for indexing and storing logs in a searchable format for easy access and monitoring.
- Alerting System: Listens for specific log levels (e.g., ERROR, FATAL) in real time and generates alerts so that critical events get a prompt response.
- Heartbeat Mechanism: Provides failure detection by alerting when a node stops sending heartbeats, signaling that the node may have failed.
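
The failure detection described for the Heartbeat Mechanism can be sketched as a small in-memory tracker. This is a minimal illustration, not the project's implementation: the class name `HeartbeatMonitor`, the `timeout_seconds` default, and the injectable `clock` parameter are all assumptions introduced here.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per node and reports nodes that have gone silent."""

    def __init__(self, timeout_seconds=10.0, clock=time.monotonic):
        # timeout_seconds is an assumed value; tune it to the heartbeat interval.
        self.timeout_seconds = timeout_seconds
        self._clock = clock
        self._last_seen = {}

    def record_heartbeat(self, node_id):
        """Store the arrival time of a heartbeat from node_id."""
        self._last_seen[node_id] = self._clock()

    def failed_nodes(self):
        """Return the sorted ids of nodes whose last heartbeat is older than the timeout."""
        now = self._clock()
        return sorted(
            node_id
            for node_id, seen in self._last_seen.items()
            if now - seen > self.timeout_seconds
        )
```

A consumer of heartbeat messages would call `record_heartbeat` for each message it receives and poll `failed_nodes` periodically to decide when to raise an alert. Using a monotonic clock avoids false alarms when the wall clock is adjusted.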
- Programming Language: Python
- Log Accumulator: Fluentd
- Pub-Sub Model: Apache Kafka
- Log Storage: Elasticsearch
- Visualization: Kibana
- INFO: General system operations information.
- WARN: Indications of potential issues.
- ERROR: Errors that occurred but still allow the service to continue operating.
- FATAL: Critical issues requiring immediate attention.
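
One way the Alerting System could use these levels is to filter incoming messages on their `log_level` field. The sketch below assumes messages arrive as JSON strings; the names `ALERT_LEVELS`, `should_alert`, and `alerts_from` are hypothetical, and the choice of ERROR and FATAL as alert levels follows the examples given for the Alerting System.

```python
import json

# Levels that trigger an alert (assumed set, based on the ERROR/FATAL examples).
ALERT_LEVELS = {"ERROR", "FATAL"}


def should_alert(log_entry):
    """Return True when a parsed log entry carries an alert-worthy level."""
    return (
        log_entry.get("message_type") == "LOG"
        and log_entry.get("log_level") in ALERT_LEVELS
    )


def alerts_from(raw_messages):
    """Yield parsed entries that should raise an alert, skipping invalid JSON."""
    for raw in raw_messages:
        try:
            entry = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed input is ignored in this sketch
        if isinstance(entry, dict) and should_alert(entry):
            yield entry
```

In a real deployment `raw_messages` would be whatever iterable the Pub-Sub consumer exposes; here it can be any iterable of strings.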
Example Structures:
- Microservice Registration Message:
{
  "node_id": "<unique node-id>",
  "message_type": "REGISTRATION",
  "service_name": "PaymentService",
  "timestamp": "<timestamp>"
}
- INFO Log:
{
  "log_id": "<unique log-id>",
  "node_id": "<node-id>",
  "log_level": "INFO",
  "message_type": "LOG",
  "message": "<Log Message>",
  "service_name": "<ServiceName>",
  "timestamp": "<timestamp>"
}
- WARN Log:
{
  "log_id": "<unique log-id>",
  "node_id": "<node-id>",
  "log_level": "WARN",
  "message_type": "LOG",
  "message": "",
  "service_name": "<ServiceName>",
  "response_time_ms": "",
  "threshold_limit_ms": "",
  "timestamp": "<timestamp>"
}
- ERROR Log:
{
  "log_id": "<unique log-id>",
  "node_id": "<node-id>",
  "log_level": "ERROR",
  "message_type": "LOG",
  "message": "",
  "service_name": "<ServiceName>",
  "error_details": {
    "error_code": "",
    "error_message": ""
  },
  "timestamp": "<timestamp>"
}
- Heartbeat Message:
{
  "node_id": "<node-id>",
  "message_type": "HEARTBEAT",
  "status": "UP/DOWN",
  "timestamp": "<timestamp>"
}
- Microservice Registry:
{
  "message_type": "REGISTRATION",
  "node_id": "<node-id>",
  "service_name": "<ServiceName>",
  "status": "UP/DOWN",
  "timestamp": "<timestamp>"
}
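
Since the project is written in Python, the structures above could be produced by a few small helper functions. The following is a sketch under stated assumptions: the function names are hypothetical, `log_id` is generated with `uuid4`, and the timestamp is rendered as ISO 8601 in UTC, none of which the templates above specify.

```python
import uuid
from datetime import datetime, timezone

LOG_LEVELS = ("INFO", "WARN", "ERROR", "FATAL")


def _timestamp():
    # ISO 8601 in UTC is an assumption; the templates only show "<timestamp>".
    return datetime.now(timezone.utc).isoformat()


def registration_message(node_id, service_name):
    """Build the message a microservice sends when it registers."""
    return {
        "node_id": node_id,
        "message_type": "REGISTRATION",
        "service_name": service_name,
        "timestamp": _timestamp(),
    }


def log_message(node_id, service_name, level, message, **extra):
    """Build a LOG message; extra holds level-specific fields such as error_details."""
    if level not in LOG_LEVELS:
        raise ValueError(f"unknown log level: {level!r}")
    entry = {
        "log_id": str(uuid.uuid4()),  # assumed id scheme
        "node_id": node_id,
        "log_level": level,
        "message_type": "LOG",
        "message": message,
        "service_name": service_name,
    }
    entry.update(extra)
    entry["timestamp"] = _timestamp()
    return entry


def heartbeat_message(node_id, status="UP"):
    """Build a heartbeat message with status UP or DOWN."""
    if status not in ("UP", "DOWN"):
        raise ValueError(f"unknown status: {status!r}")
    return {
        "node_id": node_id,
        "message_type": "HEARTBEAT",
        "status": status,
        "timestamp": _timestamp(),
    }
```

Each helper returns a plain dictionary, so `json.dumps(...)` yields a string in the shape shown above. A WARN entry would pass `response_time_ms` and `threshold_limit_ms` as keyword arguments, and an ERROR entry would pass `error_details`.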