Monitoring, tracing, observability in DevOps #8

Open
monperrus opened this issue Jul 2, 2018 · 70 comments
Labels
topic DevOps relevant topics

Comments

@monperrus
Member Author

See also icinga (thanks to @henriklb for the suggestion)

@monperrus changed the title from "Infrastructure monitoring" to "Infrastructure & application monitoring" on Sep 18, 2018
@MatsJonsson

We've found Istio (https://istio.io/) to be increasingly useful in this context. KubeSpy (https://github.com/pulumi/kubespy) is an excellent tool for troubleshooting and diagnosing Kubernetes deployments.

@lsc

lsc commented Oct 11, 2018

@MatsJonsson

+1 for Prometheus

@bittermandel

Sentry for Error Reporting. https://sentry.io/welcome/

@monperrus changed the title from "Infrastructure & application monitoring" to "Infrastructure & application monitoring, distributed tracing" on Oct 26, 2018
@monperrus
Member Author

monperrus commented Oct 26, 2018

@monperrus
Member Author

See also Runtime application self-protection #18 (comment)

@monperrus
Member Author

Analytics

@monperrus
Member Author

Tools and Benchmarks for Automated Log Parsing.
http://arxiv.org/abs/1811.03509
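
To make "log parsing" concrete: a parser turns raw log lines into templates, with the variable parts replaced by a wildcard. The following is only a minimal sketch of that idea, assuming a naive regex-masking heuristic; it is not one of the tools benchmarked in the paper, and the patterns and sample log lines are made up for illustration.

```python
import re
from collections import Counter

# Tokens treated as variable. These patterns are illustrative assumptions;
# order matters (IPv4 addresses must be masked before plain integers).
_VARIABLE_PATTERNS = [
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}\b"),  # IPv4 addresses
    re.compile(r"\b0x[0-9a-fA-F]+\b"),           # hex literals
    re.compile(r"\b\d+\b"),                      # plain integers
]


def to_template(line: str) -> str:
    """Replace variable-looking tokens with the wildcard <*>."""
    for pattern in _VARIABLE_PATTERNS:
        line = pattern.sub("<*>", line)
    return line


def group_by_template(lines):
    """Count how many log lines map to each template."""
    return Counter(to_template(line) for line in lines)


# Hypothetical log lines, for illustration only.
logs = [
    "Connection from 10.0.0.1 closed after 35 ms",
    "Connection from 10.0.0.7 closed after 120 ms",
    "Disk usage at 91 percent",
]
templates = group_by_template(logs)
```

Here the two connection lines collapse into the single template `Connection from <*> closed after <*> ms`, which is the kind of grouping that downstream log analysis relies on.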

@monperrus
Member Author

Does the Fault Reside in a Stack Trace? Assisting Crash Localization by Predicting Crashing Fault Residence
https://www.sciencedirect.com/science/article/pii/S0164121218302401

@monperrus
Member Author

Good dashboards are essential in DevOps; see Kibana, for example.

@monperrus
Member Author

Made in Alibaba: https://github.com/alibaba/Sentinel

@monperrus
Member Author

monperrus commented Feb 22, 2019

JVM Profiler Sending Metrics to Kafka (https://kafka.apache.org/), Console Output or Custom Reporter
https://github.com/uber-common/jvm-profiler

@monperrus
Member Author

Time-series database to store monitoring data
https://en.wikipedia.org/wiki/Time_series_database
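
As a rough illustration of what such a database does at its core, here is a toy in-memory sketch: samples are kept sorted by timestamp per series, so a time-range query becomes two binary searches. The class and method names are hypothetical, and real time-series databases add compression, retention, and persistence on top of this.

```python
import bisect
from collections import defaultdict


class TimeSeriesStore:
    """Toy in-memory store: one sorted list of (timestamp, value) per series."""

    def __init__(self):
        self._series = defaultdict(list)

    def append(self, name, timestamp, value):
        # insort keeps samples ordered by timestamp even if they arrive late.
        bisect.insort(self._series[name], (timestamp, value))

    def query(self, name, start, end):
        """Return samples with start <= timestamp <= end."""
        samples = self._series.get(name, [])
        lo = bisect.bisect_left(samples, (start,))
        hi = bisect.bisect_right(samples, (end, float("inf")))
        return samples[lo:hi]


store = TimeSeriesStore()
store.append("cpu", 10, 0.5)
store.append("cpu", 30, 0.9)
store.append("cpu", 20, 0.7)  # out-of-order arrival
window = store.query("cpu", 10, 20)
```

In this usage, `window` holds the samples at timestamps 10 and 20, in order, even though the sample at 20 was appended last.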

@monperrus
Member Author

Prometheus - Monitoring system & time series database
https://prometheus.io/

@monperrus
Member Author

Netflix Zuul is a gateway service that provides dynamic routing, monitoring, resiliency, security, and more.
https://github.com/Netflix/zuul

@monperrus
Member Author

OpenTracing
https://opentracing.io/

@monperrus
Member Author

Sensu is a free and open source monitoring tool that handles cloud environments. Sensu allows you to monitor servers, services, application health, and business KPIs.
https://xebialabs.com/technology/sensu/

@bbaudry
Collaborator

bbaudry commented Mar 5, 2019

Provenance analysis tools

@monperrus
Member Author

Framework for instruction-level tracing and analysis of program executions
http://static.usenix.org/event/vee06/full_papers/p154-bhansali.pdf

@monperrus
Member Author

DevOps Metrics
https://queue.acm.org/detail.cfm?id=3182626

@monperrus
Member Author

Dapper, a large-scale distributed systems tracing infrastructure at Google
http://research.google.com/pubs/pub36356.html
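
The central data model in Dapper-style tracing is a tree of spans: every span carries the id of its trace, its own id, and the id of its parent. A minimal sketch of that model follows; all names are hypothetical, and the sequential ids are an assumption made for readability (production tracers typically use random ids).

```python
import itertools
import time
from dataclasses import dataclass, field
from typing import Optional

# Sequential ids keep the example readable; this is an assumption of the sketch.
_ids = itertools.count(1)


@dataclass
class Span:
    trace_id: int
    span_id: int
    parent_id: Optional[int]
    name: str
    start: float = field(default_factory=time.monotonic)
    end: Optional[float] = None

    def child(self, name):
        """Create a span in the same trace whose parent is this span."""
        return Span(self.trace_id, next(_ids), self.span_id, name)

    def finish(self):
        self.end = time.monotonic()


def start_trace(name):
    """Create a root span; here it reuses its own id as the trace id."""
    root_id = next(_ids)
    return Span(trace_id=root_id, span_id=root_id, parent_id=None, name=name)


# Hypothetical request: a root span with one child for a database call.
root = start_trace("GET /checkout")
db = root.child("db.query")
db.finish()
root.finish()
```

Because `db` shares `root.trace_id` and points at `root.span_id` as its parent, a collector can later rebuild the call tree from the flat list of spans.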

@monperrus
Member Author

Contemporary Software Monitoring: A Systematic Literature Review
https://arxiv.org/abs/1912.05878

@gluckzhang
Collaborator

A curated list of Chaos Engineering resources.
https://github.com/dastergon/awesome-chaos-engineering/

@gluckzhang
Collaborator

Gartner anticipates that 40% of organizations will implement chaos engineering practices as part of DevOps initiatives by 2023, reducing unplanned downtime by 20%.

https://www.gartner.com/smarterwithgartner/the-io-leaders-guide-to-chaos-engineering/

@monperrus
Member Author

Contemporary Software Monitoring: A Systematic Mapping Study.
http://arxiv.org/pdf/1912.05878

@monperrus
Member Author

Cilium - eBPF-based Networking, Observability, and Security
Cilium's control plane is highly optimized, running in Kubernetes clusters of up to 5K nodes and 100K pods.
https://cilium.io/

@monperrus
Member Author

Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. Can be used for monitoring events. Can be bridged with MQTT.
https://aws.amazon.com/kinesis/data-streams/

@monperrus
Member Author

Micrometer provides a simple facade over the instrumentation clients for the most popular monitoring systems, allowing you to instrument your JVM-based application code without vendor lock-in. Think SLF4J, but for metrics.

Can be used to feed Prometheus.

https://micrometer.io/
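
To illustrate the facade idea (this is not Micrometer's API, which is Java; every name below is hypothetical), here is a small Python sketch in which application code only talks to a registry, and the monitoring backends behind it can be swapped without touching that code.

```python
from abc import ABC, abstractmethod


class MeterBackend(ABC):
    """What a monitoring backend must implement in this sketch."""

    @abstractmethod
    def record_count(self, name, amount):
        ...


class InMemoryBackend(MeterBackend):
    """Keeps running totals in a dict."""

    def __init__(self):
        self.counts = {}

    def record_count(self, name, amount):
        self.counts[name] = self.counts.get(name, 0) + amount


class TextLineBackend(MeterBackend):
    """Formats every increment as a text line, as an exporter might."""

    def __init__(self):
        self.lines = []

    def record_count(self, name, amount):
        self.lines.append(f"{name} count={amount}")


class MeterRegistry:
    """The facade: fans each measurement out to all configured backends."""

    def __init__(self, *backends):
        self._backends = backends

    def increment(self, name, amount=1):
        for backend in self._backends:
            backend.record_count(name, amount)


def handle_order(registry):
    # Application code depends only on the facade, never on a vendor client.
    registry.increment("orders.processed")


memory = InMemoryBackend()
text = TextLineBackend()
registry = MeterRegistry(memory, text)
handle_order(registry)
handle_order(registry)
```

Swapping `TextLineBackend` for another backend changes only the line that builds the registry, which is the lock-in argument in miniature.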

@gluckzhang
Collaborator

Prometheus client libraries (including both official ones and many third-party ones) can be found here: https://prometheus.io/docs/instrumenting/clientlibs/
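
For intuition about what these client libraries produce, here is a hand-rolled sketch of a counter rendered in the Prometheus text exposition format. It deliberately does not use any official client library; the class name `SketchCounter` is hypothetical, label values are assumed to be plain strings, and escaping of label values is omitted.

```python
class SketchCounter:
    """Minimal labelled counter that renders Prometheus-style text output."""

    def __init__(self, name, help_text):
        self.name = name
        self.help_text = help_text
        self._values = {}  # sorted tuple of (label, value) pairs -> count

    def inc(self, amount=1, **labels):
        if amount < 0:
            raise ValueError("counters can only increase")
        key = tuple(sorted(labels.items()))
        self._values[key] = self._values.get(key, 0) + amount

    def render(self):
        lines = [
            f"# HELP {self.name} {self.help_text}",
            f"# TYPE {self.name} counter",
        ]
        for key, value in sorted(self._values.items()):
            label_str = ",".join(f'{k}="{v}"' for k, v in key)
            suffix = f"{{{label_str}}}" if label_str else ""
            lines.append(f"{self.name}{suffix} {value}")
        return "\n".join(lines) + "\n"


requests_total = SketchCounter("http_requests_total", "Total HTTP requests.")
requests_total.inc(method="GET")
requests_total.inc(2, method="GET")
requests_total.inc(method="POST")
exposition = requests_total.render()
```

A real client library would serve this text over HTTP for the Prometheus server to scrape, and would also handle escaping, thread safety, and other metric types.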

@monperrus
Member Author

Paper: "Enjoy your observability: an industrial survey of microservice tracing and analysis" http://link.springer.com/10.1007/s10664-021-10063-9

@bbaudry
Collaborator

bbaudry commented Apr 20, 2022

Sampler is a tool for shell command execution, visualization and alerting.
Configured with a simple YAML file.
https://sampler.dev/

@monperrus
Member Author

Stagemonitor is a Java monitoring agent that tightly integrates with time series databases like Elasticsearch, Graphite and InfluxDB to analyze graphed metrics, and with Kibana to analyze requests and call stacks.

https://github.com/stagemonitor/stagemonitor

cc/ @gluckzhang

@bbaudry
Collaborator

bbaudry commented May 6, 2022

@bbaudry
Collaborator

bbaudry commented May 6, 2022

@bbaudry
Collaborator

bbaudry commented May 10, 2022

Zabbix is an open source monitoring solution for network monitoring and application monitoring of millions of metrics.
https://www.zabbix.com/

@monperrus
Member Author

strace is a diagnostic, debugging and instructional userspace utility for Linux. It is used to monitor and tamper with interactions between processes and the Linux kernel, which include system calls, signal deliveries, and changes of process state.
https://strace.io/

@monperrus
Member Author

Let's Trace It: Fine-Grained Serverless Benchmarking using Synchronous and Asynchronous Orchestrated Applications
https://arxiv.org/pdf/2205.07696.pdf

@monperrus
Member Author

Open Tracing Tools: Overview and Critical Comparison
https://arxiv.org/pdf/2207.06875.pdf

@monperrus
Member Author

Lessons Learned Building a Global Synthetic Monitoring System
Talk at SREcon
https://www.usenix.org/conference/srecon22apac/presentation/sidh

@monperrus
Member Author

Elastic Observability
https://www.elastic.co/observability

@bbaudry
Collaborator

bbaudry commented May 21, 2023

A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
https://github.com/upgundecha/howtheysre

@bbaudry
Collaborator

bbaudry commented May 22, 2023

7 participants