A small POC project to showcase instrumentation of a .NET 6+ application with OpenTelemetry. This is an educational experiment and is not meant to represent a production-ready environment. Some of this information is challenging to research and piece together, so I thought this POC might be helpful as documentation for me and for anyone trying to get a jump start.
There are many ways to wire this tooling together. I will update this README.md to describe the overall approach as it evolves.
- .NET 6 Web API w/ Controllers
- Docker Compose
- System.Diagnostics (ActivitySource/Meter APIs, which follow OpenTelemetry standards)
- Serilog w/ OpenTelemetry sink
- OpenTelemetry .NET packages (see the wiring sketch after this list)
- OpenTelemetry Collector
- Prometheus (Metrics)
- Jaeger (Tracing)
- Loki (Logging)
- Grafana
- InfluxDB (for Grafana)
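A minimal sketch of how the .NET pieces above are typically wired together in `Program.cs`, assuming the Serilog OpenTelemetry sink and the OpenTelemetry hosting/OTLP packages. The collector hostname (`poc-otel-collector`), service name (`poc-api`), and source/meter names are illustrative assumptions, not this repo's actual values:

```csharp
// Program.cs -- illustrative wiring of Serilog + OpenTelemetry in a .NET 6 Web API.
using OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;
using Serilog;

var builder = WebApplication.CreateBuilder(args);

// Serilog with the OpenTelemetry sink, shipping logs to the collector over OTLP.
builder.Host.UseSerilog((context, loggerConfiguration) => loggerConfiguration
    .Enrich.FromLogContext()
    .WriteTo.OpenTelemetry(options =>
    {
        options.Endpoint = "http://poc-otel-collector:4317"; // OTLP gRPC endpoint (assumed hostname)
    }));

// Traces and metrics exported via OTLP to the collector gateway.
builder.Services.AddOpenTelemetry()
    .ConfigureResource(resource => resource.AddService("poc-api"))
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddSource("Poc.Api") // custom ActivitySource name (assumed)
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://poc-otel-collector:4317")))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddMeter("Poc.Api") // custom Meter name (assumed)
        .AddOtlpExporter(o => o.Endpoint = new Uri("http://poc-otel-collector:4317")));

builder.Services.AddControllers();

var app = builder.Build();
app.MapControllers();
app.Run();
```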
- Vendor-agnostic instrumentation through an OTel Collector gateway and OTel exporters in the API
- Simulated API response-time delays for variation
- Simulated errors via randomly thrown exceptions (500 status code) for variation
- API filters to enrich logs and metrics with route, class, class method, and HTTP status code labels for querying (see the filter sketch after this list)
- Signal correlation with Grafana -- see the Grafana section below
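The enrichment described above can be done with an ASP.NET Core action filter. A minimal sketch, assuming hypothetical names (`EnrichmentFilter`, `PocMetrics`, meter `Poc.Api`) and tags that mirror the labels listed; this is illustrative, not the repo's actual implementation:

```csharp
using System.Diagnostics.Metrics;
using Microsoft.AspNetCore.Mvc.Controllers;
using Microsoft.AspNetCore.Mvc.Filters;
using Serilog.Context;

// Hypothetical meter/counter used by the filter below.
public static class PocMetrics
{
    public static readonly Meter Meter = new("Poc.Api");
    public static readonly Counter<long> Requests =
        Meter.CreateCounter<long>("poc_api_requests_total");
}

// Enriches logs and metrics for every controller action it wraps.
public class EnrichmentFilter : IAsyncActionFilter
{
    public async Task OnActionExecutionAsync(
        ActionExecutingContext context, ActionExecutionDelegate next)
    {
        var descriptor = (ControllerActionDescriptor)context.ActionDescriptor;
        var route = context.HttpContext.Request.Path.Value ?? string.Empty;
        var className = descriptor.ControllerTypeInfo.Name;
        var methodName = descriptor.ActionName;

        // Every log event written during the action carries these properties,
        // which can then be queried in Loki.
        using (LogContext.PushProperty("route", route))
        using (LogContext.PushProperty("class", className))
        using (LogContext.PushProperty("method", methodName))
        {
            var executed = await next();

            // If the action threw (simulated 500s), the response status code
            // has not been rewritten yet, so derive it from the exception.
            var statusCode = executed.Exception is not null && !executed.ExceptionHandled
                ? 500
                : executed.HttpContext.Response.StatusCode;

            // Same labels on the metric side so Prometheus queries can slice
            // by route, class, method, and status code.
            PocMetrics.Requests.Add(1,
                new KeyValuePair<string, object?>("route", route),
                new KeyValuePair<string, object?>("class", className),
                new KeyValuePair<string, object?>("method", methodName),
                new KeyValuePair<string, object?>("status_code", statusCode));
        }
    }
}
```

The filter would be registered globally, e.g. `builder.Services.AddControllers(o => o.Filters.Add<EnrichmentFilter>());`. The simulated delays and random errors from the items above amount to something like `await Task.Delay(Random.Shared.Next(50, 2000));` followed by an occasional thrown exception inside a controller action.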
- (FIXED) Delayed metrics export with the OTel exporter compared to the Prometheus exporter. See Stack Overflow: https://stackoverflow.com/questions/75552005/opentelemetry-net-application-metrics-collected-slowly-by-collector
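A common cause of this symptom is the periodic metric reader's default export interval (60 seconds). If you hit something similar, one knob worth checking is the metric reader options on the OTLP exporter; a hedged sketch, with an assumed collector hostname and illustrative interval:

```csharp
// Program.cs fragment -- shortening the metric export interval on the OTLP exporter.
using OpenTelemetry.Metrics;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddOtlpExporter((exporterOptions, readerOptions) =>
        {
            // OTLP gRPC endpoint of the collector gateway (assumed hostname).
            exporterOptions.Endpoint = new Uri("http://poc-otel-collector:4317");

            // Default is 60000 ms; a smaller interval makes metrics reach
            // Prometheus via the collector much sooner.
            readerOptions.PeriodicExportingMetricReaderOptions.ExportIntervalMilliseconds = 5000;
        }));

var app = builder.Build();
app.Run();
```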
- Continue improving signal correlation in Grafana
- Add Zipkin to compare with Jaeger
- Add the ELK stack to compare with Loki
- Add load testing tooling to simulate a constant load for better demonstration
- Add a collector agent
- Improve metric and tracing code in application
- Try adding another service and something like Kafka to demonstrate trace propagation across services
- From Visual Studio: select the Docker Compose startup project and run
- From the CLI: `docker compose up`
- Swagger: http://localhost:5000/swagger/index.html
- Prometheus: http://localhost:9090/graph
- Collector Prometheus Export Endpoint: http://localhost:8889
- Jaeger: http://localhost:16686/search
- Grafana: http://localhost:3000
- Navigate to http://localhost:3000 and log in with admin/admin
- Add a Prometheus datasource with host: http://poc-prometheus:9090
- Add a Loki datasource with host: http://poc-loki:3100
  - To create a link correlation to the trace, add a derived field with the following values:
    - Name: `trace_id`
    - Regex: `traceID=([\w\d]+)`
    - Query: `${__value.raw}`
    - URL Label: `Trace`
- Add a Jaeger datasource with host: http://poc-jaeger:16686
  - To create a link correlation to the log, in the Trace to Logs section use the following values:
    - Data source: `Loki`
    - Map tag names: enabled
    - Add a tag: `environment`
    - Span start time shift: `-1h`
    - Span end time shift: `1h`
    - Filter by Trace ID: enabled
- Set the Jaeger datasource to scrape every `1s` for best demo effect
- Go to the folder /telemetry/grafana and copy the contents of poc-dashboard.json
- Create a new dashboard with "Import" and paste the dashboard JSON
- Select the datasources you set up previously during the import process
- Ability to filter metrics by metric tag variables.
- Ability to correlate Metrics and Traces by time range, route, or status code.
- Traces will not display if multiple values are selected for a given variable.
- For now, select only a single value per variable if you want to see traces.