feat(apps): export logs to open telemetry endpoint (#1617)

## Description

Notes:
* Added Loki to the local stack so that logs exported in OTLP format can be viewed locally.
* Confirmed that logs are exported.
* Serilog is still used for logging. When OpenTelemetry configuration is found, both the bootstrap logger and the final logger use the Serilog OpenTelemetry sink; when it is not found, they fall back to the Console exporter.
* Moved the OpenTelemetry configuration out of the aspnet package and into the services that use it directly.
* Removed FusionCache tracing for `Service`.

## Related Issue(s)

- #1616

## Verification

- [ ] **Your** code builds clean without any errors or warnings
- [ ] Manual testing done (required)
- [ ] Relevant automated test added (if you find this hard, leave it and
we'll help out)

## Documentation

- [ ] Documentation is updated (either in `docs`-directory, Altinnpedia
or a separate linked PR in
[altinn-studio-docs.](https://github.com/Altinn/altinn-studio-docs), if
applicable)

---------

Co-authored-by: Magnus Sandgren <5285192+MagnusSandgren@users.noreply.github.com>
Co-authored-by: Knut Haug <knut.espen.haug@digdir.no>
3 people authored Jan 13, 2025
1 parent c9cd6a7 commit 1a71763
Showing 17 changed files with 692 additions and 237 deletions.
66 changes: 62 additions & 4 deletions README.md
@@ -126,11 +126,12 @@ These health checks are integrated with Azure Container Apps' health probe syste

## Observability with OpenTelemetry

This project uses OpenTelemetry for distributed tracing and metrics collection. The setup includes:
This project uses OpenTelemetry for distributed tracing, metrics collection, and logging. The setup includes:

### Core Features
- Distributed tracing across services
- Runtime and application metrics
- Log aggregation and correlation
- Integration with Azure Monitor/Application Insights
- Support for both OTLP and Azure Monitor exporters
- Automatic instrumentation for:
@@ -157,15 +158,72 @@ OpenTelemetry is configured through environment variables that are automatically
### Local Development

For local development, the project includes a docker-compose setup with:
- OpenTelemetry Collector
- Grafana
- Other supporting services
- OpenTelemetry Collector (ports 4317/4318 for OTLP receivers)
- Grafana (port 3000)
- Jaeger (port 16686)
- Loki (port 3100)
- Prometheus (port 9090)

To run the local observability stack:
```bash
podman compose -f docker-compose-otel.yml up
```
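
After starting the stack, it can help to confirm that each service is actually listening. The following is a minimal sketch, assuming the services are published on `localhost` at the ports listed above; the service labels and the helper names `is_port_open` and `check_stack` are illustrative and not part of the project.

```python
import socket

# Ports taken from the service list above; the keys are only labels for output.
LOCAL_STACK_PORTS = {
    "otel-collector-grpc": 4317,
    "otel-collector-http": 4318,
    "grafana": 3000,
    "jaeger": 16686,
    "loki": 3100,
    "prometheus": 9090,
}


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_stack(host="localhost"):
    """Probe every known port and report which services accept connections."""
    return {
        name: is_port_open(host, port)
        for name, port in LOCAL_STACK_PORTS.items()
    }
```

Calling `check_stack()` returns a dictionary that maps each label to `True` or `False`. An open port only shows that something is listening; it does not prove the service is healthy.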

### Accessing Observability Tools

Once the local stack is running, you can access the following tools:

#### Distributed Tracing with Jaeger
- URL: http://localhost:16686
- Features:
  - View distributed traces across services
  - Search by service, operation, or trace ID
  - Analyze timing and dependencies
  - Debug request flows and errors

#### Metrics with Prometheus
- URL: http://localhost:9090
- Features:
  - Query raw metrics data
  - View metric targets and service discovery
  - Debug metric collection

#### Log Aggregation with Loki
- Direct URL: http://localhost:3100
- Grafana Integration: http://localhost:3000 (preferred interface)
- Features:
  - Search and filter logs across all services
  - Correlate logs with traces using trace IDs
  - Create log-based alerts and dashboards
  - Use LogQL to query logs:
```logql
# Example: Find all error logs
{container="web-api"} |= "error"
# Example: Find logs with specific trace ID
{container=~"web-api|graphql"} |~ "trace_id=([a-f0-9]{32})"
```
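
The same LogQL queries can also be run from a script through Loki's HTTP API. The sketch below assumes Loki is reachable at `http://localhost:3100` and uses the `/loki/api/v1/query_range` endpoint; the helper names are illustrative, and the `container` label is copied from the example above, so the labels that actually exist depend on how logs reach Loki.

```python
import json
import time
import urllib.parse
import urllib.request

LOKI_URL = "http://localhost:3100"  # Loki port from the local stack


def build_query_range_url(logql, minutes=15, limit=100, now_ns=None):
    """Build a query_range URL covering the last `minutes` minutes."""
    end = now_ns if now_ns is not None else time.time_ns()
    start = end - minutes * 60 * 1_000_000_000  # Loki accepts nanosecond epochs
    params = urllib.parse.urlencode(
        {"query": logql, "start": start, "end": end, "limit": limit}
    )
    return f"{LOKI_URL}/loki/api/v1/query_range?{params}"


def fetch_log_lines(logql, minutes=15):
    """Run the query and return the log lines from every matching stream."""
    with urllib.request.urlopen(build_query_range_url(logql, minutes)) as resp:
        payload = json.load(resp)
    # Each stream carries its entries under "values"; the line is the second field.
    return [
        entry[1]
        for stream in payload["data"]["result"]
        for entry in stream["values"]
    ]
```

For example, `fetch_log_lines('{container="web-api"} |= "error"')` would return the matching lines from the last 15 minutes, provided the stack is running and that label exists.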
#### Metrics and Dashboards in Grafana
- URL: http://localhost:3000
- Features:
  - Pre-configured dashboards for:
    - Application metrics
    - Runtime metrics
    - HTTP request metrics
  - Data sources:
    - Prometheus (metrics)
    - Loki (logs)
    - Jaeger (traces)
  - Create custom dashboards
  - Set up alerts
#### OpenTelemetry Collector Endpoints
- OTLP gRPC receiver: localhost:4317
- OTLP HTTP receiver: localhost:4318
- Prometheus metrics: localhost:8888
- Prometheus exporter metrics: localhost:8889
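
To check the log pipeline end to end without starting an application, a single log record can be posted to the OTLP HTTP receiver. This is a minimal sketch, assuming the receiver accepts OTLP/JSON on the standard `/v1/logs` path at `localhost:4318`; the service name, scope name, and helper names are placeholders chosen for illustration.

```python
import json
import time
import urllib.request

# OTLP HTTP receiver port from the list above plus the standard logs path.
OTLP_HTTP_LOGS_URL = "http://localhost:4318/v1/logs"


def build_log_payload(service_name, message, severity_text="INFO", time_ns=None):
    """Build an OTLP/JSON export request that carries one log record."""
    ts = time_ns if time_ns is not None else time.time_ns()
    return {
        "resourceLogs": [{
            "resource": {
                "attributes": [
                    {"key": "service.name", "value": {"stringValue": service_name}}
                ]
            },
            "scopeLogs": [{
                "scope": {"name": "manual-smoke-test"},  # hypothetical scope name
                "logRecords": [{
                    # OTLP/JSON encodes 64-bit integers as decimal strings.
                    "timeUnixNano": str(ts),
                    "severityText": severity_text,
                    "body": {"stringValue": message},
                }],
            }],
        }]
    }


def send_log(service_name, message):
    """POST one log record to the collector and return the HTTP status code."""
    data = json.dumps(build_log_payload(service_name, message)).encode("utf-8")
    request = urllib.request.Request(
        OTLP_HTTP_LOGS_URL,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return resp.status
```

With the stack running, `send_log("smoke-test", "hello from a script")` should return a 2xx status. If the collector forwards logs to Loki, the record should then be searchable in Grafana.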
### Request Filtering
The telemetry setup includes smart filtering to:
23 changes: 23 additions & 0 deletions docker-compose-otel.yml
@@ -22,6 +22,12 @@ services:
      - "14250:14250" # Model used by collector
    environment:
      - COLLECTOR_OTLP_ENABLED=true
    healthcheck:
      test: ["CMD", "wget", "--spider", "-q", "localhost:16686"]
      interval: 3s
      timeout: 3s
      retries: 10
      start_period: 10s

  # Prometheus for metrics
  prometheus:
@@ -31,6 +37,21 @@ services:
    ports:
      - "9090:9090"

  # Loki for log aggregation
  loki:
    image: grafana/loki:3.2.2
    ports:
      - "3100:3100"
    volumes:
      - ./local-otel-configuration/loki-config.yaml:/etc/loki/local-config.yaml
    command: -config.file=/etc/loki/local-config.yaml
    healthcheck:
      test: ["CMD-SHELL", "wget -q --tries=1 -O- http://localhost:3100/ready"]
      interval: 3s
      timeout: 3s
      retries: 10
      start_period: 10s

  # Grafana for metrics visualization
  grafana:
    image: grafana/grafana:11.4.0
@@ -43,3 +64,5 @@ services:
      - ./local-otel-configuration/grafana-datasources.yml:/etc/grafana/provisioning/datasources/datasources.yml
      - ./local-otel-configuration/grafana-dashboards.yml:/etc/grafana/provisioning/dashboards/dashboards.yml
      - ./local-otel-configuration/dashboards:/etc/grafana/provisioning/dashboards
    depends_on:
      - loki
50 changes: 23 additions & 27 deletions local-otel-configuration/dashboards/runtime-metrics.json
@@ -85,13 +85,19 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "dialogporten_process_runtime_dotnet_gc_heap_size_bytes",
"legendFormat": "Heap Size",
"refId": "A"
},
{
"expr": "dialogporten_process_runtime_dotnet_gc_committed_memory_size_bytes",
"legendFormat": "Committed Memory",
"refId": "B"
},
{
"expr": "dialogporten_dotnet_process_memory_working_set_bytes",
"legendFormat": "Working Set",
"refId": "C"
}
]
},
Expand Down Expand Up @@ -171,13 +177,14 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "rate(dialogporten_process_runtime_dotnet_gc_collections_count_total[5m])",
"legendFormat": "Gen {{generation}}",
"legendFormat": "Collections/sec",
"refId": "A"
},
{
"expr": "rate(dialogporten_process_runtime_dotnet_gc_duration_nanoseconds_total[5m])",
"legendFormat": "GC Duration/sec",
"refId": "B"
}
]
},
Expand Down Expand Up @@ -257,22 +264,19 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "dialogporten_process_runtime_dotnet_thread_pool_queue_length",
"legendFormat": "Queue Length",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "dialogporten_process_runtime_dotnet_thread_pool_threads_count",
"legendFormat": "Thread Count",
"refId": "B"
},
{
"expr": "rate(dialogporten_process_runtime_dotnet_thread_pool_completed_items_count_total[5m])",
"legendFormat": "Completed Items/sec",
"refId": "C"
}
]
},
Expand Down Expand Up @@ -352,20 +356,12 @@
"type": "timeseries",
"targets": [
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "rate(dialogporten_process_runtime_dotnet_exceptions_count_total[$__rate_interval])",
"expr": "rate(dialogporten_process_runtime_dotnet_exceptions_count_total[5m])",
"legendFormat": "Exceptions/sec",
"refId": "A"
},
{
"datasource": {
"type": "prometheus",
"uid": "Prometheus"
},
"expr": "rate(dialogporten_process_runtime_dotnet_monitor_lock_contention_count_total[$__rate_interval])",
"expr": "rate(dialogporten_process_runtime_dotnet_monitor_lock_contention_count_total[5m])",
"legendFormat": "Lock Contentions/sec",
"refId": "B"
}
9 changes: 8 additions & 1 deletion local-otel-configuration/grafana-datasources.yml
@@ -5,4 +5,11 @@ datasources:
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
    isDefault: true

  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      maxLines: 1000
45 changes: 45 additions & 0 deletions local-otel-configuration/loki-config.yaml
@@ -0,0 +1,45 @@
auth_enabled: false

server:
  http_listen_port: 3100

common:
  path_prefix: /tmp/loki

compactor:
  working_directory: /tmp/loki/compactor
  compaction_interval: 10m

ingester:
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 5m
  chunk_retain_period: 30s

schema_config:
  configs:
    - from: 2020-10-24
      store: tsdb
      object_store: filesystem
      schema: v13
      index:
        prefix: index_
        period: 24h

storage_config:
  tsdb_shipper:
    active_index_directory: /tmp/loki/tsdb-index
    cache_location: /tmp/loki/tsdb-cache
    cache_ttl: 24h
  filesystem:
    directory: /tmp/loki/chunks

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h
  allow_structured_metadata: true
4 changes: 3 additions & 1 deletion local-otel-configuration/otel-collector-config.yaml
@@ -25,6 +25,8 @@ exporters:
    verbosity: detailed
    sampling_initial: 5
    sampling_thereafter: 200
  otlphttp:
    endpoint: "http://loki:3100/otlp"

extensions:
  health_check:
@@ -49,4 +51,4 @@ service:
    logs:
      receivers: [otlp]
      processors: [batch]
      exporters: [debug]
      exporters: [otlphttp, debug]