Added the docs for all the grafana dashboards. #21795

Merged
merged 97 commits into from
Nov 5, 2024
74ff97e
Added the docs for all the grafana dashboards.
Sep 28, 2024
103cb4f
Seperated the dashboards
YasminLorinKaygalak Oct 3, 2024
0f38739
Delete grafana-dashboards.mdx
YasminLorinKaygalak Oct 3, 2024
3ea5bd1
Seperated the dashboards and added descriptions
YasminLorinKaygalak Oct 3, 2024
afabad7
Seperated the dashboards and added descriptions
YasminLorinKaygalak Oct 3, 2024
df19762
Seperated the dashboards and added descriptions
YasminLorinKaygalak Oct 3, 2024
7bd091f
Seperated the dashboards and added descriptions
YasminLorinKaygalak Oct 3, 2024
533c489
Typo edit
YasminLorinKaygalak Oct 4, 2024
4da3c29
added changelog for docs PR
YasminLorinKaygalak Oct 4, 2024
109cdba
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 7, 2024
d59d3b3
Update website/content/docs/connect/observability/grafanadashboards/i…
YasminLorinKaygalak Oct 7, 2024
0e803db
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 7, 2024
31f7c4b
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 7, 2024
9d6857a
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 7, 2024
bf1502b
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 7, 2024
1d1bea5
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 7, 2024
a9a884a
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 8, 2024
f31e1b1
Testing the revised dashboard
YasminLorinKaygalak Oct 9, 2024
8451687
Update website/content/docs/connect/observability/grafanadashboards/i…
YasminLorinKaygalak Oct 9, 2024
88f5885
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 9, 2024
f8c9cc7
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 9, 2024
af4819b
Adding the PR revised docs
YasminLorinKaygalak Oct 9, 2024
c7e74f5
Adding the PR revised docs consul k8s
YasminLorinKaygalak Oct 9, 2024
413ff85
Adding the PR revised docs service dashboard
YasminLorinKaygalak Oct 9, 2024
3804077
Adding the PR revised docs consul server dashboard
YasminLorinKaygalak Oct 9, 2024
33af0ad
Adding the PR revised docs consul server dashboard insertions
YasminLorinKaygalak Oct 9, 2024
998a1e4
Adding the PR revised docs consul k8s docs edit
YasminLorinKaygalak Oct 9, 2024
529788a
Adding the PR revised docs service dashboard
YasminLorinKaygalak Oct 9, 2024
9541745
Added the final edits for the PR feedback
YasminLorinKaygalak Oct 9, 2024
5c94946
Adding consul dataplane dashboard screenshoots
YasminLorinKaygalak Oct 9, 2024
0ee5e35
Minor edit in service to service dashboard
YasminLorinKaygalak Oct 9, 2024
42dcc67
Minor edit in overview
YasminLorinKaygalak Oct 9, 2024
d802441
Minor edit in overview page
YasminLorinKaygalak Oct 9, 2024
6c1386a
Completed the codeblockconfigs for each query in all the dashboards.
YasminLorinKaygalak Oct 15, 2024
ceec4f1
Update website/content/docs/connect/observability/grafanadashboards/i…
YasminLorinKaygalak Oct 15, 2024
382883c
Update website/content/docs/connect/observability/grafanadashboards/i…
YasminLorinKaygalak Oct 15, 2024
badc3e3
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 15, 2024
86056a2
Completing the whole PR feedback.
YasminLorinKaygalak Oct 15, 2024
f06d76d
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Oct 15, 2024
86f687d
Completing the whole PR feedback.
YasminLorinKaygalak Oct 15, 2024
106be26
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Oct 15, 2024
44c93c1
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Oct 15, 2024
2d1308a
Completing the whole PR feedback.
YasminLorinKaygalak Oct 15, 2024
ae4e7c4
Completing the whole PR feedback.
YasminLorinKaygalak Oct 15, 2024
073838f
Added the correct path to the reference code on each dashboard.
YasminLorinKaygalak Oct 15, 2024
e641ed4
Finished the PR feedback.
YasminLorinKaygalak Oct 15, 2024
8701bf7
update serf links (#21797)
jmurret Oct 2, 2024
1300278
docs: Add missing `&&` in DNS forwading tutorial (#21804)
lens0021 Oct 7, 2024
7c7478a
Adds grafana dashboards (#21806)
YasminLorinKaygalak Oct 9, 2024
44811e4
Upgrade test improvements for 1.20.x (#21813)
nathancoleman Oct 11, 2024
66247ed
Update ENVOY_VERSIONS (#21820)
nathancoleman Oct 14, 2024
e4df2b9
ci: ensure int test docker pull goes through proxy (#21819)
zalimeni Oct 14, 2024
5238a2a
docs: Consul DNS views on Kubernetes (#21802)
boruszak Oct 14, 2024
67121b5
Post-release updates for 1.20.0 (#21829)
nathancoleman Oct 15, 2024
6fb8360
docs: Consul v1.20 release notes (#21826)
boruszak Oct 15, 2024
0fc442d
chore: remove unintentionally committed consul-k8s submodule (#21833)
zalimeni Oct 16, 2024
f28c5c9
[NET-1151 NET-11228] security: Add request normalization and header m…
zalimeni Oct 16, 2024
939491a
Enabling prometheus change.
YasminLorinKaygalak Oct 16, 2024
06d625c
Small fix.
YasminLorinKaygalak Oct 16, 2024
1bfb0d7
Testing the fix in queries in deployment
YasminLorinKaygalak Oct 22, 2024
b9feb14
Testing the fix in queries in deployment
YasminLorinKaygalak Oct 22, 2024
81dfab0
Made the promql changes in all the dashboards
YasminLorinKaygalak Oct 22, 2024
ebfaf73
Fixed the bug with the consul dataplane dashboard queries
YasminLorinKaygalak Oct 22, 2024
5f7774e
Deleted website-preview and website-preview-tmp.
YasminLorinKaygalak Oct 22, 2024
48170e8
Suppress CVE-2024-9143 (#21848)
sarahalsmiller Oct 17, 2024
6ce0146
docs: clarify Envoy and dataplane LTS support policy (#21337)
zalimeni Oct 17, 2024
00cb716
Update compatibility matrix to include 1.20.x (#21843)
nathancoleman Oct 17, 2024
1675c37
Update Envoy compatibility matrices to include consul 1.20.x and data…
nathancoleman Oct 17, 2024
f19adaa
Upgrade envoy version in nightly integration tests (#21864)
sarahalsmiller Oct 21, 2024
903a9f9
chore: retain retracted api submodule version (#21861)
zalimeni Oct 22, 2024
2b7851c
Added the correct links
YasminLorinKaygalak Oct 22, 2024
0d76bcb
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Oct 22, 2024
c4c1875
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Oct 22, 2024
2d37593
Update website/content/docs/connect/observability/grafanadashboards/i…
YasminLorinKaygalak Oct 22, 2024
593afad
Update website/content/docs/connect/observability/grafanadashboards/i…
YasminLorinKaygalak Oct 22, 2024
9df6f0f
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Oct 22, 2024
a6800b0
Fixing weird commit issues
missylbytes Oct 23, 2024
59ba16a
Fixing weird commit issues
missylbytes Oct 23, 2024
6ceceb5
Last weird commit issue
missylbytes Oct 23, 2024
426d5fa
Fix build issues
missylbytes Oct 23, 2024
09e491f
Newline
missylbytes Oct 23, 2024
036e398
Finally fixed, it was a newline issue
missylbytes Oct 23, 2024
e7d1d21
Update website/content/docs/connect/observability/grafanadashboards/s…
missylbytes Oct 29, 2024
00c0044
[NET-1151 NET-11046] docs: clarify request normalization and L7 heade…
zalimeni Oct 28, 2024
d7afbde
Allow multiple endpoints in Envoy clusters configured with hostnames …
t-davies Oct 28, 2024
019a4c7
Apply suggestions from code review
missylbytes Oct 29, 2024
f5edd15
Apply suggestions from code review
missylbytes Oct 29, 2024
2e1930d
Apply suggestions from code review
missylbytes Oct 29, 2024
db619f5
docs: add missing slash in redirect (#21881)
boruszak Oct 29, 2024
788c879
Merge branch 'main' into Net-12345--docs-for-grafana-dashboards
missylbytes Oct 30, 2024
b124a5c
Merge branch 'main' into Net-12345--docs-for-grafana-dashboards
missylbytes Oct 31, 2024
2ce30d6
Double checked the images on the dashboards for grafana docs.
YasminLorinKaygalak Nov 4, 2024
42c5c48
Update website/content/docs/connect/observability/grafanadashboards/c…
YasminLorinKaygalak Nov 4, 2024
73c4939
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Nov 4, 2024
4c20394
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Nov 4, 2024
1ad666d
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Nov 4, 2024
592fd27
Update website/content/docs/connect/observability/grafanadashboards/s…
YasminLorinKaygalak Nov 4, 2024
3 changes: 3 additions & 0 deletions .changelog/21795.txt
@@ -0,0 +1,3 @@
```release-note:feature
docs: added the docs for the grafana dashboards
```
@@ -0,0 +1,133 @@
---
layout: docs
page_title: Dashboard for Consul dataplane metrics
description: >-
  This Grafana dashboard provides Consul dataplane metrics on Kubernetes deployments. Learn about the Grafana queries that produce the metrics and visualizations in this dashboard.
---

# Consul dataplane monitoring dashboard

This page provides reference information about the [Grafana dashboard configuration included in the `hashicorp/consul` GitHub repository](https://github.com/hashicorp/consul/blob/main/grafana/consuldataplanedashboard.json). The Consul dataplane dashboard provides a comprehensive view of the service health, performance, and resource utilization within the Consul service mesh. You can monitor key metrics at both the cluster and service levels with this dashboard. It can help you ensure service reliability and performance.

![Preview of the Consul dataplane dashboard](/public/img/grafana/consul-dataplane-dashboard.png)

This image provides an example of the dashboard's visual layout and contents.

## Grafana queries overview

The Consul dataplane dashboard provides the following information about service mesh operations.

### Live service count

**Description:** Displays the total number of live Envoy proxies currently running in the service mesh. It helps track the overall availability of services and identify any outages or other widespread issues in the service mesh.

```promql
sum(envoy_server_live{app=~"$service"})
```

### Total request success rate

**Description:** Tracks the percentage of successful requests across the service mesh. It excludes 4xx and 5xx response codes to focus on operational success. Use it to monitor the overall reliability of your services.

```promql
sum(irate(envoy_cluster_upstream_rq_xx{envoy_response_code_class!~"5|4",consul_destination_service=~"$service"}[10m])) / sum(irate(envoy_cluster_upstream_rq_xx{consul_destination_service=~"$service"}[10m]))
```
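
The ratio can be checked by hand. The following Python sketch reproduces the arithmetic on hypothetical per-second request rates keyed by response code class. The function name and the sample values are assumptions for illustration and are not part of the dashboard.

```python
def success_rate(rates_by_class):
    """Return the share of requests whose response class is not 4xx or 5xx.

    rates_by_class maps a response code class ("2", "3", "4", "5") to a
    per-second request rate, mirroring the envoy_response_code_class label.
    """
    total = sum(rates_by_class.values())
    if total == 0:
        # With no traffic there is nothing to divide; None stands in for "no data".
        return None
    ok = sum(rate for cls, rate in rates_by_class.items() if cls not in ("4", "5"))
    return ok / total


# Hypothetical sample: 90 req/s of 2xx, 6 req/s of 4xx, 4 req/s of 5xx.
sample = {"2": 90.0, "4": 6.0, "5": 4.0}
print(success_rate(sample))
```

The result is a ratio between 0 and 1. Multiply it by 100 if the panel should display a percentage.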

### Total failed requests

**Description:** This pie chart shows the total number of failed requests within the service mesh, categorized by service. It provides a visual breakdown of where failures are occurring, allowing operators to focus on problematic services.

```promql
sum(increase(envoy_cluster_upstream_rq_xx{envoy_response_code_class=~"4|5", consul_destination_service=~"$service"}[10m])) by (local_cluster)
```

### Requests per second

**Description:** This metric shows the rate of incoming HTTP requests per second to the selected services. It helps operators understand the current load on services and how much traffic they are processing.

```promql
sum(rate(envoy_http_downstream_rq_total{service=~"$service",envoy_http_conn_manager_prefix="public_listener"}[5m])) by (service)
```
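
As a rough illustration of what `rate()` computes over a counter, the following Python sketch derives a per-second rate from hypothetical `(timestamp, value)` samples. It is a simplified model: it handles counter resets but omits the extrapolation to the window edges that Prometheus performs. The function name and sample data are assumptions.

```python
def simple_rate(samples):
    """Approximate a per-second rate from (timestamp_seconds, counter_value) pairs.

    A value lower than its predecessor is treated as a counter reset, so the
    counter is assumed to have restarted from zero.
    """
    if len(samples) < 2:
        return None
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


# Hypothetical counter samples with one reset between t=100 and t=200.
samples = [(0, 100), (100, 200), (200, 50), (300, 200)]
print(simple_rate(samples))
```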

### Unhealthy clusters

**Description:** This metric tracks the number of unhealthy clusters in the mesh, helping operators identify services that are experiencing issues and need attention to ensure operational health.

```promql
(sum(envoy_cluster_membership_healthy{app=~"$service",envoy_cluster_name=~"$cluster"}) - sum(envoy_cluster_membership_total{app=~"$service",envoy_cluster_name=~"$cluster"}))
```
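
A short worked example of the subtraction, using hypothetical membership counts. Because the number of healthy members cannot exceed the total, the expression as written evaluates to zero or a negative number, and its magnitude is the number of unhealthy cluster members. The function name and sample values are assumptions for illustration.

```python
def membership_difference(healthy, total):
    """Mirror the query: sum(healthy) - sum(total) across the selected clusters."""
    return sum(healthy) - sum(total)


# Hypothetical per-cluster member counts for three Envoy clusters.
healthy_members = [3, 2, 0]
total_members = [3, 3, 1]

diff = membership_difference(healthy_members, total_members)
unhealthy = abs(diff)  # magnitude of the difference
print(diff, unhealthy)
```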

### Heap size

**Description:** This metric displays the total memory heap size of the Envoy proxies. Monitoring heap size is essential to detect memory issues and ensure that services are operating efficiently.

```promql
SUM(envoy_server_memory_heap_size{app=~"$service"})
```

### Allocated memory

**Description:** This metric shows the amount of memory allocated by the Envoy proxies. It helps operators monitor the resource usage of services to prevent memory overuse and optimize performance.

```promql
SUM(envoy_server_memory_allocated{app=~"$service"})
```

### Avg uptime per node

**Description:** This metric calculates the average uptime of Envoy proxies across all nodes. It helps operators monitor the stability of services and detect potential issues with service restarts or crashes.

```promql
avg(envoy_server_uptime{app=~"$service"})
```

### Cluster state

**Description:** This metric indicates whether all clusters are healthy. It provides a quick overview of the cluster state to ensure that there are no issues affecting service performance.

```promql
(sum(envoy_cluster_membership_total{app=~"$service",envoy_cluster_name=~"$cluster"})-sum(envoy_cluster_membership_healthy{app=~"$service",envoy_cluster_name=~"$cluster"})) == bool 0
```
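
The `bool` modifier makes the comparison return 1 or 0 instead of filtering out series that fail it. The following Python sketch shows the same check on hypothetical member counts. The function name and values are assumptions for illustration.

```python
def cluster_state(healthy, total):
    """Return 1 when total minus healthy is zero, else 0, like `== bool 0`."""
    return 1 if sum(total) - sum(healthy) == 0 else 0


# Hypothetical member counts: first all healthy, then one member unhealthy.
print(cluster_state([3, 2], [3, 2]))
print(cluster_state([3, 1], [3, 2]))
```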

### CPU throttled seconds by namespace

**Description:** This metric tracks the number of seconds during which CPU usage was throttled. Monitoring CPU throttling helps operators identify when services are exceeding their allocated CPU limits and may need optimization.

```promql
rate(container_cpu_cfs_throttled_seconds_total{namespace=~"$namespace"}[5m])
```

### Memory usage by pod limits

**Description:** This metric shows memory usage as a percentage of the memory limit set for each pod. It helps operators ensure that services are staying within their allocated memory limits to avoid performance degradation.

```promql
100 * max (container_memory_working_set_bytes{namespace=~"$namespace"} / on(container, pod) label_replace(kube_pod_container_resource_limits{resource="memory"}, "pod", "$1", "exported_pod", "(.+)")) by (pod)
```
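
The query uses `label_replace` to copy the `exported_pod` label into `pod` so that the limit series can be joined to the usage series, and then scales the ratio to a percentage. The following Python sketch models both steps for a single series. It is a simplified model of `label_replace`, and the label values and byte counts are hypothetical.

```python
import re


def label_replace(labels, dst, replacement, src, regex):
    """Simplified model of PromQL label_replace for one series' labels.

    The regex must match the whole source value. On a match, dst is set to
    the replacement with $1-style group references expanded; otherwise the
    labels are returned unchanged.
    """
    match = re.fullmatch(regex, labels.get(src, ""))
    if match is None:
        return dict(labels)
    expanded = re.sub(r"\$(\d+)", lambda m: match.group(int(m.group(1))), replacement)
    out = dict(labels)
    out[dst] = expanded
    return out


# Hypothetical limit series that carries the pod name in exported_pod.
limit_series = {"exported_pod": "web-0", "container": "web", "resource": "memory"}
joined = label_replace(limit_series, "pod", "$1", "exported_pod", "(.+)")

working_set_bytes = 192 * 1024 * 1024  # hypothetical usage
limit_bytes = 256 * 1024 * 1024        # hypothetical limit
percent_of_limit = 100 * working_set_bytes / limit_bytes
print(joined["pod"], percent_of_limit)
```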

### CPU usage by pod limits

**Description:** This metric displays CPU usage as a percentage of the CPU limit set for each pod. Monitoring CPU usage helps operators optimize service performance and prevent CPU exhaustion.

```promql
100 * max(
  rate(container_cpu_usage_seconds_total{namespace=~"$namespace"}[5m]) /
  on(container, pod) label_replace(kube_pod_container_resource_limits{resource="cpu"}, "pod", "$1", "exported_pod", "(.+)")
) by (pod)
```

### Total active upstream connections

**Description:** This metric tracks the total number of active upstream connections to other services in the mesh. It provides insight into service dependencies and network load.

```promql
sum(envoy_cluster_upstream_cx_active{app=~"$service",envoy_cluster_name=~"$cluster"}) by (app, envoy_cluster_name)
```

### Total active downstream connections

**Description:** This metric tracks the total number of active downstream connections from clients to services. It helps operators monitor service load and ensure that services can handle the traffic effectively.

```promql
sum(envoy_http_downstream_cx_active{app=~"$service"})
```
@@ -0,0 +1,128 @@
---
layout: docs
page_title: Dashboard for Consul k8s control plane metrics
description: >-
  This documentation provides an overview of the Consul on Kubernetes Grafana Dashboard. Learn about the metrics it displays and the queries that produce the metrics.
---

# Consul on Kubernetes control plane monitoring dashboard

This page provides reference information about the [Grafana dashboard configuration included in the `hashicorp/consul` GitHub repository](https://github.com/hashicorp/consul/blob/main/grafana/consul-k8s-control-plane-monitoring.json).

## Grafana queries overview

This dashboard provides the following information about service mesh operations.

### Number of Consul servers

**Description:** Displays the number of Consul servers currently active. This metric provides insight into the cluster's health and the number of Consul nodes running in the environment.

```promql
consul_consul_server_0_consul_members_servers{pod="consul-server-0"}
```

### Number of connected Consul dataplanes

**Description:** Tracks the number of connected Consul dataplanes. This metric helps operators understand how many Envoy sidecars are actively connected to the mesh.

```promql
count(consul_dataplane_envoy_connected)
```
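
Note that `count()` returns the number of series, regardless of each series' value. The following Python sketch contrasts that with summing the values. It assumes, for illustration only, that the gauge reports 1 when a dataplane is connected and 0 otherwise. The pod names are hypothetical.

```python
# Hypothetical gauge values, one per dataplane series (assumed 1 = connected, 0 = not).
envoy_connected = {"pod-a": 1, "pod-b": 1, "pod-c": 0}

series_count = len(envoy_connected)            # analogous to count()
connected_sum = sum(envoy_connected.values())  # analogous to sum()
print(series_count, connected_sum)
```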

### CPU usage in seconds (Consul servers)

**Description:** This metric shows the CPU usage of the Consul servers over time, helping operators monitor resource consumption.

```promql
rate(container_cpu_usage_seconds_total{container="consul", pod=~"consul-server-.*"}[5m])
```

### Memory usage (Consul servers)

**Description:** Displays the memory usage of the Consul servers. This metric helps ensure that the servers have sufficient memory resources for proper operation.

```promql
container_memory_working_set_bytes{container="consul", pod=~"consul-server-.*"}
```

### Disk read/write total per 5 minutes (Consul servers)

**Description:** Tracks the rate of bytes written to and read from disk by Consul servers, averaged over a 5-minute window. This metric helps assess the disk I/O load on Consul servers.

```promql
sum(rate(container_fs_writes_bytes_total{pod=~"consul-server-.*", container="consul"}[5m])) by (pod, device)
```

```promql
sum(rate(container_fs_reads_bytes_total{pod=~"consul-server-.*", container="consul"}[5m])) by (pod, device)
```

### Received bytes total per 5 minutes (Consul servers)

**Description:** Tracks the rate of network bytes received by Consul servers, averaged over a 5-minute window. This metric helps assess the network load on Consul servers.

```promql
sum(rate(container_network_receive_bytes_total{pod=~"consul-server-.*"}[5m])) by (pod)
```

### Memory limit (Consul servers)

**Description:** Displays the memory limit for Consul servers. Comparing this value against memory usage helps operators verify that each Consul server stays within its defined limit.

```promql
kube_pod_container_resource_limits{resource="memory", pod="consul-server-0"}
```

### CPU limit in seconds (Consul servers)

**Description:** Displays the CPU limit for Consul servers. Monitoring CPU limits helps operators ensure that the services are not constrained by resource limitations.

```promql
kube_pod_container_resource_limits{resource="cpu", pod="consul-server-0"}
```

### Disk usage (Consul servers)

**Description:** Shows the amount of filesystem storage used by Consul servers. This metric helps operators track disk usage and plan for capacity.

```promql
sum(container_fs_usage_bytes{}) by (pod)
```

```promql
sum(container_fs_usage_bytes{pod="consul-server-0"})
```

### CPU usage in seconds (Connect injector)

**Description:** Tracks the CPU usage of the Connect injector, which is responsible for injecting Envoy sidecars and other operations within the mesh. Monitoring this metric helps ensure that the Connect injector has adequate CPU resources.

```promql
rate(container_cpu_usage_seconds_total{pod=~".*-connect-injector-.*", container="sidecar-injector"}[5m])
```

### CPU limit in seconds (Connect injector)

**Description:** Displays the CPU limit for the Connect injector. Monitoring the CPU limit helps ensure that the Connect injector is not constrained by resource limitations.

```promql
max(kube_pod_container_resource_limits{resource="cpu", container="sidecar-injector"})
```

### Memory usage (Connect injector)

**Description:** Tracks the memory usage of the Connect injector. Monitoring this helps ensure the Connect injector has sufficient memory resources.

```promql
container_memory_working_set_bytes{pod=~".*-connect-injector-.*", container="sidecar-injector"}
```

### Memory limit (Connect injector)

**Description:** Displays the memory limit for the Connect injector, helping to monitor if the service is nearing its resource limits.

```promql
max(kube_pod_container_resource_limits{resource="memory", container="sidecar-injector"})
```
