Add SkyWalking Java Agent self observability dashboard (#12622)
* Add SkyWalking Java Agent self observability dashboard.

* add e2e testcase, sync ui, add tips.

* update e2e kafka testcase.
CzyerChen authored Sep 16, 2024
1 parent ddbed6d commit d3f8fe8
Showing 21 changed files with 492 additions and 14 deletions.
1 change: 1 addition & 0 deletions docs/en/changes/changes.md
@@ -66,6 +66,7 @@
* Fix `findEndpoint` query require `keyword` when using BanyanDB.
* Support to analysis the ztunnel mapped IP address in eBPF Access Log Receiver.
* Adapt BanyanDB Java Client 0.7.0-rc3.
* Add SkyWalking Java Agent self observability dashboard.

#### UI

32 changes: 32 additions & 0 deletions docs/en/setup/backend/dashboards-so11y-java-agent.md
@@ -0,0 +1,32 @@
# Java Agent self observability dashboard

The SkyWalking Java agent reports its own metrics through the Meter APIs so that tracing performance can be measured.
SkyWalking also provides a dashboard to visualize these agent metrics.

## Data flow
1. The SkyWalking Java agent reports its metrics data internally and automatically.
2. The SkyWalking OAP Server accepts these meters through the native protocols.
3. The SkyWalking OAP Server parses the expression with [MAL](../../concepts-and-designs/mal.md) to filter/calculate/aggregate and store the results.

## Set up
Java Agent so11y is a built-in feature; the agent reports meters automatically after it boots.

## Self observability monitoring
Self observability monitoring covers the runtime performance of the Java agent itself. The agent's `agent.service_name` is a `Service` in Agent so11y and lands on the `Layer: SO11Y_JAVA_AGENT`.

### Self observability metrics

| Unit | Metric Name | Description | Data Source |
|-------------------|----------------------------------------------------------------|---------------------------------------------|-----------------------|
| Count Per Minute | meter_java_agent_created_tracing_context_count | Created Tracing Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_finished_tracing_context_count | Finished Tracing Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_created_ignored_context_count | Created Ignored Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_finished_ignored_context_count | Finished Ignored Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_possible_leaked_context_count | Possible Leaked Context Count (Per Minute) | SkyWalking Java Agent |
| Count Per Minute | meter_java_agent_interceptor_error_count | Interceptor Error Count (Per Minute) | SkyWalking Java Agent |
| ns | meter_java_agent_tracing_context_execution_time_percentile | Tracing Context Execution Time (ns) | SkyWalking Java Agent |

## Customizations
You can customize your own metrics/expression/dashboard panel.
The metrics definition and expression rules are found in `/meter-analyzer-config/java-agent.yaml`.
The self observability dashboard panel configurations are found in `/config/ui-initialized-templates/so11y_java_agent`.
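
When customizing, it helps to see how a rule in `java-agent.yaml` relates to the metric names listed in the table above. The sketch below assumes that the stored name is the file's `metricPrefix` joined to a rule's `name` with an underscore, which is consistent with the names in the table. `MeterNames` and `fullName` are hypothetical helpers for illustration, not OAP APIs.

```java
public class MeterNames {
    // Assumption: final metric name = metricPrefix + "_" + rule name.
    // This matches the names in the metrics table, but it is only a sketch.
    static String fullName(String metricPrefix, String ruleName) {
        return metricPrefix + "_" + ruleName;
    }

    public static void main(String[] args) {
        String prefix = "meter_java_agent"; // metricPrefix used by the rule file
        String[] rules = {
            "created_tracing_context_count",
            "interceptor_error_count",
            "tracing_context_execution_time_percentile"
        };
        for (String rule : rules) {
            // Prints the name you would reference in a dashboard expression.
            System.out.println(fullName(prefix, rule));
        }
    }
}
```

A custom rule added under the same prefix would follow the same pattern, so a dashboard widget would reference it by the composed name.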
2 changes: 2 additions & 0 deletions docs/menu.yml
@@ -146,6 +146,8 @@ catalog:
path: "/en/setup/backend/dashboards-so11y"
- name: "Satellite self telemetry"
path: "/en/setup/backend/dashboards-so11y-satellite"
- name: "SkyWalking Java Agent self telemetry"
path: "/en/setup/backend/dashboards-so11y-java-agent"
- name: "Configuration Vocabulary"
path: "/en/setup/backend/configuration-vocabulary"
- name: "Advanced Setup"
@@ -234,7 +234,13 @@ public enum Layer {
* Cilium is open source software for providing and transparently securing network connectivity and load balancing
* between application workloads such as application containers or processes.
*/
- CILIUM_SERVICE(38, true);
+ CILIUM_SERVICE(38, true),

/**
* The self observability of SkyWalking Java Agent,
* which provides the abilities to measure the tracing performance and error statistics of plugins.
*/
SO11Y_JAVA_AGENT(39, true);

private final int value;
/**
@@ -76,6 +76,7 @@ public class UITemplateInitializer {
Layer.CLICKHOUSE.name(),
Layer.ACTIVEMQ.name(),
Layer.CILIUM_SERVICE.name(),
Layer.SO11Y_JAVA_AGENT.name(),
"custom"
};
private final UITemplateManagementService uiTemplateManagementService;
@@ -269,7 +269,7 @@ agent-analyzer:
# Nginx and Envoy agents can't get the real remote address.
# Exit spans with the component in the list would not generate the client-side instance relation metrics.
noUpstreamRealAddressAgents: ${SW_NO_UPSTREAM_REAL_ADDRESS:6000,9000}
- meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling} # Which files could be meter analyzed, files split by ","
+ meterAnalyzerActiveFiles: ${SW_METER_ANALYZER_ACTIVE_FILES:datasource,threadpool,satellite,go-runtime,python-runtime,continuous-profiling,java-agent} # Which files could be meter analyzed, files split by ","
slowCacheReadThreshold: ${SW_SLOW_CACHE_SLOW_READ_THRESHOLD:default:20,redis:10} # The slow cache read operation thresholds. Unit ms.
slowCacheWriteThreshold: ${SW_SLOW_CACHE_SLOW_WRITE_THRESHOLD:default:20,redis:10} # The slow cache write operation thresholds. Unit ms.

@@ -0,0 +1,32 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

expSuffix: instance(['service'], ['instance'], Layer.SO11Y_JAVA_AGENT)
metricPrefix: meter_java_agent
metricsRules:
- name: created_tracing_context_count
exp: created_tracing_context_counter.sum(['created_by', 'service', 'instance']).increase('PT1M')
- name: finished_tracing_context_count
exp: finished_tracing_context_counter.sum(['service', 'instance']).increase('PT1M')
- name: created_ignored_context_count
exp: created_ignored_context_counter.sum(['created_by', 'service', 'instance']).increase('PT1M')
- name: finished_ignored_context_count
exp: finished_ignored_context_counter.sum(['service', 'instance']).increase('PT1M')
- name: possible_leaked_context_count
exp: possible_leaked_context_counter.sum(['source', 'service', 'instance']).increase('PT1M')
- name: interceptor_error_count
exp: interceptor_error_counter.sum(['plugin_name', 'inter_type', 'service', 'instance']).increase('PT1M')
- name: tracing_context_execution_time_percentile
exp: tracing_context_performance.sum(['le', 'service', 'instance']).histogram().histogram_percentile([50,70,90,99])
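
Six of the seven rules above end in `increase('PT1M')`, which turns a cumulative counter into a per-minute count. The Java sketch below illustrates only that general idea and is not the OAP implementation: it assumes exactly one cumulative sample per minute, and it clamps negative deltas to zero as a stand-in for counter resets (for example, an agent restart). `PerMinuteIncrease` and `increases` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

public class PerMinuteIncrease {
    // Hypothetical helper: converts cumulative counter samples (one per minute)
    // into per-minute deltas. The first sample has no predecessor, so it
    // produces no delta.
    static List<Long> increases(long[] cumulative) {
        List<Long> deltas = new ArrayList<>();
        for (int i = 1; i < cumulative.length; i++) {
            // Assumption: a drop in the counter means a reset, so clamp to zero.
            deltas.add(Math.max(0L, cumulative[i] - cumulative[i - 1]));
        }
        return deltas;
    }

    public static void main(String[] args) {
        long[] samples = {100, 130, 130, 175};
        System.out.println(increases(samples)); // prints [30, 0, 45]
    }
}
```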
@@ -247,3 +247,8 @@ menus:
description: "Satellite: an open-source agent designed for the cloud-native infrastructures, which provides a low-cost, high-efficient, and more secure way to collect telemetry data. It is the recommended load balancer for telemetry collecting."
documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/backend-load-balancer/
i18nKey: self_observability_satellite
- title: SkyWalking Java Agent
layer: SO11Y_JAVA_AGENT
description: The Java Agent for Apache SkyWalking, which provides the native tracing/metrics/logging/event/profiling abilities for Java projects.
documentLink: https://skywalking.apache.org/docs/main/next/en/setup/backend/dashboards-so11y-java-agent/
i18nKey: self_observability_java_agent
@@ -0,0 +1,202 @@
[
{
"id": "Self-Observability-Java-Agent-Instance",
"configuration": {
"children": [
{
"x": 0,
"y": 0,
"w": 6,
"h": 13,
"i": "14",
"type": "Widget",
"widget": {
"title": "Tracing Context Creation (Per Minute)",
"tips": "The number of created tracing contexts, including a label created_by(value=sampler,propagated)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_created_tracing_context_count"
]
},
{
"x": 6,
"y": 0,
"w": 6,
"h": 13,
"i": "6",
"type": "Widget",
"widget": {
"title": "Tracing Context Creation and Completion (Per Minute)",
"tips": "The number of created tracing contexts and finished tracing contexts."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"metricConfig": [
{
"label": "Creation"
},
{
"label": "Completion"
}
],
"expressions": [
"aggregate_labels(meter_java_agent_created_tracing_context_count,sum)",
"meter_java_agent_finished_tracing_context_count"
]
},
{
"x": 12,
"y": 0,
"w": 6,
"h": 13,
"i": "1",
"type": "Widget",
"widget": {
"title": "Ignored Context Creation (Per Minute)",
"tips": "The number of created ignored contexts, including a label created_by(value=sampler,propagated)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_created_ignored_context_count"
]
},
{
"x": 18,
"y": 0,
"w": 6,
"h": 13,
"i": "2",
"type": "Widget",
"widget": {
"title": "Ignored Context Creation and Completion (Per Minute)",
"tips": "The number of created ignored contexts and finished ignored contexts."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"aggregate_labels(meter_java_agent_created_ignored_context_count,sum)",
"meter_java_agent_finished_ignored_context_count"
],
"metricConfig": [
{
"label": "Creation"
},
{
"label": "Completion"
}
]
},
{
"x": 0,
"y": 13,
"w": 6,
"h": 13,
"i": "11",
"type": "Widget",
"widget": {
"title": "Possible Leaked Context (Per Minute)",
"tips": "The number of detected leaked contexts, including a label source(value=tracing, ignore)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_possible_leaked_context_count"
],
"metricConfig": [
{
"label": "count"
}
]
},
{
"x": 12,
"y": 13,
"w": 12,
"h": 13,
"i": "8",
"type": "Widget",
"widget": {
"title": "Interceptor Error Count (Per Minute)",
"tips": "The number of errors happened in the interceptor logic, including the label plugin_name and inter_type(constructor, inst, static)."
},
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"expressions": [
"meter_java_agent_interceptor_error_count"
],
"metricConfig": [
{
"label": "count"
}
]
},
{
"x": 6,
"y": 13,
"w": 6,
"h": 13,
"i": "15",
"type": "Widget",
"graph": {
"type": "Line",
"step": false,
"smooth": false,
"showSymbol": true,
"showXAxis": true,
"showYAxis": true
},
"widget": {
"title": "Tracing Context Execution time (ms)",
"tips": "For successfully finished tracing context, it measures every interceptor's time cost."
},
"expressions": [
"relabels(meter_java_agent_tracing_context_execution_time_percentile,p='50,75,90,95,99',p='50,75,90,95,99')/1000000"
]
}
],
"layer": "SO11Y_JAVA_AGENT",
"entity": "ServiceInstance",
"name": "Self-Observability-Java-Agent-Instance",
"isRoot": false
}
}
]
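
Two expressions in the instance dashboard above do simple arithmetic on the stored metrics: `aggregate_labels(...,sum)` collapses the per-label series (for example `created_by=sampler` and `created_by=propagated`) into one series, and `/1000000` converts the nanosecond percentile values to milliseconds. The Java sketch below mirrors that arithmetic on made-up sample data. The class, the method names, and the numbers are hypothetical, and the sketch assumes all labeled series have the same length.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

public class LabelAggregation {
    // Hypothetical helper: sums the values of every labeled series at each
    // time point, producing a single unlabeled series.
    static long[] sumAcrossLabels(Map<String, long[]> seriesByLabel) {
        long[] total = null;
        for (long[] series : seriesByLabel.values()) {
            if (total == null) {
                total = new long[series.length];
            }
            for (int i = 0; i < series.length; i++) {
                total[i] += series[i];
            }
        }
        return total == null ? new long[0] : total;
    }

    // Nanoseconds to milliseconds, the same division the widget expression applies.
    static double nanosToMillis(double nanos) {
        return nanos / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Made-up per-minute counts, keyed by the created_by label value.
        Map<String, long[]> created = new LinkedHashMap<>();
        created.put("sampler", new long[] {10, 12, 9});
        created.put("propagated", new long[] {4, 0, 6});

        System.out.println(Arrays.toString(sumAcrossLabels(created))); // prints [14, 12, 15]
        System.out.println(nanosToMillis(2_500_000)); // prints 2.5
    }
}
```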
@@ -0,0 +1,62 @@
[
{
"id": "Self-Observability-Java-Agent-Service",
"configuration": {
"children": [
{
"x": 0,
"y": 2,
"w": 24,
"h": 38,
"i": "0",
"type": "Widget",
"graph": {
"type": "InstanceList",
"dashboardName": "Self-Observability-Java-Agent-Instance",
"fontSize": 12
},
"metricConfig": [
{
"label": "Context Creation",
"detailLabel": "context_creation",
"unit": "Per Minute"
},
{
"label": "Context Completion",
"unit": "Per Minute",
"detailLabel": "context_completion"
}
],
"expressions": [
"avg(aggregate_labels(meter_java_agent_created_tracing_context_count,sum)+aggregate_labels(meter_java_agent_created_ignored_context_count,sum))",
"avg(meter_java_agent_finished_tracing_context_count+meter_java_agent_finished_ignored_context_count)"
],
"subExpressions": [
"aggregate_labels(meter_java_agent_created_tracing_context_count,sum)+aggregate_labels(meter_java_agent_created_ignored_context_count,sum)",
"meter_java_agent_finished_tracing_context_count+meter_java_agent_finished_ignored_context_count"
]
},
{
"x": 0,
"y": 0,
"w": 24,
"h": 2,
"i": "100",
"type": "Text",
"graph": {
"fontColor": "theme",
"backgroundColor": "theme",
"content": "The self observability of SkyWalking Java Agent, which provides the abilities to measure the tracing performance and error statistics of plugins.",
"fontSize": 14,
"textAlign": "left",
"url": "https://skywalking.apache.org/docs/main/next/en/setup/backend/dashboards-so11y-java-agent/"
}
}
],
"layer": "SO11Y_JAVA_AGENT",
"entity": "Service",
"name": "Self-Observability-Java-Agent-Service",
"isRoot": true
}
}
]
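
In the service dashboard above, the `expressions` entries wrap a sum of two metrics in `avg(...)` to show one number per instance in the list, while the `subExpressions` entries keep the same sum as a time series. The Java sketch below shows that relationship under the assumption that `avg` is a plain arithmetic mean over the points in the selected time range. `InstanceListValue`, its methods, and the sample values are hypothetical.

```java
public class InstanceListValue {
    // Hypothetical helper: element-wise sum of two equally long series,
    // e.g. tracing contexts plus ignored contexts per minute.
    static double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) {
            out[i] = a[i] + b[i];
        }
        return out;
    }

    // Arithmetic mean of a series; returns 0 for an empty series.
    static double average(double[] series) {
        if (series.length == 0) {
            return 0.0;
        }
        double sum = 0.0;
        for (double v : series) {
            sum += v;
        }
        return sum / series.length;
    }

    public static void main(String[] args) {
        double[] tracing = {14, 12, 16}; // made-up per-minute counts
        double[] ignored = {1, 0, 2};
        double[] combined = add(tracing, ignored); // the time series: 15, 12, 18
        System.out.println(average(combined)); // the single list value: prints 15.0
    }
}
```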
2 changes: 1 addition & 1 deletion skywalking-ui