[featrue] add apache hdfs monitor (#1920)
Co-authored-by: zhangshenghang <shenghang.zhang@avrisdigital.com> Co-authored-by: zhangshenghang <admin@hadoop.wiki> Co-authored-by: crossoverJie <crossoverJie@gmail.com> Co-authored-by: yqxxgh <42080876+yqxxgh@users.noreply.github.com> Co-authored-by: Ceilzcx <48920254+Ceilzcx@users.noreply.github.com> Co-authored-by: aias00 <rokkki@163.com> Co-authored-by: tomsun28 <tomsun28@outlook.com>
1 parent f793178, commit 49fbf2a
Showing 7 changed files with 1,059 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
--- | ||
id: hdfs_datanode | ||
title: Monitoring Apache HDFS DataNode | ||
sidebar_label: Apache HDFS DataNode | ||
keywords: [big data monitoring system, distributed file system monitoring, Apache HDFS DataNode monitoring] | ||
--- | ||
|
||
> HertzBeat collects and monitors metrics from Apache HDFS DataNodes. | ||
**Protocol Used: HTTP** | ||
|
||
## Pre-monitoring Operations | ||
|
||
Look up the HTTP monitoring port of the Apache HDFS DataNode. It is the port part of the `dfs.datanode.http.address` property in `hdfs-site.xml`. | ||
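The port can be read straight from the DataNode's configuration. The following is a minimal sketch, assuming the property is set explicitly in an `hdfs-site.xml`-style document with a `host:port` value; the function name `http_port_from_hdfs_site` and the sample XML are hypothetical and only illustrate the lookup. Note that Hadoop 3.x DataNodes listen on 9864 by default, while 50075 is the Hadoop 2.x default.

```python
import xml.etree.ElementTree as ET


def http_port_from_hdfs_site(xml_text, key="dfs.datanode.http.address", default_port=50075):
    """Return the port of the `key` property, or `default_port` if it is not set."""
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == key:
            value = prop.findtext("value", "").strip()
            # The value has the form host:port; keep only the part after the last colon.
            _, _, port = value.rpartition(":")
            return int(port)
    return default_port


# Hypothetical configuration snippet used only for this example.
SAMPLE = """<configuration>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:9864</value>
  </property>
</configuration>"""

print(http_port_from_hdfs_site(SAMPLE))
```

The returned number is what goes into the Port parameter of the monitor.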
|
||
## Configuration Parameters | ||
|
||
| Parameter Name | Parameter Description | | ||
| ----------------- |-------------------------------------------------------| | ||
| Target Host | IPv4 address, IPv6 address, or domain name of the monitored target, without a protocol prefix such as `http://`. | | ||
| Port | Monitoring port number for Apache HDFS DataNode, default is 50075. | | ||
| Query Timeout | Timeout for querying Apache HDFS DataNode, in milliseconds, default is 6000 milliseconds. | | ||
| Metrics Collection Interval | Time interval for monitoring data collection, in seconds, minimum interval is 30 seconds. | | ||
| Probe Before Monitoring | Whether to probe and check monitoring availability before adding. | | ||
| Description/Remarks | Additional description and remarks for this monitoring. | | ||
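To show how the Target Host and Port parameters fit together, here is a small sketch that builds a request URL from them. It assumes the metrics are read from Hadoop's `/jmx` JSON servlet; the function name `build_jmx_url` and the bracket handling for IPv6 literals are illustrative choices, not HertzBeat's actual implementation.

```python
def build_jmx_url(host: str, port: int = 50075) -> str:
    """Build an HTTP URL for the /jmx servlet from a bare host and a port."""
    if "://" in host:
        # The Target Host parameter must not carry a protocol prefix.
        raise ValueError("host must not include a protocol such as http://")
    if not 0 < port < 65536:
        raise ValueError("port must be between 1 and 65535")
    if ":" in host and not host.startswith("["):
        # A bare IPv6 literal needs brackets inside a URL.
        host = f"[{host}]"
    return f"http://{host}:{port}/jmx"


print(build_jmx_url("datanode1.example.com"))
print(build_jmx_url("fe80::1", 9864))
```

Rejecting a protocol prefix early mirrors the "without a protocol" rule for the Target Host field.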
|
||
### Metrics Collected | ||
|
||
#### Metric Set: FSDatasetState | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------ | ----------- | ------------------------------ | | ||
| DfsUsed | GB | DataNode HDFS usage | | ||
| Remaining | GB | Remaining space on DataNode HDFS | | ||
| Capacity | GB | Total capacity of DataNode HDFS | | ||
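The three values in this metric set are reported in GB. As a rough sketch of where such numbers can come from, the code below parses a `/jmx`-style JSON payload and converts byte counts to GB. It assumes the payload has a top-level `beans` array, that the bean name contains `FSDatasetState`, that raw values are in bytes, and that 1 GB means 1024^3 bytes; the sample payload is made up for the example.

```python
import json

BYTES_PER_GB = 1024 ** 3  # assumption: binary gigabytes


def dataset_state_gb(jmx_json: str) -> dict:
    """Extract DfsUsed, Remaining and Capacity from a JMX payload, converted to GB."""
    beans = json.loads(jmx_json)["beans"]
    bean = next(b for b in beans if "FSDatasetState" in b.get("name", ""))
    return {
        key: round(bean[key] / BYTES_PER_GB, 2)
        for key in ("DfsUsed", "Remaining", "Capacity")
    }


# Hypothetical payload: 100 GB capacity, 20 GB used, 50 GB remaining.
SAMPLE = json.dumps({
    "beans": [{
        "name": "Hadoop:service=DataNode,name=FSDatasetState",
        "Capacity": 107374182400,
        "DfsUsed": 21474836480,
        "Remaining": 53687091200,
    }]
})

print(dataset_state_gb(SAMPLE))
```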
|
||
#### Metric Set: JvmMetrics | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ---------------------- | ----------- | ----------------------------------------- | | ||
| MemNonHeapUsedM | MB | Current usage of NonHeapMemory by JVM | | ||
| MemNonHeapCommittedM | MB | Committed size of NonHeapMemory by JVM | | ||
| MemHeapUsedM | MB | Current usage of HeapMemory by JVM | | ||
| MemHeapCommittedM | MB | Committed size of HeapMemory by JVM | | ||
| MemHeapMaxM | MB | Maximum size of HeapMemory configured in JVM | | ||
| MemMaxM | MB | Maximum memory available for JVM at runtime | | ||
| ThreadsRunnable | Count | Number of threads in RUNNABLE state | | ||
| ThreadsBlocked | Count | Number of threads in BLOCKED state | | ||
| ThreadsWaiting | Count | Number of threads in WAITING state | | ||
| ThreadsTimedWaiting | Count | Number of threads in TIMED_WAITING state | | ||
|
||
#### Metric Set: runtime | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------ | ----------- | ------------------ | | ||
| StartTime | | Startup time | |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,92 @@ | ||
--- | ||
id: hdfs_namenode | ||
title: Monitoring Apache HDFS NameNode | ||
sidebar_label: Apache HDFS NameNode | ||
keywords: [big data monitoring system, distributed file system monitoring, HDFS NameNode monitoring] | ||
--- | ||
|
||
> HertzBeat collects and monitors metrics from Apache HDFS NameNodes. | ||
**Protocol Used: HTTP** | ||
|
||
## Pre-Monitoring Actions | ||
|
||
Look up the HTTP monitoring port of the Apache HDFS NameNode. It is the port part of the `dfs.namenode.http-address` property. | ||
|
||
## Configuration Parameters | ||
|
||
| Parameter Name | Parameter Description | | ||
| ------------------ |--------------------------------------------------------| | ||
| Target Host | IPv4 address, IPv6 address, or domain name of the monitored target, without a protocol prefix such as `http://`. | | ||
| Port | The monitoring port number of the HDFS NameNode, default is 50070. | | ||
| Query Timeout | Timeout for querying the HDFS NameNode, in milliseconds, default is 6000 milliseconds. | | ||
| Metrics Collection Interval | Time interval for collecting monitoring data, in seconds, minimum interval is 30 seconds. | | ||
| Probe Before Monitoring | Whether to probe and check the availability of monitoring before adding it. | | ||
| Description/Remarks | Additional description and remarks for this monitoring. | | ||
|
||
### Collected Metrics | ||
|
||
#### Metric Set: FSNamesystem | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| --------------------------- | ----------- | ------------------------------------- | | ||
| CapacityTotal | | Total cluster storage capacity | | ||
| CapacityTotalGB | GB | Total cluster storage capacity | | ||
| CapacityUsed | | Used cluster storage capacity | | ||
| CapacityUsedGB | GB | Used cluster storage capacity | | ||
| CapacityRemaining | | Remaining cluster storage capacity | | ||
| CapacityRemainingGB | GB | Remaining cluster storage capacity | | ||
| CapacityUsedNonDFS | | Non-HDFS usage of cluster capacity | | ||
| TotalLoad | | Total client connections in the cluster | | ||
| FilesTotal | | Total number of files in the cluster | | ||
| BlocksTotal | | Total number of blocks | | ||
| PendingReplicationBlocks | | Number of blocks awaiting replication | | ||
| UnderReplicatedBlocks | | Number of blocks with insufficient replicas | | ||
| CorruptBlocks | | Number of corrupt blocks | | ||
| ScheduledReplicationBlocks | | Number of blocks scheduled for replication | | ||
| PendingDeletionBlocks | | Number of blocks awaiting deletion | | ||
| ExcessBlocks | | Number of excess blocks | | ||
| PostponedMisreplicatedBlocks| | Number of misreplicated blocks postponed for processing | | ||
| NumLiveDataNodes | | Number of live data nodes in the cluster | | ||
| NumDeadDataNodes | | Number of data nodes marked as dead | | ||
| NumDecomLiveDataNodes | | Number of decommissioned live nodes | | ||
| NumDecomDeadDataNodes | | Number of decommissioned dead nodes | | ||
| NumDecommissioningDataNodes | | Number of nodes currently being decommissioned | | ||
| TransactionsSinceLastCheckpoint | | Number of transactions since the last checkpoint | | ||
| LastCheckpointTime | | Time of the last checkpoint | | ||
| PendingDataNodeMessageCount| | Number of DataNode messages queued on the standby NameNode | | ||
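Several of these metrics are most useful in combination. The sketch below turns one sample of FSNamesystem values into a list of warnings. The function name `namesystem_alerts`, the 85% capacity limit, and the choice of which counters to flag are assumptions made for the example, not thresholds defined by HertzBeat or Hadoop.

```python
def namesystem_alerts(metrics: dict, used_pct_limit: float = 85.0) -> list:
    """Return human-readable warnings for one sample of FSNamesystem metrics."""
    alerts = []
    total = metrics.get("CapacityTotal", 0)
    used = metrics.get("CapacityUsed", 0)
    if total > 0:
        used_pct = 100.0 * used / total
        if used_pct > used_pct_limit:
            alerts.append(f"capacity {used_pct:.1f}% used")
    # Any non-zero value of these counters usually deserves a closer look.
    for name in ("CorruptBlocks", "UnderReplicatedBlocks", "NumDeadDataNodes"):
        value = metrics.get(name, 0)
        if value > 0:
            alerts.append(f"{name}={value}")
    return alerts


# Made-up sample values.
sample = {
    "CapacityTotal": 1000,
    "CapacityUsed": 900,
    "CorruptBlocks": 0,
    "UnderReplicatedBlocks": 12,
    "NumDeadDataNodes": 1,
}
print(namesystem_alerts(sample))
```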
|
||
#### Metric Set: RPC | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------------------- | ----------- | -------------------------- | | ||
| ReceivedBytes | | Total bytes received (cumulative counter) | | ||
| SentBytes | | Total bytes sent (cumulative counter) | | ||
| RpcQueueTimeNumOps | | Total number of RPC calls (cumulative counter) | | ||
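Hadoop's metrics documentation lists `ReceivedBytes`, `SentBytes`, and `RpcQueueTimeNumOps` as running totals, so a per-second rate has to be derived from two consecutive samples. A minimal sketch of that calculation follows; the function name `per_second_rate` and the sample numbers are illustrative, and timestamps are assumed to be in seconds.

```python
def per_second_rate(prev_value, prev_ts, curr_value, curr_ts):
    """Return the per-second increase between two counter samples, or None if unusable."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0 or curr_value < prev_value:
        # Non-increasing time, or the counter was reset (for example, a NameNode restart).
        return None
    return (curr_value - prev_value) / elapsed


# Two made-up ReceivedBytes samples taken 30 seconds apart.
print(per_second_rate(1000, 0, 4000, 30))
```

Returning `None` on a counter reset avoids reporting a misleading negative rate after a restart.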
|
||
#### Metric Set: runtime | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------------------- | ----------- | ------------------- | | ||
| StartTime | | Start time | | ||
|
||
#### Metric Set: JvmMetrics | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------------------- | ----------- | ------------------- | | ||
| MemNonHeapUsedM | MB | Current usage of NonHeapMemory by JVM | | ||
| MemNonHeapCommittedM | MB | Committed NonHeapMemory by JVM | | ||
| MemHeapUsedM | MB | Current usage of HeapMemory by JVM | | ||
| MemHeapCommittedM | MB | Committed HeapMemory by JVM | | ||
| MemHeapMaxM | MB | Maximum HeapMemory configured for JVM | | ||
| MemMaxM | MB | Maximum memory that can be used by JVM | | ||
| GcCountParNew | Count | Number of ParNew GC events | | ||
| GcTimeMillisParNew | Milliseconds| Time spent in ParNew GC | | ||
| GcCountConcurrentMarkSweep| Count | Number of ConcurrentMarkSweep GC events| | ||
| GcTimeMillisConcurrentMarkSweep | Milliseconds | Time spent in ConcurrentMarkSweep GC | | ||
| GcCount | Count | Total number of GC events | | ||
| GcTimeMillis | Milliseconds| Total time spent in GC events | | ||
| ThreadsRunnable | Count | Number of threads in RUNNABLE state | | ||
| ThreadsBlocked | Count | Number of threads in BLOCKED state | | ||
| ThreadsWaiting | Count | Number of threads in WAITING state | | ||
| ThreadsTimedWaiting | Count | Number of threads in TIMED_WAITING state| |
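Two derived figures are often easier to read than the raw JVM values: heap usage as a percentage of the maximum, and the average time per GC event. The sketch below computes both from one sample. The function name `jvm_summary` and the sample values are assumptions for illustration; because `GcCount` and `GcTimeMillis` accumulate from JVM start, the average covers the whole process lifetime.

```python
def jvm_summary(m: dict) -> dict:
    """Derive heap usage percentage and average GC time from one JvmMetrics sample."""
    heap_max = m.get("MemHeapMaxM", 0)
    gc_count = m.get("GcCount", 0)
    return {
        # Share of the maximum heap currently in use.
        "heap_used_pct": round(100.0 * m["MemHeapUsedM"] / heap_max, 1) if heap_max > 0 else None,
        # Average milliseconds per GC event since JVM start.
        "avg_gc_ms": round(m["GcTimeMillis"] / gc_count, 1) if gc_count > 0 else None,
    }


# Made-up sample values.
sample = {"MemHeapUsedM": 512.0, "MemHeapMaxM": 2048.0, "GcCount": 40, "GcTimeMillis": 1000}
print(jvm_summary(sample))
```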
56 changes: 56 additions & 0 deletions
home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/hdfs_datanode.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,56 @@ | ||
--- | ||
id: hdfs_datanode | ||
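title: Monitoring Apache HDFS DataNode | ||
sidebar_label: Apache HDFS DataNode | ||
keywords: [big data monitoring system, distributed file system monitoring, Apache HDFS DataNode monitoring] | ||
--- | ||
|
||
> HertzBeat collects and monitors metrics from Apache HDFS DataNodes. | ||
**Protocol Used: HTTP** | ||
|
||
## Pre-monitoring Operations | ||
|
||
Look up the HTTP monitoring port of the Apache HDFS DataNode. It is the port part of the `dfs.datanode.http.address` property. | ||
|
||
## Configuration Parameters | ||
|
||
| Parameter Name | Parameter Description | | ||
| ---------------- |---------------------------------------| | ||
| Target Host | IPv4 address, IPv6 address, or domain name of the monitored target, without a protocol prefix. | | ||
| Port | Monitoring port number of the Apache HDFS DataNode, default is 50075. | | ||
| Query Timeout | Timeout for querying the Apache HDFS DataNode, in milliseconds, default is 6000 milliseconds. | | ||
| Metrics Collection Interval | Time interval for collecting monitoring data, in seconds, minimum interval is 30 seconds. | | ||
| Probe Before Monitoring | Whether to probe and check monitoring availability before adding the monitor. | | ||
| Description/Remarks | Additional description and remarks for this monitor. | | ||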
|
||
### Metrics Collected | ||
|
||
#### Metric Set: FSDatasetState | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| -------------------------- | -------- | ------------------------------------ | | ||
| DfsUsed | GB | DataNode HDFS usage | | ||
| Remaining | GB | Remaining space on DataNode HDFS | | ||
| Capacity | GB | Total capacity of DataNode HDFS | | ||
|
||
#### Metric Set: JvmMetrics | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------------------ | -------- | ------------------------------------ | | ||
| MemNonHeapUsedM | MB | Current usage of NonHeapMemory by JVM | | ||
| MemNonHeapCommittedM | MB | Committed size of NonHeapMemory by JVM | | ||
| MemHeapUsedM | MB | Current usage of HeapMemory by JVM | | ||
| MemHeapCommittedM | MB | Committed size of HeapMemory by JVM | | ||
| MemHeapMaxM | MB | Maximum size of HeapMemory configured in JVM | | ||
| MemMaxM | MB | Maximum memory available for JVM at runtime | | ||
| ThreadsRunnable | Count | Number of threads in RUNNABLE state | | ||
| ThreadsBlocked | Count | Number of threads in BLOCKED state | | ||
| ThreadsWaiting | Count | Number of threads in WAITING state | | ||
| ThreadsTimedWaiting | Count | Number of threads in TIMED_WAITING state | | ||
|
||
#### Metric Set: runtime | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| --------------------| -------- | ----------------- | | ||
| StartTime | | Startup time | |
93 changes: 93 additions & 0 deletions
home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/hdfs_namenode.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,93 @@ | ||
--- | ||
id: hdfs_namenode | ||
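title: Monitoring Apache HDFS NameNode | ||
sidebar_label: Apache HDFS NameNode | ||
keywords: [big data monitoring system, distributed file system monitoring, Apache HDFS NameNode monitoring] | ||
--- | ||
|
||
> HertzBeat collects and monitors metrics from Apache HDFS NameNodes. | ||
**Protocol Used: HTTP** | ||
|
||
## Pre-monitoring Operations | ||
|
||
Look up the HTTP monitoring port of the Apache HDFS NameNode. It is the port part of the `dfs.namenode.http-address` property. | ||
|
||
## Configuration Parameters | ||
|
||
| Parameter Name | Parameter Description | | ||
| ---------------- |---------------------------------------| | ||
| Target Host | IPv4 address, IPv6 address, or domain name of the monitored target, without a protocol prefix. | | ||
| Port | Monitoring port number of the HDFS NameNode, default is 50070. | | ||
| Query Timeout | Timeout for querying the HDFS NameNode, in milliseconds, default is 6000 milliseconds. | | ||
| Metrics Collection Interval | Time interval for collecting monitoring data, in seconds, minimum interval is 30 seconds. | | ||
| Probe Before Monitoring | Whether to probe and check monitoring availability before adding the monitor. | | ||
| Description/Remarks | Additional description and remarks for this monitor. | | ||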
|
||
### Metrics Collected | ||
|
||
#### Metric Set: FSNamesystem | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| -------------------------- | -------- | ------------------------------------ | | ||
| CapacityTotal | | Total cluster storage capacity | | ||
| CapacityTotalGB | GB | Total cluster storage capacity | | ||
| CapacityUsed | | Used cluster storage capacity | | ||
| CapacityUsedGB | GB | Used cluster storage capacity | | ||
| CapacityRemaining | | Remaining cluster storage capacity | | ||
| CapacityRemainingGB | GB | Remaining cluster storage capacity | | ||
| CapacityUsedNonDFS | | Non-HDFS usage of cluster capacity | | ||
| TotalLoad | | Total client connections in the cluster | | ||
| FilesTotal | | Total number of files in the cluster | | ||
| BlocksTotal | | Total number of blocks | | ||
| PendingReplicationBlocks | | Number of blocks awaiting replication | | ||
| UnderReplicatedBlocks | | Number of blocks with insufficient replicas | | ||
| CorruptBlocks | | Number of corrupt blocks | | ||
| ScheduledReplicationBlocks | | Number of blocks scheduled for replication | | ||
| PendingDeletionBlocks | | Number of blocks awaiting deletion | | ||
| ExcessBlocks | | Number of excess blocks | | ||
| PostponedMisreplicatedBlocks | | Number of misreplicated blocks postponed for processing | | ||
| NumLiveDataNodes | | Number of live data nodes in the cluster | | ||
| NumDeadDataNodes | | Number of data nodes marked as dead | | ||
| NumDecomLiveDataNodes | | Number of decommissioned live nodes | | ||
| NumDecomDeadDataNodes | | Number of decommissioned dead nodes | | ||
| NumDecommissioningDataNodes | | Number of nodes currently being decommissioned | | ||
| TransactionsSinceLastCheckpoint | | Number of transactions since the last checkpoint | | ||
| LastCheckpointTime | | Time of the last checkpoint | | ||
| PendingDataNodeMessageCount | | Number of DataNode messages queued on the standby NameNode | | ||
|
||
#### Metric Set: RPC | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------------- | -------- | ---------------------- | | ||
| ReceivedBytes | | Total bytes received (cumulative counter) | | ||
| SentBytes | | Total bytes sent (cumulative counter) | | ||
| RpcQueueTimeNumOps | | Total number of RPC calls (cumulative counter) | | ||
|
||
#### Metric Set: runtime | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| --------------------| -------- | ----------------- | | ||
| StartTime | | Start time | | ||
|
||
#### Metric Set: JvmMetrics | ||
|
||
| Metric Name | Metric Unit | Metric Description | | ||
| ------------------------ | -------- | ---------------- | | ||
| MemNonHeapUsedM | MB | Current usage of NonHeapMemory by JVM | | ||
| MemNonHeapCommittedM | MB | Committed NonHeapMemory by JVM | | ||
| MemHeapUsedM | MB | Current usage of HeapMemory by JVM | | ||
| MemHeapCommittedM | MB | Committed HeapMemory by JVM | | ||
| MemHeapMaxM | MB | Maximum HeapMemory configured for JVM | | ||
| MemMaxM | MB | Maximum memory that can be used by JVM | | ||
| GcCountParNew | Count | Number of ParNew (young generation) GC events | | ||
| GcTimeMillisParNew | Milliseconds | Time spent in ParNew (young generation) GC | | ||
| GcCountConcurrentMarkSweep | Count | Number of ConcurrentMarkSweep (old generation) GC events | | ||
| GcTimeMillisConcurrentMarkSweep | Milliseconds | Time spent in ConcurrentMarkSweep (old generation) GC | | ||
| GcCount | Count | Total number of GC events | | ||
| GcTimeMillis | Milliseconds | Total time spent in GC events | | ||
| ThreadsRunnable | Count | Number of threads in RUNNABLE state | | ||
| ThreadsBlocked | Count | Number of threads in BLOCKED state | | ||
| ThreadsWaiting | Count | Number of threads in WAITING state | | ||
| ThreadsTimedWaiting | Count | Number of threads in TIMED_WAITING state | | ||
|