[feature] add Apache Hbase RegionServer monitoring (#1833)
Co-authored-by: zhangshenghang <shenghang.zhang@avrisdigital.com> Co-authored-by: zhangshenghang <admin@hadoop.wiki> Co-authored-by: tomsun28 <tomsun28@outlook.com>
1 parent 2fa3b5a, commit 4a3e273
Showing 5 changed files with 774 additions and 1 deletion.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,96 @@ | ||
--- | ||
id: hbase_regionserver | ||
title: Monitoring HBase RegionServer | ||
sidebar_label: HBase RegionServer Monitoring | ||
keywords: [Open-source monitoring system, Open-source database monitoring, RegionServer monitoring] | ||
--- | ||
> Collect and monitor common performance metrics for HBase RegionServer. | ||
**Protocol:** HTTP | ||
|
||
## Pre-Monitoring Operations | ||
|
||
Review the `hbase-site.xml` file to obtain the value of the `hbase.regionserver.info.port` configuration item, which is used for monitoring. | ||
|
||
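As a rough illustration of the lookup described above, the following Python sketch reads `hbase.regionserver.info.port` from the text of an `hbase-site.xml` file and falls back to the documented default of 16030 when the property is absent. The function name `regionserver_info_port` and the sample XML are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

DEFAULT_INFO_PORT = 16030  # default port stated in this document

PROPERTY_NAME = "hbase.regionserver.info.port"


def regionserver_info_port(hbase_site_xml: str) -> int:
    """Return the info port found in hbase-site.xml text, or the default."""
    root = ET.fromstring(hbase_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == PROPERTY_NAME:
            return int(prop.findtext("value", "").strip())
    return DEFAULT_INFO_PORT  # property not set, so the default applies


# Hypothetical file content, for illustration only.
SAMPLE = """<configuration>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>16031</value>
  </property>
</configuration>"""

print(regionserver_info_port(SAMPLE))  # prints 16031
```

The returned value is what would go into the Port parameter of the monitoring task.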
## Configuration Parameters | ||
|
||
|
||
| Parameter Name | Parameter Description | | ||
| ------------------- | -------------------------------------------------------------------------------------------------------------------------------------------------- | | ||
| Target Host | The IPv4 address, IPv6 address, or domain name of the monitored host. Note ⚠️ Do not include the protocol prefix (e.g., https://, http://). | | ||
| Port | The port number of the HBase RegionServer, i.e., the value of the `hbase.regionserver.info.port` parameter. Default is 16030. | | ||
| Task Name | A unique name to identify this monitoring task. | | ||
| Query Timeout | Timeout for HBase RegionServer queries, in milliseconds. Default is 3000 ms. | | ||
| Collection Interval | Interval between periodic data collections, in seconds. The minimum interval is 30 seconds. | | ||
| Probe Before Adding | Whether to probe the target for availability before adding the monitor. The monitor is added only if the probe succeeds. | | ||
| Description Note | Additional notes that identify and describe this monitoring task. | | ||
|
||
### Collection Metrics | ||
|
||
> All metric names are directly referenced from the official fields, hence there may be non-standard naming. | ||
#### Metric Set: server | ||
|
||
|
||
| Metric Name | Unit | Metric Description | | ||
| --------------------------------- | ----- | ------------------------------------------------------------------------- | | ||
| regionCount | None | Number of Regions | | ||
| readRequestCount | None | Number of read requests since cluster restart | | ||
| writeRequestCount | None | Number of write requests since cluster restart | | ||
| averageRegionSize | MB | Average size of a Region | | ||
| totalRequestCount | None | Total number of requests | | ||
| ScanTime_num_ops | None | Total number of Scan requests | | ||
| Append_num_ops | None | Total number of Append requests | | ||
| Increment_num_ops | None | Total number of Increment requests | | ||
| Get_num_ops | None | Total number of Get requests | | ||
| Delete_num_ops | None | Total number of Delete requests | | ||
| Put_num_ops | None | Total number of Put requests | | ||
| ScanTime_mean | None | Average time of a Scan request | | ||
| ScanTime_min | None | Minimum time of a Scan request | | ||
| ScanTime_max | None | Maximum time of a Scan request | | ||
| ScanSize_mean | bytes | Average size of a Scan request | | ||
| ScanSize_min | None | Minimum size of a Scan request | | ||
| ScanSize_max | None | Maximum size of a Scan request | | ||
| slowPutCount | None | Number of slow Put operations | | ||
| slowGetCount | None | Number of slow Get operations | | ||
| slowAppendCount | None | Number of slow Append operations | | ||
| slowIncrementCount | None | Number of slow Increment operations | | ||
| slowDeleteCount | None | Number of slow Delete operations | | ||
| blockCacheSize | None | Size of memory used by block cache | | ||
| blockCacheCount | None | Number of blocks in Block Cache | | ||
| blockCacheExpressHitPercent | None | Block cache hit ratio | | ||
| memStoreSize | None | Size of the MemStore | | ||
| FlushTime_num_ops | None | Number of MemStore flushes to disk by the RegionServer | | ||
| flushQueueLength | None | Length of Region Flush queue | | ||
| flushedCellsSize | None | Size flushed to disk | | ||
| storeFileCount | None | Number of Storefiles | | ||
| storeCount | None | Number of Stores | | ||
| storeFileSize | None | Size of Storefiles | | ||
| compactionQueueLength | None | Length of Compaction queue | | ||
| percentFilesLocal | None | Percentage of HFiles stored on the local HDFS DataNode | | ||
| percentFilesLocalSecondaryRegions | None | Percentage of HFiles for secondary region replicas stored on the local HDFS DataNode | | ||
| hlogFileCount | None | Number of WAL files | | ||
| hlogFileSize | None | Size of WAL files | | ||
|
||
#### Metric Set: IPC | ||
|
||
|
||
| Metric Name | Unit | Metric Description | | ||
| ------------------------- | ---- | -------------------------------------- | | ||
| numActiveHandler | None | Number of active RPC handlers | | ||
| NotServingRegionException | None | Count of NotServingRegionException | | ||
| RegionMovedException | None | Count of RegionMovedException | | ||
| RegionTooBusyException | None | Count of RegionTooBusyException | | ||
|
||
#### Metric Set: JVM | ||
|
||
|
||
| Metric Name | Unit | Metric Description | | ||
| -------------------- | ---- | --------------------------------- | | ||
| MemNonHeapUsedM | MB | Non-heap memory used | | ||
| MemNonHeapCommittedM | MB | Non-heap memory committed | | ||
| MemHeapUsedM | MB | Heap memory used | | ||
| MemHeapCommittedM | MB | Heap memory committed | | ||
| MemHeapMaxM | MB | Maximum heap memory | | ||
| MemMaxM | MB | Maximum memory usable by the JVM | | ||
| GcCount | None | Total GC count | |
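
The metric sets above are collected over HTTP. As a hedged sketch of how such fields might be pulled out of a response, the Python below assumes the RegionServer info port serves a JSON document with a top-level `beans` list (as the `/jmx` endpoint of the HBase web UI is commonly understood to do) and that the three metric sets map to the bean names shown. Neither the endpoint nor the bean names are stated in this document, so verify them against your HBase version. The helper `pick_metrics` and all sample values are hypothetical.

```python
from typing import Any, Dict, Iterable

# Assumed JMX bean names for the three metric sets (not stated in this document).
BEAN_NAMES = {
    "server": "Hadoop:service=HBase,name=RegionServer,sub=Server",
    "IPC": "Hadoop:service=HBase,name=RegionServer,sub=IPC",
    "JVM": "Hadoop:service=HBase,name=JvmMetrics",
}


def pick_metrics(
    payload: Dict[str, Any], metric_set: str, fields: Iterable[str]
) -> Dict[str, Any]:
    """Return the requested fields from the bean that backs one metric set."""
    wanted = BEAN_NAMES[metric_set]
    for bean in payload.get("beans", []):
        if bean.get("name") == wanted:
            # Fields missing from the bean come back as None.
            return {field: bean.get(field) for field in fields}
    return {}  # the bean is not present in this payload


# Placeholder payload with made-up values, shaped like the assumed response.
SAMPLE_PAYLOAD = {
    "beans": [
        {"name": BEAN_NAMES["server"], "regionCount": 12, "storeFileCount": 34},
        {"name": BEAN_NAMES["JVM"], "MemHeapUsedM": 256.5, "GcCount": 7},
    ]
}

print(pick_metrics(SAMPLE_PAYLOAD, "server", ["regionCount", "storeFileCount"]))
print(pick_metrics(SAMPLE_PAYLOAD, "JVM", ["GcCount"]))
```

Keeping the bean lookup separate from the HTTP call makes the extraction easy to check against a saved response before pointing it at a live RegionServer.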
2 changes: 1 addition & 1 deletion
home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/hbase_master.md
97 changes: 97 additions & 0 deletions
home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/hbase_regionserver.md
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,97 @@ | ||
--- | ||
id: hbase_regionserver | ||
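title: Monitoring HBase RegionServer | ||
sidebar_label: Apache HBase RegionServer | ||
keywords: [Open-source monitoring system, Open-source database monitoring, RegionServer monitoring] | ||
--- | ||
> Collect and monitor common performance metrics for HBase RegionServer. | ||
**Protocol: HTTP** | ||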
|
||
## Pre-Monitoring Operations | ||
|
||
Review the `hbase-site.xml` file to obtain the value of the `hbase.regionserver.info.port` configuration item, which is used for monitoring. | ||
|
||
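## Configuration Parameters | ||
|
||
|
||
| Parameter Name | Parameter Description | | ||
| ------------ |---------------------------------------------------------------------| | ||
| Target Host | The IPv4 address, IPv6 address, or domain name of the monitored host. Note ⚠️ Do not include the protocol prefix (e.g., https://, http://). | | ||
| Port | The port number of the HBase RegionServer, i.e., the value of the `hbase.regionserver.info.port` parameter. Default is 16030. | | ||
| Task Name | A unique name to identify this monitoring task. | | ||
| Query Timeout | Timeout for HBase RegionServer queries, in milliseconds. Default is 3000 ms. | | ||
| Collection Interval | Interval between periodic data collections, in seconds. The minimum interval is 30 seconds. | | ||
| Probe Before Adding | Whether to probe the target for availability before adding the monitor. The add or modify operation proceeds only if the probe succeeds. | | ||
| Description Note | Additional notes that identify and describe this monitoring task. | | ||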
|
||
### Collection Metrics | ||
|
||
> All metric names are taken directly from the official fields, so some naming may be non-standard. | ||
#### Metric Set: server | ||
|
||
|
||
| Metric Name | Unit | Metric Description | | ||
| -------------------- |-------|------------------------------------------| | ||
| regionCount | None | Number of Regions | | ||
| readRequestCount | None | Number of read requests since cluster restart | | ||
| writeRequestCount | None | Number of write requests since cluster restart | | ||
| averageRegionSize | MB | Average size of a Region | | ||
| totalRequestCount | None | Total number of requests | | ||
| ScanTime_num_ops | None | Total number of Scan requests | | ||
| Append_num_ops | None | Number of Append requests | | ||
| Increment_num_ops | None | Number of Increment requests | | ||
| Get_num_ops | None | Number of Get requests | | ||
| Delete_num_ops | None | Number of Delete requests | | ||
| Put_num_ops | None | Number of Put requests | | ||
| ScanTime_mean | None | Average time of a Scan request | | ||
| ScanTime_min | None | Minimum time of a Scan request | | ||
| ScanTime_max | None | Maximum time of a Scan request | | ||
| ScanSize_mean | bytes | Average size of a Scan request | | ||
| ScanSize_min | None | Minimum size of a Scan request | | ||
| ScanSize_max | None | Maximum size of a Scan request | | ||
| slowPutCount | None | Number of slow Put operations | | ||
| slowGetCount | None | Number of slow Get operations | | ||
| slowAppendCount | None | Number of slow Append operations | | ||
| slowIncrementCount | None | Number of slow Increment operations | | ||
| slowDeleteCount | None | Number of slow Delete operations | | ||
| blockCacheSize | None | Size of memory used by the block cache | | ||
| blockCacheCount | None | Number of blocks in the block cache | | ||
| blockCacheExpressHitPercent | None | Read cache hit ratio | | ||
| memStoreSize | None | Size of the MemStore | | ||
| FlushTime_num_ops | None | Number of MemStore flushes to disk by the RegionServer | | ||
| flushQueueLength | None | Length of the Region flush queue | | ||
| flushedCellsSize | None | Size flushed to disk | | ||
| storeFileCount | None | Number of StoreFiles | | ||
| storeCount | None | Number of Stores | | ||
| storeFileSize | None | Size of StoreFiles | | ||
| compactionQueueLength | None | Length of the compaction queue | | ||
| percentFilesLocal | None | Percentage of the Region's HFiles stored on the local HDFS DataNode | | ||
| percentFilesLocalSecondaryRegions | None | Percentage of Region replica HFiles stored on the local HDFS DataNode | | ||
| hlogFileCount | None | Number of WAL files | | ||
| hlogFileSize | None | Size of WAL files | | ||
|
||
#### Metric Set: IPC | ||
|
||
|
||
| Metric Name | Unit | Metric Description | | ||
| --------------------- | ------ | ------------------- | | ||
| numActiveHandler | None | Number of active RPC handlers | | ||
| NotServingRegionException | None | Count of NotServingRegionException | | ||
| RegionMovedException | None | Count of RegionMovedException | | ||
| RegionTooBusyException | None | Count of RegionTooBusyException | | ||
|
||
#### Metric Set: JVM | ||
|
||
|
||
| Metric Name | Unit | Metric Description | | ||
| ----------------------- | ----- | ------------------------ | | ||
| MemNonHeapUsedM | MB | Non-heap memory used | | ||
| MemNonHeapCommittedM | MB | Non-heap memory committed | | ||
| MemHeapUsedM | MB | Heap memory used | | ||
| MemHeapCommittedM | MB | Heap memory committed | | ||
| MemHeapMaxM | MB | Maximum heap memory | | ||
| MemMaxM | MB | Maximum memory usable by the JVM | | ||
| GcCount | None | Total GC count | | ||
|