[feature]Add monitoring for Hbase Master #1820

Merged 4 commits on Apr 23, 2024
62 changes: 62 additions & 0 deletions home/docs/help/hbase_master.md
@@ -0,0 +1,62 @@
---
id: hbase_master
title: Monitoring HBase Master
sidebar_label: HBase Master Monitoring
keywords: [Open Source Monitoring System, Open Source Database Monitoring, HBase Master Monitoring]
---
> Collects general performance metrics from the HBase Master.

**Protocol: HTTP**

## Pre-monitoring steps

Check the `hbase-site.xml` file to obtain the value of the `hbase.master.info.port` configuration item. HertzBeat connects to this port to collect metrics.
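
If you prefer to read the port programmatically, the lookup can be sketched as follows. This is a minimal sketch, not part of HertzBeat: the helper name `read_master_info_port` and the sample file content are made up for illustration, and the fallback of 16010 is the default port listed in the parameter table on this page.

```python
import xml.etree.ElementTree as ET

# Default port as stated in the parameter table of this page.
DEFAULT_INFO_PORT = 16010


def read_master_info_port(xml_text: str) -> int:
    """Return hbase.master.info.port from hbase-site.xml content, or the default.

    Hypothetical helper for illustration; HertzBeat does not ship this function.
    """
    root = ET.fromstring(xml_text)
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == "hbase.master.info.port":
            value = (prop.findtext("value") or "").strip()
            if value:
                return int(value)
    # The property is absent, so fall back to the default.
    return DEFAULT_INFO_PORT


# Made-up file content, for illustration only.
SAMPLE = """<?xml version="1.0"?>
<configuration>
  <property>
    <name>hbase.master.info.port</name>
    <value>16010</value>
  </property>
</configuration>
"""

if __name__ == "__main__":
    print(read_master_info_port(SAMPLE))
```

Enter the resulting value in the Port parameter described below.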

## Configuration Parameters


| Parameter Name      | Parameter Description |
| ------------------- | --------------------- |
| Target Host         | The IPv4 address, IPv6 address, or domain name of the monitored host. Do not include a protocol prefix (e.g., https://, http://). |
| Port                | The port number of the HBase Master, 16010 by default. This is the value of the `hbase.master.info.port` parameter. |
| Task Name           | The name identifying this monitor. It must be unique. |
| Query Timeout       | The timeout for the HTTP query, in milliseconds. The default is 6000 milliseconds. |
| Collection Interval | The interval between periodic data collections, in seconds. The minimum allowed interval is 30 seconds. |
| Probe               | Whether to probe the target and check its availability before adding the monitor. The add or modify operation proceeds only if the probe succeeds. |
| Description         | Additional notes describing this monitor. |

### Collected Metrics

#### Metric Set: Server


| Metric Name | Unit | Metric Description |
| -------------------- | ---- | --------------------------------------- |
| numRegionServers | none | Number of currently alive RegionServers |
| numDeadRegionServers | none | Number of currently dead RegionServers |
| averageLoad | none | Cluster average load |
| clusterRequests | none | Total number of cluster requests |
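
The monitor definition added in this change reads these values from the Master's `/jmx` endpoint and selects the bean named `Hadoop:service=HBase,name=Master,sub=Server`. The sketch below imitates that selection in plain Python so you can check what your Master reports. The function names and the sample payload are invented for illustration, the numbers are not real measurements, and the 6000 ms default mirrors the `timeout` default in the definition file.

```python
import json
from urllib.request import urlopen

SERVER_BEAN = "Hadoop:service=HBase,name=Master,sub=Server"
SERVER_FIELDS = ("numRegionServers", "numDeadRegionServers", "averageLoad", "clusterRequests")


def fetch_jmx(host: str, port: int = 16010, timeout_ms: int = 6000) -> dict:
    """GET http://<host>:<port>/jmx and decode the JSON body (hypothetical helper)."""
    with urlopen(f"http://{host}:{port}/jmx", timeout=timeout_ms / 1000) as resp:
        return json.load(resp)


def select_bean(payload: dict, bean_name: str) -> dict:
    """Return the first bean whose name matches, or an empty dict if none does."""
    for bean in payload.get("beans", []):
        if bean.get("name") == bean_name:
            return bean
    return {}


def server_metrics(payload: dict) -> dict:
    """Pick the four Server metrics; a missing field maps to None."""
    bean = select_bean(payload, SERVER_BEAN)
    return {field: bean.get(field) for field in SERVER_FIELDS}


# Made-up response body, for illustration only.
SAMPLE_JMX = {
    "beans": [
        {"name": "java.lang:type=Runtime", "Uptime": 1000},
        {
            "name": SERVER_BEAN,
            "numRegionServers": 3,
            "numDeadRegionServers": 0,
            "averageLoad": 12.5,
            "clusterRequests": 1024,
        },
    ]
}

if __name__ == "__main__":
    print(server_metrics(SAMPLE_JMX))
```

Against a live Master you would pass the result of `fetch_jmx("<host>")` to `server_metrics` instead of the sample payload.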

#### Metric Set: Rit


| Metric Name           | Unit | Metric Description |
| --------------------- | ---- | ------------------ |
| ritCount              | none | Current number of regions in transition (RIT) |
| ritCountOverThreshold | none | Number of RIT over the threshold |
| ritOldestAge          | ms   | Duration of the oldest RIT |

#### Metric Set: basic


| Metric Name             | Unit | Metric Description |
| ----------------------- | ---- | ------------------ |
| liveRegionServers       | none | List of currently active RegionServers |
| deadRegionServers       | none | List of currently offline RegionServers |
| zookeeperQuorum         | none | ZooKeeper quorum list |
| masterHostName          | none | Hostname of the Master node |
| BalancerCluster_num_ops | none | Number of cluster load balancing operations |
| numActiveHandler        | none | Number of active RPC handlers |
| receivedBytes           | MB   | Cluster received data volume |
| sentBytes               | MB   | Cluster sent data volume |
| clusterRequests         | none | Total number of cluster requests |
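
The basic metric set combines fields from three beans (`sub=Server`, `sub=Balancer`, `sub=IPC`) of the `/jmx` response and reports `receivedBytes` and `sentBytes` in MB, following the `B->MB` unit mapping in the monitor definition. The sketch below shows one way to reproduce that mapping. It is illustrative only: the function names and sample payload are made up, and treating 1 MB as 1024 * 1024 bytes with two-decimal rounding is an assumption that may differ from what HertzBeat does internally.

```python
PREFIX = "Hadoop:service=HBase,name=Master,sub="
BYTES_PER_MB = 1024 * 1024  # assumption: binary megabytes


def bytes_to_mb(value):
    """Convert a byte count to MB, keeping None for missing values."""
    return None if value is None else round(value / BYTES_PER_MB, 2)


def basic_metrics(payload: dict) -> dict:
    """Collect the basic metric set from a decoded /jmx payload."""
    beans = {bean.get("name"): bean for bean in payload.get("beans", [])}
    server = beans.get(PREFIX + "Server", {})
    balancer = beans.get(PREFIX + "Balancer", {})
    ipc = beans.get(PREFIX + "IPC", {})
    return {
        # Tag fields carry a literal "tag." prefix in the JSON key.
        "liveRegionServers": server.get("tag.liveRegionServers"),
        "deadRegionServers": server.get("tag.deadRegionServers"),
        "zookeeperQuorum": server.get("tag.zookeeperQuorum"),
        "masterHostName": server.get("tag.Hostname"),
        "BalancerCluster_num_ops": balancer.get("BalancerCluster_num_ops"),
        "numActiveHandler": ipc.get("numActiveHandler"),
        "receivedBytes": bytes_to_mb(ipc.get("receivedBytes")),
        "sentBytes": bytes_to_mb(ipc.get("sentBytes")),
        "clusterRequests": server.get("clusterRequests"),
    }


# Made-up payload, for illustration only; host names and numbers are invented.
SAMPLE_JMX = {
    "beans": [
        {
            "name": PREFIX + "Server",
            "tag.liveRegionServers": "rs1.example.com;rs2.example.com",
            "tag.deadRegionServers": "",
            "tag.zookeeperQuorum": "zk1.example.com:2181",
            "tag.Hostname": "master1.example.com",
            "clusterRequests": 1024,
        },
        {"name": PREFIX + "Balancer", "BalancerCluster_num_ops": 7},
        {
            "name": PREFIX + "IPC",
            "numActiveHandler": 2,
            "receivedBytes": 5242880,
            "sentBytes": 1048576,
        },
    ]
}

if __name__ == "__main__":
    for key, value in basic_metrics(SAMPLE_JMX).items():
        print(key, value)
```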
@@ -0,0 +1,62 @@
---
id: hbase_master
title: 监控:Hbase Master监控
sidebar_label: HbaseMaster监控
keywords: [开源监控系统, 开源数据库监控, HbaseMaster监控]
---
> 对Hbase Master的通用性能指标进行采集监控

**使用协议:HTTP**

## 监控前操作

查看 `hbase-site.xml` 文件,获取 `hbase.master.info.port` 配置项的值,该值用作监控使用。

## 配置参数


| 参数名称 | 参数帮助描述 |
| ------------ | ------------------------------------------------------------------------- |
| 目标Host | 被监控的对端IPV4,IPV6或域名。注意⚠️不带协议头(eg: https://, http://)。 |
| 端口 | hbase master的端口号,默认为16010。即:`hbase.master.info.port`参数值 |
| 任务名称 | 标识此监控的名称,名称需要保证唯一性。 |
| 查询超时时间 | 设置HTTP查询的超时时间,单位ms毫秒,默认6000毫秒。 |
| 采集间隔 | 监控周期性采集数据间隔时间,单位秒,可设置的最小间隔为30秒 |
| 是否探测 | 新增监控前是否先探测检查监控可用性,探测成功才会继续新增修改操作 |
| 描述备注 | 更多标识和描述此监控的备注信息,用户可以在这里备注信息 |

### 采集指标

#### 指标集合:server


| 指标名称 | 指标单位 | 指标帮助描述 |
| -------------------- |----| ---------------------------- |
| numRegionServers | 无 | 当前存活的 RegionServer 个数 |
| numDeadRegionServers | 无 | 当前Dead的 RegionServer 个数 |
| averageLoad | 无 | 集群平均负载 |
| clusterRequests | 无 | 集群请求数量 |

#### 指标集合:Rit


| 指标名称 | 指标单位 | 指标帮助描述 |
| --------------------- | ------ | ------------------- |
| ritCount | 无 | 当前的 RIT 数量 |
| ritCountOverThreshold | 无 | 超过阈值的 RIT 数量 |
| ritOldestAge | ms | 最老的RIT的持续时间 |

#### 指标集合:basic


| 指标名称 | 指标单位 | 指标帮助描述 |
| ----------------------- | ----- | ------------------------ |
| liveRegionServers | 无 | 当前活跃RegionServer列表 |
| deadRegionServers | 无 | 当前离线RegionServer列表 |
| zookeeperQuorum | 无 | Zookeeper列表 |
| masterHostName | 无 | Master节点 |
| BalancerCluster_num_ops | 无 | 集群负载均衡次数 |
| numActiveHandler | 无 | RPC句柄数 |
| receivedBytes | MB | 集群接收数据量 |
| sentBytes | MB | 集群发送数据量(MB) |
| clusterRequests | 无 | 集群总请求数量 |
265 changes: 265 additions & 0 deletions manager/src/main/resources/define/app-hbase_master.yml
@@ -0,0 +1,265 @@
# Licensed to the Apache Software Foundation (ASF) under one or more
# contributor license agreements. See the NOTICE file distributed with
# this work for additional information regarding copyright ownership.
# The ASF licenses this file to You under the Apache License, Version 2.0
# (the "License"); you may not use this file except in compliance with
# the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# The monitoring type category:service-application service monitoring db-database monitoring custom-custom monitoring os-operating system monitoring
category: bigdata
# The monitoring type eg: linux windows tomcat mysql aws...
app: hbase_master
# The monitoring i18n name
name:
  zh-CN: Apache Hbase Master
  en-US: Apache Hbase Master
# The description and help of this monitoring type
help:
  zh-CN: Hertzbeat 对 Hbase 数据库 Master 节点监控指标进行监控。<br>您可以点击 “<i>新建 Apache Hbase Master</i>” 并进行配置,或者选择“<i>更多操作</i>”,导入已有配置。
  en-US: Hertzbeat monitors the Master node monitoring indicators of the Hbase database. <br>You can click "<i>New Apache Hbase Master</i>" to configure, or select "<i>More Actions</i>" to import an existing configuration.
  zh-TW: Hertzbeat 對 Hbase 數據庫 Master 节點監控指標進行監控。<br>您可以點擊 “<i>新建 Apache Hbase Master</i>” 並進行配置,或者選擇“<i>更多操作</i>”,導入已有配置。

helpLink:
  zh-CN: https://hertzbeat.apache.org/zh-cn/docs/help/hbase_master/
  en-US: https://hertzbeat.apache.org/docs/help/hbase_master/
# Input params define for monitoring(render web ui by the definition)
params:
  # field-param field key
  - field: host
    # name-param field display i18n name
    name:
      zh-CN: 目标Host
      en-US: Target Host
    # type-param field type(most mapping the html input type)
    type: host
    # required-true or false
    required: true
  # field-param field key
  - field: port
    # name-param field display i18n name
    name:
      zh-CN: 端口
      en-US: Port
    # type-param field type(most mapping the html input type)
    type: number
    # when type is number, range is required
    range: '[0,65535]'
    # required-true or false
    required: true
    # default value
    defaultValue: 16010
  # field-param field key
  - field: timeout
    # name-param field display i18n name
    name:
      zh-CN: 查询超时时间
      en-US: Query Timeout
    # type-param field type(most mapping the html input type)
    type: number
    # required-true or false
    required: false
    # hide param-true or false
    hide: true
    # default value
    defaultValue: 6000
# collect metrics config list
metrics:
  # metrics - Server
  - name: Server
    # metrics scheduling priority(0->127)->(high->low), metrics with the same priority will be scheduled in parallel
    # priority 0's metrics is availability metrics, it will be scheduled first, only availability metrics collect success will the scheduling continue
    priority: 0
    # collect metrics content
    fields:
      # field-metric name, type-metric type(0-number,1-string), unit-metric unit('%','ms','MB'), label-whether it is a metrics label field
      - field: numRegionServers
        type: 0
        label: true
        i18n:
          zh-CN: 活跃RegionServer数量
          en-US: numRegionServers
      - field: numDeadRegionServers
        type: 0
        label: true
        i18n:
          zh-CN: 异常RegionServer数量
          en-US: numDeadRegionServers
      - field: averageLoad
        type: 0
        label: true
        i18n:
          zh-CN: 集群平均负载
          en-US: averageLoad
      - field: clusterRequests
        type: 0
        label: true
        i18n:
          zh-CN: 集群请求数量
          en-US: clusterRequests
    # (optional)metrics field alias name, it is used as an alias field to map and convert the collected data and metrics field
    aliasFields:
      - $.numRegionServers
      - $.numDeadRegionServers
      - $.averageLoad
      - $.clusterRequests
    calculates:
      - numRegionServers=$.numRegionServers
      - numDeadRegionServers=$.numDeadRegionServers
      - averageLoad=$.averageLoad
      - clusterRequests=$.clusterRequests
    protocol: http
    http:
      host: ^_^host^_^
      port: ^_^port^_^
      url: /jmx
      method: GET
      ssl: ^_^ssl^_^
      parseType: jsonPath
      parseScript: '$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")]'
  - name: Rit
    # metrics scheduling priority(0->127)->(high->low), metrics with the same priority will be scheduled in parallel
    # priority 0's metrics is availability metrics, it will be scheduled first, only availability metrics collect success will the scheduling continue
    priority: 0
    # collect metrics content
    fields:
      # field-metric name, type-metric type(0-number,1-string), unit-metric unit('%','ms','MB'), label-whether it is a metrics label field
      - field: ritCount
        type: 0
        label: true
        i18n:
          zh-CN: 当前的 RIT 数量
          en-US: ritCount
      - field: ritCountOverThreshold
        type: 0
        label: true
        i18n:
          zh-CN: 超过阈值的 RIT 数量
          en-US: ritCountOverThreshold
      - field: ritOldestAge
        type: 0
        label: true
        i18n:
          zh-CN: 最老的RIT的持续时间
          en-US: ritOldestAge
    # (optional)metrics field alias name, it is used as an alias field to map and convert the collected data and metrics field
    aliasFields:
      - $.ritCount
      - $.ritCountOverThreshold
      - $.ritOldestAge
    calculates:
      - ritCount=$.ritCount
      - ritCountOverThreshold=$.ritCountOverThreshold
      - ritOldestAge=$.ritOldestAge
    protocol: http
    http:
      host: ^_^host^_^
      port: ^_^port^_^
      url: /jmx
      method: GET
      ssl: ^_^ssl^_^
      parseType: jsonPath
      parseScript: '$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=AssignmentManager")]'
  - name: basic
    # metrics scheduling priority(0->127)->(high->low), metrics with the same priority will be scheduled in parallel
    # priority 0's metrics is availability metrics, it will be scheduled first, only availability metrics collect success will the scheduling continue
    priority: 0
    # collect metrics content
    fields:
      # field-metric name, type-metric type(0-number,1-string), unit-metric unit('%','ms','MB'), label-whether it is a metrics label field
      - field: liveRegionServers
        type: 1
        label: true
        i18n:
          zh-CN: 当前活跃RegionServer列表
          en-US: liveRegionServers
      - field: deadRegionServers
        type: 1
        label: true
        i18n:
          zh-CN: 当前离线RegionServer列表
          en-US: deadRegionServers
      - field: zookeeperQuorum
        type: 1
        label: true
        i18n:
          zh-CN: Zookeeper列表
          en-US: zookeeperQuorum
      - field: masterHostName
        type: 1
        label: true
        i18n:
          zh-CN: Master节点
          en-US: masterHostName
      - field: BalancerCluster_num_ops
        type: 0
        label: true
        i18n:
          zh-CN: 集群负载均衡次数
          en-US: BalancerCluster_num_ops
      - field: numActiveHandler
        type: 0
        label: true
        i18n:
          zh-CN: RPC句柄数
          en-US: numActiveHandler
      - field: receivedBytes
        type: 0
        label: true
        unit: 'MB'
        i18n:
          zh-CN: 集群接收数据量(MB)
          en-US: receivedBytes
      - field: sentBytes
        type: 0
        label: true
        unit: 'MB'
        i18n:
          zh-CN: 集群发送数据量(MB)
          en-US: sentBytes
      - field: clusterRequests
        type: 0
        label: true
        i18n:
          zh-CN: 集群总请求数量
          en-US: clusterRequests
    # (optional)metrics field alias name, it is used as an alias field to map and convert the collected data and metrics field
    aliasFields:
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.liveRegionServers']
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.deadRegionServers']
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.zookeeperQuorum']
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.Hostname']
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Balancer")].BalancerCluster_num_ops
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].numActiveHandler
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].receivedBytes
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].sentBytes
      - $.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].clusterRequests
    calculates:
      - liveRegionServers=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.liveRegionServers']
      - deadRegionServers=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.deadRegionServers']
      - zookeeperQuorum=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.zookeeperQuorum']
      - masterHostName=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].['tag.Hostname']
      - BalancerCluster_num_ops=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Balancer")].BalancerCluster_num_ops
      - numActiveHandler=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].numActiveHandler
      - receivedBytes=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].receivedBytes
      - sentBytes=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=IPC")].sentBytes
      - clusterRequests=$.beans[?(@.name == "Hadoop:service=HBase,name=Master,sub=Server")].clusterRequests
    units:
      - receivedBytes=B->MB
      - sentBytes=B->MB
    protocol: http
    http:
      host: ^_^host^_^
      port: ^_^port^_^
      url: /jmx
      method: GET
      ssl: ^_^ssl^_^
      parseType: jsonPath
      parseScript: '$'