[feature] add apache yarn monitor (#1937)
Co-authored-by: zhangshenghang <shenghang.zhang@avrisdigital.com> Co-authored-by: zhangshenghang <admin@hadoop.wiki> Co-authored-by: crossoverJie <crossoverJie@gmail.com> Co-authored-by: yqxxgh <42080876+yqxxgh@users.noreply.github.com> Co-authored-by: Ceilzcx <48920254+Ceilzcx@users.noreply.github.com> Co-authored-by: aias00 <rokkki@163.com> Co-authored-by: tomsun28 <tomsun28@outlook.com> Co-authored-by: Logic <zqr10159@dromara.org>
1 parent b2acd53, commit afec4bf
Showing 4 changed files with 636 additions and 0 deletions.
@@ -0,0 +1,83 @@
---
id: yarn
title: Monitoring Apache Yarn
sidebar_label: Apache Yarn
keywords: [Big Data Monitoring System, Apache Yarn Monitoring, ResourceManager Monitoring]
---

> HertzBeat collects and monitors the metrics of Apache Yarn nodes.
**Protocol Used: HTTP**

## Pre-monitoring Actions

Find the HTTP port of the Apache Yarn ResourceManager web UI. It is the port part of the `yarn.resourcemanager.webapp.address` property.

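
As a minimal sketch of the step above, the port can be read from a `yarn-site.xml` file. The sample XML, the host name `rm1.example.com`, and the helper name `webapp_port` are assumptions made for illustration; they are not part of HertzBeat or Hadoop.

```python
import xml.etree.ElementTree as ET

DEFAULT_PORT = 8088  # documented default of the ResourceManager web UI

def webapp_port(yarn_site_xml: str) -> int:
    """Return the port from yarn.resourcemanager.webapp.address, or the default."""
    root = ET.fromstring(yarn_site_xml)
    for prop in root.iter("property"):
        if prop.findtext("name", "").strip() == "yarn.resourcemanager.webapp.address":
            value = prop.findtext("value", "").strip()
            # The value has the form host:port; keep what follows the last colon.
            _, sep, port = value.rpartition(":")
            if sep and port.isdigit():
                return int(port)
    return DEFAULT_PORT

# Hypothetical configuration used only to exercise the helper.
SAMPLE = """<configuration>
  <property>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>rm1.example.com:8188</value>
  </property>
</configuration>"""

print(webapp_port(SAMPLE))
```

If the property is not set, the helper falls back to 8088, which matches the default port used in the monitor configuration.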
## Configuration Parameters

| Parameter Name   | Parameter Description |
| ---------------- | --------------------- |
| Target Host      | IPv4 address, IPv6 address, or domain name of the monitored endpoint, without a protocol prefix such as `http://`. |
| Port             | HTTP port of the Apache Yarn ResourceManager web UI. Default: 8088. |
| Query Timeout    | Timeout for queries to Apache Yarn, in milliseconds. Default: 6000. |
| Metrics Interval | Interval between metric collections, in seconds. Minimum: 30. |

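
The sketch below shows how these parameters could map onto a plain HTTP request. It assumes the metrics are read from the ResourceManager's `/jmx` JSON endpoint, which Hadoop daemons expose on their web port; whether HertzBeat uses exactly this path is an assumption here, and `jmx_url` and `fetch_beans` are hypothetical helper names.

```python
import ipaddress
import json
import urllib.request

def jmx_url(host: str, port: int = 8088) -> str:
    """Build the /jmx URL; the host is given without a protocol prefix."""
    try:
        if ipaddress.ip_address(host).version == 6:
            host = f"[{host}]"  # IPv6 literals need brackets inside a URL
    except ValueError:
        pass  # not an IP literal, so treat it as a domain name
    return f"http://{host}:{port}/jmx"

def fetch_beans(host: str, port: int = 8088, timeout_ms: int = 6000) -> list:
    """Fetch and decode the JMX beans; urllib expects the timeout in seconds."""
    url = jmx_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout_ms / 1000) as resp:
        return json.load(resp)["beans"]

print(jmx_url("rm1.example.com"))
print(jmx_url("::1", 8188))
```

Note the unit conversion: the Query Timeout is configured in milliseconds, while `urllib.request.urlopen` takes seconds.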
### Collected Metrics

#### Metric Set: ClusterMetrics

| Metric Name           | Unit | Metric Description |
| --------------------- | ---- | ------------------ |
| NumActiveNMs          |      | Number of currently active NodeManagers |
| NumDecommissionedNMs  |      | Number of decommissioned NodeManagers |
| NumDecommissioningNMs |      | Number of NodeManagers currently being decommissioned |
| NumLostNMs            |      | Number of lost nodes in the cluster |
| NumUnhealthyNMs       |      | Number of unhealthy nodes in the cluster |

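
To illustrate how these counters might be consumed, the following sketch extracts them from a JMX-style JSON payload and derives a simple health ratio. The sample payload, the bean name suffix `name=ClusterMetrics`, and the `degraded_ratio` formula are assumptions for illustration, not HertzBeat's own logic.

```python
import json

FIELDS = (
    "NumActiveNMs",
    "NumDecommissionedNMs",
    "NumDecommissioningNMs",
    "NumLostNMs",
    "NumUnhealthyNMs",
)

def cluster_metrics(beans: list) -> dict:
    """Pick the ClusterMetrics bean and keep only the documented fields."""
    for bean in beans:
        if bean.get("name", "").endswith("name=ClusterMetrics"):
            return {field: bean[field] for field in FIELDS}
    raise LookupError("ClusterMetrics bean not found")

def degraded_ratio(metrics: dict) -> float:
    """Share of lost or unhealthy nodes among active, lost, and unhealthy nodes."""
    degraded = metrics["NumLostNMs"] + metrics["NumUnhealthyNMs"]
    total = metrics["NumActiveNMs"] + degraded
    return degraded / total if total else 0.0

# Hypothetical payload shaped like a /jmx response.
SAMPLE = json.loads("""
{"beans": [{"name": "Hadoop:service=ResourceManager,name=ClusterMetrics",
            "NumActiveNMs": 7, "NumDecommissionedNMs": 1,
            "NumDecommissioningNMs": 0, "NumLostNMs": 1,
            "NumUnhealthyNMs": 2}]}
""")

metrics = cluster_metrics(SAMPLE["beans"])
print(metrics["NumActiveNMs"], degraded_ratio(metrics))
```

A ratio like this could back an alert threshold; the choice of denominator is a design decision and may need adjusting if active nodes are counted differently in a given cluster.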
#### Metric Set: JvmMetrics

| Metric Name          | Unit | Metric Description |
| -------------------- | ---- | ------------------ |
| MemNonHeapCommittedM | MB   | Non-heap memory currently committed by the JVM |
| MemNonHeapMaxM       | MB   | Maximum non-heap memory available to the JVM |
| MemNonHeapUsedM      | MB   | Non-heap memory currently used by the JVM |
| MemHeapCommittedM    | MB   | Heap memory currently committed by the JVM |
| MemHeapMaxM          | MB   | Maximum heap memory available to the JVM |
| MemHeapUsedM         | MB   | Heap memory currently used by the JVM |
| GcTimeMillis         | ms   | Total JVM GC time |
| GcCount              |      | Total number of JVM GC runs |

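
The memory gauges above are easier to alert on as a percentage. The sketch below is one way to derive it; `usage_percent` is a hypothetical helper, and treating a non-positive maximum as "undefined" is an assumption made to avoid dividing by a meaningless value.

```python
from typing import Optional

def usage_percent(used_mb: float, max_mb: float) -> Optional[float]:
    """Return used/max as a percentage, or None when the maximum is undefined."""
    if max_mb is None or max_mb <= 0:
        return None  # assumed convention: a non-positive maximum means no limit is reported
    return round(100.0 * used_mb / max_mb, 1)

# Hypothetical readings for MemHeapUsedM / MemHeapMaxM.
print(usage_percent(512.0, 2048.0))
# Hypothetical readings for MemNonHeapUsedM / MemNonHeapMaxM without a limit.
print(usage_percent(90.5, -1))
```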
#### Metric Set: QueueMetrics

| Metric Name                  | Unit | Metric Description |
| ---------------------------- | ---- | ------------------ |
| queue                        |      | Queue name |
| AllocatedVCores              |      | Number of allocated virtual cores |
| ReservedVCores               |      | Number of reserved virtual cores |
| AvailableVCores              |      | Number of available (unallocated) virtual cores |
| PendingVCores                |      | Number of virtual cores requested but not yet scheduled |
| AllocatedMB                  | MB   | Allocated (in-use) memory |
| AvailableMB                  | MB   | Available (unallocated) memory |
| PendingMB                    | MB   | Memory requested but not yet scheduled |
| ReservedMB                   | MB   | Reserved memory |
| AllocatedContainers          |      | Number of allocated (in-use) containers |
| PendingContainers            |      | Number of containers requested but not yet scheduled |
| ReservedContainers           |      | Number of reserved containers |
| AggregateContainersAllocated |      | Cumulative number of containers allocated |
| AggregateContainersReleased  |      | Cumulative number of containers released |
| AppsCompleted                |      | Number of completed applications |
| AppsKilled                   |      | Number of killed applications |
| AppsFailed                   |      | Number of failed applications |
| AppsPending                  |      | Number of pending applications |
| AppsRunning                  |      | Number of currently running applications |
| AppsSubmitted                |      | Number of submitted applications |
| running_0                    |      | Number of applications running for less than 60 minutes |
| running_60                   |      | Number of applications running between 60 and 300 minutes |
| running_300                  |      | Number of applications running between 300 and 1440 minutes |
| running_1440                 |      | Number of applications running for more than 1440 minutes |

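
As a rough illustration of how the queue gauges relate to each other, the sketch below summarizes one queue. The sample values and the `queue_summary` helper are hypothetical, and treating `AllocatedMB + AvailableMB` as the queue's memory capacity is an assumption that ignores reserved memory.

```python
RUNNING_BUCKETS = ("running_0", "running_60", "running_300", "running_1440")

def queue_summary(q: dict) -> dict:
    """Derive memory utilization and a long-running application count for a queue."""
    capacity_mb = q["AllocatedMB"] + q["AvailableMB"]  # assumed capacity
    used_pct = round(100.0 * q["AllocatedMB"] / capacity_mb, 1) if capacity_mb else 0.0
    return {
        "queue": q["queue"],
        "memory_used_pct": used_pct,
        # Applications that have been running for more than 1440 minutes (one day).
        "long_running": q["running_1440"],
        # The four elapsed-time buckets are expected to add up to AppsRunning.
        "buckets_match": sum(q[b] for b in RUNNING_BUCKETS) == q["AppsRunning"],
    }

# Hypothetical reading for the root queue.
SAMPLE_QUEUE = {
    "queue": "root", "AllocatedMB": 6144, "AvailableMB": 2048,
    "AppsRunning": 5, "running_0": 3, "running_60": 1,
    "running_300": 0, "running_1440": 1,
}

summary = queue_summary(SAMPLE_QUEUE)
print(summary)
```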
#### Metric Set: runtime

| Metric Name | Unit | Metric Description |
| ----------- | ---- | ------------------ |
| StartTime   |      | Startup timestamp |
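
A startup timestamp is usually turned into an uptime before it is displayed. The sketch below assumes `StartTime` is expressed in milliseconds since the Unix epoch, as the JVM's Runtime MXBean reports it; the source does not state the unit, and both helper names are hypothetical.

```python
from datetime import datetime, timezone

def start_time_utc(start_ms: int) -> datetime:
    """Convert an epoch-millisecond timestamp to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)

def uptime_seconds(start_ms: int, now_ms: int) -> int:
    """Whole seconds elapsed since startup; never negative."""
    return max(0, (now_ms - start_ms) // 1000)

# Hypothetical timestamps, 90.5 seconds apart.
print(start_time_utc(1_700_000_000_000).isoformat())
print(uptime_seconds(1_700_000_000_000, 1_700_000_090_500))
```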
home/i18n/zh-cn/docusaurus-plugin-content-docs/current/help/yarn.md (83 changes: 83 additions & 0 deletions)
@@ -0,0 +1,83 @@
---
id: yarn
title: Monitoring Apache Yarn
sidebar_label: Apache Yarn
keywords: [Big Data Monitoring System, Apache Yarn Monitoring, ResourceManager Monitoring]
---

> HertzBeat monitors the node metrics of Apache Yarn.
**Protocol Used: HTTP**

## Pre-monitoring Actions

Obtain the HTTP monitoring port of Apache Yarn. It is taken from the `yarn.resourcemanager.webapp.address` property.

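## Configuration Parameters

| Parameter Name   | Parameter Description |
| ---------------- | --------------------- |
| Target Host      | IPv4 address, IPv6 address, or domain name of the monitored peer, without a protocol prefix. |
| Port             | Monitoring port of Apache Yarn. Default: 8088. |
| Query Timeout    | Timeout for queries to Apache Yarn, in milliseconds. Default: 6000. |
| Metrics Interval | Interval between metric collections, in seconds. Minimum: 30. |
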
### Collected Metrics

#### Metric Set: ClusterMetrics

| Metric Name           | Metric Unit | Metric Description |
| --------------------- | ----------- | ------------------ |
| NumActiveNMs          |             | Number of currently live NodeManagers |
| NumDecommissionedNMs  |             | Number of currently decommissioned NodeManagers |
| NumDecommissioningNMs |             | Number of nodes being taken offline in the cluster |
| NumLostNMs            |             | Number of lost nodes in the cluster |
| NumUnhealthyNMs       |             | Number of unhealthy nodes in the cluster |

#### Metric Set: JvmMetrics

| Metric Name          | Metric Unit | Metric Description |
| -------------------- | ----------- | ------------------ |
| MemNonHeapCommittedM | MB          | Non-heap memory currently committed by the JVM |
| MemNonHeapMaxM       | MB          | Maximum non-heap memory available to the JVM |
| MemNonHeapUsedM      | MB          | Non-heap memory currently used by the JVM |
| MemHeapCommittedM    | MB          | Heap memory currently committed by the JVM |
| MemHeapMaxM          | MB          | Maximum heap memory available to the JVM |
| MemHeapUsedM         | MB          | Heap memory currently used by the JVM |
| GcTimeMillis         |             | JVM GC time |
| GcCount              |             | Number of JVM GC runs |

#### Metric Set: QueueMetrics

| Metric Name                  | Metric Unit | Metric Description |
| ---------------------------- | ----------- | ------------------ |
| queue                        |             | Queue name |
| AllocatedVCores              |             | Number of allocated virtual cores |
| ReservedVCores               |             | Number of reserved cores |
| AvailableVCores              |             | Number of available (not yet allocated) cores |
| PendingVCores                |             | Number of cores waiting to be scheduled |
| AllocatedMB                  | MB          | Allocated (in-use) memory |
| AvailableMB                  | MB          | Available (not yet allocated) memory |
| PendingMB                    | MB          | Memory waiting to be scheduled |
| ReservedMB                   | MB          | Reserved memory |
| AllocatedContainers          |             | Number of allocated (in-use) containers |
| PendingContainers            |             | Number of containers waiting to be scheduled |
| ReservedContainers           |             | Number of reserved containers |
| AggregateContainersAllocated |             | Cumulative number of containers allocated |
| AggregateContainersReleased  |             | Cumulative number of containers released |
| AppsCompleted                |             | Number of completed applications |
| AppsKilled                   |             | Number of killed applications |
| AppsFailed                   |             | Number of failed applications |
| AppsPending                  |             | Number of pending applications |
| AppsRunning                  |             | Number of currently running applications |
| AppsSubmitted                |             | Number of applications submitted so far |
| running_0                    |             | Number of jobs running for less than 60 minutes |
| running_60                   |             | Number of jobs running between 60 and 300 minutes |
| running_300                  |             | Number of jobs running between 300 and 1440 minutes |
| running_1440                 |             | Number of jobs running for more than 1440 minutes |

#### Metric Set: runtime

| Metric Name | Metric Unit | Metric Description |
| ----------- | ----------- | ------------------ |
| StartTime   |             | Startup timestamp |