[manager,collector] support docker metrics monitor (#438)
  [script] bugfix: create the logs dir if it does not exist (#409)

  [manager,collector] support dm database monitor (#410)

  [manager] complete the TagControllerTest test class (#386)

  [collector] bugfix: Fixed expression:(A | B), A and B are null, print the error log (#387)

  [home] update constants.js jpom link (#388)

  [doc] update commit user (#389)

  [web-app] notice monitors.detail.time-series.unavailable (#390)

  [home] note that jvm code_cache only supports jdk8 (#391)

  [home] support time-series db dependency iotdb deploy doc (#392)

  [home] support time-series db dependency iotdb deploy doc

  [warehouse] support iotdb connection available check when init

  [manager] add app service impl unit test (#393)

  [docs] update hertzbeat arch pic (#394)

  [home] update constants.js northstar (#395)

  [home] update constants.js sa-token url (#396)

  [docs] add hertzbeat roadmap  (#397)

  [docs] add roadmap

  Add @click33 as a contributor

  Add @bwcx-jzy as a contributor

  Add @kevinhuangwl as a contributor

  Add @TJxiaobao as a contributor

  Update @TJxiaobao as a contributor

  [docs] fix typos (#398)

Co-authored-by: 高兴存 <gxc01514416@alibaba-inc.com>

  DM DB monitoring

  Gradual improvement of DM monitoring

  [hertzbeat] update dm collect

Co-authored-by: zcx <48920254+Ceilzcx@users.noreply.github.com>
Co-authored-by: 蒋小小 <bwcx_jzy@163.com>
Co-authored-by: tomsun28 <tomsun28@outlook.com>
Co-authored-by: Kevin Huang <12959229@qq.com>
Co-authored-by: click33 <36243476+click33@users.noreply.github.com>
Co-authored-by: 高兴存 <gxc01514416@alibaba-inc.com>

  [home] add DM db document supplement (#411)

  DM Document Supplement

  [home] support dm help i18n

Co-authored-by: 高兴存 <gxc01514416@alibaba-inc.com>
Co-authored-by: tomsun28 <tomsun28@outlook.com>

  [warehouse] bugfix: RealTimeRedisDataStorage extends the wrong class (#413)

  [collector] close the dataSet when the query ends (#414)

  [alerter] bugfix: monitor status does not change when an alert fires (#415)

  [home] support algolia search (#416)

  Test Docker

  [manager] update docker

  Supplement to Docker monitoring documents

  [home] update doc

Co-authored-by: zcx <48920254+Ceilzcx@users.noreply.github.com>
Co-authored-by: 蒋小小 <bwcx_jzy@163.com>
Co-authored-by: tomsun28 <tomsun28@outlook.com>
Co-authored-by: Kevin Huang <12959229@qq.com>
Co-authored-by: click33 <36243476+click33@users.noreply.github.com>
Co-authored-by: 高兴存 <gxc01514416@alibaba-inc.com>
7 people authored Nov 13, 2022
1 parent b70dece commit 61f8edf
Showing 8 changed files with 505 additions and 17 deletions.
@@ -41,6 +41,7 @@
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;

/**
* Indicator group collection task and response data scheduler
@@ -189,7 +190,7 @@ public void dispatchCollectData(Timeout timeout, Metrics metrics, CollectRep.Met
metricsTimeoutMonitorMap.remove(job.getId() + "-" + metrics.getName() + "-sub-" + metrics.getSubTaskId());
boolean isLastTask = metrics.consumeSubTaskResponse(metricsData);
if (isLastTask) {
-metricsData = metrics.getSubTaskDataTmp();
+metricsData = metrics.getSubTaskDataRef().get();
} else {
return;
}
@@ -236,20 +237,21 @@ public void dispatchCollectData(Timeout timeout, Metrics metrics, CollectRep.Met
// use pre collect metrics data to replace next metrics config params
List<Map<String, Configmap>> configmapList = getConfigmapFromPreCollectData(metricsData);
metricsSet.forEach(metricItem -> {
-JsonElement jsonElement = GSON.toJsonTree(metricItem);
-if (configmapList != null && !configmapList.isEmpty() && CollectUtil.containCryPlaceholder(jsonElement)) {
+if (configmapList != null && !configmapList.isEmpty() && CollectUtil.containCryPlaceholder(GSON.toJsonTree(metricItem))) {
AtomicInteger subTaskNum = new AtomicInteger(configmapList.size());
+AtomicReference<CollectRep.MetricsData> metricsDataReference = new AtomicReference<>();
for (int index = 0; index < configmapList.size(); index ++) {
Map<String, Configmap> configmap = configmapList.get(index);
-jsonElement = GSON.toJsonTree(metricItem);
-CollectUtil.replaceCryPlaceholder(jsonElement, configmap);
-metricItem = GSON.fromJson(jsonElement, Metrics.class);
-metricItem.setSubTaskNum(subTaskNum);
-metricItem.setSubTaskId(index);
-MetricsCollect metricsCollect = new MetricsCollect(metricItem, timeout, this, unitConvertList);
+JsonElement metricJson = GSON.toJsonTree(metricItem);
+CollectUtil.replaceCryPlaceholder(metricJson, configmap);
+Metrics metric = GSON.fromJson(metricJson, Metrics.class);
+metric.setSubTaskNum(subTaskNum);
+metric.setSubTaskId(index);
+metric.setSubTaskDataRef(metricsDataReference);
+MetricsCollect metricsCollect = new MetricsCollect(metric, timeout, this, unitConvertList);
jobRequestQueue.addJob(metricsCollect);
-metricsTimeoutMonitorMap.put(job.getId() + "-" + metricItem.getName() + "-sub-" + index,
-new MetricsTime(System.currentTimeMillis(), metricItem, timeout));
+metricsTimeoutMonitorMap.put(job.getId() + "-" + metric.getName() + "-sub-" + index,
+new MetricsTime(System.currentTimeMillis(), metric, timeout));
}
} else {
MetricsCollect metricsCollect = new MetricsCollect(metricItem, timeout, this, unitConvertList);
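The dispatcher hunk above turns one templated metric into several sub-tasks. After this change, every per-sub-task copy gets the same `AtomicInteger` counter and the same `AtomicReference`. Responses therefore land in one shared accumulator instead of a field that belongs to each copy. The loop also stops reassigning `metricItem`. In the old version, every iteration after the first serialized the previous, already-substituted copy, so later configmaps had no placeholders left to fill.

The sketch below shows that fan-out under several assumptions:
- `MetricTask` is a hypothetical stand-in for `Metrics`.
- The `{key}` placeholder syntax is hypothetical and is not HertzBeat's real placeholder format.
- A plain constructor replaces the GSON round trip the commit uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical stand-in for Metrics: only the fields that matter for sub-task dispatch.
final class MetricTask {
    final String name;
    final String target; // may contain hypothetical {key} placeholders
    AtomicInteger subTaskNum;
    int subTaskId;
    AtomicReference<List<String>> subTaskDataRef;

    MetricTask(String name, String target) {
        this.name = name;
        this.target = target;
    }
}

final class SubTaskFanOut {
    /**
     * Builds one copy of the template per configmap. All copies share one countdown
     * and one result reference, mirroring the fix in the hunk above.
     */
    static List<MetricTask> fanOut(String jobId, MetricTask template,
                                   List<Map<String, String>> configmaps,
                                   Map<String, Long> timeoutMonitor) {
        AtomicInteger subTaskNum = new AtomicInteger(configmaps.size());
        AtomicReference<List<String>> sharedData = new AtomicReference<>();
        List<MetricTask> tasks = new ArrayList<>();
        for (int index = 0; index < configmaps.size(); index++) {
            // Always substitute into the untouched template, never into the previous copy.
            String target = template.target;
            for (Map.Entry<String, String> entry : configmaps.get(index).entrySet()) {
                target = target.replace("{" + entry.getKey() + "}", entry.getValue());
            }
            MetricTask task = new MetricTask(template.name, target);
            task.subTaskNum = subTaskNum;    // shared countdown
            task.subTaskId = index;
            task.subTaskDataRef = sharedData; // shared accumulator
            tasks.add(task);
            // Same key shape as the dispatcher: jobId-metricsName-sub-index
            timeoutMonitor.put(jobId + "-" + task.name + "-sub-" + index, System.currentTimeMillis());
        }
        return tasks;
    }
}
```

The property that matters is reference identity: all three copies returned for three configmaps point at the same counter and the same `AtomicReference`.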
13 changes: 7 additions & 6 deletions common/src/main/java/com/usthe/common/entity/job/Metrics.java
@@ -31,6 +31,7 @@
import java.util.List;
import java.util.Objects;
import java.util.concurrent.atomic.AtomicInteger;
+import java.util.concurrent.atomic.AtomicReference;

/**
* Details of the collection of indicators collected by monitoring
@@ -165,7 +166,7 @@ public class Metrics {
* collector use - Temporarily store sub-task metrics response data
*/
@JsonIgnore
-private transient CollectRep.MetricsData subTaskDataTmp;
+private transient AtomicReference<CollectRep.MetricsData> subTaskDataRef;

/**
* collector use - Temporarily store subTask running num
@@ -200,19 +201,19 @@ public boolean consumeSubTaskResponse(CollectRep.MetricsData metricsData) {
}
synchronized (subTaskNum) {
int index = subTaskNum.decrementAndGet();
-if (subTaskDataTmp == null) {
-subTaskDataTmp = metricsData;
+if (subTaskDataRef.get() == null) {
+subTaskDataRef.set(metricsData);
} else {
-if (metricsData.getValuesCount() > 1) {
-CollectRep.MetricsData.Builder dataBuilder = CollectRep.MetricsData.newBuilder(subTaskDataTmp);
+if (metricsData.getValuesCount() >= 1) {
+CollectRep.MetricsData.Builder dataBuilder = CollectRep.MetricsData.newBuilder(subTaskDataRef.get());
for (CollectRep.ValueRow valueRow : metricsData.getValuesList()) {
if (valueRow.getColumnsCount() == dataBuilder.getFieldsCount()) {
dataBuilder.addValues(valueRow);
} else {
log.error("consume subTask data value not mapping filed");
}
}
-subTaskDataTmp = dataBuilder.build();
+subTaskDataRef.set(dataBuilder.build());
}
}
return index == 0;
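In `consumeSubTaskResponse`, each sub-task response does three things:
- It decrements the shared counter.
- It merges its rows into the shared reference.
- If it brings the counter to zero, it reports itself as the last response.

The hunk also changes `getValuesCount() > 1` to `>= 1`, so a sub-task that returns exactly one row is no longer silently dropped.

The sketch below is a simplified version of that fan-in. It assumes rows are plain `List<String>` values rather than the protobuf `CollectRep.MetricsData` the real code uses, and `SubTaskFanIn` is a hypothetical name. Unlike the hunk, it also applies the column-count check to the first response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical simplification of Metrics.consumeSubTaskResponse.
final class SubTaskFanIn {
    private final AtomicInteger subTaskNum;                           // shared by all sub-tasks
    private final AtomicReference<List<List<String>>> subTaskDataRef; // shared accumulator
    private final int fieldCount;

    SubTaskFanIn(AtomicInteger subTaskNum,
                 AtomicReference<List<List<String>>> subTaskDataRef,
                 int fieldCount) {
        this.subTaskNum = subTaskNum;
        this.subTaskDataRef = subTaskDataRef;
        this.fieldCount = fieldCount;
    }

    /** Merges one sub-task's rows; returns true only for the last response. */
    boolean consume(List<List<String>> rows) {
        // Lock on the shared counter so the decrement and the merge happen as one step.
        synchronized (subTaskNum) {
            int remaining = subTaskNum.decrementAndGet();
            List<List<String>> merged = new ArrayList<>();
            if (subTaskDataRef.get() != null) {
                merged.addAll(subTaskDataRef.get());
            }
            for (List<String> row : rows) { // a single-row response is merged too
                if (row.size() == fieldCount) {
                    merged.add(row);
                } else {
                    System.err.println("sub-task row does not match field count, skipped");
                }
            }
            subTaskDataRef.set(merged);
            return remaining == 0;
        }
    }
}
```

Each sub-task builds its own `SubTaskFanIn` around the same counter and reference. The last caller can then read the complete result from the shared reference, much as the dispatcher reads `metrics.getSubTaskDataRef().get()`.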
106 changes: 106 additions & 0 deletions home/docs/help/docker.md
@@ -0,0 +1,106 @@
---
id: docker
title: "Monitor: Docker Monitor"
sidebar_label: Docker Container Monitor

---

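> Collect and monitor general performance metrics of Docker containers.

## Pre-monitoring setup

To monitor container information in `Docker`, follow the steps below to open the Docker API port so that collection requests can retrieve the corresponding information.

**1. Edit the docker.service file:**

```shell
vi /usr/lib/systemd/system/docker.service
```

Find the **[Service]** section and modify the ExecStart property by appending `-H tcp://0.0.0.0:2375`:

```shell
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock -H tcp://0.0.0.0:2375
```

This exposes port **2375** to the outside; you can change it to another port to suit your environment.

**2. Reload the Docker configuration so the change takes effect:**

```shell
systemctl daemon-reload
systemctl restart docker
```

**Note: remember to open port `2375` in the server console.**

**3. If the above method does not work:**

Open port `2375` in the server's local firewall (firewalld):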

```shell
firewall-cmd --zone=public --add-port=2375/tcp --permanent
firewall-cmd --reload
```





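### Configuration parameters

| Parameter name | Parameter help description |
| ------------ | ------------------------------------------------------------ |
| Monitor Host | IPv4 address, IPv6 address or domain name of the monitored host. Note ⚠️: no protocol header (e.g. https://, http://). |
| Monitor Name | Name that identifies this monitor; it must be unique. |
| Port | Port on which the Docker API is exposed, default 2375. |
| Query Timeout | Timeout for requests to the Docker server API, in milliseconds (ms), default 3000 ms. |
| Container Name | Usually set to monitor information about all running containers. |
| Username | Connection username, optional. |
| Password | Connection password, optional. |
| URL | Database connection URL, optional. If configured, the database name, username, password and other parameters in the URL override the parameters configured above. |
| Collection Interval | Interval at which the monitor periodically collects data, in seconds; the minimum interval is 10 seconds. |
| Whether to detect | Whether to probe the monitor's availability before adding it; the add or modify operation proceeds only if the probe succeeds. |
| Description Remarks | Additional notes that identify and describe this monitor. |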
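Once port 2375 is reachable, the host, port and query-timeout parameters above are enough to check the Docker Engine API by hand before adding the monitor. The following Java 11+ sketch shows one way to do that. It is not HertzBeat's own probe logic, and `DockerApiProbe` is a hypothetical name.

The sketch calls the Engine API's `GET /_ping` endpoint, which answers `OK` when the daemon is up. The host is entered without a protocol header and may be an IPv6 literal. For that reason the URI is built with the multi-argument `java.net.URI` constructor, which adds the square brackets a bare IPv6 host needs.

```java
import java.io.IOException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper, not part of HertzBeat: probes a Docker Engine API endpoint.
final class DockerApiProbe {

    /** Builds http://host:port/path; a bare IPv6 literal is wrapped in [ ] by URI. */
    static URI apiUri(String host, int port, String path) {
        try {
            return new URI("http", null, host, port, path, null, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid Docker API address: " + host, e);
        }
    }

    /** Returns true if GET /_ping answers 200 "OK" within timeoutMs. */
    static boolean ping(String host, int port, long timeoutMs) {
        Duration timeout = Duration.ofMillis(timeoutMs);
        HttpClient client = HttpClient.newBuilder().connectTimeout(timeout).build();
        try {
            HttpRequest request = HttpRequest.newBuilder(apiUri(host, port, "/_ping"))
                    .timeout(timeout)
                    .GET()
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200 && "OK".equals(response.body().trim());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // keep the interrupt visible to callers
            return false;
        } catch (IOException | IllegalArgumentException e) {
            return false; // refused, timed out, or malformed address
        }
    }
}
```

For example, `DockerApiProbe.ping("192.168.1.10", 2375, 3000)` uses the default port and the 3000 ms default query timeout. The address is only an illustration.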

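### Collected metrics

#### Metric set: system

| Metric name | Metric unit | Metric help description |
| ------------------ | -------- | -------------------------------------- |
| Name | none | Server name |
| version | none | Docker version number |
| os | none | Server operating system and architecture, e.g. linux x86_64 |
| root_dir | none | Docker root directory, e.g. /var/lib/docker |
| containers | none | Total number of containers (running + not running) |
| containers_running | none | Number of running containers |
| containers_paused | none | Number of paused containers |
| images | none | Total number of container images |
| ncpu | none | Number of CPUs |
| mem_total | MB | Total memory size |
| system_time | none | System time |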
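Docker's `GET /info` endpoint reports memory as `MemTotal` in bytes. It reports `Containers`, `ContainersRunning` and `ContainersPaused` separately, and also `ContainersStopped`.

The sketch below shows two pieces of arithmetic: converting `mem_total` to MB, and deriving the stopped count from the other totals. It assumes binary megabytes (1 MB = 1024 × 1024 bytes), which this page does not state, and `DockerInfoMath` is a hypothetical name.

```java
// Hypothetical helpers; assumes 1 MB = 1024 * 1024 bytes.
final class DockerInfoMath {
    private static final double BYTES_PER_MB = 1024.0 * 1024.0;

    /** Converts a byte count such as /info MemTotal to MB, rounded to two decimals. */
    static double bytesToMb(long bytes) {
        return Math.round(bytes / BYTES_PER_MB * 100.0) / 100.0;
    }

    /** containers = running + paused + stopped, so stopped is the remainder. */
    static int stoppedContainers(int total, int running, int paused) {
        return total - running - paused;
    }
}
```

Under this assumption, a host reporting `MemTotal` = 8589934592 bytes shows up as 8192 MB.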

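#### Metric set: containers

| Metric name | Metric unit | Metric help description |
| -------- | -------- | ---------------------- |
| id | none | Container ID |
| name | none | Container name |
| image | none | Image used by the container |
| command | none | Default startup command of the container |
| state | none | Running state of the container |
| status | none | Update time of the container |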

#### Metric set: stats

| Metric name | Metric unit | Metric help description |
| ---------------- | -------- | ---------------------------- |
| name | none | Container name |
| available_memory | MB | Memory the container can use |
| used_memory | MB | Memory already used by the container |
| memory_usage | none | Memory usage of the container |
| cpu_delta | none | Number of CPUs already used by the container |
| number_cpus | none | Number of CPUs the container can use |
| cpu_usage | none | CPU usage of the container |
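The stats field names match the variables in the formulas that the Docker Engine API reference gives for `GET /containers/{id}/stats`. In those formulas:
- `cpu_delta` is the change in the container's total CPU time between the current and previous sample.
- `system_cpu_delta` is the same change for the host.
- `used_memory` is memory usage minus page cache, and it is divided by the memory limit (`available_memory`) to get the percentage.

This commit does not show whether HertzBeat computes `cpu_usage` and `memory_usage` exactly this way. The sketch below simply restates Docker's documented formulas, with `DockerStatsMath` as a hypothetical name.

```java
// Restates the CPU and memory formulas from the Docker Engine API docs.
final class DockerStatsMath {

    /**
     * cpu % = (cpu_delta / system_cpu_delta) * number_cpus * 100,
     * where each delta is "current sample minus previous sample".
     */
    static double cpuUsagePercent(long totalUsage, long preTotalUsage,
                                  long systemUsage, long preSystemUsage, int numberCpus) {
        long cpuDelta = totalUsage - preTotalUsage;
        long systemCpuDelta = systemUsage - preSystemUsage;
        if (cpuDelta < 0 || systemCpuDelta <= 0) {
            return 0.0; // first sample or counter reset: no meaningful rate yet
        }
        return (double) cpuDelta / systemCpuDelta * numberCpus * 100.0;
    }

    /** used_memory = usage - cache (page cache is not counted as used). */
    static long usedMemory(long usage, long cache) {
        return usage - cache;
    }

    /** memory % = used_memory / available_memory * 100, where available_memory is the limit. */
    static double memoryUsagePercent(long usedMemory, long availableMemory) {
        if (availableMemory <= 0) {
            return 0.0;
        }
        return (double) usedMemory / availableMemory * 100.0;
    }
}
```

For instance, a container whose CPU time grew by 200 units while the host's grew by 2000, on 4 CPUs, comes out at 40% CPU usage.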
4 changes: 4 additions & 0 deletions home/i18n/en/docusaurus-plugin-content-docs/current.json
@@ -54,5 +54,9 @@
"sidebar.docs.category.Others": {
"message": "Others",
"description": "The label for category Others in sidebar docs"
},
"sidebar.docs.category.云原生": {
"message": "CloudNative",
"description": "The label for category 云原生 in sidebar docs"
}
}
106 changes: 106 additions & 0 deletions home/i18n/en/docusaurus-plugin-content-docs/current/help/docker.md
@@ -0,0 +1,106 @@
---
id: docker
title: Monitor:Docker Monitor
sidebar_label: Docker Monitor

---

> Collect and monitor general performance metrics of Docker containers.

## Pre-monitoring operations

To monitor container information in `Docker`, follow the steps below to open the Docker API port so that collection requests can retrieve the corresponding information.

**1. Edit the docker.service file:**

```shell
vi /usr/lib/systemd/system/docker.service
```

Find the **[Service]** section and modify the ExecStart property by appending `-H tcp://0.0.0.0:2375`:

```shell
ExecStart=/usr/bin/dockerd -H fd:// --containerd=/run/containerd/containerd.sock -H tcp://0.0.0.0:2375
```

This exposes port **2375** to the outside world; you can change it to another port to suit your environment.

**2. Reload the Docker configuration so the change takes effect:**

```shell
systemctl daemon-reload
systemctl restart docker
```

**Note: remember to open port `2375` in the server console.**

**3. If the above method does not work:**

Open port `2375` in the server's local firewall (firewalld):

```shell
firewall-cmd --zone=public --add-port=2375/tcp --permanent
firewall-cmd --reload
```

### Configuration parameters

| Parameter name | Parameter help description |
| ------------ | ------------------------------- |
| Monitor Host | IPv4 address, IPv6 address or domain name of the monitored host. Note ⚠️: no protocol header (e.g. https://, http://). |
| Monitor Name | Name that identifies this monitor; it must be unique. |
| Port | Port on which the Docker API is exposed, default 2375. |
| Query Timeout | Timeout for requests to the Docker server API, in milliseconds (ms), default 3000 ms. |
| Container Name | Usually set to monitor information about all running containers. |
| Username | Connection username, optional. |
| Password | Connection password, optional. |
| URL | Database connection URL, optional. If configured, the database name, username, password and other parameters in the URL override the parameters configured above. |
| Collection Interval | Interval at which the monitor periodically collects data, in seconds; the minimum interval is 10 seconds. |
| Whether to detect | Whether to probe the monitor's availability before adding it; the add or modify operation proceeds only if the probe succeeds. |
| Description Remarks | Additional notes that identify and describe this monitor. |

### Collected metrics

#### Metric set: system

| Metric Name | Metric Unit | Metric Help Description |
| ------------------ | -------- | ----------------------- |
| Name | none | Server name |
| version | none | Docker version number |
| os | none | Server operating system and architecture, e.g. linux x86_64 |
| root_dir | none | Docker root directory, e.g. /var/lib/docker |
| containers | none | Total number of containers (running + not running) |
| containers_running | none | Number of running containers |
| containers_paused | none | Number of paused containers |
| images | none | Total number of container images |
| ncpu | none | Number of CPUs |
| mem_total | MB | Total memory size |
| system_time | none | System time |

#### Metric set: containers

| Metric Name | Metric Unit | Metric Help Description |
| -------- | -------- | ------------ |
| id | none | Container ID |
| name | none | Container name |
| image | none | Image used by the container |
| command | none | Default startup command of the container |
| state | none | Running state of the container |
| status | none | Update time of the container |
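The `containers` metric set corresponds to Docker's `GET /containers/json` endpoint. That endpoint lists only running containers unless the `all` query parameter is set. Per-container stats come from `GET /containers/{id}/stats`, where `stream=false` returns a single sample instead of a continuous stream.

This page does not show which endpoints and parameters HertzBeat's Docker template actually requests. Treat the following URL builder, with its hypothetical `DockerEndpoints` name, as an illustration of the Engine API rather than of HertzBeat.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical URL builder for two Docker Engine API endpoints.
final class DockerEndpoints {

    /** GET /containers/json; without all=true only running containers are listed. */
    static URI containers(String host, int port, boolean includeStopped) {
        return build(host, port, "/containers/json", includeStopped ? "all=true" : null);
    }

    /** GET /containers/{idOrName}/stats with stream=false for a one-shot sample. */
    static URI stats(String host, int port, String idOrName) {
        return build(host, port, "/containers/" + idOrName + "/stats", "stream=false");
    }

    private static URI build(String host, int port, String path, String query) {
        try {
            // The multi-argument constructor quotes illegal characters and brackets IPv6 hosts.
            return new URI("http", null, host, port, path, query, null);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException("invalid Docker API address: " + host, e);
        }
    }
}
```

The default "all running containers" behaviour described for the Container Name parameter matches calling `containers(host, port, false)`.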

#### Metric set: stats

| Metric Name | Metric Unit | Metric Help Description |
| ---------------- | -------- | ------------------ |
| name | none | Container name |
| available_memory | MB | Memory the container can use |
| used_memory | MB | Memory already used by the container |
| memory_usage | none | Memory usage of the container |
| cpu_delta | none | Number of CPUs already used by the container |
| number_cpus | none | Number of CPUs the container can use |
| cpu_usage | none | CPU usage of the container |
7 changes: 7 additions & 0 deletions home/sidebars.json
@@ -95,6 +95,13 @@
"help/tomcat"
]
},
+{
+"type": "category",
+"label": "云原生",
+"items": [
+"help/docker"
+]
+},
{
"type": "category",
"label": "阈值告警配置",