tiup cluster: display down monitoring as up #1736

Closed
zhongzc opened this issue Jan 21, 2022 · 0 comments · Fixed by #1742
zhongzc commented Jan 21, 2022

Bug Report

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    Deployed 2 monitoring (Prometheus) nodes:

    [root@tiup-0 ~]# tiup cluster display topsql-multi-monitors
    Starting component `cluster`: /root/.tiup/components/cluster/v1.8.2/tiup-cluster display topsql-multi-monitors
    Cluster type:       tidb
    Cluster name:       topsql-multi-monitors
    Cluster version:    nightly
    Deploy user:        tidb-topsql-multi-monitors
    SSH type:           builtin
    Dashboard URL:      http://pd-peer:2379/dashboard
    ID               Role        Host       Ports        OS/Arch       Status   Data Dir                                          Deploy Dir
    --               ----        ----       -----        -------       ------   --------                                          ----------
    pd-peer:2379     pd          pd-peer    2379/2380    linux/x86_64  Up|L|UI  /tiup-topsql-multi-monitors/data/pd-2379          /tiup-topsql-multi-monitors/deploy/pd-2379
    tiup-peer:9090   prometheus  tiup-peer  9090/12020   linux/x86_64  Up       /tiup-topsql-multi-monitors/data/prometheus-9090  /tiup-topsql-multi-monitors/deploy/prometheus-9090
    tiup-peer:9190   prometheus  tiup-peer  9190/12120   linux/x86_64  Up       /tiup-topsql-multi-monitors/data/prometheus-9190  /tiup-topsql-multi-monitors/deploy/prometheus-9190
    tidb-peer:4000   tidb        tidb-peer  4000/10080   linux/x86_64  Up       -                                                 /tiup-topsql-multi-monitors/deploy/tidb-4000
    tikv-peer:20160  tikv        tikv-peer  20160/20180  linux/x86_64  Up       /tiup-topsql-multi-monitors/data/tikv-20160       /tiup-topsql-multi-monitors/deploy/tikv-20160
    Total nodes: 5

    The running processes were as expected:

    [root@tiup-0 ~]# ps aux | grep prometheus-
    tidb-to+   12741  3.0  0.1 915384 222240 ?       Ssl  09:55   0:02 bin/prometheus/prometheus --config.file=/tiup-topsql-multi-monitors/deploy/prometheus-9090/conf/prometheus.yml --web.listen-address=:9090 --web.external-url=http://tiup-peer:9090/ --web.enable-admin-api --log.level=info --storage.tsdb.path=/tiup-topsql-multi-monitors/data/prometheus-9090 --storage.tsdb.retention=30d
    tidb-to+   12743  0.0  0.0  11184  2080 ?        S    09:55   0:00 /bin/bash /tiup-topsql-multi-monitors/deploy/prometheus-9090/scripts/run_prometheus.sh
    tidb-to+   12744  1.0  0.0 3253220 73220 ?       Sl   09:55   0:00 bin/ng-monitoring-server --config /tiup-topsql-multi-monitors/deploy/prometheus-9090/conf/ngmonitoring.toml
    tidb-to+   12751  0.0  0.0   5952  1792 ?        S    09:55   0:00 tee -i -a /tiup-topsql-multi-monitors/deploy/prometheus-9090/log/prometheus.log
    tidb-to+   13577  3.0  0.1 914632 185444 ?       Ssl  09:55   0:01 bin/prometheus/prometheus --config.file=/tiup-topsql-multi-monitors/deploy/prometheus-9190/conf/prometheus.yml --web.listen-address=:9190 --web.external-url=http://tiup-peer:9190/ --web.enable-admin-api --log.level=info --storage.tsdb.path=/tiup-topsql-multi-monitors/data/prometheus-9190 --storage.tsdb.retention=30d
    tidb-to+   13579  0.0  0.0  11184  2028 ?        S    09:55   0:00 /bin/bash /tiup-topsql-multi-monitors/deploy/prometheus-9190/scripts/run_prometheus.sh
    tidb-to+   13584  0.9  0.0 3308512 71140 ?       Sl   09:55   0:00 bin/ng-monitoring-server --config /tiup-topsql-multi-monitors/deploy/prometheus-9190/conf/ngmonitoring.toml
    tidb-to+   13590  0.0  0.0   5952  1736 ?        S    09:55   0:00 tee -i -a /tiup-topsql-multi-monitors/deploy/prometheus-9190/log/prometheus.log
    root       14122  0.0  0.0   9112   892 pts/0    S+   09:56   0:00 grep --color=auto prometheus-

    Stopped one of the monitoring nodes, tiup-peer:9090:

    [root@tiup-0 ~]# tiup cluster stop topsql-multi-monitors -N tiup-peer:9090 -y
    Starting component `cluster`: /root/.tiup/components/cluster/v1.8.2/tiup-cluster stop topsql-multi-monitors -N tiup-peer:9090 -y
    + [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/topsql-multi-monitors/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/topsql-multi-monitors/ssh/id_rsa.pub
    + [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tiup-peer
    + [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tikv-peer
    + [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tidb-peer
    + [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=pd-peer
    + [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tiup-peer
    + [ Serial ] - StopCluster
    Stopping component prometheus
        Stopping instance tiup-peer
        Stop prometheus tiup-peer:9090 success
    Stopping component node_exporter
    Stopping component blackbox_exporter
    Stopped cluster `topsql-multi-monitors` successfully

    Its processes were down as expected:

    [root@tiup-0 ~]# ps aux | grep prometheus-
    tidb-to+   13577  2.5  0.1 915256 209508 ?       Ssl  09:55   0:01 bin/prometheus/prometheus --config.file=/tiup-topsql-multi-monitors/deploy/prometheus-9190/conf/prometheus.yml --web.listen-address=:9190 --web.external-url=http://tiup-peer:9190/ --web.enable-admin-api --log.level=info --storage.tsdb.path=/tiup-topsql-multi-monitors/data/prometheus-9190 --storage.tsdb.retention=30d
    tidb-to+   13579  0.0  0.0  11184  2028 ?        S    09:55   0:00 /bin/bash /tiup-topsql-multi-monitors/deploy/prometheus-9190/scripts/run_prometheus.sh
    tidb-to+   13584  0.9  0.0 3308512 76600 ?       Sl   09:55   0:00 bin/ng-monitoring-server --config /tiup-topsql-multi-monitors/deploy/prometheus-9190/conf/ngmonitoring.toml
    tidb-to+   13590  0.0  0.0   5952  1736 ?        S    09:55   0:00 tee -i -a /tiup-topsql-multi-monitors/deploy/prometheus-9190/log/prometheus.log
    root       14165  0.0  0.0   9112   892 pts/0    S+   09:56   0:00 grep --color=auto prometheus-

    However, tiup-peer:9090 was still displayed as Up (a manual check of each Prometheus instance is sketched below, after the version info):

    [root@tiup-0 ~]# tiup cluster display topsql-multi-monitors
    Starting component `cluster`: /root/.tiup/components/cluster/v1.8.2/tiup-cluster display topsql-multi-monitors
    Cluster type:       tidb
    Cluster name:       topsql-multi-monitors
    Cluster version:    nightly
    Deploy user:        tidb-topsql-multi-monitors
    SSH type:           builtin
    Dashboard URL:      http://pd-peer:2379/dashboard
    ID               Role        Host       Ports        OS/Arch       Status   Data Dir                                          Deploy Dir
    --               ----        ----       -----        -------       ------   --------                                          ----------
    pd-peer:2379     pd          pd-peer    2379/2380    linux/x86_64  Up|L|UI  /tiup-topsql-multi-monitors/data/pd-2379          /tiup-topsql-multi-monitors/deploy/pd-2379
    tiup-peer:9090   prometheus  tiup-peer  9090/12020   linux/x86_64  Up       /tiup-topsql-multi-monitors/data/prometheus-9090  /tiup-topsql-multi-monitors/deploy/prometheus-9090
    tiup-peer:9190   prometheus  tiup-peer  9190/12120   linux/x86_64  Up       /tiup-topsql-multi-monitors/data/prometheus-9190  /tiup-topsql-multi-monitors/deploy/prometheus-9190
    tidb-peer:4000   tidb        tidb-peer  4000/10080   linux/x86_64  Up       -                                                 /tiup-topsql-multi-monitors/deploy/tidb-4000
    tikv-peer:20160  tikv        tikv-peer  20160/20180  linux/x86_64  Up       /tiup-topsql-multi-monitors/data/tikv-20160       /tiup-topsql-multi-monitors/deploy/tikv-20160
    Total nodes: 5
    [root@tiup-0 ~]# tiup --version
    1.8.2 tiup
    Go Version: go1.17.5
    Git Ref: v1.8.2
    GitHash: a2912cf0470d651ce724e6c4e79871ec772abe38
  2. What did you expect to see?
    The down monitoring node displayed as Down.

  3. What did you see instead?
    The down monitoring node was displayed as Up.

  4. What version of TiUP are you using (tiup --version)?
    Both of the following:

    1.7.0 tiup
    Go Version: go1.17.3
    Git Ref: v1.7.0
    GitHash: ce8eb0a645cc3ead96a44d67b1ecd5034d112cf0
    

    and

    1.8.2 tiup
    Go Version: go1.17.5
    Git Ref: v1.8.2
    GitHash: a2912cf0470d651ce724e6c4e79871ec772abe38
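
To double-check the discrepancy independently of `tiup cluster display`, each Prometheus instance can be probed directly over HTTP. A minimal sketch, assuming the instance ports from the topology above (9090 and 9190) and that Prometheus's built-in `/-/healthy` endpoint is reachable from the control host:

for port in 9090 9190; do
  # A running instance answers HTTP 200; "000" means the port is unreachable (instance down).
  printf 'tiup-peer:%s -> ' "$port"
  curl -s -o /dev/null -w '%{http_code}\n' "http://tiup-peer:${port}/-/healthy"
done

In this scenario, 9090 should come back unreachable while 9190 still answers, contradicting the Up/Up shown by `display`.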
    

Another unexpected behavior occurred after both monitoring nodes had been stopped:

[root@tiup-0 ~]# tiup cluster stop topsql-multi-monitors -N tiup-peer:9190 -y
Starting component `cluster`: /root/.tiup/components/cluster/v1.8.2/tiup-cluster stop topsql-multi-monitors -N tiup-peer:9190 -y
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/topsql-multi-monitors/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/topsql-multi-monitors/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tiup-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=pd-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tikv-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tidb-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tiup-peer
+ [ Serial ] - StopCluster
Stopping component prometheus
	Stopping instance tiup-peer
	Stop prometheus tiup-peer:9190 success
Stopping component node_exporter
Stopping component blackbox_exporter
Stopped cluster `topsql-multi-monitors` successfully

Both monitoring nodes were displayed as Down, which is expected:

[root@tiup-0 ~]# tiup cluster display topsql-multi-monitors
Starting component `cluster`: /root/.tiup/components/cluster/v1.8.2/tiup-cluster display topsql-multi-monitors
Cluster type:       tidb
Cluster name:       topsql-multi-monitors
Cluster version:    nightly
Deploy user:        tidb-topsql-multi-monitors
SSH type:           builtin
Dashboard URL:      http://pd-peer:2379/dashboard
ID               Role        Host       Ports        OS/Arch       Status   Data Dir                                          Deploy Dir
--               ----        ----       -----        -------       ------   --------                                          ----------
pd-peer:2379     pd          pd-peer    2379/2380    linux/x86_64  Up|L|UI  /tiup-topsql-multi-monitors/data/pd-2379          /tiup-topsql-multi-monitors/deploy/pd-2379
tiup-peer:9090   prometheus  tiup-peer  9090/12020   linux/x86_64  Down     /tiup-topsql-multi-monitors/data/prometheus-9090  /tiup-topsql-multi-monitors/deploy/prometheus-9090
tiup-peer:9190   prometheus  tiup-peer  9190/12120   linux/x86_64  Down     /tiup-topsql-multi-monitors/data/prometheus-9190  /tiup-topsql-multi-monitors/deploy/prometheus-9190
tidb-peer:4000   tidb        tidb-peer  4000/10080   linux/x86_64  Up       -                                                 /tiup-topsql-multi-monitors/deploy/tidb-4000
tikv-peer:20160  tikv        tikv-peer  20160/20180  linux/x86_64  Up       /tiup-topsql-multi-monitors/data/tikv-20160       /tiup-topsql-multi-monitors/deploy/tikv-20160
Total nodes: 5

However, an error occurred when I scaled in tiup-peer:9090:

[root@tiup-0 ~]# tiup cluster scale-in topsql-multi-monitors -N tiup-peer:9090 -y
Starting component `cluster`: /root/.tiup/components/cluster/v1.8.2/tiup-cluster scale-in topsql-multi-monitors -N tiup-peer:9090 -y
+ [ Serial ] - SSHKeySet: privateKey=/root/.tiup/storage/cluster/clusters/topsql-multi-monitors/ssh/id_rsa, publicKey=/root/.tiup/storage/cluster/clusters/topsql-multi-monitors/ssh/id_rsa.pub
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tiup-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tikv-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tidb-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=pd-peer
+ [Parallel] - UserSSH: user=tidb-topsql-multi-monitors, host=tiup-peer
+ [ Serial ] - ClusterOperate: operation=ScaleInOperation, options={Roles:[] Nodes:[tiup-peer:9090] Force:false SSHTimeout:5 OptTimeout:120 APITimeout:300 IgnoreConfigCheck:false NativeSSH:false SSHType: Concurrency:5 SSHProxyHost: SSHProxyPort:22 SSHProxyUser:root SSHProxyIdentity:/root/.ssh/id_rsa SSHProxyUsePassword:false SSHProxyTimeout:5 CleanupData:false CleanupLog:false CleanupAuditLog:false RetainDataRoles:[] RetainDataNodes:[] ShowUptime:false DisplayMode:default Operation:StartOperation}
Stopping component prometheus
	Stopping instance tiup-peer
	Stop prometheus tiup-peer:9090 success
Destroying component prometheus
Destroying instance tiup-peer
Destroy tiup-peer success
- Destroy prometheus paths: [/tiup-topsql-multi-monitors/deploy/prometheus-9090/log /tiup-topsql-multi-monitors/deploy/prometheus-9090 /etc/systemd/system/prometheus-9090.service /tiup-topsql-multi-monitors/data/prometheus-9090]
+ [ Serial ] - UpdateMeta: cluster=topsql-multi-monitors, deleted=`'tiup-peer:9090'`
+ [ Serial ] - UpdateTopology: cluster=topsql-multi-monitors
+ Refresh instance configs
  - Regenerate config pd -> pd-peer:2379 ... Done
  - Regenerate config tikv -> tikv-peer:20160 ... Done
  - Regenerate config tidb -> tidb-peer:4000 ... Done
  - Regenerate config prometheus -> tiup-peer:9190 ... Done
+ [ Serial ] - SystemCtl: host=tiup-peer action=reload prometheus-9190.service

Error: stdout: , stderr:Job for prometheus-9190.service invalid.
: executor.ssh.execute_failed: Failed to execute command over SSH for 'tidb-topsql-multi-monitors@tiup-peer:22' {ssh_stderr: Job for prometheus-9190.service invalid.
, ssh_stdout: , ssh_command: export LANG=C; PATH=$PATH:/bin:/sbin:/usr/bin:/usr/sbin /usr/bin/sudo -H bash -c "systemctl daemon-reload && systemctl reload prometheus-9190.service"}, cause: Process exited with status 1

Verbose debug logs has been written to /root/.tiup/logs/tiup-cluster-debug-2022-01-21-10-29-51.log.
Error: run `/root/.tiup/components/cluster/v1.8.2/tiup-cluster` (wd:/root/.tiup/data/Sv7AqSl) failed: exit status 1
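
The scale-in error itself looks like a separate problem: tiup-peer:9190 had already been stopped, so the `systemctl reload prometheus-9190.service` issued by the post-scale-in config refresh targets an inactive unit, which systemd rejects. A rough way to confirm this on tiup-peer (a sketch, assuming the generated prometheus-9190.service unit is still installed and supports reload, as the log above implies):

# Reloading a stopped unit fails; reload only works once the unit is active again.
systemctl is-active prometheus-9190.service     # expected: inactive (it was stopped earlier)
sudo systemctl reload prometheus-9190.service   # expected to fail, matching the error above
sudo systemctl start prometheus-9190.service
sudo systemctl reload prometheus-9190.service   # expected to succeed once the unit is running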
@zhongzc zhongzc added the type/bug Categorizes issue as related to a bug. label Jan 21, 2022
@nexustar nexustar self-assigned this Jan 21, 2022
@nexustar nexustar added this to the v1.9.0 milestone Jan 26, 2022