Tidb-server is still up but don't work after down deploy disk #35007

Closed
mayjiang0203 opened this issue May 27, 2022 · 6 comments · Fixed by #42212
Labels
affects-4.0 This bug affects 4.0.x versions. affects-5.2 This bug affects 5.2.x versions. affects-6.2 affects-6.3 affects-6.4 affects-6.5 affects-6.6 affects-7.0 severity/major sig/sql-infra SIG: SQL Infra type/bug The issue is confirmed as a bug.

Comments

mayjiang0203 commented May 27, 2022

Bug Report

What did you do?

1. Deploy TiDB on a single NVMe disk via TiUP.
2. Take the NVMe disk offline by cutting its network (the disk is attached via NVMe over TCP, so dropping the TCP traffic takes the disk down):

[root@172 hongm]# iptables -A INPUT -p tcp --dport 4420 -j DROP;env LANG=en_US.UTF-8 date;
Fri May 27 18:50:17 CST 2022

[root@172 hongm]# env LANG=en_US.UTF-8 date;ps -ef|grep tidb-server
Fri May 27 19:05:15 CST 2022
root 15872 15853 99 5月26 ? 2-15:42:38 /tidb-server --store=tikv --advertise-address=tc-tidb-0.tc-tidb-peer.glh-610-mjxt6.svc --host=0.0.0.0 --path=tc-pd:2379 --config=/etc/tidb/tidb.toml
tidb 26080 1 28 18:41 ? 00:06:40 bin/tidb-server -P 4000 --status=10080 --host=0.0.0.0 --advertise-address=172.16.6.34 --store=tikv --path=172.16.6.34:23791,172.16.6.34:23792,172.16.6.34:23793 --log-slow-query=/data/nvme2n1/tidb-deploy/tidb-4000/log/tidb_slow_query.log --config=conf/tidb.toml --log-file=/data/nvme2n1/tidb-deploy/tidb-4000/log/tidb.log

What did you expect to see?

The tidb-server instance should either panic or keep working normally.

What did you see instead?

The tidb-server instance is still running but does not work.

What is your TiDB version? (Required)

5.2.4

@mayjiang0203 mayjiang0203 added the type/bug The issue is confirmed as a bug. label May 27, 2022
@mayjiang0203 mayjiang0203 changed the title Tidb-server is still up but not work after down deploy disk Tidb-server is still up but don't work after down deploy disk May 27, 2022
mayjiang0203 (Author) commented

This test was requested by an important customer.
/severity Major

@ti-chi-bot ti-chi-bot added severity/major may-affects-4.0 This bug maybe affects 4.0.x versions. may-affects-5.0 This bug maybe affects 5.0.x versions. may-affects-5.1 This bug maybe affects 5.1.x versions. may-affects-5.2 This bug maybe affects 5.2.x versions. may-affects-5.3 This bug maybe affects 5.3.x versions. may-affects-5.4 This bug maybe affects 5.4.x versions. may-affects-6.0 may-affects-6.1 labels May 27, 2022
@jebter jebter added the affects-5.2 This bug affects 5.2.x versions. label Jun 13, 2022
@ti-chi-bot ti-chi-bot removed the may-affects-5.2 This bug maybe affects 5.2.x versions. label Jun 13, 2022
jebter commented Jun 13, 2022

@AstroProfundis @bb7133 PTAL

tiancaiamao (Contributor) commented Sep 20, 2022

What does the tidb log look like?

less /data/nvme2n1/tidb-deploy/tidb-4000/log/tidb.log

tiancaiamao (Contributor) commented

Maybe it is writing the log while the disk is unavailable, so it hangs there.
I think we can check the TiDB log to verify this.

tiancaiamao (Contributor) commented

Maybe it is writing the log while the disk is unavailable, so it hangs there. I think we can check the TiDB log to verify this.

Otherwise, what does the goroutine stack look like?

curl "http://127.0.0.1:10080/debug/pprof/goroutine?debug=2" > goroutine.txt

We can check the goroutine stack to see where it blocks.
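
To find where it blocks, the dump can be summarized by wait state. The following is only a rough sketch: it assumes the usual `debug=2` layout in which every goroutine starts with a header line such as `goroutine 42 [semacquire, 12 minutes]:`. The `countStates` helper and the `sampleDump` text are made up for illustration and are not taken from the affected server.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// headerRE matches the header line of one goroutine in a debug=2 dump.
// Group 1 captures the wait state, e.g. "semacquire" or "chan receive".
var headerRE = regexp.MustCompile(`^goroutine \d+[^\[]*\[([^,\]]+)`)

// countStates tallies goroutines by the wait state in their header line.
func countStates(dump string) map[string]int {
	counts := make(map[string]int)
	for _, line := range strings.Split(dump, "\n") {
		if m := headerRE.FindStringSubmatch(line); m != nil {
			counts[m[1]]++
		}
	}
	return counts
}

// sampleDump is a hypothetical, heavily shortened dump used only to
// exercise countStates; real dumps contain full stack traces.
const sampleDump = `goroutine 1 [chan receive, 14 minutes]:
main.main()

goroutine 101 [syscall, 14 minutes]:
syscall.write(...)

goroutine 102 [semacquire, 14 minutes]:
sync.(*Mutex).Lock(...)

goroutine 103 [semacquire, 13 minutes]:
sync.(*Mutex).Lock(...)
`

func main() {
	counts := countStates(sampleDump)
	states := make([]string, 0, len(counts))
	for s := range counts {
		states = append(states, s)
	}
	sort.Strings(states)
	for _, s := range states {
		fmt.Printf("%-14s %d\n", s, counts[s])
	}
}
```

For a real dump, read goroutine.txt into a string and pass it to `countStates`. A large number of goroutines waiting on a lock for about as long as the disk has been down, next to one goroutine stuck in a write, would point at the log path.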

tiancaiamao (Contributor) commented

The hang is caused by a mutex in the log-writing path.
The log library has to hold a lock while it performs a write.
If the write cannot finish, the mutex is never released, so every other log write blocks.
The write cannot finish because the I/O is blocked at the OS level.

So I don't think this is a bug in TiDB.
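
The chain described above can be reproduced in miniature. This is a sketch under stated assumptions, not TiDB's logging code: `stalledWriter` is a hypothetical stand-in for the unreachable disk, and `lockedLogger` is a toy logger that holds a mutex for the duration of each write.

```go
package main

import (
	"fmt"
	"io"
	"sync"
	"sync/atomic"
	"time"
)

// stalledWriter simulates a device whose I/O never completes:
// Write blocks until the release channel is closed.
type stalledWriter struct {
	release chan struct{}
}

func (w *stalledWriter) Write(p []byte) (int, error) {
	<-w.release
	return len(p), nil
}

// lockedLogger serializes writes with a mutex held across the whole write.
type lockedLogger struct {
	mu  sync.Mutex
	out io.Writer
}

func (l *lockedLogger) Log(msg string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	_, _ = io.WriteString(l.out, msg+"\n")
}

// countStuck starts n goroutines that each log one line through a logger
// backed by a stalled writer. It returns how many Log calls had returned
// after waitMillis of stall, and how many had returned once the writer
// was released.
func countStuck(n int, waitMillis int) (doneWhileStalled, doneAfterRelease int) {
	w := &stalledWriter{release: make(chan struct{})}
	logger := &lockedLogger{out: w}

	var done int64
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			logger.Log(fmt.Sprintf("query %d finished", id))
			atomic.AddInt64(&done, 1)
		}(i)
	}

	time.Sleep(time.Duration(waitMillis) * time.Millisecond)
	doneWhileStalled = int(atomic.LoadInt64(&done))

	close(w.release) // the "disk" comes back
	wg.Wait()
	doneAfterRelease = int(atomic.LoadInt64(&done))
	return doneWhileStalled, doneAfterRelease
}

func main() {
	stalled, released := countStuck(5, 200)
	// prints "returned while disk stalled: 0, after recovery: 5"
	fmt.Printf("returned while disk stalled: %d, after recovery: %d\n", stalled, released)
}
```

One goroutine sits inside `Write` holding the mutex and the other four wait in `Lock`, so no caller makes progress until the write returns. The process stays alive the whole time, which matches a server that is up but not working.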

6 participants