check-hardware not reporting all dmesg error messages #18

Open
soichih opened this issue Feb 2, 2019 · 0 comments



soichih commented Feb 2, 2019

I am trying to figure out how to configure the check-hardware probe. Currently, I am seeing occasional dmesg entries like the following:

[ 4866.464697] sd 2:0:0:0: [sda] tag#1 abort
[ 4906.432614] sd 2:0:0:0: [sda] tag#1 abort
[ 9445.441825] sd 2:0:0:1: [sdb] tag#4 abort
[13624.437138] sd 2:0:0:1: [sdb] tag#1 abort
[16219.436717] sd 2:0:0:1: [sdb] tag#3 abort
[16255.564600] sd 2:0:0:1: [sdb] tag#0 abort
[16336.460350] sd 2:0:0:1: [sdb] tag#0 abort
[16648.427363] sd 2:0:0:1: [sdb] tag#31 abort
[16722.379150] sd 2:0:0:1: [sdb] tag#3 abort
[16808.378876] sd 2:0:0:1: [sdb] tag#0 abort
[17361.481198] sd 2:0:0:1: [sdb] tag#1 abort
[17412.377007] sd 2:0:0:1: [sdb] tag#49 abort
[17481.988986] sd 2:0:0:1: [sdb] tag#0 abort
[17520.956819] INFO: task jbd2/sdb-8:4403 blocked for more than 120 seconds.
[17520.968703]       Not tainted 4.4.0-141-generic #167-Ubuntu
[17520.973645] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[17520.980257] jbd2/sdb-8      D ffff8803f20a7ad8     0  4403      2 0x00000000
[17520.980265]  ffff8803f20a7ad8 ffff8803f20a7ad0 ffff88042d7a2a00 ffff88042a64d400
[17520.980270]  ffff8803f20a8000 ffff88043fcd7300 7fffffffffffffff ffffffff81855090
[17520.980275]  ffff8803f20a7c30 ffff8803f20a7af0 ffffffff81854895 0000000000000000
[17520.980280] Call Trace:
[17520.980290]  [<ffffffff81855090>] ? bit_wait+0x60/0x60
[17520.980295]  [<ffffffff81854895>] schedule+0x35/0x80
[17520.980300]  [<ffffffff81857a26>] schedule_timeout+0x1b6/0x270
[17520.980307]  [<ffffffff810cfd41>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20
[17520.980314]  [<ffffffff8106641e>] ? kvm_clock_get_cycles+0x1e/0x20
[17520.980335]  [<ffffffff810fb46e>] ? ktime_get+0x3e/0xb0
[17520.980349]  [<ffffffff81855090>] ? bit_wait+0x60/0x60
[17520.980360]  [<ffffffff81854004>] io_schedule_timeout+0xa4/0x110
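
To make the pattern in the log easier to see, here is a minimal Ruby sketch (not part of the probe) that parses the `[seconds] message` format and tallies SCSI `abort` lines per device. The regular expressions and the `abort_counts` helper are hypothetical, fitted only to the sample lines in this issue rather than to a general dmesg grammar.

```ruby
# Hypothetical helper, not from check-hardware-fail.rb.
# Matches "[ 4866.464697] message" and captures the message text.
LINE_RE  = /\A\[\s*(\d+\.\d+)\]\s+(.*)\z/
# Matches "sd 2:0:0:1: [sdb] tag#4 abort" and captures the device name.
ABORT_RE = /\Asd \S+: \[(\w+)\] tag#\d+ abort\z/

# Count SCSI command aborts per block device.
def abort_counts(lines)
  counts = Hash.new(0)
  lines.each do |line|
    m = LINE_RE.match(line.chomp) or next # skip non-dmesg lines
    a = ABORT_RE.match(m[2]) or next      # skip lines that are not aborts
    counts[a[1]] += 1
  end
  counts
end

sample = <<~LOG.lines
  [ 4866.464697] sd 2:0:0:0: [sda] tag#1 abort
  [ 9445.441825] sd 2:0:0:1: [sdb] tag#4 abort
  [13624.437138] sd 2:0:0:1: [sdb] tag#1 abort
  [17520.956819] INFO: task jbd2/sdb-8:4403 blocked for more than 120 seconds.
LOG

abort_counts(sample).each { |dev, n| puts "#{dev}: #{n}" }
# prints "sda: 1" then "sdb: 2"
```

Run over the full log, this kind of tally makes it clear that most of the aborts are on `sdb`, the same device whose journal thread (`jbd2/sdb-8`) later hangs.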

These messages aren't detected by the check-hardware-fail probe:

./check-hardware-fail.rb -s 300
CheckHardwareFail OK: OK

Is this probe not meant to detect dmesg events like these?
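
For illustration only, a sketch of a pattern-based dmesg check that would flag lines like those above. It is not the check-hardware-fail.rb implementation, and it does not reflect what its `-s` option does. `recent_matches`, `check_status`, `PATTERNS`, and the time-window logic are hypothetical. The exit-status convention (0 = OK, 2 = CRITICAL) is the standard Nagios/Sensu plugin one.

```ruby
# Hypothetical sketch, not the check-hardware-fail.rb implementation:
# flag dmesg lines that match any configured pattern and whose timestamp
# (seconds since boot) falls within the last `window` seconds.
DMESG_LINE = /\A\[\s*(\d+\.\d+)\]\s+(.*)\z/

# Example patterns for the events in this issue (assumptions; tune as needed).
PATTERNS = [
  /\btag#\d+ abort\z/,                 # SCSI command aborts
  /blocked for more than \d+ seconds/  # hung-task warnings
].freeze

# Return the message text of every recent line that matches a pattern.
def recent_matches(lines, patterns, now:, window:)
  lines.filter_map do |line|
    m = DMESG_LINE.match(line.chomp) or next
    next if m[1].to_f < now - window # older than the window
    m[2] if patterns.any? { |re| re.match?(m[2]) }
  end
end

# Nagios/Sensu-style result: exit status 0 is OK, 2 is CRITICAL.
def check_status(matches)
  return [0, "OK"] if matches.empty?
  [2, "CRITICAL: #{matches.size} matching dmesg line(s)"]
end

sample = <<~LOG.lines
  [16808.378876] sd 2:0:0:1: [sdb] tag#0 abort
  [17481.988986] sd 2:0:0:1: [sdb] tag#0 abort
  [17520.956819] INFO: task jbd2/sdb-8:4403 blocked for more than 120 seconds.
  [17520.968703]       Not tainted 4.4.0-141-generic #167-Ubuntu
LOG

hits = recent_matches(sample, PATTERNS, now: 17600.0, window: 300)
status, message = check_status(hits)
puts "#{message} (exit status #{status})"
# prints "CRITICAL: 2 matching dmesg line(s) (exit status 2)"
```

With `now: 17600.0` and a 300-second window, the abort at 16808 s is ignored as too old, while the later abort and the hung-task warning are flagged. In a real check, `now` might come from the first field of `/proc/uptime` (seconds since boot), although dmesg timestamps are not guaranteed to track it exactly.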
