Scrub causes hang (0.6.5.3) #4009

Closed
JonLaliberte opened this issue Nov 13, 2015 · 8 comments

@JonLaliberte

I can consistently get this system to hang while performing a scrub.
I have (had) a scrub scheduled to run weekly, so I know the system has no trouble maintaining 6-7 days of uptime when no scrub is running. The system hangs about 20-30 minutes into the scrub. There are no errors in the kernel log; the system is completely locked up.
The pool itself operates well and never shows any errors. The same goes for the kernel log: once the system is up and running, issues are rarely reported.

I have tried uninstalling all zfs packages and reinstalling.

Linux Swimming-Ubuntu 3.19.0-33-generic #38~14.04.1-Ubuntu SMP Fri Nov 6 18:17:28 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 14.04.3 LTS trusty
zfs-dkms 0.6.5.3-1~trusty
libzfs2 0.6.5.3-1~trusty
libzpool2 0.6.5.3-1~trusty
spl 0.6.5.3-1~trusty
spl-dkms 0.6.5.3-1~trusty
zfsutils 0.6.5.3-1~trusty
ubuntu-zfs 8~trusty

Pool:
raidz2, 10 5TB drives

zpool get all:

NAME       PROPERTY                    VALUE                       SOURCE
Swimming  size                        45.2T                       -
Swimming  capacity                    21%                         -
Swimming  altroot                     -                           default
Swimming  health                      ONLINE                      -
Swimming  guid                        6325834864599512820         default
Swimming  version                     -                           default
Swimming  bootfs                      -                           default
Swimming  delegation                  on                          default
Swimming  autoreplace                 off                         default
Swimming  cachefile                   -                           default
Swimming  failmode                    wait                        default
Swimming  listsnapshots               off                         default
Swimming  autoexpand                  on                          local
Swimming  dedupditto                  0                           default
Swimming  dedupratio                  1.00x                       -
Swimming  free                        35.7T                       -
Swimming  allocated                   9.59T                       -
Swimming  readonly                    off                         -
Swimming  ashift                      0                           default
Swimming  comment                     -                           default
Swimming  expandsize                  -                           -
Swimming  freeing                     0                           default
Swimming  fragmentation               9%                          -
Swimming  leaked                      0                           default
Swimming  feature@async_destroy       enabled                     local
Swimming  feature@empty_bpobj         active                      local
Swimming  feature@lz4_compress        active                      local
Swimming  feature@spacemap_histogram  active                      local
Swimming  feature@enabled_txg         active                      local
Swimming  feature@hole_birth          active                      local
Swimming  feature@extensible_dataset  enabled                     local
Swimming  feature@embedded_data       active                      local
Swimming  feature@bookmarks           enabled                     local
Swimming  feature@filesystem_limits   disabled                    local
Swimming  feature@large_blocks        disabled                    local

zfs get all:

NAME       PROPERTY              VALUE                  SOURCE
Swimming  type                  filesystem             -
Swimming  creation              Wed Oct 21  0:43 2015  -
Swimming  used                  7.30T                  -
Swimming  available             26.1T                  -
Swimming  referenced            219K                   -
Swimming  compressratio         1.00x                  -
Swimming  mounted               yes                    -
Swimming  quota                 none                   default
Swimming  reservation           none                   default
Swimming  recordsize            128K                   default
Swimming  mountpoint            /Swimming             default
Swimming  sharenfs              off                    default
Swimming  checksum              fletcher4              local
Swimming  compression           off                    local
Swimming  atime                 off                    local
Swimming  devices               on                     default
Swimming  exec                  on                     default
Swimming  setuid                on                     default
Swimming  readonly              off                    default
Swimming  zoned                 off                    default
Swimming  snapdir               hidden                 default
Swimming  aclinherit            passthrough            local
Swimming  canmount              on                     default
Swimming  xattr                 on                     default
Swimming  copies                1                      default
Swimming  version               5                      -
Swimming  utf8only              off                    -
Swimming  normalization         none                   -
Swimming  casesensitivity       sensitive              -
Swimming  vscan                 off                    default
Swimming  nbmand                off                    default
Swimming  sharesmb              off                    default
Swimming  refquota              none                   default
Swimming  refreservation        none                   default
Swimming  primarycache          all                    default
Swimming  secondarycache        all                    default
Swimming  usedbysnapshots       0                      -
Swimming  usedbydataset         219K                   -
Swimming  usedbychildren        7.30T                  -
Swimming  usedbyrefreservation  0                      -
Swimming  logbias               latency                default
Swimming  dedup                 off                    default
Swimming  mlslabel              none                   default
Swimming  sync                  standard               default
Swimming  refcompressratio      1.00x                  -
Swimming  written               219K                   -
Swimming  logicalused           7.29T                  -
Swimming  logicalreferenced     35K                    -
Swimming  filesystem_limit      none                   default
Swimming  snapshot_limit        none                   default
Swimming  filesystem_count      none                   default
Swimming  snapshot_count        none                   default
Swimming  snapdev               hidden                 default
Swimming  acltype               off                    default
Swimming  context               none                   default
Swimming  fscontext             none                   default
Swimming  defcontext            none                   default
Swimming  rootcontext           none                   default
Swimming  relatime              off                    default
Swimming  redundant_metadata    all                    default
Swimming  overlay               off                    default

@JonLaliberte changed the title from "System hangs while performing scrub" to "0.6.5.3 System hangs while performing scrub" on Nov 13, 2015
@JonLaliberte changed the title from "0.6.5.3 System hangs while performing scrub" to "Scrub causes hang (0.6.5.3)" on Nov 13, 2015

@JonLaliberte
Author

cat /proc/meminfo:

MemTotal:        8044216 kB
MemFree:          822040 kB
MemAvailable:    2513696 kB
Buffers:          380572 kB
Cached:          1432204 kB
SwapCached:            0 kB
Active:          1889944 kB
Inactive:         581852 kB
Active(anon):     660172 kB
Inactive(anon):    14464 kB
Active(file):    1229772 kB
Inactive(file):   567388 kB
Unevictable:           0 kB
Mlocked:               0 kB
SwapTotal:      16112636 kB
SwapFree:       16112636 kB

cat /proc/cpuinfo:

Note that there are 4 cores total.
processor   : 0
vendor_id   : AuthenticAMD
cpu family  : 16
model       : 4
model name  : AMD Phenom(tm) II X4 965 Processor
stepping    : 3
microcode   : 0x10000b6
cpu MHz     : 3415.710
cache size  : 512 KB
physical id : 0
siblings    : 4
core id     : 0
cpu cores   : 4
apicid      : 0
initial apicid  : 0
fpu     : yes
fpu_exception   : yes
cpuid level : 5
wp      : yes
flags       : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm 3dnowext 3dnow constant_tsc rep_good nopl nonstop_tsc extd_apicid pni monitor cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt hw_pstate npt lbrv svm_lock nrip_save vmmcall
bugs        : tlb_mmatch fxsave_leak
bogomips    : 6831.42
TLB size    : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 48 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

@alexanderhaensch

There is a fix that solved a hanging resilver for me:
openzfs/spl@3e7e6f3
Maybe it helps in your case? Just an idea, as we have no backtraces from your system.

@JonLaliberte
Author

I'll take a look. I have no backtraces to share because the system locks up completely; even the crash kernel doesn't take over.

@alexanderhaensch

Is there no log on the console? It sounds like a panic to me. Can you ping the system while the crash is happening?

@JonLaliberte
Author

Nothing is logged at all. There is nothing in the kernel log or in /var/crash. The kernel log typically doesn't have anything at any time close to the crash.
I can't ping the system (let's call it my NAS). Further, and I don't want to sound crazy here, it also brings down the switch it's connected to. So the other system on the same switch (my primary work system) loses connectivity as well. As soon as I reset the NAS box, the connection on my other machine comes back immediately. (I have tried a different switch as well; the same thing happens on both switches.)

@alexanderhaensch

Whoa. I've never heard of something like that. The network card in your system is going crazy. Is it a denial of service from data flooding, or something to do with the power?

@JonLaliberte
Author

I have no idea, and I'm not sure how to test that either. I'll check to see if the switch is showing any activity on that port the next time it happens. It's 100% reproducible though. The NAS box goes down, the switch goes with it. Resetting the NAS box immediately restores the switch.
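
One way to check for flooding from the NAS side is to sample the NIC counters at a short interval and force each sample to disk, so the last readings before the lockup survive a reset. The following is only a rough sketch: the log path, the interval, and the function names are assumptions made for illustration, and it only helps if the disk holding the log is still accepting writes right up to the hang.

```python
import os
import time


def parse_net_dev(text):
    """Parse /proc/net/dev text into
    {interface: (rx_bytes, rx_packets, tx_bytes, tx_packets)}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are column headers
        if ":" not in line:
            continue
        name, _, rest = line.partition(":")
        fields = rest.split()
        if len(fields) < 16:
            continue  # not a standard counter line, skip it
        # Receive counters occupy fields 0-7, transmit counters fields 8-15.
        counters[name.strip()] = (
            int(fields[0]), int(fields[1]), int(fields[8]), int(fields[9])
        )
    return counters


def append_durably(path, line):
    """Append one line and fsync it so it is on disk before we continue."""
    with open(path, "a") as fh:
        fh.write(line + "\n")
        fh.flush()
        os.fsync(fh.fileno())


def sample_forever(log_path="/var/log/nic-counters.log", interval=5.0):
    """Hypothetical monitoring loop; log_path and interval are placeholders."""
    while True:
        with open("/proc/net/dev") as fh:
            counters = parse_net_dev(fh.read())
        stamp = time.strftime("%Y-%m-%d %H:%M:%S")
        for name, (rx_b, rx_p, tx_b, tx_p) in sorted(counters.items()):
            append_durably(
                log_path,
                "%s %s rx_bytes=%d rx_packets=%d tx_bytes=%d tx_packets=%d"
                % (stamp, name, rx_b, rx_p, tx_b, tx_p),
            )
        time.sleep(interval)
```

Calling `sample_forever()` as root just before starting the scrub and comparing the last few samples after the reset would show whether the transmit counters were climbing sharply before the lockup.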

The system was relatively stable until I (naively) made a bunch of changes in a short period of time: I consolidated the storage from two smaller zpools into one large zpool and upgraded Ubuntu from 12.04 to 14.04. Previously both zpools were being scrubbed weekly without incident, but I did have occasional hangs that I was never able to get to the bottom of.

At this point I'm thinking that I should just replace the motherboard. Hangs without any logging seem to me to point to a hardware issue, but I am not having much luck getting to the bottom of it.

@JonLaliberte
Author

After much more testing I tracked this down to a faulty SATA controller. Sorry for the wasted time on this one.
