0.6.4 Slow NFS Performance Corrected? #2899

Closed
xflou opened this issue Nov 14, 2014 · 16 comments
Labels
Type: Performance (Performance improvement or performance problem)

@xflou

xflou commented Nov 14, 2014

Hello, not sure if this is the correct place to ask this, but while browsing through and reading one of the issues related to slow NFS performance, I noticed that the problem was corrected in release 0.6.4.

I am having a very similar issue and wanted to know if anyone can confirm that this release has indeed corrected the problem. I would like to apply this release but would like confirmation before removing 0.6.3 and installing 0.6.4.

Frank

@FransUrbo
Contributor

I noticed that the problem was corrected in release 0.6.4.

There is no 0.6.4 release. Yet. There are quite a number of issues left.

You COULD try building your own packages from the GIT master repository.

Where did you find the part that said it was fixed [in 0.6.4]?

@xflou
Author

xflou commented Nov 14, 2014

I ran across this here --> "#2373"

My issue was not exactly the same, but I had been experiencing the same symptoms: extremely slow copies between client and file-server NFS mounts, and even slower copy rates between zpool-exported mounts.

Lame question, but is there any information you can point me to on how to build your own package?

I've set some zfs parameters, which seem to be working so far, but I wanted to try correcting the NFS issue permanently with the 0.6.4 release, if possible.
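
(For reference, ZFS module parameters of this sort are typically set via modprobe options or sysfs. The tunable below is only an illustrative example, not necessarily one of the parameters actually used here.)

    # persistent: applied when the zfs module loads (example tunable only)
    # /etc/modprobe.d/zfs.conf
    #   options zfs zfs_arc_max=8589934592

    # or changed at runtime through sysfs:
    echo 8589934592 > /sys/module/zfs/parameters/zfs_arc_max
    cat /sys/module/zfs/parameters/zfs_arc_max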

@behlendorf
Contributor

@xflou It should be considerably improved in the next tag, which will be 0.6.4. There are directions for building generic rpm and deb packages at the links below. Alternatively, there may be testing/development packages available for your distribution which contain this improvement.

http://zfsonlinux.org/generic-rpm.html
http://zfsonlinux.org/generic-deb.html
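
In rough outline, the generic-rpm route from those pages looks like the following. This is only a sketch; the linked instructions are authoritative for the prerequisites and the exact make targets.

    # build prerequisites (approximate; the linked pages have the full list)
    yum install gcc make autoconf automake libtool kernel-devel \
        zlib-devel libuuid-devel libblkid-devel

    git clone https://github.com/zfsonlinux/spl.git
    git clone https://github.com/zfsonlinux/zfs.git

    # SPL first, then ZFS; "make rpm" leaves the packages in the source tree
    cd spl && ./autogen.sh && ./configure && make rpm
    rpm -Uvh *.x86_64.rpm *.noarch.rpm
    cd ../zfs && ./autogen.sh && ./configure && make rpm
    rpm -Uvh *.x86_64.rpm *.noarch.rpm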

behlendorf added this to the 0.6.4 milestone Nov 14, 2014
behlendorf added the Type: Performance label Nov 14, 2014
@xflou
Author

xflou commented Nov 14, 2014

@behlendorf Thank you for the information. I have a non-production system I'm putting together to try this on before patching my production server. I'll give the generic-rpm a shot. Hopefully it will work for CentOS 6.4.

@FransUrbo
Contributor

Lame question, but is there any information you can point me to on how to build your own package?

Depending on your distribution, it's either

http://zfsonlinux.org/generic-deb.html

or

http://zfsonlinux.org/generic-rpm.html

@behlendorf
Contributor

@xflou For CentOS 6.4 you can just install from the zfs-testing repository. Just enable it in /etc/yum.repos.d/zfs.repo and disable the default zfs repository. By default the zfs repository tracks the stable tag and zfs-testing tracks master.
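
In practice that switch is just flipping enabled=1/enabled=0 between the two sections of /etc/yum.repos.d/zfs.repo, or it can be done per transaction without editing the file (a sketch; section names assumed to match the stock zfs.repo shipped by the zfs-release package):

    # pull zfs/spl from zfs-testing for this transaction only
    yum --disablerepo=zfs --enablerepo=zfs-testing upgrade zfs spl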

@xflou
Author

xflou commented Nov 14, 2014

@behlendorf Thanks!! Using the zfs-testing repository will save me lots of time.

@deajan

deajan commented Nov 28, 2014

@xflou Did you test the current master of zfs against NFS performance yet?
I'm having serious NFS write performance trouble here with 0.6.3.

@xflou
Author

xflou commented Dec 14, 2014

Finally found a window to apply the upgrade (today). I have several problems I need serious help with:

First, the sequence of events:

  1. I first tested the same upgrade on an identical non-production system; everything upgraded and seemed to be fine, and the zpool status command came back with all pools online.
  2. I then applied the same upgrade to my production system ("yum upgrade zfs") using the same zfs-testing repo, and it seemed to go fine - no errors during the yum upgrade.

Now, when I check the status of the pools on my production system after the upgrade, I have a bunch of "UNAVAIL" disks with a "DEGRADED" state in almost every pool, and one pool cannot be mounted at all because its raidz2-0 vdev is unavailable. The output of my zpool status command follows after my questions below.

  1. I upgraded zfs, but did it with the zfs file systems "unmounted". Could this have caused this issue?

  2. I attempted to place the disk that was "UNAVAIL" back online, but it returned the message below indicating that I should replace the disk.


    warning: device 'sdao' onlined, but remains in faulted state
    use 'zpool replace' to replace devices that are no longer present


    Does the above message indicate that I need to replace the disk or is there another way to save this?

  3. I also have a faulted disk and a degraded raidz2-0 array; does this disk have to be replaced?

  4. If I am forced to replace the disk with a new one, will it automatically resilver, and can the system be used while that is being done? (A rough sketch of that workflow is included after the pool output below.)

  5. In general, it seems that a lot of disks went bad at once; could this be possible or accurate, given the information above?

  6. Could "downgrading" help in this situation? I would not think so, but have to ask.

  7. zfs version output below:
OLD VERSION
libnvpair1.x86_64 0.6.3-1.1.el6 @zfs
libuutil1.x86_64 0.6.3-1.1.el6 @zfs
libzfs2.x86_64 0.6.3-1.1.el6 @zfs
libzpool2.x86_64 0.6.3-1.1.el6 @zfs
spl.x86_64 0.6.3-1.1.el6 @zfs
spl-dkms.noarch 0.6.3-1.1.el6 @zfs
zfs.x86_64 0.6.3-1.1.el6 @zfs
zfs-dkms.noarch 0.6.3-1.1.el6 @zfs
zfs-release.noarch 1-4.el6 @/zfs-release.el6.noarch
libzfs2-devel.x86_64 0.6.3-1.1.el6 zfs
lustre.x86_64 2.4.2-1dkms.el6 zfs
lustre-debuginfo.x86_64 2.4.2-1dkms.el6 zfs
lustre-dkms.noarch 2.4.2-1dkms.el6 zfs
lustre-source.x86_64 2.4.2-1dkms.el6 zfs
lustre-tests.x86_64 2.4.2-1dkms.el6 zfs
spl-debuginfo.x86_64 0.6.3-1.1.el6 zfs
zfs-debuginfo.x86_64 0.6.3-1.1.el6 zfs
zfs-devel.x86_64 0.6.2-1.el6 zfs
zfs-dracut.x86_64 0.6.3-1.1.el6 zfs
zfs-fuse.x86_64 0.6.9-6.20100709git.el6 epel
zfs-test.x86_64 0.6.3-1.1.el6 zfs

NEW VERSION

zfs-dkms-0.6.3-1.el6.noarch
zfs-release-1-4.el6.noarch
libzfs2-0.6.3-1.el6.x86_64
zfs-0.6.3-159_gc944be5.el6.x86_64

nspluginwrapper-1.4.4-1.el6_3.x86_64
spl-dkms-0.6.3-1.el6.noarch
spl-0.6.3-1.el6.x86_64

libnvpair1.x86_64 0.6.3-1.el6 @zfs
libuutil1.x86_64 0.6.3-1.el6 @zfs
libzfs2.x86_64 0.6.3-1.el6 @zfs
libzpool2.x86_64 0.6.3-1.el6 @zfs
spl.x86_64 0.6.3-1.el6 @zfs
spl-dkms.noarch 0.6.3-1.el6 @zfs
zfs.x86_64 0.6.3-159_gc944be5.el6 @zfs-testing
zfs-dkms.noarch 0.6.3-1.el6 @zfs
zfs-release.noarch 1-4.el6 @/zfs-release.el6.noarch
libnvpair1.x86_64 0.6.3-159_gc944be5.el6 zfs-testing
libuutil1.x86_64 0.6.3-159_gc944be5.el6 zfs-testing
libzfs2.x86_64 0.6.3-159_gc944be5.el6 zfs-testing
libzfs2-devel.x86_64 0.6.3-159_gc944be5.el6 zfs-testing
libzpool2.x86_64 0.6.3-159_gc944be5.el6 zfs-testing
lustre.x86_64 2.4.2-1dkms.el6 zfs-testing
lustre-debuginfo.x86_64 2.4.2-1dkms.el6 zfs-testing
lustre-dkms.noarch 2.4.2-1dkms.el6 zfs-testing
lustre-source.x86_64 2.4.2-1dkms.el6 zfs-testing
lustre-tests.x86_64 2.4.2-1dkms.el6 zfs-testing
spl.x86_64 0.6.3-52_g52479ec.el6 zfs-testing
spl-debuginfo.x86_64 0.6.3-52_g52479ec.el6 zfs-testing
spl-dkms.noarch 0.6.3-52_g52479ec.el6 zfs-testing
zfs-debuginfo.x86_64 0.6.3-159_gc944be5.el6 zfs-testing
zfs-devel.x86_64 0.6.2-287_g2024041.el6 zfs-testing
zfs-dkms.noarch 0.6.3-159_gc944be5.el6 zfs-testing
zfs-dracut.x86_64 0.6.3-159_gc944be5.el6 zfs-testing
zfs-test.x86_64 0.6.3-159_gc944be5.el6 zfs-testing

Below is the output for the three different errors with my production pools.

This particular pool will not mount, since it looks like 3 disks failed (will I need to restore from an alternate backup?):


pool: tools
state: UNAVAIL
status: One or more devices could not be used because the label is missing
or invalid. There are insufficient replicas for the pool to continue
functioning.
action: Destroy and re-create the pool from
a backup source.
see: http://zfsonlinux.org/msg/ZFS-8000-5E
scan: none requested
config:

NAME        STATE     READ WRITE CKSUM
tools       UNAVAIL      0     0     0  insufficient replicas
  raidz2-0  UNAVAIL      0     0     0  insufficient replicas
    sdar    UNAVAIL      0     0     0
    sdav    ONLINE       0     0     0
    sdaz    ONLINE       0     0     0
    sdas    ONLINE       0     0     0
    sdaw    UNAVAIL      0     0     0
    sdba    UNAVAIL      0     0     0
    sdat    UNAVAIL      0     0     0
    sdbb    ONLINE       0     0     0
    sdau    ONLINE       0     0     0
    sdbc    ONLINE       0     0     0

This pool has one disk UNAVAILABLE, and I attempted to place it back online, but it indicated that I must replace the disk. (Do I need to replace the disk in this case, or can I use the same disk?)


pool: pub
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 0 in 2h59m with 0 errors on Sun Sep 21 17:30:30 2014
config:

NAME        STATE     READ WRITE CKSUM
publish     DEGRADED     0     0     0
  raidz2-0  DEGRADED     0     0     0
    sdd     ONLINE       0     0     0
    sdk     ONLINE       0     0     0
    sdl     ONLINE       0     0     0
    sds     ONLINE       0     0     0
    sdt     ONLINE       0     0     0
    sdaa    ONLINE       0     0     0
    sdab    ONLINE       0     0     0
    sdai    ONLINE       0     0     0
    sdaj    ONLINE       0     0     0
    sdao    UNAVAIL      0     0     0

errors: No known data errors


Last pool: same question as before (will I need to replace this disk, or can I reuse it?)


pool: data
state: DEGRADED
status: One or more devices could not be used because the label is missing or
invalid. Sufficient replicas exist for the pool to continue
functioning in a degraded state.
action: Replace the device using 'zpool replace'.
see: http://zfsonlinux.org/msg/ZFS-8000-4J
scan: scrub repaired 0 in 7h49m with 0 errors on Sun Sep 21 22:20:38 2014
config:

NAME        STATE     READ WRITE CKSUM
proj        DEGRADED     0     0     0
  raidz2-0  DEGRADED     0     0     0
    sdg     ONLINE       0     0     0
    sdh     ONLINE       0     0     0
    sdo     ONLINE       0     0     0
    sdp     ONLINE       0     0     0
    sdw     ONLINE       0     0     0
    sdx     ONLINE       0     0     0
    sdae    ONLINE       0     0     0
    sdaf    ONLINE       0     0     0
    sdam    ONLINE       0     0     0
    sdaq    FAULTED      0     0     0  corrupted data

errors: No known data errors
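
(For reference, a minimal sketch of the replace/resilver workflow asked about in questions 2-4. The replacement device name below is hypothetical, and this is not a statement that the disks above actually need replacing.)

    zpool replace pub sdao /dev/sdbz   # swap the UNAVAIL device for a new (hypothetical) disk
    zpool status pub                   # shows resilver progress; the pool stays usable meanwhile
    zpool clear pub                    # clear error counters once the resilver completes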


@xflou
Author

xflou commented Dec 15, 2014

Please disregard. A quick reboot did the trick. Now I need to load my production system and see if NFS handles things better.

@deajan

deajan commented Dec 15, 2014

Please keep us up to date with benchmarks whenever you can.

@dswartz
Contributor

dswartz commented Dec 15, 2014

As far as I can tell, the slow NFS writes with an SSD SLOG seem to have been addressed by the AIO changes. I put a good 200GB over-provisioned and freshly erased SSD on my 3x2 raid10 pool as a SLOG. I added a VHD from that pool to my virtual Win7 guest (vSphere) and re-ran CrystalDiskMark: sequential reads 106 MB/sec, writes 88 MB/sec (over a gigabit link).
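
For anyone wanting to try a similar setup, adding a dedicated SSD log device (SLOG) to an existing pool looks roughly like this; pool and device names are examples only, not the ones used above.

    zpool add tank log /dev/disk/by-id/ata-EXAMPLE_SSD   # attach the SSD as a separate intent log
    zpool status tank                                    # the device shows up under a separate "logs" section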

@deajan

deajan commented Dec 30, 2014

How safe would it be to update to zfs testing in a production environment that has big NFS issues?

@FransUrbo
Contributor

How safe would it be to update to zfs testing in a production environment that has big NFS issues?

Generally 'very [safe]'. Your mileage may vary, but I run latest GIT on my primary storage and all my machines. Usually 'we' recommend running latest...

There HAVE been issues and problems introduced in GIT master/latest, but they are rare (I can only remember one, actually), and steps have been taken to avoid this in the future…

DO NOTE that if you're unlucky, features in the pool can be enabled when importing it with the new version, and some of these [features] don't exist in 0.6.3/tagged. If that happens, you won't be able to import the pool on an older version and will have to stick with latest…

The next tag (0.6.4) is probably a couple of months away; there are still 61 issues left (many of those are finished, they just need to be tested, verified and accepted).

I say 'tagged' because we shouldn't really talk about 'stable'. The latest/GIT master is usually more stable than the tagged release (because of the sheer number of issues/bugs fixed).
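
A quick way to see where a pool stands feature-wise before and after moving to a newer build (a sketch; the exact feature list varies by release):

    zpool upgrade -v                     # features this zfs build knows about
    zpool get all tank | grep feature@   # per-pool feature state: disabled / enabled / active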

@deajan

deajan commented Dec 30, 2014

Thanks a lot for your explanation. I'll dive into ZFS testing first on my home server, then my backup machine, and later on bigger backup machines :)

@behlendorf
Contributor

Since this has been confirmed fixed in master by several people, I'm closing this issue. As mentioned above, for those who need this fix now, it's available from the zfs-testing repository and will be part of the 0.6.4 tag.
