zfs send stuck on 0.6.5.4-1~vivid (kernel 3.19.0-43-generic) #4229

Closed

Marlinc opened this issue Jan 16, 2016 · 10 comments

Labels: Status: Inactive · Status: Stale · Type: Defect

Comments

Marlinc commented Jan 16, 2016

I just ran into a stuck zfs send process on ZoL 0.6.5.4-1 with Linux kernel 3.19.0-43.

The output also included the following error: cannot receive incremental stream: invalid backup stream

Command

zfs send -R pool@snapshot -v | ssh root@remote-host zfs receive backup/pools -v -F

Traces

PID 23010

[<ffffffffc04ccc25>] taskq_wait_id+0x55/0x90 [spl]
[<ffffffffc07e5ef4>] spa_taskq_dispatch_sync+0x84/0xb0 [zfs]
[<ffffffffc07aa212>] dump_bytes+0x42/0x50 [zfs]
[<ffffffffc07acc5b>] backup_cb+0x6ab/0x880 [zfs]
[<ffffffffc07aee4b>] traverse_visitbp+0x44b/0x9b0 [zfs]
[<ffffffffc07aefa6>] traverse_visitbp+0x5a6/0x9b0 [zfs]
[<ffffffffc07aefa6>] traverse_visitbp+0x5a6/0x9b0 [zfs]
[<ffffffffc07aefa6>] traverse_visitbp+0x5a6/0x9b0 [zfs]
[<ffffffffc07aefa6>] traverse_visitbp+0x5a6/0x9b0 [zfs]
[<ffffffffc07aefa6>] traverse_visitbp+0x5a6/0x9b0 [zfs]
[<ffffffffc07aefa6>] traverse_visitbp+0x5a6/0x9b0 [zfs]
[<ffffffffc07afba3>] traverse_dnode+0x73/0xf0 [zfs]
[<ffffffffc07af07d>] traverse_visitbp+0x67d/0x9b0 [zfs]
[<ffffffffc07af532>] traverse_impl+0x182/0x3f0 [zfs]
[<ffffffffc07af7f2>] traverse_dataset+0x52/0x60 [zfs]
[<ffffffffc07aac2b>] dmu_send_impl+0x3fb/0x560 [zfs]
[<ffffffffc07ad107>] dmu_send_obj+0x157/0x1d0 [zfs]
[<ffffffffc081b4db>] zfs_ioc_send+0xcb/0x2c0 [zfs]
[<ffffffffc081f495>] zfsdev_ioctl+0x455/0x4e0 [zfs]
[<ffffffff81209990>] do_vfs_ioctl+0x2e0/0x4e0
[<ffffffff81209c11>] SyS_ioctl+0x81/0xa0
[<ffffffff817cca4d>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff

PID 2202

[<ffffffff816aa366>] sock_alloc_send_pskb+0x136/0x260
[<ffffffff817637b9>] unix_stream_sendmsg+0x2a9/0x450
[<ffffffff816a472b>] sock_aio_write+0x11b/0x140
[<ffffffff811f475a>] do_sync_write+0x5a/0x90
[<ffffffff811f51d5>] vfs_write+0x175/0x1f0
[<ffffffffc04cdf18>] vn_rdwr+0x68/0xe0 [spl]
[<ffffffffc07aae1b>] dump_bytes_strategy+0x8b/0xf0 [zfs]
[<ffffffffc04cd045>] taskq_thread+0x215/0x440 [spl]
[<ffffffff810959b9>] kthread+0xc9/0xe0
[<ffffffff817cc998>] ret_from_fork+0x58/0x90
[<ffffffffffffffff>] 0xffffffffffffffff

Open files

PID 23010

zfs       23010             root  cwd       DIR                8,2      4096     524289 /root
zfs       23010             root  rtd       DIR                8,2      4096          2 /
zfs       23010             root  txt       REG                8,2    103320    5243003 /sbin/zfs
zfs       23010             root  mem       REG                8,2     89616     262566 /lib/x86_64-linux-gnu/libgcc_s.so.1
zfs       23010             root  mem       REG                8,2   3168656    5510224 /usr/lib/locale/locale-archive
zfs       23010             root  mem       REG                8,2   1084840     262694 /lib/x86_64-linux-gnu/libm-2.21.so
zfs       23010             root  mem       REG                8,2    108920     262766 /lib/x86_64-linux-gnu/libz.so.1.2.8
zfs       23010             root  mem       REG                8,2     19000     266849 /lib/x86_64-linux-gnu/libuuid.so.1.3.0
zfs       23010             root  mem       REG                8,2     31680     262740 /lib/x86_64-linux-gnu/librt-2.21.so
zfs       23010             root  mem       REG                8,2   1869392     262653 /lib/x86_64-linux-gnu/libc-2.21.so
zfs       23010             root  mem       REG                8,2    142080     262734 /lib/x86_64-linux-gnu/libpthread-2.21.so
zfs       23010             root  mem       REG                8,2     14496     262313 /lib/libzfs_core.so.1.0.0
zfs       23010             root  mem       REG                8,2    271768     262312 /lib/libzfs.so.2.0.0
zfs       23010             root  mem       REG                8,2   1277992     262309 /lib/libzpool.so.2.0.0
zfs       23010             root  mem       REG                8,2     73736     262266 /lib/libuutil.so.1.0.1
zfs       23010             root  mem       REG                8,2     84984     262306 /lib/libnvpair.so.1.0.1
zfs       23010             root  mem       REG                8,2    154376     262629 /lib/x86_64-linux-gnu/ld-2.21.so
zfs       23010             root    0u      CHR              136,5       0t0          8 /dev/pts/5
zfs       23010             root    1w     FIFO                0,9       0t0   92201708 pipe
zfs       23010             root    2u      CHR              136,5       0t0          8 /dev/pts/5
zfs       23010             root    3u      CHR              10,56       0t0        580 /dev/zfs
zfs       23010             root    4r      REG                0,4         0   92203461 /proc/23010/mounts
zfs       23010             root    5r      REG                8,2         0     137112 /etc/dfs/sharetab
zfs       23010             root    6u      CHR              10,56       0t0        580 /dev/zfs
zfs       23010             root    7u     unix 0xffff880409b37700       0t0   92203462 socket
zfs       23010             root    8u     unix 0xffff880409b32300       0t0   92203463 socket
zfs       23010             root    9u      CHR              10,56       0t0        580 /dev/zfs

@kernelOfTruth (Contributor) commented:

referencing:

#485 "invalid backup stream" error on second incremental stream sent in with -RI
#2210 zfs receive fails receiving a deduplicated stream "invalid backup stream"
#3066 send -D -R (-I) regressions
#3292 Failure during restoring via zfs recv
#1788 zfs recv fails if received into a clone
#1059 zfs receive allows snapshot destruction during receive and fails

#386 zfs send/recv - 'invalid backup stream' on multiple receives
#432 zfs receive - invalid stream format on receive with dedup processing enabled on send
#391 Incremental zfs send to file always produces fixed file size and an invalid stream
...

pointers so far:

@Marlinc are you using deduplication or auto-snapshots?

Is the receiving side a clone?
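
For reference, a minimal way to check those points from the shell; the dataset names are taken from the command in the original report, so adjust them to the actual pools:

# stream deduplication would be the -D flag on zfs send, which the reported
# command does not use; the property below is on-disk dedup on the sending side
zfs get dedup pool

# on the receiving host: an origin of "-" means backup/pools is not a clone
zfs get origin backup/pools

# auto-snapshot tools usually show up in the snapshot names
zfs list -t snapshot -o name | head -n 20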

@RichardSharpe (Contributor) commented:

Has this been resolved?

ttyS4 commented Mar 13, 2016

No, I can reproduce the issue with the latest zfs build.
I have daily snapshots created by zfSnap and I use zfs send for incremental backups, e.g.:
zfs send -Rv -I @2016-03-12_07.45.32--10d mypool/ROOT/debian-1@2016-03-13_07.52.38--10d

#4169 may be related?

@RichardSharpe (Contributor) commented:

Are you doing send/recv on the same node?

I am just trying to understand the circumstances under which this arises.

ttyS4 commented Mar 13, 2016

I pipe through netcat, but have also tried dumping to an NFS share or a local disk.
In the usual setup the other end is a Banana Pi running FreeBSD with zfs recv.
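
For context, such a pipeline typically looks roughly like the following; the host name, port, and receiving dataset are placeholders, not the exact invocation:

# on the receiving FreeBSD side (the Banana Pi), listen and receive:
nc -l 8023 | zfs recv -Fv backuppool/debian-1

# on the sending Linux side, stream the incremental snapshots over the network:
zfs send -Rv -I @2016-03-12_07.45.32--10d mypool/ROOT/debian-1@2016-03-13_07.52.38--10d | nc bananapi 8023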

@RichardSharpe (Contributor) commented:

Hmmm, looking at #4169, it seems to relate to send/recv on the same node.

ttyS4 commented Mar 13, 2016

@RichardSharpe if there is anything I can do to debug this, just let me know.

@dweeezil (Contributor) commented:

I wonder if b58986e may have inadvertently caused this. I've not investigated this problem at all but I suppose there's a chance a taskq/thread is needed under Linux. For a taskq, it would also seem that TQ_NOQUEUE along with dweeezil/spl@a3cce0a (or equivalent) might be needed.

Whoever is seeing this problem may want to try modifying dump_bytes() to always use a taskq, and then disabling dynamic taskqs if the problem persists.
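
For anyone wanting to try the second part of that suggestion, a minimal sketch of disabling dynamic taskqs, assuming the spl_taskq_thread_dynamic module parameter exposed by 0.6.5-era SPL (the dump_bytes() change itself requires editing module/zfs/dmu_send.c and rebuilding the zfs module):

# try flipping the parameter at runtime; it only affects taskqs created
# afterwards, so a module reload or reboot is the more reliable test, and if
# the sysfs node is read-only on this build, use the modprobe.d route below
echo 0 > /sys/module/spl/parameters/spl_taskq_thread_dynamic

# make the setting persistent for the next module load / reboot
echo "options spl spl_taskq_thread_dynamic=0" >> /etc/modprobe.d/spl.conf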

ttyS4 commented Feb 16, 2019

Sorry, I cannot test at the moment as that machine needs a reinstall.

stale bot commented Aug 24, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

stale bot added the Status: Stale label on Aug 24, 2020
stale bot closed this as completed on Nov 25, 2020