WORKAROUND for zthr_iscancelled() under spa_raidz_expand_cb() #32
The assert backtrace:
#0 0x00007f9a75cda974 in zthr_iscancelled (t=0x0) at ../../module/zfs/zthr.c:447
447 ASSERT3P(t->zthr_thread, ==, curthread);
[Current thread is 1 (Thread 0x7f99faa29700 (LWP 282061))]
(gdb) bt
#0 0x00007f9a75cda974 in zthr_iscancelled (t=0x0) at ../../module/zfs/zthr.c:447
#1 0x00007f9a75c31ca7 in spa_raidz_expand_cb (arg=0x564206c2f820, zthr=) at ../../module/zfs/vdev_raidz.c:3927
#2 0x00007f9a75cda508 in zthr_procedure (arg=0x5642072aa6f0) at ../../module/zfs/zthr.c:241
#3 0x00007f9a75921609 in start_thread (arg=) at pthread_create.c:477
#4 0x00007f9a75848293 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
On the other hand:
(gdb) frame 1
#1 0x00007f9a75c31ca7 in spa_raidz_expand_cb (arg=0x564206c2f820, zthr=) at ../../module/zfs/vdev_raidz.c:3927
3927 i < raidvd->vdev_ms_count &&
(gdb) list
3922 vdev_t *raidvd = vdev_lookup_top(spa, vre->vre_vdev_id);
3923
3924 uint64_t guid = raidvd->vdev_guid;
3925
3926 for (uint64_t i = vre->vre_offset >> raidvd->vdev_ms_shift;
3927 i < raidvd->vdev_ms_count &&
3928 !zthr_iscancelled(spa->spa_raidz_expand_zthr) &&
3929 vre->vre_failed_offset == UINT64_MAX; i++) {
3930 metaslab_t *msp = raidvd->vdev_ms[i];
3931
(gdb) print spa->spa_raidz_expand_zthr
$1 = (zthr_t *) 0x5642072aa6f0
So the argument that zthr_iscancelled() received on the stack is NULL, although spa->spa_raidz_expand_zthr holds a valid pointer when inspected afterwards.
The only way I can see this happening: spa_raidz_expand_cb() evaluated the argument of the zthr_iscancelled() call while spa->spa_raidz_expand_zthr was still NULL, and the field was set only after that.
The sleep() at the beginning of spa_raidz_expand_cb() avoids this situation.
I am not sure whether this race can be reproduced on the kernel side.
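The suspected race can be modelled in a small userspace sketch. It is not OpenZFS code: worker_t, owner_t, owner_cb() and simulate_startup_race() are hypothetical names, and joining the thread before storing the handle is only a way to force the unlucky interleaving deterministically. The sketch also shows that the handle passed to the callback as a parameter is already valid at that point. In the backtrace, the arg of zthr_procedure() in frame #2 has the same value as spa->spa_raidz_expand_zthr, so the zthr parameter of spa_raidz_expand_cb() may be usable in the same way, but I have not verified that against the real code.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from OpenZFS. */
typedef struct worker worker_t;
typedef void (*worker_func_t)(void *arg, worker_t *self);

struct worker {
	pthread_t	w_thread;
	worker_func_t	w_func;
	void		*w_arg;
};

typedef struct race_result {
	bool	field_was_null;	/* handle read through the owner */
	bool	param_was_null;	/* handle received as a parameter */
} race_result_t;

/* Plays the role of spa_t: it keeps a pointer to its worker. */
typedef struct owner {
	worker_t	*o_worker;	/* like spa->spa_raidz_expand_zthr */
	race_result_t	o_result;
} owner_t;

/* Thread entry point: hands the worker its own handle explicitly. */
static void *
worker_procedure(void *arg)
{
	worker_t *w = arg;

	w->w_func(w->w_arg, w);
	return (NULL);
}

static void
owner_cb(void *arg, worker_t *self)
{
	owner_t *o = arg;

	/* Racy pattern: the creator may not have stored the handle yet. */
	o->o_result.field_was_null = (o->o_worker == NULL);
	/* The parameter is set before the callback is invoked. */
	o->o_result.param_was_null = (self == NULL);
}

race_result_t
simulate_startup_race(void)
{
	owner_t o = { .o_worker = NULL };
	worker_t w = { .w_func = owner_cb, .w_arg = &o };
	int err;

	err = pthread_create(&w.w_thread, NULL, worker_procedure, &w);
	assert(err == 0);

	/*
	 * Model the creator being preempted: the callback runs to
	 * completion before the handle is stored in the owner.
	 */
	err = pthread_join(w.w_thread, NULL);
	assert(err == 0);
	o.o_worker = &w;

	return (o.o_result);
}
```

Calling simulate_startup_race() reports that the callback saw a NULL handle through the owner's field while the handle it received as a parameter was valid. A sleep() at the start of the callback only makes the bad interleaving less likely; it does not rule it out.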