Skip to content

Commit

Permalink
Timeout waiting for ZVOL device to be created
Browse files Browse the repository at this point in the history
We've seen cases where after creating a ZVOL, the ZVOL device node in
"/dev" isn't generated after 20 seconds of waiting, which is the point
at which our applications gives up on waiting and reports an error.

The workload when this occurs is to "refresh" 400+ ZVOLs roughly at the
same time, based on a policy set by the user. This refresh operation
will destroy the ZVOL, and re-create it based on a snapshot.

When this occurs, we see many hundreds of entries on the "z_zvol" taskq
(based on inspection of the /proc/spl/taskq-all file). Many of the
entries on the taskq end up in the "zvol_remove_minors_impl" function,
and I've measured the latency of that function:

Function = zvol_remove_minors_impl
msecs               : count     distribution
  0 -> 1          : 0        |                                        |
  2 -> 3          : 0        |                                        |
  4 -> 7          : 1        |                                        |
  8 -> 15         : 0        |                                        |
 16 -> 31         : 0        |                                        |
 32 -> 63         : 0        |                                        |
 64 -> 127        : 1        |                                        |
128 -> 255        : 45       |****************************************|
256 -> 511        : 5        |****                                    |

That data is from a 10 second sample, using the BCC "funclatency" tool.
As we can see, in this 10 second sample, most calls took 128ms at a
minimum. Thus, some basic math tells us that in any 20 second interval,
we could only process at most about 150 removals, which is much less
than the 400+ that'll occur based on the workload.

As a result of this, and since all ZVOL minor operations will go through
the single threaded "z_zvol" taskq, the latency for creating a single
ZVOL device can be unreasonably large due to other ZVOL activity on the
system. In our case, it's large enough to cause the application to
generate an error and fail the operation.

When profiling the "zvol_remove_minors_impl" function, I saw that most
of the time in the function was spent off-cpu, blocked in the function
"taskq_wait_outstanding". How this works, is "zvol_remove_minors_impl"
will dispatch calls to "zvol_free" using the "system_taskq", and then
the "taskq_wait_outstanding" function is used to wait for all of those
dispatched calls to occur before "zvol_remove_minors_impl" will return.

As far as I can tell, "zvol_remove_minors_impl" doesn't necessarily have
to wait for all calls to "zvol_free" to occur before it returns. Thus,
this change removes the call to "taskq_wait_oustanding", so that calls
to "zvol_free" don't affect the latency of "zvol_remove_minors_impl".

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: John Gallagher <john.gallagher@delphix.com>
Signed-off-by: Prakash Surya <prakash.surya@delphix.com>
Closes openzfs#9380
  • Loading branch information
Prakash Surya authored and behlendorf committed Oct 1, 2019
1 parent 3283f13 commit 99573cc
Show file tree
Hide file tree
Showing 2 changed files with 14 additions and 6 deletions.
13 changes: 13 additions & 0 deletions module/os/linux/zfs/zvol_os.c
Original file line number Diff line number Diff line change
Expand Up @@ -845,6 +845,11 @@ zvol_alloc(dev_t dev, const char *name)
* At this time, the structure is not opened by anyone, is taken off
* the zvol_state_list, and has its private data set to NULL.
* The zvol_state_lock is dropped.
*
* This function may take many milliseconds to complete (e.g. we've seen
* it take over 256ms), due to the calls to "blk_cleanup_queue" and
* "del_gendisk". Thus, consumers need to be careful to account for this
* latency when calling this function.
*/
static void
zvol_free(zvol_state_t *zv)
Expand Down Expand Up @@ -1089,6 +1094,14 @@ zvol_fini(void)
{
zvol_remove_minors_impl(NULL);

/*
* The call to "zvol_remove_minors_impl" may dispatch entries to
* the system_taskq, but it doesn't wait for those entires to
* complete before it returns. Thus, we must wait for all of the
* removals to finish, before we can continue.
*/
taskq_wait_outstanding(system_taskq, 0);

zvol_fini_impl();
blk_unregister_region(MKDEV(zvol_major, 0), 1UL << MINORBITS);
unregister_blkdev(zvol_major, ZVOL_DRIVER);
Expand Down
7 changes: 1 addition & 6 deletions module/zfs/zvol.c
Original file line number Diff line number Diff line change
Expand Up @@ -1170,7 +1170,7 @@ zvol_remove_minors_impl(const char *name)
{
zvol_state_t *zv, *zv_next;
int namelen = ((name) ? strlen(name) : 0);
taskqid_t t, tid = TASKQID_INVALID;
taskqid_t t;
list_t free_list;

if (zvol_inhibit_dev)
Expand Down Expand Up @@ -1217,8 +1217,6 @@ zvol_remove_minors_impl(const char *name)
(task_func_t *)ops->zv_free, zv, TQ_SLEEP);
if (t == TASKQID_INVALID)
list_insert_head(&free_list, zv);
else
tid = t;
} else {
mutex_exit(&zv->zv_state_lock);
}
Expand All @@ -1230,9 +1228,6 @@ zvol_remove_minors_impl(const char *name)
list_remove(&free_list, zv);
ops->zv_free(zv);
}

if (tid != TASKQID_INVALID)
taskq_wait_outstanding(system_taskq, tid);
}

/* Remove minor for this specific volume only */
Expand Down

0 comments on commit 99573cc

Please sign in to comment.