4.8/wb buf throttle #376

Open
wants to merge 8 commits into
base: master
from

Commits on Oct 4, 2016

  1. block: add WRITE_BG

    This adds a new request flag, REQ_BG, that callers can use to tell
    the block layer that this is background (non-urgent) IO.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    axboe committed Oct 4, 2016
    b2a5e10
  2. writeback: add wbc_to_write_flags()

    Add wbc_to_write_flags(), which returns the write modifier flags to use,
    based on a struct writeback_control. No functional changes in this
    patch, but it prepares us for factoring other wbc fields for write type.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    Reviewed-by: Jan Kara <jack@suse.cz>
    axboe committed Oct 4, 2016
    b74d363
  3. writeback: use WRITE_BG for kupdate and background writeback

    If we're doing background type writes, then use the appropriate
    write command for that.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    axboe committed Oct 4, 2016
    0e4e6dc
  4. writeback: track if we're sleeping on progress in balance_dirty_pages()

    Note in the bdi_writeback structure whenever a task ends up sleeping
    waiting for progress. We can use that information in the lower layers
    to increase the priority of writes.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    axboe committed Oct 4, 2016
    45137dd
  5. block: add code to track actual device queue depth

    For blk-mq, ->nr_requests does track queue depth, at least at init
    time. But for the older queue paths, it's simply a soft setting.
    On top of that, it's generally larger than the hardware setting
    on purpose, to allow backup of requests for merging.
    
    Fill a hole in struct request_queue with a 'queue_depth' member, that
    drivers can set to more closely inform the block layer of the
    real queue depth.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    axboe committed Oct 4, 2016
    2ea3fb3
  6. block: add scalable completion tracking of requests

    For legacy block, we simply track them in the request queue. For
    blk-mq, we track them on a per-sw queue basis, which we can then
    sum up through the hardware queues and finally to a per device
    state.
    
    The stats are tracked in, roughly, 0.1s interval windows.
    
    Add sysfs files to display the stats.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    axboe committed Oct 4, 2016
    2e768f7
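
For readers skimming the series, here is a minimal userspace sketch of the idea behind wbc_to_write_flags() and the WRITE_BG usage described above. It is not the patch's code: the struct, the enum values, and the sketch_ prefixed names are stand-ins invented for illustration, and the sync_mode branch is an assumption about how a sync writeback would be mapped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in flag values for illustration only; the kernel's values differ. */
enum {
	SKETCH_WRITE_SYNC = 1 << 0,
	SKETCH_WRITE_BG   = 1 << 1,
};

enum sketch_sync_mode { SKETCH_SYNC_NONE, SKETCH_SYNC_ALL };

/* Simplified stand-in for the fields of struct writeback_control used here. */
struct sketch_wbc {
	enum sketch_sync_mode sync_mode;
	bool for_kupdate;
	bool for_background;
};

/*
 * Map a writeback request to write modifier flags:
 * data-integrity writeback is sync (assumption), kupdate and
 * background writeback are marked as background IO, and
 * everything else gets no modifier.
 */
static inline int sketch_wbc_to_write_flags(const struct sketch_wbc *wbc)
{
	assert(wbc != NULL);

	if (wbc->sync_mode == SKETCH_SYNC_ALL)
		return SKETCH_WRITE_SYNC;
	if (wbc->for_kupdate || wbc->for_background)
		return SKETCH_WRITE_BG;
	return 0;
}
```

Keeping the decision in one helper means later patches can factor in more writeback_control fields without touching every caller.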

Commits on Oct 5, 2016

  1. wbt: add general throttling mechanism

    We can hook this up to the block layer, to help throttle buffered
    writes. Or NFS can tap into it, to accomplish the same.
    
    wbt registers a few trace points that can be used to track what is
    happening in the system:
    
    wbt_lat: 259:0: latency 2446318
    wbt_stat: 259:0: rmean=2446318, rmin=2446318, rmax=2446318, rsamples=1,
                   wmean=518866, wmin=15522, wmax=5330353, wsamples=57
    wbt_step: 259:0: step down: step=1, window=72727272, background=8, normal=16, max=32
    
    This shows a sync issue event (wbt_lat) that exceeded its time. wbt_stat
    dumps the current read/write stats for that window, and wbt_step shows a
    step down event where we now scale back writes. Each trace includes the
    device, 259:0 in this case.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    axboe committed Oct 5, 2016
    5a4812e
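
The wbt_stat fields (mean, min, max, and sample count per direction) suggest a small per-window accumulator that can be kept per queue and summed up to a per-device view. The following is a rough userspace sketch of that idea only. The struct layout and all sketch_ names are hypothetical, and the latency unit is whatever the caller feeds in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-window latency accumulator; not the patch's struct. */
struct sketch_stat {
	uint64_t min;
	uint64_t max;
	uint64_t sum;
	uint64_t nr_samples;
};

/* Reset at the start of each monitoring window. */
static inline void sketch_stat_init(struct sketch_stat *s)
{
	assert(s != NULL);
	s->min = UINT64_MAX;
	s->max = 0;
	s->sum = 0;
	s->nr_samples = 0;
}

/* Record one completed request's latency. */
static inline void sketch_stat_add(struct sketch_stat *s, uint64_t lat)
{
	if (lat < s->min)
		s->min = lat;
	if (lat > s->max)
		s->max = lat;
	s->sum += lat;
	s->nr_samples++;
}

/* Fold one queue's window stats into a wider (e.g. per-device) total. */
static inline void sketch_stat_merge(struct sketch_stat *dst,
				     const struct sketch_stat *src)
{
	if (src->nr_samples == 0)
		return;
	if (src->min < dst->min)
		dst->min = src->min;
	if (src->max > dst->max)
		dst->max = src->max;
	dst->sum += src->sum;
	dst->nr_samples += src->nr_samples;
}

/* Mean latency of the window, or 0 if nothing completed. */
static inline uint64_t sketch_stat_mean(const struct sketch_stat *s)
{
	return s->nr_samples ? s->sum / s->nr_samples : 0;
}
```

Feeding the single read sample from the trace (2446318) into a fresh accumulator gives mean, min, and max all equal to 2446318 with one sample, matching the wbt_stat line. Storing the sum rather than a running mean keeps the merge step exact.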

Commits on Oct 13, 2016

  1. writeback: throttle buffered writeback

    Test patch that throttles buffered writeback to make it a lot
    smoother, with far less impact on other system activity.
    Background writeback should be, by definition, background
    activity. The fact that we flush huge bundles of it at a time
    means that it potentially has a heavy impact on foreground workloads,
    which isn't ideal. We can't easily limit the sizes of writes that
    we do, since that would impact file system layout in the presence
    of delayed allocation. So just throttle back buffered writeback,
    unless someone is waiting for it.
    
    The algorithm for when to throttle takes its inspiration from the
    CoDel networking scheduling algorithm. Like CoDel, blk-wb monitors
    the minimum latencies of requests over a window of time. In that
    window of time, if the minimum latency of any request exceeds a
    given target, then a scale count is incremented and the queue depth
    is shrunk. The next monitoring window is shrunk accordingly. Unlike
    CoDel, if we hit a window that exhibits good behavior, then we
    simply decrement the scale count and re-calculate the limits for that
    scale value. This prevents us from oscillating between a
    close-to-ideal value and max all the time, instead remaining in the
    windows where we get good behavior.
    
    Unlike CoDel, blk-wb allows the scale count to go negative. This
    happens if we primarily have writes going on. Unlike positive
    scale counts, this doesn't change the size of the monitoring window.
    When the heavy writers finish, blk-wb quickly snaps back to its
    stable state of a zero scale count.
    
    The patch registers two sysfs entries. The first one, 'wb_window_usec',
    defines the window of monitoring. The second one, 'wb_lat_usec',
    sets the latency target for the window. It defaults to 2 msec for
    non-rotational storage, and 75 msec for rotational storage. Setting
    this value to '0' disables blk-wb. Generally, a user would not have
    to touch these settings.
    
    We don't enable WBT on devices that are managed with CFQ, and have
    a non-root block cgroup attached. If we have a proportional share setup
    on this particular disk, then the wbt throttling will interfere with
    that. We don't have a strong need for wbt for that case, since we will
    rely on CFQ doing that for us.
    
    Signed-off-by: Jens Axboe <axboe@fb.com>
    axboe committed Oct 13, 2016
    76aa120
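
To make the scale count description more concrete, here is a userspace sketch of one possible reading of it. Everything in it is an assumption rather than the patch's code: the sketch_ names, the four window outcomes, halving or doubling the depth per step, the depth cap for negative steps, deriving the normal and background limits as roughly half and a quarter of max, and shrinking the window by 1/sqrt(step + 1) in the spirit of CoDel. The arithmetic was picked so that a base depth of 64 and a 100 msec base window reproduce the numbers in the wbt_step trace from the wbt commit (step=1, window=72727272, background=8, normal=16, max=32).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names are hypothetical; this models the description, not the patch. */
enum sketch_outcome {
	SKETCH_LAT_EXCEEDED,	/* minimum latency in the window was above target */
	SKETCH_LAT_OK,		/* window met the latency target */
	SKETCH_WRITES_ONLY,	/* primarily writes going on */
	SKETCH_IDLE,		/* no usable samples in the window */
};

struct sketch_wb {
	int scale_step;		/* >0 throttled, 0 stable, <0 boosted */
	unsigned int base_depth;	/* assumed depth at step 0 */
	unsigned int depth_cap;	/* assumed ceiling for negative steps */
	uint64_t base_win_ns;	/* assumed window at step <= 0 */
	unsigned int wb_max, wb_normal, wb_background;
	uint64_t cur_win_ns;
	bool at_cap;
};

/* Integer square root; fine for the small inputs used here. */
static inline uint64_t sketch_isqrt(uint64_t x)
{
	uint64_t r = 0;

	while ((r + 1) * (r + 1) <= x)
		r++;
	return r;
}

/* Recompute depth limits and window size from the current scale step. */
static inline void sketch_calc_limits(struct sketch_wb *wb)
{
	uint64_t depth = wb->base_depth;
	int step = wb->scale_step;

	assert(wb->base_depth > 0 && wb->depth_cap >= wb->base_depth);
	wb->at_cap = false;
	wb->cur_win_ns = wb->base_win_ns;

	if (step > 0) {
		/* Halve the depth per step and shrink the window. */
		depth = 1 + ((depth - 1) >> (step < 31 ? step : 31));
		wb->cur_win_ns = (wb->base_win_ns << 4) /
				 sketch_isqrt((uint64_t)(step + 1) << 8);
	} else if (step < 0) {
		/* Double the depth per step; the window is left alone. */
		depth = 1 + ((depth - 1) << (-step < 31 ? -step : 31));
		if (depth >= wb->depth_cap) {
			depth = wb->depth_cap;
			wb->at_cap = true;
		}
	}

	wb->wb_max = (unsigned int)depth;
	wb->wb_normal = (wb->wb_max + 1) / 2;
	wb->wb_background = (wb->wb_max + 3) / 4;
}

static inline void sketch_wb_init(struct sketch_wb *wb, unsigned int base_depth,
				  unsigned int depth_cap, uint64_t base_win_ns)
{
	wb->scale_step = 0;
	wb->base_depth = base_depth;
	wb->depth_cap = depth_cap;
	wb->base_win_ns = base_win_ns;
	sketch_calc_limits(wb);
}

/* Called once per monitoring window with that window's verdict. */
static inline void sketch_window_done(struct sketch_wb *wb,
				      enum sketch_outcome out)
{
	switch (out) {
	case SKETCH_LAT_EXCEEDED:
		if (wb->wb_max > 1)	/* cannot throttle below depth 1 */
			wb->scale_step++;
		break;
	case SKETCH_LAT_OK:
		if (wb->scale_step > 0)	/* relax one step, not back to max */
			wb->scale_step--;
		break;
	case SKETCH_WRITES_ONLY:
		if (!wb->at_cap)	/* allowed to go negative */
			wb->scale_step--;
		break;
	case SKETCH_IDLE:
		if (wb->scale_step > 0)	/* drift back toward step 0 */
			wb->scale_step--;
		else if (wb->scale_step < 0)
			wb->scale_step++;
		break;
	}
	sketch_calc_limits(wb);
}
```

In this model a single bad window takes a 64-deep queue to max=32 with a 72727272 ns window, and a good window afterwards relaxes by one step instead of jumping back to the maximum, which is the anti-oscillation property the description is after.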