metrics: prevent negative counter from iowait decrease #18835

Merged 1 commit on Oct 24, 2023

Commits on Oct 23, 2023

  1. metrics: prevent negative counter from iowait decrease

    The iowait metric obtained from `/proc/stat` can under some circumstances
    decrease. The relevant condition is when an interrupt arrives on a different
    core than the one that gets woken up for the IO, and a particular counter in the
    kernel for that core gets interrupted. This is documented in the man page for
    the `proc(5)` pseudo-filesystem, and considered an unfortunate behavior that
    can't be changed for the sake of ABI compatibility.
    
    In Nomad, we get the current "busy" time (everything except for idle) and
    compare it to the previous busy time to get the counter increment. If the
    iowait counter decreases and the idle counter increases by more than the
    total busy time increases, we can get a negative total. This previously caused
    a panic in our metrics collection (see #15861), but that is now prevented by
    reporting an error message instead.
    
    Fix the bug by putting a zero floor on the values we return from the host CPU
    stats calculator.
    
    Fixes: #15861
    Fixes: #18804
    tgross committed Oct 23, 2023
    Commit 6df1a3c