Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(metrics): fix race when accessing metric registry #2409

Merged
merged 1 commit into from
Jan 10, 2023

Commits on Dec 30, 2022

  1. fix(metrics): fix race when accessing metric registry

    A race condition was introduced in
    5b04c98 (feat(metrics): track
    consumer-fetch-response-size) when passing the metric registry around to
    get additional metrics. Notably, `handleResponsePromise()` could access
    the registry after the broker has been closed and is tentatively being
    reopened. This triggers a data race because `b.metricRegistry` is being
    set during `Open()` (as it is part of the configuration).
    
    We fix this by reverting the addition of `handleResponsePromise()` as a
    method to `Broker`. Instead, we provide it with the metric registry as
    an argument. An alternative would have been to get the metric registry
    before the `select` call. However, removing it as a method make it
    clearer than this function is not allowed to access the broker internals
    as they are not protected by the lock and the broker may not be alive
    any more.
    
    All the following calls to `b.metricRegistry` are done while the lock is
    held:
    
    - inside `Open()`, the lock is held, including inside the goroutine
    - inside `Close()`, the lock is held
    - `AsyncProduce()` has a contract that it must be called while the broker
      is open, we keep a copy of the metric registry to use inside the callback
    - `sendInternal()`, has a contract that the lock should be held
    - `authenticateViaSASLv1()` is called from `Open()` and
      `sendWithPromise()`, both of them holding the lock
    - `sendAndReceiveSASLHandshake()` is called from
    - `authenticateViaSASLv0/v1()`, which are called from `Open()` and
      `sendWithPromise()`
    
    I am unsure about `responseReceiver()`, however, it is also calling
    `b.readFull()` which accesses `b.conn`, so I suppose it is safe.
    
    This leaves `sendAndReceive()` which is calling `send()`, which is
    calling `sendWithPromise()` which puts a lock. We move the lock to
    `sendAndReceive()` instead. `send()` is only called from
    `sendAndReceiver()` and we put a lock for `sendWithPromise()` other
    caller.
    
    The test has been stolen from IBM#2393 from @samuelhewitt. IBM#2393 is an
    alternative proposal using a RW lock to protect `b.metricRegistry`.
    
    Fix IBM#2320
    vincentbernat committed Dec 30, 2022
    Configuration menu
    Copy the full SHA
    f5433ad View commit details
    Browse the repository at this point in the history