Don't persist allocs of destroyed alloc runners #6207

Merged 2 commits on Aug 26, 2019

Commits on Aug 25, 2019

  1. Don't persist allocs of destroyed alloc runners

    This fixes a bug where allocs that have been GCed are re-run after the client
    is restarted.  A heavily used client may launch thousands of allocs on startup
    and get killed.
    
    The bug is that an alloc runner destroyed by GC remains in the client's
    alloc runner set.  Such runners are persisted periodically until the alloc
    is GCed by the server.  During that time, the client DB contains the alloc
    but neither its individual task statuses nor its completed state.  On
    restart, the client assumes the alloc is in the pending state and re-runs it.
    
    Here, we fix it by ensuring that destroyed alloc runners don't persist any alloc
    to the state DB.
    
    This is a short-term fix, as we should consider revamping client state
    management.  Storing alloc and task information non-transactionally and
    non-atomically, concurrently with a running alloc runner that may be changing
    state, is a recipe for bugs.
    
    Fixes #5984
    Related to #5890
    Mahmood Ali committed Aug 25, 2019
    a80643e
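
The failure mode and the fix described in this commit message can be sketched in a few lines of Go. This is a minimal, single-goroutine model and not the actual Nomad code: `stateDB`, `allocRunner`, `PersistState`, `Destroy`, and `allocsToRestore` are hypothetical names chosen for illustration, and the assumption that `Destroy` also removes the alloc from the DB belongs to the sketch. Concurrency is deliberately left out.

```go
package main

// stateDB is a hypothetical in-memory stand-in for the client state DB.
// It only records which alloc IDs have been persisted.
type stateDB map[string]struct{}

// allocRunner is a heavily simplified, hypothetical alloc runner.
type allocRunner struct {
	allocID   string
	db        stateDB
	destroyed bool
}

// Destroy marks the runner as destroyed (for example after GC) and
// removes its alloc from the DB. The removal is an assumption of this sketch.
func (ar *allocRunner) Destroy() {
	ar.destroyed = true
	delete(ar.db, ar.allocID)
}

// PersistState models the periodic persist call. The guard is the fix:
// without it, a destroyed runner that is still in the client's runner set
// would write its alloc back into the DB.
func (ar *allocRunner) PersistState() {
	if ar.destroyed {
		return
	}
	ar.db[ar.allocID] = struct{}{}
}

// allocsToRestore models a client restart: every alloc found in the DB
// is restored and, lacking task state, would be re-run.
func allocsToRestore(db stateDB) []string {
	ids := make([]string, 0, len(db))
	for id := range db {
		ids = append(ids, id)
	}
	return ids
}
```

In this model, calling `PersistState` after `Destroy` leaves the DB empty, so `allocsToRestore` returns nothing for that alloc and a restarted client has nothing to re-run.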

Commits on Aug 26, 2019

  1. Write to client store while holding lock

    Protect against a race between the destroy and persist-state goroutines.
    
    The downside is that the database I/O operation runs while holding the
    lock and may block indefinitely.  The risk of holding the lock for a long
    time is slow destruction, but if I/O is that slow we have bigger problems.
    Mahmood Ali committed Aug 26, 2019
    ff3dedd
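
The locking approach from this commit can be illustrated with a small Go sketch. It is not the Nomad implementation: `store`, `allocRunner`, `destroyedLock`, and `raceOnce` are hypothetical names, and the store is an in-memory map rather than a real database. The point it tries to show is that the destroyed check and the store write happen under the same lock, so a concurrent destroy cannot slip in between them.

```go
package main

import "sync"

// store is a hypothetical in-memory stand-in for the client store.
type store struct {
	mu     sync.Mutex
	allocs map[string]struct{}
}

func (s *store) put(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.allocs[id] = struct{}{}
}

func (s *store) remove(id string) {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.allocs, id)
}

func (s *store) has(id string) bool {
	s.mu.Lock()
	defer s.mu.Unlock()
	_, ok := s.allocs[id]
	return ok
}

// allocRunner is a simplified, hypothetical alloc runner.
type allocRunner struct {
	allocID string
	store   *store

	destroyedLock sync.Mutex
	destroyed     bool
}

// PersistState checks the destroyed flag and writes to the store while
// holding destroyedLock. The trade-off is that a slow write delays Destroy.
func (ar *allocRunner) PersistState() {
	ar.destroyedLock.Lock()
	defer ar.destroyedLock.Unlock()
	if ar.destroyed {
		return
	}
	ar.store.put(ar.allocID)
}

// Destroy takes the same lock, marks the runner destroyed, and removes
// the alloc from the store (the removal is an assumption of this sketch).
func (ar *allocRunner) Destroy() {
	ar.destroyedLock.Lock()
	defer ar.destroyedLock.Unlock()
	ar.destroyed = true
	ar.store.remove(ar.allocID)
}

// raceOnce runs PersistState and Destroy concurrently and reports
// whether the alloc is still in the store once both have finished.
func raceOnce() bool {
	s := &store{allocs: make(map[string]struct{})}
	ar := &allocRunner{allocID: "a1", store: s}

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { defer wg.Done(); ar.PersistState() }()
	go func() { defer wg.Done(); ar.Destroy() }()
	wg.Wait()

	return s.has(ar.allocID)
}
```

Whichever goroutine wins the lock, the alloc is absent afterwards: if the persist runs first, the destroy removes the entry; if the destroy runs first, the persist sees the flag and skips the write. If the lock were released between the check and the write, a destroy could land in that gap and the alloc would be written back.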