Skip to content
This repository has been archived by the owner on Jan 30, 2020. It is now read-only.

Shutdown fixes/improvements #1452

Closed
wants to merge 10 commits into from

Commits on Feb 24, 2016

  1. server: centralise shutdown through Monitor

    The Server has a global stop channel which is used both internally (by
    Monitor) and externally (by the Stop method) to shut down the server.
    This is bad; invoking the method simultaneously by multiple goroutines
    is not safe (and as seen in coreos#1044 can cause panics due to a
    doubly-closed channel).
    
    This change centralises the shutdown procedure through the Monitor, so
    that when an external user of the Server wants to shut it down, it
    triggers an error to propagate up from the monitor. Hence there is only
    a single path in which the stopchannel (which terminates all other
    Server goroutines) can be called.
    jonboulle authored and antrik committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    3cda598 View commit details
    Browse the repository at this point in the history
  2. fleetd: protect access to global Server instance

    There are three different paths in the main fleetd goroutine that can
    access the global `srv` Server - reconfigurations, shutdowns and
    statedumps. Right now there's nothing preventing racy access to this
    instance, so introduce a mutex to protect it.
    
    One potential issue with this is that it means that a reconfigure or
    state dump can "block" a shutdown, but IMHO if this occurs it will
    expose behaviour that is broken and needs to be fixed anyway.
    jonboulle authored and antrik committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    5c6371e View commit details
    Browse the repository at this point in the history
  3. server: introduce shutdown timeout

    - add all background server components to a WaitGroup
    - when shutting down the server, wait on this group or until a timeout
      (defaulting to one minute) before restarting or exiting.
    - if timeout occurs, shut down hard and let a
    - move Monitor into server package
    - Server.Monitor -> Server.Supervise to remove ambiguity/duplication
    jonboulle authored and antrik committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    0bc3521 View commit details
    Browse the repository at this point in the history
  4. *: change kill/stop channels to struct{}

    Channels that are just used to "broadcast" messages (e.g. they are only
    ever closed) do not need a type; it is better to be more explicit about
    this by using a struct{}.
    
    Similarly, the channels can be receive-only.
    jonboulle authored and antrik committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    60df897 View commit details
    Browse the repository at this point in the history
  5. server: wg.Done -> wg.Wait (doh)

    jonboulle authored and antrik committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    a586f82 View commit details
    Browse the repository at this point in the history
  6. Configuration menu
    Copy the full SHA
    76437a7 View commit details
    Browse the repository at this point in the history
  7. server: Stop -> Kill

    To make things a little clearer for ol' man Crawford, rename the "Stop"
    function to "Kill" to align better with the channel names and be a
    little more explicit that it is invoked in response to a kill signal.
    jonboulle authored and antrik committed Feb 24, 2016
    Configuration menu
    Copy the full SHA
    38bd394 View commit details
    Browse the repository at this point in the history

Commits on Mar 2, 2016

  1. Add testcase for panic on shutdown

    Before the monitor/shutdown handling rework, fleetd would panic with a
    channel double-close if it was shut down while a restart (triggered by
    the monitor on a failed health check) was is progress.
    
    Added a functional test for this situation -- and also for regular
    shutdown as a baseline.
    antrik committed Mar 2, 2016
    Configuration menu
    Copy the full SHA
    e1527e7 View commit details
    Browse the repository at this point in the history
  2. Functional tests: Reduce fleet agent timeout

    The default agent_ttl setting of 30s makes TestMonitorVsShutdown run
    pretty long. (And probably also other tests related to connectivity
    loss, that will be added in the future...)
    
    With no actual network delays involved, making the timeout much shorter
    shouldn't cause any trouble -- so we can simply set it globally, rather
    than trying to override it only for the specific test in question.
    antrik committed Mar 2, 2016
    Configuration menu
    Copy the full SHA
    0efd29a View commit details
    Browse the repository at this point in the history

Commits on Mar 10, 2016

  1. Cleanup: Restore usage of NewMonitor() instead of inlining the code

    The function moved to a different package, but otherwise can be used as
    before -- I presume inlining it was some kind of accident...
    antrik committed Mar 10, 2016
    Configuration menu
    Copy the full SHA
    e9b6108 View commit details
    Browse the repository at this point in the history