Skip to content

Commit

Permalink
loggging: Attempt to recover logmon failures
Browse files Browse the repository at this point in the history
Currently, when logmon fails to reattach, we will retry reattachment to
the same pid until the task restart specification is exhausted.

Because we cannot clear hook state during error conditions, it is not
possible for us to signal to a future restart that it _shouldn't_
attempt to reattach to the plugin.

Here we revert to explicitly detecting reattachment seperately from a
launch of a new logmon, so we can recover from scenarios where a logmon
plugin has failed.

This is a net improvement over the current hard failure situation, as it
means in the most common case (the pid has gone away), we can recover.

Other reattachment failure modes where the plugin may still be running
could potentially cause a duplicate process, or a subsequent failure to launch
a new plugin.

If there was a duplicate process, it could potentially cause duplicate
logging. This is better than a production workload outage.

If there was a subsequent failure to launch a new plugin, it would fail
in the same (retry until restarts are exhausted) as the current failure
mode.
  • Loading branch information
endocrimes committed Apr 18, 2019
1 parent 15c6487 commit 7d24952
Showing 1 changed file with 13 additions and 6 deletions.
19 changes: 13 additions & 6 deletions client/allocrunner/taskrunner/logmon_hook.go
Original file line number Diff line number Diff line change
Expand Up @@ -95,22 +95,29 @@ func reattachConfigFromHookData(data map[string]string) (*plugin.ReattachConfig,
func (h *logmonHook) Prestart(ctx context.Context,
req *interfaces.TaskPrestartRequest, resp *interfaces.TaskPrestartResponse) error {

// Create a logmon client by reattaching or launching a new instance
if h.logmonPluginClient == nil || h.logmonPluginClient.Exited() {
// Attempt to reattach to logmon
if h.logmonPluginClient == nil {
reattachConfig, err := reattachConfigFromHookData(req.PreviousState)
if err != nil {
h.logger.Error("failed to load reattach config", "error", err)
return err
}
if reattachConfig != nil {
if err := h.launchLogMon(reattachConfig); err != nil {
h.logger.Error("failed to reattach to logmon process", "error", err)
}
}

}

// Launch or reattach logmon instance for the task.
if err := h.launchLogMon(reattachConfig); err != nil {
// Retry errors launching logmon as logmon may have crashed and
// We did not reattach to a plugin and one is still not running.
if h.logmonPluginClient == nil || h.logmonPluginClient.Exited() {
if err := h.launchLogMon(nil); err != nil {
// Retry errors launching logmon as logmon may have crashed on start and
// subsequent attempts will start a new one.
h.logger.Error("failed to launch logmon process", "error", err)
return structs.NewRecoverableError(err, true)
}

}

err := h.logmon.Start(&logmon.LogConfig{
Expand Down

0 comments on commit 7d24952

Please sign in to comment.