Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix service.check_restart stanza propagation #3718

Merged
merged 5 commits into from
Jan 17, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 5 additions & 1 deletion CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -5,6 +5,10 @@ __BACKWARDS INCOMPATIBILITIES:__
that absolute URLs are not allowed, but it was not enforced. Absolute URLs
in HTTP check paths will now fail to validate. [[GH-3685](https://github.com/hashicorp/nomad/issues/3685)]

IMPROVEMENTS:
* discovery: Allow `check_restart` to be specified in the `service` stanza.
[GH-3718]

BUG FIXES:
* core: Fix search endpoint forwarding for multi-region clusters [[GH-3680](https://github.com/hashicorp/nomad/issues/3680)]
* core: Fix an issue in which batch jobs with queued placements and lost
Expand Down Expand Up @@ -663,7 +667,7 @@ BUG FIXES:
* client: Killing an allocation doesn't cause allocation stats to block
[[GH-1454](https://github.com/hashicorp/nomad/issues/1454)]
* driver/docker: Disable swap on docker driver [[GH-1480](https://github.com/hashicorp/nomad/issues/1480)]
* driver/docker: Fix improper gating on privileged mode [[GH-1506](https://github.com/hashicorp/nomad/issues/1506)]
* driver/docker: Fix improper gating on priviledged mode [[GH-1506](https://github.com/hashicorp/nomad/issues/1506)]
* driver/docker: Default network type is "nat" on Windows [[GH-1521](https://github.com/hashicorp/nomad/issues/1521)]
* driver/docker: Cleanup created volume when destroying container [[GH-1519](https://github.com/hashicorp/nomad/issues/1519)]
* driver/rkt: Set host environment variables [[GH-1581](https://github.com/hashicorp/nomad/issues/1581)]
Expand Down
14 changes: 6 additions & 8 deletions api/tasks.go
Original file line number Diff line number Diff line change
Expand Up @@ -128,15 +128,15 @@ func (c *CheckRestart) Merge(o *CheckRestart) *CheckRestart {
return nc
}

if nc.Limit == 0 {
if o.Limit > 0 {
nc.Limit = o.Limit
}

if nc.Grace == nil {
if o.Grace != nil {
nc.Grace = o.Grace
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This merge seems backwards. Usually we prefer the passed in object's values. https://github.com/hashicorp/nomad/blob/master/api/jobs.go#L389

}

if nc.IgnoreWarnings {
if o.IgnoreWarnings {
nc.IgnoreWarnings = o.IgnoreWarnings
}

Expand Down Expand Up @@ -185,13 +185,11 @@ func (s *Service) Canonicalize(t *Task, tg *TaskGroup, job *Job) {
s.AddressMode = "auto"
}

s.CheckRestart.Canonicalize()

// Canonicallize CheckRestart on Checks and merge Service.CheckRestart
// into each check.
for _, c := range s.Checks {
c.CheckRestart.Canonicalize()
c.CheckRestart = c.CheckRestart.Merge(s.CheckRestart)
for i, check := range s.Checks {
s.Checks[i].CheckRestart = s.CheckRestart.Merge(check.CheckRestart)
s.Checks[i].CheckRestart.Canonicalize()
}
}

Expand Down
49 changes: 49 additions & 0 deletions api/tasks_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@ package api
import (
"reflect"
"testing"
"time"

"github.com/hashicorp/nomad/helper"
"github.com/stretchr/testify/assert"
Expand Down Expand Up @@ -266,3 +267,51 @@ func TestTaskGroup_Canonicalize_Update(t *testing.T) {
tg.Canonicalize(job)
assert.Nil(t, tg.Update)
}

// TestService_CheckRestart asserts Service.CheckRestart settings are properly
// inherited by Checks.
func TestService_CheckRestart(t *testing.T) {
job := &Job{Name: helper.StringToPtr("job")}
tg := &TaskGroup{Name: helper.StringToPtr("group")}
task := &Task{Name: "task"}
service := &Service{
CheckRestart: &CheckRestart{
Limit: 11,
Grace: helper.TimeToPtr(11 * time.Second),
IgnoreWarnings: true,
},
Checks: []ServiceCheck{
{
Name: "all-set",
CheckRestart: &CheckRestart{
Limit: 22,
Grace: helper.TimeToPtr(22 * time.Second),
IgnoreWarnings: true,
},
},
{
Name: "some-set",
CheckRestart: &CheckRestart{
Limit: 33,
Grace: helper.TimeToPtr(33 * time.Second),
},
},
{
Name: "unset",
},
},
}

service.Canonicalize(task, tg, job)
assert.Equal(t, service.Checks[0].CheckRestart.Limit, 22)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use "require" package

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case I want it to continue checking in case it's just one field off.

assert.Equal(t, *service.Checks[0].CheckRestart.Grace, 22*time.Second)
assert.True(t, service.Checks[0].CheckRestart.IgnoreWarnings)

assert.Equal(t, service.Checks[1].CheckRestart.Limit, 33)
assert.Equal(t, *service.Checks[1].CheckRestart.Grace, 33*time.Second)
assert.True(t, service.Checks[1].CheckRestart.IgnoreWarnings)

assert.Equal(t, service.Checks[2].CheckRestart.Limit, 11)
assert.Equal(t, *service.Checks[2].CheckRestart.Grace, 11*time.Second)
assert.True(t, service.Checks[2].CheckRestart.IgnoreWarnings)
}
26 changes: 24 additions & 2 deletions command/agent/job_endpoint_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -1212,6 +1212,10 @@ func TestJobs_ApiJobToStructsJob(t *testing.T) {
Name: "serviceA",
Tags: []string{"1", "2"},
PortLabel: "foo",
CheckRestart: &api.CheckRestart{
Limit: 4,
Grace: helper.TimeToPtr(11 * time.Second),
},
Checks: []api.ServiceCheck{
{
Id: "hello",
Expand All @@ -1228,10 +1232,17 @@ func TestJobs_ApiJobToStructsJob(t *testing.T) {
InitialStatus: "ok",
CheckRestart: &api.CheckRestart{
Limit: 3,
Grace: helper.TimeToPtr(10 * time.Second),
IgnoreWarnings: true,
},
},
{
Id: "check2id",
Name: "check2",
Type: "tcp",
PortLabel: "foo",
Interval: 4 * time.Second,
Timeout: 2 * time.Second,
},
},
},
},
Expand Down Expand Up @@ -1425,10 +1436,21 @@ func TestJobs_ApiJobToStructsJob(t *testing.T) {
InitialStatus: "ok",
CheckRestart: &structs.CheckRestart{
Limit: 3,
Grace: 10 * time.Second,
Grace: 11 * time.Second,
IgnoreWarnings: true,
},
},
{
Name: "check2",
Type: "tcp",
PortLabel: "foo",
Interval: 4 * time.Second,
Timeout: 2 * time.Second,
CheckRestart: &structs.CheckRestart{
Limit: 4,
Grace: 11 * time.Second,
},
},
},
},
},
Expand Down
1 change: 1 addition & 0 deletions jobspec/parse.go
Original file line number Diff line number Diff line change
Expand Up @@ -912,6 +912,7 @@ func parseServices(jobName string, taskGroupName string, task *api.Task, service
"port",
"check",
"address_mode",
"check_restart",
}
if err := helper.CheckHCLKeys(o.Val, valid); err != nil {
return multierror.Prefix(err, fmt.Sprintf("service (%d) ->", idx))
Expand Down
36 changes: 36 additions & 0 deletions jobspec/parse_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -631,6 +631,42 @@ func TestParse(t *testing.T) {
},
false,
},
{
"service-check-restart.hcl",
&api.Job{
ID: helper.StringToPtr("service_check_restart"),
Name: helper.StringToPtr("service_check_restart"),
Type: helper.StringToPtr("service"),
TaskGroups: []*api.TaskGroup{
{
Name: helper.StringToPtr("group"),
Tasks: []*api.Task{
{
Name: "task",
Services: []*api.Service{
{
Name: "http-service",
CheckRestart: &api.CheckRestart{
Limit: 3,
Grace: helper.TimeToPtr(10 * time.Second),
IgnoreWarnings: true,
},
Checks: []api.ServiceCheck{
{
Name: "random-check",
Type: "tcp",
PortLabel: "9001",
},
},
},
},
},
},
},
},
},
false,
},
}

for _, tc := range cases {
Expand Down
21 changes: 21 additions & 0 deletions jobspec/test-fixtures/service-check-restart.hcl
Original file line number Diff line number Diff line change
@@ -0,0 +1,21 @@
job "service_check_restart" {
type = "service"
group "group" {
task "task" {
service {
name = "http-service"
check_restart {
limit = 3
grace = "10s"
ignore_warnings = true
}
check {
name = "random-check"
type = "tcp"
port = "9001"
}
}
}
}
}

12 changes: 11 additions & 1 deletion website/source/docs/job-specification/check_restart.html.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,12 @@ description: |-
# `check_restart` Stanza

<table class="table table-bordered table-striped">
<tr>
<th width="120">Placement</th>
<td>
<code>job -> group -> task -> service -> **check_restart**</code>
</td>
</tr>
<tr>
<th width="120">Placement</th>
<td>
Expand All @@ -22,7 +28,10 @@ As of Nomad 0.7 the `check_restart` stanza instructs Nomad when to restart
tasks with unhealthy service checks. When a health check in Consul has been
unhealthy for the `limit` specified in a `check_restart` stanza, it is
restarted according to the task group's [`restart` policy][restart_stanza]. The
`check_restart` settings apply to [`check`s][check_stanza].
`check_restart` settings apply to [`check`s][check_stanza], but may also be
placed on [`service`s][service_stanza] to apply to all checks on a service.
If `check_restart` is set on both the check and service, the stanzas are
merged with the check values taking precedence.

```hcl
job "mysql" {
Expand Down Expand Up @@ -140,3 +149,4 @@ details.

[check_stanza]: /docs/job-specification/service.html#check-parameters "check stanza"
[restart_stanza]: /docs/job-specification/restart.html "restart stanza"
[service_stanza]: /docs/job-specification/service.html "service stanza"