connect: canonicalize before adding sidecar #6855

Merged (2 commits, Dec 13, 2019)
3 changes: 2 additions & 1 deletion CHANGELOG.md
@@ -7,6 +7,7 @@ IMPROVEMENTS:
BUG FIXES:

* cli: Fixed a bug where `nomad monitor -node-id` would cause a cli panic when no nodes were found. [[GH-6828](https://github.com/hashicorp/nomad/issues/6828)]
* consul/connect: Fixed a bug where Connect-enabled jobs failed to validate when service names used interpolation. [[GH-6855](https://github.com/hashicorp/nomad/issues/6855)]

## 0.10.2 (December 4, 2019)

@@ -54,10 +55,10 @@ BUG FIXES:
* cli: Fixed a bug where a cli user may fail to query FS/Allocation API endpoints if they lack `node:read` capability [[GH-6423](https://github.com/hashicorp/nomad/issues/6423)]
* client: Return empty values when host stats fail [[GH-6349](https://github.com/hashicorp/nomad/issues/6349)]
* client: Fixed a bug where a client may not restart dead internal processes upon client's restart on Windows [[GH-6426](https://github.com/hashicorp/nomad/issues/6426)]
* consul/connect: Fixed registering multiple Connect-enabled services in the same task group [[GH-6646](https://github.com/hashicorp/nomad/issues/6646)]
* drivers: Fixed a bug where client may panic if a restored task failed to shutdown cleanly [[GH-6763](https://github.com/hashicorp/nomad/issues/6763)]
* driver/exec: Fixed a bug where exec tasks can spawn processes that live beyond task lifecycle [[GH-6722](https://github.com/hashicorp/nomad/issues/6722)]
* driver/docker: Added mechanism for detecting unexpectedly running docker containers [[GH-6325](https://github.com/hashicorp/nomad/issues/6325)]
* nomad: Fixed registering multiple connect enabled services in the same task group [[GH-6646](https://github.com/hashicorp/nomad/issues/6646)]
* scheduler: Changes to devices in resource stanza should cause rescheduling [[GH-6644](https://github.com/hashicorp/nomad/issues/6644)]
* scheduler: Fixed a bug that allowed inplace updates after affinity or spread were changed [[GH-6703](https://github.com/hashicorp/nomad/issues/6703)]
* vault: Allow overriding implicit Vault version constraint [[GH-6687](https://github.com/hashicorp/nomad/issues/6687)]
2 changes: 1 addition & 1 deletion nomad/job_endpoint.go
@@ -58,8 +58,8 @@ func NewJobEndpoints(s *Server) *Job {
srv: s,
logger: s.logger.Named("job"),
mutators: []jobMutator{
jobConnectHook{},
jobCanonicalizer{},
jobConnectHook{},
Member Author:

Note to reviewers: this is one of those fun things that looks super minor but is actually the key to the entire PR.

Previously we were canonicalizing after adding Connect tasks.
This change makes it so we canonicalize before adding Connect tasks.

This solves the interpolation issue linked from the PR, and it is also generally safer: just about everywhere in nomad/ that we touch Job structs, we expect them to be canonicalized. So this should make future work in mutators much more predictable.

The downside is that any mutations must canonicalize themselves. The conservative version of this approach would just canonicalize twice: once at the beginning and once at the end. Since there is so little mutator code today I went with this minimal approach where the connect hook canonicalizes anything it adds.
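The ordering change described above can be sketched with a toy mutator pipeline. This is a simplified illustration, not Nomad's real types or hooks: `Job`, `canonicalize`, and `connectHook` here are hypothetical stand-ins showing why canonicalizing first means variables like `${JOB}` are already resolved when the sidecar task name is derived.

```go
package main

import (
	"fmt"
	"strings"
)

// Job is a pared-down stand-in for structs.Job.
type Job struct {
	Name        string
	ServiceName string
	Tasks       []string
}

// canonicalize interpolates ${JOB}, standing in for the job
// canonicalizer resolving variables before later mutators run.
func canonicalize(j *Job) {
	j.ServiceName = strings.ReplaceAll(j.ServiceName, "${JOB}", j.Name)
}

// connectHook derives a sidecar task name from the service name,
// mirroring the connect-proxy-<service> naming convention.
func connectHook(j *Job) {
	j.Tasks = append(j.Tasks, "connect-proxy-"+j.ServiceName)
}

func main() {
	j := &Job{Name: "my-job", ServiceName: "${JOB}-api"}

	// Canonicalize first, then add Connect tasks: the sidecar name is
	// built from the resolved service name, not the raw template.
	canonicalize(j)
	connectHook(j)
	fmt.Println(j.Tasks[0]) // connect-proxy-my-job-api
}
```

Run the hooks in the opposite order and the sidecar would be named `connect-proxy-${JOB}-api`, which is the failure mode behind the linked issue.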

jobImpliedConstraints{},
},
validators: []jobValidator{
15 changes: 9 additions & 6 deletions nomad/job_endpoint_hook_connect.go
@@ -59,7 +59,7 @@ func (jobConnectHook) Mutate(job *structs.Job) (_ *structs.Job, warnings []error
continue
}

if err := groupConnectHook(g); err != nil {
if err := groupConnectHook(job, g); err != nil {
return nil, nil, err
}
}
@@ -96,15 +96,15 @@ func isSidecarForService(t *structs.Task, svc string) bool {
return t.Kind == structs.TaskKind(fmt.Sprintf("%s:%s", structs.ConnectProxyPrefix, svc))
}

func groupConnectHook(g *structs.TaskGroup) error {
func groupConnectHook(job *structs.Job, g *structs.TaskGroup) error {
for _, service := range g.Services {
if service.Connect.HasSidecar() {
// Check to see if the sidecar task already exists
task := getSidecarTaskForService(g, service.Name)

// If the task doesn't already exist, create a new one and add it to the job
if task == nil {
task = newConnectTask(service)
task = newConnectTask(service.Name)

// If there happens to be a task defined with the same name
// append an UUID fragment to the task name
@@ -121,6 +121,9 @@ func groupConnectHook(g *structs.TaskGroup) error {
service.Connect.SidecarTask.MergeIntoTask(task)
}

// Canonicalize task since this mutator runs after job canonicalization
task.Canonicalize(job, g)

// port to be added for the sidecar task's proxy port
port := structs.Port{
Label: fmt.Sprintf("%s-%s", structs.ConnectProxyPrefix, service.Name),
@@ -147,11 +150,11 @@ func groupConnectHook(g *structs.TaskGroup) error {
return nil
}

func newConnectTask(service *structs.Service) *structs.Task {
func newConnectTask(serviceName string) *structs.Task {
task := &structs.Task{
// Name is used in container name so must start with '[A-Za-z0-9]'
Name: fmt.Sprintf("%s-%s", structs.ConnectProxyPrefix, service.Name),
Kind: structs.TaskKind(fmt.Sprintf("%s:%s", structs.ConnectProxyPrefix, service.Name)),
Name: fmt.Sprintf("%s-%s", structs.ConnectProxyPrefix, serviceName),
Kind: structs.TaskKind(fmt.Sprintf("%s:%s", structs.ConnectProxyPrefix, serviceName)),
Driver: "docker",
Config: connectDriverConfig,
ShutdownDelay: 5 * time.Second,
45 changes: 37 additions & 8 deletions nomad/job_endpoint_hook_connect_test.go
@@ -4,6 +4,8 @@ import (
"fmt"
"testing"

"github.com/hashicorp/nomad/helper/testlog"
"github.com/hashicorp/nomad/nomad/mock"
"github.com/hashicorp/nomad/nomad/structs"
"github.com/stretchr/testify/require"
)
@@ -49,7 +51,8 @@ func Test_isSidecarForService(t *testing.T) {

func Test_groupConnectHook(t *testing.T) {
// Test that connect-proxy task is inserted for backend service
tgIn := &structs.TaskGroup{
job := mock.Job()
job.TaskGroups[0] = &structs.TaskGroup{
Networks: structs.Networks{
{
Mode: "bridge",
@@ -73,11 +76,16 @@ func Test_groupConnectHook(t *testing.T) {
},
}

tgOut := tgIn.Copy()
// Expected tasks
tgOut := job.TaskGroups[0].Copy()
tgOut.Tasks = []*structs.Task{
newConnectTask(tgOut.Services[0]),
newConnectTask(tgOut.Services[1]),
newConnectTask(tgOut.Services[0].Name),
newConnectTask(tgOut.Services[1].Name),
}

// Expect sidecar tasks to be properly canonicalized
tgOut.Tasks[0].Canonicalize(job, tgOut)
tgOut.Tasks[1].Canonicalize(job, tgOut)
tgOut.Networks[0].DynamicPorts = []structs.Port{
{
Label: fmt.Sprintf("%s-%s", structs.ConnectProxyPrefix, "backend"),
@@ -89,10 +97,31 @@ func Test_groupConnectHook(t *testing.T) {
},
}

require.NoError(t, groupConnectHook(tgIn))
require.Exactly(t, tgOut, tgIn)
require.NoError(t, groupConnectHook(job, job.TaskGroups[0]))
require.Exactly(t, tgOut, job.TaskGroups[0])

// Test that hook is idempotent
require.NoError(t, groupConnectHook(tgIn))
require.Exactly(t, tgOut, tgIn)
require.NoError(t, groupConnectHook(job, job.TaskGroups[0]))
require.Exactly(t, tgOut, job.TaskGroups[0])
}

// TestJobEndpoint_ConnectInterpolation asserts that when a Connect sidecar
// proxy task is being created for a group service with an interpolated name,
// the service name is interpolated *before* the task is created.
//
// See https://github.com/hashicorp/nomad/issues/6853
func TestJobEndpoint_ConnectInterpolation(t *testing.T) {
t.Parallel()

server := &Server{logger: testlog.HCLogger(t)}
jobEndpoint := NewJobEndpoints(server)

j := mock.ConnectJob()
j.TaskGroups[0].Services[0].Name = "${JOB}-api"
j, warnings, err := jobEndpoint.admissionMutators(j)
require.NoError(t, err)
require.Nil(t, warnings)

require.Len(t, j.TaskGroups[0].Tasks, 2)
require.Equal(t, "connect-proxy-my-job-api", j.TaskGroups[0].Tasks[1].Name)
}
5 changes: 5 additions & 0 deletions nomad/mock/mock.go
@@ -385,6 +385,11 @@ func MaxParallelJob() *structs.Job {
func ConnectJob() *structs.Job {
job := Job()
tg := job.TaskGroups[0]
tg.Networks = []*structs.NetworkResource{
{
Mode: "bridge",
},
}
tg.Services = []*structs.Service{
{
Name: "testconnect",
19 changes: 17 additions & 2 deletions nomad/structs/structs.go
@@ -5552,6 +5552,8 @@ func (t *Task) Validate(ephemeralDisk *EphemeralDisk, jobType string, tgServices
if t.Leader {
mErr.Errors = append(mErr.Errors, fmt.Errorf("Connect proxy task must not have leader set"))
}

// Ensure the proxy task has a corresponding service entry
serviceErr := ValidateConnectProxyService(t.Kind.Value(), tgServices)
if serviceErr != nil {
mErr.Errors = append(mErr.Errors, serviceErr)
@@ -5746,15 +5748,28 @@ const ConnectProxyPrefix = "connect-proxy"
// valid Connect config.
func ValidateConnectProxyService(serviceName string, tgServices []*Service) error {
found := false
names := make([]string, 0, len(tgServices))
for _, svc := range tgServices {
if svc.Name == serviceName && svc.Connect != nil && svc.Connect.SidecarService != nil {
if svc.Connect == nil || svc.Connect.SidecarService == nil {
continue
}

if svc.Name == serviceName {
found = true
break
}

// Build up list of mismatched Connect service names for error
// reporting.
names = append(names, svc.Name)
}

if !found {
return fmt.Errorf("Connect proxy service name not found in services from task group")
if len(names) == 0 {
return fmt.Errorf("No Connect services in task group with Connect proxy (%q)", serviceName)
} else {
return fmt.Errorf("Connect proxy service name (%q) not found in Connect services from task group: %s", serviceName, names)
}
Member Author:

Sorry for the code churn here. The previous error message was just extremely unhelpful. It might as well have been a hardcoded UUID because it only really made sense once you grepped the code for it.

The new errors are still a bit unusual because the condition itself is unusual and probably indicates a code error, but they at least include a little more context.
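The improved error shape can be sketched as follows. This is a simplified stand-in, not the real `structs.Service` or `ValidateConnectProxyService`: it just shows the two distinct error cases, with mismatched Connect service names collected for the second message.

```go
package main

import "fmt"

// Service is a simplified stand-in for structs.Service; HasSidecar
// stands in for Connect != nil && Connect.SidecarService != nil.
type Service struct {
	Name       string
	HasSidecar bool
}

// validateProxyService mirrors the new error reporting: distinguish
// "no Connect services at all" from "name mismatch", and list the
// candidate Connect service names in the latter case.
func validateProxyService(serviceName string, svcs []*Service) error {
	names := make([]string, 0, len(svcs))
	for _, svc := range svcs {
		if !svc.HasSidecar {
			continue
		}
		if svc.Name == serviceName {
			return nil // proxy has a matching Connect service
		}
		names = append(names, svc.Name)
	}
	if len(names) == 0 {
		return fmt.Errorf("No Connect services in task group with Connect proxy (%q)", serviceName)
	}
	return fmt.Errorf("Connect proxy service name (%q) not found in Connect services from task group: %s", serviceName, names)
}

func main() {
	// Case 1: group has no Connect services at all.
	fmt.Println(validateProxyService("redis", nil))
	// Case 2: group has Connect services, but none match the proxy's name.
	fmt.Println(validateProxyService("redis", []*Service{{Name: "web", HasSidecar: true}}))
}
```

Either message now names the proxy's expected service, so the failure can be diagnosed without grepping the code for the error string.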

}

return nil
14 changes: 5 additions & 9 deletions nomad/structs/structs_test.go
@@ -1599,20 +1599,20 @@ func TestTask_Validate_ConnectProxyKind(t *testing.T) {
Service: &Service{
Name: "redis",
},
ErrContains: "Connect proxy service name not found in services from task group",
ErrContains: `No Connect services in task group with Connect proxy ("redis:test")`,
},
{
Desc: "Service name not found in group",
Kind: "connect-proxy:redis",
ErrContains: "Connect proxy service name not found in services from task group",
ErrContains: `No Connect services in task group with Connect proxy ("redis")`,
},
{
Desc: "Connect stanza not configured in group",
Kind: "connect-proxy:redis",
TgService: []*Service{{
Name: "redis",
}},
ErrContains: "Connect proxy service name not found in services from task group",
ErrContains: `No Connect services in task group with Connect proxy ("redis")`,
},
{
Desc: "Valid connect proxy kind",
@@ -1640,12 +1640,8 @@
// Ok!
return
}
if err == nil {
t.Fatalf("no error returned. expected: %s", tc.ErrContains)
}
if !strings.Contains(err.Error(), tc.ErrContains) {
t.Fatalf("expected %q but found: %v", tc.ErrContains, err)
}
require.Errorf(t, err, "no error returned. expected: %s", tc.ErrContains)
require.Containsf(t, err.Error(), tc.ErrContains, "expected %q but found: %v", tc.ErrContains, err)
})
}
