Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix client restart behavior for Connect #6315

Merged
merged 4 commits into from
Sep 25, 2019
Merged

Fix client restart behavior for Connect #6315

merged 4 commits into from
Sep 25, 2019

Conversation

tgross
Copy link
Member

@tgross tgross commented Sep 12, 2019

Fixes #6310

I've pulled the test infrastructure work this requires out into #6313 and #6314 to keep the diff size reasonable. (Unit tests / linting won't pass till those are merged.) Done.

This fixes the client restart bug by:

  • making sure we don't run AllocRunner post-run hooks during shutdown
  • making the CreateNetwork call safe to re-run on start without recreating existing networks
  • making sure the Consul proxy socket is recreated

cc @nickethier @schmichael

@tgross tgross added this to the 0.10.0 milestone Sep 16, 2019
@tgross tgross force-pushed the e2e-client-restarts branch 4 times, most recently from b3a1aa6 to 09585ab Compare September 19, 2019 13:24
@tgross tgross marked this pull request as ready for review September 19, 2019 14:09
@tgross tgross force-pushed the e2e-client-restarts branch 3 times, most recently from 7778ab6 to fff3b4c Compare September 19, 2019 18:50
All: true,
})

container, err := d.containerByName(config.Name)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note for reviewers: I refactored this big chunk of logic in createContainer out to containerByName because I wanted to reuse it in CreateNetwork. We want callers to be able to determine what to do if the container already existed.

@tgross
Copy link
Member Author

tgross commented Sep 19, 2019

@nickethier and @schmichael it looks like I've got this working, so this should be ready for review. I've tested out both client restarts and job stops by hand, and run the included e2e test:

▶ go test -v ./e2e/ -run 'TestE2E/Connect/connect\.ConnectClientStateE2ETest/TestClientRestart'
=== RUN   TestE2E
=== RUN   TestE2E/Connect
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest/TestClientRestart
--- PASS: TestE2E (18.62s)
    --- PASS: TestE2E/Connect (18.62s)
        --- PASS: TestE2E/Connect/*connect.ConnectClientStateE2ETest (18.24s)
            --- PASS: TestE2E/Connect/*connect.ConnectClientStateE2ETest/TestClientRestart (18.11s)
PASS
ok      github.com/hashicorp/nomad/e2e  18.881s

@tgross tgross changed the title Fix network namespace parent containers Fix client restart behavior for Connect Sep 19, 2019
@tgross tgross added the theme/consul/connect Consul Connect integration label Sep 20, 2019
Copy link
Member

@nickethier nickethier left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only blocker is the question about os.Remove handling in consulsock hook.

client/allocrunner/alloc_runner.go Outdated Show resolved Hide resolved
client/allocrunner/consulsock_hook.go Outdated Show resolved Hide resolved
drivers/docker/driver.go Outdated Show resolved Hide resolved
drivers/docker/driver.go Outdated Show resolved Hide resolved
plugins/drivers/proto/driver.proto Show resolved Hide resolved
@tgross
Copy link
Member Author

tgross commented Sep 23, 2019

@nickethier this should be ready for re-review.

drivers/docker/driver.go Outdated Show resolved Hide resolved
}

container, err := d.createContainer(client, *config, d.config.InfraImage)
specFromContainer := func(c *docker.Container) *drivers.NetworkIsolationSpec {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could probably just create container here and skip the func, but this is a neat way to skip memory allocations for structs that may not be used.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little wary of creating a partial NetworkIsolationSpec and then populating the fields later just because it's the sort of thing that's easy for someone to come along later and return it early and expose a nil pointer exception in a field access. If you think the indirection of the function here isn't idiomatic I'd be happy to change it though.

@tgross tgross force-pushed the e2e-client-restarts branch 2 times, most recently from 5241f8f to fb07ec5 Compare September 24, 2019 20:49
@nickethier
Copy link
Member

The appveyor failure looks like a flake

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 30, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
theme/consul/connect Consul Connect integration
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Network namespace parent containers lost after client restart
3 participants