Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow custom ports for services and checks when using driver address_mode #3619

Merged
merged 16 commits into from
Dec 11, 2017

Conversation

schmichael
Copy link
Member

@schmichael schmichael commented Dec 5, 2017

Fixes #3380

Adds address_mode to checks (but no auto) and allows services and checks
to set literal port numbers when using address_mode=driver.

This allows SDNs, overlays, etc to advertise internal and host addresses
as well as do checks against either.

task_runner.go changes

Fixes a bug found when using fewer mocks in TaskRunner tests: changing only an interpolated variable used in a service or a check wouldn't actually update the service or check.

@schmichael
Copy link
Member Author

schmichael commented Dec 5, 2017

Some test binaries for the brave:

old
linux_amd64.zip
windows_amd64.zip

2017-02-06 4:10pm Pacific
linux_amd64.zip

@schmichael schmichael force-pushed the f-3380-custom-ports branch 3 times, most recently from 9239f64 to 994cc07 Compare December 7, 2017 00:04
@schmichael schmichael changed the title WIP: allow custom service/check ports with driver networking Allow custom ports for services and checks when using driver address_mode Dec 7, 2017
case <-h.doneCh:
// already closed
default:
close(h.doneCh)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran into a double close panic on the mock driver when testing, so I guarded all of these closes. Not sure why we had never hit it before...

@@ -133,7 +162,8 @@ func (m *MockDriver) Start(ctx *ExecContext, task *structs.Task) (*StartResponse
}
m.logger.Printf("[DEBUG] driver.mock: starting task %q", task.Name)
go h.run()
return &StartResponse{Handle: &h}, nil

return &StartResponse{Handle: &h, Network: net}, nil
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

mock_driver now supports DriverNetwork! I should have added this when I added DriverNetwork originally...

allocDir *allocdir.AllocDir
vault *vaultclient.MockVaultClient
consul *consul.MockAgent
consulClient *consul.ServiceClient
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this use of *mockConsulServiceClient is what exposed the task runner bug.

There's still a few tests that use that mock as they inspect the order of Consul operations which the mock makes very easy.

@@ -1079,3 +1083,44 @@ func isNomadService(id string) bool {
const prefix = nomadServicePrefix + "-executor"
return strings.HasPrefix(id, prefix)
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Below is the core address resolution logic for both services and checks.

@@ -580,6 +628,7 @@ func TestParse(t *testing.T) {
for _, d := range pretty.Diff(actual, tc.Result) {
t.Logf(d)
}
//t.Logf(pretty.Sprint(actual))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove

@@ -120,6 +122,11 @@ the script will run inside the Docker container. If your task is running in a
chroot, it will run in the chroot. Please keep this in mind when authoring check
scripts.

- `address_mode` `(string: "host")` - Same as `address_mode` on `service`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add a full example for this and link it to this

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added a full section at the bottom of that page with a link

@@ -3444,7 +3462,12 @@ func validateServices(t *Task) error {
knownServices[service.Name+service.PortLabel] = struct{}{}

if service.PortLabel != "" {
servicePorts[service.PortLabel] = append(servicePorts[service.PortLabel], service.Name)
if _, err := strconv.Atoi(service.PortLabel); service.AddressMode == "driver" && err == nil {
Copy link
Member

@preetapan preetapan Dec 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is hard to read, and I am not sure if the logic is correct. If addressmode is not driver, and the strconv succeeds (if portlabel is an integer), then should it be allowed or should that throw an error? if not, why even do the parsing?

@thetooth
Copy link

thetooth commented Dec 7, 2017

I have setup a test nomad job but am not getting working health checks:
agent: socket connection failed ':0': dial tcp :0: getsockopt: connection refused

      service {
        name = "poc-server"
        address_mode = "driver"
        port = "6379"
        check {
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }

and port. `driver` advertises the IP used in the driver (e.g. Docker's
internal IP) and uses the container's port specified in the port map. The
default is `auto` which behaves the same as `host` unless the driver
determines its IP should be used. This setting supported Docker since Nomad
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Supported in Docker

default is `auto` which behaves the same as `host` unless the driver
determines its IP should be used. This setting supported Docker since Nomad
0.6 and rkt since Nomad 0.7. It will advertise the container IP if a network
plugin is used (e.g. weave).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make each valid option sub items or a table with option/behavior as headings, and 1/2 lines explaining them. I would also start with the default option auto.

if err != nil {
return "", 0, fmt.Errorf("invalid port %q: %v", portLabel, err)
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this have a check for port > 0 - If some job file misconfiguration leads to someone specifying 0 as the port label, we should not try to use it and fail fast..

Copy link
Member

@preetapan preetapan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Had some questions and small suggestions to the documentation.

@schmichael schmichael force-pushed the f-3380-custom-ports branch 3 times, most recently from 4c41676 to c68ed09 Compare December 8, 2017 06:10
@schmichael
Copy link
Member Author

@thetooth Thanks for testing! Fixed in f4e341e

Since checks inherit the service's port (a numeric port in this case) but do not inherit the address_mode - the numeric port 6379 was being looked as a port label due to the check's default host address mode. I didn't have validation to catch this error and the empty value (0) was being used for the port!

I can see that it's a bit confusing to have port inherited by check but not address_mode, but I think the behavior matches many people's configuration (Consul is running on the host so service checks should be against the host address). Let me know if there's anything other than this validation that could help prevent such errors!

Here's another build:

linux_amd64.zip

@thetooth
Copy link

thetooth commented Dec 8, 2017

@schmichael Some usability issues:

The following configuration does not work as expected. Since I am not using a port map I tried a deploy without a port label in resources,

      service {
        name = "poc-server"
        address_mode = "driver"
        port = "6379"
        check {
          type     = "tcp"
          interval = "10s"
          timeout  = "2s"
        }
      }
      resources {
        cpu    = 512 # MHz 
        memory = 512 # MB 
        network {
          mbits = 10
        }
      }

The error for this was unable to get address for check "service: \"poc-server\" check": invalid port "6379": port label not found, so then I defined a port label:

          port "redis" {
            static = 6379
          }

Which cased unable to get address for service "poc-server": invalid port "redis": strconv.Atoi: parsing "redis": invalid syntax, finally I changed the port label to the port itself:

          port "6379" {
            static = 6379
          }

This worked and I have working service checks to the docker network(weave mesh) however as I hinted at in my earlier comment, since the port label is used docker has been instructed to publish those ports to localhost: 127.0.0.1:6379->6379/tcp, 127.0.0.1:6379->6379/udp. It would be very useful from a security perspective to not have services exposed to local sessions as an artifact of using health checks.

determines its IP should be used. This setting is supported in Docker since
Nomad 0.6 and rkt since Nomad 0.7. Nomad will advertise the container IP if a
network plugin is used (e.g. weave). See [below for
details.](#using-driver-address-mode)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you see the other comment I had about making each of the modes sub list headers for clarity?

if err != nil {
return "", 0, fmt.Errorf("invalid port %q: %v", portLabel, err)
}
if port == 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Change to <=0

Copy link
Member

@preetapan preetapan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two small comments, LGTM otherwise

Fixes #3380

Adds address_mode to checks (but no auto) and allows services and checks
to set literal port numbers when using address_mode=driver.

This allows SDNs, overlays, etc to advertise internal and host addresses
as well as do checks against either.
Rely less on the mockConsulServiceClient because the real
consul.ServiceClient needs all the testing it can get!
Previously if only an interpolated variable used in a service or check
was changed we interpolated the old and new services and checks with the
new variable, so nothing appeared to have changed.
Also skip getting an address for script checks which don't use them.

Fixed a weird invalid reserved port in a TaskRunner test helper as well
as a problem with our mock Alloc/Job. Hopefully the latter doesn't cause
other tests to fail, but we were referencing an invalid PortLabel and
just not catching it before.
Fixes #3620

Previously we concatenated tags into task service IDs. This could break
deregistration of tag names that contained double //s like some Fabio
tags.

This change breaks service ID backward compatibility so on upgrade all
users services and checks will be removed and re-added with new IDs.

This change has the side effect of including all service fields in the
ID's hash, so we no longer have to track PortLabel and AddressMode
changes independently.
@schmichael
Copy link
Member Author

@thetooth Thanks for testing. The check is inheriting port=6379 from the service but not the service's address mode. This is by design as we want to maintain the old behavior of check.address_mode=host by default.

However, the error you're receiving is entirely unhelpful! The fix is to specify address_mode = "driver" on the check or expose a port via a resource & port map on the host for the check to use. I'll try to update the validation to reflect this.

@schmichael
Copy link
Member Author

Not sure why TestConfig_Listener is failing on Travis, but it passes for me locally. Merging this 😬

--- FAIL: TestConfig_Listener (0.02s)
	config_test.go:524: err: listen tcp 127.0.0.1:24000: bind: address already in use

@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Mar 16, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants