Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

e2e test for connect with consul acls #6982

Merged
merged 8 commits into from
Jan 31, 2020
Merged

Conversation

shoenig
Copy link
Member

@shoenig shoenig commented Jan 24, 2020

This PR introduces a way to enable Consul ACLs in our e2e environment (as provisioned with TF), and adds some basic tests for using Consul Connect from Nomad with Consul ACLs enabled.

The bootstrap script is meant to be re-usable inside a single e2e run, so future tests could also make use of it if they wish. It's very slow on the first run, and kind of slow on subsequent runs since agents need to be restarted to pick up ACL config changes. Hopefully this will all be made obsolete by shipyard in the near future!

The queue of branches to merge for connect w/ acls
master <- dev-connect-acls (tracking) <- f-connect-acl-cleanup (#6905) <- f-e2e-consul-acls (this)

Addresses #6917 (at least to the point needed for these Connect tests)

@shoenig shoenig changed the title wip e2e test for connect with consul acls e2e test for connect with consul acls Jan 24, 2020
Copy link
Member

@tgross tgross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍 I've left a few comments/questions but nothing that should be a blocker.

e2e/consulacls/acl-enable.hcl Show resolved Hide resolved

## Usage

The `consul-acls-manage.sh` script can be used to manipulate the Consul cluster
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The README should probably mention that it must be run from the e2e/ directory?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -1,25 +1,57 @@
package provisioning
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for taking another crack at this file... I admit it was fairly under-developed. 😀

e2e/consulacls/consul-acls-manage.sh Outdated Show resolved Hide resolved
sleep 1
done

for linux_client in ${linux_clients}; do
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like we're not stopping/starting the Windows clients. Does this cut the Window clients out from having Consul workloads once they're activated on the servers? (That's probably ok for now but we'll want to fix that when we get the other workload tests added for Windows.)

e2e/connect/acls.go Outdated Show resolved Hide resolved
operatorToken := tc.createOperatorToken(policyID, f)
t.Log("created operator token:", operatorToken)

// === Register the Nomad job ===
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic you have here looks solid, but it looks like it could be generalized into e2eutil/utils.go to make what's already there more robust? It looks from your commit like what we have with require.Eventually there is brittle, so if we could reuse this that would be an across-the-board improvement.

func (runner *LinuxRunner) Run(script string) error {
ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nothing to do, just thinking out loud: We probably want to be able to pass a context to the provisioner or make this timeout configurable at some point.

@@ -31,3 +63,22 @@ func (runner *LinuxRunner) Copy(local, remote string) error {
}

func (runner *LinuxRunner) Close() {}

func (runner *LinuxRunner) Logf(format string, args ...interface{}) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we give LinuxRunner a testlog?

Having the NOMAD_TEST_STDOUT env var option might be nice since thats usually what we use to dump logs to stdout.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason for the way I did the ssh runner was so that we'd get logs while the provisioning process was running, and not just when it was complete as one does for normal test logging in go (even with the -v flag, the logs only get dumped at the end of the run). But I'm definitely open to improvements on that!

@shoenig shoenig force-pushed the f-e2e-consul-acls branch 2 times, most recently from b18e9c0 to 71e23f5 Compare January 29, 2020 20:28
Provide script for managing Consul ACLs on a TF provisioned cluster for
e2e testing. Script can be used to 'enable' or 'disable' Consul ACLs,
and automatically takes care of the bootstrapping process if necessary.

The bootstrapping process takes a long time, so we may need to
extend the overall e2e timeout (20 minutes seems fine).

Introduces basic tests for Consul Connect with ACLs.
This test is causing panics. Unlike the other similar tests, this
one is using require.Eventually which is doing something bad, and
this change replaces it with a for-loop like the other tests.

Failure:

=== RUN   TestE2E/Connect
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestConnectDemo
=== RUN   TestE2E/Connect/*connect.ConnectE2ETest/TestMultiServiceConnect
=== RUN   TestE2E/Connect/*connect.ConnectClientStateE2ETest
panic: Fail in goroutine after TestE2E/Connect/*connect.ConnectE2ETest has completed

goroutine 38 [running]:
testing.(*common).Fail(0xc000656500)
	/opt/google/go/src/testing/testing.go:565 +0x11e
testing.(*common).Fail(0xc000656100)
	/opt/google/go/src/testing/testing.go:559 +0x96
testing.(*common).FailNow(0xc000656100)
	/opt/google/go/src/testing/testing.go:587 +0x2b
testing.(*common).Fatalf(0xc000656100, 0x1512f90, 0x10, 0xc000675f88, 0x1, 0x1)
	/opt/google/go/src/testing/testing.go:672 +0x91
github.com/hashicorp/nomad/e2e/connect.(*ConnectE2ETest).TestMultiServiceConnect.func1(0x0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/e2e/connect/multi_service.go:72 +0x296
github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually.func1(0xc0004962a0, 0xc0002338f0)
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1494 +0x27
created by github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert.Eventually
	/home/shoenig/go/src/github.com/hashicorp/nomad/vendor/github.com/stretchr/testify/assert/assertions.go:1493 +0x272
FAIL	github.com/hashicorp/nomad/e2e	21.427s
@shoenig shoenig merged commit 0d6254e into f-connect-acl-cleanup Jan 31, 2020
@shoenig shoenig deleted the f-e2e-consul-acls branch January 31, 2020 00:38
@github-actions
Copy link

I'm going to lock this pull request because it has been closed for 120 days ⏳. This helps our maintainers find and focus on the active contributions.
If you have found a problem that seems related to this change, please open a new issue and complete the issue template so we can capture all the details necessary to investigate further.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jan 19, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants