Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deploy secrets in DeployTask like other resources #424

Merged
merged 14 commits into from
Mar 1, 2019

Conversation

DazWorrall
Copy link
Member

@DazWorrall DazWorrall commented Feb 25, 2019

Ref: #421 #411
Closes: #209

What are you trying to accomplish with this PR?

Have ejson secrets provisioned like any other resource, rather than separately/specially in EjsonSecretProvisioner. This PR also adds support for deploying secrets from yaml templates.

How is this accomplished?

The individual commits tell the story, it might be easier to review this PR by stepping through them individually.

What could go wrong?

A scary-ish change in this PR is adding secrets to the prune whitelist on deploy task. We were pruning secrets before of course, but in a more targeted way. Some questions I have:

  • Do we still need/want the 'managed secret' annotation?
  • What additional testing would add confidence this isn't going to break anything in the wild?
  • Would anyone prefer I separate out the 'sensitive output' refactor?
  • I flagged secrets for predeployment, on the basis that pods deployed later may need the new versions. Is that sound?

Still tbd after this: support for adding labels from ejson, and resolving how to 'template' ejson secrets. This PR is big enough already.

@DazWorrall
Copy link
Member Author

Test failures on CI are due to small output differences across k8s versions, I'll fix those.

@benlangfeld
Copy link
Contributor

I had been working on exactly the same change this morning, was nearly done with it too 😅

I flagged secrets for predeployment, on the basis that pods deployed later may need the new versions. Is that sound?

This is a requirement for my use case.

Copy link
Contributor

@benlangfeld benlangfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome

Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor thoughts

lib/kubernetes-deploy/deploy_task.rb Outdated Show resolved Hide resolved
lib/kubernetes-deploy/deploy_task.rb Show resolved Hide resolved
lib/kubernetes-deploy/ejson_secret_provisioner.rb Outdated Show resolved Hide resolved
@benlangfeld
Copy link
Contributor

I need this and #421 quite urgently. Is there anything I can do to help move either forward? I do not have access to buildkite to see what the failures are :/

@DazWorrall
Copy link
Member Author

@benlangfeld some context on timelines: I'm actively working on this, but not exclusively, and as I'm separated from my reviewers by a few timezones this will probably take in the order of ~days to complete - not a few hours, but (hopefully) not ~weeks either.

Also, sorry for not communicating more loudly I was doing this before you started hacking yourself.

@DazWorrall DazWorrall force-pushed the secrets-as-resources branch 3 times, most recently from 63bd3aa to 9cfa46f Compare February 26, 2019 10:27
@DazWorrall
Copy link
Member Author

An annoyance to fix later: I couldn't replicate the policial failures locally, so I guess we have some config drift.

@DazWorrall DazWorrall force-pushed the secrets-as-resources branch 2 times, most recently from df9a98b to 57d3e9a Compare February 26, 2019 10:53
@DazWorrall DazWorrall marked this pull request as ready for review February 26, 2019 11:09
@DazWorrall
Copy link
Member Author

This is ready for another pass.

@rendhalver
Copy link

* Do we still need/want the 'managed secret' annotation?

I can't find a reference to this annotation?
By the name I would assume it would stop a secret deployed by something else from being removed by kubernetes-deploy?
Am I correct?

I would think keeping that annotation is a good idea and would be useful for our setup.

@DazWorrall
Copy link
Member Author

DazWorrall commented Feb 26, 2019

I can't find a reference to this annotation?

Secrets created from ejson are currently annotated with kubernetes-deploy.shopify.io/ejson-secret=true.

By the name I would assume it would stop a secret deployed by something else from being removed by kubernetes-deploy? Am I correct?

No, it's used by the current implementation to ensure only 'managed' secrets are pruned, but after this the lifecycle of secrets will be managed like any other resource.

So I guess the answer to my own question is "no it's not needed", but want to make sure.

Copy link
Contributor

@benlangfeld benlangfeld left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This works great for me in a dummy app. I can even deploy Secrets both via EJSON and normal templates in the same deployment. The ejson-keys Secret is not pruned, but others are pruned correctly. Perfect from my perspective.

Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we still need/want the 'managed secret' annotation?

I think it's helpful, and its current text actually still makes sense, but our references to it as a "management" thing should change to avoid confusion for future contributors.

What additional testing would add confidence this isn't going to break anything in the wild?

I don't think there's any additional unit/integration tests we can do unless we decide to make a transitional version (see my first inline comment). Each org is going to need to audit its namespaces for secrets that have been applied. I don't really see a viable way around that requirement other than disabling secret pruning (which I'd rather not do).

Would anyone prefer I separate out the 'sensitive output' refactor?

No, this is fine.

I flagged secrets for predeployment, on the basis that pods deployed later may need the new versions. Is that sound?

Yep!

Still tbd after this: support for adding labels from ejson, and resolving how to 'template' ejson secrets. This PR is big enough already.

Agreed.

lib/kubernetes-deploy/ejson_secret_provisioner.rb Outdated Show resolved Hide resolved
lib/kubernetes-deploy/ejson_secret_provisioner.rb Outdated Show resolved Hide resolved
lib/kubernetes-deploy/kubernetes_resource/secret.rb Outdated Show resolved Hide resolved
test/helpers/fixture_set.rb Outdated Show resolved Hide resolved
test/integration/kubernetes_deploy_test.rb Outdated Show resolved Hide resolved
end

def test_can_deploy_template_dir_with_only_secrets_ejson
ejson_cloud = FixtureSetAssertions::EjsonCloud.new(@namespace)
ejson_cloud.create_ejson_keys_secret
assert_deploy_success(deploy_fixtures("ejson-cloud", subset: ["secrets.ejson"]))
assert_logs_match_all([
"Deploying kubernetes secrets from secrets.ejson",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add an assertion on the line you replaced this with either here or in one of the main ejson provisioning tests?

test/unit/kubernetes-deploy/kubectl_test.rb Show resolved Hide resolved
CHANGELOG.md Outdated
@@ -1,5 +1,8 @@
## next

*Features*
- Support for deploying Secrets from templates ([#424](https://github.com/Shopify/kubernetes-deploy/pull/424)).
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we also need a flashy "Breaking change" entry here because of the pruning implications and the critical nature of secrets. These are the cases I've thought of so far:

  • If you previously used this gem to deploy secrets from EJSON and the first time commit you deploy using this version removes one or more of them, they will not be pruned.
    • We could actually avoid this one by making a transitional release, but I'm not sure it is justified, since the impact is not deleting something, we're not at 1.0, and we have been maintaining this changelog strictly for folks for a long time.
  • If you previously manually kubectl apply'd secrets that are not passed to kubernetes-deploy, your first deploy using this version is going to delete them
    • We could potentially make a transitional version log warnings about these, but I kinda doubt people would notice them. Users (including us at Shopify) are going to have to audit their cluster(s) for applied secrets before rolling out this version regardless. ⚠️
  • If you previously passed secrets manifests to kubernetes-deploy (we would have ¯\_(ツ)_/¯ applied them) and they are no longer in the set you pass to the first deploy using this version, it will delete them

Can you think of any others? One case I think actually doesn't cause trouble is deploying with the new version and then rolling back to deploying with the old (as long as we keep our ejson secret annotation).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you think of any others?

I cannot 👍

One case I think actually doesn't cause trouble is deploying with the new version and then rolling back to deploying with the old (as long as we keep our ejson secret annotation).

Which is a solid reason to keep the annotation in itself!

@KnVerey
Copy link
Contributor

KnVerey commented Feb 27, 2019

Did a 🎩 (PRINT_LOGS=1) of some of the tests and have a couple additional thoughts based on the output below.

image

image

image

I think we should:

  • Remove the first two lines in the box. Seems on the too-verbose side now that secret creation isn't a separate phase.
  • Either suppress the "Discovering templates" when there are none, or even better, make the ejson secrets just show up in that list (maybe like - Secret/monitoring-token (from ejson))
  • Search the provisioner for references to "Creation" that need to be changed to "Generation"

@benlangfeld
Copy link
Contributor

@DazWorrall I'm addressing some of those code review comments in #425

DazWorrall and others added 4 commits March 1, 2019 15:20
Thanks so much @benlangfeld for your help with these:

* Remove excess logging

* Stop referring to EJSON secrets as generically "managed"

* Consistent timeout for Secret resources

Unifying the constant used for simple resources of this type is left as an exercise for another change, mostly because the name of such a thing may be controversial and I don't want to block merging this on detailed review.

* Give secrets the same statsd tags as other resources

* Removes duplicate spec

Doesn't test anything more than test_create_and_update_secrets_from_ejson

* Replaces log assertion

* Point out breaking changes in CHANGELOG

* Include ejson generated secrets in discovery log
* Rebase on master
* Fix changelog after rebase
* Remove unnecessary logging
* Fix unknown Secret status
* Update test name to not include 'update'
* Write new unrecognized resource test
@DazWorrall
Copy link
Member Author

@KnVerey rebased and addressed your feedback. I had to use some unpleasant stubbing to get the tests done but I'm happy we've covered all the known failure states now.

I had to add a couple of serial tests - they proved flaky during CI when ran in parallel, but running in isolation they're fine so there's a race somewhere I don't understand.

@DazWorrall test failures fixed at #435

Sorry I missed this @benlangfeld, I was iterating between meetings and not paying attention to my inbox :( I appreciate the thought!

@benlangfeld
Copy link
Contributor

Sorry I missed this @benlangfeld, I was iterating between meetings and not paying attention to my inbox :( I appreciate the thought!

No problem. Just doing whatever I can to get this thing merged.

Copy link
Contributor

@KnVerey KnVerey left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple more small comments, but LGTM.

refute_logs_match("kind: Deployment") # content of the sensitive template
end

def test_apply_failure_with_sensitive_resources_hides_raw_output
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is the stubbing that makes the test above concurrency-unfriendly (mocha isn't threadsafe), but this one looks really normal and should be able to run in parallel. What flakes did you see?

Copy link
Contributor

@benlangfeld benlangfeld Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These pass for me. aab294c

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If that commit fails in CI, might I propose that this be merged with the serial tests in place and we come back to this issue at lower priority? I don't think this should be a condition of merging this feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's ok to let the one with stubbing run serially permanently, but I think this one belongs in the regular file. There's not any follow-up work to do--just trying to get them committed in the right place. 😄

secret["type"] = "something/invalid"
end
assert_deploy_failure(result)
refute_logs_match(/Kubectl err:/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assertion is too general. There are a handful of other kubectl commands that get run during the test, some of which might fail and be retried. Maybe that was a source of flakiness?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 9e13b30

logger.level = 0
# An invalid PATCH produces the kind of error we want to catch, so first create a valid secret:
assert_deploy_success(deploy_fixtures("hello-cloud", subset: %w(secret.yml)))
# Then try to PATCH an immutable field
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Smart! 👏

@@ -302,10 +302,10 @@ def split_templates(filename)
raise FatalDeploymentError, "Failed to render and parse template"
end

def record_invalid_template(err:, filename:, content:)
def record_invalid_template(err:, filename:, content: nil)
debug_msg = ColorizedString.new("Invalid template: #{filename}\n").red
debug_msg += "> Error message:\n#{FormattedLogger.indent_four(err)}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't we also be suppressing (or replacing) the error itself? Just because it had a filename in it doesn't really tell us what else it contains

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Like this? 97a4fd3

deployment["spec"]["template"]["spec"]["containers"].first["ports"].first["name"] = bad_port_name
end
assert_deploy_failure(result)
refute_logs_match(/Kubectl err:/)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same comment as below--need to be more specific with this assertion or it will be flakey

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in 9e13b30

@benlangfeld benlangfeld mentioned this pull request Mar 1, 2019
@benlangfeld
Copy link
Contributor

@KnVerey If I prepare a PR for those last review comments, would this get merged today?

@KnVerey
Copy link
Contributor

KnVerey commented Mar 1, 2019

@KnVerey If I prepare a PR for those last review comments, would this get merged today?

I'm in a bunch of meetings right now, but I'll do my best

@benlangfeld
Copy link
Contributor

@KnVerey Could #438 possibly get run through CI?

Copy link
Contributor

@dturn dturn left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

minor stuff, but I'd be ok with this as is

README.md Outdated
4. Add the a basic example of the type to the hello-cloud [fixture set](https://github.com/Shopify/kubernetes-deploy/tree/master/test/fixtures/hello-cloud) and appropriate assertions to `#assert_all_up` in [`hello_cloud.rb`](https://github.com/Shopify/kubernetes-deploy/blob/master/test/helpers/fixture_sets/hello_cloud.rb). This will get you coverage in several existing tests, such as `test_full_hello_cloud_set_deploy_succeeds`.
5. Add tests for any edge cases you foresee.
4. Add the new class to list of resources in
[`deploy_task.rb`](https://github.com/Shopify/kubernetes-deploy/blob/6a0dd662735bbcc0c0cf110d049a08a044a07dd1/lib/kubernetes-deploy/deploy_task.rb#L8)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason these don't point to master?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #438

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

any reason these don't point to master?

I used a specific commit so the line numbers don't rot, not a huge deal.

warn_msg = "WARNING: Any resources not mentioned in the error(s) below were likely created/updated. " \
"You may wish to roll back this deploy."
@logger.summary.add_paragraph(ColorizedString.new(warn_msg).yellow)

unidentified_errors = []
sensitive_filenames = resources.select(&:kubectl_output_is_sensitive?).map { |r| File.basename(r.file_path) }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: sensitive_filenames -> filenames_with_sensitive_content

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Addressed in #438

@@ -21,6 +21,7 @@ def assert_all_up
assert_stateful_set_up
assert_job_up
assert_network_policy_up
assert_secret_created
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why _created and not _present

Copy link
Contributor

@benlangfeld benlangfeld Mar 1, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to avoid confusing this with the more generic method of the same name in the superclass.

@benlangfeld
Copy link
Contributor

Looks like this is ready 💃

@KnVerey KnVerey merged commit 00e6b55 into master Mar 1, 2019
@KnVerey KnVerey deleted the secrets-as-resources branch March 1, 2019 22:02
@benlangfeld
Copy link
Contributor

Thank you to everyone involved. This change is very important for my use case and I'm very grateful for it getting to master ❤️

@KnVerey
Copy link
Contributor

KnVerey commented Mar 1, 2019

@benlangfeld if you need this immediately, can you use a git ref to reference it from master for a few days? I'd really like to include #415 in the next release too; it's very nearly ready but Tim was off today.

@benlangfeld
Copy link
Contributor

Absolutely. I also need #421 , but at least I can now rebase that on less of a moving target.

This pull request was closed.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants