Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Certificate Duration Minimums #60

Closed
anEXPer opened this issue Jul 12, 2021 · 9 comments
Closed

Certificate Duration Minimums #60

anEXPer opened this issue Jul 12, 2021 · 9 comments

Comments

@anEXPer
Copy link

anEXPer commented Jul 12, 2021

Use Case

Operators with complex sets of BOSH deployments using Credhub as a config server do not always have control or insight into every manifest generated in their foundation, especially in cases where On Demand Brokers are in play. This greatly complicates certificate policy management, and makes it difficult to manage the frequency of certificate rotations, for both leaf and CA certs.

Certificate rotations can be complex, difficult, heavily manual, risky processes, and operators would like to be able to place upper bounds on their frequency by placing lower bounds on generated certificate durations.

Proposal

We would like to add two properties to the credhub spec:

  1. credhub.certificates.minimum_ca_duration
  2. credhub.certificates.minimum_leaf_cert_duration

Both would be specified in days, consistent with the parameters.duration request field on the Credhub API and the duration generation option for certificates in the BOSH variable definition/config server system.

The minimum server-side behavior looks something like this:

When certificate generation occurs,
check if the supplied duration (including default duration)
is less than the appropriate minimum, if set
(minimum_ca_duration if is_ca is true, minimum_leaf_cert_duration otherwise).
If it is, instead generate a certificate using the specified minimum.

Let's say, if we stop here, that we have Iteration 0 (i0) of the design - enough to satisfy the immediate concern, but with some work left to do to clean up the edge cases and implications.

I'll list some some of the i0 behaviors that I consider problems as if they were part of the specified behavior, and then move on to expanding the spec to deal with them:

i0

store the non-minimum duration as the duration of the variable.
When reporting the duration of the stored certificate,
report the stored (false, overridden) duration without comment.
If a regeneration request with converge is received,
compare the requested duration to the stored (false, overridden) duration.

This results in the perverse case that the certificate will be regenerated in a case where the requested duration is changed to match the duration that the currently stored certificate was actually generated with.

It's also, of course, very confusing from an audit/troubleshooting perspective. There may not even be a log line recording the strange behavior of the generation system, and even if there was, the log line would likely be lost to time.

A stronger design would store the fact that the certificate was overridden, report it in at least the CLI, specify logging behavior, and discuss and design for what happens when this override is turned off after having been used to generate some certificates. These require storing override options in the database.

I am submitting this issue for tracking/linking/discussion, but intend to expand it with at least one additional iteration better-resolving these issues and more thoroughly explaining the use-case.

i1

So how can we address these problems? Here, the fact that I'm still ramping on the underlying code might become more apparent, but hopefully nothing I'm assuming is too different from how the system currently works, and while the code changes to make this happen might be a little broader, I think we can get a much more satisfactory system.

If we store the fact that there was a minimum override in place when the credential was generated, we can refer to the stored alternative when testing equality, while preserving the original requested value.

This also opens our options in terms of what we do when handling changes to the configured overrides.

So we need additional design constraints for safety:

  1. Credhub must accurately report the current expiration date of the stored certificate
  2. Credhub CLI must - at generation time - note if the duration was overridden
  3. The duration used to generate a given certificate must be accurately reported/used for the stored credential.
  4. Enabling this feature, or having it enabled after certificates were generated without it, must not trigger regenerations that otherwise would not have happened
  5. Disabling this feature, or having it disabled after certificates were generated with it, must not trigger regenerations that otherwise would not have happened.

Based on how Credhub actually works, the i0 description doesn't really apply, and these additional constraints are easy to satisfy. If we had known this going into writing this issue, we might not have raised the whole i0 idea at all.

Specifically, Credhub does not store the duration a certificate was generated with - to the extent it is used elsewhere (for instance, in "regenerate" request with no duration specified) it is derived from the not-valid-before and valid-until dates in the actual stored certificate. The additional clauses necessary to the original set of behaviors in order to satisfy the i1 constraints are

Do not trigger converge update strategy regeneration
based on differences in requested and derived duration.

This would be a global behavioral change,
but might be considered a fix rather than a feature in the first place,
as duration is arguably not appropriate to converge on, anyway.

and

Make it visible via the API if the overrides are active.
If they are, report this in the CLI on stderr on generation or regeneration.

The final spec for this change, which satisfies all above constraints, is:

When certificate generation occurs,
check if the supplied duration (including default duration)
is less than the appropriate minimum, if set
(minimum_ca_duration if is_ca is true, minimum_leaf_cert_duration otherwise).
If it is, instead generate a certificate using the specified minimum.
Do not trigger converge update strategy regeneration
based on differences in requested and derived duration.
Make it visible via the API if the overrides are active.
If they are, report this in the CLI on stderr on generation or regeneration.

@cf-gitbot
Copy link
Collaborator

We have created an issue in Pivotal Tracker to manage this:

https://www.pivotaltracker.com/story/show/178849355

The labels on this github issue will be updated when the story is started.

@xyloman
Copy link

xyloman commented Jul 12, 2021

@anEXPer I agree with these concerns. Waiting to see where you take this. +1

@pvaramballypivot
Copy link
Contributor

@bruce-ricard @peterhaochen47 @anEXPer
cc @ystros

Hello team,

We summarized points 2 and 3 under i1. We have some questions about the requirements and also wanted to discuss the different approaches to implement the CLI requirement.

Requirements

  • CLI

    • Show that the duration was overridden in the generation/regeneration output. Since there is an option to get the output in json, it might break any code that is parsing this output and expecting a fixed set of fields. Thoughts?
  • Logs

    • Include "something" in the logs. What information would be useful to the security auditors?
      • Requested duration?
      • Overridden or not?

Implementation approaches for the CLI requirement

  1. Expose the minimum durations in a generic credhub cert API endpoint. Ex: /info. When the generate and regenerate CLI commands are called, then they would call 2 APIs
    • First API would be a call to generate/regenerate
    • Second API call would be to /info (example) to fetch the minimum durations in order to check if the cert was generated using this or the requested duration and then modify the CLI output to show whether the duration was overridden or not.
  2. The CLI could make 2 API calls
    • First API would be a call to generate/regenerate
    • Second API call would be to get the cert details (api/v1/data) in order to check if the duration used is same as the requested duration or not. Might run into permission issues?
  3. Add a field to the API responses for generate and regenerate. This might break any code that is parsing this response and expecting a fixed set of fields.

Ask

  • An answer to the requirement questions. Perhaps @anEXPer ?
  • Which of the 3 approaches does the credhub team think is preferred and less risky for the CLI requirement?

@peterhaochen47
Copy link
Member

peterhaochen47 commented Jul 20, 2021

@pvaramballypivot
My opinion is that a flavor option 3 might work the best:

There's probably a way to preserve the original json response format while also returns a message to the CLI user. Currently, according to the doc, the CredHub API response to a generate request looks like:

{
  "type" : "certificate",
  "version_created_at" : "2019-02-01T20:37:52Z",
  "id" : "450409f3-0057-42c7-b4f3-2b8c85ee7fde",
  "name" : "/some-certificate-name",
  "metadata" : {
    "description" : "example metadata"
  },
  "value" : {
    "ca" : "zzz",
    "certificate" : "yyy",
    "private_key" : "xxx",
    "transitional" : false,
    "certificate_authority" : true,
    "self_signed" : false,
    "generated" : true,
    "expiry_date" : "2020-09-03T18:30:11Z"
  }
}

And the CLI response looks similar, but with the value redacted. Perhaps we can add the "override notice" in the metadata field? The nested map under metadata field does not have any fixed set of fields, as it is designed for users to store any key value pairs they want. This approach deviates from the original intention of the metadata field a little bit (see this github issue), as currently this field is solely populated by user input; but I don't think it's bad as long as the key we put under metadata is unique enough.

Another consideration is the API response itself, as users can interact with CredHub API directly without CredHub CLI. In that case, we also want to give users the "override notice" as well in the raw API response, right? And Option 1 & 2 do not cover the raw API response.

Another option is to add a new top level field in the raw API response, which is your original option 3, which you said "might break any code that is parsing this response and expecting a fixed set of fields." But I'm thinking that an additive field does not constitute a breaking change, since most API users should just parse the response and extract the keys they look for, instead of expecting a fixed set of fields? This article agrees. We might have to bump CredHub's minor version if we go for this option though.

I'll ask my teammate @bruce-ricard to also chime in.

@ystros
Copy link

ystros commented Jul 20, 2021

Your description of the metadata field matches our understanding. Since it's user-provided content, we were hesitant to modify it for any reason.

I agree that adding new fields to an API response should not constitute a breaking change. However, I have a vague memory of adding a new field to one of the Ops Manager API responses in the past that ended up breaking something the om CLI, which parses the response. I thought we found this was in relation to how the responses were being parsed in Go, which was surprising to me.

It is quite possible I'm misremembering what the situation actually was, however.

In general I'd be in favor of adding a new field to the API response if ya'll do not think it's risky to do so. The implementation approaches we presented were mostly to get around having to add to the response.

@anEXPer
Copy link
Author

anEXPer commented Jul 22, 2021

Yeah, the om CLI was too sensitive to these things, as a detail of how it was unmarshalling responses. We fixed it, because that sensitivity was an error.

@pvaramballypivot
Copy link
Contributor

pvaramballypivot commented Jul 29, 2021

@peterhaochen47 Proposed Release Notes:

Release Notes:

Improvements:

  • Add logic to use minimum certificate duration for certificate generation.

Behavioral Changes:

  • Update converge logic to exclude the duration parameter when regenerating certificates. Since duration is not an identifying property of a certificate, converge should not have considered it previously.

@anEXPer
Copy link
Author

anEXPer commented Jul 29, 2021

I've updated the issue to accurately describe the conclusions we've reached. Great work, errybody.

@peterhaochen47
Copy link
Member

Closing this issue as the feature has been implemented.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants