Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

cli,security: add --cert-principal-map #45819

Merged
merged 1 commit into from
Mar 13, 2020

Conversation

petermattis
Copy link
Collaborator

@petermattis petermattis commented Mar 6, 2020

Add a --cert-principal-map flag to start which specifies a comma
separated list of "cert-principal:db-principal" mappings which are used
to map the principals found in certificates to DB principals.

Renamed security.GetCertificateUser to security.GetCertificateUsers
and changed it to return a list of principals in the servers. This list
contains the Subject CommonName and all of the DNS
SubjectAlternateNames. Changed the two users of this function to search
the list for the user they are authenticating.

Added COCKROACH_CERT_NODE_USER env var which controls the principal to
place in node certs created by cert create-node. This is intended for
testing purposes only.

Fixes #28408

Release note (cli change): Add --cert-principal-map flag to start
which specifies a comma separated list of "cert-principal:db-principal"
mappings which are used to map the principals found in certificates to
DB principals. This allows the usage of "node" and "root" certificates
where the common name contains dots or a host name, which in turn allows
such certificates to be generated by certificate authorities which place
restrictions on the contents of the common name field.

Release note (security update): Allow client and node certificates to
specify the principal in either the Subject CommonName field or the
SubjectAlternateNames field. Previously, the principal could only be
specified in the Subject CommonName field.

Release justification: Category 4. Low risk, high reward change to
existing functionality. Should only affect users who specify
--cert-principal-map, and the change itself is limited in scope.

@petermattis petermattis requested a review from a team as a code owner March 6, 2020 21:02
@cockroach-teamcity
Copy link
Member

This change is Reviewable

@petermattis
Copy link
Collaborator Author

@aaron-crl, @knz this works, though I'm not convinced of the use of a regexp for this purpose. The regexp is difficult to specify on the roachprod command line because we pass --args through shell. This might cause similar difficulties for users. We could get fancy and treat a regular expression which doesn't contain a parenthesized subexpression as a suffix.

The --cert-name-pattern flag implies we need a different flag, --cert-node-user-suffix, in order for cert create-node to create a node certificate with a different common name. If we instead had --cert-name-suffix, we could make that flag pull double duty. Better? Worse?

This PR needs more testing, and an end-to-end test. I did verify manually this works end-to-end.

@petermattis
Copy link
Collaborator Author

Another question: currently we "fail open" in the sense that if the regexp doesn't match, we return the full CommonName. Is that the right behavior?

@knz
Copy link
Contributor

knz commented Mar 6, 2020

we return the full CommonName. Is that the right behavior?

This seems incorrect.

I'm glad the PR is out but I won't be able to look at it until tomorrow. also, Aaron is OOO until Monday. I see the wisdom in getting this done for 20.1 but I do not see wisdom in rushing it through the weekend.

@petermattis
Copy link
Collaborator Author

This seems incorrect.

Looking forward to your sage guidance on this.

I'm glad the PR is out but I won't be able to look at it until tomorrow. also, Aaron is OOO until Monday. I see the wisdom in getting this done for 20.1 but I do not see wisdom in rushing it through the weekend.

Completely agreed. The creation of this PR occupied my trans-atlantic flight. The review can wait until next week. I would not be disappointed if we discover a fatal flaw in this approach. We will have learned something.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. I just realized while reviewing this that you've kept the extraction working on CommonName exclusively. That's a problem: ACM does not enable customization of CN. The field we want to read the username from is SubjectAlternateName(s) - a list of strings. So it's not just a regexp that's needed, also a field name.

    I'd argue for --cert-name-pattern=<fieldname>:<regexp> with a default of CN:(.*).

  2. the rest of the change looks good modulo the previous discussion on the fallback. IMHO if the regexp doesn't match this should be rejected with an error., instead of producing the CN field silently. There are multiple reasons for that:

  • it's more likely than not that the user made a mistake with their regexp (especially given your concern with escaping).
  • by silently extending to the entire field's value, we may be extending the set of allowed users beyond what the DBA intends
  • the need to match a prefix / sub-part of the string is even more important when working off SubjectAlternativeName.

Reviewed 9 of 15 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @knz, and @petermattis)


pkg/cli/cliflags/flags.go, line 638 at r1 (raw file):

must specify exactly one parenthesized subexpression which becomes the user if
the regular expression matches a certificate's CommonName. If the regular
expression does not match, the entire CommonName field is used for the user.

Something about which TLS field is looked at is needed here.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 5 of 15 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @knz, and @petermattis)

@petermattis
Copy link
Collaborator Author

I just realized while reviewing this that you've kept the extraction working on CommonName exclusively. That's a problem: ACM does not enable customization of CN. The field we want to read the username from is SubjectAlternateName(s) - a list of strings. So it's not just a regexp that's needed, also a field name.

I'd argue for --cert-name-pattern=: with a default of CN:(.*).

I thought ACM did allow customization of CN in so far as you could specify any subdomain of a domain you owned. And doesn't ACM require that SubjectAlternateName is also a DNS name you own. If it allowed arbitrary bits in there that seems like it could be a problem.

I'm not sure about restricting the pattern to CN or SAN. What if the cert contains multiple SubjectAlternateNames? Should we return the first SAN that matches the regexp? Perhaps we can have that behavior in general. Match the regexp against CN. If it matches, return the submatch. If it doesn't match, return the first match in the list of SAN(s). If none match, return an error.

the rest of the change looks good modulo the previous discussion on the fallback. IMHO if the regexp doesn't match this should be rejected with an error., instead of producing the CN field silently. There are multiple reasons for that:

  • it's more likely than not that the user made a mistake with their regexp (especially given your concern with escaping).
  • by silently extending to the entire field's value, we may be extending the set of allowed users beyond what the DBA intends
  • the need to match a prefix / sub-part of the string is even more important when working off SubjectAlternativeName.

Ack. I'll make this change once we reach agreement on the field matching above.

@knz
Copy link
Contributor

knz commented Mar 9, 2020

Ok so first thing first I was mistaken in one area in my previous comment - ACM does indeed populate the main (first) target of the cert in CN. I had misread the output of my commands.

This means that this PR as-is is already moving the needle forward.

I thought ACM did allow customization of CN in so far as you could specify any subdomain of a domain you owned.

The requirement on ownership is only mandated when you use the ACM itself as CA. When using a private CA, there's no check.

And doesn't ACM require that SubjectAlternateName is also a DNS name you own. If it allowed arbitrary bits in there that seems like it could be a problem.

I agree but that's a red herring. If you define a cert for, say root._crdb-user-prefix.crdb.io, you can then make the cert validation succeed by either:

  • adding a mail routing rule from *@*.crdb.io to something you own (most common, also most convenient)
  • adding a TXT /CNAME record temporarily for root._crdb-user-prefix.crdb.io since you control the DNS zone.

(Also this is only needed for ACM's public CA, see above)

I'm not sure about restricting the pattern to CN or SAN. What if the cert contains multiple SubjectAlternateNames?

yes! This is good and a feature - I'd definitely like my cert to be valid for both mynodeaddr.mydomain (for the TCP auth) and myuser._crdb-user.mynodeaddr.mydomain (for the username extraction).

Should we return the first SAN that matches the regexp?

I suppose we could yes.

Perhaps we can have that behavior in general. Match the regexp against CN. If it matches, return the submatch. If it doesn't match, return the first match in the list of SAN(s). If none match, return an error.

I like that.

Copy link

@aaron-crl aaron-crl left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that while this solution may solve this particular need, I feel it will further complicate our certificate user story. We are overloading the use of the node certs in ways that place additional external configuration and development burden on users and I'd really like to get away from that. I may be missing some context on the current design pattern so please forgive me (and point me to the right docs) if I am. I'm working on an RFC for an improved certificate story presently, and would rather push changes to the certificate and trust management mechanisms out until we have a chance to look at it more comprehensively than make feature changes that may be hard to support in retrospect or become inconsistent with a more robust IAM approach.

I’ve shared my working notes on certificate stories with both of you internally and would welcome feedback.

@petermattis
Copy link
Collaborator Author

@aaron-crl Let's touch base about this tomorrow. I might have invoked Cunningham's Law with this PR: "the best way to get the right answer on the internet is not to ask a question; it's to post the wrong answer."

@knz
Copy link
Contributor

knz commented Mar 9, 2020

Besides, I have cross-checked this PR and the proposed doc and they seem in agreement with each other. Maybe the only step needed is to make the link and overlap more explicit, e.g. in the release note.

@petermattis
Copy link
Collaborator Author

Besides, I have cross-checked this PR and the proposed doc and they seem in agreement with each other. Maybe the only step needed is to make the link and overlap more explicit, e.g. in the release note.

@aaron-crl gently explained that the approach in this PR opens the door to privilege escalation in certain scenarios. Doing so does require an attacker to be able to mint a certificate, but that is feasible if the certs are being provided by a domain certificate authority. His broad advice (@aaron-crl please correct me if I'm misstating this) is that we shouldn't be abusing the certs created by CAs used for domain certs for node identity. Rather, we should be considering this node identity and authorization an internal crypto primitive handled by a cluster-local CA. (This is essentially option 2 in his doc). I need to think this through a bit more, and possibly come up with more bad ideas for @aaron-crl to shoot down. I do really like the cluster-local CA idea, but it isn't feasible until 20.2.

@knz
Copy link
Contributor

knz commented Mar 10, 2020

Understood for the node identity. What of SQL client principals?

@aaron-crl
Copy link

What of SQL client principals?

This is potentially susceptible to the same type of broken auth; I would strongly discourage this pattern in general.

For example:
Let's assume the pattern extracts the principal name as all characters preceding the first "."

Normal user certs would be of the form alice.acme.example.org The public key for the CA for acme.example.org would be installed into CRDB as the trust anchor for users.

An attacker that gained access to a dev environment and had the ability to create a service named alice.random-service.user-generated.dont-trust-this-outside-dev.acme.example.org would still present as a valid alice user.

This presents a difficult to contextualize risk for operators where they may not realize how certs, wildcards, and the database interact and inadvertently create an insecure deployment.

@petermattis
Copy link
Collaborator Author

Let's assume the pattern extracts the principal name as all characters preceding the first "."

Would this be safe if we always stripped off a fixed suffix? For example, what if we stripped off the suffix acme.example.org here? Then this prefix attack wouldn't work. Would other attacks still be feasible? I'm pretty sure I don't have the right mindset for this stuff.

@aaron-crl
Copy link

It would certainly halt the prefix attack though I could see cases where it would lead to more complex but successful patterns where attackers might still gain access to reserved accounts (happy to chat about these offline). This raises the cost for an attacker but also limits the ways in which a user may specify their cert principles.

I would avoid approaches that extract AuthN primitives from CN and SAN substrings. It adds complexity and reinforces our overloaded certificate-use practice by forcing the security principle in the cert to present differently in different circumstances. This undermines the flexibility of the credential and strength of the issuing trust authority.

@knz
Copy link
Contributor

knz commented Mar 10, 2020

So let me stretch your viewpoint in a tangential technical direction, to make sure I understand.

Today a SQL client presents a TLS client cert and also a principal name in the first payload packet after the TLS handshake has completed. We verify 3 things:

  • that the principal name as presented in the payload exists in the DB
  • what the authentication method is selected for that client conn (the DBA can choose to request either password, cert or cert-password which means 'cert OR password').
  • if the selected authn method allows or requires client cert, we check that the principal name in the TLS client cert (CN) matches the name announced in the payload. If it doesn't, then authentication fails (or degrades to asking for a password if password auth is allowed via cert-password).

My questions:

  1. can we preserve the notion of "certificate-based authentication" if we can't extract a principal name from a cert any more?

  2. Would we be "degrading" the security of cert-only authentication if we trust the principal in the payload and don't check the CN any more, and just validate that the cert has the right CA?

I guess what I'm trying to get from this convo is "what happens with our current cert-based authn story" and in particular what can we offer to users who prefer to NOT use passwords for authentication.

@aaron-crl
Copy link

  1. can we preserve the notion of "certificate-based authentication" if we can't extract a principal name from a cert any more?
  2. Would we be "degrading" the security of cert-only authentication if we trust the principal in the payload and don't check the CN any more, and just validate that the cert has the right CA?

We should continue to extract the principal name and check it unless the operator has disabled certificate AuthN. This approach should not change current function from a user AuthN perspective except that it permits/requires the user's CA public key be distinct from the database's internal communications and service ones. I'm advocating against extracting substrings from these principals only.

Certificate specified security principals should match database security principals.

@petermattis
Copy link
Collaborator Author

Certificate specified security principals should match database security principals.

This is the part I'm still struggling to internalize. If the cert says the principle is foo.bar.blah, we could treat that as a node user by adding a flag such as --cert-alt-node-principle=foo.bar.blah. In general, we could map from cert principle to any user principle rather than requiring an exact match. Certain mappings could be problematic (i.e. the prefix attack mentioned earlier), but others seem innocuous on the surface. I'd like to internalize why they aren't innocuous.

@aaron-crl
Copy link

I'm not sure I understand here.

we could treat that as a node user by adding a flag such as --cert-alt-node-principle=foo.bar.blah

Where would we do this, would this be a flag for every potential node that would connect to a running node? 🤔

@petermattis
Copy link
Collaborator Author

we could treat that as a node user by adding a flag such as --cert-alt-node-principle=foo.bar.blah

Where would we do this, would this be a flag for every potential node that would connect to a running node?

Yes, that is what I was imagining. The general form of this would be to have a table mapping cert principle name to DB principle name. Something like: --cert-principle-map=foo.bar.blah:node,root.mydomain.com:root.

@aaron-crl
Copy link

If I'm understanding correctly, this moves away from parsing a value out of the field and instead moves to whitelisting the principal names for all nodes at node start time?

@aaron-crl
Copy link

Ah, good catch, this is one of the advantages of using SAN over CN (it's a typed value).

Given our current intended use (node principal names for FQDN certs issued to nodes). It's probably safe to just use the strings from DNSNames. We can revisit if we change this usage pattern.

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given our current intended use (node principal names for FQDN certs issued to nodes). It's probably safe to just use the strings from DNSNames. We can revisit if we change this usage pattern.

👍

I'm going to flesh out the testing. I'll ping this PR again when it is ready for another look.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @knz, and @petermattis)

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fleshed out the testing, and added an end-to-end test. This is ready for another look.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @knz, and @petermattis)


pkg/cli/interactive_tests/test_cert_advisory_validation.tcl, line 65 at r5 (raw file):

eexpect "interrupted"
expect $prompt
end_test

I'm currently only checking that --cert-principal-map works for the node cert, but there is a common code path used for other certs. Is it worth checking that it works for client certs as well?


pkg/security/certs.go, line 269 at r5 (raw file):

	}

	nodeUser := envutil.EnvOrDefaultString("COCKROACH_CERT_NODE_USER", "node")

This was added to make testing easier. See cli/interactive_tests.

@aaron-crl
Copy link

If it's validating client certs, then it should be able to accept client CN and SAN patterns. This would only be gated by what client principle names we permit AND what field type the SAN carries that value in. My expectation would be that this could then include EmailAddress and IP but not URI but I don't have data to support this. It might be worth asking our customer facing teams if they have exemplars.

If my assumption holds here, I would recommend string compares against those two fields too then wait until we separate cert usages before revisiting.

@petermattis petermattis force-pushed the pmattis/cert-name-pattern branch 2 times, most recently from e4fe2f6 to 2154391 Compare March 11, 2020 23:48
Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl, @knz, and @petermattis)


pkg/security/certificate_loader.go, line 451 at r5 (raw file):

	// Check Subject Common Name.
	//
	// TODO(peter): Test this function when a certPrincipalMap is present.

I just noticed I still have to handle these TODOs. I'll be tackling them tomorrow.

@petermattis petermattis changed the title cli,security: add --cert-name-pattern cli,security: add --cert-principal-map Mar 12, 2020
Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Besides one comment with @knz regarding the flag help text, I think this is ready to go.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @aaron-crl and @knz)


pkg/cli/cliflags/flags.go, line 638 at r1 (raw file):

Previously, knz (kena) wrote…

Something about which TLS field is looked at is needed here.

I still need to address this.


pkg/security/certificate_loader.go, line 451 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I just noticed I still have to handle these TODOs. I'll be tackling them tomorrow.

These have been tackled. See TestManagerWithPrincipalMap.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is ready to go.

With some tech nits below, I agree.

Reviewed 4 of 13 files at r4, 7 of 7 files at r6.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @petermattis)


pkg/cli/context.go, line 120 at r6 (raw file):

	startCtx.serverInsecure = baseCfg.Insecure
	startCtx.serverSSLCertsDir = base.DefaultCertsDirectory
	startCtx.serverListenAddr = ""

Add the default for the new field here too. (= nil I think)


pkg/cli/interactive_tests/test_cert_advisory_validation.tcl, line 65 at r5 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I'm currently only checking that --cert-principal-map works for the node cert, but there is a common code path used for other certs. Is it worth checking that it works for client certs as well?

I think this is OK.


pkg/rpc/context.go, line 131 at r6 (raw file):

			// NodeUser. This is not a security concern, as RootUser has access to
			// read and write all data, merely good hygiene. For example, there is
			// no reason to permit the root user to send raw Raft RPCs.

@aaron-crl for bookkeeping please add this TODO as one of the items to work on in 20.2: "Separate the credentials needed for node-node RPCs from those needed for client-node RPCs."


pkg/rpc/context.go, line 132 at r6 (raw file):

			// read and write all data, merely good hygiene. For example, there is
			// no reason to permit the root user to send raw Raft RPCs.
			var ok bool

@petermattis maybe share this loop and the check using a single exported function in the security package, shared between here and the UserAuthCertHook function.


pkg/security/auth.go, line 38 at r6 (raw file):

type UserAuthHook func(string, bool) error

// SetCertPrincipalMap TODO(peter)

add docstring for the function.


pkg/security/auth.go, line 122 at r6 (raw file):

		// except if the certificate user is NodeUser, which is allowed to
		// act on behalf of all other users.
		var ok bool

... here (see my comment above)


pkg/security/certificate_manager_test.go, line 77 at r6 (raw file):

	defer func() { _ = security.SetCertPrincipalMap(nil) }()
	defer func() {
		_ = os.Setenv("COCKROACH_CERT_NODE_USER", security.NodeUser)

don't you need to also defer envutil.ClearEnvCache() here? we're not re-reading the env var otherwise in other tests run in the same package.

@aaron-crl
Copy link

Aside from @knz 's nits, this looks good to me. 👍

Copy link
Collaborator Author

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz)


pkg/cli/context.go, line 120 at r6 (raw file):

Previously, knz (kena) wrote…

Add the default for the new field here too. (= nil I think)

Done.


pkg/cli/cliflags/flags.go, line 638 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I still need to address this.

Done.


pkg/rpc/context.go, line 132 at r6 (raw file):

Previously, knz (kena) wrote…

@petermattis maybe share this loop and the check using a single exported function in the security package, shared between here and the UserAuthCertHook function.

I was wondering if that repetition would bother you too much. Done.


pkg/security/auth.go, line 38 at r6 (raw file):

Previously, knz (kena) wrote…

add docstring for the function.

Done.


pkg/security/auth.go, line 122 at r6 (raw file):

Previously, knz (kena) wrote…

... here (see my comment above)

Done.


pkg/security/certificate_manager_test.go, line 77 at r6 (raw file):

Previously, knz (kena) wrote…

don't you need to also defer envutil.ClearEnvCache() here? we're not re-reading the env var otherwise in other tests run in the same package.

Yep. That's why tests were failing. Thanks.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 7 of 7 files at r7.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained

Add a `--cert-principal-map` flag to `start` which specifies a comma
separated list of "cert-principal:db-principal" mappings which are used
to map the principals found in certificates to DB principals.

Renamed `security.GetCertificateUser` to `security.GetCertificateUsers`
and changed it to return a list of principals in the servers. This list
contains the Subject CommonName and all of the DNS
SubjectAlternateNames. Changed the two users of this function to search
the list for the user they are authenticating.

Added `COCKROACH_CERT_NODE_USER` env var which controls the principal to
place in node certs created by `cert create-node`. This is intended for
testing purposes only.

Fixes cockroachdb#28408

Release note (cli change): Add `--cert-principal-map` flag to `start`
which specifies a comma separated list of "cert-principal:db-principal"
mappings which are used to map the principals found in certificates to
DB principals. This allows the usage of "node" and "root" certificates
where the common name contains dots or a host name, which in turn allows
such certificates to be generated by certificate authorities which place
restrictions on the contents of the common name field.

Release note (security update): Allow client and node certificates to
specify the principal in either the Subject CommonName field or the
SubjectAlternateNames field. Previously, the principal could only be
specified in the Subject CommonName field.

Release justification: Category 4. Low risk, high reward change to
existing functionality. Should only affect users who specify
`--cert-principal-map`, and the change itself is limited in scope.
@petermattis
Copy link
Collaborator Author

TFTR! CI indicates there was a missing error check. Should be fixed now.

@petermattis
Copy link
Collaborator Author

bors r+

@craig
Copy link
Contributor

craig bot commented Mar 13, 2020

Build succeeded

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

security: customizable/required fields in certificate
4 participants