Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

server,rpc,security: use a separate sql-node.crt for SQL pods #71190

Draft
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

knz
Copy link
Contributor

@knz knz commented Oct 6, 2021

Fixes #71106.

Release note (security update): Multitenant SQL servers now use a
separate certificate with CN=sql-node and filename sql-node.crt.
Using the node.crt generated for the system tenant is not possible
any more.

@knz knz requested a review from a team as a code owner October 6, 2021 02:49
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@andy-kimball andy-kimball left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this code that we'd expect to remove once we move to multiple CAs? If so, do we need to add to the CLI, since CC generates its own certificates rather than using the CLI? Also, if so, are there any changes that would be difficult to back-out later on?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @catj-cockroach and @dhartunian)

@knz
Copy link
Contributor Author

knz commented Oct 6, 2021

The CLI change exist for the benefit of manual and automated testing. This is also true of the other cli commands that generate MT certs.

The code as implemented here uses the tenant CA to sign (and verify) the new sql-node. This was an arbitrary choice on my part, and I can revert that to use the "main" CA found by a SQL server in the file ca.crt (and we'd assume that CA is specific to SQL pods, not shared with the host.cluster). I can also implement a fallback behavior, where we try one CA cert first, and if that doesn't exist, try another one.

@catj-cockroach please instruct further.

@knz
Copy link
Contributor Author

knz commented Oct 6, 2021

@chrisseto @catj-cockroach instead of the approach I've taken here (new sql-node cert), I could also update this PR to use the tenant cert directly (the one used to connect to KV pods) to connect SQL pods to each other. What do you think?

@chrisseto
Copy link
Contributor

I think I like the separation of concerns by using a sql-node cert rather than reusing the tenant -> kv cert. It feels a bit more intuitive to me. I'll defer to @catj-cockroach as the TLS expert, however.

@andy-kimball
Copy link
Contributor

I think the separate sql-node certificate is better from a separation of concerns point of view ... but if we plan to throw this out anyways, I'd lean towards whatever results in the minimum code change that's easiest to later remove.

@catj-cockroach
Copy link
Contributor

I'm sorry I should have updated here as well! Personally I'm fine with using the same certificate for both. It's fewer keys to rotate in the event of a server compromise and all the certificate is doing is attesting that it is a SQL server to the KV layer and the rest of the SQL layer. So whatever is easier for you @knz! :)

Release note (security update): Multitenant SQL servers now use a
separate certificate with `CN=sql-node` and filename `sql-node.crt`.
Using the `node.crt` generated for the system tenant is not possible
any more.
@knz
Copy link
Contributor Author

knz commented Oct 6, 2021

RFAL

Copy link
Contributor

@catj-cockroach catj-cockroach left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 33 of 37 files at r1, 29 of 29 files at r2, all commit messages.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @dhartunian and @knz)


pkg/security/certificate_loader.go, line 470 at r2 (raw file):

// values of certain fields.
// This should only be called on the NodePem CertInfo when there is no specific
// client certificate for the 'node' user.

Could we update this comment to:

// This should only be called on the NodePem CertInfo when there is no specific
// client certificate for the 'node' or 'sql-node' user.

pkg/security/certificate_manager.go, line 130 at r2 (raw file):

	uiCACert       *CertInfo // optional: certificate to verify UI certficates
	nodeCert       *CertInfo // certificate for nodes (always server cert, sometimes client cert)
	tenantNodeCert *CertInfo // certificate for SQL tenant servers (always server cert, sometimes client cert)

I'm not certain I understand this comment. Does this mean "sometimes the CertInfo value is referring to a client certificate in addition to when it's referring to a server certificate"?


pkg/security/certs.go, line 297 at r2 (raw file):

	overwrite bool,
	hosts []string,
	forTenant bool,

Why not expose baseNodeUser here instead of using a boolean?

Edit: oh, I see, we use this to determine the label and the filesystem paths! I'd argue we should have a createNodePair and createTenantNodePair and move the duplicated code that makes sense to into a new function. We could easily wrap everything from the start of the function to where we're setting the labels and determining filenames from the looks of things.


pkg/security/username.go, line 75 at r2 (raw file):

// IsNodeUser is true iff the username designates the node user.
func (s SQLUsername) IsNodeUser() bool { return s.u == NodeUser }

Should we have an equivalent version of this function for SQLNodeUser?

@knz knz marked this pull request as draft October 11, 2021 04:25
@knz knz removed the request for review from dhartunian October 11, 2021 04:46
@knz
Copy link
Contributor Author

knz commented Oct 11, 2021

we're going to deprioritize this approach in favor of #71248.

craig bot pushed a commit that referenced this pull request Oct 11, 2021
71248: rpc,security: use the tenant client cert for pod-pod communication r=catj-cockroach a=knz

Fixes #71106
Alternative design to  #71190
Epic: SEC-665

As of this patch, we have the following file usage:

- KV nodes on host cluster:
  - ui.crt (optional):
    - used as server cert for HTTP
  - ui-ca.crt (optional):
    - used in unit tests to verify the server's identity for HTTP conns
  - node.crt:
    - used as client cert for node-to-node comms
    - used as server cert for node-to-node comms
    - used as server cert for SQL clients
    - used as server cert for incoming conns from SQL tenant servers
    - used as server cert for HTTP, if ui.crt doesn't exist
  - tenant-client-ca.crt (optional):
    - used to verify client certs for SQL tenant servers
  - client-ca.crt (optional);
    - used to verify client certs for SQL clients
    - used to verify client certs for SQL tenant servers, if tenant-client-ca.crt doesn't exist
  - ca.crt:
    - used to verify other node client certs for node-to-node comms
    - used in unit tests to verify the server's identity for SQL and RPC conns
    - used to verify client certs for SQL clients, if client-ca.crt doesn't exist
    - used to verify client certs for SQL tenant servers, if neither tenant-client.ca.crt nor client-ca.crt exist

- SQL servers:
  - ui.crt (optional):
    - used as server cert for HTTP
  - ui-ca.crt (optional):
    - used in unit tests to verify the server's identity for HTTP conns
  - client-tenant.NN.crt:
    - used as client cert for node-to-node comms (SQL server to SQL server)
    - used as server cert for node-to-node comms (SQL server to SQL server)
    - used as client cert for conns to KV nodes
    - used as server cert for SQL clients
    - used as server cert for HTTP, if ui.crt doesn't exist
  - tenant-client-ca.crt (optional):
    - used to verify client certs for SQL tenant servers
  - client-ca.crt (optional);
    - used to verify client certs for SQL clients
    - used to verify client certs for SQL tenant servers, if tenant-client-ca.crt doesn't exist
  - ca.crt:
    - used to verify other SQL server certs for node-to-node comms, if tenant-client-ca.crt doens't exist
    - used to verify client certs for SQL clients, if client-ca.crt doesn't exist
    - used to verify client certs for SQL tenant servers, if neither tenant-client.ca.crt nor client-ca.crt exist
    - used in unit tests to verify the server's identity for SQL and  RPC conns

Release note (security update): Multitenant SQL servers now reuse
the tenant client certificate (`client-tenant.NN.crt`) for SQL-to-SQL
communication. Existing deployments must regenerate the certificates
with dual purpose (client and server authentication).

71330: Use include_cached to speed up build time, adding comment tags. r=knz a=ianjevans

This PR has minor changes to the Markdown output of the release notes script. The docs team now uses the `include_cached` plugin for Jekyll for common included files to speed up build times. And I wrapped the comment for the docs team in Liquid comment tags. 

release notes: none

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: ianjevans <ian@cockroachlabs.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

server: pod-to-pod communication should use same TLS cert as pod-kv comms, not node.crt
5 participants