Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TLS, fine-grain Authorization, and List subscriptions #390

Merged
merged 8 commits into from
Aug 29, 2024

Conversation

jgraettinger
Copy link
Contributor

@jgraettinger jgraettinger commented Jun 25, 2024

This is a stack of changes which improve the security and authorization posture of Gazette. They:

  • Add TLS support for transport security, using either server certificates or full mutual TLS verification, and including private CA's.

  • Introduce a fine-grain authorization and authentication model using symmetric-key JWTs and with authorization implemented in terms of label selectors.

  • Convert the journal List RPC to a (mostly) backward-compatible server streaming RPC, with support for watches.

    • This resolves issues in clusters with lots of journals, which bump up against gRPC message limits
    • It also allows an external client to monitor and react in real-time to changes of journal topology for a given List selector.
  • Additionally, label selectors now support generalized "prefix" matches.

    • This extends the special-case handling for the prefix selector for Journals and makes it a first class concept within labels and selectors.
    • The motivation is that prefixes are a fantastic tool for expressing scopes of authorizations to resources, including user-defined resources represented as label sets attached to journals or shards.

Best to review commit-by-commit.

This change is Reviewable

The approach closely parallels the Etcd certificate authorization
model and can support mutual (client and server) certificate
verification.

In server contexts, for $foo (of broker, consumer):

--foo.server-cert-file / FOO_SERVER_CERT_FILE is a path to a cerificate
  for the server to present to clients.

  This option toggles whether TLS is used. If absent, all other TLS
  settings are ignored.

--foo.server-cert-key-file / FOO_SERVER_CERT_KEY_FILE is the corresponding private key.

--foo.server-ca-file / FOO_SERVER_CA_FILE is a trusted certificate authority
  for verification of client certificates.

  If this option is omitted, client certificates are verified if
  presented but are not required.

  If this option is set, client certificates are required and verified.

--foo.peer-cert-file / FOO_PEER_CERT_FILE is a path to a cerificate
  for the server to present to peers in a peer-to-peer client context.

--foo.peer-cert-key-file / FOO_PEER_CERT_KEY_FILE is the corresponding private key.

--foo.peer-ca-file / FOO_PEER_CA_FILE is a trusted certificate authority
  for verification of peer server certificates. If not specified,
  the system CA pool is used instead.

In client contexts, for $foo (of etcd, broker, consumer):

--foo.cert-file / FOO_CERT_FILE is a path to a client cerificate to
  present to the server context.

--foo.cert-key-file / FOO_CERT_KEY_FILE is the corresponding private key.

--foo.trusted-ca-file / FOO_TRUSTED_CA_FILE is a trusted certificate authority
  for verification of server certificates.

Gazette's "dispatcher" GRPC load balancer is also updated to be aware of
TLS or non-TLS contexts, as well as the appropriate dynamic authority which
should be verified against a TLS certificate.

`protocol.NewDispatchedCredentials` should be used with the
`grpc.WithTransportCredentials` DialOption to allow dynamic load
balancing across a mix of TLS (https://) and insecure (http://) hosts.
In uses of Gazette, labels often encode a hierarchical namespace
such as "foo/bar/baz", and for such labels it's desireable to be able to
create a LabelSelector that includes or excludes an entire sub-hierarchy
of potential label values, such as including or excluding any label
that begins with "foo/bar/".

Extend Label to have a Prefix field which may only be set in the context
of a LabelSelector (and is a Validate() error otherwise).

In a LabelSelector context, Prefix instructs selector matching to match
any label value which is prefixed by the given selector label.

Introduce a convention of "my-label:prefix" as a special suffix which
indicates that prefix matching is desired, and update LabelSelector
parsing to round-trip such ":prefix" suffixes.

Update the implementation of the special meta-label "prefix" to be in
terms of a "name:prefix" selector label, and back out the bespoke
implementation that has until-now been used for journal name prefix
matching.
The List RPC was originally a unary API, which has two issues:
* In large deployments, unary responses bump against gRPC maximum messages sizes.
* Unary responses can't represent a long-lived watch of changes.

Often callers want to react to changes in journals, such as reading a
newly created journal or spreading appended messages across the current
set of journals.

client.NewPolledList _somewhat_ enabled this functionality, but is
inefficient and requires selection of a polling interval.

To address both issues, List is updated to become a server-streaming API
while retaining backward comatibility with MOST clients expecting a
unary response, so long as the response size is under one thousand
journals.

The List RPC is also extended with a Watch mode. When enabled, brokers
will continue to stream real-time updates upon every change to the set
of journals which match the selector. This lets clients react instantly
to changes in journal topology.

client.NewPolledList becomes client.NewWatchedList, with a nearly
identical API (the polling interval is removed).

The shard List RPC is not modified by this change, but could follow a
similar path in the future.

Refactor pbx.NewUnroutedHeader, which is a foot-gun because it
obtains a re-entrant lock (which is not supported). Instead, make it the
caller's responsibility to obtain an RLock prior to the call.
Authorization is implemented in terms of LabelSelectors and
Capabilities.

In order for a client to call an API, the client must present an
Authorization with a suitable Claim having:

 - The required Capability, such as READ, LIST, or APPEND.

 - A LabelSelector which scopes the resources visible to the client.

Journals which do not match the authorized Selector are not visible to
the client and are indistinguishable from not existing at all.

Authorizer and Verifier interfaces are introduced which can implement
varying strategies for obtaining and checking Authorization tokens.

AuthJournalClient drives an Authorizer to obtain and attach a suitable
credential prior to starting an RPC. AuthJournalServer uses a Verifier
to verify a caller's claims prior to dispatching a service handler.

A new `auth` module introduces NewKeyedAuth() which returns an
Authorizer and Verifier implemented in terms of pre-shared symmetric
keys. Symmetric keys are a good fit for Gazette given its distributed
nature and the requirement that brokers be able to self-sign
peer-to-peer requests, such as Replicate and proxied Read or Append
RPCs.

Client applications may want to use alternative Authorizer
implementations, such as an authorization server, but that's out of
scope for Gazette proper.
This change closely parallels the corersponding change to the Gazette
broker and uses much of the same infrastructure.

Built-in Shard APIs now expect and verify Authorizations which carry a
set of capabilities and a label selector which scopes the resources
(shards) which are authorized to the caller.

Shards which don't match the claim's selector are not visible to the
client and are indistinguishable from not existing at all.

AuthShardClient drives an Authorizer to obtain and attach a suitable
credential prior to starting an RPC. AuthShardServer uses a Verifier
to verify a caller's claims prior to dispatching a service handler.

Custom and application-specific APIs will likely want to use the
Service.Authorizer and Service.Verifier fields to power their own
authorizing client and server middleware.
Previously we failed to clean these up if NewStore() errored out.
@jgraettinger jgraettinger changed the title Johnny/auth TLS, fine-grain Authorization, and List subscriptions Aug 24, 2024
@jgraettinger jgraettinger marked this pull request as ready for review August 26, 2024 05:06
Copy link
Contributor

@psFried psFried left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all looks good to me. Noting also that this resolves #353 (IMO the solution here is superior to what's described in the issue).

Port string `long:"port" env:"PORT" description:"Service port for HTTP and gRPC requests. A random port is used if not set. Port may also take the form 'unix:///path/to/socket' to use a Unix Domain Socket"`
ServerCertFile string `long:"server-cert-file" env:"SERVER_CERT_FILE" default:"" description:"Path to the server TLS certificate. This option toggles whether TLS is used. If absent, all other TLS settings are ignored."`
ServerCertKeyFile string `long:"server-cert-key-file" env:"SERVER_CERT_KEY_FILE" default:"" description:"Path to the server TLS private key"`
ServerCAFile string `long:"server-ca-file" env:"SERVER_CA_FILE" default:"" description:"Path to the trusted CA for server verification of client certificates. When present, client certificates are required and verified against this CA. When absent, client certificates are not required but are verified against the system CA pool if presented."`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this option is omitted, client certificates are verified if presented but are not required.

What does verification mean in this scenario? Would the client certificate CA need to be in the system's trusted CA store?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If SERVER_CA_FILE is set, then clients MUST present a certificate and it must validate against SERVER_CA_FILE.

Otherwise, clients MAY present a certificate and if they do, it's verified.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess if you omit it, then it'd verify client certs using the system certs, since there's really no other sane default. And that seems like it's just the default behavior of Go, not something you setup explicitly.

@@ -731,7 +736,7 @@ message Header {
// Journal is the Gazette broker service API for interacting with Journals.
service Journal {
// List Journals, their JournalSpecs and current Routes.
rpc List(ListRequest) returns (ListResponse);
rpc List(ListRequest) returns (stream ListResponse);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd just like to make sure I'm understanding the compatibility story correctly:
Would we expect an older client that's still expecting a unary response to work correctly as long as all the journals can fit into a single ListResponse message? I could imagine grpc being backward compatible in this way, but couldn't quickly find mention of it in their docs.

Copy link
Contributor Author

@jgraettinger jgraettinger Aug 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, I tested this explicitly. An old build of gazctl works fine against the updated List RPC so long as the number of listed journals is less than one thousand.

This works because the gRPC route is the same, and the gRPC frames that are sent from server => client are also the same in this case (a single response with all journals, then EOF).

When Watch is set, then the server instead sends one or more non-empty ListResponse frames followed by an empty ListResponse frame, which represents end-of-snapshot but not end-of-RPC.

The place where it will confuse an old client, is a !Watch request with more than one thousand journals, because the server will send multiple ListResponse with 1<=N<=1000 journals in each, followed by EOF.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool, and in the case where a user has 1000+ journals, the old version isn't going to work anyway

@jgraettinger jgraettinger merged commit 0463356 into master Aug 29, 2024
1 check passed
@jgraettinger jgraettinger deleted the johnny/auth branch August 29, 2024 14:01
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants