Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

api: enhance v3 CommonTlsContext for agentless support #11061

Merged
merged 14 commits into from
May 21, 2020

Conversation

sanjaypujare
Copy link
Contributor

Signed-off-by: Sanjay Pujare sanjaypujare@users.noreply.github.com

For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md

Commit Message: api: enhance v3 CommonTlsContext for agentless support
Additional Description: add protos/fields needed to support agentless (without SDS) secret access
Risk Level: low
Testing: pre-submit and do_ci fix format checks
Docs Changes: none
Release Notes: -NA-
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]

@htuch @costinm @markdroth PR for agentless secrets as discussed.

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.
CC @envoyproxy/api-watchers: FYI only for changes made to api/.

🐱

Caused by: #11061 was opened by sanjaypujare.

see: more, trace.

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, some comments to get started.
/wait

string secret_name = 2;

// plugin specific config based on the plugin name
google.protobuf.Any typed_config = 3;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When https://github.com/envoyproxy/envoy/pull/10908/files#r420136445 lands, we should switch to this here. This should happen before this PR merges.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, will use TypedExtensionConfig instead of Any once that one is in.

api/envoy/extensions/transport_sockets/tls/v3/tls.proto Outdated Show resolved Hide resolved
api/envoy/extensions/transport_sockets/tls/v3/tls.proto Outdated Show resolved Hide resolved
api/envoy/extensions/transport_sockets/tls/v3/tls.proto Outdated Show resolved Hide resolved
api/envoy/extensions/transport_sockets/tls/v3/tls.proto Outdated Show resolved Hide resolved
@ggreenway
Copy link
Contributor

Can you please elaborate on the description of this change? It sounds like there was a previous conversation related to this, but for people (such as myself) that weren't part of the conversation, it isn't clear what the purpose of this change is.

@sanjaypujare
Copy link
Contributor Author

Can you please elaborate on the description of this change? It sounds like there was a previous conversation related to this, but for people (such as myself) that weren't part of the conversation, it isn't clear what the purpose of this change is.

Sure. gRPC recently added xDS support for load balancing and other routing features. This enables users to create proxyless service meshes. However for transport security (TLS/mTLS etc) the xDS protocol assumes the presence of SDS servers (for example Istio Agent or Node agent). To make the solution agentless as well as proxyless we are considering an SDS-server-less approach where we bypass the SDS server and have a CA-client in gRPC directly to talk to a service mesh CA to acquire certificates. This PR is related to that. Hope that answers your question.

@ggreenway
Copy link
Contributor

Thank you, that's very helpful context.

So is this logically similar to talking to an SDS server, but with a different, previously existing/implemented protocol?

@sanjaypujare
Copy link
Contributor Author

Thank you, that's very helpful context.

So is this logically similar to talking to an SDS server, but with a different, previously existing/implemented protocol?

In case of SDS, the SDS server creates the CSR and returns cert + private key to the client (Envoy) . With the CA client approach, the client (Envoy) creates the CSR so private key is not shared with anybody.


// Config for fetching validation context via CA client plugin.
// [#not-implemented-hide:]
CAClientPluginConfig validation_context_ca_client_plugin = 10;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be inside oneof validation_context_type and be mutually exclusive with SdsSecretConfig validation_context_sds_secret_config = 7;?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. Yes it should be moved under oneof validation_context_type. Once moved there it will be mutually exclusive with all the other members in there not just validation_context_sds_secret_config. I assume there is no backward compatibility issue with adding a new member to a "oneof"

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cool. And maybe the one above (CAClientPluginConfig tls_certificate_ca_client_plugin = 9) also needs to be mutually exclusive with repeated SdsSecretConfig tls_certificate_sds_secret_configs = 6. I'm not very sure though since tls_certificate_sds_secret_configs is a repeated field. Maybe Envoy meant to support fetching tls certificate from multiple sources; in that case the current way is probably ok as it just adds one more source to the list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main problem here is backward compatibility: moving an existing field (tls_certificate_sds_secret_configs) to oneof creates backward compatibility issues so we decided against it. However this might be a good candidate for the field_migrate annotations. @htuch do you agree?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just use the oneof promotion annotation for that, document in the interim.

api/envoy/extensions/transport_sockets/tls/v3/tls.proto Outdated Show resolved Hide resolved

// Config for fetching TLS certificates via CA client plugin.
// [#not-implemented-hide:]
CAClientPluginConfig tls_certificate_ca_client_plugin = 9;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to make the different fields for getting the tls_certificates also a oneof? If so, could we mark them with appropriate annotation to promote to oneof as well.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think I will go ahead and just do that for all oneof candidates.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thinking more about it, I see that they already had oneof validation_context_type to select only one among the 3 existing options (before I added the 4th one). But they didn't do the same for tls_certificates and tls_certificate_sds_secret_configs which could have been put in oneof.

@htuch could you throw some light on this? Should my field_migrate.oneof_promotion annotation include all 3 (tls_certificates, tls_certificate_sds_secret_configs and tls_certificate_ca_client_plugin) or just the last 2 or you think oneof is not warranted here? And the reason for that?

@markdroth
Copy link
Contributor

I think this PR is premature. I don't think we have enough consensus yet on exactly how this should work. I think we need to finalize our design before we add anything to the xDS API.

message CommonTlsContext {
option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.auth.CommonTlsContext";

// Config for CA client plugin to get secrets
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One of the problems with SDS was the verbose config - if you have 5000 services you'll need to duplicate that message for each cluster and listener. It would be ideal to not repeat it :-)

There are discussions about better naming and referencing of resources - could this be expressed as just a reference (the name of the provider) ? For most servers there will be a single 'plugin config' - maybe few, but not one per cluster and listener.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a good point and I have thought about it too but it's not easy to do it in a stateless way. If you have any ideas that do not impose any architectural requirements, let me know.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do others think about creating a new resource type just for this? That way the config is available as a resource and its name can be referenced here. I would prefer not to do it in this PR.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, let's punt this to another PR. Ideally we motivate any additional API decomposition with concrete use cases.

@htuch
Copy link
Member

htuch commented May 6, 2020

Agreed with @markdroth; can we have a public design document describing how this will work? Thanks.

@sanjaypujare
Copy link
Contributor Author

Agreed with @markdroth; can we have a public design document describing how this will work? Thanks.

Sure, I'll create a design doc and attach to this PR. Hopefully that will help us reach broad consensus.

@mattklein123
Copy link
Member

FWIW I'm definitely in favor of doing this change. I have been thinking about it for a while. It will make it much easier to natively implement ACME (#96) as an extension. We need to think about async reload support for these extensions also, so there is some plumbing overlap with SDS.

@sanjaypujare
Copy link
Contributor Author

FWIW I'm definitely in favor of doing this change. I have been thinking about it for a while. It will make it much easier to natively implement ACME (#96) as an extension. We need to think about async reload support for these extensions also, so there is some plumbing overlap with SDS.

@mattklein123 thanks. Can you explain what you mean by "async reload" support and "plumbing overlap with SDS" in this context? We have used "async reload" at the transport/handshake level where a cert is acquired dynamically and asynchronously for a new (m)TLS connection. But this is implied in xDS via the *TlsContext fields for clusters/listeners and that semantic is not changed by this proposal.

@mattklein123
Copy link
Member

Can you explain what you mean by "async reload" support and "plumbing overlap with SDS" in this context?

Sorry I just meant that we need to make sure that the Envoy side implementation supports async fetch from the extension. This would allow certificates to be fetched over the network, refreshed, etc., much like how SDS works.

Optimally, SDS would be implemented in a future version as an extension, but I think that is out of scope for this set of changes.

@sanjaypujare
Copy link
Contributor Author

Is there a preferred format for a design doc? Is Google doc okay?

@sanjaypujare
Copy link
Contributor Author

sanjaypujare commented May 7, 2020

@ggreenway
Copy link
Contributor

Please make the doc world-readable.

@sanjaypujare
Copy link
Contributor Author

Please make the doc world-readable.

Just did. I had to make a copy to do that. I edited my comment to include the new link.

@mattklein123
Copy link
Member

@sanjaypujare sorry can you please make it world commentable?

@sanjaypujare
Copy link
Contributor Author

@sanjaypujare sorry can you please make it world commentable?

Done.

@ggreenway
Copy link
Contributor

/wait

@sanjaypujare
Copy link
Contributor Author

/wait

I notice this added the "waiting" label - curious what it signifies.

@ggreenway
Copy link
Contributor

/wait

I notice this added the "waiting" label - curious what it signifies.

It allows filtering PRs by label. PRs with the waiting label are waiting for something and so they don't need immediate review.

Pushing a commit automatically clears the label.

In this case, since the current work is happening in the linked design doc, I added the waiting label here.

@sanjaypujare
Copy link
Contributor Author

@htuch and others: I have addressed your comments in the design doc as well as in the PR.

Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
@sanjaypujare sanjaypujare force-pushed the add-agentless-fields branch from 27dc9dd to 71b3508 Compare May 19, 2020 22:12
@sanjaypujare
Copy link
Contributor Author

And I think this PR should be ready to merge? It looks like there is more interest in the edge case so I am looking into it and may add a para or 2 in the design doc.

Yes seems fine to move forward and merge. Can you merge master to fix CI and check for merge conflicts?

@mattklein123 addressed the pending comments and rebased to master. Hopefully all okay now.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, LGTM with small PGV comments. @htuch @markdroth any further comments?

/wait

message CertificateProvider {
// opaque name used to specify certificate instances or types. For example, "ROOTCA" to specify
// a root-certificate (validation context) or "TLS" to specify a new tls-certificate.
string name = 1;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be required by validation?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes it should be required. Adding that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

// to maintain a single certificate-provider instance. The sharing can happen, for
// example, among multiple clusters or between the tls_certificate and validation_context
// certificate providers of a cluster.
oneof config {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a required validation to the oneof?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@markdroth
Copy link
Contributor

I'm still a little nervous that we haven't fleshed out the details for ACME, but if Matt is confident enough that this API will work for that when the time comes, then I'm happy with it as well.

Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
@mattklein123
Copy link
Member

Can you check CI?

/wait

Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo two tiny docs comments.

SdsSecretConfig validation_context_sds_secret_config = 2
[(validate.rules).message = {required: true}];
// Config for fetching validation context via SDS API. Note SDS API allows certificates to be
// fetched/refreshed over the network "asynchronously" with respect to the TLS handshake.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: why quotes around "asynchronously"? This seems to imply it's not really async?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed quotes

(udpa.annotations.field_migrate).oneof_promotion = "dynamic_validation_context"
];

// Certificate provider for fetching validation context
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For documentation purposes, best to spell out the mutex relationship between the SDS and certificate provider. The oneof_promotion annotations are removed during doc generation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
@sanjaypujare
Copy link
Contributor Author

Can you check CI?

/wait

fixed the issue. Hope it can be merged now

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@mattklein123 mattklein123 merged commit 639c226 into envoyproxy:master May 21, 2020
easwars added a commit to easwars/envoy that referenced this pull request Jun 10, 2020
This is a follow up from envoyproxy#11061
which added support for sending arbitrary certificateProvider plugin
configuration.

Signed-off-by: Easwar Swaminathan <easwars@google.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants