-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
api: enhance v3 CommonTlsContext for agentless support #11061
api: enhance v3 CommonTlsContext for agentless support #11061
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, some comments to get started.
/wait
string secret_name = 2; | ||
|
||
// plugin specific config based on the plugin name | ||
google.protobuf.Any typed_config = 3; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When https://github.com/envoyproxy/envoy/pull/10908/files#r420136445 lands, we should switch to this here. This should happen before this PR merges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, will use TypedExtensionConfig
instead of Any
once that one is in.
Can you please elaborate on the description of this change? It sounds like there was a previous conversation related to this, but for people (such as myself) that weren't part of the conversation, it isn't clear what the purpose of this change is. |
Sure. gRPC recently added xDS support for load balancing and other routing features. This enables users to create proxyless service meshes. However for transport security (TLS/mTLS etc) the xDS protocol assumes the presence of SDS servers (for example Istio Agent or Node agent). To make the solution agentless as well as proxyless we are considering an SDS-server-less approach where we bypass the SDS server and have a CA-client in gRPC directly to talk to a service mesh CA to acquire certificates. This PR is related to that. Hope that answers your question. |
Thank you, that's very helpful context. So is this logically similar to talking to an SDS server, but with a different, previously existing/implemented protocol? |
In case of SDS, the SDS server creates the CSR and returns cert + private key to the client (Envoy) . With the CA client approach, the client (Envoy) creates the CSR so private key is not shared with anybody. |
|
||
// Config for fetching validation context via CA client plugin. | ||
// [#not-implemented-hide:] | ||
CAClientPluginConfig validation_context_ca_client_plugin = 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be inside oneof validation_context_type
and be mutually exclusive with SdsSecretConfig validation_context_sds_secret_config = 7;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good catch. Yes it should be moved under oneof validation_context_type
. Once moved there it will be mutually exclusive with all the other members in there not just validation_context_sds_secret_config
. I assume there is no backward compatibility issue with adding a new member to a "oneof"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool. And maybe the one above (CAClientPluginConfig tls_certificate_ca_client_plugin = 9
) also needs to be mutually exclusive with repeated SdsSecretConfig tls_certificate_sds_secret_configs = 6
. I'm not very sure though since tls_certificate_sds_secret_configs
is a repeated field. Maybe Envoy meant to support fetching tls certificate from multiple sources; in that case the current way is probably ok as it just adds one more source to the list.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The main problem here is backward compatibility: moving an existing field (tls_certificate_sds_secret_configs
) to oneof creates backward compatibility issues so we decided against it. However this might be a good candidate for the field_migrate annotations. @htuch do you agree?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just use the oneof promotion annotation for that, document in the interim.
|
||
// Config for fetching TLS certificates via CA client plugin. | ||
// [#not-implemented-hide:] | ||
CAClientPluginConfig tls_certificate_ca_client_plugin = 9; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does it make sense to make the different fields for getting the tls_certificates
also a oneof
? If so, could we mark them with appropriate annotation to promote to oneof as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think I will go ahead and just do that for all oneof candidates.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking more about it, I see that they already had oneof validation_context_type
to select only one among the 3 existing options (before I added the 4th one). But they didn't do the same for tls_certificates
and tls_certificate_sds_secret_configs
which could have been put in oneof
.
@htuch could you throw some light on this? Should my field_migrate.oneof_promotion
annotation include all 3 (tls_certificates
, tls_certificate_sds_secret_configs
and tls_certificate_ca_client_plugin
) or just the last 2 or you think oneof
is not warranted here? And the reason for that?
I think this PR is premature. I don't think we have enough consensus yet on exactly how this should work. I think we need to finalize our design before we add anything to the xDS API. |
message CommonTlsContext { | ||
option (udpa.annotations.versioning).previous_message_type = "envoy.api.v2.auth.CommonTlsContext"; | ||
|
||
// Config for CA client plugin to get secrets |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One of the problems with SDS was the verbose config - if you have 5000 services you'll need to duplicate that message for each cluster and listener. It would be ideal to not repeat it :-)
There are discussions about better naming and referencing of resources - could this be expressed as just a reference (the name of the provider) ? For most servers there will be a single 'plugin config' - maybe few, but not one per cluster and listener.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point and I have thought about it too but it's not easy to do it in a stateless way. If you have any ideas that do not impose any architectural requirements, let me know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do others think about creating a new resource type just for this? That way the config is available as a resource and its name can be referenced here. I would prefer not to do it in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, let's punt this to another PR. Ideally we motivate any additional API decomposition with concrete use cases.
Agreed with @markdroth; can we have a public design document describing how this will work? Thanks. |
Sure, I'll create a design doc and attach to this PR. Hopefully that will help us reach broad consensus. |
FWIW I'm definitely in favor of doing this change. I have been thinking about it for a while. It will make it much easier to natively implement ACME (#96) as an extension. We need to think about async reload support for these extensions also, so there is some plumbing overlap with SDS. |
@mattklein123 thanks. Can you explain what you mean by "async reload" support and "plumbing overlap with SDS" in this context? We have used "async reload" at the transport/handshake level where a cert is acquired dynamically and asynchronously for a new (m)TLS connection. But this is implied in xDS via the *TlsContext fields for clusters/listeners and that semantic is not changed by this proposal. |
Sorry I just meant that we need to make sure that the Envoy side implementation supports async fetch from the extension. This would allow certificates to be fetched over the network, refreshed, etc., much like how SDS works. Optimally, SDS would be implemented in a future version as an extension, but I think that is out of scope for this set of changes. |
Is there a preferred format for a design doc? Is Google doc okay? |
The requested design doc is here https://docs.google.com/document/d/1A1_QVCrfwgkFY2YxNqbVe_eKt54kkJ6-iQq5EA_QaNc/edit?usp=sharing |
Please make the doc world-readable. |
Just did. I had to make a copy to do that. I edited my comment to include the new link. |
@sanjaypujare sorry can you please make it world commentable? |
Done. |
/wait |
I notice this added the "waiting" label - curious what it signifies. |
It allows filtering PRs by label. PRs with the Pushing a commit automatically clears the label. In this case, since the current work is happening in the linked design doc, I added the waiting label here. |
@htuch and others: I have addressed your comments in the design doc as well as in the PR. |
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
27dc9dd
to
71b3508
Compare
@mattklein123 addressed the pending comments and rebased to master. Hopefully all okay now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, LGTM with small PGV comments. @htuch @markdroth any further comments?
/wait
message CertificateProvider { | ||
// opaque name used to specify certificate instances or types. For example, "ROOTCA" to specify | ||
// a root-certificate (validation context) or "TLS" to specify a new tls-certificate. | ||
string name = 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be required by validation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes it should be required. Adding that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
// to maintain a single certificate-provider instance. The sharing can happen, for | ||
// example, among multiple clusters or between the tls_certificate and validation_context | ||
// certificate providers of a cluster. | ||
oneof config { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you add a required validation to the oneof?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
I'm still a little nervous that we haven't fleshed out the details for ACME, but if Matt is confident enough that this API will work for that when the time comes, then I'm happy with it as well. |
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
Can you check CI? /wait |
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo two tiny docs comments.
SdsSecretConfig validation_context_sds_secret_config = 2 | ||
[(validate.rules).message = {required: true}]; | ||
// Config for fetching validation context via SDS API. Note SDS API allows certificates to be | ||
// fetched/refreshed over the network "asynchronously" with respect to the TLS handshake. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: why quotes around "asynchronously"? This seems to imply it's not really async?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
removed quotes
(udpa.annotations.field_migrate).oneof_promotion = "dynamic_validation_context" | ||
]; | ||
|
||
// Certificate provider for fetching validation context |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For documentation purposes, best to spell out the mutex relationship between the SDS and certificate provider. The oneof_promotion
annotations are removed during doc generation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
done
Signed-off-by: Sanjay Pujare <sanjaypujare@users.noreply.github.com>
fixed the issue. Hope it can be merged now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This is a follow up from envoyproxy#11061 which added support for sending arbitrary certificateProvider plugin configuration. Signed-off-by: Easwar Swaminathan <easwars@google.com>
Signed-off-by: Sanjay Pujare sanjaypujare@users.noreply.github.com
For an explanation of how to fill out the fields, please see the relevant section
in PULL_REQUESTS.md
Commit Message: api: enhance v3 CommonTlsContext for agentless support
Additional Description: add protos/fields needed to support agentless (without SDS) secret access
Risk Level: low
Testing: pre-submit and do_ci fix format checks
Docs Changes: none
Release Notes: -NA-
[Optional Runtime guard:]
[Optional Fixes #Issue]
[Optional Deprecated:]
@htuch @costinm @markdroth PR for agentless secrets as discussed.