feat: add v2 endpoints controller #2883
Conversation
Force-pushed from 85db7b9 to 6404ce3
//TODO(zalimeni) Do we need to check service-enable label here on service/deployments/replicasets?
// Should we expect owner is annotated, and if so, is that consistent for non-replicaset workloads?
// If so, update to register and deregister conditionally.
This 🧵 discusses maybe not needing explicit exclusion for services. I'm not 100% certain whether this controller should expect any opt-outs that will impact service registration, outside of disabled k8s namespaces from config, which are already checked above.
Input welcome! Lacking a strong opinion, I'll probably convert this to a follow-up ticket as a placeholder for a pre-GA decision.
// We have to fetch the owner and check its type because pod names cannot be disambiguated from pod owner names due to
// the `-` delimiter and unique ID parts also being valid name components.
Feel free to correct me if anyone knows better. 🧵 here for context.
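For anyone skimming this thread later, here's a rough standalone sketch of the owner lookup that comment describes, using controller-runtime. The helper name `ownerKind` and the extra hop from ReplicaSet to Deployment are my own illustration, not the controller's actual code.

```go
package sketch

import (
	"context"

	appsv1 "k8s.io/api/apps/v1"
	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// ownerKind resolves a Pod's controller owner so callers can branch on its type,
// since the Pod name alone can't reliably be split back into "<owner name>-<suffix>".
func ownerKind(ctx context.Context, c client.Client, pod *corev1.Pod) (string, error) {
	ref := metav1.GetControllerOf(pod)
	if ref == nil {
		return "", nil // bare Pod with no controller owner
	}
	// ReplicaSets are usually themselves owned by a Deployment; hop one level up.
	if ref.Kind == "ReplicaSet" {
		var rs appsv1.ReplicaSet
		if err := c.Get(ctx, client.ObjectKey{Namespace: pod.Namespace, Name: ref.Name}, &rs); err != nil {
			return "", err
		}
		if parent := metav1.GetControllerOf(&rs); parent != nil {
			return parent.Kind, nil // e.g. "Deployment"
		}
	}
	return ref.Kind, nil // e.g. "StatefulSet", "DaemonSet", "Job"
}
```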
//TODO(zalimeni) Read from Consul or some other state to determine whether a write is needed.
// For now, we blindly write state on each reconcile to simplify implementation.
_, err := resourceClient.Write(ctx, &pbresource.WriteRequest{Resource: serviceResource})
My thinking on this r/n is that we should just try to read the existing Service, and if we find it, check for total equality minus resource version. If there's a diff, rewrite.
If we move to an `EndpointSlices`/`Pods`-based approach in the future, we'll need to do some sort of checking for stale workload selector values, but since `Endpoints` gives us a full snapshot up to the 1K pod limit, we should be able to just upsert today w/o being concerned with drift.
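To make the idea above concrete, here's a hedged sketch of that read-diff-write flow against the resource service. The pbresource field names and the assumption that a missing resource surfaces as a gRPC NotFound are my reading of the API at the time of writing, so treat this as Go-flavored pseudocode rather than the real controller change.

```go
package sketch

import (
	"context"
	"maps"

	"github.com/hashicorp/consul/proto-public/pbresource"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
	"google.golang.org/protobuf/proto"
)

// writeIfChanged reads the current resource first and only writes when the managed
// fields differ, instead of blindly writing on every reconcile.
func writeIfChanged(ctx context.Context, rc pbresource.ResourceServiceClient, desired *pbresource.Resource) error {
	rsp, err := rc.Read(ctx, &pbresource.ReadRequest{Id: desired.Id})
	switch {
	case status.Code(err) == codes.NotFound:
		// Nothing exists yet; fall through to the write below.
	case err != nil:
		return err
	default:
		existing := rsp.GetResource()
		// Compare everything we manage, ignoring the server-assigned version.
		if proto.Equal(existing.GetData(), desired.GetData()) &&
			maps.Equal(existing.GetMetadata(), desired.GetMetadata()) {
			return nil // no diff, skip the write
		}
		// Carry the observed version forward so the write acts as a CAS against what we read.
		desired.Version = existing.GetVersion()
	}
	_, err = rc.Write(ctx, &pbresource.WriteRequest{Resource: desired})
	return err
}
```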
Was there a specific case you were trying to address by taking the diff and rewriting it? There was some discussion of running multiple versions of the controller concurrently in the future, and I wonder if this could create flapping with different workers fighting over the same service.
I think I naively assumed that if the write doesn't error, the resource service has persisted the service. Some future work is planned for later releases to sync the status of Consul resources back to Kubernetes resources. Do you think this would address any of your concerns?
Was there a specific case you were trying to address by taking the diff and rewriting it?
Mainly just avoiding needless writes - TBF I'm not fully up to speed on the resource API and state store changes, but my basic understanding is that reads are cheaper, and we're theoretically capped at 2-3K/s for writes. This seemed like an easy way to avoid needless clobbering, esp. when potentially many reconcile loops attempt to write at once w/o versions and the baked-in CAS retries compete w/ each other.
There was some discussion of running multiple versions of the controller concurrently in the future, and I wonder if this could create flapping with different workers fighting over the same service.
I think the flapping would occur regardless of pre-reads? But that's a good callout. I'd hope we'd come up w/ some one-way version tagging similar or identical to the `managed-by` tag if needed to avoid issues from that. (Do you have a link to that conversation by chance?)
I think I naively assumed that if the write doesn't error, the resource service has persisted the service. Some future work is planned for later releases to sync the status of Consul resources back to Kubernetes resources. Do you think this would address any of your concerns?
That's my assumption too 👍🏻 ATM I'm only thinking of debouncing, though bidirectional sync would add complexity to that.
I think it was in a casual Zoom, so I don't happen to have a link to the convo.
Asked 🧵 here to seek wisdom
Summary from Slack: not entirely conclusive, but we agree early optimization should be generally avoided until we get a better idea of write performance in the wild.
I think it's fair to argue that debouncing rapid identical writes falls under low-hanging fruit to reduce needless load, but it's leading me to additional questions about how we'll handle state in resources that may be mutated by multiple sources/data flows (not sure how much of this there will be - could resource `Meta` tagging be an example?), since we also can't just blindly rewrite records in that scenario.
Regardless, this is tagged for follow-up outside this PR.
VirtualPort: uint32(p.Port),
TargetPort:  p.TargetPort.String(),
Protocol:    common.GetPortProtocol(p.AppProtocol),
I'd appreciate a spot check on this mapping JIC I'm missing some nuance - cc @ishustava in case you have input
I don't think `targetPort` will work in the case it is a number, but the pod defines the port with a name. We will need to look up the corresponding port name in the pod. I'm open to changing how this works on the Pod side, but I'm not sure how you get around the name being optional and `targetPort` accepting either.
I don't think targetPort will work in the case it is a number, but the pod defines the port with a name
(This might be wrong or obvious, feel free to correct me.) It seems to me from the k8s docs that there's an inherent tradeoff that applies generally: you need to pick whether to use a name consistently across service and pods (which enables mixed pod port numbers under one name), or you need to pick a consistent pod port number and use that in the service. Either way, it seems selected pods need to adhere to the service's chosen int-or-string value for `targetPort`.
Our docs for Consul `Service.TargetPort` say "target_port is the name of the workload port", but I took that to mean we defer totally to the value used by the k8s service per the typical scheme above.
It seems wrong to me to reach into the pods to infer anything beyond the workload selector, since the `targetPort` mapping isn't really a `consul-k8s`-specific limitation... but I might be missing a reason we'd want to do that.
I'm open to changing how this works on the Pod side
What did you have in mind here?
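(To make the int-or-string behavior we're debating concrete, here's a tiny standalone illustration using the stock `intstr` type - not project code, just what `TargetPort.String()` yields in each case.)

```go
package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/util/intstr"
)

func main() {
	// Service targetPort given as a name: selected pods must name a port "http".
	named := intstr.FromString("http")
	// Service targetPort given as a number: selected pods must expose port 8080.
	numeric := intstr.FromInt(8080)

	fmt.Println(named.String())   // "http" -> usable directly as a workload port name
	fmt.Println(numeric.String()) // "8080" -> a number rendered as a string, not a port name
}
```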
It seems wrong to me to reach into the pods to infer anything beyond the workload selector, since the targetPort mapping isn't really a consul-k8s-specific limitation
Yeah, I don't like the idea of reaching in either. The pods may be slightly different, and the names don't align, making the lookup challenging. But I think I disagree that there isn't something K8s-specific here: we generate a Consul port name per port, even if one wasn't defined in the Kube spec, and Kubernetes doesn't necessarily require you to use any name at all, so there is a mismatch of strong (Consul) vs. weak (k8s) references. I think I had the opposite reaction: since the VM case is straightforward, we'd have to "fix it" in consul-k8s.
I'm open to changing how this works on the Pod side
No specific ideas here. I'm just trying not to look like I'm suggesting a singular solution.
Started 🧵 here to figure out how we want this to work
This conversation was ongoing as of EOD Friday so leaving for resolution outside this PR, but I'll try to keep things linked in Slack so the chain of 🧵s isn't lost from here.
For the short term, the plan ATM is to use Consul Service `TargetPort` strictly as a string name. In the case where the K8s service `targetPort` is a number, we'll rely on best-effort inference of the best Consul `TargetPort` value based on the name used by the most prevalent endpoint subset for that port (e.g. none, or a specific name that isn't being directly targeted in K8s).
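A rough sketch of one way that inference could look, counting container port names across the selected pods; the helper name `inferTargetPortName` and the pod-based counting are my own reading of the plan above, not the agreed implementation.

```go
package sketch

import corev1 "k8s.io/api/core/v1"

// inferTargetPortName guesses a workload port name for a numeric Service targetPort by
// counting the container port names that the selected pods expose on that number.
func inferTargetPortName(pods []corev1.Pod, targetPort int32) string {
	counts := map[string]int{}
	for _, pod := range pods {
		for _, c := range pod.Spec.Containers {
			for _, p := range c.Ports {
				if p.ContainerPort == targetPort {
					counts[p.Name]++ // p.Name may be "" when the pod leaves the port unnamed
				}
			}
		}
	}
	// Return the most common name; "" means the prevalent case is an unnamed port.
	best, bestCount := "", 0
	for name, n := range counts {
		if n > bestCount {
			best, bestCount = name, n
		}
	}
	return best
}
```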
Force-pushed from 6404ce3 to 0780abf
I didn't get a chance to review the test case yet, but everything else looks great! I wanted to send you my open-ended comments in advance, like how `targetPort` works and the `appProtocol` mapping.
case "http2": | ||
return pbcatalog.Protocol_PROTOCOL_HTTP2 | ||
case "grpc": | ||
return pbcatalog.Protocol_PROTOCOL_GRPC |
I wonder if we technically need to include prefixes with some of these protocols. I was reading the API docs for appProtocol, and it mentions only IANA names can be unprefixed. It's only my first time seeing this, so I could be mistaken.
I also think an annotation on the service or pod might be easier for customers to support. E.g. the Bitnami Postgres Helm chart exposes annotations for services and certain pods, but not a way to change appProtocol.
I wonder if we technically need to include prefixes with some of these protocols. I was reading the API docs for appProtocol, and it mentions only IANA names can be unprefixed. It's only my first time seeing this, so I could be mistaken.
Yeah, I was wondering the same - we may need to iterate on this part. FWIW, Istio's solution (`appProtocol` and detection via port name) uses unprefixed values from what I can see, and Open Service Mesh also includes the non-standard `tcp` and `gRPC`. I wonder whether the intent in the k8s docs was valid Service Name Syntax rather than a registered name - that seems arbitrarily limiting for new protocols, to me, when the purpose appears to be open-ended extension.
I also think an annotation on the service or pod might be easier for customers to support. E.g. the Bitnami Postgres Helm chart exposes annotations for services and certain pods, but not a way to change appProtocol.
Agreed, it seems like annotations could help ease operator pain here rather than plumbing `appProtocol` through templates - though maybe it's better for us to enforce "standard" k8s concepts and push that concern up the tooling chain rather than accommodate it ourselves? I can start a 🧵 in the multi-port channel and see whether a follow-up makes sense.
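For reference, here's a hedged sketch of an appProtocol mapping like the switch quoted above, with the prefix question handled by keeping only the segment after the last `/`. This is not the actual `common.GetPortProtocol`, the prefix handling is purely illustrative, and the pbcatalog import path may differ by Consul version.

```go
package sketch

import (
	"strings"

	pbcatalog "github.com/hashicorp/consul/proto-public/pbcatalog/v2beta1" // version may differ
)

// portProtocol maps a Kubernetes appProtocol value to a Consul catalog protocol,
// tolerating prefixed values like "example.com/grpc" by keeping only the final segment.
func portProtocol(appProtocol *string) pbcatalog.Protocol {
	if appProtocol == nil {
		return pbcatalog.Protocol_PROTOCOL_UNSPECIFIED
	}
	v := *appProtocol
	if i := strings.LastIndex(v, "/"); i >= 0 {
		v = v[i+1:]
	}
	switch strings.ToLower(v) {
	case "tcp":
		return pbcatalog.Protocol_PROTOCOL_TCP
	case "http":
		return pbcatalog.Protocol_PROTOCOL_HTTP
	case "http2":
		return pbcatalog.Protocol_PROTOCOL_HTTP2
	case "grpc":
		return pbcatalog.Protocol_PROTOCOL_GRPC
	default:
		return pbcatalog.Protocol_PROTOCOL_UNSPECIFIED
	}
}
```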
Added this as a bullet in the follow-up ticket so we don't lose track, but we probably don't need to figure it out right this second.
Circling back to this one to tie up the loose end: @ishustava, do you happen to have perspective on either of Dan's questions above?
My gut is to leave the unprefixed protocols in place for now since there seems to be precedent in other meshes, and wait to see if we need annotations in addition to just `appProtocol`. If that's not ideal, we can fix now or plan to follow up in 1.18.
Force-pushed from 0780abf to 2e50816
@DanStough @thisisnotashwin ok, I think this is ready for a next pass. I've added more tests around the essential CRUD bits as well as the pod fetching behavior, since that seemed valuable to lock in.
Assuming no major changes are needed, I'll plan to merge this to unblock testing while I follow up on remaining TODOs and Slack 🧵s.
I think based on all the tradeoffs we discussed to get a version of this ready to test, LGTM.
Force-pushed from 2e50816 to b6daa27
BEAUTIFUL!!!
Force-pushed from 395da7b to 31fa5c0
Implement the basic requirements of a new Endpoints controller that registers Services via Consul's V2 API. Further tests and TODOs will be addressed in follow-up changes.
Force-pushed from 31fa5c0 to 202959d
Follows the pattern of iterative review in #2868, and currently targets that branch due to its dependence on common code added there.
This PR is part of a series of PRs that add functionality to the V2 Endpoints Controller. For the MVP task to be complete, there will be a few additional changes to resolve TODOs, debounce upsert writes if needed, and add further tests.
Changes proposed in this PR:
How I've tested this PR: Unit tests
How I expect reviewers to test this PR: 👁️🗨️👃🏻👁️🗨️
Checklist: