diff --git a/docs/proposals/service-discovery/README.md b/docs/proposals/service-discovery/README.md
index 644571f563e8..0c11bf1c82d1 100644
--- a/docs/proposals/service-discovery/README.md
+++ b/docs/proposals/service-discovery/README.md
@@ -3,13 +3,12 @@ title: Service discovery with native Kubernetes naming and resolution
authors:
- "@bivas"
- "@XiShanYongYe-Chang"
+- "@jwcesign"
reviewers:
- "@RainbowMango"
- "@GitHubxsy"
- "@Rains6"
-- "@jwcesign"
- "@chaunceyjiang"
-- TBD

approvers:
- "@RainbowMango"
@@ -22,7 +21,7 @@ update-date: 2023-08-19

## Summary

-With the current `serviceImport` controller, when a `ServiceImport` object is reconciled, the derived service is prefixed with `derived-` prefix.
+In multi-cluster scenarios, services often need to be accessed across clusters. Currently, Karmada supports this by creating a derived service (with the `derived-` prefix) in the other clusters so that the original service can be reached.

This proposal proposes a method for multi-cluster service discovery using Kubernetes native Service names, modifying the current implementation of Karmada's MCS. This approach does not add a `derived-` prefix when accessing services across clusters.

@@ -32,12 +31,15 @@ This Proposal propose a method for multi-cluster service discovery using Kuberne
This section is for explicitly listing the motivation, goals, and non-goals of this KEP. Describe why the change is important and the benefits to users.
-->

+
Having a `derived-` prefix for `Service` resources seems counterintuitive when thinking about service discovery:
- Assuming the pod is exported as the service `foo`
- Another pod that wishes to access it on the same cluster will simply call `foo` and Kubernetes will bind to the correct one
- If that pod is scheduled to another cluster, the original service discovery will fail as there's no service by the name `foo`
- To find the original pod, the other pod is required to know it is in another cluster and use `derived-foo` to work properly

+If Karmada supports service discovery using native Kubernetes naming and resolution (without the `derived-` prefix), users can access the service by its original name and do not need to modify their code to accommodate the `derived-` prefix.
+
### Goals

- Remove the "derived-" prefix from the service
@@ -51,8 +53,8 @@ Having a `derived-` prefix for `Service` resources seems counterintuitive when t

Following are flows to support the service import proposal:

-1. `Deployment` and `Service` are created on cluster member1 and the `Service` imported to cluster member2 using `ServiceImport` (described below as [user story 1](#story-1))
-2. `Deployment` and `Service` are created on cluster member1 and both propagated to cluster member2. `Service` from cluster member1 is imported to cluster member2 using `ServiceImport` (described below as [user story 2](#story-2))
+1. `Deployment` and `Service` are created on cluster member1 and the `Service` is imported to cluster member2 using `MultiClusterService` (described below as [user story 1](#story-1))
+2. `Deployment` and `Service` are created on cluster member1 and both are propagated to cluster member2. The `Service` from cluster member1 is imported to cluster member2 using `MultiClusterService` (described below as [user story 2](#story-2))

The proposal for this flow is what can be referred to as local-and-remote service discovery.
In the handling process, the following scenarios can be distinguished:
@@ -60,16 +62,17 @@ The proposal for this flow is what can be referred to as local-and-remote servic
2. **Local** and **Remote** - Users accessing the `foo` service will reach either member1 or member2
3. **Remote** only - in case there's a local service by the name `foo` Karmada will remove the local `EndPointSlice` and will create an `EndPointSlice` pointing to the other cluster (e.g. instead of resolving to the member2 cluster it will reach member1)

-Based on the above three scenarios, we have proposed two strategies:
+Based on the above three scenarios, we think there are two reasonable strategies (users can utilize a `PropagationPolicy` to propagate the `Service` and implement the `Local` scenario, so it is not necessary to implement it with `MultiClusterService`):

- **RemoteAndLocal** - When accessing Service, the traffic will be evenly distributed between the local cluster and remote cluster's Service.
- **LocalFirst** - When accessing Services, if the local cluster Service can provide services, it will directly access the Service of the local cluster. If a failure occurs in the Service on the local cluster, it will access the Service on remote clusters.

> Note: How can we detect the failure?
> Maybe we need to watch the EndpointSlices resources of the relevant Services in the member cluster. If the EndpointSlices resource becomes non-existent or the status becomes not ready, we need to synchronize it with other clusters.
-> As for the specific implementation of the LocalFirst strategy, we can iterate on it subsequently.

-This proposal suggests using the [MultiClusterService API](https://github.com/karmada-io/karmada/blob/24bb5829500658dd1caeea16eeace8252bcef682/pkg/apis/networking/v1alpha1/service_types.go#L30) to enable cross-cluster service discovery. To avoid conflicts with the previously provided [prefixed cross-cluster service discovery](./../networking/multiclusterservice.md#cross-clusters-service-discovery), we can add an annotation on the MultiClusterService API with the key `discovery.karmada.io/strategy`, its value can be either `RemoteAndLocal` or `LocalFirst`.
+This proposal suggests using the [MultiClusterService API](https://github.com/karmada-io/karmada/blob/24bb5829500658dd1caeea16eeace8252bcef682/pkg/apis/networking/v1alpha1/service_types.go#L30) to enable cross-cluster service discovery.
+
+This proposal focuses on the `RemoteAndLocal` strategy; we will iterate on the `LocalFirst` strategy subsequently.

### User Stories (Optional)

@@ -82,27 +85,16 @@ bogged down.

#### Story 1

-As a Kubernetes cluster member,
-I want to access a service from another cluster member,
-So that I can communicate with the service using its original name.
-
-**Background**: The Service named `foo` is created on cluster member1 and imported to cluster member2 using `ServiceImport`.
+As a user of a Kubernetes cluster, I want to be able to access a service whose corresponding pods are located in another cluster, and I want to communicate with that service using its original name.

**Scenario**:

1. Given that the `Service` named `foo` exists on cluster member1
-2. And the `ServiceImport` resource is created on cluster member2, specifying the import of `foo`
-3. When I try to access the service inside member2
-4. Then I can access the service using the name `foo.myspace.svc.cluster.local`
+2. When I try to access the service inside member2, I can access it using the name `foo.myspace.svc.cluster.local`

#### Story 2

-As a Kubernetes cluster member,
-I want to handle conflicts when importing a service from another cluster member,
-So that I can access the service without collisions and maintain high availability.
-
-**Background**: The Service named `foo` is created on cluster member1 and has a conflict when attempting to import to cluster member2.
-Conflict refers to the situation where there is already a `Service` `foo` existing on the cluster (e.g. propagated with `PropagationPolicy`), but we still need to import `Service` `foo` from other clusters onto this cluster (using `ServiceImport`)
+As a user of a Kubernetes cluster, I want to access a service that has pods located both in this cluster and in another one. I expect to communicate with the service using its original name and have the requests routed to the appropriate pods across clusters.

**Scenario**:

@@ -123,7 +115,7 @@ This might be a good place to talk about core concepts and how they relate.

### Risks and Mitigations

-Adding a `Service` that resolve to a remote cluster will add a network latency of communication between clusters.
+
+Adding a `Service` that resolves to a remote cluster will add network latency to the communication between clusters.

## Design Details

@@ -144,118 +137,196 @@ proposal will be implemented, this is the place to discuss them.

### API changes

-Add an annotation on the MultiClusterService API with the key `discovery.karmada.io/strategy`, its value can be either `RemoteAndLocal` or `LocalFirst`.
+This proposal adds two new fields, `ServiceProvisionClusters` and `ServiceConsumptionClusters`, to the `MultiClusterService` API.
+```go
+
+type MultiClusterServiceSpec struct {
+    ...
+
+    // ServiceProvisionClusters specifies the clusters which will provision the service backend.
+    // If left empty, we will collect the backend endpoints from all clusters and sync
+    // them to the ServiceConsumptionClusters.
+    // +optional
+    ServiceProvisionClusters []string `json:"serviceProvisionClusters,omitempty"`
+
+    // ServiceConsumptionClusters specifies the clusters where the service will be exposed to clients.
+    // If left empty, the service will be exposed to all clusters.
+    // +optional
+    ServiceConsumptionClusters []string `json:"serviceConsumptionClusters,omitempty"`
+}
+
+```
+
+With this API, we will:
+* Use `ServiceProvisionClusters` to specify the member clusters which will provision the service backend; if left empty, we will collect the backend endpoints from all clusters and sync them to the `ServiceConsumptionClusters`.
+* Use `ServiceConsumptionClusters` to specify the clusters where the service will be exposed; if left empty, the service will be exposed to all clusters.
+
+For example, if we want to access the `foo` service, which is located in member2, from member3, we can use the following YAML:

```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: foo
+spec:
+  ports:
+  - port: 80
+    targetPort: 8080
+  selector:
+    app: foo
+---
apiVersion: networking.karmada.io/v1alpha1
kind: MultiClusterService
metadata:
  name: foo
-  annotation:
-    discovery.karmada.io/strategy: RemoteAndLocal
spec:
  types:
    - CrossCluster
-  range:
-    clusterNames:
-      - member2
+  serviceProvisionClusters:
+    - member2
+  serviceConsumptionClusters:
+    - member3
+```
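+
+With this configuration, the `foo` service keeps its original name in member3: clients there resolve `foo` as usual, and the endpoints behind it are synchronized from member2. Below is only a rough sketch of the kind of `EndpointSlice` that could end up in member3; the object name and the IP address are illustrative assumptions, not the final naming scheme:
+
+```yaml
+apiVersion: discovery.k8s.io/v1
+kind: EndpointSlice
+metadata:
+  name: foo-member2-abcde            # illustrative name
+  labels:
+    kubernetes.io/service-name: foo  # associates the slice with the native `foo` Service
+addressType: IPv4
+endpoints:
+- addresses:
+  - 10.244.1.10                      # pod IP from member2, illustrative
+  conditions:
+    ready: true
+ports:
+- port: 8080
+  protocol: TCP
+```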
+
+### Implementation workflow
+
+#### Service propagation
+
+The process of propagating Service from Karmada control plane to member clusters is as follows:
+
+![image](statics/mcs-svc-sync.png)
+
+1. The `multiclusterservice` controller will list&watch `Service` and `MultiClusterService` resources from the Karmada control plane.
+1. Once a `MultiClusterService` and a `Service` with the same name exist, the `multiclusterservice` controller will create the Work (corresponding to the `Service`); the target cluster namespaces are those of all the clusters in the fields `spec.serviceProvisionClusters` and `spec.serviceConsumptionClusters`.
+1. The Work will be synchronized with the member clusters. After synchronization, the corresponding `EndpointSlice` will be created in the member clusters.
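+
+To make this concrete, the following is a minimal sketch (continuing the `foo` example above; the Work name and manifest layout are illustrative assumptions, not the final implementation) of the Work that the `multiclusterservice` controller could create in the execution namespace of a target cluster such as member3:
+
+```yaml
+apiVersion: work.karmada.io/v1alpha1
+kind: Work
+metadata:
+  name: foo-service                  # illustrative name
+  namespace: karmada-es-member3      # execution namespace of the target cluster
+spec:
+  workload:
+    manifests:
+    - apiVersion: v1
+      kind: Service
+      metadata:
+        name: foo
+      spec:
+        ports:
+        - port: 80
+          targetPort: 8080
+        selector:
+          app: foo
+```
+A similar Work would be created for every cluster listed in `spec.serviceProvisionClusters` and `spec.serviceConsumptionClusters`.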
+
+#### `EndpointSlice` synchronization
+
+The process of synchronizing `EndpointSlice` from `ServiceProvisionClusters` to `ServiceConsumptionClusters` is as follows:
+
+![image](statics/mcs-eps-collect.png)
+
+1. The `endpointsliceCollect` controller will list&watch `MultiClusterService`.
+1. The `endpointsliceCollect` controller will build informers to list&watch the target service's `EndpointSlice` in the `ServiceProvisionClusters`.
+1. The `endpointsliceCollect` controller will create the corresponding Work for each `EndpointSlice` in the cluster namespace.
+   When creating the Work, in order to delete the corresponding Work when the `MultiClusterService` is deleted, we should add the following labels:
+   * `endpointslice.karmada.io/name`: the service name of the original `EndpointSlice`.
+   * `endpointslice.karmada.io/namespace`: the service namespace of the original `EndpointSlice`.
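+
+As an illustration of the collected result (the Work name and the wrapped `EndpointSlice` content are assumptions for this sketch, continuing the `foo` example with member2 as the provision cluster), such a Work might carry the labels above like this:
+
+```yaml
+apiVersion: work.karmada.io/v1alpha1
+kind: Work
+metadata:
+  name: foo-endpointslice            # illustrative name
+  namespace: karmada-es-member2      # execution namespace of the provision cluster
+  labels:
+    endpointslice.karmada.io/name: foo
+    endpointslice.karmada.io/namespace: default
+spec:
+  workload:
+    manifests:
+    - apiVersion: discovery.k8s.io/v1
+      kind: EndpointSlice
+      metadata:
+        name: foo-abcde              # illustrative name
+        labels:
+          kubernetes.io/service-name: foo
+      addressType: IPv4
+      endpoints:
+      - addresses:
+        - 10.244.1.10                # pod IP in member2, illustrative
+```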
+
+![image](statics/mcs-eps-sync.png)
+
+1. The `endpointsliceDispatch` controller will list&watch `MultiClusterService`.
+1. The `endpointsliceDispatch` controller will list&watch the `EndpointSlice` from the `MultiClusterService`'s `spec.serviceProvisionClusters`.
+1. The `endpointsliceDispatch` controller will create the corresponding Work for each `EndpointSlice` in the cluster namespaces of the `MultiClusterService`'s `spec.serviceConsumptionClusters`.
+   When creating the Work, in order to facilitate problem investigation, we should add the following annotation to record the original `EndpointSlice` information:
+   * `endpointslice.karmada.io/work-provision-cluster`: the cluster name of the original `EndpointSlice`.
+   Also, we should add the following annotations to the synced `EndpointSlice` to record the original information:
+   * `endpointslice.karmada.io/endpointslice-generation`: the resource generation of the `EndpointSlice`; it can be used to check whether the `EndpointSlice` is the newest version.
+   * `endpointslice.karmada.io/provision-cluster`: the cluster location of the original `EndpointSlice`.
+1. Karmada will sync the `EndpointSlice` Work to the member clusters.
+
+One point to note: assume we have the following configuration:
+```yaml
+apiVersion: v1
+kind: Service
+metadata:
+  name: foo
+spec:
+  ports:
+  - port: 80
+    targetPort: 8080
+  selector:
+    app: foo
+---
+apiVersion: networking.karmada.io/v1alpha1
+kind: MultiClusterService
+metadata:
+  name: foo
+spec:
+  types:
+    - CrossCluster
+  serviceProvisionClusters:
+    - member1
+    - member2
+  serviceConsumptionClusters:
+    - member2
+```
-The optimization design for the MultiClusterService API needs to be further iterated and improved, such as fixing the annotation `discovery.karmada.io/strategy` in the spec.
-
-### General Idea
-
-Before delving into the specific design details, let's first take a look from the user's perspective at what preparations they need to make.
-
-First, the user creates a foo Deployment and Service on the Karmada control panel, and creates a PropagationPolicy to distribute them into the member cluster `member1`.
-
-![image](statics/user-operation-01.png)
-
-Second, the user creates an MCS object on the Karmada control plane to enable cross-cluster service `foo`. In this way, the service on cluster `member2` can access the foo Service on cluster member1.
-
-![image](statics/user-operation-02.png)
-
-Then, present our specific plan design.
-
-1. When the `mcs-controller` detects that a user has created a `MultiClusterService` object, it will create a `ServiceExport` object in the Karmada control plane and propagates it to the source clusters. This process involves two issues.
-- How are source clusters determined?
-- How to propagate the `ServiceExport` object?
-
-Detailed explanations are given below:
-
-- There are two ways of thinking about the first question:
-  - We can determine which clusters the target service was propagated to by looking up the `ResourceBinding` associated with the target service, which are the source clusters.
-  - Alternatively, we can just treat all clusters as source clusters. This creates some redundancies, but they can be eliminated in subsequent iterations.
-- There are four ways we can use to propagate `ServiceExport` to member clusters:
-  - Propagated by specifying a `PropagationPolicy`, specifying the source clusters in the `.spec.placement.clusterAffinity.clusterNames` field of the `PropagationPolicy`.
-    - pros:
-      - Ability to reuse previous code to a greater extent.
-    - cons:
-      - `PropagationPolicy` is a user-oriented API that has impact on user perception.
-      - In order to get real-time source clusters information, controller need to watch the `ResourceBinding` object. This drawback no longer exists for the direct way of treating all clusters as source cluster.
-  - Propagated by specifying a `ResourceBinding`, specify the source clusters in the `.spec.clusters` field of the `ResourceBinding`
-    - pros:
-      - Ability to reuse previous code to a greater extent.
-    - cons:
-      - In order to get real-time source clusters information, controller need to watch the `ResourceBinding` object. This drawback no longer exists for the direct way of treating all clusters as source cluster.
-  - Propagated by specifying a set of `Work`s in the namespaces that correspond to the source clusters.
-    - pros:
-      - Clear implementation logic.
-    - cons:
-      - In order to get real-time source clusters information, controller need to watch the `ResourceBinding` object. This drawback no longer exists for the direct way of treating all clusters as source cluster.
-      - Less reuse of code logic, `Work` object needs to be created one by one.
-  - Modify the `.spec.propagateDeps` field of the ResourceBinding associated with the target `Service` object to true, enable the dependency distributor capability, and add the `ServiceExport` resource to the [InterpretDependency](https://karmada.io/docs/next/userguide/globalview/customizing-resource-interpreter/#interpretdependency) resource interpreter of the `Service` resource.
-    - pros:
-      - Code logic reuse is large, do not need to watch ResourceBinding resource changes.
-    - cons:
-      - The controller need to enable the dependency distributor capability of the target `Service` object and maintain it.
-
-Taken together, we can propagate `ServiceExport` to all clusters with the help of `ResourceBinding`.
-
-![image](statics/design-01.png)
-
-2. Depending on the existing MCS atomic capabilities, the `serviceExport` controller and `endpointSlice` controller will collect the `EndpointSlices` related to `foo` Service into the Karmada control plane.
-
-![image](statics/design-02.png)
-
-3. The `mcs-controller` controller propagates `Service` and `EndpointSlice` objects from karmada control-plane to the destination clusters and over-watch synchronizes their changes. Again, this process requires consideration of two issues.
-- How are destination clusters determined?
-- How to propagate the `Service` and `EndpointSlice` object?
-
-> Note: In this scenario, we haven't used the `ServiceImport` object yet, so we don't need to propagate it to the destination clusters.
-
-Detailed explanations are given below:
-
-- We can get the destination clusters from the `.spec.range` field of the `MultiClusterService` object. One thing to consider, however, is that the resources to be propagated may already exist in the destination clusters.
-  - If there is a Service existing on the target cluster, there is no need to resynchronize the EndpointSlices exported from this cluster to the cluster. Only synchronize the EndpointSlices received from other clusters.
-  - If there is no Service on the target cluster, both the Service and the EndpointSlices collected from other clusters need to be synchronized to that cluster.
-- There are three ways we can use to propagate `Service` and `EndpointSlice` to the destination clusters:
-  - Propagated the `Service` and `EndpointSlice` resources by specifying the respective `ResourceBinding`, specify the source clusters in the `.spec.clusters` field of the `ResourceBinding`
-    - pros:
-      - Ability to reuse previous code to a greater extent.
-    - cons:
-      - Since the `Service` object has already been propagated to the source clusters by the user, we need to create a new `ResourceBinding` object to propagate it to the destination clusters.
-  - Propagated the `Service` and `EndpointSlice` resources by specifying the respective set of `Work`s in the namespaces that correspond to the source clusters.
-    - pros:
-      - Clear implementation logic.
-    - cons:
-      - Less reuse of code logic, `Work` object needs to be created one by one.
-  - Modify the `.spec.propagateDeps` field of the ResourceBinding associated with the target `Service` object to true, enable the dependency distributor capability, and add the `EndpointSlice` resource to the [InterpretDependency](https://karmada.io/docs/next/userguide/globalview/customizing-resource-interpreter/#interpretdependency) resource interpreter of the `Service` resource.
-    - pros:
-      - Code logic reuse is large, do not need to watch ResourceBinding resource changes.
-    - cons:
-      - The controller need to enable the dependency distributor capability of the target `Service` object and maintain it.
-      - Since the `Service` object has already been propagated to the source clusters by the user, we need to create a new `ResourceBinding` object to propagate it to the destination clusters.
-
-We need to choose a way, or provide new ideas, to accomplish the propagation of `Service` and `EndpointSlice` resources.
-
-Taken together, we can propagate `Service` and `EndpointSlice` to the destination clusters with the help of `ResourceBinding`.
-
-![image](statics/design-03.png)
-
-At this point, the entire process is complete, and `foo` Service can now be accessed across clusters.
-
-![image](statics/access.png)
+With this configuration, when creating the corresponding Work, Karmada should only sync the `EndpointSlice` that exists in `member1` to `member2`; member2's own `EndpointSlice` does not need to be synced back to itself.
+
+### Components change
+
+#### karmada-controller
+
+* Add the `multiclusterservice` controller to reconcile `MultiClusterService` and Cluster objects, handling creation/deletion/update.
+* Add the `endpointsliceCollect` controller to reconcile `MultiClusterService` and Cluster objects and collect `EndpointSlice` from the `serviceProvisionClusters` as Work.
+* Add the `endpointsliceDispatch` controller to reconcile `MultiClusterService` and Cluster objects and dispatch the `EndpointSlice` Work from `serviceProvisionClusters` to `serviceConsumptionClusters`.
+
+### Status Record
+
+We should have the following conditions in `MultiClusterService`:
+```go
+    MCSServiceAppliedConditionType = "ServiceApplied"
+
+    MCSEndpointSliceCollectedConditionType = "EndpointSliceCollected"
+
+    MCSEndpointSliceAppliedConditionType = "EndpointSliceApplied"
+```
+
+`MCSServiceAppliedConditionType` is used to record the status of `Service` propagation, for example:
+```yaml
+status:
+  conditions:
+  - lastTransitionTime: "2023-11-20T02:30:49Z"
+    message: Service is propagated to target clusters.
+    reason: ServiceAppliedSuccess
+    status: "True"
+    type: ServiceApplied
+```
+
+`MCSEndpointSliceCollectedConditionType` is used to record the status of `EndpointSlice` collection, for example:
+```yaml
+status:
+  conditions:
+  - lastTransitionTime: "2023-11-20T02:30:49Z"
+    message: Failed to list&watch EndpointSlice in member3.
+    reason: EndpointSliceCollectedFailed
+    status: "False"
+    type: EndpointSliceCollected
+```
+
+`MCSEndpointSliceAppliedConditionType` is used to record the status of `EndpointSlice` synchronization, for example:
+```yaml
+status:
+  conditions:
+  - lastTransitionTime: "2023-11-20T02:30:49Z"
+    message: EndpointSlices are propagated to target clusters.
+    reason: EndpointSliceAppliedSuccess
+    status: "True"
+    type: EndpointSliceApplied
+```
+
+### Metrics Record
+
+For better monitoring, we should have the following metrics:
+
+* `mcs_sync_svc_duration_seconds` - The duration of syncing a `Service` from the Karmada control plane to the member clusters.
+* `mcs_sync_eps_duration_seconds` - The time it takes from detecting the `EndpointSlice` to creating/updating the corresponding Work in a specific namespace.
+
+### Development Plan
+
+* API definition, including API files, CRD files, and generated code. (1d)
+* For the `multiclusterservice` controller, list&watch MCS and Service objects, and reconcile the Work in the execution namespaces. (5d)
+* For the `multiclusterservice` controller, list&watch cluster creation/deletion, and reconcile the Work in the corresponding cluster execution namespaces. (10d)
+* For the `endpointsliceCollect` controller, list&watch MCS, collect the corresponding EndpointSlice from `serviceProvisionClusters`, and have the `endpointsliceDispatch` controller sync the corresponding Work. (5d)
+* For the `endpointsliceCollect` controller, list&watch cluster creation/deletion, and reconcile the EndpointSlice's Work in the corresponding cluster execution namespaces. (10d)
+* If a cluster becomes unhealthy, mcs-eps-controller should delete the collected `EndpointSlice` from all cluster execution namespaces. (5d)
### Test Plan

diff --git a/docs/proposals/service-discovery/statics/access.png b/docs/proposals/service-discovery/statics/access.png
deleted file mode 100644
index 5e7cd1853bab..000000000000
Binary files a/docs/proposals/service-discovery/statics/access.png and /dev/null differ
diff --git a/docs/proposals/service-discovery/statics/design-01.png b/docs/proposals/service-discovery/statics/design-01.png
deleted file mode 100644
index f6247e7dfd69..000000000000
Binary files a/docs/proposals/service-discovery/statics/design-01.png and /dev/null differ
diff --git a/docs/proposals/service-discovery/statics/design-02.png b/docs/proposals/service-discovery/statics/design-02.png
deleted file mode 100644
index 53a05529564c..000000000000
Binary files a/docs/proposals/service-discovery/statics/design-02.png and /dev/null differ
diff --git a/docs/proposals/service-discovery/statics/design-03.png b/docs/proposals/service-discovery/statics/design-03.png
deleted file mode 100644
index ba9194c2273c..000000000000
Binary files a/docs/proposals/service-discovery/statics/design-03.png and /dev/null differ
diff --git a/docs/proposals/service-discovery/statics/mcs-eps-collect.png b/docs/proposals/service-discovery/statics/mcs-eps-collect.png
new file mode 100644
index 000000000000..075cbc3c8c6c
Binary files /dev/null and b/docs/proposals/service-discovery/statics/mcs-eps-collect.png differ
diff --git a/docs/proposals/service-discovery/statics/mcs-eps-sync.png b/docs/proposals/service-discovery/statics/mcs-eps-sync.png
new file mode 100644
index 000000000000..388e7413c163
Binary files /dev/null and b/docs/proposals/service-discovery/statics/mcs-eps-sync.png differ
diff --git a/docs/proposals/service-discovery/statics/mcs-svc-sync.png b/docs/proposals/service-discovery/statics/mcs-svc-sync.png
new file mode 100644
index 000000000000..6b86e7f7bb23
Binary files /dev/null and b/docs/proposals/service-discovery/statics/mcs-svc-sync.png differ
diff --git a/docs/proposals/service-discovery/statics/user-operation-01.png b/docs/proposals/service-discovery/statics/user-operation-01.png
deleted file mode 100644
index 7e445a7ffcac..000000000000
Binary files a/docs/proposals/service-discovery/statics/user-operation-01.png and /dev/null differ
diff --git a/docs/proposals/service-discovery/statics/user-operation-02.png b/docs/proposals/service-discovery/statics/user-operation-02.png
deleted file mode 100644
index d2b63ab821be..000000000000
Binary files a/docs/proposals/service-discovery/statics/user-operation-02.png and /dev/null differ