
Introduce trafficDistribution field for Kubernetes Services #123487

Merged
merged 7 commits into kubernetes:master
Mar 5, 2024

Conversation

gauravkghildiyal
Member

@gauravkghildiyal gauravkghildiyal commented Feb 24, 2024

What type of PR is this?

/kind feature
/kind api-change

What this PR does / why we need it:

This is an initial implementation of the changes discussed in KEP-4444 (a new field offering greater control over traffic distribution to Service endpoints).

Special notes for your reviewer:

We were planning on finalizing the names during the implementation. Given that no one liked the initial field name (routingPreference), I've chosen a different one from the discussions:

Field name: trafficDistribution
Allowed values: PreferClose (not specifying any value results in default behaviour)
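
For illustration (not part of this PR's diff), a minimal sketch of a Service using the new field, assuming the `k8s.io/api` and `k8s.io/utils/ptr` packages from the release this ships in, and with the `ServiceTrafficDistribution` feature gate enabled server-side:

```go
package main

import (
	"fmt"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/utils/ptr"
)

func main() {
	// A nil TrafficDistribution keeps the default behaviour; "PreferClose"
	// expresses a preference for topologically close endpoints.
	svc := corev1.Service{
		ObjectMeta: metav1.ObjectMeta{Name: "my-service"},
		Spec: corev1.ServiceSpec{
			TrafficDistribution: ptr.To("PreferClose"),
		},
	}
	fmt.Println(*svc.Spec.TrafficDistribution) // PreferClose
}
```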

Does this PR introduce a user-facing change?

A new (alpha) field, `trafficDistribution`, has been added to the Service `spec`.
This field provides a way to express preferences for how traffic is distributed to the endpoints for a Service.
It can be enabled through the `ServiceTrafficDistribution` feature gate.

Additional documentation e.g., KEPs (Kubernetes Enhancement Proposals), usage docs, etc.:

- [KEP]: https://github.com/kubernetes/enhancements/tree/master/keps/sig-network/4444-service-traffic-distribution

/sig network
/priority important-soon

@k8s-ci-robot k8s-ci-robot added release-note Denotes a PR that will be considered when it comes time to generate release notes. kind/feature Categorizes issue or PR as related to a new feature. size/XXL Denotes a PR that changes 1000+ lines, ignoring generated files. kind/api-change Categorizes issue or PR as related to adding, removing, or otherwise changing an API sig/network Categorizes an issue or PR as relevant to SIG Network. priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. area/code-generation area/kube-proxy sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/apps Categorizes an issue or PR as relevant to SIG Apps. labels Feb 24, 2024
@gauravkghildiyal
Member Author

/assign @danwinship
/assign @robscott
/assign @thockin

@k8s-triage-robot

This PR may require API review.

If so, when the changes are ready, complete the pre-review checklist and request an API review.

Status of requested reviews is tracked in the API Review project.

@sftim
Contributor

sftim commented Feb 25, 2024

Changelog suggestion

-A new field, `trafficDistribution`, has been added to the Service `spec`. This field provides a way to express preferences for how traffic is distributed to Service endpoints. It can be enabled through the `ServiceTrafficDistribution` feature gate.
+A new (alpha) field, `trafficDistribution`, has been added to the Service `spec`.
+This field provides a way to express preferences for how traffic is distributed to the endpoints for a Service.
+It can be enabled through the `ServiceTrafficDistribution` feature gate.

func validateServiceTrafficDistribution(service *core.Service) field.ErrorList {
allErrs := field.ErrorList{}

if service.Spec.TrafficDistribution == nil || *service.Spec.TrafficDistribution == "" {
Member

ok, it seems we are not going to use a default value and will use nil as the default. I was thinking that was the best option too, and it makes it easy to implement backwards compatibility and skew support

Member Author

Yep that is the case!

Member

Re-open.

I'm not sure unspecified makes sense in the new naming, and if it does make sense we should not equate "" with unspecified. "" is a value.
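
For readers without the full diff loaded, here is a plausible completion of the truncated hunk above, treating nil as "unspecified" per this thread. This is a sketch, not necessarily the PR's exact body:

```go
package validation

import (
	"k8s.io/apimachinery/pkg/util/validation/field"
	"k8s.io/kubernetes/pkg/apis/core"
)

// validateServiceTrafficDistribution sketches the function anchored above:
// nil means "unspecified" and passes validation, while any set value must be
// a supported one.
func validateServiceTrafficDistribution(service *core.Service) field.ErrorList {
	allErrs := field.ErrorList{}

	if service.Spec.TrafficDistribution == nil {
		return allErrs
	}

	// "PreferClose" is the only allowed value in this PR.
	supported := []string{"PreferClose"}
	if *service.Spec.TrafficDistribution != "PreferClose" {
		allErrs = append(allErrs, field.NotSupported(
			field.NewPath("spec").Child("trafficDistribution"),
			*service.Spec.TrafficDistribution,
			supported,
		))
	}
	return allErrs
}
```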

Comment on lines +75 to +77
func endpointReady(endpoint discoveryv1.Endpoint) bool {
return endpoint.Conditions.Ready != nil && *endpoint.Conditions.Ready
}
Member

How will this play with terminating endpoints @robscott ?

The feature is critical for rolling updates. I think it is fair to say that users will expect it to keep working once hints are enabled. Do we need to consider them for the hints, or only in the proxy implementation?

Member

ok, it is answered later in the unit tests https://github.com/kubernetes/kubernetes/pull/123487/files#r1503894252, it seems externalTrafficPolicy: Local disables hints

@@ -50,6 +51,9 @@ type Reconciler struct {
// topologyCache tracks the distribution of Nodes and endpoints across zones
// to enable TopologyAwareHints.
topologyCache *topologycache.TopologyCache
// trafficDistributionEnabled determines if the trafficDistribution field is
// to be considered when reconciling EndpointSlice hints.
trafficDistributionEnabled bool
Member

do we need to make this an option?
If this is going to be a field, this will be assumed to be always true, and governed by the feature gate until then.

Member Author

By "option", did you mean something like reconciler := NewReconciler(other_params, WithTrafficDistributionEnabled())?

  • If not, can you please elaborate? Yes, the idea here was for this to be a feature-gate equivalent, similar to what topologyCache does for topology hints to some extent...a nil topologyCache means the feature gate is disabled.
  • If yes, doesn't it needlessly make this more complex for just one option?

Member

in the future when the feature gate graduates this will be always true, because the feature gate will be removed, so there is no need to pass a parameter

Contributor

I think the question is: why is the feature gate passed as an argument to the reconciler? is it because feature module is not available to k8s.io/endpointslice?

Member Author

I think the question is: why is the feature gate passed as an argument to the reconciler? is it because feature module is not available to k8s.io/endpointslice?

@khenidak -- Yes that is the intent -- the package is used independently as well (outside the kube-controller-manager binary), which means it's not helpful to explicitly create a dependency from the k8s.io/endpointslice package to the featuregates. Hence the need to pass a flag.

in the future when the feature gate graduates this will be always true, because the feature gate will be removed, so there is no need to pass a parameter

@aojea -- Thanks, I understood the intent and have made the change. My thoughts initially were to eventually remove this additional parameter from NewReconciler when it is no longer needed -- I was hoping such a change to the package's API surface would be one which independent users of this package could easily accommodate. But I agree that if we can avoid such things at a lower cost, then why not.

Member

changes to public APIs are disruptive for users, as they cause conflicts when they revendor.
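
A minimal sketch of the "options" construct under discussion; `WithTrafficDistributionEnabled` is the name floated above, and everything else here is an assumption rather than the PR's exact code:

```go
package endpointslice

// Reconciler is trimmed to the one field relevant here; the real struct
// (see the diff above) has several more.
type Reconciler struct {
	trafficDistributionEnabled bool
}

// Option configures a Reconciler at construction time.
type Option func(*Reconciler)

// WithTrafficDistributionEnabled lets the caller (e.g. kube-controller-manager)
// pass in the feature gate's value, so k8s.io/endpointslice itself never has
// to depend on the feature-gate machinery.
func WithTrafficDistributionEnabled(enabled bool) Option {
	return func(r *Reconciler) {
		r.trafficDistributionEnabled = enabled
	}
}

// NewReconciler applies options over defaults.
func NewReconciler(opts ...Option) *Reconciler {
	r := &Reconciler{}
	for _, o := range opts {
		o(r)
	}
	return r
}
```

With this shape, retiring the option when the feature gate graduates does not force every caller of NewReconciler to change, which addresses the revendoring concern.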

if endpoint.Zone != nil {
zone = *endpoint.Zone
}
slice.Endpoints[i].Hints = &discoveryv1.EndpointHints{ForZones: []discoveryv1.ForZone{{Name: zone}}}
Contributor

s/slice.Endpoints[i].Hints/endpoint.Hints/

Contributor

also, are we ok with empty hints?

Member Author

s/slice.Endpoints[i].Hints/endpoint.Hints/

The intent is to modify the value in the slice. Using endpoint.Hints would just modify the copied endpoint (and not the one in the slice), which would be incorrect.

also, are we ok with empty hints?

Yes you are right, we would be okay with that!
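
To make the range-copy point concrete, a small self-contained sketch (a hypothetical helper, not the PR's code; it assumes the surrounding loop ranges over `slice.Endpoints`):

```go
package endpointslice

import discoveryv1 "k8s.io/api/discovery/v1"

// addSelfZoneHints exists only to show why the write goes through
// slice.Endpoints[i] rather than the range variable.
func addSelfZoneHints(slice *discoveryv1.EndpointSlice) {
	for i, endpoint := range slice.Endpoints {
		zone := ""
		if endpoint.Zone != nil {
			zone = *endpoint.Zone // reading the loop copy is fine
		}
		// endpoint.Hints = ... would only mutate the loop-local copy;
		// indexing mutates the element actually stored in the slice.
		slice.Endpoints[i].Hints = &discoveryv1.EndpointHints{
			ForZones: []discoveryv1.ForZone{{Name: zone}},
		}
	}
}
```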


// needsUpdate returns true if any ready endpoint in the slice has a
// missing or incorrect hint.
func (preferCloseHeuristic) needsUpdate(slice *discoveryv1.EndpointSlice) bool {
Contributor

why do we need to separate needsUpdate(..) and update(..)?

Member Author

Good question! The answer to this is how ReconcileHints is structured:

  • needsUpdate helps us decide when an EndpointSlice needs an update so that we can deep-copy it (and not modify the original)
  • update helps us modify an EndpointSlice in place.

There is a case where we DON'T need to deep-copy and can just modify the original EndpointSlice (viz. slicesToCreate), so having this separation helps us use them independently.
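
A sketch of that flow; the `hintsHeuristic` interface and `applyHints` helper here are hypothetical glue, with `needsUpdate`/`update` matching the methods discussed above:

```go
package endpointslice

import discoveryv1 "k8s.io/api/discovery/v1"

// hintsHeuristic is a hypothetical interface matching the two methods
// discussed above (preferCloseHeuristic implements both in the PR).
type hintsHeuristic interface {
	needsUpdate(slice *discoveryv1.EndpointSlice) bool
	update(slice *discoveryv1.EndpointSlice)
}

// applyHints: cached slices are deep-copied only when a hint is missing or
// wrong, while slices that are about to be created are mutated in place.
func applyHints(h hintsHeuristic, slicesToUpdate, slicesToCreate []*discoveryv1.EndpointSlice) []*discoveryv1.EndpointSlice {
	var updated []*discoveryv1.EndpointSlice
	for _, s := range slicesToUpdate {
		if h.needsUpdate(s) {
			clone := s.DeepCopy() // never mutate the informer-cache copy
			h.update(clone)
			updated = append(updated, clone)
		}
	}
	for _, s := range slicesToCreate {
		h.update(s) // not yet persisted, safe to mutate in place
	}
	return updated
}
```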

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2024
@gauravkghildiyal
Member Author

/unhold Added the "options" construct.

Would need an lgtm again from someone here :)

@gauravkghildiyal
Member Author

/unhold

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Mar 4, 2024
@robscott
Member

robscott commented Mar 4, 2024

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Mar 4, 2024
@k8s-ci-robot
Contributor

LGTM label has been added.

Git tree hash: bdec775adc250a85d8f2bbdaa8e1df7c9827bac3

@thockin
Member

thockin commented Mar 5, 2024

I interpret us making PreferClose mean PreferZone as trying to somewhat balance the risks and meet half way.

I think we need to find a way to offer PreferNode, and I'd prefer (ha!) not to have 4 different traffic-control knobs.

@pacoxu
Member

pacoxu commented Mar 5, 2024

/test pull-kubernetes-node-e2e-containerd

@gauravkghildiyal
Member Author

I think we need to find a way to offer PreferNode, and I'd prefer (ha!) not to have 4 different traffic-control knobs.

I think we should be able to use the trafficDistribution field to offer prefer-node-behaviour as well -- I certainly agree that we should/would not need a 4th knob for such a thing.

Assuming I'm understanding the concern correctly, the question is whether:

  • PreferClose (which for now offers prefer-zone-behaviour) could in the future mean a prefer-node-behaviour
    OR
  • We would need to introduce a separate PreferNode in addition to the existing PreferClose

...this exact thing is something we've delved into in the past at https://github.com/kubernetes/enhancements/pull/4445#discussion_r1475298305. It's a long conversation, but certainly one of the very insightful ones, and worth a re-read.

Pulling in the thoughts from that conversation here for a more structured comparison of the choice we have made (viz. defining PreferClose instead of PreferZone) and how it affects future decisions. I, for one, feel like we have made the right choice with using PreferClose (although, I agree, initially I may have been the one advocating using PreferZone).

PreferClose

Pros

  • Having a generic term like PreferClose means that implementations are free to interpret it as they want and offer a certain level of "closeness" which they are comfortable offering without taking a "huge" amount of risk.

    • For kube-proxy, this would mean that PreferClose=prefer-zone-behaviour is something that we deemed okay
    • For some other implementation, they might be able to offer a better PreferClose=prefer-node-and-waterfall-based-on-load-behaviour while still not making it too risky (maybe that implementation can monitor backend traffic and fall back when appropriate)

    ...The advantage being that the same PreferClose word is suitable for both the "naive" (kube-proxy) and the "smart" (some-future-implementation) implementation. This encourages standardization, where the end-user can use PreferClose across implementations and expect somewhat good results (in the form of improved latency, or cost savings by avoiding cross-zone traffic).

Cons

  • For an implementation like kube-proxy, which initially delivered PreferClose=prefer-zone-behaviour, it may not be seen as appropriate to have a behaviour change and start delivering PreferClose=prefer-node-behaviour (without also adding any kind of load-based feedback to avoid node-local overload), since this certainly increases the risk. In other words, if kube-proxy were to start supporting prefer-node-behaviour without having any feedback mechanism, a new PreferNode would need to be introduced to the API. This means that the new set of supported values would be [PreferClose, PreferNode], and an argument could be made that the choices have overlapping behaviour and could be seen as confusing.

If we really treat this as a valid concern, a possible solution here would be the introduction of an implementation-specific heuristic for kube-proxy to deliver prefer-node-behaviour (something like kube-proxy.io/PreferNode).

PreferZone

Pros

The advantage with only having precise terms like PreferZone is that if in the future kube-proxy wants to offer prefer-node-behaviour, it only has the option of adding a new choice PreferNode to the API, and the new possible set of values [PreferZone, PreferNode] doesn't have any overlapping behaviour (which may be considered less confusing).

Cons

This choice forces smarter implementations to only be able to deliver prefer-zone-behaviour, even though they could have delivered a prefer-node-and-waterfall-based-on-load-behaviour. The implementation would either have to:

  • define an "implementation-specific-heuristic" to deliver such a thing (which results in less standardization across implementations)
  • or wait for such terminology to be introduced to the standard Kubernetes API (which could take ~a year)

@k8s-ci-robot k8s-ci-robot merged commit a76a3e0 into kubernetes:master Mar 5, 2024
15 checks passed
@k8s-ci-robot k8s-ci-robot added this to the v1.30 milestone Mar 5, 2024
lzhecheng added a commit to lzhecheng/cloud-provider-azure that referenced this pull request Mar 19, 2024
There's a new field called "trafficDistribution" in k/k, whose e2e
test needs 3 zones.

kubernetes/kubernetes#123487

Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>
lzhecheng added a commit to lzhecheng/cloud-provider-azure that referenced this pull request Mar 19, 2024
There's a new field called "trafficDistribution" in k/k, whose e2e
test needs 3 zones.

feature: kubernetes/kubernetes#123487
e2e: kubernetes/kubernetes#123812

Signed-off-by: Zhecheng Li <zhechengli@microsoft.com>
@gyutaeb

gyutaeb commented Jun 5, 2024

The KEP-4444 link has changed.
