Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Locality endpoint discovery service (LEDS) #10373

Open
mattklein123 opened this issue Mar 12, 2020 · 28 comments
Open

Locality endpoint discovery service (LEDS) #10373

mattklein123 opened this issue Mar 12, 2020 · 28 comments

Comments

@mattklein123
Copy link
Member

A group of us met this morning to discuss how to move the EGDS issue and PR forward:

The consensus at this point is that we would like to hold on working on and merging EGDS as currently specified and instead develop something that we are tentatively calling IEDS (Individual Endpoint Discovery Service). Note that better name suggestions are appreciated here!

The general idea of IEDS will be to allow a sub-API of EDS to provide individual endpoints within a defined locality/priority. This will avoid quite a bit of complexity around shuffling endpoints within localities/priorities. Additionally, localities/priorities rarely change in a deployment.

The high level flow of the IEDS API will be:

  1. Allow the lb_endpoints to either be specified inline as today or instead pointed at a config source that configures the IEDS endpoint
    repeated LbEndpoint lb_endpoints = 2;
  2. Determine how the IEDS resource name will be populated. Some proposals have included adding a namespace field. I'm not as clear on this part so I will defer to @htuch and @markdroth on what they want to do here. From my perspective it would be OK to have the config source specify a namespace and then have that namespace passed as the resource name in the discovery request:
    repeated string resource_names = 3;
  3. Modify the EDS code to work properly with the sub-API in which we properly do init manager init as well as populate/update the hosts in a locality/priority. This should work correctly with delta/incremental also.

To avoid confusion, note that we are not tackling merging/patching as mentioned here: #8400. That is a much larger project that should not block this effort.

cc @envoyproxy/maintainers @htuch @markdroth @snowp @wgallagher @tomwans @gengleilei @hzxuzhonghu @seflerZ @lambdai @howardjohn @ramaraochavali

@tomwans
Copy link
Contributor

tomwans commented Mar 13, 2020

IEDS can be confused with delta/incremental xDS. Naming suggestion: How about Locality-EDS (LEDS)? I don't have full context, but my understanding is this xDS resource represents a set of endpoints within a locality?

@mattklein123
Copy link
Member Author

I like LEDS as a name option. Let's see if anyone has any other ideas.

@markdroth
Copy link
Contributor

Another naming suggestion would be Named Endpoint Discovery Service (NEDS).

@ramaraochavali
Copy link
Contributor

+1 for LEDS

@seflerZ
Copy link

seflerZ commented Mar 14, 2020 via email

@mattklein123
Copy link
Member Author

Is that mean that we can assign a config source for each endpoint and provide resource name for it?

For each locality/priority. So the LEDS resource is LbEndpoint. The management server could send all of them as SoTW, or it can do incremental updates.

One other thing that occurs to me is that if we implement this correctly, many users would probably end up using static clusters with fixed endpoint.v3.ClusterLoadAssignment load_assignment which would call out to LEDS for each locality.

@seflerZ
Copy link

seflerZ commented Mar 16, 2020 via email

@gengleilei
Copy link
Contributor

@mattklein123 I understand that the LEDS cannot reduce the amount of data transmission between the management server and the Envoy through the LEDS without the realization of delta XDS. In the case of non-delta XDS, when one of the endpoints in the locality changes, the whole data should still be pushed to the Envoy, and the use of LEDS will bring some complexity instead.
As far as I know, the management server cannot support delta XDS in a short time. Currently, I think the practicality of LEDS is not strong under the current situation.

@htuch
Copy link
Member

htuch commented Mar 16, 2020

@gengleilei are you talking about Pilot? LEDS will require delta xDS support, I believe Pilot must have some support here already from the VHDS work, CC @brian-avery @costinm

@howardjohn
Copy link
Contributor

I don't think any of the VHDS work has merged yet. But we shouldn't make Envoy decisions due to limitations in Pilot's implementation. I don't think there is any reason we cannot support delta xDS, we just haven't yet. We can have delta XDS just for LEDS and still use SOTW XDS for ADS, right? If so then it won't be too hard to implement I think.

@mattklein123
Copy link
Member Author

LEDS will require delta xDS support

Pedantically it won't require delta xDS support (e.g. if we implement it right per above we could use it with static clusters and get mostly the same behavior as today), but it will require delta support for the CPU savings that you are looking for.

@htuch
Copy link
Member

htuch commented Mar 16, 2020

@howardjohn yeah, I think having delta xDS (on a distinct connection to the ADS one) just for LEDS would be fine.

@mattklein123 the protocol extension for namespace/attribute that is in the proposal above requires that we add something to the discovery request message. Are you suggesting we would do this for SoTW DiscoveryRequest as well as incremental?

@mattklein123
Copy link
Member Author

Are you suggesting we would do this for SoTW DiscoveryRequest as well as incremental?

I would recommend consistency here since I don't see any reason not to have it?

@htuch
Copy link
Member

htuch commented Mar 16, 2020

@mattklein123 really depends on use case, all things being equal we're pushing more towards putting complexity/advanced stuff in incremental xDS and leaving SoTW simple. It might make sense for LDS in proxyless servers in SoTW though, so probably fine to do it in both places.

@seflerZ
Copy link

seflerZ commented Mar 17, 2020

@mattklein123 Would you mind to paste the IEDS protobuf definition onto this issue?

@markdroth
Copy link
Contributor

Harvey and I chatted about this and agreed to go with the namespace approach. I'll let him propose the specific proto changes.

htuch added a commit to htuch/envoy that referenced this issue Apr 8, 2020
This is an RFC for the introduction of namespaces to xDS. The idea is
that we now have structured resource naming (when opting in) of the form
<authority>/<qualifiers>/<opaque resource name>. The resource name is
present in the ConfigSource, reflected in discovery requests and used to
filter resources in discovery responses by a client.

This feature is backported from the UDPA-TP strawman and is intended to
support envoyproxy#10373, gRPC xDS
resource selection and federation use cases.

Signed-off-by: Harvey Tuch <htuch@google.com>
@mattklein123 mattklein123 changed the title Individual endpoint discovery service (IEDS working name) Locality endpoint discovery service (LEDS) May 17, 2020
@gnz00

This comment has been minimized.

@htuch
Copy link
Member

htuch commented May 20, 2020

@gnz00 the second statement "doesn’t handle the scenario where there is a large number of endpoints with the same locality attribute" is true for SotW LEDS xDS, but for delta LEDS xDS each endpoint in a locality would be a first class xDS resource under this proposal, capable of independent delivery.

On-demand EDS is an interesting thought experiment; what criteria would a fetch be made with? When we have too few available connections? When existing backends are too heavily loaded? For subset LB, when we miss on a subset?

@gnz00

This comment has been minimized.

@htuch
Copy link
Member

htuch commented May 21, 2020

VHDS can return routes with unknown clusters today, but there is no corresponding on-demand CDS implementation. That would be pretty reasonable to add though.

I think it depends somewhat on what dimension of scalability we're talking. If it's O(20k) endpoints and a single cluster/route, then we needs something like LEDS. If it's O(20k) clusters with relatively few hosts, then on-demand CDS is going to be more attractive.

BTW, the point regarding subsetting is well taken. LEDS, on-demand CDS, etc. are for those situations where subsetting doesn't work or the sizes of configs are still too large even after subsetting.

@gnz00

This comment has been minimized.

@mattklein123
Copy link
Member Author

Can we please split the most recent part of this conversation into a new issue? It's not really related to the LEDS proposal.

Building on-demand CDS directly into Envoy is a fine thing to tackle. It's actually possible to build today without any core changes (some people have built FaaS systems this way using Envoy) but no one has done it in OSS.

@htuch
Copy link
Member

htuch commented Dec 17, 2020

The design here should ideally reflect kubernetes/enhancements#2094 (and specifically ideas such as kubernetes/enhancements#2094 (comment)) when understanding endpoint scalability requirements.

@adisuissa
Copy link
Contributor

Here's LEDS design doc.
PTAL, comments are welcome.

@stevenzzzz
Copy link
Contributor

cc @stevenzzzz

@mpuncel
Copy link
Contributor

mpuncel commented Jun 10, 2022

This is a very exciting feature! It looks like work on it has paused?

I also wanted to ask 2 design questions (which should be relevant even if this work isn't picked up again until later).

  1. Should this also support DNS discovery at the locality level as an alternative to xDS? DNS is only cluster-wide right now, so you can't take advantage of locality weights or priorities and use DNS discovery.
  2. Will this have similar consistency characteristics as EDS and RDS do today? Namely, if the EDS record changes, will Envoy require a LEDS update for each locality before it can exit "warming". For more details on this, see xds: Consistency while handling CDS updates and EDS #13009 where I believe the consensus is we want Envoy to retain the most recent sub-resource when the outer resource changes.

@adisuissa
Copy link
Contributor

LEDS integration was done in: #18195

@adisuissa
Copy link
Contributor

I also wanted to ask 2 design questions (which should be relevant even if this work isn't picked up again until later).

  1. Should this also support DNS discovery at the locality level as an alternative to xDS? DNS is only cluster-wide right now, so you can't take advantage of locality weights or priorities and use DNS discovery.

Not sure I fully grok your question, but I think this requires an EDS-cluster have named hosts (DNS), and this is not supported at the moment.

  1. Will this have similar consistency characteristics as EDS and RDS do today? Namely, if the EDS record changes, will Envoy require a LEDS update for each locality before it can exit "warming". For more details on this, see xds: Consistency while handling CDS updates and EDS #13009 where I believe the consensus is we want Envoy to retain the most recent sub-resource when the outer resource changes.

There is warming between EDS->LEDS, and there is no requirement to resend the already sent locality-weights, only the ones that were updated.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests