Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement the EGDS protocol #10023

Closed
wants to merge 11 commits into from

Conversation

gengleilei
Copy link
Contributor

Signed-off-by: leilei.gll leilei.gll@alibaba-inc.com
Signed-off-by: jingyi.lyq jingyi.lyq@alibaba-inc.com

Description: Implement the EGDS protocol
Risk Level: Medium
Testing: unit test, integration
Release Notes: added support for the EGDS protocol
[Optional Fixes #Issue] #9455
[Optional Deprecated:]

the draft design document is available at below link.
The draft design can be seen here.

The design file for EGDS in Pilot is here.

@repokitteh-read-only
Copy link

CC @envoyproxy/api-shepherds: Your approval is needed for changes made to api/.

🐱

Caused by: #10023 was opened by gengleilei.

see: more, trace.

@gengleilei gengleilei requested a review from lizan February 12, 2020 09:59
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: jingyi.lyq <jingyi.lyq@alibaba-inc.com>
parent_.group_.value().name(), new_hosts.size(), host_set.mutableHosts().size());
HostVector hosts_added;
HostVector hosts_removed;
const bool hosts_updated = parent_.parent_.cluster_.updateDynamicHostList(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe this implies that each group update contains the full list of endpoints for this priority, which would limit the usefulness of groups to only define disjoint priorities (priorities cannot span groups). Is this correct/intentional? If not, how are changes to the same priority defined by multiple groups handled?

I made a comment on the design doc about this as well, can we make sure that we have clarity on the exact invariants imposed by the groups? I think it would be helpful to iron out these details before getting this PR reviewed

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Multiple groups that can define the same priority, can also be different, because activeEndpointGroupMonitor just calls the updateDynamicHostList function to compute the updated host (get added, deleted hosts), EdsCluster then uses this data to update its priority_set and call the original updateClusterPrioritySet function to complete the update.

Yes, it is necessary to discuss those details. I have replied in the design document.

Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
}

// New xDS payload. The size of endpoints is fully arbitrary.
// Also the Egds requests to Pilot can be merged into one to boost efficiency
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably shouldn't mention Pilot in Envoy API

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, I deleted it

Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Copy link
Contributor

@ramaraochavali ramaraochavali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left some api comments/questions

service EndpointGroupDiscoveryService {
option (envoy.annotations.resource).type = "envoy.api.v2.EndpointGroup";

// The resource_names field in DiscoveryRequest specifies a list of clusters
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: /s/clusters/endpoint groups

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@@ -114,4 +115,28 @@ message ClusterLoadAssignment {

// Load balancing policy settings.
Policy policy = 4;

// References endpoint groups by name.
repeated Egds endpoint_groups = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can a ClusterLoadAssignment send both Endpoints and EndpointGroups? Or either of them? If either of them we should probably use oneof ? I am assuming this to work similar to how Rds and route_config work for Http Connection Manager.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it does impact cluster warming and Envoy init. For EDS type of cluster, a cluster is considered warmed, if it gets at least one Endpoint Response. With this change, we should consider a cluster is fully warmed, only after receiving responses to all endpoint groups?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ClusterLoadAssignment does not recommend sending Endpoints and EndpointGroups together, and while the code implementation can be done, it can cause confusion when applied.

If EGDS is used, I think it is necessary to initialize the cluster after receiving the response from all the endpoint groups. Otherwise, how should the envoy handle the initialization state of the cluster ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since named_points is there, we probably should not move to union { endpoints, named_endpoints, endpoint_groups}.
That says we need extra document/comment here to describe the two fields

}

// New xDS payload. The size of endpoints is fully arbitrary.
message EndpointGroup {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Can you correct the comment above and explain what EndpointGroup is?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, it's been modified.


// Map of named endpoints that can be referenced in LocalityLbEndpoints.
// [#not-implemented-hide:]
map<string, endpoint.Endpoint> named_endpoints = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This field is not currently in use, but I think it should be retained because the endpoint group just groups the endpoints and the original definition should not be changed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's a great chance to declare endpoints and named_endpoints as union.
protobuf doesn't support a field both union and repeated but it's viable to embed endpoints in another message StructEndpoints, and make

  union {
    StructEndpoints                          endpoints = 1
    map<string, endpoint.Endpoint> named_endpoints = 2;
  }
`
WDYT?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.


// The EndpointGroup request configuration.
message Egds {
// Configuration for the source of EndpointGroup updates for this Cluster.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: /s/Cluster/Endpoint Group

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Signed-off-by: leilei.gll <leilei.gll@alibaba-inc.com>
Copy link
Contributor

@lambdai lambdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

flush thoughts 1/N

@@ -114,4 +115,28 @@ message ClusterLoadAssignment {

// Load balancing policy settings.
Policy policy = 4;

// References endpoint groups by name.
repeated Egds endpoint_groups = 6;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since named_points is there, we probably should not move to union { endpoints, named_endpoints, endpoint_groups}.
That says we need extra document/comment here to describe the two fields


// Map of named endpoints that can be referenced in LocalityLbEndpoints.
// [#not-implemented-hide:]
map<string, endpoint.Endpoint> named_endpoints = 3;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO it's a great chance to declare endpoints and named_endpoints as union.
protobuf doesn't support a field both union and repeated but it's viable to embed endpoints in another message StructEndpoints, and make

  union {
    StructEndpoints                          endpoints = 1
    map<string, endpoint.Endpoint> named_endpoints = 2;
  }
`
WDYT?


// Start update the hosts changes in the Endpoint Group to the cluster store.
const auto& host_set = priority_set_.getOrCreateHostSet(priority, overprovisioning_factor);
HostVectorSharedPtr current_hosts_copy(new HostVector(host_set.hosts()));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC one of the goals is to scale hosts to 10x. Not sure if the vector is the best container here


// Update the added host.
current_hosts_copy->insert(current_hosts_copy->end(), hosts_added.begin(), hosts_added.end());
for (auto it = hosts_added.begin(); it != hosts_added.end(); ++it) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: for ( const auto& host : hosts_added)

// Update the added locality weights.
for (const auto& pair : new_locality_weights_map) {
if (!locality_weights_map_[priority].count(pair.first)) {
locality_weights_map_[priority].insert({pair.first, pair.second});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mark. Lost in this structure

}

ASSERT(std::all_of(current_hosts_copy->begin(), current_hosts_copy->end(),
[&](const auto& host) { return host->priority() == priority; }));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: capture only the needed priority

@mattklein123
Copy link
Member

mattklein123 commented Feb 26, 2020

Hey folks,

Before we continue on the in-depth review I would like to provide a few of my own thoughts here after reading this PR as well as the spec and talking with @snowp.

My general feeling is that the EGDS proposal is solving a very specific use case (management server driven slices) and adds a substantial amount of complexity to Envoy, while also adding complexity to the management server at the same time. It's not clear to me that this is the right tradeoff, as we have already identified a lot of corner cases (priority, etc.) that might not be handled by this code.

After thinking about this, my personal opinion is that instead we should invest in a true delta EDS host solution. The way I envision this working is that the EDS response can either contain an inline repeated LocalityLbEndpoints list OR a config source that points to a new delta API that can stream host changes (adds/removes/updates).

I also envision us modifying the delta API to allow for a keyframe / i-frame approach to delivering delta updates that would allow the management server to resync the client as it wishes. This would help management server complexity come down quite a lot in many cases (obviously this can be done orthogonally).

I also think that longer term we can look at implementing a lot of this more complex delta / subset logic directly inside https://github.com/envoyproxy/xds-relay once that gets going.

I suggest we pause this review for now and potentially either take this to the design doc and/or discuss this in the next Envoy community meeting? Thoughts? cc @envoyproxy/maintainers @envoyproxy/api-shepherds @wgallagher @lambdai @tomwans @gengleilei jessicayuen

Copy link
Contributor

@lambdai lambdai left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EdsClusterImpl seems straight though I am not familiar with the existing EdsCluster requirements

EgdsApiImpl(EndpointGroupsManager& endpoint_group_mananger,
ProtobufMessage::ValidationVisitor& validation_visitor,
Config::SubscriptionFactory& subscription_factory, Stats::Scope& scope,
const Protobuf::RepeatedPtrField<envoy::config::endpoint::v3::Egds>& egds_list);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How about changing the last parameter to absl::Span<envoy::config::endpoint::v3::Egds* const> egds_list ?
Not couple the constructor with a RepeatedPtrField

Comment on lines +28 to +32
if (config_source_map.count(egds.config_source())) {
config_source_map[egds.config_source()].insert(egds.endpoint_group_name());
} else {
config_source_map[egds.config_source()] = {egds.endpoint_group_name()};
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm, isn't equivalent to config_source_map[egds.config_source()].insert(egds.endpoint_group_name())?

Comment on lines +103 to +105
EgdsApiPtr egds_api_;
EgdsClusterMapperPtr egds_cluster_mapper_;
EndpointGroupsManagerPtr endpoint_group_manager_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These members plus the subscription factory and cluster manager are introduced to tackle the endpoint_groups but add very little value to the existing EdsClusterImpl. A strong signal is that these are rarely used except the inialization.
My impression is that these can be extracted.

virtual bool clearEndpointGroup(absl::string_view name, absl::string_view version_info) PURE;
virtual bool
batchUpdateEndpointGroup(const std::vector<envoy::config::endpoint::v3::EndpointGroup>& added,
const std::vector<std::string> removed,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason removed cannot be reference?

Comment on lines +242 to +244
priority_state_manager.updateClusterPrioritySet(priority, std::move(current_hosts_copy),
hosts_added, hosts_removed, absl::nullopt,
overprovisioning_factor);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: move it to caller?

@seflerZ
Copy link

seflerZ commented Mar 1, 2020

@mattklein123 Thanks so much for your suggestion. From my view I want make some explanation about the EGDS:

  1. The EGDS was thoroughly discussed in both Istio and Envoy community, you can find the issues here and here.
  2. We (including you) have discussed the Delta xDS and EGDS approaches. We make conclusion to implement this feature by introducing EGDS because:
    1. The xDS Delta is intend to achieve incremental updates in xDS sequences, not patch on a single resource. As you know the EDS has no sub-resource at present. Unless we make modification on the design to Delta xDS. No patch can be applied.
    2. The xDS Delta is not that stable and tested on production environment(Istio hasn't implemented this). Introducing patching to Delta xDS will cost big effort and increase complexity, as we've discussed. And this will introduce modification on Delta xDS protocol, which won't be easy work.
    3. I personally think introducing a side resource like EGDS won't have much side effect on current EDS handling process. As we've implemented in our PR. You can switch off EGDS at any time by just changing an attribute in the config. If we have another better solution in the future, we can remove it easily.
  3. I don think the EGDS would not be a specific use case because it solves a generic scaling problem. When you cluster grows large, there will be 1000+ endpoints within a single EDS. It wouldn't be much effective and wisely to do fully updates if only 1~2 endpoint(s) was changed.
  4. The EGDS was designed to collaborate with the K8s endpoint slice, I think it is a good practice to have all the concepts aligned.

There are many people interested in this feature. @howardjohn @hzxuzhonghu @sudhi-vm @lizan , you are welcomed to leave opinions.

@hzxuzhonghu
Copy link
Contributor

I agree with @seflerZ It is hard to guarantee a new model can be implemented quickly. EGDS is the most direct way to resolve the current scalibility issue. For no matter enduser or vendors, the service mesh currently can not satisfy their requirements, they may have very large clusers commonly exceed more than 10k sidecars. But currently, service mesh solution istio/envoy can not handle this.

As the linked designs and prs, this is not much effort doing this, why can not handle the corner cases with careful review from maintainers?

I can not imagine when we can support a totally solutions like delta xds, which is not widely proved in production i think.

So would expect this to land down asap.

@mattklein123
Copy link
Member

mattklein123 commented Mar 1, 2020

The EGDS was thoroughly discussed in both Istio and Envoy community, you can find the issues here and here.

It was never thoroughly discussed and agreed upon with the Envoy maintainers. As we make clear in the contribution guide, we expect that large features have prior agreement which was not followed here. I'm sorry that so much work has gone into this, but that is not a reason alone that it should be merged.

We (including you) have discussed the Delta xDS and EGDS approaches. We make conclusion to implement this feature by introducing EGDS because:

Per above, we have had various discussions, but we never came to any agreement on this.

I do not think that this PR has much of a chance of being merged as is, as @snowp and I both agree that it's not the best long term approach we should use. However, we welcome the opinions of the other @envoyproxy/maintainers if they feel differently.

Assuming this PR does not merge as is, I think there are a few options of how to move forward:

  1. Discuss further in a meeting setting, probably the next Envoy community call.
  2. If you are eager to get this working for you, one option would be to see if you can implement it purely as an extension cluster without any changes to the core. I think this is probably possible to do but I'm not sure. If it is implemented purely as an extension cluster I don't have much objection to it and I doubt @snowp does either. This extension could be kept as part of Istio for now also.
  3. I do maintain that a delta host solution is the right path forward. I don't think this solution requires any kind of merging as it would be a new resource sent down (LocalityLbEndpoint). Conceptually it should be not that difficult to implement within Envoy. The fact that Delta is not highly used/stable yet is orthogonal as we have to invest in that anyway. If there is time urgency (2) above might be the best option for you.

Thank you for iterating towards the best long term technical solution for the project!

@gengleilei
Copy link
Contributor Author

@mattklein123 Thank you for your advice, extension cluster refers to using CustomClusterType in the Cluster definition to add an EGDS type cluster ? I see redis implemented in this way. I think EGDS is not a special cluster, it is a new type of discovery, such as the current discovery type has static,EDS, etc. If we treat EGDS as a new type of discovery, I wonder if this meets the community's requirements ? thanks.

@mattklein123
Copy link
Member

I think EGDS is not a special cluster, it is a new type of discovery, such as the current discovery type has static,EDS, etc. If we treat EGDS as a new type of discovery, I wonder if this meets the community's requirements ? thanks.

My personal opinion is that we do not want to add the EGDS API. Once we add an API, realistically, we cannot remove it. Thus, we can't make such a decision without a lot of thought. If you would like to discuss this part more let's please do so at the next community meeting.

By recommending an extension cluster, I'm trying to help you meet your goals if you absolutely must ship something like this ASAP, as you can implement whatever you want outside of core Envoy. Beyond that, as I already said, I would rather see us invest in a delta endpoint solution. Let's call this solution the LEDS (locality endpoint discovery service) for sake of a better name. This would be a sub-API called from within EDS that can deliver individual endpoints in a delta fashion as I described above.

@stale
Copy link

stale bot commented Mar 11, 2020

This pull request has been automatically marked as stale because it has not had activity in the last 7 days. It will be closed in 7 days if no further activity occurs. Please feel free to give a status update now, ping for review, or re-open when it's ready. Thank you for your contributions!

@stale stale bot added the stale stalebot believes this issue/PR has not been touched recently label Mar 11, 2020
@mattklein123
Copy link
Member

Per discussion closing in favor of #10373

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api stale stalebot believes this issue/PR has not been touched recently waiting
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants