Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Delta XDS - improved #166

Merged
merged 50 commits into from
Sep 9, 2022
Merged

Conversation

slonka
Copy link
Member

@slonka slonka commented May 19, 2021

This is a PR based on work from @sschepens #152 and @mikegajda HubSpot#4 .

At Allegro we've implemented it here: allegro/envoy-control#255 .

We've seen 50-70% improvements in CPUs usage in Envoys that do not have significant traffic (mostly do XDS) and ~35% in more RPS intensive apps.

@slonka
Copy link
Member Author

slonka commented May 19, 2021

@mikegajda - some of your commits are missing sign-off, do you mind if I squash them and add you as a co-author?

@mikegajda
Copy link

Sure that'd be great, thanks @slonka!

sschepens and others added 27 commits May 20, 2021 08:14
Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: Sebastian Schepens <sebastian.schepens@mercadolibre.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
…delay to account for potential delays in writing to the wire (we don't want that to happen)

Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
* Update protobuf to envoy 1.16.0

Signed-off-by: Lukasz Jedryczka <lukasz.jedryczka@allegro.pl>

* Fixing test by setting -boostrap-version 2 flag

Signed-off-by: Lukasz Jedryczka <lukasz.jedryczka@allegro.pl>

* Update protobuf to envoy 1.16.0

Signed-off-by: Lukasz Jedryczka <lukasz.jedryczka@allegro.pl>

* Information about update envoy image version in README.md

Signed-off-by: wookieJ <lukaszjedryczka.biuro@gmail.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: karthik <listaction@gmail.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
* Update protobuf to envoy 1.17

Signed-off-by: Lukasz Dziedziak <lukasz.dziedziak@allegro.pl>

* Use v2 version

Signed-off-by: Lukasz Dziedziak <lukasz.dziedziak@allegro.pl>

* Support V2/V3 in Envoy - remove V2 in separate PR

Signed-off-by: Lukasz Dziedziak <lukasz.dziedziak@allegro.pl>

* UDPA download - split directory create/copy

Signed-off-by: Lukasz Dziedziak <lukasz.dziedziak@allegro.pl>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
Signed-off-by: slonka <slonka@users.noreply.github.com>
@slonka slonka force-pushed the delta-xds-non-breaking-slonka-hash-bytes branch from 6e66493 to bac2a72 Compare May 20, 2021 06:15
Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
kukulam-allegro
kukulam-allegro previously approved these changes Mar 1, 2022
Automaat
Automaat previously approved these changes Mar 11, 2022
@Ferdudas97
Copy link
Contributor

Tests on our environment are finished. We confirmed that, using delta improves cpu utilization ( dropped by ~50%).

@rulex123
Copy link
Collaborator

@Ferdudas97 Hello!

I have tried this patch out on Spotify's perimeter, which is Envoy based and uses a custom control plane that sends SoTW updates to our Envoy fleet. The only updates sent concern (atm) changes in clusters endpoints, and we have something like 350 upstream clusters configured in our Envoys.

I tried out this patch on one of our Envoys that serves production traffic (around 8.5K RPS), and didn't observe any noticeable changes in CPU usage, though I could verify that the Envoy was using incremental xDS . So I am curious about the setup you tried this on, since you noticed such a drastic improvement: could you share some details about the setup (e.g. number of clusters, type and frequency of changes etc.)?

@Ferdudas97
Copy link
Contributor

Hi @rulex123, I tested it on our environment where we have ~1750 clusters. We are using envoy-control, which is our abstraction layer over java-control-plane. According to our metrics, it returns 200-600 snapshots per minute.
Zrzut ekranu 2022-03-16 o 07 54 47

On our service with enabled DELTA_GRPC and wildcard dependencies (service is receiving SoTW), we can see CPU utilization drop by 50%.

Zrzut ekranu 2022-03-16 o 08 24 36

Part of envoy's config

dynamic_resources:
  lds_config:
    ads: {}
    resource_api_version: V3
    initial_fetch_timeout: 20s
  cds_config:
    ads: {}
    resource_api_version: V3
    initial_fetch_timeout: 20s
  ads_config:
    api_type: DELTA_GRPC
    transport_api_version: V3
    grpc_services:
      envoy_grpc:
        cluster_name: envoy-control-xds

@rulex123
Copy link
Collaborator

I would like to start removing support for xDS v2 altogether, do you think it would be possible to coordinate that effort with adding support for xDS incremental? i.e. If we removed support for v2 and then updated this patch accordingly, the amount of code to test/review to ship xDS incremental would reduce significantly.

@Automaat
Copy link
Contributor

I would like to start removing support for xDS v2 altogether, do you think it would be possible to coordinate that effort with adding support for xDS incremental? i.e. If we removed support for v2 and then updated this patch accordingly, the amount of code to test/review to ship xDS incremental would reduce significantly.

@rulex123 We don't want to merge these two topics, since these are two different topics. I have a local branch with almost removed v2 API and would like to push it as soon as we merge this pull request with delta XDS.

@rulex123
Copy link
Collaborator

I would like to start removing support for xDS v2 altogether, do you think it would be possible to coordinate that effort with adding support for xDS incremental? i.e. If we removed support for v2 and then updated this patch accordingly, the amount of code to test/review to ship xDS incremental would reduce significantly.

@rulex123 We don't want to merge these two topics, since these are two different topics. I have a local branch with almost removed v2 API and would like to push it as soon as we merge this pull request with delta XDS.

I agree we shouldn't merge the two in one patch: what I am suggesting is that we first ship removal of support for xDS v2, and after that change has been merged upstream, we rebase the delta-xds-non-breaking-slonka-hash-bytes branch on main so that the scope of the code to review/test in this patch is reduced.

I think this is worth it, especially if you have a branch where you started the v2 removal work. I can also help with that work, in case you don't have cycles to continue it. WDYT?

@Automaat
Copy link
Contributor

I would like to start removing support for xDS v2 altogether, do you think it would be possible to coordinate that effort with adding support for xDS incremental? i.e. If we removed support for v2 and then updated this patch accordingly, the amount of code to test/review to ship xDS incremental would reduce significantly.

@rulex123 We don't want to merge these two topics, since these are two different topics. I have a local branch with almost removed v2 API and would like to push it as soon as we merge this pull request with delta XDS.

I agree we shouldn't merge the two in one patch: what I am suggesting is that we first ship removal of support for xDS v2, and after that change has been merged upstream, we rebase the delta-xds-non-breaking-slonka-hash-bytes branch on main so that the scope of the code to review/test in this patch is reduced.

I think this is worth it, especially if you have a branch where you started the v2 removal work. I can also help with that work, in case you don't have cycles to continue it. WDYT?

I will work on removing API v2 during this week, hopefully I will prepare first PR this week

@mikegajda
Copy link

👋 any updates here? I'd be happy to help get this merged as I'm hoping to deploy this change soon

@Ferdudas97
Copy link
Contributor

👋 any updates here? I'd be happy to help get this merged as I'm hoping to deploy this change soon

Hey, the work is done on this branch. We are waiting for the review and merging of this PR -> there is a problem with publishing artifact for JCP.

…g-slonka-hash-bytes and resolve conflicts

Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
@Ferdudas97 Ferdudas97 dismissed stale reviews from Automaat and kukulam-allegro via ac74ede July 25, 2022 11:47
@Ferdudas97 Ferdudas97 force-pushed the delta-xds-non-breaking-slonka-hash-bytes branch from 33a623c to fc131d5 Compare July 27, 2022 10:04
@Ferdudas97
Copy link
Contributor

@rulex123 @mikegajda it is ready for your reviews

@mikegajda
Copy link

Taking a look at this in a moment @Ferdudas97! Thanks for getting it to this point

}

@Override
public void onV3StreamRequest(long streamId, DiscoveryRequest request) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be onV2StreamRequest to fulfill this the goal above "throws if it sees a v2 request", or were you thinking of throwing on a V3 non-delta request here and the comment above is in out of sync?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it was out of sync

Signed-off-by: radoslaw.chrzanowski <radoslaw.chrzanowski@allegro.pl>
@Ferdudas97 Ferdudas97 force-pushed the delta-xds-non-breaking-slonka-hash-bytes branch from db316b7 to fdc7425 Compare August 11, 2022 16:57
@Ferdudas97 Ferdudas97 merged commit f54d234 into main Sep 9, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.