Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eds: decrease computational complexity of updates #11442

Merged
merged 24 commits into from
Jul 9, 2020

Conversation

pgenera
Copy link
Contributor

@pgenera pgenera commented Jun 4, 2020

Commit Message: Makes BaseDynamicClusterImpl::updateDynamicHostList O(n) rather than O(n^2)
Additional Description: Instead of calling .erase() on list iterators as we find them, we swap with the end of the list and erase after iterating over the list. This shows a ~3x improvement in execution time in the included benchmark test.
Risk Level: Medium. No reordering happens to the endpoint list. Not runtime guarded.
Testing: New benchmark, existing unit tests pass (and cover the affected function).
Docs Changes: N/A
Release Notes: N/A

#2874 #11362

pgenera added 4 commits June 3, 2020 19:40
…impl.

Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Contributor Author

pgenera commented Jun 4, 2020

I'm happy to put this behind a runtime guard if it seems prudent.

@pgenera pgenera marked this pull request as ready for review June 4, 2020 18:21
@jmarantz jmarantz self-assigned this Jun 4, 2020
source/common/upstream/upstream_impl.cc Outdated Show resolved Hide resolved
source/common/upstream/upstream_impl.cc Outdated Show resolved Hide resolved
source/common/upstream/upstream_impl.cc Outdated Show resolved Hide resolved
source/common/upstream/upstream_impl.cc Outdated Show resolved Hide resolved
@mattklein123
Copy link
Member

I'm happy to put this behind a runtime guard if it seems prudent.

IMO it's OK to not have a runtime guard for this, but I would raise the regression risk to medium at least. @snowp can you also take a look at this?

Signed-off-by: Phil Genera <pgenera@google.com>
@htuch
Copy link
Member

htuch commented Jun 9, 2020

This should be rebased on #11505.

pgenera added 3 commits June 11, 2020 13:05
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add to the PR description a comparison of your speed test before/after your n^2 fix?

You'd have to patch the speed-test only into a different client.

test/common/common/utility_test.cc Outdated Show resolved Hide resolved
test/common/common/utility_test.cc Outdated Show resolved Hide resolved
Signed-off-by: Phil Genera <pgenera@google.com>
test/common/common/utility_test.cc Outdated Show resolved Hide resolved
test/common/common/utility_test.cc Outdated Show resolved Hide resolved
pgenera added 2 commits June 17, 2020 14:32
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Contributor Author

pgenera commented Jun 17, 2020

Can you add to the PR description a comparison of your speed test before/after your n^2 fix?

You'd have to patch the speed-test only into a different client.

Done, its still ~3.2x improvement over the baseline. Results are linked from the description; I can be more explicit than that if you'd like.

jmarantz
jmarantz previously approved these changes Jun 17, 2020
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

up to you if you want to think about larger variable names.

This does need a @envoyproxy/senior-maintainers approval.

source/common/upstream/upstream_impl.cc Outdated Show resolved Hide resolved
@jmarantz
Copy link
Contributor

and thanks for doing this!

@pgenera
Copy link
Contributor Author

pgenera commented Jun 17, 2020

Can you add to the PR description a comparison of your speed test before/after your n^2 fix?
You'd have to patch the speed-test only into a different client.

Done, its still ~3.2x improvement over the baseline. Results are linked from the description; I can be more explicit than that if you'd like.

After a bit of thought I noticed the performance of priorityAndLocalWeighted is about ~30% slower. Notably those tests don't exercise any of the n^2 logic (eg, with one iteration none of those .erase() calls happen), but I'm still surprised that the visitor-predicate-pattern is measurably slower than iterating in situ. Even with this mysterious observation, I think a 320% improvement in what I think is the common case is worth a 30% worse performance on the first iteration.

@jmarantz
Copy link
Contributor

Quick check: are you comparing optimized runs? Without optimization, inlining, and collapsing of dead logic I could see this being a significant performance degradation.

@jmarantz
Copy link
Contributor

I see in your comment you did use -c opt. It's probably worth an iteration with cachegrind or callgrind focusing on the troublesome use-case to see just what's up. Possibly the lambda context adds a level of indirection through a generated structure that might not be possible to fully optimize away.

If the absolute per-itereration perf penalty is not too great it might be fine to just explain that, maybe as a comment in the code for posterity.

Signed-off-by: Phil Genera <pgenera@google.com>
jmarantz
jmarantz previously approved these changes Jun 29, 2020
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks! Praying to the gods of clang it goes through!

@envoyproxy/senior-maintainers

test/benchmark/main.cc Outdated Show resolved Hide resolved
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Contributor Author

pgenera commented Jun 29, 2020

Great, thanks! Praying to the gods of clang it goes through!

I do not think we have been smiled upon :D. I'll be out the rest of this week, but will poke my head in late tonight in case there's something easy to do.

test/benchmark/main.cc Outdated Show resolved Hide resolved
@jmarantz
Copy link
Contributor

jmarantz commented Jul 2, 2020

also merge master to hopefully pick up a fix that was made to the http2 integration test for tsan.

@jmarantz
Copy link
Contributor

jmarantz commented Jul 2, 2020

/wait

pgenera added 2 commits July 6, 2020 13:00
Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: Phil Genera <pgenera@google.com>
@pgenera
Copy link
Contributor Author

pgenera commented Jul 7, 2020

also merge master to hopefully pick up a fix that was made to the http2 integration test for tsan.

Done and done. And it appears to have helped!

jmarantz
jmarantz previously approved these changes Jul 8, 2020
Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@envoyproxy/senior-maintainers

Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo a nit.
/wait

test/benchmark/main.h Show resolved Hide resolved
Signed-off-by: Phil Genera <pgenera@google.com>
Copy link
Member

@htuch htuch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks!

@pgenera
Copy link
Contributor Author

pgenera commented Jul 8, 2020

Looking through the CI failures:

  • coverage: [ FAILED ] IpVersionsClientType/HdsIntegrationTest.SingleEndpointUnhealthyHttp/5, where GetParam() = (4-byte object <00-00 00-00>, 4-byte object <01-00 00-00>, 0): unrelated
  • windows: //test/extensions/filters/http/router:auto_sni_integration_test FAILED: unrelated

Both of these have high-flakiness warnings when I look at them in azure. They all (of course) pass locally and with RBE.

Signed-off-by: Phil Genera <pgenera@google.com>
@jmarantz
Copy link
Contributor

jmarantz commented Jul 9, 2020

/azp run

@azure-pipelines
Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@htuch htuch merged commit b1e62a3 into envoyproxy:master Jul 9, 2020
@pgenera pgenera deleted the eds-nsquared branch July 13, 2020 16:07
}

skip_expensive_benchmarks = skip_switch.getValue();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add some big nice WARNING when this flag is enabled in order to increase the chances of someone noticing the difference between envoy_cc_benchmarks and tests for those benchmarks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in #12121

scheler pushed a commit to scheler/envoy that referenced this pull request Aug 4, 2020
Makes BaseDynamicClusterImpl::updateDynamicHostList O(n) rather than O(n^2)

Instead of calling .erase() on list iterators as we find them, we swap with the end of the list and erase after iterating over the list. This shows a ~3x improvement in execution time in the included benchmark test.

Risk Level: Medium. No reordering happens to the endpoint list. Not runtime guarded.
Testing: New benchmark, existing unit tests pass (and cover the affected function).
Docs Changes: N/A
Release Notes: N/A

Relates to envoyproxy#2874 envoyproxy#11362

Signed-off-by: Phil Genera <pgenera@google.com>
Signed-off-by: scheler <santosh.cheler@appdynamics.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants