
fix(cf): CF forkjoin threading improvements #5071

Merged
merged 5 commits into from
Nov 5, 2020

Conversation

german-muzquiz
Copy link
Contributor

The CF provider creates a ForkJoinPool for each account defined, with a default max parallelism of 16.
With a large number of accounts the thread count grows significantly: for example, 50 accounts means 50 Applications FJPs + 50 Routes FJPs = 100 pools x 16 threads = 1,600 threads.

This adds a lot of unnecessary overhead, because the only threads actually in use at any given moment are the ones serving running caching agents; most of these threads sit idle most of the time.

This change moves the ForkJoinPool to a singleton bean at the provider level, so there is only one pool for all accounts, with a configurable max parallelism. This way the full capacity of the pool can be used regardless of which agents are running at any given moment.
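The arithmetic behind the change can be sketched as follows. This is a standalone illustration, not the actual clouddriver code: the class name and the comparison logic are mine, while the account count, pools per account, and parallelism of 16 come from the numbers in the description above (the real change wires the shared pool as a Spring singleton bean).

```java
import java.util.concurrent.ForkJoinPool;

// Sketch: per-account pools vs. one shared provider-level pool.
public class SharedPoolSketch {
    public static void main(String[] args) {
        int accounts = 50;
        int poolsPerAccount = 2;   // Applications FJP + Routes FJP per account
        int parallelism = 16;      // default max parallelism per pool

        // Before this PR: every account constructs its own pair of pools.
        int threadsBefore = accounts * poolsPerAccount * parallelism;

        // After this PR: a single shared pool for all accounts, sized by
        // a configurable parallelism setting.
        ForkJoinPool shared = new ForkJoinPool(parallelism);
        int threadsAfter = shared.getParallelism();

        System.out.println(threadsBefore + " -> " + threadsAfter);
        shared.shutdown();
    }
}
```

Caching agents for every account submit work to the one shared pool, so its full capacity is available to whichever agents happen to be running.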

@zachsmith1
Copy link
Contributor

LGTM -- one note: you will need to create a PR in Halyard and update the credentials validator once this PR is merged/versioned. Any time we touch credentials in CD, Halyard will need the accompanying change.

@kevinawoo kevinawoo added the ready to merge Approved and ready for a merge label Nov 5, 2020
@mergify mergify bot added the auto merged Merged automatically by a bot label Nov 5, 2020
@german-muzquiz
Copy link
Contributor Author

@fieldju this is a candidate to be backported to version 1.23: presumably, with the introduction of kork-credentials for cloudfoundry in 1.23, there is a period of time when there are more ManagedAccount cloudfoundry objects than real accounts, waiting to be garbage collected. Since each ManagedAccount object creates 2 ForkJoinPool objects with 16 threads each, this causes high spikes of threads until the garbage collector kicks in, potentially bringing down the clouddriver pod.

This is considered a performance regression for the cloudfoundry provider in 1.23, because these thread spikes were not seen in 1.20 or earlier. This is documented in this bug.

Instead of assuming this is caused by the kork-credentials change and reverting it, this PR fixes the root cause of the issue by defining a singleton ForkJoinPool for all cloudfoundry accounts.

One implication of backporting this PR is that the cloudfoundry setting cloudfoundry.accounts[x].maxCapiConnectionsForCache will be ignored in the upgrade from 1.23.3 to 1.23.4.

There is no easy way to create test cases for performance regressions like this, but I did a manual test with the same configuration reported in the bug, and I only see the expected number of threads, controlled by the new setting cloudfoundry.apiRequestParallelism:
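For reference, the settings involved might look like this in clouddriver's configuration. This is a sketch: the two property names (maxCapiConnectionsForCache and apiRequestParallelism) come from this thread, while the account name and surrounding YAML structure are assumptions for illustration.

```yaml
cloudfoundry:
  accounts:
    - name: my-cf-account               # hypothetical account name
      # Per-account pool sizing; ignored from 1.23.4 onward
      maxCapiConnectionsForCache: 16
  # New in this PR: sizes the single shared ForkJoinPool for all accounts
  apiRequestParallelism: 16
```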

[screenshots: observed thread counts before and after the change]

@german-muzquiz german-muzquiz added the backport-candidate Add to PRs to designate release branch patch candidates. label Dec 11, 2020
@fieldju
Copy link
Contributor

fieldju commented Dec 11, 2020

@Mergifyio backport release-1.23.x

@mergify
Copy link
Contributor

mergify bot commented Dec 11, 2020

Command backport release-1.23.x: success

Backports have been created

mergify bot pushed a commit that referenced this pull request Dec 11, 2020
* feat(kubernetes): Send SIGKILL to kubectl

* fix(cf): CF forkjoin threading improvements

(cherry picked from commit 1f7d7b7)
fieldju pushed a commit that referenced this pull request Dec 11, 2020
* feat(kubernetes): Send SIGKILL to kubectl

* fix(cf): CF forkjoin threading improvements

(cherry picked from commit 1f7d7b7)

Co-authored-by: German Muzquiz <35276119+german-muzquiz@users.noreply.github.com>
@fieldju fieldju removed the backport-candidate Add to PRs to designate release branch patch candidates. label Dec 14, 2020
Labels
auto merged Merged automatically by a bot ready to merge Approved and ready for a merge target-release/1.24