
Performance degradation when using a large ConfigMap with many ImageTagTransformers #2869

Closed
ephesused opened this issue Aug 20, 2020 · 13 comments
Labels: area/api, area/kyaml, area/plugin, kind/regression, lifecycle/rotten, priority/backlog

Comments

@ephesused (Contributor) commented Aug 20, 2020

Starting with ef924a5 (the conversion of the image transformer to kyaml), we began seeing significantly slower performance in our large kustomization runs.

I have created a test case that demonstrates the performance problem. Given a count, it generates a file containing a single ConfigMap with count entries, a file containing count ImageTagTransformers, and a kustomization.yaml that references the two files. The default count is 999.

Using files generated with a count of 999, ef924a5 (the commit with kyaml) runs roughly six times slower than b7f7536 (the commit prior to kyaml). The output from the two is identical, so the difference is purely in run time.

FYI @monopole

Test setup

Create make-test.sh with the following content

#!/bin/bash

count=999

if [ -n "$1" ]; then
  count="$1"
fi

cat > kustomization.yaml <<EO_KUSTOMIZATION
resources:
  - resources.yaml

transformers:
  - transformers.yaml
EO_KUSTOMIZATION

cat > resources.yaml <<EO_CONFIGMAPS
apiVersion: v1
kind: ConfigMap
metadata:
  name: key-values
data:
EO_CONFIGMAPS

echo "" > transformers.yaml

for i in `seq 1 $count`; do
   value=`printf "%04d" $i`
   echo "  - key-$value=value-$value" >> resources.yaml
   cat >> transformers.yaml << EO_TRANSFORMERS
---
apiVersion: builtin
kind: ImageTagTransformer
metadata:
  name: image-transformer-$value
imageTag:
  name: old-name-$value
  newName: new-name-$value
  newTag: new-tag-$value
fieldSpecs:
- path: spec/template/spec/containers[]/image
EO_TRANSFORMERS
done

Then generate the files:

$ chmod 755 make-test.sh
$ ./make-test.sh
$

Expected behavior

Here is the run time for b7f7536, the commit prior to kyaml:

$ time build/kustomize-execs/branch-b7f7536c/kustomize build -o b7f7536c.yaml

real    0m0.790s
user    0m0.863s
sys     0m0.024s
$

Actual behavior

Here is the run time for ef924a5, the commit with kyaml:

$ time build/kustomize-execs/branch-ef924a5c/kustomize build -o ef924a5c.yaml

real    0m5.057s
user    0m5.893s
sys     0m0.127s
$
@monopole (Contributor) commented Sep 2, 2020

Wow, thanks. If you have the time and inclination, some benchmark tests in the high level krusty package (a new file dedicated to the above) would be appreciated.

I've not measured this, but my assumption is that - because we are in mid-refactor from apimachinery to kyaml - we're spending too much time in https://github.com/kubernetes-sigs/kustomize/blob/master/kyaml/filtersutil/filtersutil.go#L21. This function converts old data types to RNodes and back again, and it's called constantly. Once we're completely switched to kyaml, which remains a goal for kubernetes 1.20, calls to this function can be dropped, as RNode will be the underlying data structure.
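To make that round trip concrete, here is a minimal, hypothetical Go sketch of the pattern being described. The helper name applyKyamlFilter and its shape are illustrative, not the actual filtersutil API; it only shows the marshal, parse-to-RNode, filter, serialize, unmarshal cycle that runs for each transformer application:

package filtersketch

import (
	kyaml "sigs.k8s.io/kustomize/kyaml/yaml"
	"sigs.k8s.io/yaml"
)

// applyKyamlFilter is a hypothetical helper, not kustomize code. It shows the
// conversion overhead: the typed object is marshalled to YAML, parsed into an
// RNode so a kyaml filter can run, then serialized and unmarshalled back.
func applyKyamlFilter(obj interface{}, filter kyaml.Filter) error {
	// typed object -> YAML bytes
	data, err := yaml.Marshal(obj)
	if err != nil {
		return err
	}
	// YAML bytes -> RNode
	node, err := kyaml.Parse(string(data))
	if err != nil {
		return err
	}
	// run the kyaml filter against the RNode
	if _, err := node.Pipe(filter); err != nil {
		return err
	}
	// RNode -> YAML text -> back into the typed object
	out, err := node.String()
	if err != nil {
		return err
	}
	return yaml.Unmarshal([]byte(out), obj)
}

Doing that once per transformer, per resource, is where the time would go on an input like the one generated above.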

More background in #2886 and the DepProvider mentioned therein.
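For anyone picking up the benchmark request, here is a rough sketch of what such a test in the krusty package might look like. It assumes the v4-era API (krusty.MakeKustomizer, Kustomizer.Run, filesys.MakeFsInMemory; import paths and signatures have moved between releases), and it mirrors the input generated by the reproduction script above, including its data entry format:

package krusty_test

import (
	"fmt"
	"strings"
	"testing"

	"sigs.k8s.io/kustomize/api/krusty"
	"sigs.k8s.io/kustomize/kyaml/filesys"
)

// BenchmarkManyImageTagTransformers builds, in memory, one ConfigMap with many
// entries plus the same number of ImageTagTransformers, matching the
// reproduction script in this issue.
func BenchmarkManyImageTagTransformers(b *testing.B) {
	const count = 999
	fSys := filesys.MakeFsInMemory()
	write := func(path, content string) {
		if err := fSys.WriteFile(path, []byte(content)); err != nil {
			b.Fatal(err)
		}
	}

	write("/app/kustomization.yaml",
		"resources:\n- resources.yaml\ntransformers:\n- transformers.yaml\n")

	var resources, transformers strings.Builder
	resources.WriteString("apiVersion: v1\nkind: ConfigMap\nmetadata:\n  name: key-values\ndata:\n")
	for i := 1; i <= count; i++ {
		v := fmt.Sprintf("%04d", i)
		fmt.Fprintf(&resources, "  - key-%s=value-%s\n", v, v)
		fmt.Fprintf(&transformers, `---
apiVersion: builtin
kind: ImageTagTransformer
metadata:
  name: image-transformer-%s
imageTag:
  name: old-name-%s
  newName: new-name-%s
  newTag: new-tag-%s
fieldSpecs:
- path: spec/template/spec/containers[]/image
`, v, v, v, v)
	}
	write("/app/resources.yaml", resources.String())
	write("/app/transformers.yaml", transformers.String())

	b.ResetTimer()
	for n := 0; n < b.N; n++ {
		k := krusty.MakeKustomizer(krusty.MakeDefaultOptions())
		if _, err := k.Run(fSys, "/app"); err != nil {
			b.Fatal(err)
		}
	}
}

Treat it as a starting point rather than drop-in code; a committed version would want the file generation factored into a helper and the count parameterized.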

@Shell32-Natsu added the area/api, area/kyaml, area/plugin, kind/regression, and priority/backlog labels on Oct 21, 2020
@ephesused (Contributor, Author) commented:

Sorry to say, I haven't had time to come back around here and provide clean tests.

I did want to note that my daily runs showed a slowdown in the test case for #2808 at some point between 76a8f03 and fc06283. Given the ongoing refactor effort, I didn't think it made sense to open a new issue.

The Oct 26 7:20am Eastern time run showed no meaningful variance in the execution time. At that point, master was 76a8f03.

The Oct 27 7:20am Eastern time run tripped on the execution time change. At that point, master was fc06283:

     kustomize_test.go:188: Retrieving kustomize 'branch-master'
     kustomize_test.go:194: 'branch-master' information:
         branch: master
         sha1: fc062839050f55a7ed46f0d066aac32c0bde8a39

Later in the log:

 === RUN   Test_kustomize/kust-issue-2808/kustomize/branch-master
     kustomize_test.go:181: Runtime variation (20.59x): branch-master ran in 844ms; branch-7e04be9e ran in 41ms.

This is from a run I did today, comparing v3.8.5 to v3.8.6:

 === RUN   Test_kustomize/kust-issue-2808/kustomize/release-v3.8.6
     kustomize_test.go:181: Runtime variation (8.90x): release-v3.8.6 ran in 828ms; release-v3.8.5 ran in 93ms.

@ephesused (Contributor, Author) commented:

As an update, the current master branch, 60c8f4c, is significantly slower than v3.8.8 (which, in turn, is significantly slower than v3.7.0).

Admittedly, my master branch build may not be completely accurate - I'm not certain I have my pin/unpin process set up correctly.

[root@48e39bcd8374 project_root]# time build/kustomize-execs/release-v3.7.0/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m0.729s
user    0m0.818s
sys     0m0.026s
[root@48e39bcd8374 project_root]# time build/kustomize-execs/release-v3.8.8/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m9.870s
user    0m12.038s
sys     0m0.198s
[root@48e39bcd8374 project_root]# time build/kustomize-execs/branch-60c8f4c5/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m23.527s
user    0m29.132s
sys     0m0.494s
[root@48e39bcd8374 project_root]#

@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Mar 15, 2021
@ephesused (Contributor, Author) commented Mar 15, 2021

/remove-lifecycle stale

This one still is an issue (though it's not as bad as it has been). v3.7.0 runs in 0.7 seconds; v4.0.5 runs in 4.0 seconds. So v4.0.5 is an improvement over v3.8.8, but it's still far behind v3.7.0.

[sdsci@2cc8dd1e265e project_root]$ time build/kustomize-execs/release-v3.7.0/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m0.691s
user    0m0.764s
sys     0m0.019s
[sdsci@2cc8dd1e265e project_root]$ time build/kustomize-execs/release-v4.0.5/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m3.964s
user    0m5.091s
sys     0m0.095s
[sdsci@2cc8dd1e265e project_root]$ time build/kustomize-execs/release-v3.8.8/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m9.473s
user    0m11.458s
sys     0m0.207s
[sdsci@2cc8dd1e265e project_root]$

@k8s-ci-robot removed the lifecycle/stale label on Mar 15, 2021
@brichins commented Mar 16, 2021 via email

@KnVerey added this to "To do" in Kustomize performance on Jun 9, 2021
@fejta-bot commented:

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jun 14, 2021
@ephesused (Contributor, Author) commented:

/remove-lifecycle stale

There's been improvement over time, but the execution time is still meaningfully slower than v3.7.0's. For example, v4.2.0's run time is over 2.5 times that of v3.7.0. If that performance difference is acceptable to the devs, I'm comfortable with this ticket being closed, but I'd rather that closure come explicitly from a person instead of from an automated bot.

Thanks.

[root@13d15a2088de project_root]# time build/kustomize-execs/release-v3.7.0/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m0.699s
user    0m0.791s
sys     0m0.021s
[root@13d15a2088de project_root]# time build/kustomize-execs/release-v4.2.0/kustomize build testdata/kust-issue-2869 -o /dev/null

real    0m1.802s
user    0m2.150s
sys     0m0.049s
[root@13d15a2088de project_root]#

@k8s-ci-robot removed the lifecycle/stale label on Jul 12, 2021
@jzuijlek commented:

Agreed. We are forced to upgrade because of deprecated Kubernetes APIs, and performance is an issue for us.

@k8s-triage-robot commented:

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Dec 21, 2021
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Jan 20, 2022
@k8s-triage-robot commented:

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot (Contributor) commented:

@k8s-triage-robot: Closing this issue.

In response to the /close command in the previous comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
