
Starting with 00f0fd71, performance degradation for strategic patch merges #2987

Closed
ephesused opened this issue Sep 15, 2020 · 34 comments · Fixed by #4568
Labels
  • area/api: issues for api module
  • area/plugin: issues for plugins
  • help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
  • kind/regression: Categorizes issue or PR as related to a regression from a prior release.
  • lifecycle/rotten: Denotes an issue or PR that has aged beyond stale and will be auto-closed.
  • priority/backlog: Higher priority than priority/awaiting-more-evidence.
  • triage/accepted: Indicates an issue or PR is ready to be actively worked on.

Comments

@ephesused
Contributor

The run time for strategic patch merges in 00f0fd7 is roughly five times the run time for 1c6481d (the commit immediately before 00f0fd7).

Apologies for the lack of a test case PR.

Test setup

kustomization.yaml

resources:
  - resources.yaml

patchesStrategicMerge:
  - merge.yaml

resources.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  template:
    spec:
      containers:
      - name: this-is-my-container
        image: this-is-my-image

merge.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: deployment
spec:
  template:
    metadata:
      annotations:
        anno1: value1
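
Building the compared binaries (sketch)

For reference, one way to build the two binaries compared below from the kustomize source at the respective commits. This is a sketch only: it assumes a Go toolchain and that the CLI module lives under kustomize/ in the repo at these commits; paths may need adjusting.

# Build kustomize at the two commits being compared; binaries land in the repo root.
git clone https://github.com/kubernetes-sigs/kustomize.git
cd kustomize
for rev in 1c6481d0 00f0fd71; do
  git checkout "$rev"
  (cd kustomize && go build -o "../kustomize-$rev" .)
done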

Equivalent output

The kustomize build output agrees between builds of the two commits:

$ diff <(./kustomize-1c6481d0 build ./input) <(./kustomize-00f0fd71 build ./input)
$

Expected run time (based on 1c6481d)

Execution using 1c6481d takes about an eighth of a second:

$ time ./kustomize-1c6481d0 build ./input > /dev/null

real    0m0.124s
user    0m0.122s
sys     0m0.016s
$

Actual run time for 00f0fd7

Execution using 00f0fd7 takes about five eighths of a second:

$ time ./kustomize-00f0fd71 build ./input > /dev/null

real    0m0.636s
user    0m0.650s
sys     0m0.021s
$
@Shell32-Natsu
Contributor

Thanks for the performance test. We are in the middle of a big refactor of fundamental components, so we will look at this later.

@icy

icy commented Oct 1, 2020

Just FYI, I have a base manifest file with 59k lines, and kustomize with some patch merges took 31 seconds. Not a big deal, as kubectl diff and kubectl apply may take 10 minutes to process :D

@Shell32-Natsu added the area/api, area/plugin, and kind/regression labels on Oct 21, 2020
@Shell32-Natsu added the priority/backlog label on Oct 21, 2020
@savitha-ping

Between kustomize v3.5.4 and v3.5.5, "kustomize build" is about 10 times slower. This coincides with when kyaml and other dependencies were updated.

kustomize/api v0.3.3
sigs.k8s.io/kustomize/kyaml v0.1.10
sigs.k8s.io/yaml v1.2.0

@marcghorayeb

marcghorayeb commented Jan 4, 2021

On 3.9.1:

  • kustomize build --enable_kyaml=false currently takes around 28 seconds
  • kustomize build (default settings) was still building after 1 min 40 s

Edit:

kustomize build finishes after 2 min 35 s.

@ephesused
Contributor Author

ephesused commented Jan 4, 2021

For the test case in this issue, I don't see meaningful performance differences with v3.9.1. For performance trouble seen with recent master builds and v3.9.1, #2869 (comment) may be related.

(Edited to clarify that this comment was in response to @marcghorayeb's comment.)

@savitha-ping

savitha-ping commented Jan 5, 2021

Hmm, that's interesting. I see no tangible improvement when running "kustomize build --enable_kyaml=false" on kustomize versions > v3.5.4. So maybe it isn't kyaml that's causing the degradation. Per my test results, I'm guessing it must be the introduction of one of these other components:

kustomize/api v0.3.3
sigs.k8s.io/yaml v1.2.0

Those changed between kustomize v3.5.4 and later versions.

@maganaluis

maganaluis commented Jan 6, 2021

I also noticed this when installing Kubeflow via Kustomize. It takes over 20 seconds to build the manifests; I am using v3.9.1.

@william-salt-cko

It takes me roughly 25 seconds to run against a handful of simple kustomizations. It is really quite slow.

@emas80

emas80 commented Apr 7, 2021

I was doing some tests with the latest version v4.0.5 before upgrading our CI/CD.
The performance drop on my machine is incredible compared to v3.8.4 (which, like others have pointed out, was much slower than v3.5.4):

v4.0.5

time ./kustomize build stage -o stage.yaml

real	1m15.867s
user	2m16.644s
sys	0m0.766s

v3.8.4

time kustomize build stage -o stage.yaml

real	0m27.704s
user	0m47.028s
sys	0m0.518s

The generated file is about 2.2MB, so pretty big.

We really can't use that in our CI/CD. What has changed that could explain this poor performance?
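
(For a more stable comparison than a single time run, it helps to average several runs; a sketch, assuming the hyperfine tool is installed and the two binaries are available under the illustrative names shown:)

# Benchmark both versions over several runs to separate a real regression from measurement noise.
hyperfine --warmup 1 \
  './kustomize-v3.8.4 build stage > /dev/null' \
  './kustomize-v4.0.5 build stage > /dev/null'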

@Shell32-Natsu
Contributor

@emas80 That's interesting. @monopole I would expect the performance to have improved, but it looks like it has regressed.

@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle stale

@k8s-ci-robot added the lifecycle/stale label on Jul 6, 2021
@k8s-triage-robot

Stale issues rot after 30d of inactivity.
Mark the issue as fresh with /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.

If this issue is safe to close now please do so with /close.

Send feedback to sig-contributor-experience at kubernetes/community.
/lifecycle rotten

@k8s-ci-robot added the lifecycle/rotten label and removed the lifecycle/stale label on Aug 5, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to the triage comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@natasha41575 reopened this on Sep 7, 2021
@k8s-ci-robot added the needs-triage label on Sep 7, 2021
@natasha41575
Contributor

#4152 may fix it

@natasha41575 added the help wanted and triage/accepted labels on Oct 27, 2021
@k8s-ci-robot removed the needs-triage label on Oct 27, 2021
@shapirus

Yes, I do remember this issue (we still have to run our argocd with kustomize 3.5.4!); I just haven't had time to properly test it yet. I will ping you as soon as I have something, hopefully some time next week.

@shapirus

shapirus commented Nov 1, 2021

@natasha41575 I am attaching a minimal set of manifests that demonstrates the performance degradation between v3.5.4 and v4.4.0. A Dockerfile that, when built, downloads the respective kustomize binaries and runs them, is included for convenience.

kustomize-performance-test.zip

In my case the slowdown happens on apiVersion: bitnami.com/v1alpha1, kind: SealedSecret; however, I have noticed that it happens with any random string in those fields just as well. Thus it looks like kustomize v4.4.0 spends extra time doing something when it encounters apiVersion and/or kind values it is not aware of (whereas v3.5.4 does not).
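
Inlined here for reference, the reproduction boils down to roughly the following (a sketch; file names are illustrative and the attached zip is the authoritative version):

# Minimal reproduction sketch: a single resource whose kind kustomize has no builtin schema for.
mkdir -p repro && cd repro
cat > kustomization.yaml <<'EOF'
resources:
  - sealedsecret.yaml
EOF
cat > sealedsecret.yaml <<'EOF'
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: example
EOF
# Time the build with, e.g., a v3.5.4 binary and a v4.4.0 binary on this same directory.
time kustomize build . > /dev/null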

To me it sounds like the potential fix should be trivial, given the change that was done in order to resolve #4100: just process those resources as is.

P.S. This really belongs in #4100 (which could be reopened, if you wish) rather than here, since patchesStrategicMerge is not used in this case.

@shapirus

@natasha41575 ? :)

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to the triage comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@KnVerey reopened this on Dec 22, 2021
@shapirus

Should I create a separate issue for this case, anyone? It doesn't seem relevant to strategic patch merges at all, and it has a clear demonstration.

@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Reopen this issue or PR with /reopen
  • Mark this issue or PR as fresh with /remove-lifecycle rotten
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/close

@k8s-ci-robot
Contributor

@k8s-triage-robot: Closing this issue.

In response to the triage comment above.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@natasha41575
Contributor

natasha41575 commented Apr 4, 2022

Super delayed, but hopefully #4568 will fix this. I tested it with the zip file @shapirus provided and have some promising results:

On master:

$ time kustomize build
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: example

real    0m0.670s
user    0m0.688s
sys     0m0.037s

With my PR:

$ time kustomize build
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: example

real    0m0.109s
user    0m0.114s
sys     0m0.029s

@shapirus

@natasha41575 @ephesused so, after all, it turned out to be two different issues. The situation that I initially tested showed a several hundred percent degradation. It was fixed by #4569.

But now that I have tested 5.2.1 vs 3.5.4 on real manifests that make heavy use of patches (both types), I saw that 5.2.1 performed worse. It is faster than 3.5.4 in other areas, so, plus and minus combined, the overall degradation isn't that noticeable, but in a synthetic test 5.2.1 is almost 2 times slower than v3.5.4.

I will prepare a minimal set of manifests that demonstrates the issue and hopefully helps to track the culprit down, then we can decide whether we should reopen this issue or open a new one.
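
One way to build such a synthetic test is to scale up the pattern from the top of this issue: many small deployments, each modified via patchesStrategicMerge. A rough sketch of a generator (counts and file names are illustrative, not the exact test used):

# Generate N deployments plus N strategic merge patches, then time the build.
N=200
mkdir -p synth && cd synth
{
  echo "resources:"
  for i in $(seq "$N"); do echo "  - deploy-$i.yaml"; done
  echo "patchesStrategicMerge:"
  for i in $(seq "$N"); do echo "  - patch-$i.yaml"; done
} > kustomization.yaml
for i in $(seq "$N"); do
  printf 'apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: d%s\nspec:\n  template:\n    spec:\n      containers:\n      - name: c\n        image: img\n' "$i" > "deploy-$i.yaml"
  printf 'apiVersion: apps/v1\nkind: Deployment\nmetadata:\n  name: d%s\nspec:\n  template:\n    metadata:\n      annotations:\n        anno1: value1\n' "$i" > "patch-$i.yaml"
done
time kustomize build . > /dev/null   # run once with each kustomize version being compared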

@ephesused
Contributor Author

I will prepare a minimal set of manifests that demonstrates the issue and hopefully helps to track the culprit down, then we can decide whether we should reopen this issue or open a new one.

Sorry to hear about the increase. I suspect the root cause will be unrelated to this one, so I'd suggest a new ticket that links back to this one. If you're able to provide something we can use for investigation, I'm happy to do some analysis on it.

Thanks!

@shapirus

@ephesused there is indeed a significant slowdown starting with 00f0fd7. However, it's rather difficult to pinpoint the exact changeset that caused it, because there were both related and unrelated performance changes over time that affect test results even with a very minimal set of manifests.

But I agree, it will be better to open a new ticket; I will do it tomorrow. I have already updated my tests to conveniently specify the versions (and git revisions, to be built locally) to be tested, so that I have a set of reference points; I just need to finish a few minor things.

@shapirus

@ephesused here's the new issue:

It's quite general, but it boils down to the PSM issue as the only one remaining today.
