
Researching performance degradation (and improvement) over a number of releases #5422

Open
shapirus opened this issue Oct 27, 2023 · 24 comments
Labels
kind/bug Categorizes issue or PR as related to a bug. lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one.

Comments

@shapirus

shapirus commented Oct 27, 2023

Summary

This is an attempt to track kustomize's performance degradation (and improvement) across several test cases, each working with a minimal set of manifests designed to exercise a specific set of kustomize's subroutines.

TLDR

patchesStrategicMerge is the only thing that is still slow.

Benchmark description

The tests are as follows.

1. A basic Deployment:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test
spec:
  replicas: 1

+

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
- deployment.yaml

2. Same Deployment plus a JSON6902 patch:

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
- deployment.yaml

patches:
- patch: |-
    - op: replace
      path: /spec/replicas
      value: 2
  target:
    group: apps
    version: v1
    kind: Deployment
    name: test

3. Same Deployment plus a strategic merge patch:

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
- deployment.yaml

patches:
- patch: |-
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: test
    spec:
      replicas: 2

4. A resource of an "unknown" kind

(can be e.g. SealedSecret in a real application):

apiVersion: apps/v1
kind: UnknownKind
metadata:
  name: test
spec:
  replicas: 1
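
The kustomization for this case mirrors test 1; a sketch, assuming the resource above is saved as unknownkind.yaml (the file name is illustrative):

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
- unknownkind.yaml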

It is worth mentioning that saving the patches in separate files made no difference compared to having them inline in kustomization.yaml. Whether or not the resources have a namespace attribute makes no difference, either.

Running the benchmark

I have prepared a set of shell scripts, designed to be run in a Docker container, that execute a user-configurable number of iterations (default: 200, settable via the ITERATIONS env variable) of each of the above cases against several versions (and git revisions) of kustomize, from v3.5.4 through v5.2.1.

The benchmark supports building for linux/amd64 and linux/arm64 (so it should also be possible to build and run it on an ARM Mac host, though I have no hardware to verify this), and running it has been tested on both platforms.

A zip file attached to this ticket contains the manifests, helper scripts (beware: not necessarily safe to run outside a Docker container), and the Dockerfile. The versions/revisions to download or build are specified in the Dockerfile; see the comments inside for more information. Also see the end of this comment for an updated version.

Build and run in one command:

docker run -it $(docker build -q .)

(It's better to run a plain docker build . first to make sure the image builds properly -- unlike the one-liner above, this will produce build output.)
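
The number of iterations can be changed by passing the ITERATIONS variable into the container, for example (the value here is purely illustrative):

docker run -e ITERATIONS=500 -it $(docker build -q .)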

Test results

(also see updated results with more versions in the comments below.)

### Starting benchmark on Linux x86_64

# kustomize versions: 3.5.4 3.7.0 3.8.0 1c6481d0 00f0fd71 3.8.3 3.8.6 4.1.2 4.3.0 4.4.0 5.0.0 5.1.0 5.2.1
# iterations per test: 200
# tests: 1_no-patches 2_patches-json6902 3_patches-strategic-merge 4_no-patches-unknown-kind 
# time unit: seconds

             test: 1   test: 2   test: 3   test: 4
    v3.5.4      4.47      4.53      4.42      4.24
    v3.7.0      4.54      4.59      4.44      4.78
    v3.8.0      4.73      4.60     20.03      4.69
  1c6481d0      4.90      4.86     18.76      4.63
  00f0fd71      4.85      4.98     95.63      4.62
    v3.8.3      4.96      5.03    106.87      4.64
    v3.8.6    104.41    105.33    104.61    104.73
    v4.1.2    120.12    119.76    119.68    119.94
    v4.3.0    124.38    124.89    124.41    124.37
    v4.4.0      1.81      1.96    124.81    124.42
    v5.0.0      1.01      1.13     11.65     11.70
    v5.1.0      1.01      1.06     11.51     11.26
    v5.2.1      1.06      1.08     11.56      1.01
ARM64 results, for the sake of completeness:
### Starting benchmark on Linux aarch64

# kustomize versions: kustomize/v3.5.4 kustomize/v3.7.0 kustomize/v3.8.0 1c6481d0 00f0fd71 kustomize/v3.8.3 3.8.6 4.1.2 4.3.0 4.4.0 5.0.0 5.2.1
# iterations per test: 100
# tests: 1_no-patches 2_patches-json6902 3_patches-strategic-merge 4_no-patches-unknown-kind 
# time unit: seconds

             test: 1   test: 2   test: 3   test: 4
    v3.5.4      4.61      4.41      4.33      4.29
    v3.7.0      4.98      4.79      4.78      4.84
    v3.8.0      4.81      4.87     18.38      4.87
  1c6481d0      5.01      5.43     18.73      4.90
  00f0fd71      5.05      5.14    113.08      4.89
    v3.8.3      4.95      5.12    112.45      4.75
    v3.8.6    108.89    108.59    109.17    108.97
    v4.1.2    124.71    125.93    125.55    124.82
    v4.3.0    127.36    127.53    127.15    128.09
    v4.4.0      1.92      1.99    127.89    127.31
    v5.0.0      1.34      1.45     11.82     11.94
    v5.2.1      1.32      1.41     11.84      1.34

As we can see, there was a slight (but reproducibly visible) degradation between 3.5.4 and 3.8.0 in tests 1, 2, and 4, and a significant 5x degradation in test 3 (PSM) starting with 3.8.0.

In addition to the continuing gradual degradation from version to version, revision 00f0fd7 introduced a further 5x degradation in test 3 (PSM) compared to 3.8.0 (revision 1c6481d is shown here because it comes right before 00f0fd7); see #2987 for this specific one.

Version 3.8.3 is yet slower than 00f0fd7.

Version 3.8.6 was as slow as 3.8.3 in test 3 (PSM), but introduced a 20x degradation in the rest of the tests, making them as slow as PSM (#4100).

Then there was a further ~20% degradation in all tests by version 4.3.0.

Improvements started to come with 4.4.0, but tests 3 and 4 stayed unfixed.

Version 5.0.0 seems to have brought a major improvement that affected all test cases, but tests 3 (PSM) and 4 (unknown kind), although much faster than in previous releases, remained much slower than tests 1 and 2.

Version 5.2.1 fixed test 4 (unknown kind); the fix came with the merge of PR #5076, discussed in #4569.

Conclusion

The patchesStrategicMerge slowness, even though it is much reduced compared to older releases, still remains, and it is probably the only significant performance issue that requires close attention.

There is, apparently, some suboptimal code specific to PSM. As we can see, there were improvements that made all modes more efficient, PSM included, but relative to the others it is still very slow.

Related issues

Here are some of the existing issues that gave hints as to which specific versions/revisions to test:

...and the following might give yet another overall improvement?

Follow-up

I have added more tests to cover the Components functionality: 5) no base, just a plain Deployment as a Component; 6) a plain Deployment in the base plus a JSON6902 patch as a Component; 7) same as 6, but with a PSM instead of the JSON6902 patch.
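
For illustration, here is a minimal sketch of the test 6 layout (directory and file names are illustrative; the exact manifests are in the benchmark repo linked at the end of this comment):

# kustomization.yaml (entry point)
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
- base
components:
- component

# base/kustomization.yaml
kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1

resources:
- deployment.yaml

# component/kustomization.yaml
kind: Component
apiVersion: kustomize.config.k8s.io/v1alpha1

patches:
- patch: |-
    - op: replace
      path: /spec/replicas
      value: 2
  target:
    group: apps
    version: v1
    kind: Deployment
    name: test

Test 7 is identical except that the component carries a strategic merge patch (as in test 3) instead of the JSON6902 patch.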

Results:

Starting kustomize benchmark on Linux x86_64
kustomize versions: 
  3.5.4
  3.7.0
  3.8.0
  5.0.0
  5.2.1
iterations per test: 200
tests: 
  1_no-patches
  2_patches-json6902
  3_patches-strategic-merge
  4_no-patches-unknown-kind
  5_component-no-base-no-patches
  6_component-json6902-over-base
  7_component-PSM-over-base
time unit: seconds

             test: 1   test: 2   test: 3   test: 4   test: 5   test: 6   test: 7
    v3.5.4      4.42      4.55      4.59      4.41    (fail)    (fail)    (fail)
    v3.7.0      4.78      4.78      4.75      4.60      5.86      7.22      7.14
    v3.8.0      4.72      4.84     19.97      4.70      5.81      7.25     21.40
    v5.0.0      1.06      1.04     11.74     11.98      2.10      2.87     13.43
    v5.2.1      1.05      1.05     11.53      1.07      1.91      2.90     13.22

As we can see here, in 3.7.0, when Components were introduced, the basic Component case (test 5) was only about 1.25 times slower than the no-component plain-Deployment case (test 1), and the JSON6902 Component test (test 6) was about 1.6 times slower than the corresponding no-component test (test 2).

In v5.2.1, however, this difference increased to 1.8x and 2.76x, respectively. This may indicate either that there is still room for improvement in the Components code, or that the common overhead was reduced so much that what remains is the Components' genuine computational cost, which became more visible once the common overhead was shaved off.

It is also clear that both the component and no-component patchesStrategicMerge cases suffer from the same performance hit and should benefit equally from a single fix.

I am not attaching my tests as a zip file any longer, since they are now available on GitHub: https://github.com/shapirus/kustomize-benchmark-suite. There's no readme, but if somebody is interested, there are brief instructions in the Dockerfile.

@shapirus shapirus added the kind/bug Categorizes issue or PR as related to a bug. label Oct 27, 2023
@k8s-ci-robot k8s-ci-robot added the needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. label Oct 27, 2023
@k8s-ci-robot
Contributor

This issue is currently awaiting triage.

SIG CLI takes a lead on issue triage for this repo, but any Kubernetes member can accept issues by applying the triage/accepted label.

The triage/accepted label can be added by org members by writing /triage accepted in a comment.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@shapirus
Author

So there were two releases that clearly made PSM performance worse: 1) 3.8.0; 2) 00f0fd7.

#2987, which referred specifically to PSM (it was created for 00f0fd7), was closed by #4568, which first appeared in the v4.5.5 release, unless I'm reading the git log wrong.

I couldn't find any issues or PRs concerning the earlier 5x degradation that the benchmark shows, which was introduced between 3.7.0 and 3.8.0. It's quite likely that that code still exists.

I am now going to add v4.5.5 to the list of versions to run the benchmark for and re-run it. I expect that, if #4568 did the job, PSM performance will become 5x faster than the release before it.

Will post the updated results and zip archive of the benchmark suite in further comments when they're ready.

@shapirus
Author

shapirus commented Oct 27, 2023

I expect that, if #4568 did the job, PSM performance will become 5x faster than the release before it.

That's precisely what happened!

Here are the test results with the added 4.5.4 and 4.5.5 that demonstrate the effect of #4568.

### Starting benchmark on Linux x86_64

# kustomize versions: 3.5.4 3.7.0 3.8.0 1c6481d0 00f0fd71 3.8.3 3.8.6 4.1.2 4.3.0 4.4.0 4.5.4 4.5.5 5.0.0 5.1.0 5.2.1
# iterations per test: 200
# tests: 1_no-patches 2_patches-json6902 3_patches-strategic-merge 4_no-patches-unknown-kind 
# time unit: seconds

             test: 1   test: 2   test: 3   test: 4
    v3.5.4      4.37      4.51      4.52      4.38
    v3.7.0      4.70      4.85      4.60      4.78
    v3.8.0      4.65      4.72     19.82      4.75
  1c6481d0      4.75      5.03     18.69      4.54
  00f0fd71      4.82      4.91     96.12      4.55
    v3.8.3      4.97      5.14    106.78      4.83
    v3.8.6    104.81    106.48    105.37    105.14
    v4.1.2    120.08    120.51    120.08    120.25
    v4.3.0    124.54    124.79    125.05    124.91
    v4.4.0      1.84      2.02    125.49    124.94
    v4.5.4      1.49      1.66    114.47    114.45
    v4.5.5      1.67      1.83     12.69     12.27
    v5.0.0      1.03      1.27     11.53     11.68
    v5.1.0      1.14      1.21     11.45     11.28
    v5.2.1      1.04      1.24     11.42      1.28

So, finally, it seems that we should be looking for something between 3.7.0 and 3.8.0 that introduced the first PSM performance degradation.

At this point, someone familiar with the code base (@ephesused? :)) should be more efficient in searching for the culprit than myself.

Updated benchmark suite: kustomize-versions-benchmark-moreversions.zip

@shapirus
Author

shapirus commented Oct 27, 2023

Nevermind, there are just a few commits between 3.7.0 and 3.8.0.

I tested all of them:

### Starting benchmark on Linux x86_64

# kustomize versions: 42d1f7b7 d3a7335b def00220 5a022862 6a50372d
# iterations per test: 200
# tests: 1_no-patches 2_patches-json6902 3_patches-strategic-merge 4_no-patches-unknown-kind 
# time unit: seconds

             test: 1   test: 2   test: 3   test: 4
  42d1f7b7      4.48      4.53      4.46      4.48
  d3a7335b      4.62      4.69      4.61      4.41
  def00220      4.48      4.54      4.57      4.63
  5a022862      4.58      4.54     17.68      4.48
  6a50372d      4.62      4.59     17.98      4.60

So we see that 5a02286 is the one that introduced the first patchesStrategicMerge performance degradation, for which I could find no issue or PR with a fix. It was simply an update of kustomize/api from v0.4.2 to v0.5.0, so the actual code that causes this degradation is to be found there (unless it was fixed later).

@shapirus
Author

shapirus commented Oct 27, 2023

And this is the most likely chunk of code that could have introduced it (so the actual commit is d3a7335, whereas 5a02286 is when it started to be used).

This code still sits there in that file, but with some later additions.

@ephesused
Contributor

As an aside, yes, there were significant performance improvements with v5.0.0 (#4791, #4944, #4809).

Back to the issue at hand...

Looking around a little bit, I suspect the performance issue comes from something like a one-time initialization. I say that since I see similar run time performance when I invoke the executable, but when I craft a test that loops 200 builds within the same execution, the run time for test 3 is on par with test 2.

The effect of the one-time extra cost would be most significant to users that have lots of kustomize invocations against a large number of small builds. If I remember correctly, that's exactly your situation.

$ time for i in {1..200}; do bin/kustomize-v5.2.1 build -o /dev/null issue5422/kustomize-versions-benchmark/tests/2_patches-json6902; done

real    0m4.448s
user    0m1.512s
sys     0m1.117s
$ time for i in {1..200}; do bin/kustomize-v5.2.1 build -o /dev/null issue5422/kustomize-versions-benchmark/tests/3_patches-strategic-merge; done

real    0m18.852s
user    0m16.888s
sys     0m5.430s
$ cd ..
$ go test ./kustomize --run Test_5422_2 -count 1
ok      sigs.k8s.io/kustomize/kustomize/v5      3.041s
$ go test ./kustomize --run Test_5422_3 -count 1
ok      sigs.k8s.io/kustomize/kustomize/v5      3.134s
$ 

Admittedly, I haven't been in this code for a little while, so I hope this testing pattern still is valid.

kustomize/5422_test.go

package main

import (
	"fmt"
	"os"
	"path/filepath"
	"testing"

	"sigs.k8s.io/kustomize/kustomize/v5/commands"
)

func Test_5422_2(t *testing.T) {
	run5422(t, "2_patches-json6902")
}

func Test_5422_3(t *testing.T) {
	run5422(t, "3_patches-strategic-merge")
}

func run5422(t *testing.T, subdir string) {
	rootdir := "/src/dist/issue5422/kustomize-versions-benchmark/tests/"
	outputPath := filepath.Join(rootdir, fmt.Sprintf("/results-%s.yaml", subdir))
	inputPath := filepath.Join(rootdir, subdir)
	os.Args = []string{"kustomize", "build", "-o", outputPath, inputPath}
	for i := 0; i < 200; i++ {
		err := commands.NewDefaultCommand().Execute()
		if err != nil {
			t.Fatalf("Failure: '%s'", err.Error())
		}
	}
}

@shapirus
Author

shapirus commented Oct 27, 2023

The effect of the one-time extra cost would be most significant to users that have lots of kustomize invocations against a large number of small builds. If I remember correctly, that's exactly your situation.

It is. Mine is more or less okay now, given all the improvements in the recent releases (although overall still slower than 3.5.4 -- we use a lot of PSMs), but for some people this can be really critical, e.g. #5084.

Your tests are interesting. Yes, they do suggest that it's something like one-time initialization, but they also show that it happens only for test 3 (PSM). In test 2, durations for 200 * 1 and 1 * 200 are pretty close, considering all the overhead of spawning the processes etc., but in test 3 there is a big difference.

Can the changeset in api/filters/patchstrategicmerge/patchstrategicmerge.go in d3a7335b be responsible for this? It seems to be the one that introduced the first performance degradation of PSM. Strictly speaking, that degradation was introduced by the switch from api/v0.4.2 to api/v0.5.0, so it could be any commit between 0.4.2 and 0.5.0. But it looks suspicious enough, especially considering the name of the changed file. That code seems to be still present there (but I don't know exactly what it does).
^ turned out to be the wrong track

@shapirus
Author

shapirus commented Oct 27, 2023

I did some profiling, for both the go test and standalone binary execution scenarios.

pprof-compatible files are attached: kustomize-5422-profiles.zip
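
For anyone who wants to inspect the attached profiles, the standard pprof tooling works on them directly; a sketch, with illustrative file names:

go tool pprof -top kustomize-psm.pprof          # text report, hottest functions first
go tool pprof -http=:8080 kustomize-psm.pprof   # interactive graph/flame view in a browser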

Standalone profiles were made for the following test cases: 1) PSM; 2) JSON6902

Go test profiles were made for PSM and JSON6902, for both single run and 200 iterations.

One difference that clearly stands out, when the standalone profiles are viewed as graphs (screenshots attached below), is that in the PSM case kustomize spends a lot of time (cumulative, i.e. including child calls) in spec (*SwaggerProps) FromGnostic and proto UnmarshalOptions unmarshal, whereas these calls are not present in the JSON6902 execution profile.

(Another interesting hint is that the PSM profile file is 2.5 times bigger than the JSON6902 one, which suggests quite different execution paths.)

Both of these calls are a part of the following code:

switch {
case format == Proto:
	doc := &openapi_v2.Document{}
	// We parse protobuf and get an openapi_v2.Document here.
	if err := proto.Unmarshal(b, doc); err != nil {
		return fmt.Errorf("openapi proto unmarshalling failed: %w", err)
	}
	// convert the openapi_v2.Document back to Swagger
	_, err := swagger.FromGnostic(doc)
	if err != nil {
		return errors.Wrap(err)
	}

Does this ring a bell for you @ephesused? I'm afraid that at this point I finally lack the knowledge required to try and refactor this code, or comment out certain parts to see if something improves performance.

p.s. nevermind the long execution times in the profiles: I intentionally ran kustomize under qemu (arm64 guest, amd64 host) to make it run as slowly as possible to get meaningful timings, because my host machine is too fast to profile a single invocation properly.

PSM: (profile call graph screenshot attached)

JSON6902: (profile call graph screenshot attached)

@shapirus
Author

shapirus commented Oct 28, 2023

I'm afraid that at this point I finally lack the knowledge required to try and refactor this code, or comment out certain parts to see if something improves performance.

...not quite.

So more experiments :)

I narrowed it down to the following, or I think I did:

l.Schema = l.GetSchema()

If, as an experiment, we set l.Schema = nil there, then PSM becomes as fast as everything else and the two expensive calls mentioned in the previous comment are no longer made, but it stops working correctly: it replaces instead of merging, and JSON6902 patches stop working too.

I wonder if GetSchema() can be optimized. It should be possible. After all, v3.5.4 was 2.5 times faster than v5.2.1 in PSM even without all the performance improvements that came with v4.4.0 and v5.0.0, so it managed to perform this particular job more efficiently.

This situation somewhat reminds me of the work done in #4569 to skip making unnecessary calls to process unknown resource kinds. Maybe something similar can be done for PSMs too.

@chlunde
Contributor

chlunde commented Oct 30, 2023

I added a benchmark in #5425 in order to improve performance and detect regressions in the future. I will make safer versions of my enhancements as new PRs for review soon.

@natasha41575
Contributor

@shapirus or @ephesused if one of you is available to review #5425 before we assign it to an approver, that would help us out a ton.

@shapirus
Author

shapirus commented Dec 7, 2023

...still no ideas on the PatchesStrategicMerge performance issue? It's the only thing that is still slow.

@ephesused
Contributor

I haven't had time to investigate anything recently, sorry. If you'd like, I can rebase my work from #5084 (comment) and put it up as a PR if you want to see whether that provides any help. Given your analysis here, though, I think this issue is unrelated to what I am pursuing over in #5084.

While I'd like to dig into this problem, right now I expect the earliest I might have time to do so would be in the new year.

@shapirus
Author

shapirus commented Dec 7, 2023

I haven't had time to investigate anything recently, sorry.

That's fine, no worries. It was more like a keep-alive packet to prevent the overly intelligent bots from auto-closing the issue :).

If you'd like, I can rebase my work from #5084 (comment) and put it up as a PR if you want to see whether that provides any help. Given your analysis here, though, I think this issue is unrelated to what I am pursuing over in #5084.

Yes, please do, if you have time. My benchmark suite accepts PR numbers, in addition to revision hashes and release versions, so if the changes are in a PR, then it'll be very easy to benchmark it against a selected set of versions to see if it makes any difference.

@ephesused
Contributor

If you'd like, I can rebase my work from #5084 (comment) and put it up as a PR if you want to see whether that provides any help.

Yes, please do, if you have time.

It's now #5481.

@ephesused
Contributor

I took some time to dig around, and there are many places in the strategic merge patch flow where the schema is involved. Adjusting that code for this performance benefit would be a large effort, likely with substantial regression concerns.

I adjusted my runs to reset the schema before each execution, which enabled profiling to line up better with your use case:

	os.Args = []string{"kustomize", "build", "-o", outputPath, inputPath}
	for i := 0; i < b.N; i++ {
		openapi.ResetOpenAPI()
		err := commands.NewDefaultCommand().Execute()
		if err != nil {
			b.Fatalf("Failure: '%s'", err.Error())
		}
	}

That led to a realization that might work if your use cases are well-defined. I don't know how wise this would be - I'm just thinking through possibilities. This approach assumes that the default schema doesn't have any effect on your kustomization output. That's true for the simplified test case. Since the default schema has no effect, adjust kustomization.yaml so it doesn't use the default schema:

kind: Kustomization
apiVersion: kustomize.config.k8s.io/v1beta1
openapi:
  path: "empty.json"

empty.json is exactly what it says:

{}

The empty schema still gets parsed, but even with the cost of the file load, it can speed things up a good bit.
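
A quick way to try this on one of the test directories (a sketch; it assumes kustomization.yaml already carries the openapi: path: "empty.json" stanza shown above, and the directory path is illustrative):

cd tests/3_patches-strategic-merge
echo '{}' > empty.json
kustomize build -o /dev/null .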

Here's what I get with the default behavior:

$ go test ./kustomize -run nope -bench Benchmark_5422_3 -benchmem -benchtime=5s -count 3
goos: windows
goarch: amd64
pkg: sigs.k8s.io/kustomize/kustomize/v5
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Benchmark_5422_3-8            81          73811585 ns/op        47365282 B/op     344372 allocs/op
Benchmark_5422_3-8            76          72932484 ns/op        47366434 B/op     344375 allocs/op
Benchmark_5422_3-8            81          73413496 ns/op        47365657 B/op     344373 allocs/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5      23.141s

Here's what I get with the empty schema approach:

$ go test ./kustomize -run nope -bench Benchmark_5422_3 -benchmem -benchtime=5s -count 3
goos: windows
goarch: amd64
pkg: sigs.k8s.io/kustomize/kustomize/v5
cpu: Intel(R) Core(TM) i7-7700 CPU @ 3.60GHz
Benchmark_5422_3-8           232          25439710 ns/op          795561 B/op       5249 allocs/op
Benchmark_5422_3-8           236          25534328 ns/op          797795 B/op       5254 allocs/op
Benchmark_5422_3-8           236          25347450 ns/op          797175 B/op       5254 allocs/op
PASS
ok      sigs.k8s.io/kustomize/kustomize/v5      26.341s

@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2024
@shapirus
Author

/remove-lifecycle stale

time to rerun the performance tests, it's a good reminder.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 10, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle stale
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 8, 2024
@k8s-triage-robot

The Kubernetes project currently lacks enough active contributors to adequately respond to all issues.

This bot triages un-triaged issues according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue as fresh with /remove-lifecycle rotten
  • Close this issue with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle rotten

@k8s-ci-robot k8s-ci-robot added lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Jul 8, 2024
@shapirus
Author

shapirus commented Jul 8, 2024

/remove-lifecycle rotten

not ready yet

@k8s-ci-robot k8s-ci-robot removed the lifecycle/rotten Denotes an issue or PR that has aged beyond stale and will be auto-closed. label Jul 8, 2024
@stormqueen1990
Member

/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added the lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. label Jul 9, 2024
@shapirus
Author

shapirus commented Jul 9, 2024

All right, here are my fresh benchmark results. To recap, these tests resemble a CD scenario where a tool such as ArgoCD has to run kustomize potentially many hundreds of times to build manifests for multiple applications.

Using kustomize v3.5.4 as a reference point, we can see that a number of versions had terribly degraded performance in some or even all test cases, but as of v5.2.1 almost all of that has been fixed, except for the tests that involve strategic merge patches: these are still about 2.5 times slower than in v3.5.4.

There is also a notable improvement in all cases but PSM in the latest versions compared to v3.5.4, which is excellent.

...if only we could fix the strategic merge patch performance hit! That's the only thing left that is still much worse than it was before the performance regressions were first introduced.

(the contents of the test manifests can be found here: https://github.com/shapirus/kustomize-benchmark-suite/tree/master/tests)

Starting kustomize benchmark on Linux x86_64
kustomize versions: 
  3.5.4
  3.7.0
  3.8.0
  1c6481d0
  00f0fd71
  3.8.3
  3.8.6
  4.1.2
  4.3.0
  4.4.0
  4.5.4
  4.5.5
  5.0.0
  5.1.0
  5.2.1
  5.3.0
  5.4.2
iterations per test: 100
tests: 
  1_no-patches
  2_patches-json6902
  3_patches-strategic-merge
  4_no-patches-unknown-kind
  5_component-no-base-no-patches
  6_component-json6902-over-base
  7_component-PSM-over-base
time unit: seconds

             test: 1   test: 2   test: 3   test: 4   test: 5   test: 6   test: 7
    v3.5.4      2.44      2.43      2.45      2.38    (fail)    (fail)    (fail)
    v3.7.0      2.59      2.61      2.59      2.55      3.28      3.88      3.83
    v3.8.0      2.65      2.63     10.41      2.61      3.21      3.82     11.19
  1c6481d0      2.71      2.67      9.75      2.55      3.24      3.81     10.51
  00f0fd71      2.64      2.74     49.70      2.53      3.16      3.71     50.46
    v3.8.3      2.69      2.81     55.67      2.63      3.36      4.00     56.00
    v3.8.6     54.61     54.82     54.46     54.33     54.88     55.55     55.18
    v4.1.2     62.00     61.84     62.06     62.30     62.56     62.77     63.14
    v4.3.0     64.77     64.89     64.95     64.84     65.28     65.72     65.44
    v4.4.0      1.07      1.10     64.65     64.78      1.58      2.23     66.23
    v4.5.4      0.88      0.97     59.09     58.63      1.21      1.79     59.72
    v4.5.5      0.96      1.05      6.67      6.47      1.29      1.75      7.21
    v5.0.0      0.59      0.83      6.12      6.09      1.22      1.62      6.94
    v5.1.0      0.71      0.75      6.09      5.73      1.12      1.58      6.95
    v5.2.1      0.71      0.75      6.07      0.69      1.13      1.61      6.92
    v5.3.0      0.62      0.67      6.20      0.70      0.73      0.92      6.30
    v5.4.2      0.81      0.79      5.91      0.72      0.79      0.96      5.86

Another run: more iterations per test, fewer versions to test.

Starting kustomize benchmark on Linux x86_64
kustomize versions: 
  3.5.4
  5.2.1
  5.4.2
iterations per test: 500
tests: 
  1_no-patches
  2_patches-json6902
  3_patches-strategic-merge
  4_no-patches-unknown-kind
  5_component-no-base-no-patches
  6_component-json6902-over-base
  7_component-PSM-over-base
time unit: seconds

             test: 1   test: 2   test: 3   test: 4   test: 5   test: 6   test: 7
    v3.5.4     11.93     12.37     12.34     11.72    (fail)    (fail)    (fail)
    v5.2.1      3.48      3.47     31.41      3.42      5.94      8.27     35.48
    v5.4.2      3.76      3.76     31.28      3.41      3.59      4.88     30.73

@shapirus
Author

Added the fresh 5.4.3 release to the tests; no changes compared to 5.4.2:

             test: 1   test: 2   test: 3   test: 4   test: 5   test: 6   test: 7
    v3.5.4     12.11     12.38     12.40     12.07    (fail)    (fail)    (fail)
    v5.2.1      3.24      3.38     30.99      3.35      5.71      8.30     36.00
    v5.4.2      3.48      3.67     31.27      3.57      3.61      4.76     30.81
    v5.4.3      3.49      3.48     31.24      3.56      3.76      4.95     31.04
