Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Completely disable typeCache in getTypeFromSchema #2646

Closed
wants to merge 1 commit into from

Conversation

chrisseto
Copy link

Description

In 0f0be85, the usage typeCache was disabled in getTypeFromSchema due to schema corruption issues that have yet to be resolved.

Despite being disabled elem is still hashed and populated into the typeCache resulting in a dramatic waste of memory and compute.

This commit comments out all references of typeCache in getTypeFromSchema.

Tests on v2.30.0 with a dramatically large CRD (~4.0Mb) show a ~50% decrease in memory consumption:

❯ go tool pprof  memprof-no-cache
Showing nodes accounting for 177.72MB, 83.13% of 213.79MB total
Dropped 113 nodes (cum <= 1.07MB)
Showing top 10 nodes out of 133
      flat  flat%   sum%        cum   cum%
   47.75MB 22.34% 22.34%    48.26MB 22.57%  io.ReadAll
   31.37MB 14.67% 37.01%    31.37MB 14.67%  encoding/json.(*RawMessage).UnmarshalJSON
   24.51MB 11.46% 48.47%    37.01MB 17.31%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).objectInterface
      18MB  8.42% 56.89%    90.97MB 42.55%  encoding/json.Unmarshal
   15.51MB  7.25% 64.15%    15.51MB  7.25%  reflect.mapassign_faststr0
   14.50MB  6.78% 70.93%    14.50MB  6.78%  reflect.New
      10MB  4.68% 75.61%       10MB  4.68%  compress/flate.(*huffmanDecoder).init
    6.50MB  3.04% 78.65%    10.50MB  4.91%  sigs.k8s.io/json/internal/golang/encoding/json.unquote (inline)
       5MB  2.34% 80.99%        5MB  2.34%  encoding/json.(*scanner).pushParseState
    4.57MB  2.14% 83.13%     4.57MB  2.14%  net/http.init.func5

❯ go tool pprof  memprof-with-cache
File: acceptance.test
Type: alloc_space
Time: Dec 11, 2024 at 4:18pm (EST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 443.30MB, 89.03% of 497.90MB total
Dropped 150 nodes (cum <= 2.49MB)
Showing top 10 nodes out of 114
      flat  flat%   sum%        cum   cum%
  178.51MB 35.85% 35.85%   272.01MB 54.63%  github.com/mitchellh/hashstructure.(*walker).visit
   53.42MB 10.73% 46.58%    53.42MB 10.73%  io.ReadAll
      45MB  9.04% 55.62%       45MB  9.04%  encoding/binary.Write
   37.34MB  7.50% 63.12%    37.34MB  7.50%  encoding/json.(*RawMessage).UnmarshalJSON
   26.51MB  5.32% 68.44%    37.01MB  7.43%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).objectInterface
   25.50MB  5.12% 73.56%    25.50MB  5.12%  reflect.(*structType).Field
      23MB  4.62% 78.18%   108.49MB 21.79%  encoding/json.Unmarshal
   21.01MB  4.22% 82.40%    21.01MB  4.22%  reflect.packEface
   20.01MB  4.02% 86.42%    20.01MB  4.02%  reflect.mapassign_faststr0
      13MB  2.61% 89.03%       13MB  2.61%  reflect.New

Acceptance tests

The cache itself appears to be write only, I'm happy to run the acceptance tests but there should be any functional changes.

  • Have you added an acceptance test for the functionality being added?
  • Have you run the acceptance tests on this branch?

Output from acceptance testing:

$ make testacc TESTARGS='-run=TestAccXXX'

...

Release Note

Release note for CHANGELOG:

Reduced memory consumption of `kubernetes_manifest` by up to 50%

References

This was discovered by my team while digging into some crashes in our integration tests following an, admittedly massive, update of our redpanda CRD

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

In 0f0be85, the usage `typeCache` was
disabled in `getTypeFromSchema` due to schema corruption issues that
have yet to be resolved.

Despite being disabled `elem` is still hashed and populated into the
`typeCache` resulting in a dramatic waste of memory and compute.

This commit comments out all references of `typeCache` in
`getTypeFromSchema`.

Tests on v2.30.0 with a dramatically large CRD (~4.0Mb) show a ~50%
decrease in memory consumption:

```
❯ go tool pprof  memprof-no-cache
Showing nodes accounting for 177.72MB, 83.13% of 213.79MB total
Dropped 113 nodes (cum <= 1.07MB)
Showing top 10 nodes out of 133
      flat  flat%   sum%        cum   cum%
   47.75MB 22.34% 22.34%    48.26MB 22.57%  io.ReadAll
   31.37MB 14.67% 37.01%    31.37MB 14.67%  encoding/json.(*RawMessage).UnmarshalJSON
   24.51MB 11.46% 48.47%    37.01MB 17.31%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).objectInterface
      18MB  8.42% 56.89%    90.97MB 42.55%  encoding/json.Unmarshal
   15.51MB  7.25% 64.15%    15.51MB  7.25%  reflect.mapassign_faststr0
   14.50MB  6.78% 70.93%    14.50MB  6.78%  reflect.New
      10MB  4.68% 75.61%       10MB  4.68%  compress/flate.(*huffmanDecoder).init
    6.50MB  3.04% 78.65%    10.50MB  4.91%  sigs.k8s.io/json/internal/golang/encoding/json.unquote (inline)
       5MB  2.34% 80.99%        5MB  2.34%  encoding/json.(*scanner).pushParseState
    4.57MB  2.14% 83.13%     4.57MB  2.14%  net/http.init.func5

❯ go tool pprof  memprof-with-cache
File: acceptance.test
Type: alloc_space
Time: Dec 11, 2024 at 4:18pm (EST)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 443.30MB, 89.03% of 497.90MB total
Dropped 150 nodes (cum <= 2.49MB)
Showing top 10 nodes out of 114
      flat  flat%   sum%        cum   cum%
  178.51MB 35.85% 35.85%   272.01MB 54.63%  github.com/mitchellh/hashstructure.(*walker).visit
   53.42MB 10.73% 46.58%    53.42MB 10.73%  io.ReadAll
      45MB  9.04% 55.62%       45MB  9.04%  encoding/binary.Write
   37.34MB  7.50% 63.12%    37.34MB  7.50%  encoding/json.(*RawMessage).UnmarshalJSON
   26.51MB  5.32% 68.44%    37.01MB  7.43%  sigs.k8s.io/json/internal/golang/encoding/json.(*decodeState).objectInterface
   25.50MB  5.12% 73.56%    25.50MB  5.12%  reflect.(*structType).Field
      23MB  4.62% 78.18%   108.49MB 21.79%  encoding/json.Unmarshal
   21.01MB  4.22% 82.40%    21.01MB  4.22%  reflect.packEface
   20.01MB  4.02% 86.42%    20.01MB  4.02%  reflect.mapassign_faststr0
      13MB  2.61% 89.03%       13MB  2.61%  reflect.New
```
@chrisseto chrisseto requested a review from a team as a code owner December 11, 2024 21:37
Copy link

CLA assistant check

Thank you for your submission! We require that all contributors sign our Contributor License Agreement ("CLA") before we can accept the contribution. Read and sign the agreement

Learn more about why HashiCorp requires a CLA and what the CLA includes

Have you signed the CLA already but the status is still pending? Recheck it.

@chrisseto
Copy link
Author

I'm a wee bit skeptical about my 50% claim from the memory profile alone. Turns out I'm not as proficient with go tool pprof as I thought. I need to do a bit more digging to try and track down the actual peak memory usage. At first glance this did align with our observations in tests but it's not adding up that we'd see memory usage from the hashstructure library as that should be GC'd relatively quickly.

I'm very open to pointers if anyone has some to share.

@chrisseto
Copy link
Author

After further analysis, it seems the memory bloat/leak/whathaveyou is stemming from encoding the return value ofmanifest_decode_multi.
Screenshot 2024-12-12 at 17 40 00

I don't yet have a solution for that but this PR certainly doesn't solve the problem.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant