fix: the log-cache instance number can not be adjusted #1310
base: master
Conversation
Note that log-cache was intentionally removed from the

There is some theory that log-cache has a memory leak when running with more than one instance, due to the way the caches communicate with each other: cloudfoundry/log-cache-release#29

Later experiments seem to indicate that log-cache doesn't have a leak, but that it uses the total memory size of the node it is running on to determine the cache size, making it look like it has unbounded memory usage: cloudfoundry/gosigar#48

This issue still seems unresolved, so I think it is premature to allow the instance count to be increased right now. Once the memory configuration question has been resolved, we should first decide if log-cache should be returned to the
@jandubois thank you for your comment; I didn't know the history, and your comment helped me a lot. I think a single log-cache pod is not good enough for a production environment, and there seems to be no quick solution for cloudfoundry/gosigar#48 .

A workaround for the memory issue could be to set a lower memory percentage for log-cache: https://github.com/cloudfoundry/log-cache-release/blob/develop/jobs/log-cache/spec#L35-L37

Another blocking issue for multiple log-cache pods is #1307: log-cache will not be functional when there are multiple log-cache pods, because their NODE_INDEX is the same (0), please see: https://github.com/cloudfoundry/log-cache-release/blob/fa637cd7e2899d58930f893e52b9911cb8672755/src/internal/cache/log_cache.go#L181-L221

Adding the statements below to log-cache.yml could make each log-cache pod get a different NODE_INDEX, but this only works when multi-AZ is disabled; when multi-AZ is enabled, log-cache pods in different AZs will end up with duplicated NODE_INDEX values.
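For illustration, here is a hedged sketch of the kind of change meant here (the exact log-cache.yml statements are not reproduced above): deriving NODE_INDEX from the StatefulSet pod ordinal via the downward API. The field layout and the log-cache start path are assumptions and would need to be adapted to how the chart actually renders the pod spec.

```yaml
# Hypothetical illustration only -- not the exact statements referenced above.
# Give each log-cache pod a distinct NODE_INDEX by deriving it from the
# StatefulSet pod ordinal (the numeric suffix of the pod name).
env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
command:
  - /bin/sh
  - -c
  - |
    # e.g. "log-cache-2" -> NODE_INDEX=2; the binary path below is assumed
    export NODE_INDEX="${POD_NAME##*-}"
    exec /var/vcap/jobs/log-cache/bin/log-cache
```

As noted above, this kind of per-StatefulSet ordinal only stays unique within a single AZ, which is why it breaks down in multi-AZ deployments.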
|
@JimmyMa We talked about this PR in our planning meeting today and created #1312 to track creating a PR that fixes the issue upstream. If we do decide to merge this PR, it should be changed so that even when

Here is a workaround we used before to limit the
This of course requires that all your nodes have the same total memory size; otherwise you need to use taints and tolerations to limit your log-cache pods to nodes of a uniform size.

I think I would prefer not to make any changes until the upstream issue has been fixed; it all feels like a hack otherwise. Once upstream can determine the memory available inside the pod, we can set a memory limit, and everything should start working. Is there a reason we shouldn't then move
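As an illustration of the memory workaround discussed above, here is a sketch under the assumption that the deployment manifest can be patched with a BOSH ops-file; the property name comes from the log-cache job spec linked earlier, and the 10% value is only an example that must be tuned to the node size.

```yaml
# Sketch: lower log-cache's share of what it believes is available memory.
# Because gosigar reports the node's total memory, the effective cache size
# is roughly node_memory * memory_limit_percent / 100.
- type: replace
  path: /instance_groups/name=log-cache/jobs/name=log-cache/properties/memory_limit_percent?
  value: 10
```

The trailing `?` lets the ops-file create the property if it is not already present in the manifest.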
Just blocking the PR from getting accidentally merged. See comment above about what we would like to see if we are to merge before the upstream fix is in place.
…control_plane is enabled

The multiple-cluster-mode operations remove '/instance_groups/name=api', so any operations applied after them get an error if they try to operate on '/instance_groups/name=api'. The multiple-cluster-mode operations must therefore come at the bottom of chart/templates/bosh_deployment.yaml.
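For context, a hedged sketch of why the ordering matters (the exact contents of the multiple-cluster-mode operations are assumed here): once an operation like the one below has removed the instance group, any later operation whose path starts with /instance_groups/name=api fails because the path no longer resolves.

```yaml
# Assumed shape of the multiple-cluster-mode operation that removes the api
# instance group; any ops-file applied after this one can no longer reference
# /instance_groups/name=api.
- type: remove
  path: /instance_groups/name=api
```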
Since I was just looking at a BPM rendering issue, I wanted to leave this here for whenever we get back to it: the reason Quarks-operator is rendering
Description
This change makes the log-cache instance count adjustable via the user's values.yml file.
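A hedged sketch of what that might look like in values.yml (the key layout is an assumption and depends on this chart's sizing conventions, not taken from the PR diff):

```yaml
# Hypothetical values.yml excerpt -- key names are assumed, not taken from the chart.
sizing:
  log_cache:
    instances: 2   # run two log-cache pods instead of a fixed single instance
```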
Motivation and Context
The log-cache instance count cannot be adjusted in the user's values.yml file, which makes it hard to run multiple log-cache instances.
How Has This Been Tested?
Screenshots (if appropriate):
Types of changes
Checklist: