Improving installation of user MLA stack #133
Labels
ee
Epic
priority/high
sig/app-management
sig/cluster-management
Reference: kubermatic/ps-team-flotilla#103
@stroebitzer commented on Wed Jul 06 2022
While working on the KKP Admin training, I stumbled from one issue to the next when installing the User MLA stack into my KKP installation.
The current installation procedure feels like an alpha version. To provide a smooth experience for our customers, we should improve the installation process.
Maybe moving the installation from the hack/deploy-seed.sh script to our kubermatic-installer could be an option.
This ticket is about:
@talhalatiforakzai commented on Thu Jul 14 2022
Issues with installation of user mla
while deploying MLA stack through the helper script
This issue arises with yq version 4.25.2. To fix it, edit lines 31 and 35 in hack/fetch-chart-dependencies.sh.
line 31:
chartname=$(yq read "$chartYAML" name)
into
chartname=$(yq '.name' "$chartYAML")
line 35:
for url in $(yq r "$chartYAML" dependencies --tojson | jq -r .[].repository); do
into
for url in $(yq '.dependencies.[].repository' "$chartYAML"); do
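If the script has to keep working with both the old and the new yq syntax, it could branch on the installed major version. The following is only a sketch: yq_major and read_chart_name are hypothetical helper names, and the parsing assumes that `yq --version` prints text such as "yq version 3.4.1" or "yq (https://github.com/mikefarah/yq/) version 4.25.2".

```shell
# Sketch only: yq_major and read_chart_name are hypothetical helpers,
# not part of hack/fetch-chart-dependencies.sh.

# Extract the major version from the text printed by `yq --version`.
# Assumes the text contains "version X.Y.Z" or "version vX.Y.Z".
yq_major() {
  printf '%s\n' "$1" | sed -E 's/.*version v?([0-9]+)\..*/\1/'
}

# Read the chart name ($1 is the path to Chart.yaml) with whichever
# syntax the installed yq understands.
read_chart_name() {
  if [ "$(yq_major "$(yq --version 2>&1)")" -ge 4 ]; then
    yq '.name' "$1"          # yq v4 expression syntax
  else
    yq read "$1" name        # yq v3 subcommand syntax
  fi
}
```

Pinning a single yq version in the docs would be simpler; the branch is only useful if both versions must be supported.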
Partial installation of MLA stack in case of limited resources
The MLA stack partially fails when resources are limited, and the components that depend on the failed ones then fail to start. Clean up the installation and provision enough resources before retrying. Maybe we can update the deploy script to check resource availability before provisioning the MLA stack.
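A pre-flight check in the deploy script might look roughly like the sketch below. It is an illustration under assumptions, not the script's actual behaviour: sum_allocatable_ki and has_enough_memory are hypothetical helpers, the threshold is a placeholder rather than a documented minimum, only quantities reported in Ki are counted, and allocatable memory does not account for what existing workloads already request.

```shell
# Sketch only: hypothetical helpers for a memory pre-flight check.

# Reads one memory quantity per line on stdin and sums those given in Ki.
# Lines in other units are ignored in this sketch.
sum_allocatable_ki() {
  awk '/^[0-9]+Ki$/ { sub(/Ki$/, ""); total += $0 } END { printf "%.0f\n", total }'
}

# has_enough_memory REQUIRED_KI
# Succeeds if the quantities on stdin sum to at least REQUIRED_KI.
has_enough_memory() {
  [ "$(sum_allocatable_ki)" -ge "$1" ]
}

# Possible use before provisioning (16777216 Ki = 16 GiB is a placeholder):
#   kubectl get nodes \
#     -o jsonpath='{range .items[*]}{.status.allocatable.memory}{"\n"}{end}' \
#     | has_enough_memory 16777216 \
#     || { echo "not enough allocatable memory for the MLA stack" >&2; exit 1; }
```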
MLA stack causes other workloads to crash & restart
If the MLA stack is not installed on dedicated machine deployments, it causes other workloads to run out of memory/CPU. For this reason, users should be informed and asked to use a separate MD with minimum specs to avoid any issues.
Pods are not scheduled on nodes provisioned specifically for user mla
I have created a machine deployment for User MLA so that all the workloads related to User MLA are scheduled on these nodes, but for some reason all the other workloads get scheduled fine except for
MD Values
MLA Values
A quick fix is to move the nodeselector and toleration settings outside of the cortex context.
Consul chart fails to install in case of no default storage
The pods are in the Pending state, and describing the PVC shows "no persistent volumes available for this claim and no storage class is set". Basically, when no storage class is marked as default, the consul chart rolls back the installation.
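One way to detect this situation before installing the chart is to look for the storageclass.kubernetes.io/is-default-class annotation. The sketch below is an assumption about how such a check could be written, not something taken from the deploy script; default_storage_classes is a hypothetical helper and <name> is a placeholder.

```shell
# Sketch only: hypothetical helper for a default-storage-class check.

# Reads "name<TAB>is-default" lines on stdin and prints the names
# whose second field is "true".
default_storage_classes() {
  awk -F '\t' '$2 == "true" { print $1 }'
}

# Possible use before installing the chart:
#   kubectl get storageclass \
#     -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.storageclass\.kubernetes\.io/is-default-class}{"\n"}{end}' \
#     | default_storage_classes
#
# If nothing is printed, a class can be marked as default:
#   kubectl patch storageclass <name> \
#     -p '{"metadata":{"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
```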
example solution