Procdump -c does not work in k8s #240
Comments
Hi, thanks for the feedback. I wrote a post on this a while back. Let me know if that helps answer your question; if not, please don't hesitate to reach back out.
I have deployed my pod following the instructions in this post (https://medium.com/@marioh_78322/sysinternals-procdump-for-linux-and-cloud-native-applications-404d0351f1ea) and monitored my process using
WORKDIR /app
ENTRYPOINT ["./start.sh"]
start.sh:
Thanks for the detailed information. Could you add the -log switch to the procdump command line? This sends extended logging to syslog. Please share the procdump-related log entries (there can be quite a few).
While reading the code, I found that CPU usage is obtained and calculated from /proc/[pid]/stat. In a Docker environment, however, the CPU usage obtained this way is relative to the CPU of the host machine, which is not very meaningful for monitoring a program. We would rather obtain CPU usage relative to the container itself. I found a method for reading CPU usage inside a container in this article (https://chengdol.github.io/2021/09/19/k8s-container-mem-cpu/), and I have written a shell script based on it for reference.
#!/bin/bash
while true; do
    # PID of the dotnet process (DOTNET_PID must be set in the environment)
    pid=$DOTNET_PID
    # cgroup (v1) directory as seen through the dotnet process's root
    cgroup_path=/proc/$pid/root/sys/fs/cgroup
    # wait until the cgroup path exists
    if [ ! -d "$cgroup_path" ]; then
        sleep 1
        continue
    fi
    # cpu and cpuacct are symlinks to the cpu,cpuacct directory.
    # cpuacct.stat reports the CPU time spent in user and system mode
    # by all tasks in the cgroup, in USER_HZ ticks
    # (cpuacct.usage is the file that reports nanoseconds).
    utime_start=$(awk '$1 == "user" {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    stime_start=$(awk '$1 == "system" {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    sleep 1
    utime_end=$(awk '$1 == "user" {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    stime_end=$(awk '$1 == "system" {print $2}' "$cgroup_path/cpu,cpuacct/cpuacct.stat")
    # getconf CLK_TCK, aka sysconf(_SC_CLK_TCK), returns USER_HZ,
    # which seems to always be 100 independent of the kernel configuration.
    HZ=$(getconf CLK_TCK)
    # CPU limit of the container: quota / period = number of cores
    cfs_quota_us=$(cat "$cgroup_path/cpu/cpu.cfs_quota_us")
    cfs_period_us=$(cat "$cgroup_path/cpu/cpu.cfs_period_us")
    # A quota of -1 means no CPU limit; fall back to all visible CPUs
    if [ "$cfs_quota_us" -le 0 ]; then
        cfs_quota_us=$(( $(nproc) * cfs_period_us ))
    fi
    # Container CPU usage over the last second, as a percentage of the limit.
    # Multiplying by the period before dividing by the quota avoids the
    # integer division quota/period, which is 0 for limits below one core.
    cpu_percent=$(( (utime_end + stime_end - utime_start - stime_start) * 100 * cfs_period_us / (HZ * cfs_quota_us) ))
    # memory usage in bytes; inactive file cache is read but not subtracted
    used=$(cat "$cgroup_path/memory/memory.usage_in_bytes")
    inactive=$(awk '$1 == "inactive_file" {print $2}' "$cgroup_path/memory/memory.stat")
    mem_usage=$used
    total_mem=$(cat "$cgroup_path/memory/memory.limit_in_bytes")
    # memory of the container running this script
    local_mem_usage=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)
    local_total_mem=$(cat /sys/fs/cgroup/memory/memory.limit_in_bytes)
    mem_percent=$(echo "scale=2; ($mem_usage + $local_mem_usage) * 100 / ($total_mem + $local_total_mem)" | bc)
    if (( $(echo "$cpu_percent > $CPU_THRESHOLD" | bc -l) )) || (( $(echo "$mem_percent > $MEM_THRESHOLD" | bc -l) )); then
        # create only one dump: the lock file marks that a dump was taken
        if [ ! -f "/app/create_dump.lock" ]; then
            echo "$cpu_percent" "$mem_percent"
            # numfmt: human-readable sizes
            numfmt --to=iec "$used"
            numfmt --to=iec "$total_mem"
            ./procdump -pgid "$pid" /app/dump
            touch /app/create_dump.lock
        fi
    fi
done
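The script above reads cgroup v1 files (cpuacct.stat, cpu.cfs_quota_us, cpu.cfs_period_us). On nodes that use cgroup v2 those files are not present; the comparable data is in cpu.stat (the usage_usec field, in microseconds) and cpu.max (a line of the form "<quota> <period>", or "max <period>" when no limit is set). The following is a minimal sketch of the same percentage calculation for cgroup v2. The function names are hypothetical, the functions take file contents as arguments so they can be tried without a container, and none of this is part of procdump.

```shell
# Sketch only: cgroup v2 counterpart of the cgroup v1 CPU calculation.
# Function names are hypothetical helpers, not procdump features.

# Print usage_usec (cumulative CPU time in microseconds) from cpu.stat text.
#   $1: contents of cpu.stat
cgv2_usage_usec() {
    printf '%s\n' "$1" | awk '$1 == "usage_usec" { print $2 }'
}

# Print CPU usage as an integer percentage of the cgroup's CPU limit.
#   $1: usage_usec at the start of the interval
#   $2: usage_usec at the end of the interval
#   $3: interval length in microseconds
#   $4: contents of cpu.max ("<quota> <period>" or "max <period>")
#   $5: CPU count to normalize by when there is no limit (default 1)
cgv2_cpu_percent() (
    start=$1 end=$2 interval_usec=$3 cpu_max=$4 ncpu=${5:-1}
    quota=${cpu_max%% *}    # first word: quota in usec, or "max"
    period=${cpu_max##* }   # last word: period in usec
    if [ "$quota" = "max" ]; then
        # No limit configured: express usage relative to ncpu full CPUs.
        echo $(( (end - start) * 100 / (interval_usec * ncpu) ))
    else
        # Limit is quota/period CPUs; multiply before dividing so that
        # fractional limits (quota < period) do not truncate to zero.
        echo $(( (end - start) * 100 * period / (interval_usec * quota) ))
    fi
)

# Usage sketch (assumes cgroup v2 is mounted at /sys/fs/cgroup):
#   s=$(cgv2_usage_usec "$(cat /sys/fs/cgroup/cpu.stat)"); sleep 1
#   e=$(cgv2_usage_usec "$(cat /sys/fs/cgroup/cpu.stat)")
#   cgv2_cpu_percent "$s" "$e" 1000000 "$(cat /sys/fs/cgroup/cpu.max)" "$(nproc)"
```

Results are truncated to whole percentages because the arithmetic is integer-only, which matches the integer CPU figure the original script compares against its threshold.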
Expected behavior
In a Kubernetes environment, running procdump with the command 'procdump -c 10 -s 1 -w XXX' does not generate a dump file when the CPU usage of the pod exceeds 10%. This might be because procdump monitors CPU usage relative to the host machine instead of the pod itself. Could you consider adding monitoring of the pod's CPU and memory usage in a future version? It would greatly assist in troubleshooting .NET applications in Kubernetes.
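To illustrate why the denominator matters, here is a small sketch that extracts CPU ticks from a /proc/[pid]/stat line (utime and stime are fields 14 and 15) and expresses the same tick delta against two different core counts. It assumes the host-relative figure is normalized by the host's core count; whether procdump actually does this is the hypothesis of this issue and is not confirmed here. The function names and the sample numbers are made up for illustration.

```shell
# Sketch only: hypothetical helpers, not procdump's implementation.

# Print utime + stime (fields 14 and 15) from one /proc/[pid]/stat line.
#   $1: the full stat line
stat_cpu_ticks() (
    # Drop "pid (comm) " up to the last ") " so that a command name
    # containing spaces or parentheses cannot shift the field positions.
    rest=${1##*") "}
    set -f              # no globbing while the line is split into fields
    set -- $rest        # $1 is now field 3 (state)
    shift 11            # $1 is now field 14 (utime), $2 is field 15 (stime)
    echo $(( $1 + $2 ))
)

# Print the percentage of CORES consumed by TICKS of CPU time.
#   $1: CPU ticks used during the interval
#   $2: ticks per second (getconf CLK_TCK)
#   $3: interval length in seconds
#   $4: number of cores to normalize by (may be fractional, e.g. 0.5)
cpu_pct() {
    awk -v t="$1" -v hz="$2" -v s="$3" -v c="$4" \
        'BEGIN { printf "%.2f\n", t / hz / s / c * 100 }'
}

# Example with made-up numbers: 50 ticks in 1 second at 100 ticks/second
#   cpu_pct 50 100 1 8      # relative to a hypothetical 8-core host
#   cpu_pct 50 100 1 0.5    # relative to a hypothetical 500m pod limit
```

Under these assumptions the same process looks nearly idle when measured against the host and fully saturated when measured against its own limit, which would explain a threshold such as -c 10 never being reached.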
System information (e.g., distro, kernel version, etc.)
The pod's Docker image is based on mcr.microsoft.com/dotnet/aspnet:7.0-bullseye-slim-amd64.