From e4036e13dfbaa75465efc12d207930a396e9b6d4 Mon Sep 17 00:00:00 2001
From: Ashwini Ahire <124853365+ashwini-ahire7@users.noreply.github.com>
Date: Sat, 21 Sep 2024 16:22:25 +0800
Subject: [PATCH] Update trace_log.md

added description
---
 .../trace_log.md | 84 ++++++++++++-------
 1 file changed, 53 insertions(+), 31 deletions(-)

diff --git a/content/en/altinity-kb-queries-and-syntax/trace_log.md b/content/en/altinity-kb-queries-and-syntax/trace_log.md
index 3ed147b60e..3d8f1bafee 100644
--- a/content/en/altinity-kb-queries-and-syntax/trace_log.md
+++ b/content/en/altinity-kb-queries-and-syntax/trace_log.md
@@ -8,25 +8,25 @@ description: >-
 
 ## Collecting query execution flamegraph using system.trace_log
 
-ClickHouse® has embedded functionality to analyze the details of query performance.
+ClickHouse® provides embedded functionality for analyzing query performance in detail through the `system.trace_log` table.
 
-It's `system.trace_log` table.
+By default, this table collects information only about queries that run longer than 1 second, capturing stack traces once per second. You can customize trace collection per query with the following settings:
 
-By default it collects information only about queries when runs longer than 1 sec (and collects stacktraces every second).
+- `query_profiler_real_time_period_ns`: the interval for collecting stack traces, measured in wall-clock (real) time.
+- `query_profiler_cpu_time_period_ns`: the interval measured in CPU time.
+
+Both settings work the same way: at the specified interval, the stack traces of all threads executing the query are dumped. The real-time profiler helps identify situations where CPU usage is low but time is spent elsewhere (for example, on I/O), while the CPU timer pinpoints hot spots in calculations more accurately by skipping I/O time.
 
-You can adjust that per query using settings `query_profiler_real_time_period_ns` & `query_profiler_cpu_time_period_ns`.
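+
+Once traces have been collected, a quick way to see what the profiler captured is to count samples per query and trace type (a minimal sketch; it assumes only the standard `system.trace_log` columns):
+
+```sql
+-- Count profiler samples collected in the last 10 minutes, per query and trace type
+SELECT query_id, trace_type, count() AS samples
+FROM system.trace_log
+WHERE event_time >= now() - INTERVAL 10 MINUTE
+GROUP BY query_id, trace_type
+ORDER BY samples DESC
+LIMIT 10;
+```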
+Note: Attempting to collect stack traces at frequencies higher than a few kHz is typically not feasible.
 
-Both works very similar (with desired interval dump the stacktraces of all the threads which execute the query).
-real timer - allows to 'see' the situations when cpu was not working much, but time was spend for example on IO.
-cpu timer - allows to see the 'hot' points in calculations more accurately (skip the io time).
-
-Trying to collect stacktraces with a frequency higher than few KHz is usually not possible.
-
-To check where most of the RAM is used you can collect stacktraces during memory allocations / deallocation, by using the
-setting `memory_profiler_sample_probability`.
+### Memory Profiling
+
+To examine RAM usage, stack traces can be collected during memory allocations and deallocations using the `memory_profiler_sample_probability` setting.
 
+## Tools for Analyzing Stack Traces
+
 ### clickhouse-speedscope
 
+This tool prepares the collected stack traces for interactive analysis in speedscope.
+
+Installation:
 
 ```bash
 # install
@@ -37,56 +37,78 @@
 pip3 install -r requirements.txt
 ```
 
-For debugging particular query:
-```
+For debugging a particular query:
+1. In clickhouse-client, set your desired profiler settings:
+```sql
 clickhouse-client
 
 SET query_profiler_cpu_time_period_ns=1000000; -- 1000 times per 'cpu' sec
--- or SET query_profiler_real_time_period_ns=2000000; -- 500 times per 'real' sec.
--- or SET memory_profiler_sample_probability=0.1; -- to debug the memory allocations
-
+-- or
+SET query_profiler_real_time_period_ns=2000000; -- 500 times per 'real' sec.
+-- or
+SET memory_profiler_sample_probability=0.1; -- to debug the memory allocations
+```
+2. Execute your query:
+```sql
 SELECT ...
-
+```
+3. Flush the logs:
+```sql
 SYSTEM FLUSH LOGS;
-
--- get the query_id from the clickhouse-client output or from system.query_log (also pay attention on query_id vs initial_query_id for distributed queries).
 ```
+4. Retrieve the `query_id` from either the clickhouse-client output or from `system.query_log` (note the difference between `query_id` and `initial_query_id` for distributed queries).
+5. Process the stack traces:
 
-Now let's process that:
-```
+```bash
 python3 main.py & # start the proxy in background
 python3 main.py --query-id 908952ee-71a8-48a4-84d5-f4db92d45a5d # process the stacktraces
 fg # get the proxy from background
 Ctrl + C # stop it.
 ```
-
-To access ClickHouse with other username / password etc. - see the sources of https://github.com/laplab/clickhouse-speedscope/blob/master/main.py
-
+For additional configuration options (for example, a different username or password), refer to the source of the tool: https://github.com/laplab/clickhouse-speedscope/blob/master/main.py
+
 ### clickhouse-flamegraph
 
-Installation & usage instructions: https://github.com/Slach/clickhouse-flamegraph
+For more advanced visualizations, clickhouse-flamegraph can be used. Installation and usage instructions are available on its GitHub page: https://github.com/Slach/clickhouse-flamegraph
 
-### pure flamegraph.pl examples
+Example of generating a flamegraph with plain flamegraph.pl:
 
-```
+1. Install FlameGraph:
+```bash
 git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph
-
+```
+2. Generate a flamegraph from the system.trace_log:
+```bash
 clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat( addressToLine(x), '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log WHERE event_time >= subtractMinutes(now(),10) GROUP BY trace FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > flamegraph.svg
 
 clickhouse-client -q "SELECT arrayStringConcat((arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, sum(abs(size)) AS samples FROM system.trace_log where trace_type = 'Memory' and event_date = today() group by trace order by samples desc FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > allocs.svg
+```
+3. For memory allocation analysis:
+```bash
 clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log where trace_type = 'Memory' group by trace FORMAT TabSeparated SETTINGS allow_introspection_functions=1" | /opt/flamegraph/flamegraph.pl > ~/mem1.svg
 ```
-### similar using perf
-```
+
+### Using perf
+
+perf can be used for system-wide profiling, particularly for capturing CPU usage:
+
+1. Install the necessary tools:
+```bash
 apt-get update -y
 apt-get install -y linux-tools-common linux-tools-generic linux-tools-$(uname -r)
 apt-get install -y clickhouse-common-static-dbg clickhouse-common-dbg
 mkdir -p /opt/flamegraph
 git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph
-
+```
+2. Record the performance data (stop sampling with Ctrl+C):
+```bash
 perf record -F 99 -p $(pidof clickhouse) -g
+```
+3. Process the performance data:
+```bash
 perf script > /tmp/out.perf
 /opt/flamegraph/stackcollapse-perf.pl /tmp/out.perf | /opt/flamegraph/flamegraph.pl > /tmp/flamegraph.svg
 ```
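+
+The flamegraph queries above rely on introspection functions (`addressToLine`, `addressToSymbol`, `demangle`), which are disabled by default; enable them with `allow_introspection_functions=1`. A minimal sanity check that symbolization works (assuming some traces have already been collected):
+
+```sql
+SET allow_introspection_functions = 1;
+
+-- Resolve the top frame of the most recent profiler samples
+SELECT demangle(addressToSymbol(trace[1])) AS top_frame
+FROM system.trace_log
+ORDER BY event_time DESC
+LIMIT 5;
+```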