From e4036e13dfbaa75465efc12d207930a396e9b6d4 Mon Sep 17 00:00:00 2001
From: Ashwini Ahire <124853365+ashwini-ahire7@users.noreply.github.com>
Date: Sat, 21 Sep 2024 16:22:25 +0800
Subject: [PATCH] Update trace_log.md

added description
---
 .../trace_log.md | 84 ++++++++++++-------
 1 file changed, 53 insertions(+), 31 deletions(-)

diff --git a/content/en/altinity-kb-queries-and-syntax/trace_log.md b/content/en/altinity-kb-queries-and-syntax/trace_log.md
index 3ed147b60e..3d8f1bafee 100644
--- a/content/en/altinity-kb-queries-and-syntax/trace_log.md
+++ b/content/en/altinity-kb-queries-and-syntax/trace_log.md
@@ -8,25 +8,25 @@ description: >-
 
 ## Collecting query execution flamegraph using system.trace_log
 
-ClickHouse® has embedded functionality to analyze the details of query performance.
+ClickHouse® provides embedded functionality for analyzing query performance in detail through the `system.trace_log` table.
 
-It's `system.trace_log` table.
+By default, this table collects information only about queries that run longer than 1 second, capturing stack traces once per second. You can customize trace collection per query with the following settings:
 
-By default it collects information only about queries when runs longer than 1 sec (and collects stacktraces every second).
+- `query_profiler_real_time_period_ns`: the interval for collecting stack traces, measured in wall-clock (real) time.
+- `query_profiler_cpu_time_period_ns`: the interval measured in CPU time.
+
+Both settings work the same way: at the specified interval, the stack traces of all threads executing the query are dumped. The real-time profiler helps identify situations where CPU usage is low but time is spent elsewhere (for example, on I/O), while the CPU timer pinpoints hot spots in calculations more accurately by skipping I/O time.
 
-You can adjust that per query using settings `query_profiler_real_time_period_ns` & `query_profiler_cpu_time_period_ns`.
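+
+Once traces have been collected, a quick way to see what the profiler captured is to count samples per query and trace type (a minimal sketch; it assumes only the standard `system.trace_log` columns):
+
+```sql
+-- Count profiler samples collected in the last 10 minutes, per query and trace type
+SELECT query_id, trace_type, count() AS samples
+FROM system.trace_log
+WHERE event_time >= now() - INTERVAL 10 MINUTE
+GROUP BY query_id, trace_type
+ORDER BY samples DESC
+LIMIT 10;
+```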
+Note: Attempting to collect stack traces at frequencies higher than a few kHz is typically not feasible.
 
-Both works very similar (with desired interval dump the stacktraces of all the threads which execute the query).
-real timer - allows to 'see' the situations when cpu was not working much, but time was spend for example on IO.
-cpu timer - allows to see the 'hot' points in calculations more accurately (skip the io time).
-
-Trying to collect stacktraces with a frequency higher than few KHz is usually not possible.
-
-To check where most of the RAM is used you can collect stacktraces during memory allocations / deallocation, by using the
-setting `memory_profiler_sample_probability`.
+### Memory Profiling
+
+To examine RAM usage, stack traces can be collected during memory allocations and deallocations using the `memory_profiler_sample_probability` setting.
 
+## Tools for Analyzing Stack Traces
+
 ### clickhouse-speedscope
 
+This tool prepares the collected stack traces for interactive analysis in speedscope.
+
+Installation:
 
 ```bash
 # install
@@ -37,56 +37,78 @@
 pip3 install -r requirements.txt
 ```
 
-For debugging particular query:
-```
+For debugging a particular query:
+1. In clickhouse-client, set your desired profiler settings:
+```sql
 clickhouse-client
 
 SET query_profiler_cpu_time_period_ns=1000000; -- 1000 times per 'cpu' sec
--- or SET query_profiler_real_time_period_ns=2000000; -- 500 times per 'real' sec.
--- or SET memory_profiler_sample_probability=0.1; -- to debug the memory allocations
-
+-- or
+SET query_profiler_real_time_period_ns=2000000; -- 500 times per 'real' sec.
+-- or
+SET memory_profiler_sample_probability=0.1; -- to debug the memory allocations
+```
+2. Execute your query:
+```sql
 SELECT ...
-
+```
+3. Flush the logs:
+```sql
 SYSTEM FLUSH LOGS;
-
--- get the query_id from the clickhouse-client output or from system.query_log (also pay attention on query_id vs initial_query_id for distributed queries).
 ```
+4. Retrieve the `query_id` from either the clickhouse-client output or from `system.query_log` (note the difference between `query_id` and `initial_query_id` for distributed queries).
+5. Process the stack traces:
 
-Now let's process that:
-```
+```bash
 python3 main.py & # start the proxy in background
 python3 main.py --query-id 908952ee-71a8-48a4-84d5-f4db92d45a5d # process the stacktraces
 fg # get the proxy from background
 Ctrl + C # stop it.
 ```
-
-To access ClickHouse with other username / password etc. - see the sources of https://github.com/laplab/clickhouse-speedscope/blob/master/main.py
-
+For additional configuration options (for example, a different username or password), refer to the source of the tool: https://github.com/laplab/clickhouse-speedscope/blob/master/main.py
+
 ### clickhouse-flamegraph
 
-Installation & usage instructions: https://github.com/Slach/clickhouse-flamegraph
+For more advanced visualizations, clickhouse-flamegraph can be used. Installation and usage instructions are available on its GitHub page: https://github.com/Slach/clickhouse-flamegraph
 
-### pure flamegraph.pl examples
+Example of generating a flamegraph with plain flamegraph.pl:
 
-```
+1. Install FlameGraph:
+```bash
 git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph
-
+```
+2. Generate a flamegraph from the system.trace_log:
+```bash
 clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat( addressToLine(x), '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log WHERE event_time >= subtractMinutes(now(),10) GROUP BY trace FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > flamegraph.svg
 
 clickhouse-client -q "SELECT arrayStringConcat((arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, sum(abs(size)) AS samples FROM system.trace_log where trace_type = 'Memory' and event_date = today() group by trace order by samples desc FORMAT TabSeparated" | /opt/flamegraph/flamegraph.pl > allocs.svg
+```
+3. For memory allocation analysis:
+```bash
 clickhouse-client -q "SELECT arrayStringConcat(arrayReverse(arrayMap(x -> concat(splitByChar('/', addressToLine(x))[-1], '#', demangle(addressToSymbol(x)) ), trace)), ';') AS stack, count() AS samples FROM system.trace_log where trace_type = 'Memory' group by trace FORMAT TabSeparated SETTINGS allow_introspection_functions=1" | /opt/flamegraph/flamegraph.pl > ~/mem1.svg
 ```
-### similar using perf
-```
+
+### Using perf
+
+perf can be used for system-wide profiling, particularly for capturing CPU usage:
+
+1. Install the necessary tools:
+```bash
 apt-get update -y
 apt-get install -y linux-tools-common linux-tools-generic linux-tools-$(uname -r)
 apt-get install -y clickhouse-common-static-dbg clickhouse-common-dbg
 mkdir -p /opt/flamegraph
 git clone https://github.com/brendangregg/FlameGraph /opt/flamegraph
-
+```
+2. Record the performance data (stop sampling with Ctrl+C):
+```bash
 perf record -F 99 -p $(pidof clickhouse) -g
+```
+3. Process the performance data:
+```bash
 perf script > /tmp/out.perf
 /opt/flamegraph/stackcollapse-perf.pl /tmp/out.perf | /opt/flamegraph/flamegraph.pl > /tmp/flamegraph.svg
 ```
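+
+The flamegraph queries above rely on introspection functions (`addressToLine`, `addressToSymbol`, `demangle`), which are disabled by default; enable them with `allow_introspection_functions=1`. A minimal sanity check that symbolization works (assuming some traces have already been collected):
+
+```sql
+SET allow_introspection_functions = 1;
+
+-- Resolve the top frame of the most recent profiler samples
+SELECT demangle(addressToSymbol(trace[1])) AS top_frame
+FROM system.trace_log
+ORDER BY event_time DESC
+LIMIT 5;
+```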