Hi, I am currently profiling vLLM with OmniTrace, and I observed that it captures the execution of graph kernels at a high level but does not provide detailed insight into the execution of individual graph nodes.
My goal is to obtain detailed profiling information on the execution of individual graph nodes, similar to what NVIDIA Nsight offers, which tracks individual nodes instead of only graph-level execution.
I am seeking guidance or a workaround to enable detailed profiling of graph nodes within OmniTrace. Any insights or configuration options?
Here is the command I use:
omnitrace-run -c ~/.omnitrace.cfg --enable-categories device-critical-trace device_busy device_hip device_hsa device_memory_usage python rocm_hip rocm_hsa rocm_smi rocprofiler roctracer --roctracer-hip-activity --roctracer-hip-api --roctracer-hsa-activity --roctracer-hsa-api -- python -m omnitrace -- vllm_benchmark.py
Thanks in advance.
OmarSayedMostafa changed the title from "Visualize/trace hip kernels nodes from Launched graph using hipGraphLaunch." to "Enabling Detailed Profiling of Graph Nodes in OmniTrace" on Apr 9, 2024
Given that the arrows flow from the API functions to multiple kernels, it appears that you are indeed getting the individual graph node execution. The --roctracer-hsa-activity option in your command enables that. You might want to remove the --hip-device-activity option, because that is the "high-level" kernel tracing option. Enabling both at the same time might interfere with how the flow events are connected, and it could also contribute to why none of the kernel function names are resolved beyond "Kernel Execution".
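As a rough sketch of that suggestion, the command from the original post could be rebuilt with the HIP activity flag dropped. Note the assumption: the reply names --hip-device-activity, which does not appear in the posted command, so this sketch guesses that --roctracer-hip-activity is the option meant. Every other flag is copied unchanged from the original post, and the `CATEGORIES` and `CMD` variable names are only illustrative. The script prints the command for review instead of running it.

```shell
# Sketch only: the original command with --roctracer-hip-activity removed.
# ASSUMPTION: this is the flag the reply refers to as "--hip-device-activity";
# that exact name does not appear in the original command.
CATEGORIES="device-critical-trace device_busy device_hip device_hsa device_memory_usage python rocm_hip rocm_hsa rocm_smi rocprofiler roctracer"

# All remaining flags are taken verbatim from the original post.
CMD="omnitrace-run -c $HOME/.omnitrace.cfg --enable-categories $CATEGORIES --roctracer-hip-api --roctracer-hsa-activity --roctracer-hsa-api -- python -m omnitrace -- vllm_benchmark.py"

# Dry run: print the command so it can be checked before executing it.
echo "$CMD"
```

If the printed command looks right, it can be run with `eval "$CMD"`. Comparing the resulting trace against the original one should show whether kernel names now resolve beyond "Kernel Execution".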