Analysis report block based filtering for profiling #566

Merged
merged 9 commits into from
Mar 10, 2025

Conversation

vedithal-amd
Contributor

@vedithal-amd vedithal-amd commented Feb 14, 2025

Profiling mode changes

  • -b option now additionally accepts metric id(s), similar to -b option in analyze mode (e.g. 6, 6.2, 6.23)
    • Only counters mentioned in the selected analysis report blocks will be collected
      • Add parsing logic to identify hardware counters from analysis report blocks
      • Add filtering logic to only write filtered counters in perfmon files
      • Log the counters that are not collected on a single line
  • --list-metrics option added in profile mode to list possible metric id(s) similar to analyze mode
  • Write arguments provided during profiling in profiling_configuration.yaml file
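
The counter-selection steps listed above could be sketched roughly as follows. This is a minimal illustration and not the PR's implementation: the `REPORT_BLOCKS` table, its metric expressions, the counter-prefix regex, and both function names are assumptions made for the example. The real tool reads report blocks from YAML configs.

```python
import re

# Hypothetical stand-in for the analysis report block configs:
# block id -> metric expressions (illustrative, not the real definitions).
REPORT_BLOCKS = {
    "10": ["SQ_INSTS_VALU / SQ_WAVES", "SQ_INSTS_SALU / SQ_WAVES"],
    "16": ["TCP_TOTAL_ACCESSES_sum / GRBM_GUI_ACTIVE"],
}

# Assumed naming convention: counters start with a hardware block prefix.
COUNTER_RE = re.compile(r"\b(?:SQC|SQ|TCP|TCC|TA|TD|SPI|CPC|CPF|GRBM)_[A-Za-z0-9_]+")


def counters_for_blocks(block_ids):
    """Collect every counter referenced by the selected report blocks."""
    selected = set()
    for block_id in block_ids:
        for expression in REPORT_BLOCKS.get(block_id, []):
            selected.update(COUNTER_RE.findall(expression))
    return selected


def filter_perfmon(perfmon_counters, block_ids):
    """Split counters into (kept, skipped), preserving their original order."""
    wanted = counters_for_blocks(block_ids)
    kept = [c for c in perfmon_counters if c in wanted]
    skipped = [c for c in perfmon_counters if c not in wanted]
    return kept, skipped


kept, skipped = filter_perfmon(
    ["SQ_WAVES", "SQ_INSTS_VALU", "TCP_TOTAL_ACCESSES_sum", "TA_TA_BUSY_sum"],
    ["10"],
)
# Report all skipped counters on a single log line.
print("Not collecting following counters per provided filter: " + ", ".join(skipped))
```

Only `kept` would be written to the perfmon files in this sketch; `skipped` exists purely for the log message.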

Analysis mode changes

  • During analysis mode, only show report blocks selected during profiling
    • If -b option is provided in analysis mode, then follow provided filters
  • Do not show empty tables in analysis report
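
One way to read the analysis-mode rules above is the sketch below. The function name, the table data shape, and the rule that a top-level id such as "10" also selects its sub-tables are assumptions for illustration; the PR itself may resolve these differently.

```python
def blocks_to_show(all_tables, profiled_filter, analyze_filter):
    """Return the ids of the tables to display, in their original order.

    all_tables: dict mapping table id -> list of rows (hypothetical shape).
    profiled_filter: block ids recorded at profile time (empty = everything).
    analyze_filter: block ids passed via -b at analyze time (empty = none given).
    """
    # An explicit analyze-time filter takes precedence over the profile-time one.
    active = analyze_filter or profiled_filter
    shown = []
    for table_id, rows in all_tables.items():
        top_level = table_id.split(".")[0]
        selected = not active or table_id in active or top_level in active
        if selected and rows:  # never show tables that have no data
            shown.append(table_id)
    return shown


# Dummy tables: ids only, with placeholder rows.
tables = {"2.1": ["row"], "3.1": ["row"], "10.1": [], "10.2": ["row"]}
print(blocks_to_show(tables, ["10"], []))     # profiled with -b 10
print(blocks_to_show(tables, ["10"], ["2"]))  # -b 2 given at analyze time
```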

Miscellaneous changes

  • Update CHANGELOG
  • Add test cases
    • Instruction mix report block filter
    • Instruction mix and memory chart report block filter
    • Instruction mix report block filter and CPC hardware block filter
    • TA hardware block filter
    • --list-metrics in profile mode should work
  • Move binary handler fixtures to conftest.py to avoid importing
    fixtures

Public documentation changes

  • Use the term "Hardware report block" instead of "Hardware block"
  • Add documentation for "--list-metrics" option in profile mode
  • Add example of filtering by hardware report block such as instruction
    mix and wavefront launch statistics
  • Add deprecation warning for hardware component (sq, tcc) based filtering

@vedithal-amd vedithal-amd self-assigned this Feb 14, 2025
@vedithal-amd vedithal-amd changed the title Selective counter collection Analysis report-based filters for profiling Feb 14, 2025
@vedithal-amd vedithal-amd changed the title Analysis report-based filters for profiling Analysis section-based filters for profiling Feb 14, 2025
@vedithal-amd vedithal-amd changed the title Analysis section-based filters for profiling Analysis section based filters for profiling Feb 14, 2025
@feizheng10
Contributor

@skyreflectedinmirrors, @gsitaram, please help review the new profiling option:
--section ["SOL", "MEMCHART", "WAVEFRONT", "INSTMIX"]
This was an ask to match what NV has. We already have a similar option, -b, which is more HW oriented.
My question is: should we keep --section and -b separate or not?


@gsitaram gsitaram left a comment

Since there are quite a lot of changes, we must test the tool thoroughly before deployment. Submitted some comments from our team discussion earlier for now.

@gsitaram

@skyreflectedinmirrors, @gsitaram, please help review the new profiling option: --section ["SOL", "MEMCHART", "WAVEFRONT", "INSTMIX"] This was an ask to match what NV has. We already have a similar option, -b, which is more HW oriented. My question is: should we keep --section and -b separate or not?

I feel that we should keep -b because it has been recommended to our customers as a helpful tool for faster analysis, and the sections do not cover the entire gamut of metrics that the tool collects.

@vedithal-amd
Contributor Author

vedithal-amd commented Feb 21, 2025

@feizheng10 and I discussed this feature today.

Currently, the '-b' or '--block' option works differently in 'analyze' and 'profile' modes, as shown below:

$ build/rocprof-compute.bin analyze --help | grep -i "\--block"
  -b  [ ...], --block  [ ...]                   Specify hardware block/metric id(s) from --list-metrics for filtering.
$ build/rocprof-compute.bin profile --help | grep -i -A5 "\--block"
  -b  [ ...], --block  [ ...]                           Hardware block filtering:
                                                           SQ
                                                           SQC
                                                           TA
                                                           TD
                                                           TCP

In 'analyze' mode, the option filters the analysis report by 'report block', such as 'System Speed of Light', 'Memory Chart', 'Wavefront Launch statistics', etc.

In 'profile' mode, the option filters the profiling operation by hardware IP block, such as TA, TCP, SQ, etc.

This behavior is inconsistent, and we would like to remove 'hardware IP block' based filtering in 'profile' mode in favor of 'report block' based filtering. Hardware IP block filtering is less useful for kernel developers/profilers because there is no one-to-one correspondence between hardware IP blocks and analysis report blocks. For example, filtering by only TCP (L1 cache) or TCC (L2 cache) will affect the 'System Speed of Light', 'Memory Chart', and 'Instruction Cache' report blocks.

Both methods of filtering reduce profiling time, so we are not losing anything there.

We are thinking of supporting all 19 yaml files for report blocks using the '-b' option during 'profile' mode (instead of specifying hardware IP block). Users can filter based on multiple report blocks and sub-blocks using block numbers (instead of ambiguous acronyms), for example, 'rocprof-compute profile -b 4, 4.5, 5, 5.6'

To get the report block numbers corresponding to report block titles, you can use the '--list-metrics' option in 'analyze' mode. We want to replicate this in 'profile' mode, so that users can grep for a report block title and obtain the report block numbers to use for filtering.

For example:

$ build/rocprof-compute.bin analyze -p tests/workloads/device_filter/MI200 --list-metrics gfx90a | grep -i "instruction mix"
10 -> Compute Units - Instruction Mix
        10.1 -> Overall Instruction Mix

--list-metrics will take an optional argument for the GPU GFX architecture, since report blocks may differ per architecture. If no argument is provided, the architecture will be detected automatically using the 'rocm-smi' tool.
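
For scripted use, the same title-to-number lookup could be done on the captured '--list-metrics' text instead of with grep. The sketch below assumes the "ID -> Title" line format seen in the example output above; the function names are hypothetical and the sample text is copied from that output.

```python
import re

# Sample text copied from the example output above.
SAMPLE = """\
10 -> Compute Units - Instruction Mix
        10.1 -> Overall Instruction Mix
"""

# Assumed line format: optional indent, dotted number, "->", title.
LINE_RE = re.compile(r"^\s*(\d+(?:\.\d+)*)\s*->\s*(.+?)\s*$")


def parse_list_metrics(text):
    """Map each report block title to its block number."""
    ids_by_title = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            block_id, title = match.groups()
            ids_by_title[title] = block_id
    return ids_by_title


def find_blocks(text, needle):
    """Case-insensitive title search, similar in spirit to `grep -i`."""
    needle = needle.lower()
    hits = [
        block_id
        for title, block_id in parse_list_metrics(text).items()
        if needle in title.lower()
    ]
    # Sort numerically by component so that "10" comes before "10.1".
    return sorted(hits, key=lambda s: [int(part) for part in s.split(".")])


print(find_blocks(SAMPLE, "instruction mix"))  # ['10', '10.1']
```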

To summarize:

  • Update '-b' option in profile mode to take as argument, report block/sub-block numbers instead of hardware IP block names
  • Add '--list-metrics' option in profile mode to list all available report blocks/sub-blocks for the given GFX architecture, mention both block numbers and block title names

NOTE that this will break backward compatibility in the sense that '-b' option in profile mode will work differently.

@gsitaram, @skyreflectedinmirrors, could you please provide your comments on the above implementation suggestions.

@gsitaram

I like the idea of unifying what -b means in profile and analyze modes. But this change would break backward compatibility for the profile mode as -b will now mean something different. We have to think of providing appropriate warnings for a few releases for those users who may have hardcoded their commands into their scripts, etc.

@vedithal-amd
Contributor Author

I like the idea of unifying what -b means in profile and analyze modes. But this change would break backward compatibility for the profile mode as -b will now mean something different. We have to think of providing appropriate warnings for a few releases for those users who may have hardcoded their commands into their scripts, etc.

Sure, we will add a warning upon usage of -b options for a few releases to warn people of change in functionality.

One thing to note: in this PR, I have updated profile mode to write the profiling filters to the workload folder, so that analyze mode only shows the report blocks that were selected during profiling. If you want to see other report blocks, you will have to name them explicitly in the -b filter during analyze mode. Hope that is OK. This makes logical sense to me, as other report blocks might have empty values because the required counters were not collected.

@skyreflectedinmirrors
Contributor

I like the idea, but I think this whole concept will need accompanying docs updates. A few specific comments:

Users can filter based on multiple report blocks and sub-blocks using block numbers (instead of ambiguous acronyms), for example, 'rocprof-compute profile -b 4, 4.5, 5, 5.6'

One question there: does 4.5 match 14.5 and 4.5 (e.g.)? I.e., is this an exact match, or a regex search, etc.?

To get the report block numbers corresponding to report block titles, you can use the '--list-metrics' options during 'analyze' mode. We want to replicate this in 'profile' mode, such that, users can grep for the report block title name and obtain the report block numbers to be used for filtering.

I would like to see what that looks like, but I like the general concept.

Sure, we will add a warning upon usage of -b options for a few releases to warn people of change in functionality.

Instead of changing the default, I'd suggest you simply expand the list of choices (choices=["SQ", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"]) to include the table numbers, and after parsing, warn users if they are still passing in 'SQ' / 'TCC', etc. If they are, maintain the old code-path for filtering metrics for 2-3 releases, and if not, go down the new code path (and probably disallow mixes?)

That way you don't break anyone's existing workflow, while also ensuring anyone using this option will see the warning.

@vedithal-amd
Contributor Author

vedithal-amd commented Feb 25, 2025

One question there: does 4.5 match 14.5 and 4.5 (e.g.)? I.e., is this an exact match, or a regex search, etc.?

In profile mode, it is going to be an exact match, not a regex. By default, counters for all report blocks and IP blocks are collected. If you want all report blocks but one, you need to list all the others on the command line. I think adding regex support would be confusing for both developers and users, even though it is more flexible. Analyze mode also does an exact match.
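
To make the difference from a substring or regex search concrete, here is a small sketch of component-wise matching on block numbers. The `matches` helper is hypothetical, and the rule that a top-level id such as "4" also selects its sub-blocks is an assumption. The thread only states that matching is exact rather than regex-based.

```python
def matches(requested, block_id):
    """True if `requested` selects `block_id`, comparing dot-separated parts."""
    want = requested.split(".")
    have = block_id.split(".")
    # Assumption: "4" selects "4.5" as well; "4.5" never selects "14.5" or "4.50".
    return have[: len(want)] == want


for candidate in ["4.5", "14.5", "4.50", "4.6"]:
    print(candidate, matches("4.5", candidate))
```

Comparing whole components avoids the "4.5 matches 14.5" problem that a plain substring test would have.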

I would like to see what that looks like, but I like the general concept.

It would look like this, as I mentioned above :)

$ build/rocprof-compute.bin analyze -p tests/workloads/device_filter/MI200 --list-metrics gfx90a | grep -i "instruction mix"
10 -> Compute Units - Instruction Mix
        10.1 -> Overall Instruction Mix

That way you don't break anyone's existing workflow, while also ensuring anyone using this option will see the warning.

I like the idea of phased deprecation. In the first phase (ROCm 6.5), -b will accept both report blocks and IP blocks (mixing will be allowed), and a warning will be emitted when an IP block is detected in the filter. In the second phase (ROCm 6.6 or 6.7 or 6.x?), -b will no longer accept IP blocks and an error will be emitted.
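
The two phases could be sketched as below. This is only an illustration under stated assumptions: `split_block_filters`, the `allow_ip_blocks` switch, the upper-casing of IP block names, and the message texts are hypothetical. Only the list of IP block names comes from the `choices` list quoted earlier in the thread.

```python
import re
import warnings

# Hardware IP block names from the existing `choices` list quoted above.
IP_BLOCKS = {"SQ", "SQC", "TA", "TD", "TCP", "TCC", "SPI", "CPC", "CPF"}
BLOCK_ID_RE = re.compile(r"\d+(?:\.\d+)*")


def split_block_filters(values, allow_ip_blocks=True):
    """Split -b values into (ip_blocks, report_blocks)."""
    ip_blocks, report_blocks = [], []
    for value in values:
        if value.upper() in IP_BLOCKS:
            if not allow_ip_blocks:  # second phase: reject outright
                raise ValueError(f"hardware IP block filter not supported: {value}")
            ip_blocks.append(value.upper())
        elif BLOCK_ID_RE.fullmatch(value):
            report_blocks.append(value)
        else:
            raise ValueError(f"unrecognized block filter: {value}")
    if ip_blocks:  # first phase: accept the old names, but warn
        warnings.warn(
            "hardware IP block filtering is deprecated; use report block ids",
            DeprecationWarning,
            stacklevel=2,
        )
    return ip_blocks, report_blocks


# First phase: mixing is allowed and a warning is emitted.
print(split_block_filters(["SQ", "4.5", "10"]))
```

Flipping `allow_ip_blocks` to False models the second phase, in which the same input raises an error instead of a warning.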

I like the idea, but I think this whole concept will need accompanying docs updates

Thanks for your feedback. I will add a checklist item to update the rocprof-compute public docs and also update the changelog.

@vedithal-amd vedithal-amd changed the title Analysis section based filters for profiling Analysis report block based filtering for profiling Feb 25, 2025
@vedithal-amd vedithal-amd force-pushed the vedithal/selective-counter branch 2 times, most recently from 491d6ec to 40c11f8 Compare March 3, 2025 22:04
@vedithal-amd vedithal-amd requested a review from a team as a code owner March 3, 2025 22:04
@vedithal-amd vedithal-amd force-pushed the vedithal/selective-counter branch 10 times, most recently from 6a48faa to c014784 Compare March 4, 2025 01:53
@vedithal-amd vedithal-amd force-pushed the vedithal/selective-counter branch 7 times, most recently from e0d9ead to 3842785 Compare March 7, 2025 01:56
@vedithal-amd vedithal-amd requested a review from gsitaram March 7, 2025 01:58
@vedithal-amd
Contributor Author

@gsitaram @skyreflectedinmirrors I have implemented the changes discussed above and tested them thoroughly (both manually and with automated tests) in this feature branch. Please let me know if you have any more comments before I merge this PR.

@feizheng10 @coleramos425, there have been considerable changes to the code since our discussion; could you please review and approve?

Thanks!
Vignesh

@vedithal-amd vedithal-amd force-pushed the vedithal/selective-counter branch 3 times, most recently from 4e177a0 to 5b3a2c8 Compare March 7, 2025 03:29
Collaborator

@coleramos425 coleramos425 left a comment

@vedithal-amd, this looks great! Two bugs I caught while testing:

  1. Need to add title to src/rocprof_compute_soc/analysis_configs/gfx90a/0400_roofline_chart.yml or remove the file, otherwise we'll fail with
 $ ./src/rocprof-compute analyze -p workloads/mix/MI200/ 
...
   INFO Not showing table not selected during profiling: 2.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 3.1 Memory Chart
Traceback (most recent call last):
  File "/work1/amd/colramos/audacious/omniperf/./src/rocprof-compute", line 156, in <module>
    main()
  File "/work1/amd/colramos/audacious/omniperf/./src/rocprof-compute", line 148, in main
    rocprof_compute.run_analysis()
  File "/work1/amd/colramos/audacious/omniperf/src/utils/utils.py", line 53, in wrap_function
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work1/amd/colramos/audacious/omniperf/src/rocprof_compute_base.py", line 423, in run_analysis
    analyzer.run_analysis()
  File "/work1/amd/colramos/audacious/omniperf/src/utils/utils.py", line 53, in wrap_function
    result = function(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/work1/amd/colramos/audacious/omniperf/src/rocprof_compute_analyze/analysis_cli.py", line 96, in run_analysis
    tty.show_all(
  File "/work1/amd/colramos/audacious/omniperf/src/utils/tty.py", line 102, in show_all
    f"Not showing table not selected during profiling: {table_id_str} {table_config['title']}"
                                                                       ~~~~~~~~~~~~^^^^^^^^^
KeyError: 'title'
  2. I could be confused, but my expectation would be that based on these commands, the analysis output would print my SOL table... Thoughts?
Log
$ ./src/rocprof-compute profile -n mixbench_test_sol -b 2 --no-roof -- $WORK/dev/mixbench/build/mixbench-hip 

                                 __                                       _
 _ __ ___   ___ _ __  _ __ ___  / _|       ___ ___  _ __ ___  _ __  _   _| |_ ___
| '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
| | | (_) | (__| |_) | | | (_) |  _|_____| (_| (_) | | | | | | |_) | |_| | ||  __/
|_|  \___/ \___| .__/|_|  \___/|_|        \___\___/|_| |_| |_| .__/ \__,_|\__\___|
               |_|                                           |_|

   INFO Not collecting following counters per provided filter: TCP_GATE_EN1_sum, TCP_GATE_EN2_sum, TCP_TD_TCP_STALL_CYCLES_sum, TCP_TCR_TCP_STALL_CYCLES_sum, TCP_READ_TAGCONFLICT_STALL_CYCLES_sum, TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum, TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum, TCP_TA_TCP_STATE_READ_sum, TCP_VOLATILE_sum, TCP_TOTAL_ACCESSES_sum, TCP_TOTAL_READ_sum, TCP_TOTAL_WRITE_sum, TCP_TOTAL_ATOMIC_WITH_RET_sum, TCP_TOTAL_ATOMIC_WITHOUT_RET_sum, TCP_TOTAL_WRITEBACK_INVALIDATES_sum, TCP_UTCL1_TRANSLATION_MISS_sum, TCP_UTCL1_TRANSLATION_HIT_sum, TCP_UTCL1_PERMISSION_MISS_sum, TCP_UTCL1_REQUEST_sum, TCP_TCP_LATENCY_sum, TCP_TCC_READ_REQ_LATENCY_sum, TCP_TCC_WRITE_REQ_LATENCY_sum, TCP_TCC_NC_READ_REQ_sum, TCP_TCC_NC_WRITE_REQ_sum, TCP_TCC_NC_ATOMIC_REQ_sum, TCP_TCC_UC_READ_REQ_sum, TCP_TCC_UC_WRITE_REQ_sum, TCP_TCC_UC_ATOMIC_REQ_sum, TCP_TCC_CC_READ_REQ_sum, TCP_TCC_CC_WRITE_REQ_sum, TCP_TCC_CC_ATOMIC_REQ_sum, TCP_TCC_RW_READ_REQ_sum, TCP_TCC_RW_WRITE_REQ_sum, TCP_TCC_RW_ATOMIC_REQ_sum, TCP_PENDING_STALL_CYCLES_sum, TCC_CYCLE_sum, TCC_BUSY_sum, TCC_PROBE_sum, TCC_PROBE_ALL_sum, TCC_NC_REQ_sum, TCC_UC_REQ_sum, TCC_CC_REQ_sum, TCC_RW_REQ_sum, TCC_STREAMING_REQ_sum, TCC_READ_sum, TCC_WRITE_sum, TCC_ATOMIC_sum, TCC_WRITEBACK_sum, TCC_EA_WR_UNCACHED_32B_sum, TCC_EA_WRREQ_DRAM_sum, TCC_EA_WRREQ_STALL_sum, TCC_EA_RD_UNCACHED_32B_sum, TCC_EA_RDREQ_DRAM_sum, TCC_TAG_STALL_sum, TCC_NORMAL_WRITEBACK_sum, TCC_ALL_TC_OP_WB_WRITEBACK_sum, TCC_NORMAL_EVICT_sum, TCC_ALL_TC_OP_INV_EVICT_sum, TCC_TOO_MANY_EA_WRREQS_STALL_sum, TCC_EA_ATOMIC_sum, TCC_EA_ATOMIC_LEVEL_sum, TA_TA_BUSY_sum, TA_BUFFER_WAVEFRONTS_sum, TA_BUFFER_READ_WAVEFRONTS_sum, TA_BUFFER_WRITE_WAVEFRONTS_sum, TA_BUFFER_ATOMIC_WAVEFRONTS_sum, TA_BUFFER_TOTAL_CYCLES_sum, TA_BUFFER_COALESCED_READ_CYCLES_sum, TA_BUFFER_COALESCED_WRITE_CYCLES_sum, TA_ADDR_STALLED_BY_TC_CYCLES_sum, TA_TOTAL_WAVEFRONTS_sum, TA_ADDR_STALLED_BY_TD_CYCLES_sum, TA_DATA_STALLED_BY_TC_CYCLES_sum, TA_FLAT_WAVEFRONTS_sum, TA_FLAT_READ_WAVEFRONTS_sum, 
TA_FLAT_WRITE_WAVEFRONTS_sum, TA_FLAT_ATOMIC_WAVEFRONTS_sum, CPF_CPF_STAT_BUSY, CPF_CPF_STAT_STALL, CPF_CPF_TCIU_BUSY, CPF_CPF_TCIU_STALL, CPF_CPF_STAT_IDLE, CPF_CPF_TCIU_IDLE, CPF_CMP_UTCL1_STALL_ON_TRANSLATION, TD_TD_BUSY_sum, TD_TC_STALL_sum, TD_SPI_STALL_sum, TD_LOAD_WAVEFRONT_sum, TD_ATOMIC_WAVEFRONT_sum, TD_STORE_WAVEFRONT_sum, TD_COALESCABLE_WAVEFRONT_sum, SQC_TC_INST_REQ, SQC_TC_DATA_READ_REQ, SQC_TC_DATA_WRITE_REQ, SQC_TC_DATA_ATOMIC_REQ, SQC_TC_STALL, SQC_TC_REQ, SQC_DCACHE_REQ_READ_16, SQC_ICACHE_MISSES_DUPLICATE, SQC_DCACHE_INPUT_VALID_READYB, SQC_DCACHE_ATOMIC, SQC_DCACHE_REQ_READ_8, SQC_DCACHE_MISSES_DUPLICATE, SQC_DCACHE_REQ_READ_1, SQC_DCACHE_REQ_READ_2, SQC_DCACHE_REQ_READ_4, SQ_INSTS_VALU_CVT, SQ_INSTS_VMEM_WR, SQ_INSTS_VMEM_RD, SQ_INSTS_SALU, SQ_INSTS_VSKIPPED, SQ_INSTS_VALU, SQ_INSTS_FLAT, SQ_INSTS_GDS, SQ_INSTS_EXP_GDS, SQ_INSTS_BRANCH, SQ_INSTS_SENDMSG, SQ_WAIT_ANY, SQ_WAIT_INST_ANY, SQ_ACTIVE_INST_ANY, SQ_ACTIVE_INST_LDS, SQ_ACTIVE_INST_EXP_GDS, SQ_INST_CYCLES_VMEM_WR, SQ_INST_CYCLES_VMEM_RD, SQ_INST_CYCLES_SMEM, SQ_INST_CYCLES_SALU, SQ_LDS_ADDR_CONFLICT, SQ_LDS_UNALIGNED_STALL, SQ_WAVES_EQ_64, SQ_WAVES_LT_64, SQ_WAVES_LT_48, SQ_WAVES_LT_32, SQ_WAVES_LT_16, SQ_ITEMS, SQ_LDS_MEM_VIOLATIONS, SQ_LDS_ATOMIC_RETURN, SQ_WAVES_RESTORED, SQ_WAVES_SAVED, SQ_INSTS_SMEM_NORM, SQ_INSTS_MFMA, SQ_INSTS_VALU_MFMA_I8, SQ_INSTS_VALU_MFMA_F16, SQ_INSTS_VALU_MFMA_BF16, SQ_INSTS_VALU_MFMA_F32, SQ_INSTS_VALU_MFMA_F64, SQ_INSTS_FLAT_LDS_ONLY, CPC_CPC_STAT_BUSY, CPC_CPC_STAT_IDLE, CPC_CPC_TCIU_BUSY, CPC_CPC_TCIU_IDLE, CPC_CPC_STAT_STALL, CPC_UTCL1_STALL_ON_TRANSLATION, CPC_CPC_UTCL2IU_BUSY, CPC_CPC_UTCL2IU_IDLE, CPC_CPC_UTCL2IU_STALL, CPC_ME1_DC0_SPI_BUSY, TCC_CYCLE_expand, TCC_RW_REQ_expand, TCC_READ_expand, TCC_WRITE_expand, TCC_ATOMIC_expand, TCC_EA_ATOMIC_expand, TCC_EA_ATOMIC_LEVEL_expand, TCC_EA_RDREQ_IO_CREDIT_STALL_expand, TCC_EA_RDREQ_GMI_CREDIT_STALL_expand, TCC_EA_RDREQ_DRAM_CREDIT_STALL_expand, TCC_EA_WRREQ_IO_CREDIT_STALL_expand, 
TCC_EA_WRREQ_GMI_CREDIT_STALL_expand, TCC_EA_WRREQ_DRAM_CREDIT_STALL_expand, TCC_TOO_MANY_EA_WRREQS_STALL_expand, GRBM_SPI_BUSY, SPI_CSN_WINDOW_VALID, SPI_CSN_BUSY, SPI_CSN_NUM_THREADGROUPS, SPI_CSN_WAVE, SPI_RA_REQ_NO_ALLOC, SPI_RA_REQ_NO_ALLOC_CSN, SPI_RA_RES_STALL_CSN, SPI_RA_TMP_STALL_CSN, SPI_RA_WAVE_SIMD_FULL_CSN, SPI_RA_VGPR_SIMD_FULL_CSN, SPI_RA_SGPR_SIMD_FULL_CSN, SPI_RA_LDS_CU_FULL_CSN, SPI_RA_BAR_CU_FULL_CSN, SPI_RA_TGLIM_CU_FULL_CSN, SPI_RA_WVLIM_STALL_CSN, SPI_SWC_CSC_WR, SPI_VWC_CSC_WR, SPI_RA_BULKY_CU_FULL_CSN 
   INFO Rocprofiler-Compute version: 3.1.0
   INFO Profiler choice: rocprofv1
   INFO Path: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200
   INFO Target: MI200
   INFO Command: /work1/amd/colramos/dev/mixbench/build/mixbench-hip
   INFO Kernel Selection: None
   INFO Dispatch Selection: None
   INFO Hardware Blocks: []
   INFO Report Sections: ['2']
   INFO 
   INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   INFO Collecting Performance Counters
   INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   INFO 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_IFETCH_LEVEL.txt
   INFO    |-> [rocprof] RPL: on '250307_101513' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_IFETCH_LEVEL.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101513_2796904'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101513_2796904/input0_results_250307_101513'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101513_2796904/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 15 metrics
   INFO    |-> [rocprof] SQ_WAVES, SQ_IFETCH, SQ_IFETCH_LEVEL, SQ_ACCUM_PREV_HIRES, SQC_DCACHE_HITS, SQC_DCACHE_MISSES, SQ_INSTS, SQ_INSTS_VALU_ADD_F16, TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum, TCC_EA_WRREQ_64B_sum, TCC_EA_RDREQ_sum, TCC_EA_RDREQ_32B_sum, TCC_EA_RDREQ_LEVEL_sum, GRBM_COUNT, GRBM_GUI_ACTIVE
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  364.71,1458.86,      0.250,    0.18,  375.83,1503.30,      0.125,    0.18,  187.08,1496.60,      0.500,    0.09,  709.68,1419.36,     0.250,    0.09,  358.48,1433.92
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1069.95,1426.60,      0.750,    0.18, 1115.48,1487.31,      0.375,    0.18,  564.75,1506.00,      1.500,    0.09, 2169.42,1446.28,     0.750,    0.09, 1071.78,1429.03
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1804.74,1443.79,      1.250,    0.18, 1875.78,1500.62,      0.625,    0.18,  939.56,1503.30,      2.500,    0.09, 3584.80,1433.92,     1.250,    0.09, 1798.55,1438.84
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2509.36,1433.92,      1.750,    0.18, 2619.04,1496.60,      0.875,    0.18, 1314.22,1501.96,      3.500,    0.09, 5018.77,1433.93,     1.750,    0.09, 2505.08,1431.47
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3220.81,1431.47,      2.250,    0.18, 3385.48,1504.66,      1.125,    0.18, 1686.68,1499.27,      4.500,    0.09, 6474.77,1438.84,     2.250,    0.09, 3242.95,1441.31
   INFO    |-> [rocprof] 5,      2.750,    0.09, 3923.16,1426.60,      2.750,    0.18, 4097.36,1489.95,      1.375,    0.18, 2065.19,1501.96,      5.500,    0.09, 7900.06,1436.37,     2.750,    0.09, 3950.03,1436.37
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4741.28,1458.86,      3.250,    0.18, 4863.94,1496.60,      1.625,    0.18, 2440.69,1501.96,      6.500,    0.09, 9368.52,1441.31,     3.250,    0.09, 4708.53,1448.78
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5395.64,1438.84,      3.750,    0.18, 5567.57,1484.68,      1.875,    0.18, 2818.69,1503.30,      7.500,    0.09,10699.53,1426.60,     3.750,    0.09, 5349.76,1426.60
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6073.39,1429.03,      4.250,    0.18, 6349.21,1493.93,      2.125,    0.18, 3185.95,1499.27,      8.500,    0.09,12085.03,1421.77,     4.250,    0.09, 6042.51,1421.77
   INFO    |-> [rocprof] 9,      4.750,    0.09, 6799.49,1431.47,      4.750,    0.18, 7096.17,1493.93,      2.375,    0.18, 3554.48,1496.62,      9.500,    0.09,13692.60,1441.33,     4.750,    0.09, 6893.68,1451.30
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7619.33,1451.30,      5.250,    0.18, 7836.29,1492.63,      2.625,    0.18, 3925.13,1495.29,     10.500,    0.10,14753.83,1405.13,     5.250,    0.09, 7528.16,1433.93
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8175.25,1421.78,      5.750,    0.18, 8544.59,1486.02,      2.875,    0.18, 4314.33,1500.64,     11.500,    0.09,16350.50,1421.78,     5.750,    0.09, 8133.98,1414.61
   INFO    |-> [rocprof] 12,      6.250,    0.09, 8962.09,1433.93,      6.250,    0.18, 9262.99,1482.08,      3.125,    0.18, 4660.31,1491.30,     12.500,    0.09,17712.43,1416.99,     6.250,    0.10, 8796.78,1407.48
   INFO    |-> [rocprof] 13,      6.750,    0.09, 9679.16,1433.95,      6.750,    0.18,10012.87,1483.39,      3.375,    0.18, 5006.44,1483.39,     13.500,    0.09,19194.07,1421.78,     6.750,    0.09, 9728.95,1441.33
   INFO    |-> [rocprof] 14,      7.250,    0.09,10576.83,1458.87,      7.250,    0.18,10716.66,1478.16,      3.625,    0.18, 5353.62,1476.86,     14.500,    0.10,20442.83,1409.85,     7.250,    0.10,10170.14,1402.78
   INFO    |-> [rocprof] 15,      7.750,    0.09,11094.03,1431.49,      7.750,    0.18,11395.51,1470.39,      3.875,    0.18, 5794.24,1495.29,     15.500,    0.10,21742.82,1402.76,     7.750,    0.10,10853.26,1400.42
   INFO    |-> [rocprof] 16,      8.250,    0.09,11789.78,1429.06,      8.250,    0.19,11962.95,1450.05,      4.125,    0.18, 6146.11,1489.97,     16.500,    0.10,23145.83,1402.78,     8.250,    0.10,11534.34,1398.10
   INFO    |-> [rocprof] 17,      8.750,    0.09,12461.72,1424.20,      8.750,    0.19,12655.16,1446.30,      4.375,    0.18, 6524.44,1491.30,     17.500,    0.10,24385.23,1393.44,     8.750,    0.10,12274.17,1402.76
   INFO    |-> [rocprof] 18,      9.250,    0.09,13401.34,1448.79,      9.250,    0.19,13332.34,1441.33,      4.625,    0.18, 6897.26,1491.30,     18.500,    0.10,25907.78,1400.42,     9.250,    0.09,13084.96,1414.59
   INFO    |-> [rocprof] 20,     10.250,    0.09,14672.91,1431.50,     10.250,    0.19,14710.40,1435.16,      5.125,    0.18, 7649.71,1492.63,     20.500,    0.10,28237.22,1377.43,    10.250,    0.10,14450.96,1409.85
   INFO    |-> [rocprof] 22,     11.250,    0.09,16215.09,1441.34,     11.250,    0.20,15432.69,1371.79,      5.625,    0.18, 8381.11,1489.98,     22.500,    0.10,29537.06,1312.76,    11.250,    0.10,15369.85,1366.21
   INFO    |-> [rocprof] 24,     12.250,    0.09,17446.41,1424.20,     12.250,    0.20,16337.03,1333.64,      6.125,    0.18, 9021.94,1472.97,     24.500,    0.11,30046.64,1226.39,    12.250,    0.11,15546.06,1269.07
   INFO    |-> [rocprof] 28,     14.250,    0.09,20503.89,1438.87,     14.250,    0.22,17199.59,1206.99,      7.125,    0.18,10504.13,1474.26,     28.500,    0.12,31961.67,1121.46,    14.250,    0.12,16602.31,1165.07
   INFO    |-> [rocprof] 32,     16.250,    0.09,23064.88,1419.38,     16.250,    0.24,17947.90,1104.49,      8.125,    0.18,11873.99,1461.41,     32.500,    0.13,33825.03,1040.77,    16.250,    0.13,17233.23,1060.51
   INFO    |-> [rocprof] 40,     20.250,    0.10,27576.19,1361.79,     20.250,    0.29,18759.60, 926.40,     10.125,    0.19,14518.67,1433.94,     40.500,    0.15,35426.11, 874.72,    20.250,    0.15,17694.60, 873.81
   INFO    |-> [rocprof] 48,     24.250,    0.11,29185.62,1203.53,     24.250,    0.34,19082.79, 786.92,     12.125,    0.20,16234.77,1338.95,     48.500,    0.18,36586.80, 754.37,    24.250,    0.18,18326.36, 755.73
   INFO    |-> [rocprof] 56,     28.250,    0.12,31554.75,1116.98,     28.250,    0.39,19250.72, 681.44,     14.125,    0.22,17247.24,1221.04,     56.500,    0.20,37496.36, 663.65,    28.250,    0.20,18644.83, 659.99
   INFO    |-> [rocprof] 64,     32.250,    0.13,33689.98,1044.65,     32.250,    0.45,19427.70, 602.41,     16.125,    0.24,17975.44,1114.76,     64.500,    0.23,38156.76, 591.58,    32.250,    0.24,17904.06, 555.16
   INFO    |-> [rocprof] 80,     40.250,    0.16,34665.23, 861.25,     40.250,    0.55,19635.88, 487.85,     20.125,    0.29,18541.42, 921.31,     80.500,    0.28,38390.02, 476.89,    40.250,    0.30,18300.23, 454.66
   INFO    |-> [rocprof] 96,     48.250,    0.18,36529.40, 757.09,     48.250,    0.65,19792.04, 410.20,     24.125,    0.34,18940.01, 785.08,     96.500,    0.33,39162.83, 405.83,    48.250,    0.35,18669.19, 386.93
   INFO    |-> [rocprof] 128,     64.250,    0.43,20170.82, 313.94,     64.250,    0.87,19932.11, 310.23,     32.125,    0.44,19436.15, 605.02,    128.500,    0.43,39775.97, 309.54,    64.250,    0.46,18720.55, 291.37
   INFO    |-> [rocprof] 256,    128.250,    0.83,20649.35, 161.01,    128.250,    1.70,20228.10, 157.72,     64.125,    0.86,20049.04, 312.66,    256.500,    0.84,41054.43, 160.06,   128.250,    0.90,19170.20, 149.48
   INFO    |-> [rocprof] 512,    256.250,    1.64,20936.66,  81.70,    256.250,    3.37,20384.70,  79.55,    128.125,    1.69,20326.81, 158.65,    512.500,    1.65,41617.81,  81.21,   256.250,    1.77,19381.12,  75.63
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101513_2796904/input0_results_250307_101513
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_IFETCH_LEVEL.csv' is generating
   INFO    |-> [rocprof] 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_LDS.txt
   INFO    |-> [rocprof] RPL: on '250307_101514' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_LDS.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101514_2797119'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101514_2797119/input0_results_250307_101514'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101514_2797119/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 16 metrics
   INFO    |-> [rocprof] SQ_INSTS_LDS, SQ_INST_LEVEL_LDS, SQ_ACCUM_PREV_HIRES, SQ_BUSY_CU_CYCLES, SQC_ICACHE_REQ, SQC_ICACHE_HITS, SQC_ICACHE_MISSES, SQC_DCACHE_REQ, TCP_TOTAL_CACHE_ACCESSES_sum, TCP_TCC_READ_REQ_sum, TCP_TCC_WRITE_REQ_sum, TCP_TCC_ATOMIC_WITH_RET_REQ_sum, TCC_REQ_sum, TCC_HIT_sum, TCC_MISS_sum, TCC_EA_WRREQ_sum
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  365.99,1463.95,      0.250,    0.18,  375.83,1503.31,      0.125,    0.18,  187.91,1503.31,      0.500,    0.09,  715.74,1431.49,     0.250,    0.09,  359.71,1438.85
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1082.84,1443.79,      0.750,    0.18, 1122.45,1496.60,      0.375,    0.18,  561.73,1497.94,      1.500,    0.09, 2158.26,1438.84,     0.750,    0.09, 1073.62,1431.49
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1801.66,1441.33,      1.250,    0.18, 1874.10,1499.28,      0.625,    0.18,  937.89,1500.62,      2.500,    0.09, 3590.94,1436.37,     1.250,    0.09, 1804.74,1443.79
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2517.97,1438.84,      1.750,    0.18, 2626.09,1500.62,      0.875,    0.18, 1313.04,1500.62,      3.500,    0.09, 5018.77,1433.93,     1.750,    0.09, 2509.39,1433.93
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3237.39,1438.84,      2.250,    0.18, 3370.37,1497.94,      1.125,    0.18, 1694.26,1506.01,      4.500,    0.09, 6542.10,1453.80,     2.250,    0.09, 3254.13,1446.28
   INFO    |-> [rocprof] 5,      2.750,    0.09, 3977.31,1446.30,      2.750,    0.18, 4186.67,1522.42,      1.375,    0.18, 2067.06,1503.32,      5.500,    0.09, 7900.15,1436.39,     2.750,    0.09, 3943.32,1433.93
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4692.37,1443.81,      3.250,    0.18, 4825.28,1484.70,      1.625,    0.18, 2436.36,1499.30,      6.500,    0.09, 9384.74,1443.81,     3.250,    0.09, 4692.42,1443.82
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5404.97,1441.33,      3.750,    0.18, 5543.10,1478.16,      1.875,    0.18, 2816.21,1501.98,      7.500,    0.09,10847.34,1446.31,     3.750,    0.09, 5442.38,1451.30
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6136.18,1443.81,      4.250,    0.18, 6349.28,1493.95,      2.125,    0.18, 3174.64,1493.95,      8.500,    0.09,12272.49,1443.82,     4.250,    0.09, 6094.22,1433.93
   INFO    |-> [rocprof] 9,      4.750,    0.09, 6869.91,1446.30,      4.750,    0.18, 7083.67,1491.30,      2.375,    0.18, 3557.65,1497.96,      9.500,    0.09,13739.96,1446.31,     4.750,    0.09, 6846.30,1441.33
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7593.14,1446.31,      5.250,    0.18, 7836.25,1492.62,      2.625,    0.18, 3921.63,1493.96,     10.500,    0.09,15056.31,1433.93,     5.250,    0.09, 7566.96,1441.33
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8231.06,1431.49,      5.750,    0.18, 8537.03,1484.70,      2.875,    0.18, 4295.12,1493.96,     11.500,    0.09,16378.26,1424.20,     5.750,    0.09, 8147.63,1416.98
   INFO    |-> [rocprof] 12,      6.250,    0.09, 8977.44,1436.39,      6.250,    0.18, 9206.06,1472.97,      3.125,    0.18, 4635.59,1483.39,     12.500,    0.09,17772.29,1421.78,     6.250,    0.09, 9070.63,1451.30
   INFO    |-> [rocprof] 13,      6.750,    0.09, 9695.74,1436.41,      6.750,    0.18,10084.14,1493.95,      3.375,    0.18, 5019.75,1487.33,     13.500,    0.09,19226.66,1424.20,     6.750,    0.10, 9437.09,1398.09
   INFO    |-> [rocprof] 14,      7.250,    0.09,10342.99,1426.62,      7.250,    0.18,10697.81,1475.56,      3.625,    0.18, 5363.06,1479.46,     14.500,    0.09,20756.57,1431.49,     7.250,    0.10,10238.51,1412.21
   INFO    |-> [rocprof] 15,      7.750,    0.09,11151.24,1438.87,      7.750,    0.18,11425.55,1474.26,      3.875,    0.18, 5778.79,1491.30,     15.500,    0.09,22302.23,1438.85,     7.750,    0.10,10944.62,1412.21
   INFO    |-> [rocprof] 16,      8.250,    0.09,11829.96,1433.93,      8.250,    0.18,12025.31,1457.61,      4.125,    0.18, 6184.56,1499.29,     16.500,    0.10,23184.35,1405.11,     8.250,    0.09,11729.71,1421.78
   INFO    |-> [rocprof] 17,      8.750,    0.09,12461.72,1424.20,      8.750,    0.19,12676.88,1448.79,      4.375,    0.18, 6535.98,1493.94,     17.500,    0.10,24630.72,1407.47,     8.750,    0.09,12377.53,1414.58
   INFO    |-> [rocprof] 18,      9.250,    0.09,13241.26,1431.49,      9.250,    0.19,13343.73,1442.57,      4.625,    0.18, 6897.22,1491.29,     18.500,    0.10,25864.34,1398.07,     9.250,    0.10,12997.29,1405.11
   INFO    |-> [rocprof] 20,     10.250,    0.09,14597.86,1424.18,     10.250,    0.19,14597.94,1424.19,      5.125,    0.18, 7636.04,1489.96,     20.500,    0.10,28613.09,1395.76,    10.250,    0.09,14524.04,1416.98
   INFO    |-> [rocprof] 22,     11.250,    0.09,16214.92,1441.33,     11.250,    0.19,15559.83,1383.10,      5.625,    0.18, 8321.89,1479.45,     22.500,    0.10,29353.32,1304.59,    11.250,    0.10,14932.11,1327.30
   INFO    |-> [rocprof] 24,     12.250,    0.09,17416.84,1421.78,     12.250,    0.20,16119.05,1315.84,      6.125,    0.18, 8998.19,1469.09,     24.500,    0.11,29871.95,1219.26,    12.250,    0.11,15569.62,1270.99
   INFO    |-> [rocprof] 28,     14.250,    0.09,20157.70,1414.58,     14.250,    0.22,17261.52,1211.33,      7.125,    0.18,10430.69,1463.96,     28.500,    0.12,31961.41,1121.45,    14.250,    0.11,16883.55,1184.81
   INFO    |-> [rocprof] 32,     16.250,    0.09,23103.97,1421.78,     16.250,    0.24,17935.87,1103.75,      8.125,    0.18,11822.50,1455.08,     32.500,    0.13,33993.47,1045.95,    16.250,    0.13,16996.73,1045.95
   INFO    |-> [rocprof] 40,     20.250,    0.10,27575.91,1361.77,     20.250,    0.29,18790.59, 927.93,     10.125,    0.19,14456.73,1427.83,     40.500,    0.16,35060.07, 865.68,    20.250,    0.15,17787.01, 878.37
   INFO    |-> [rocprof] 48,     24.250,    0.11,29101.85,1200.08,     24.250,    0.34,19038.03, 785.07,     12.125,    0.20,15992.20,1318.94,     48.500,    0.18,36003.60, 742.34,    24.250,    0.18,18409.08, 759.14
   INFO    |-> [rocprof] 56,     28.250,    0.12,31937.22,1130.52,     28.250,    0.39,19235.00, 680.88,     14.125,    0.22,17024.06,1205.24,     56.500,    0.21,36969.53, 654.33,    28.250,    0.20,18630.18, 659.48
   INFO    |-> [rocprof] 64,     32.250,    0.13,34028.99,1055.16,     32.250,    0.45,19434.55, 602.62,     16.125,    0.24,17903.91,1110.32,     64.500,    0.23,38156.25, 591.57,    32.250,    0.24,17880.25, 554.43
   INFO    |-> [rocprof] 80,     40.250,    0.16,34243.12, 850.76,     40.250,    0.55,19641.63, 487.99,     20.125,    0.29,18654.10, 926.91,     80.500,    0.28,38521.29, 478.53,    40.250,    0.29,18349.96, 455.90
   INFO    |-> [rocprof] 96,     48.250,    0.18,35945.66, 744.99,     48.250,    0.65,19796.91, 410.30,     24.125,    0.34,19020.05, 788.40,     96.500,    0.33,39068.33, 404.85,    48.250,    0.35,18660.53, 386.75
   INFO    |-> [rocprof] 128,     64.250,    0.43,20163.27, 313.83,     64.250,    0.86,19961.64, 310.69,     32.125,    0.44,19450.18, 605.45,    128.500,    0.43,39834.76, 310.00,    64.250,    0.46,18740.07, 291.67
   INFO    |-> [rocprof] 256,    128.250,    0.83,20681.11, 161.26,    128.250,    1.70,20237.62, 157.80,     64.125,    0.86,20030.38, 312.36,    256.500,    0.84,41046.65, 160.03,   128.250,    0.90,19176.99, 149.53
   INFO    |-> [rocprof] 512,    256.250,    1.64,20938.64,  81.71,    256.250,    3.38,20379.82,  79.53,    128.125,    1.69,20324.90, 158.63,    512.500,    1.65,41597.70,  81.17,   256.250,    1.78,19368.91,  75.59
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101514_2797119/input0_results_250307_101514
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_INST_LEVEL_LDS.csv' is generating
   INFO    |-> [rocprof] 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_SMEM.txt
   INFO    |-> [rocprof] RPL: on '250307_101515' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_SMEM.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101515_2797331'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101515_2797331/input0_results_250307_101515'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101515_2797331/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 105 metrics
   INFO    |-> [rocprof] SQ_INSTS_SMEM, SQ_INST_LEVEL_SMEM, SQ_ACCUM_PREV_HIRES, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_MUL_F32, TCC_HIT[0], TCC_MISS[0], TCC_REQ[0], TCC_HIT[1], TCC_MISS[1], TCC_REQ[1], TCC_HIT[2], TCC_MISS[2], TCC_REQ[2], TCC_HIT[3], TCC_MISS[3], TCC_REQ[3], TCC_HIT[4], TCC_MISS[4], TCC_REQ[4], TCC_HIT[5], TCC_MISS[5], TCC_REQ[5], TCC_HIT[6], TCC_MISS[6], TCC_REQ[6], TCC_HIT[7], TCC_MISS[7], TCC_REQ[7], TCC_HIT[8], TCC_MISS[8], TCC_REQ[8], TCC_HIT[9], TCC_MISS[9], TCC_REQ[9], TCC_HIT[10], TCC_MISS[10], TCC_REQ[10], TCC_HIT[11], TCC_MISS[11], TCC_REQ[11], TCC_HIT[12], TCC_MISS[12], TCC_REQ[12], TCC_HIT[13], TCC_MISS[13], TCC_REQ[13], TCC_HIT[14], TCC_MISS[14], TCC_REQ[14], TCC_HIT[15], TCC_MISS[15], TCC_REQ[15], TCC_HIT[16], TCC_MISS[16], TCC_REQ[16], TCC_HIT[17], TCC_MISS[17], TCC_REQ[17], TCC_HIT[18], TCC_MISS[18], TCC_REQ[18], TCC_HIT[19], TCC_MISS[19], TCC_REQ[19], TCC_HIT[20], TCC_MISS[20], TCC_REQ[20], TCC_HIT[21], TCC_MISS[21], TCC_REQ[21], TCC_HIT[22], TCC_MISS[22], TCC_REQ[22], TCC_HIT[23], TCC_MISS[23], TCC_REQ[23], TCC_HIT[24], TCC_MISS[24], TCC_REQ[24], TCC_HIT[25], TCC_MISS[25], TCC_REQ[25], TCC_HIT[26], TCC_MISS[26], TCC_REQ[26], TCC_HIT[27], TCC_MISS[27], TCC_REQ[27], TCC_HIT[28], TCC_MISS[28], TCC_REQ[28], TCC_HIT[29], TCC_MISS[29], TCC_REQ[29], TCC_HIT[30], TCC_MISS[30], TCC_REQ[30], TCC_HIT[31], TCC_MISS[31], TCC_REQ[31], TCC_EA_WRREQ_LEVEL_sum
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  362.20,1448.79,      0.250,    0.18,  377.52,1510.07,      0.125,    0.18,  188.42,1507.37,      0.500,    0.09,  724.40,1448.79,     0.250,    0.09,  361.57,1446.28
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1082.86,1443.81,      0.750,    0.18, 1133.58,1511.44,      0.375,    0.18,  565.77,1508.72,      1.500,    0.09, 2184.49,1456.32,     0.750,    0.09, 1084.72,1446.30
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1814.11,1451.28,      1.250,    0.18, 1892.70,1514.16,      0.625,    0.18,  943.80,1510.08,      2.500,    0.09, 3590.98,1436.39,     1.250,    0.09, 1795.49,1436.39
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2539.78,1451.30,      1.750,    0.18, 2642.63,1510.07,      0.875,    0.18, 1322.51,1511.44,      3.500,    0.09, 5070.78,1448.79,     1.750,    0.09, 2531.02,1446.30
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3276.76,1456.34,      2.250,    0.18, 3397.67,1510.07,      1.125,    0.18, 1700.36,1511.44,      4.500,    0.09, 6497.06,1443.79,     2.250,    0.09, 3271.05,1453.80
   INFO    |-> [rocprof] 5,      2.750,    0.09, 3991.08,1451.30,      2.750,    0.18, 4152.71,1510.07,      1.375,    0.18, 2072.62,1507.36,      5.500,    0.09, 7982.15,1451.30,     2.750,    0.09, 3997.99,1453.82
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4708.58,1448.79,      3.250,    0.18, 4833.78,1487.32,      1.625,    0.18, 2460.53,1514.17,      6.500,    0.09, 9384.64,1443.79,     3.250,    0.09, 4700.41,1446.28
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5442.32,1451.28,      3.750,    0.18, 5597.29,1492.61,      1.875,    0.18, 2833.96,1511.44,      7.500,    0.09,10865.95,1448.79,     3.750,    0.09, 5423.61,1446.30
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6157.31,1448.78,      4.250,    0.18, 6400.54,1506.01,      2.125,    0.18, 3208.93,1510.08,      8.500,    0.09,12272.36,1443.81,     4.250,    0.09, 6073.46,1429.05
   INFO    |-> [rocprof] 9,      4.750,    0.09, 6858.08,1443.81,      4.750,    0.18, 7134.33,1501.96,      2.375,    0.18, 3583.24,1508.73,      9.500,    0.09,13645.85,1436.41,     4.750,    0.09, 6822.85,1436.39
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7632.53,1453.82,      5.250,    0.18, 7871.26,1499.29,      2.625,    0.18, 3946.21,1503.32,     10.500,    0.09,15108.13,1438.87,     5.250,    0.09, 7553.98,1438.85
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8359.44,1453.82,      5.750,    0.18, 8590.25,1493.96,      2.875,    0.18, 4337.61,1508.73,     11.500,    0.09,16603.78,1443.81,     5.750,    0.09, 8245.21,1433.95
   INFO    |-> [rocprof] 12,      6.250,    0.09, 9023.89,1443.82,      6.250,    0.18, 9295.83,1487.33,      3.125,    0.18, 4676.91,1496.61,     12.500,    0.09,18016.58,1441.33,     6.250,    0.09, 8946.80,1431.49
   INFO    |-> [rocprof] 13,      6.750,    0.09, 9745.70,1443.81,      6.750,    0.18,10165.67,1506.03,      3.375,    0.18, 5033.14,1491.30,     13.500,    0.09,19358.33,1433.95,     6.750,    0.09, 9597.03,1421.78
   INFO    |-> [rocprof] 14,      7.250,    0.09,10449.73,1441.34,      7.250,    0.18,10764.08,1484.70,      3.625,    0.18, 5425.25,1496.62,     14.500,    0.09,20792.05,1433.93,     7.250,    0.09,10343.10,1426.63
   INFO    |-> [rocprof] 15,      7.750,    0.09,11094.15,1431.50,      7.750,    0.18,11506.37,1484.69,      3.875,    0.18, 5851.57,1510.08,     15.500,    0.09,22112.83,1426.63,     7.750,    0.09,10981.71,1416.99
   INFO    |-> [rocprof] 16,      8.250,    0.09,11789.78,1429.06,      8.250,    0.18,12046.24,1460.15,      4.125,    0.18, 6195.67,1501.98,     16.500,    0.09,23340.98,1414.61,     8.250,    0.09,11690.21,1416.99
   INFO    |-> [rocprof] 17,      8.750,    0.09,12676.95,1448.79,      8.750,    0.18,12765.20,1458.88,      4.375,    0.18, 6594.78,1507.38,     17.500,    0.10,24713.91,1412.22,     8.750,    0.09,12419.55,1419.38
   INFO    |-> [rocprof] 18,      9.250,    0.09,13589.25,1469.11,      9.250,    0.18,13494.64,1458.88,      4.625,    0.18, 6984.14,1510.08,     18.500,    0.10,25951.11,1402.76,     9.250,    0.09,13084.96,1414.59
   INFO    |-> [rocprof] 20,     10.250,    0.09,14723.00,1436.39,     10.250,    0.19,14811.77,1445.05,      5.125,    0.18, 7711.46,1504.67,     20.500,    0.10,28756.63,1402.76,    10.250,    0.10,14402.40,1405.11
   INFO    |-> [rocprof] 22,     11.250,    0.09,16243.00,1443.82,     11.250,    0.20,15483.33,1376.30,      5.625,    0.18, 8410.99,1495.29,     22.500,    0.10,29959.02,1331.51,    11.250,    0.10,15147.97,1346.49
   INFO    |-> [rocprof] 24,     12.250,    0.09,17595.78,1436.39,     12.250,    0.20,16169.94,1319.99,      6.125,    0.18, 9109.92,1487.33,     24.500,    0.11,30858.70,1259.54,    12.250,    0.11,15616.94,1274.85
   INFO    |-> [rocprof] 28,     14.250,    0.09,20363.95,1429.05,     14.250,    0.22,17361.96,1218.38,      7.125,    0.18,10569.08,1483.38,     28.500,    0.12,32438.71,1138.20,    14.250,    0.12,16579.28,1163.46
   INFO    |-> [rocprof] 32,     16.250,    0.09,23064.88,1419.38,     16.250,    0.24,17971.49,1105.94,      8.125,    0.18,11967.88,1472.97,     32.500,    0.13,34206.73,1052.51,    16.250,    0.13,17103.36,1052.51
   INFO    |-> [rocprof] 40,     20.250,    0.10,28311.26,1398.09,     20.250,    0.29,18811.53, 928.96,     10.125,    0.19,14656.46,1447.55,     40.500,    0.15,35799.41, 883.94,    20.250,    0.15,17824.57, 880.23
   INFO    |-> [rocprof] 48,     24.250,    0.11,29610.44,1221.05,     24.250,    0.34,19100.71, 787.66,     12.125,    0.20,16055.46,1324.16,     48.500,    0.18,36196.20, 746.31,    24.250,    0.18,18326.36, 755.73
   INFO    |-> [rocprof] 56,     28.250,    0.12,32154.16,1138.20,     28.250,    0.39,19242.95, 681.17,     14.125,    0.22,17209.59,1218.38,     56.500,    0.20,37085.42, 656.38,    28.250,    0.20,18644.93, 660.00
   INFO    |-> [rocprof] 64,     32.250,    0.13,34288.04,1063.19,     32.250,    0.45,19392.89, 601.33,     16.125,    0.24,17856.79,1107.40,     64.500,    0.23,38237.49, 592.83,    32.250,    0.24,17880.39, 554.43
   INFO    |-> [rocprof] 80,     40.250,    0.16,34665.23, 861.25,     40.250,    0.55,19635.88, 487.85,     20.125,    0.29,18716.14, 929.99,     80.500,    0.28,38543.13, 478.80,    40.250,    0.29,18359.87, 456.15
   INFO    |-> [rocprof] 96,     48.250,    0.18,36041.48, 746.97,     48.250,    0.66,19762.96, 409.60,     24.125,    0.34,19037.89, 789.14,     96.500,    0.33,39086.96, 405.05,    48.250,    0.35,18651.93, 386.57
   INFO    |-> [rocprof] 128,     64.250,    0.43,20163.18, 313.82,     64.250,    0.86,19939.42, 310.34,     32.125,    0.44,19436.06, 605.01,    128.500,    0.43,39923.10, 310.69,    64.250,    0.46,18726.97, 291.47
   INFO    |-> [rocprof] 256,    128.250,    0.83,20657.21, 161.07,    128.250,    1.70,20216.61, 157.63,     64.125,    0.86,20026.55, 312.30,    256.500,    0.84,40976.10, 159.75,   128.250,    0.90,19142.83, 149.26
   INFO    |-> [rocprof] 512,    256.250,    1.64,20918.23,  81.63,    256.250,    3.38,20372.05,  79.50,    128.125,    1.69,20305.69, 158.48,    512.500,    1.65,41585.63,  81.14,   256.250,    1.78,19361.93,  75.56
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101515_2797331/input0_results_250307_101515
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_INST_LEVEL_SMEM.csv' is generating
   INFO    |-> [rocprof] 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_VMEM.txt
   INFO    |-> [rocprof] RPL: on '250307_101516' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_VMEM.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101516_2797554'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101516_2797554/input0_results_250307_101516'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101516_2797554/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 136 metrics
   INFO    |-> [rocprof] SQ_INSTS_VMEM, SQ_INST_LEVEL_VMEM, SQ_ACCUM_PREV_HIRES, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_FMA_F64, TCC_EA_RDREQ[0], TCC_EA_RDREQ_32B[0], TCC_EA_WRREQ[0], TCC_EA_WRREQ_64B[0], TCC_EA_RDREQ[1], TCC_EA_RDREQ_32B[1], TCC_EA_WRREQ[1], TCC_EA_WRREQ_64B[1], TCC_EA_RDREQ[2], TCC_EA_RDREQ_32B[2], TCC_EA_WRREQ[2], TCC_EA_WRREQ_64B[2], TCC_EA_RDREQ[3], TCC_EA_RDREQ_32B[3], TCC_EA_WRREQ[3], TCC_EA_WRREQ_64B[3], TCC_EA_RDREQ[4], TCC_EA_RDREQ_32B[4], TCC_EA_WRREQ[4], TCC_EA_WRREQ_64B[4], TCC_EA_RDREQ[5], TCC_EA_RDREQ_32B[5], TCC_EA_WRREQ[5], TCC_EA_WRREQ_64B[5], TCC_EA_RDREQ[6], TCC_EA_RDREQ_32B[6], TCC_EA_WRREQ[6], TCC_EA_WRREQ_64B[6], TCC_EA_RDREQ[7], TCC_EA_RDREQ_32B[7], TCC_EA_WRREQ[7], TCC_EA_WRREQ_64B[7], TCC_EA_RDREQ[8], TCC_EA_RDREQ_32B[8], TCC_EA_WRREQ[8], TCC_EA_WRREQ_64B[8], TCC_EA_RDREQ[9], TCC_EA_RDREQ_32B[9], TCC_EA_WRREQ[9], TCC_EA_WRREQ_64B[9], TCC_EA_RDREQ[10], TCC_EA_RDREQ_32B[10], TCC_EA_WRREQ[10], TCC_EA_WRREQ_64B[10], TCC_EA_RDREQ[11], TCC_EA_RDREQ_32B[11], TCC_EA_WRREQ[11], TCC_EA_WRREQ_64B[11], TCC_EA_RDREQ[12], TCC_EA_RDREQ_32B[12], TCC_EA_WRREQ[12], TCC_EA_WRREQ_64B[12], TCC_EA_RDREQ[13], TCC_EA_RDREQ_32B[13], TCC_EA_WRREQ[13], TCC_EA_WRREQ_64B[13], TCC_EA_RDREQ[14], TCC_EA_RDREQ_32B[14], TCC_EA_WRREQ[14], TCC_EA_WRREQ_64B[14], TCC_EA_RDREQ[15], TCC_EA_RDREQ_32B[15], TCC_EA_WRREQ[15], TCC_EA_WRREQ_64B[15], TCC_EA_RDREQ[16], TCC_EA_RDREQ_32B[16], TCC_EA_WRREQ[16], TCC_EA_WRREQ_64B[16], TCC_EA_RDREQ[17], TCC_EA_RDREQ_32B[17], TCC_EA_WRREQ[17], TCC_EA_WRREQ_64B[17], TCC_EA_RDREQ[18], TCC_EA_RDREQ_32B[18], TCC_EA_WRREQ[18], TCC_EA_WRREQ_64B[18], TCC_EA_RDREQ[19], TCC_EA_RDREQ_32B[19], TCC_EA_WRREQ[19], TCC_EA_WRREQ_64B[19], TCC_EA_RDREQ[20], TCC_EA_RDREQ_32B[20], TCC_EA_WRREQ[20], TCC_EA_WRREQ_64B[20], TCC_EA_RDREQ[21], TCC_EA_RDREQ_32B[21], TCC_EA_WRREQ[21], TCC_EA_WRREQ_64B[21], TCC_EA_RDREQ[22], TCC_EA_RDREQ_32B[22], TCC_EA_WRREQ[22], TCC_EA_WRREQ_64B[22], TCC_EA_RDREQ[23], TCC_EA_RDREQ_32B[23], TCC_EA_WRREQ[23], TCC_EA_WRREQ_64B[23], TCC_EA_RDREQ[24], TCC_EA_RDREQ_32B[24], TCC_EA_WRREQ[24], TCC_EA_WRREQ_64B[24], TCC_EA_RDREQ[25], TCC_EA_RDREQ_32B[25], TCC_EA_WRREQ[25], TCC_EA_WRREQ_64B[25], TCC_EA_RDREQ[26], TCC_EA_RDREQ_32B[26], TCC_EA_WRREQ[26], TCC_EA_WRREQ_64B[26], TCC_EA_RDREQ[27], TCC_EA_RDREQ_32B[27], TCC_EA_WRREQ[27], TCC_EA_WRREQ_64B[27], TCC_EA_RDREQ[28], TCC_EA_RDREQ_32B[28], TCC_EA_WRREQ[28], TCC_EA_WRREQ_64B[28], TCC_EA_RDREQ[29], TCC_EA_RDREQ_32B[29], TCC_EA_WRREQ[29], TCC_EA_WRREQ_64B[29], TCC_EA_RDREQ[30], TCC_EA_RDREQ_32B[30], TCC_EA_WRREQ[30], TCC_EA_WRREQ_64B[30], TCC_EA_RDREQ[31], TCC_EA_RDREQ_32B[31], TCC_EA_WRREQ[31], TCC_EA_WRREQ_64B[31]
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  357.27,1429.06,      0.250,    0.18,  374.82,1499.30,      0.125,    0.18,  189.10,1512.82,      0.500,    0.09,  715.75,1431.50,     0.250,    0.09,  358.49,1433.95
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1103.75,1471.67,      0.750,    0.18, 1132.56,1510.08,      0.375,    0.18,  565.78,1508.73,      1.500,    0.09, 2180.75,1453.83,     0.750,    0.09, 1082.87,1443.82
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1810.99,1448.79,      1.250,    0.18, 1882.53,1506.03,      0.625,    0.18,  942.11,1507.38,      2.500,    0.09, 3584.88,1433.95,     1.250,    0.09, 1820.44,1456.36
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2522.32,1441.33,      1.750,    0.18, 2642.65,1510.08,      0.875,    0.18, 1315.41,1503.33,      3.500,    0.09, 5053.32,1443.81,     1.750,    0.09, 2526.66,1443.81
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3265.43,1451.30,      2.250,    0.18, 3388.56,1506.03,      1.125,    0.18, 1694.28,1506.03,      4.500,    0.09, 6497.20,1443.82,     2.250,    0.09, 3237.46,1438.87
   INFO    |-> [rocprof] 5,      2.750,    0.09, 3956.85,1438.85,      2.750,    0.18, 4134.12,1503.32,      1.375,    0.18, 2076.38,1510.09,      5.500,    0.09, 7900.15,1436.39,     2.750,    0.09, 3984.23,1448.81
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4692.42,1443.82,      3.250,    0.18, 4829.55,1486.02,      1.625,    0.18, 2451.69,1508.73,      6.500,    0.09, 9384.74,1443.81,     3.250,    0.09, 4692.37,1443.81
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5432.98,1448.79,      3.750,    0.18, 5592.37,1491.30,      1.875,    0.18, 2826.33,1507.38,      7.500,    0.09,10847.22,1446.30,     3.750,    0.09, 5404.97,1441.33
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6136.25,1443.82,      4.250,    0.18, 6394.83,1504.67,      2.125,    0.18, 3200.30,1506.03,      8.500,    0.09,12167.78,1431.50,     4.250,    0.09, 6125.64,1441.33
   INFO    |-> [rocprof] 9,      4.750,    0.09, 6858.16,1443.82,      4.750,    0.18, 7140.80,1503.33,      2.375,    0.18, 3570.40,1503.33,      9.500,    0.09,13692.60,1441.33,     4.750,    0.09, 6869.91,1446.30
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7541.05,1436.39,      5.250,    0.18, 7871.30,1499.30,      2.625,    0.18, 3953.32,1506.03,     10.500,    0.09,15160.13,1443.82,     5.250,    0.09, 7566.96,1441.33
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8287.71,1441.34,      5.750,    0.18, 8559.76,1488.65,      2.875,    0.18, 4333.69,1507.37,     11.500,    0.09,16490.43,1433.95,     5.750,    0.09, 8231.06,1431.49
   INFO    |-> [rocprof] 12,      6.250,    0.09, 9039.45,1446.31,      6.250,    0.18, 9304.08,1488.65,      3.125,    0.18, 4652.04,1488.65,     12.500,    0.09,17863.12,1429.05,     6.250,    0.09, 8856.22,1416.99
   INFO    |-> [rocprof] 13,      6.750,    0.09, 9779.36,1448.79,      6.750,    0.18,10093.13,1495.28,      3.375,    0.18, 5019.75,1487.33,     13.500,    0.09,19292.37,1429.06,     6.750,    0.10, 9500.42,1407.47
   INFO    |-> [rocprof] 14,      7.250,    0.09,10485.65,1446.30,      7.250,    0.18,10745.01,1482.07,      3.625,    0.18, 5382.04,1484.70,     14.500,    0.09,20792.05,1433.93,     7.250,    0.09,10290.48,1419.38
   INFO    |-> [rocprof] 15,      7.750,    0.09,11132.14,1436.41,      7.750,    0.18,11486.04,1482.07,      3.875,    0.18, 5814.97,1500.64,     15.500,    0.09,22188.30,1431.50,     7.750,    0.09,11037.64,1424.21
   INFO    |-> [rocprof] 16,      8.250,    0.09,11850.34,1436.41,      8.250,    0.18,11983.66,1452.56,      4.125,    0.18, 6184.60,1499.30,     16.500,    0.09,23459.67,1421.80,     8.250,    0.10,11611.63,1407.47
   INFO    |-> [rocprof] 17,      8.750,    0.09,12547.06,1433.95,      8.750,    0.18,12787.44,1461.42,      4.375,    0.18, 6559.42,1499.30,     17.500,    0.09,24755.59,1414.61,     8.750,    0.09,12377.66,1414.59
   INFO    |-> [rocprof] 18,      9.250,    0.09,13332.27,1441.33,      9.250,    0.18,13436.23,1452.56,      4.625,    0.18, 6952.85,1503.32,     18.500,    0.10,26082.23,1409.85,     9.250,    0.10,13019.23,1407.48
   INFO    |-> [rocprof] 20,     10.250,    0.09,14824.54,1446.30,     10.250,    0.19,14799.02,1443.81,      5.125,    0.18, 7697.65,1501.98,     20.500,    0.10,28853.13,1407.47,    10.250,    0.10,14306.55,1395.76
   INFO    |-> [rocprof] 22,     11.250,    0.09,16104.24,1431.49,     11.250,    0.19,15637.26,1389.98,      5.625,    0.18, 8396.02,1492.63,     22.500,    0.10,30054.43,1335.75,    11.250,    0.10,15099.34,1342.16
   INFO    |-> [rocprof] 24,     12.250,    0.09,17446.41,1424.20,     12.250,    0.20,16144.45,1317.91,      6.125,    0.18, 9061.72,1479.46,     24.500,    0.11,30402.22,1240.91,    12.250,    0.11,15499.31,1265.25
   INFO    |-> [rocprof] 28,     14.250,    0.09,20329.32,1426.62,     14.250,    0.22,17324.14,1215.73,      7.125,    0.18,10606.60,1488.65,     28.500,    0.12,32704.96,1147.54,    14.250,    0.11,16812.46,1179.82
   INFO    |-> [rocprof] 32,     16.250,    0.09,23064.88,1419.38,     16.250,    0.24,18007.10,1108.13,      8.125,    0.18,11936.37,1469.09,     32.500,    0.13,34163.86,1051.20,    16.250,    0.13,17146.39,1055.16
   INFO    |-> [rocprof] 40,     20.250,    0.10,28170.41,1391.13,     20.250,    0.29,18811.53, 928.96,     10.125,    0.19,14656.38,1447.54,     40.500,    0.15,35686.60, 881.15,    20.250,    0.15,17768.64, 877.46
   INFO    |-> [rocprof] 48,     24.250,    0.11,29610.17,1221.04,     24.250,    0.34,19082.73, 786.92,     12.125,    0.20,16106.31,1328.36,     48.500,    0.18,36488.15, 752.33,    24.250,    0.18,18376.03, 757.77
   INFO    |-> [rocprof] 56,     28.250,    0.12,32507.02,1150.69,     28.250,    0.39,19227.29, 680.61,     14.125,    0.22,17424.71,1233.61,     56.500,    0.20,37056.61, 655.87,    28.250,    0.20,18718.47, 662.60
   INFO    |-> [rocprof] 64,     32.250,    0.13,34114.81,1057.82,     32.250,    0.45,19413.76, 601.98,     16.125,    0.24,17915.92,1111.06,     64.500,    0.23,37995.82, 589.08,    32.250,    0.24,17892.29, 554.80
   INFO    |-> [rocprof] 80,     40.250,    0.16,34594.19, 859.48,     40.250,    0.55,19601.68, 487.00,     20.125,    0.29,18664.41, 927.42,     80.500,    0.28,38455.61, 477.71,    40.250,    0.29,18330.03, 455.40
   INFO    |-> [rocprof] 96,     48.250,    0.18,36430.76, 755.04,     48.250,    0.65,19787.23, 410.10,     24.125,    0.34,18922.30, 784.34,     96.500,    0.33,38974.28, 403.88,    48.250,    0.35,18651.98, 386.57
   INFO    |-> [rocprof] 128,     64.250,    0.43,20201.06, 314.41,     64.250,    0.86,19939.49, 310.34,     32.125,    0.44,19436.15, 605.02,    128.500,    0.43,39834.76, 310.00,    64.250,    0.46,18727.01, 291.47
   INFO    |-> [rocprof] 256,    128.250,    0.83,20641.43, 160.95,    128.250,    1.70,20231.90, 157.75,     64.125,    0.86,20015.47, 312.13,    256.500,    0.84,41015.35, 159.90,   128.250,    0.90,19183.88, 149.58
   INFO    |-> [rocprof] 512,    256.250,    1.64,20926.45,  81.66,    256.250,    3.38,20371.17,  79.50,    128.125,    1.69,20309.58, 158.51,    512.500,    1.65,41597.78,  81.17,   256.250,    1.78,19367.20,  75.58
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101516_2797554/input0_results_250307_101516
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_INST_LEVEL_VMEM.csv' is generating
   INFO    |-> [rocprof] 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_LEVEL_WAVES.txt
   INFO    |-> [rocprof] RPL: on '250307_101517' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_LEVEL_WAVES.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101517_2797786'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101517_2797786/input0_results_250307_101517'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101517_2797786/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 75 metrics
   INFO    |-> [rocprof] SQ_CYCLES, SQ_WAVES, SQ_WAVE_CYCLES, SQ_BUSY_CYCLES, SQ_LEVEL_WAVES, SQ_ACCUM_PREV_HIRES, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_INT32, TCC_EA_RDREQ_LEVEL[0], TCC_EA_WRREQ_LEVEL[0], TCC_EA_RDREQ_LEVEL[1], TCC_EA_WRREQ_LEVEL[1], TCC_EA_RDREQ_LEVEL[2], TCC_EA_WRREQ_LEVEL[2], TCC_EA_RDREQ_LEVEL[3], TCC_EA_WRREQ_LEVEL[3], TCC_EA_RDREQ_LEVEL[4], TCC_EA_WRREQ_LEVEL[4], TCC_EA_RDREQ_LEVEL[5], TCC_EA_WRREQ_LEVEL[5], TCC_EA_RDREQ_LEVEL[6], TCC_EA_WRREQ_LEVEL[6], TCC_EA_RDREQ_LEVEL[7], TCC_EA_WRREQ_LEVEL[7], TCC_EA_RDREQ_LEVEL[8], TCC_EA_WRREQ_LEVEL[8], TCC_EA_RDREQ_LEVEL[9], TCC_EA_WRREQ_LEVEL[9], TCC_EA_RDREQ_LEVEL[10], TCC_EA_WRREQ_LEVEL[10], TCC_EA_RDREQ_LEVEL[11], TCC_EA_WRREQ_LEVEL[11], TCC_EA_RDREQ_LEVEL[12], TCC_EA_WRREQ_LEVEL[12], TCC_EA_RDREQ_LEVEL[13], TCC_EA_WRREQ_LEVEL[13], TCC_EA_RDREQ_LEVEL[14], TCC_EA_WRREQ_LEVEL[14], TCC_EA_RDREQ_LEVEL[15], TCC_EA_WRREQ_LEVEL[15], TCC_EA_RDREQ_LEVEL[16], TCC_EA_WRREQ_LEVEL[16], TCC_EA_RDREQ_LEVEL[17], TCC_EA_WRREQ_LEVEL[17], TCC_EA_RDREQ_LEVEL[18], TCC_EA_WRREQ_LEVEL[18], TCC_EA_RDREQ_LEVEL[19], TCC_EA_WRREQ_LEVEL[19], TCC_EA_RDREQ_LEVEL[20], TCC_EA_WRREQ_LEVEL[20], TCC_EA_RDREQ_LEVEL[21], TCC_EA_WRREQ_LEVEL[21], TCC_EA_RDREQ_LEVEL[22], TCC_EA_WRREQ_LEVEL[22], TCC_EA_RDREQ_LEVEL[23], TCC_EA_WRREQ_LEVEL[23], TCC_EA_RDREQ_LEVEL[24], TCC_EA_WRREQ_LEVEL[24], TCC_EA_RDREQ_LEVEL[25], TCC_EA_WRREQ_LEVEL[25], TCC_EA_RDREQ_LEVEL[26], TCC_EA_WRREQ_LEVEL[26], TCC_EA_RDREQ_LEVEL[27], TCC_EA_WRREQ_LEVEL[27], TCC_EA_RDREQ_LEVEL[28], TCC_EA_WRREQ_LEVEL[28], TCC_EA_RDREQ_LEVEL[29], TCC_EA_WRREQ_LEVEL[29], TCC_EA_RDREQ_LEVEL[30], TCC_EA_WRREQ_LEVEL[30], TCC_EA_RDREQ_LEVEL[31], TCC_EA_WRREQ_LEVEL[31], CPC_ME1_BUSY_FOR_PACKET_DECODE, GRBM_COUNT, GRBM_GUI_ACTIVE
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  359.71,1438.85,      0.250,    0.18,  377.52,1510.08,      0.125,    0.18,  189.96,1519.66,      0.500,    0.09,  719.43,1438.85,     0.250,    0.09,  359.71,1438.85
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1079.14,1438.85,      0.750,    0.18, 1129.51,1506.02,      0.375,    0.18,  566.79,1511.44,      1.500,    0.09, 2173.19,1448.79,     0.750,    0.09, 1082.86,1443.81
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1810.99,1448.79,      1.250,    0.18, 1880.83,1504.67,      0.625,    0.18,  943.80,1510.08,      2.500,    0.09, 3609.52,1443.81,     1.250,    0.09, 1804.76,1443.81
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2575.42,1471.67,      1.750,    0.18, 2633.17,1504.67,      0.875,    0.18, 1316.58,1504.67,      3.500,    0.09, 5044.64,1441.33,     1.750,    0.09, 2535.36,1448.78
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3259.79,1448.79,      2.250,    0.18, 3382.47,1503.32,      1.125,    0.18, 1694.27,1506.02,      4.500,    0.09, 6497.13,1443.81,     2.250,    0.09, 3242.98,1441.33
   INFO    |-> [rocprof] 5,      2.750,    0.09, 3970.47,1443.81,      2.750,    0.18, 4156.47,1511.44,      1.375,    0.18, 2070.77,1506.02,      5.500,    0.09, 7982.15,1451.30,     2.750,    0.09, 3956.85,1438.85
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4700.46,1446.30,      3.250,    0.18, 4829.53,1486.01,      1.625,    0.18, 2445.08,1504.67,      6.500,    0.09, 9352.55,1438.85,     3.250,    0.09, 4676.28,1438.85
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5395.70,1438.85,      3.750,    0.18, 5592.34,1491.29,      1.875,    0.18, 2836.51,1512.81,      7.500,    0.09,10772.93,1436.39,     3.750,    0.09, 5377.26,1433.93
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6125.64,1441.33,      4.250,    0.18, 6400.57,1506.02,      2.125,    0.18, 3211.82,1511.44,      8.500,    0.09,12272.36,1443.81,     4.250,    0.09, 6115.13,1438.85
   INFO    |-> [rocprof] 9,      4.750,    0.09, 6893.68,1451.30,      4.750,    0.18, 7108.91,1496.61,      2.375,    0.18, 3589.68,1511.44,      9.500,    0.09,13716.17,1443.81,     4.750,    0.09, 6834.56,1438.85
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7579.99,1443.81,      5.250,    0.18, 7878.34,1500.64,      2.625,    0.18, 3949.75,1504.67,     10.500,    0.09,15030.62,1431.49,     5.250,    0.09, 7554.06,1438.87
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8287.71,1441.34,      5.750,    0.18, 8574.97,1491.30,      2.875,    0.18, 4341.51,1510.09,     11.500,    0.09,16462.29,1431.50,     5.750,    0.09, 8203.06,1426.62
   INFO    |-> [rocprof] 12,      6.250,    0.09, 8992.93,1438.87,      6.250,    0.18, 9295.83,1487.33,      3.125,    0.18, 4672.77,1495.29,     12.500,    0.09,17985.87,1438.87,     6.250,    0.09, 8931.65,1429.06
   INFO    |-> [rocprof] 13,      6.750,    0.09, 9728.95,1441.33,      6.750,    0.18,10111.21,1497.96,      3.375,    0.18, 5015.30,1486.02,     13.500,    0.09,19292.16,1429.05,     6.750,    0.09, 9580.80,1419.38
   INFO    |-> [rocprof] 14,      7.250,    0.09,10413.94,1436.41,      7.250,    0.18,10783.17,1487.33,      3.625,    0.18, 5401.16,1489.98,     14.500,    0.09,20685.97,1426.62,     7.250,    0.09,10290.48,1419.38
   INFO    |-> [rocprof] 15,      7.750,    0.09,11075.13,1429.05,      7.750,    0.18,11465.78,1479.46,      3.875,    0.18, 5846.34,1508.73,     15.500,    0.09,22000.58,1419.39,     7.750,    0.09,11000.17,1419.38
   INFO    |-> [rocprof] 16,      8.250,    0.09,11830.09,1433.95,      8.250,    0.18,12162.68,1474.26,      4.125,    0.18, 6206.78,1504.67,     16.500,    0.09,23380.41,1416.99,     8.250,    0.09,11709.86,1419.38
   INFO    |-> [rocprof] 17,      8.750,    0.09,12568.41,1436.39,      8.750,    0.18,12765.20,1458.88,      4.375,    0.18, 6559.42,1499.30,     17.500,    0.09,24755.33,1414.59,     8.750,    0.09,12419.55,1419.38
   INFO    |-> [rocprof] 18,      9.250,    0.09,13309.54,1438.87,      9.250,    0.18,13518.15,1461.42,      4.625,    0.18, 6940.45,1500.64,     18.500,    0.09,26214.40,1416.99,     9.250,    0.09,13085.10,1414.61
   INFO    |-> [rocprof] 20,     10.250,    0.09,14723.00,1436.39,     10.250,    0.19,14735.69,1437.63,      5.125,    0.18, 7683.89,1499.30,     20.500,    0.10,28377.30,1384.26,    10.250,    0.10,14426.72,1407.48
   INFO    |-> [rocprof] 22,     11.250,    0.09,16159.39,1436.39,     11.250,    0.20,15369.93,1366.22,      5.625,    0.18, 8381.11,1489.98,     22.500,    0.10,29864.21,1327.30,    11.250,    0.10,15147.82,1346.47
   INFO    |-> [rocprof] 24,     12.250,    0.09,17565.70,1433.93,     12.250,    0.20,16246.63,1326.26,      6.125,    0.18, 9101.85,1486.02,     24.500,    0.11,30223.66,1233.62,    12.250,    0.11,15429.35,1259.54
   INFO    |-> [rocprof] 28,     14.250,    0.09,20398.92,1431.50,     14.250,    0.22,17236.79,1209.60,      7.125,    0.18,10541.18,1479.46,     28.500,    0.12,32526.98,1141.30,    14.250,    0.12,16602.31,1165.07
   INFO    |-> [rocprof] 32,     16.250,    0.09,23104.22,1421.80,     16.250,    0.24,17971.49,1105.94,      8.125,    0.18,12010.06,1478.16,     32.500,    0.13,34249.70,1053.84,    16.250,    0.13,17081.93,1051.20
   INFO    |-> [rocprof] 40,     20.250,    0.10,27847.14,1375.17,     20.250,    0.29,18759.60, 926.40,     10.125,    0.19,14469.20,1429.06,     40.500,    0.15,35278.96, 871.09,    20.250,    0.15,17824.57, 880.23
   INFO    |-> [rocprof] 48,     24.250,    0.11,29354.07,1210.48,     24.250,    0.34,19109.68, 788.03,     12.125,    0.20,16068.15,1325.21,     48.500,    0.18,36035.89, 743.01,    24.250,    0.18,18359.44, 757.09
   INFO    |-> [rocprof] 56,     28.250,    0.12,31851.91,1127.50,     28.250,    0.39,19227.29, 680.61,     14.125,    0.22,17322.89,1226.40,     56.500,    0.21,36854.72, 652.30,    28.250,    0.20,18644.93, 660.00
   INFO    |-> [rocprof] 64,     32.250,    0.13,33858.64,1049.88,     32.250,    0.45,19385.94, 601.11,     16.125,    0.24,17786.35,1103.03,     64.500,    0.23,38291.78, 593.67,    32.250,    0.24,17880.47, 554.43
   INFO    |-> [rocprof] 80,     40.250,    0.16,34488.18, 856.85,     40.250,    0.55,19613.11, 487.28,     20.125,    0.29,18674.79, 927.94,     80.500,    0.28,38433.59, 477.44,    40.250,    0.29,18339.99, 455.65
   INFO    |-> [rocprof] 96,     48.250,    0.18,35977.61, 745.65,     48.250,    0.66,19767.91, 409.70,     24.125,    0.34,19011.18, 788.03,     96.500,    0.33,39011.73, 404.27,    48.250,    0.35,18626.18, 386.03
   INFO    |-> [rocprof] 128,     64.250,    0.43,20163.23, 313.82,     64.250,    0.86,19961.57, 310.69,     32.125,    0.44,19429.06, 604.80,    128.500,    0.43,39790.46, 309.65,    64.250,    0.46,18726.97, 291.47
   INFO    |-> [rocprof] 256,    128.250,    0.83,20641.38, 160.95,    128.250,    1.70,20207.13, 157.56,     64.125,    0.86,20007.93, 312.01,    256.500,    0.84,40991.71, 159.81,   128.250,    0.90,19132.64, 149.18
   INFO    |-> [rocprof] 512,    256.250,    1.64,20920.26,  81.64,    256.250,    3.38,20370.12,  79.49,    128.125,    1.69,20309.50, 158.51,    512.500,    1.65,41633.91,  81.24,   256.250,    1.78,19358.37,  75.54
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101517_2797786/input0_results_250307_101517
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_LEVEL_WAVES.csv' is generating
   INFO    |-> [rocprof] 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_0.txt
   INFO    |-> [rocprof] RPL: on '250307_101518' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_0.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101518_2797975'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101518_2797975/input0_results_250307_101518'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101518_2797975/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 8 metrics
   INFO    |-> [rocprof] SQ_INSTS_VALU_INT64, SQ_ACTIVE_INST_VMEM, SQ_ACTIVE_INST_VALU, SQ_ACTIVE_INST_SCA, SQ_ACTIVE_INST_MISC, SQ_ACTIVE_INST_FLAT, SQ_THREAD_CYCLES_VALU, SQ_LDS_BANK_CONFLICT
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  359.71,1438.85,      0.250,    0.18,  375.83,1503.33,      0.125,    0.18,  188.08,1504.67,      0.500,    0.09,  720.67,1441.34,     0.250,    0.09,  360.34,1441.34
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1080.99,1441.33,      0.750,    0.18, 1125.48,1500.64,      0.375,    0.18,  566.28,1510.09,      1.500,    0.09, 2169.47,1446.31,     0.750,    0.09, 1082.87,1443.82
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1798.57,1438.85,      1.250,    0.18, 1889.32,1511.45,      0.625,    0.18,  942.11,1507.38,      2.500,    0.09, 3609.56,1443.82,     1.250,    0.09, 1811.01,1448.81
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2518.02,1438.87,      1.750,    0.18, 2642.66,1510.09,      0.875,    0.18, 1321.33,1510.09,      3.500,    0.09, 5035.99,1438.85,     1.750,    0.09, 2526.69,1443.82
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3243.02,1441.34,      2.250,    0.18, 3391.60,1507.38,      1.125,    0.18, 1700.38,1511.45,      4.500,    0.09, 6508.33,1446.30,     2.250,    0.09, 3242.98,1441.33
   INFO    |-> [rocprof] 5,      2.750,    0.09, 4054.20,1474.26,      2.750,    0.18, 4145.27,1507.37,      1.375,    0.18, 2074.51,1508.73,      5.500,    0.09, 7913.78,1438.87,     2.750,    0.09, 3950.11,1436.41
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4700.46,1446.30,      3.250,    0.18, 4838.12,1488.65,      1.625,    0.18, 2442.91,1503.33,      6.500,    0.09, 9336.64,1436.41,     3.250,    0.09, 4700.46,1446.30
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5404.97,1441.33,      3.750,    0.18, 5577.50,1487.33,      1.875,    0.18, 2823.80,1506.03,      7.500,    0.09,10809.95,1441.33,     3.750,    0.09, 5405.03,1441.34
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6146.82,1446.31,      4.250,    0.18, 6389.14,1503.33,      2.125,    0.18, 3191.71,1501.98,      8.500,    0.09,12209.45,1436.41,     4.250,    0.09, 6073.46,1429.05
   INFO    |-> [rocprof] 9,      4.750,    0.09, 6869.91,1446.30,      4.750,    0.18, 7147.16,1504.67,      2.375,    0.18, 3576.79,1506.02,      9.500,    0.09,13692.60,1441.33,     4.750,    0.09, 6788.06,1429.06
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7593.14,1446.31,      5.250,    0.18, 7871.26,1499.29,      2.625,    0.18, 3949.77,1504.67,     10.500,    0.09,15082.26,1436.41,     5.250,    0.09, 7541.13,1436.41
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8259.33,1436.41,      5.750,    0.18, 8590.25,1493.96,      2.875,    0.18, 4337.61,1508.73,     11.500,    0.09,16546.82,1438.85,     5.750,    0.09, 8175.34,1421.80
   INFO    |-> [rocprof] 12,      6.250,    0.09, 8946.89,1431.50,      6.250,    0.18, 9304.08,1488.65,      3.125,    0.18, 4656.15,1489.97,     12.500,    0.09,17985.67,1438.85,     6.250,    0.09, 8916.46,1426.63
   INFO    |-> [rocprof] 13,      6.750,    0.09, 9729.06,1441.34,      6.750,    0.18,10093.19,1495.29,      3.375,    0.18, 5019.72,1487.33,     13.500,    0.09,19325.29,1431.50,     6.750,    0.10, 9500.42,1407.47
   INFO    |-> [rocprof] 14,      7.250,    0.09,10431.69,1438.85,      7.250,    0.18,10745.07,1482.08,      3.625,    0.18, 5386.81,1486.02,     14.500,    0.09,20686.19,1426.63,     7.250,    0.09,10290.59,1419.39
   INFO    |-> [rocprof] 15,      7.750,    0.09,11113.11,1433.95,      7.750,    0.18,11455.75,1478.16,      3.875,    0.18, 5835.85,1506.03,     15.500,    0.09,21926.15,1414.59,     7.750,    0.10,10944.73,1412.22
   INFO    |-> [rocprof] 16,      8.250,    0.09,11769.73,1426.63,      8.250,    0.18,12151.93,1472.96,      4.125,    0.18, 6195.67,1501.98,     16.500,    0.10,23262.28,1409.84,     8.250,    0.09,11709.99,1419.39
   INFO    |-> [rocprof] 17,      8.750,    0.09,12590.11,1438.87,      8.750,    0.18,12720.96,1453.82,      4.375,    0.18, 6571.16,1501.98,     17.500,    0.09,24797.41,1416.99,     8.750,    0.09,12398.57,1416.98
   INFO    |-> [rocprof] 18,      9.250,    0.09,13332.41,1441.34,      9.250,    0.18,13447.87,1453.82,      4.625,    0.18, 6940.45,1500.64,     18.500,    0.10,26081.95,1409.84,     9.250,    0.09,13129.24,1419.38
   INFO    |-> [rocprof] 20,     10.250,    0.09,14647.75,1429.05,     10.250,    0.19,14863.05,1450.05,      5.125,    0.18, 7683.89,1499.30,     20.500,    0.10,28330.26,1381.96,    10.250,    0.10,14426.57,1407.47
   INFO    |-> [rocprof] 22,     11.250,    0.09,16159.39,1436.39,     11.250,    0.19,15534.38,1380.83,      5.625,    0.18, 8388.56,1491.30,     22.500,    0.10,29817.03,1325.20,    11.250,    0.10,15003.32,1333.63
   INFO    |-> [rocprof] 24,     12.250,    0.09,17446.60,1424.21,     12.250,    0.20,16081.29,1312.76,      6.125,    0.18, 9093.74,1484.69,     24.500,    0.11,30312.82,1237.26,    12.250,    0.11,15569.62,1270.99
   INFO    |-> [rocprof] 28,     14.250,    0.09,20329.32,1426.62,     14.250,    0.22,17336.78,1216.62,      7.125,    0.18,10606.60,1488.65,     28.500,    0.12,32975.62,1157.04,    14.250,    0.11,16741.82,1174.86
   INFO    |-> [rocprof] 32,     16.250,    0.09,23026.16,1416.99,     16.250,    0.24,17900.76,1101.59,      8.125,    0.18,11999.48,1476.86,     32.500,    0.13,34422.95,1059.17,    16.250,    0.13,17017.95,1047.26
   INFO    |-> [rocprof] 40,     20.250,    0.10,28077.57,1386.55,     20.250,    0.29,18790.72, 927.94,     10.125,    0.18,14732.73,1455.08,     40.500,    0.15,35837.17, 884.87,    20.250,    0.15,17843.30, 881.15
   INFO    |-> [rocprof] 48,     24.250,    0.11,29567.41,1219.27,     24.250,    0.34,19073.84, 786.55,     12.125,    0.20,16068.15,1325.21,     48.500,    0.18,36196.20, 746.31,    24.250,    0.18,18293.40, 754.37
   INFO    |-> [rocprof] 56,     28.250,    0.12,32329.63,1144.41,     28.250,    0.39,19211.75, 680.06,     14.125,    0.22,17284.98,1223.72,     56.500,    0.21,36854.90, 652.30,    28.250,    0.20,18703.70, 662.08
   INFO    |-> [rocprof] 64,     32.250,    0.13,33774.36,1047.27,     32.250,    0.45,19379.00, 600.90,     16.125,    0.24,17904.06,1110.33,     64.500,    0.23,38076.03, 590.33,    32.250,    0.24,17915.99, 555.53
   INFO    |-> [rocprof] 80,     40.250,    0.16,34665.23, 861.25,     40.250,    0.55,19607.38, 487.14,     20.125,    0.29,18654.10, 926.91,     80.500,    0.28,38521.29, 478.53,    40.250,    0.30,18310.15, 454.91
   INFO    |-> [rocprof] 96,     48.250,    0.18,35881.72, 743.66,     48.250,    0.66,19767.87, 409.70,     24.125,    0.34,18975.47, 786.55,     96.500,    0.33,39049.48, 404.66,    48.250,    0.35,18643.34, 386.39
   INFO    |-> [rocprof] 128,     64.250,    0.43,20201.06, 314.41,     64.250,    0.86,19939.46, 310.34,     32.125,    0.44,19436.15, 605.02,    128.500,    0.43,39775.97, 309.54,    64.250,    0.46,18733.56, 291.57
   INFO    |-> [rocprof] 256,    128.250,    0.83,20645.39, 160.98,    128.250,    1.70,20211.00, 157.59,     64.125,    0.86,19996.87, 311.84,    256.500,    0.84,40945.05, 159.63,   128.250,    0.90,19139.49, 149.24
   INFO    |-> [rocprof] 512,    256.250,    1.64,20912.20,  81.61,    256.250,    3.38,20361.52,  79.46,    128.125,    1.69,20305.74, 158.48,    512.500,    1.65,41585.53,  81.14,   256.250,    1.78,19354.90,  75.53
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101518_2797975/input0_results_250307_101518
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/pmc_perf_0.csv' is generating
   INFO    |-> [rocprof] 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_1.txt
   INFO    |-> [rocprof] RPL: on '250307_101519' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_1.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101519_2798199'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101519_2798199/input0_results_250307_101519'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101519_2798199/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 7 metrics
   INFO    |-> [rocprof] SQ_LDS_IDX_ACTIVE, SQ_VALU_MFMA_BUSY_CYCLES, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  360.33,1441.33,      0.250,    0.18,  377.18,1508.73,      0.125,    0.18,  188.59,1508.73,      0.500,    0.09,  715.74,1431.49,     0.250,    0.09,  360.96,1443.82
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1086.60,1448.79,      0.750,    0.18, 1131.54,1508.73,      0.375,    0.18,  565.26,1507.37,      1.500,    0.09, 2161.99,1441.33,     0.750,    0.09, 1086.60,1448.79
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1814.13,1451.30,      1.250,    0.18, 1882.52,1506.02,      0.625,    0.18,  943.80,1510.08,      2.500,    0.09, 3621.98,1448.79,     1.250,    0.09, 1807.87,1446.30
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2522.32,1441.33,      1.750,    0.18, 2635.53,1506.02,      0.875,    0.18, 1318.95,1507.37,      3.500,    0.09, 5079.55,1451.30,     1.750,    0.09, 2531.02,1446.30
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3259.79,1448.79,      2.250,    0.18, 3394.63,1508.73,      1.125,    0.18, 1694.28,1506.03,      4.500,    0.09, 6530.85,1451.30,     2.250,    0.09, 3248.57,1443.81
   INFO    |-> [rocprof] 5,      2.750,    0.09, 3977.36,1446.31,      2.750,    0.18, 4149.00,1508.73,      1.375,    0.18, 2074.50,1508.73,      5.500,    0.09, 7940.94,1443.81,     2.750,    0.09, 3970.47,1443.81
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4708.58,1448.79,      3.250,    0.18, 4833.81,1487.33,      1.625,    0.18, 2462.75,1515.54,      6.500,    0.09, 9320.68,1433.95,     3.250,    0.09, 4700.46,1446.30
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5432.98,1448.79,      3.750,    0.18, 5577.50,1487.33,      1.875,    0.18, 2818.72,1503.32,      7.500,    0.09,10809.95,1441.33,     3.750,    0.09, 5395.70,1438.85
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6136.18,1443.81,      4.250,    0.18, 6383.38,1501.97,      2.125,    0.18, 3203.16,1507.37,      8.500,    0.09,12209.32,1436.39,     4.250,    0.09, 6115.13,1438.85
   INFO    |-> [rocprof] 9,      4.750,    0.09, 6869.91,1446.30,      4.750,    0.18, 7147.16,1504.67,      2.375,    0.18, 3586.45,1510.08,      9.500,    0.09,13739.81,1446.30,     4.750,    0.09, 6834.56,1438.85
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7619.33,1451.30,      5.250,    0.18, 7885.35,1501.97,      2.625,    0.18, 3946.23,1503.33,     10.500,    0.09,15107.97,1438.85,     5.250,    0.09, 7553.98,1438.85
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8316.20,1446.30,      5.750,    0.18, 8613.21,1497.95,      2.875,    0.18, 4329.80,1506.02,     11.500,    0.09,16462.11,1431.49,     5.750,    0.09, 8245.12,1433.93
   INFO    |-> [rocprof] 12,      6.250,    0.09, 9039.35,1446.30,      6.250,    0.18, 9328.86,1492.62,      3.125,    0.18, 4672.75,1495.28,     12.500,    0.09,18016.58,1441.33,     6.250,    0.09, 8886.14,1421.78
   INFO    |-> [rocprof] 13,      6.750,    0.09, 9745.70,1443.81,      6.750,    0.18,10147.40,1503.32,      3.375,    0.18, 5006.44,1483.39,     13.500,    0.09,19424.53,1438.85,     6.750,    0.09, 9580.80,1419.38
   INFO    |-> [rocprof] 14,      7.250,    0.09,10431.80,1438.87,      7.250,    0.18,10773.56,1486.01,      3.625,    0.18, 5386.78,1486.01,     14.500,    0.09,20686.19,1426.63,     7.250,    0.09,10308.03,1421.80
   INFO    |-> [rocprof] 15,      7.750,    0.09,11151.12,1438.85,      7.750,    0.18,11506.37,1484.69,      3.875,    0.18, 5830.61,1504.67,     15.500,    0.09,22000.35,1419.38,     7.750,    0.09,11000.17,1419.38
   INFO    |-> [rocprof] 16,      8.250,    0.09,11809.77,1431.49,      8.250,    0.18,12077.77,1463.97,      4.125,    0.18, 6195.67,1501.98,     16.500,    0.09,23340.74,1414.59,     8.250,    0.10,11611.63,1407.47
   INFO    |-> [rocprof] 17,      8.750,    0.09,12568.41,1436.39,      8.750,    0.18,12709.94,1452.56,      4.375,    0.18, 6553.56,1497.96,     17.500,    0.09,24839.36,1419.39,     8.750,    0.09,12419.55,1419.38
   INFO    |-> [rocprof] 18,      9.250,    0.09,13332.27,1441.33,      9.250,    0.18,13506.39,1460.15,      4.625,    0.18, 6934.24,1499.30,     18.500,    0.10,25907.78,1400.42,     9.250,    0.10,13062.93,1412.21
   INFO    |-> [rocprof] 20,     10.250,    0.09,14748.25,1438.85,     10.250,    0.19,14799.10,1443.81,      5.125,    0.18, 7704.55,1503.33,     20.500,    0.10,28613.09,1395.76,    10.250,    0.09,14598.02,1424.20
   INFO    |-> [rocprof] 22,     11.250,    0.09,16159.39,1436.39,     11.250,    0.19,15496.04,1377.43,      5.625,    0.18, 8381.11,1489.98,     22.500,    0.10,29353.60,1304.60,    11.250,    0.10,15270.37,1357.37
   INFO    |-> [rocprof] 24,     12.250,    0.09,17565.70,1433.93,     12.250,    0.20,16337.03,1333.64,      6.125,    0.18, 9101.85,1486.02,     24.500,    0.11,30812.44,1257.65,    12.250,    0.11,15593.39,1272.93
   INFO    |-> [rocprof] 28,     14.250,    0.09,20364.17,1429.06,     14.250,    0.22,17349.29,1217.49,      7.125,    0.18,10597.25,1487.33,     28.500,    0.12,32438.71,1138.20,    14.250,    0.11,16859.89,1183.15
   INFO    |-> [rocprof] 32,     16.250,    0.09,23026.16,1416.99,     16.250,    0.24,18078.75,1112.54,      8.125,    0.18,12010.06,1478.16,     32.500,    0.13,34292.78,1055.16,    16.250,    0.13,17103.36,1052.51
   INFO    |-> [rocprof] 40,     20.250,    0.10,28217.49,1393.46,     20.250,    0.29,18769.96, 926.91,     10.125,    0.19,14555.99,1437.63,     40.500,    0.15,35611.78, 879.30,    20.250,    0.15,17918.47, 884.86
   INFO    |-> [rocprof] 48,     24.250,    0.11,29653.34,1222.82,     24.250,    0.34,19082.79, 786.92,     12.125,    0.20,16144.58,1331.51,     48.500,    0.18,36228.43, 746.98,    24.250,    0.18,18293.40, 754.37
   INFO    |-> [rocprof] 56,     28.250,    0.12,32507.29,1150.70,     28.250,    0.39,19242.91, 681.16,     14.125,    0.22,17373.61,1229.99,     56.500,    0.20,37056.61, 655.87,    28.250,    0.20,18644.83, 659.99
   INFO    |-> [rocprof] 64,     32.250,    0.13,34201.07,1060.50,     32.250,    0.45,19385.90, 601.11,     16.125,    0.24,17939.68,1112.54,     64.500,    0.23,38291.61, 593.67,    32.250,    0.24,17915.92, 555.53
   INFO    |-> [rocprof] 80,     40.250,    0.16,34665.23, 861.25,     40.250,    0.55,19618.77, 487.42,     20.125,    0.29,18643.73, 926.40,     80.500,    0.28,38499.19, 478.25,    40.250,    0.29,18329.97, 455.40
   INFO    |-> [rocprof] 96,     48.250,    0.18,36105.78, 748.31,     48.250,    0.65,19782.37, 410.00,     24.125,    0.34,19046.91, 789.51,     96.500,    0.33,39049.36, 404.66,    48.250,    0.35,18626.18, 386.03
   INFO    |-> [rocprof] 128,     64.250,    0.43,20163.23, 313.82,     64.250,    0.86,19954.23, 310.57,     32.125,    0.44,19436.10, 605.01,    128.500,    0.43,39805.34, 309.77,    64.250,    0.46,18779.21, 292.28
   INFO    |-> [rocprof] 256,    128.250,    0.83,20653.29, 161.04,    128.250,    1.70,20209.08, 157.58,     64.125,    0.86,20008.00, 312.02,    256.500,    0.84,40984.00, 159.78,   128.250,    0.90,19146.28, 149.29
   INFO    |-> [rocprof] 512,    256.250,    1.64,20912.17,  81.61,    256.250,    3.38,20365.35,  79.47,    128.125,    1.69,20303.79, 158.47,    512.500,    1.65,41589.60,  81.15,   256.250,    1.78,19353.19,  75.52
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101519_2798199/input0_results_250307_101519
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/pmc_perf_1.csv' is generating
   INFO    |-> [rocprof] 
   INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/timestamps.txt
   INFO    |-> [rocprof] RPL: on '250307_101520' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
   INFO    |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
   INFO    |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/timestamps.txt'
   INFO    |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101520_2798403'
   INFO    |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101520_2798403/input0_results_250307_101520'
   INFO    |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
   INFO    |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101520_2798403/input0.xml"
   INFO    |-> [rocprof] gpu_index =
   INFO    |-> [rocprof] kernel =
   INFO    |-> [rocprof] range =
   INFO    |-> [rocprof] 0 metrics
   INFO    |-> [rocprof] ------------------------ Device specifications ------------------------
   INFO    |-> [rocprof] Device:
   INFO    |-> [rocprof] CUDA driver version: 60342.133
   INFO    |-> [rocprof] GPU clock rate:      1700 MHz
   INFO    |-> [rocprof] WarpSize:            64
   INFO    |-> [rocprof] L2 cache size:       8192 KB
   INFO    |-> [rocprof] Total global mem:    65520 MB
   INFO    |-> [rocprof] Total SPs:           13312 (104 MPs x 128 SPs/MP)
   INFO    |-> [rocprof] Compute throughput:  45260.80 GFlops (theoretical single precision FMAs)
   INFO    |-> [rocprof] Memory bandwidth:    1638.40 GB/sec
   INFO    |-> [rocprof] -----------------------------------------------------------------------
   INFO    |-> [rocprof] Total GPU memory 68702699520, free 67905781760
   INFO    |-> [rocprof] Buffer size:          256MB
   INFO    |-> [rocprof] Trade-off type:       compute with global memory (block strided)
   INFO    |-> [rocprof] Elements per thread:  8
   INFO    |-> [rocprof] Thread fusion degree: 1
   INFO    |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] Experiment ID, Single Precision ops,,,,              Packed Single Precision ops,,,,       Double precision ops,,,,              Half precision ops,,,,                Integer operations,,,
   INFO    |-> [rocprof] Compute iters, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Flops/byte, ex.time,  GFLOPS, GB/sec, Iops/byte, ex.time,   GIOPS, GB/sec
   INFO    |-> [rocprof] 0,      0.250,    0.09,  372.49,1489.97,      0.250,    0.18,  383.39,1533.56,      0.125,    0.18,  191.52,1532.16,      0.500,    0.09,  746.32,1492.63,     0.250,    0.09,  373.15,1492.62
   INFO    |-> [rocprof] 1,      0.750,    0.09, 1123.46,1497.95,      0.750,    0.18, 1147.02,1529.36,      0.375,    0.18,  573.51,1529.36,      1.500,    0.09, 2242.94,1495.30,     0.750,    0.09, 1123.46,1497.95
   INFO    |-> [rocprof] 2,      1.250,    0.09, 1869.12,1495.30,      1.250,    0.18, 1906.48,1525.18,      0.625,    0.18,  956.72,1530.76,      2.500,    0.09, 3738.20,1495.28,     1.250,    0.09, 1872.46,1497.97
   INFO    |-> [rocprof] 3,      1.750,    0.09, 2612.08,1492.62,      1.750,    0.17, 2686.18,1534.96,      0.875,    0.18, 1340.64,1532.16,      3.500,    0.09, 5252.26,1500.65,     1.750,    0.09, 2616.74,1495.28
   INFO    |-> [rocprof] 4,      2.250,    0.09, 3376.45,1500.65,      2.250,    0.18, 3441.07,1529.36,      1.125,    0.18, 1723.67,1532.15,      4.500,    0.09, 6740.77,1497.95,     2.250,    0.09, 3364.38,1495.28
   INFO    |-> [rocprof] 5,      2.750,    0.09, 4112.02,1495.28,      2.750,    0.18, 4205.73,1529.36,      1.375,    0.17, 2112.49,1536.36,      5.500,    0.09, 8209.49,1492.63,     2.750,    0.09, 4082.91,1484.69
   INFO    |-> [rocprof] 6,      3.250,    0.09, 4868.39,1497.97,      3.250,    0.18, 4894.58,1506.03,      1.625,    0.18, 2492.02,1533.55,      6.500,    0.09, 9650.50,1484.69,     3.250,    0.09, 4877.10,1500.65
   INFO    |-> [rocprof] 7,      3.750,    0.09, 5597.32,1492.62,      3.750,    0.18, 5652.67,1507.38,      1.875,    0.17, 2878.05,1534.96,      7.500,    0.09,11214.59,1495.28,     3.750,    0.09, 5607.30,1495.28
   INFO    |-> [rocprof] 8,      4.250,    0.09, 6355.01,1495.30,      4.250,    0.18, 6493.88,1527.97,      2.125,    0.17, 3261.79,1534.96,      8.500,    0.09,12664.72,1489.97,     4.250,    0.09, 6343.63,1492.62
   INFO    |-> [rocprof] 9,      4.750,    0.09, 7090.02,1492.63,      4.750,    0.18, 7251.26,1526.58,      2.375,    0.18, 3635.55,1530.76,      9.500,    0.09,14179.87,1492.62,     4.750,    0.09, 7090.02,1492.63
   INFO    |-> [rocprof] 10,      5.250,    0.09, 7850.21,1495.28,      5.250,    0.18, 7992.73,1522.42,      2.625,    0.18, 4014.58,1529.36,     10.500,    0.09,15644.83,1489.98,     5.250,    0.09, 7808.46,1487.33
   INFO    |-> [rocprof] 11,      5.750,    0.09, 8628.71,1500.65,      5.750,    0.18, 8714.40,1515.55,      2.875,    0.18, 4400.91,1530.75,     11.500,    0.09,17073.97,1484.69,     5.750,    0.09, 8491.99,1476.87
   INFO    |-> [rocprof] 12,      6.250,    0.09, 9328.86,1492.62,      6.250,    0.18, 9345.55,1495.29,      3.125,    0.18, 4706.33,1506.03,     12.500,    0.09,18493.20,1479.46,     6.250,    0.09, 9263.04,1482.09
   INFO    |-> [rocprof] 13,      6.750,    0.09,10057.28,1489.97,      6.750,    0.18,10276.37,1522.42,      3.375,    0.18, 5101.12,1511.44,     13.500,    0.09,20008.16,1482.09,     6.750,    0.09, 9986.33,1479.46
   INFO    |-> [rocprof] 14,      7.250,    0.09,10840.89,1495.30,      7.250,    0.18,10977.81,1514.18,      3.625,    0.18, 5498.83,1516.92,     14.500,    0.09,21528.05,1484.69,     7.250,    0.09,10707.29,1476.87
   INFO    |-> [rocprof] 15,      7.750,    0.09,11547.37,1489.98,      7.750,    0.18,11671.63,1506.02,      3.875,    0.18, 5931.66,1530.75,     15.500,    0.09,22972.08,1482.07,     7.750,    0.09,11365.68,1466.54
   INFO    |-> [rocprof] 16,      8.250,    0.09,12270.43,1487.33,      8.250,    0.18,12004.45,1455.08,      4.125,    0.18, 6291.42,1525.19,     16.500,    0.09,24113.33,1461.41,     8.250,    0.09,12120.01,1469.09
   INFO    |-> [rocprof] 17,      8.750,    0.09,13060.41,1492.62,      8.750,    0.18,12787.44,1461.42,      4.375,    0.18, 6666.62,1523.80,     17.500,    0.09,25754.50,1471.69,     8.750,    0.09,12832.08,1466.52
   INFO    |-> [rocprof] 18,      9.250,    0.09,13782.20,1489.97,      9.250,    0.18,13565.42,1466.53,      4.625,    0.18, 7079.72,1530.75,     18.500,    0.09,27226.18,1471.69,     9.250,    0.09,13565.35,1466.52
   INFO    |-> [rocprof] 20,     10.250,    0.09,15272.16,1489.97,     10.250,    0.18,15018.82,1465.25,      5.125,    0.18, 7781.24,1518.29,     20.500,    0.09,29907.21,1458.89,    10.250,    0.09,14979.49,1461.41
   INFO    |-> [rocprof] 22,     11.250,    0.09,16732.60,1487.34,     11.250,    0.19,15954.58,1418.18,      5.625,    0.18, 8524.96,1515.55,     22.500,    0.10,30890.63,1372.92,    11.250,    0.10,15728.64,1398.10
   INFO    |-> [rocprof] 24,     12.250,    0.09,18123.34,1479.46,     12.250,    0.20,16804.48,1371.79,      6.125,    0.18, 9257.65,1511.45,     24.500,    0.10,31716.19,1294.54,    12.250,    0.10,16233.72,1325.20
   INFO    |-> [rocprof] 28,     14.250,    0.09,21082.25,1479.46,     14.250,    0.22,17735.48,1244.59,      7.125,    0.18,10711.20,1503.33,     28.500,    0.12,33250.80,1166.69,    14.250,    0.11,17199.51,1206.98
   INFO    |-> [rocprof] 32,     16.250,    0.09,23872.75,1469.09,     16.250,    0.24,18408.34,1132.82,      8.125,    0.18,12149.14,1495.28,     32.500,    0.12,35223.20,1083.79,    16.250,    0.12,17703.09,1089.42
   INFO    |-> [rocprof] 40,     20.250,    0.09,29237.09,1443.81,     20.250,    0.29,18874.24, 932.06,     10.125,    0.18,14758.33,1457.61,     40.500,    0.15,36374.10, 898.13,    20.250,    0.15,18226.20, 900.06
   INFO    |-> [rocprof] 48,     24.250,    0.11,30728.66,1267.16,     24.250,    0.34,19091.69, 787.29,     12.125,    0.20,16592.39,1368.44,     48.500,    0.18,37154.81, 766.08,    24.250,    0.17,18714.13, 771.72
   INFO    |-> [rocprof] 56,     28.250,    0.12,32731.51,1158.64,     28.250,    0.39,19297.75, 683.11,     14.125,    0.22,17605.94,1246.44,     56.500,    0.20,37946.48, 671.62,    28.250,    0.20,18988.54, 672.16
   INFO    |-> [rocprof] 64,     32.250,    0.12,34952.25,1083.79,     32.250,    0.44,19455.65, 603.28,     16.125,    0.24,18242.10,1131.29,     64.500,    0.22,38758.08, 600.90,    32.250,    0.24,18083.58, 560.73
   INFO    |-> [rocprof] 80,     40.250,    0.15,35244.18, 875.63,     40.250,    0.55,19698.89, 489.41,     20.125,    0.28,19054.12, 946.79,     80.500,    0.28,38853.74, 482.66,    40.250,    0.29,18480.52, 459.14
   INFO    |-> [rocprof] 96,     48.250,    0.17,37166.94, 770.30,     48.250,    0.65,19850.31, 411.41,     24.125,    0.34,19273.65, 798.91,     96.500,    0.33,39353.22, 407.81,    48.250,    0.35,18764.34, 388.90
   INFO    |-> [rocprof] 128,     64.250,    0.43,20269.43, 315.48,     64.250,    0.86,19991.26, 311.15,     32.125,    0.44,19541.86, 608.31,    128.500,    0.43,40027.06, 311.49,    64.250,    0.46,18825.17, 293.00
   INFO    |-> [rocprof] 256,    128.250,    0.83,20704.99, 161.44,    128.250,    1.70,20250.94, 157.90,     64.125,    0.86,20064.00, 312.89,    256.500,    0.84,41109.34, 160.27,   128.250,    0.90,19197.57, 149.69
   INFO    |-> [rocprof] 512,    256.250,    1.64,20952.97,  81.77,    256.250,    3.37,20387.59,  79.56,    128.125,    1.69,20332.64, 158.69,    512.500,    1.65,41658.26,  81.28,   256.250,    1.77,19388.17,  75.66
   INFO    |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
   INFO    |-> [rocprof] 
   INFO    |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101520_2798403/input0_results_250307_101520
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
   INFO    |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
   INFO    |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
   INFO    |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
   INFO    |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
   INFO    |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
   INFO    |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
   INFO    |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/timestamps.csv' is generating
   INFO    |-> [rocprof] 
   INFO [roofline] Skipping roofline

$ ./src/rocprof-compute analyze -p workloads/mixbench_test_sol/MI200/

                                 __                                       _
 _ __ ___   ___ _ __  _ __ ___  / _|       ___ ___  _ __ ___  _ __  _   _| |_ ___
| '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
| | | (_) | (__| |_) | | | (_) |  _|_____| (_| (_) | | | | | | |_) | |_| | ||  __/
|_|  \___/ \___| .__/|_|  \___/|_|        \___\___/|_| |_| |_| .__/ \__,_|\__\___|
               |_|                                           |_|

   INFO Analysis mode = cli
   INFO [analysis] deriving rocprofiler-compute metrics...
WARNING Couldn't load roofline.csv. This may result in missing analysis data.

--------------------------------------------------------------------------------
0. Top Stats
0.1 Top Kernels
╒════╤══════════════════════════════════════════╤═════════╤═════════════╤════════════╤══════════════╤═══════╕
│    │ Kernel_Name                              │   Count │     Sum(ns) │   Mean(ns) │   Median(ns) │   Pct │
╞════╪══════════════════════════════════════════╪═════════╪═════════════╪════════════╪══════════════╪═══════╡
│  0 │ void benchmark_func<HIP_vector_type<floa │    3.00 │ 10122632.00 │ 3374210.67 │   3373944.00 │  8.24 │
│    │ t, 2u>, 256, 8u, 512u>(HIP_vector_type<f │         │             │            │              │       │
│    │ loat, 2u>, HIP_vector_type<float, 2u>... │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  1 │ void benchmark_func<int, 256, 8u, 512u>( │    3.00 │  5322116.00 │ 1774038.67 │   1773932.00 │  4.33 │
│    │ int, int*) [clone .kd]                   │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  2 │ void benchmark_func<HIP_vector_type<floa │    3.00 │  5100516.00 │ 1700172.00 │   1700172.00 │  4.15 │
│    │ t, 2u>, 256, 8u, 256u>(HIP_vector_type<f │         │             │            │              │       │
│    │ loat, 2u>, HIP_vector_type<float, 2u>... │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  3 │ void benchmark_func<double, 256, 8u, 512 │    3.00 │  5075234.00 │ 1691744.67 │   1691692.00 │  4.13 │
│    │ u>(double, double*) [clone .kd]          │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  4 │ void benchmark_func<__half2, 256, 8u, 51 │    3.00 │  4954273.00 │ 1651424.33 │   1651371.00 │  4.03 │
│    │ 2u>(__half2, __half2*) [clone .kd]       │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  5 │ void benchmark_func<float, 256, 8u, 512u │    3.00 │  4925155.00 │ 1641718.33 │   1641452.00 │  4.01 │
│    │ >(float, float*) [clone .kd]             │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  6 │ void benchmark_func<int, 256, 8u, 256u>( │    3.00 │  2690098.00 │  896699.33 │    896646.00 │  2.19 │
│    │ int, int*) [clone .kd]                   │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  7 │ void benchmark_func<HIP_vector_type<floa │    3.00 │  2588978.00 │  862992.67 │    862726.00 │  2.11 │
│    │ t, 2u>, 256, 8u, 128u>(HIP_vector_type<f │         │             │            │              │       │
│    │ loat, 2u>, HIP_vector_type<float, 2u>... │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  8 │ void benchmark_func<double, 256, 8u, 256 │    3.00 │  2574098.00 │  858032.67 │    857926.00 │  2.10 │
│    │ u>(double, double*) [clone .kd]          │         │             │            │              │       │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│  9 │ void benchmark_func<__half2, 256, 8u, 25 │    3.00 │  2513777.00 │  837925.67 │    837926.00 │  2.05 │
│    │ 6u>(__half2, __half2*) [clone .kd]       │         │             │            │              │       │
╘════╧══════════════════════════════════════════╧═════════╧═════════════╧════════════╧══════════════╧═══════╛
0.2 Dispatch List
╒════╤═══════════════╤══════════════════════════════════════════════════════════════════════════════════╤══════════╕
│    │   Dispatch_ID │ Kernel_Name                                                                      │   GPU_ID │
╞════╪═══════════════╪══════════════════════════════════════════════════════════════════════════════════╪══════════╡
│  0 │             0 │ __amd_rocclr_fillBufferAligned.kd                                                │        2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  1 │             1 │ void benchmark_func<short, 256, 8u, 0u>(short, short*) [clone .kd]               │        2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  2 │             2 │ void benchmark_func<float, 256, 8u, 0u>(float, float*) [clone .kd]               │        2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  3 │             3 │ void benchmark_func<float, 256, 8u, 0u>(float, float*) [clone .kd]               │        2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  4 │             4 │ void benchmark_func<float, 256, 8u, 0u>(float, float*) [clone .kd]               │        2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  5 │             5 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 0u>(HIP_vector_type<flo │        2 │
│    │               │ at, 2u>, HIP_vector_type<float, 2u>*) [clone .kd]                                │          │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  6 │             6 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 0u>(HIP_vector_type<flo │        2 │
│    │               │ at, 2u>, HIP_vector_type<float, 2u>*) [clone .kd]                                │          │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  7 │             7 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 0u>(HIP_vector_type<flo │        2 │
│    │               │ at, 2u>, HIP_vector_type<float, 2u>*) [clone .kd]                                │          │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  8 │             8 │ void benchmark_func<double, 256, 8u, 0u>(double, double*) [clone .kd]            │        2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│  9 │             9 │ void benchmark_func<double, 256, 8u, 0u>(double, double*) [clone .kd]            │        2 │
╘════╧═══════════════╧══════════════════════════════════════════════════════════════════════════════════╧══════════╛


--------------------------------------------------------------------------------
1. System Info
╒════════════════════════╤═════════════════════════════════════════════════════╕
│                        │ Info                                                │
╞════════════════════════╪═════════════════════════════════════════════════════╡
│ workload_name          │ mixbench_test_sol                                   │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ command                │ /work1/amd/colramos/dev/mixbench/build/mixbench-hip │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ ip_blocks              │ SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF                │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ timestamp              │ Fri 07 Mar 2025 10:15:12 AM  (CST)                  │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ version                │ 3                                                   │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ hostname               │ login1.hpcfund                                      │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cpu_model              │ AMD EPYC 7V13 64-Core Processor                     │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ sbios                  │ American Megatrends Inc.0602                        │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ linux_distro           │ Rocky Linux 9.4 (Blue Onyx)                         │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ linux_kernel_version   │ 5.14.0-162.18.1.el9_1.x86_64                        │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ amd_gpu_kernel_version │ nan                                                 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cpu_memory             │ 527651060                                           │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_memory             │ nan                                                 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ rocm_version           │ 6.3.1-48                                            │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ vbios                  │ 113-D67301V-073                                     │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ compute_partition      │ nan                                                 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ memory_partition       │ nan                                                 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_series             │ MI200                                               │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_model              │ MI200                                               │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_arch               │ gfx90a                                              │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_l1                 │ 16                                                  │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_l2                 │ 8192                                                │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cu_per_gpu             │ 104                                                 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ simd_per_cu            │ 4                                                   │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ se_per_gpu             │ 8                                                   │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ wave_size              │ 64                                                  │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ workgroup_max_size     │ 1024                                                │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ chip_id                │ 29711                                               │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ max_waves_per_cu       │ 32                                                  │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ max_sclk               │ 1700                                                │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ max_mclk               │ 1600                                                │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cur_sclk               │ 1700                                                │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cur_mclk               │ 1600                                                │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ total_l2_chan          │ 32                                                  │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ lds_banks_per_cu       │ 32                                                  │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ sqc_per_gpu            │ 56                                                  │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ pipes_per_gpu          │ 4                                                   │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ hbm_bw                 │ 1638.4                                              │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ num_xcd                │ 1                                                   │
╘════════════════════════╧═════════════════════════════════════════════════════╛

   INFO Not showing table not selected during profiling: 2.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 3.1 Memory Chart
   INFO Not showing table not selected during profiling: 4.1 Roofline
   INFO Not showing table not selected during profiling: 5.1 Command Processor Fetcher
   INFO Not showing table not selected during profiling: 5.2 Packet Processor
   INFO Not showing table not selected during profiling: 6.1 Workgroup Manager Utilizations
   INFO Not showing table not selected during profiling: 6.2 Workgroup Manager - Resource Allocation
   INFO Not showing table not selected during profiling: 7.1 Wavefront Launch Stats
   INFO Not showing table not selected during profiling: 7.2 Wavefront Runtime Stats
   INFO Not showing table not selected during profiling: 10.1 Overall Instruction Mix
   INFO Not showing table not selected during profiling: 10.2 VALU Arithmetic Instr Mix
   INFO Not showing table not selected during profiling: 10.3 VMEM Instr Mix
   INFO Not showing table not selected during profiling: 10.4 MFMA Arithmetic Instr Mix
   INFO Not showing table not selected during profiling: 11.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 11.2 Pipeline Stats
   INFO Not showing table not selected during profiling: 11.3 Arithmetic Operations
   INFO Not showing table not selected during profiling: 12.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 12.2 LDS Stats
   INFO Not showing table not selected during profiling: 13.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 13.2 Instruction Cache Accesses
   INFO Not showing table not selected during profiling: 13.3 Instruction Cache - L2 Interface
   INFO Not showing table not selected during profiling: 14.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 14.2 Scalar L1D Cache Accesses
   INFO Not showing table not selected during profiling: 14.3 Scalar L1D Cache - L2 Interface
   INFO Not showing table not selected during profiling: 15.1 Address Processing Unit
   INFO Not showing table not selected during profiling: 15.2 Data-Return Path
   INFO Not showing table not selected during profiling: 16.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 16.2 L1D Cache Stalls (%)
   INFO Not showing table not selected during profiling: 16.3 L1D Cache Accesses
   INFO Not showing table not selected during profiling: 16.4 L1D - L2 Transactions
   INFO Not showing table not selected during profiling: 16.5 L1D Addr Translation
   INFO Not showing table not selected during profiling: 17.1 Speed-of-Light
   INFO Not showing table not selected during profiling: 17.2 L2 - Fabric Transactions
   INFO Not showing table not selected during profiling: 17.3 L2 Cache Accesses
   INFO Not showing table not selected during profiling: 17.4 L2 - Fabric Interface Stalls
   INFO Not showing table not selected during profiling: 17.5 L2 - Fabric Detailed Transaction Breakdown
   INFO Not showing table not selected during profiling: 18.1 Aggregate Stats (All channels)
   INFO Not showing table not selected during profiling: 18.2 L2 Cache Hit Rate (pct)
   INFO Not showing table not selected during profiling: 18.3 L2 Requests (per normUnit)
   INFO Not showing table not selected during profiling: 18.4 L2 Requests (per normUnit)
   INFO Not showing table not selected during profiling: 18.5 L2-Fabric Requests (per normUnit)
   INFO Not showing table not selected during profiling: 18.6 L2-Fabric Read Latency (Cycles)
   INFO Not showing table not selected during profiling: 18.7 L2-Fabric Write and Atomic Latency (Cycles)
   INFO Not showing table not selected during profiling: 18.8 L2-Fabric Atomic Latency (Cycles)
   INFO Not showing table not selected during profiling: 18.9 L2-Fabric Read Stall (Cycles per normUnit)
   INFO Not showing table not selected during profiling: 18.10 L2-Fabric Write and Atomic Stall (Cycles per normUnit)

Other than this, looks good aside from the few minor comments below

analysis report which focuses on metrics associated with a hardware component or
a group of hardware components. All profiling results are accumulated in the same
target directory without overwriting those for other hardware components.
This enables incremental profiling and analysis.
Collaborator

Since this is potentially the first time a new user is exposed to the idea of "report block filtering", it may be useful to add a sentence that explains how to map analysis report blocks to section IDs. If we already explain it elsewhere, we can just link to it, but it's not obvious to a new user which number to use.

Collaborator

On line 83, in the "Analyze in the command line" section, we may want to change "hardware block filters" -> "hardware report block filters" for consistency

Collaborator

Do we want to change the example printed by rocprof-compute profile --help to use the new -b/--block format? It still shows users the soon-to-be-deprecated format.

metavar="",
nargs="?",
const="",
help=print_avail_arch(supported_archs.keys()),
Collaborator

Nitpicky, but I see that print_avail_arch() only prints with two tabs, while all other sections use three. This makes for a slightly uglier printout, i.e.,

(omniperf) [colramos@t008-006 omniperf]$ ./src/rocprof-compute profile -h
usage: 

rocprof-compute profile --name <workload_name> [profile options] [roofline options] -- <profile_cmd>

---------------------------------------------------------------------------------
Examples:
        rocprof-compute profile -n vcopy_all -- ./vcopy -n 1048576 -b 256
        rocprof-compute profile -n vcopy_SPI_TCC -b SQ TCC -- ./vcopy -n 1048576 -b 256
        rocprof-compute profile -n vcopy_kernel -k vecCopy -- ./vcopy -n 1048576 -b 256
        rocprof-compute profile -n vcopy_disp -d 0 -- ./vcopy -n 1048576 -b 256
        rocprof-compute profile -n vcopy_roof --roof-only -- ./vcopy -n 1048576 -b 256
---------------------------------------------------------------------------------
        

Help:
  -h, --help                       show this help message and exit

General Options:
  -v, --version                    show program's version number and exit
  -V, --verbose                    Increase output verbosity (use multiple times for higher levels)
  -q, --quiet                      Reduce output and run quietly.
  -s, --specs                      Print system specs and exit.

Profile Options:
  -n , --name                                           Assign a name to workload.
  -p , --path                                           Specify path to save workload.
                                                        (DEFAULT: /work1/amd/colramos/audacious/omniperf/workloads/<name>)
  --subpath                                             Specify the type of subpath to save workload: node_name, gpu_model.
  --hip-trace                                           HIP trace, execturion trace for the entire application at the HIP level.
  -k  [ ...], --kernel  [ ...]                          Kernel filtering.
  -d  [ ...], --dispatch  [ ...]                        Dispatch ID filtering.
  -b  [ ...], --block  [ ...]                           Specify metric id(s) from --list-metrics for filtering (e.g. 10, 4, 4.3).
                                                        Can provide multiple space separated arguments.
                                                        Can also accept Hardware blocks.
                                                        Hardware block filtering (to be deprecated soon):
                                                           SQ
                                                           SQC
                                                           TA
                                                           TD
                                                           TCP
                                                           TCC
                                                           SPI
                                                           CPC
                                                           CPF
  --list-metrics []                             List all available metrics for analysis on specified arch:
                                                   gfx906
                                                   gfx908
                                                   gfx90a
                                                   gfx940
                                                   gfx941
                                                   gfx942
  --config-dir                                  Specify the directory of customized report section configs.
  --join-type                                           Choose how to join rocprof runs: (DEFAULT: grid)
                                                           kernel (i.e. By unique kernel name dispatches)
                                                           grid (i.e. By unique kernel name + grid size dispatches)
  --no-roof                                             Profile without collecting roofline data.
  -- [ ...]                                             Provide command for profiling after double dash.
  --spatial-multiplexing  [ ...]                        Provide Node ID and GPU number per node.
  --format-rocprof-output                               Set the format of output file of rocprof.

Standalone Roofline Options:
  --roof-only                                           Profile roofline data only.
  --sort                                                Overlay top kernels or top dispatches: (DEFAULT: kernels)
                                                           kernels
                                                           dispatches
  -m  [ ...], --mem-level  [ ...]                       Filter by memory level: (DEFAULT: ALL)
                                                           HBM
                                                           L2
                                                           vL1D
                                                           LDS
  --device                                              Target GPU device ID. (DEFAULT: ALL)
  --kernel-names                                        Include kernel names in roofline plot.

Collaborator

That being said, I checked out how we format rocprof-compute analyze -h and it uses two tabs. Since print_avail_arch() is used in both analyze and profile mode, it may be worth checking to see if we can bump all profile options down to two tabs

src/argparser.py Outdated
"--config-dir",
dest="config_dir",
metavar="",
help="\t\tSpecify the directory of customized report section configs.",
Collaborator

This should also use three tabs for the same reason cited above

Collaborator

One other high-level comment for the arg parser. Since we've added some new options recently, it may make sense to audit the order in which we print options. For example, move --subpath lower, next to --spatial-multiplexing, and move -- [...] higher up since it's more commonly used.

)
profile_group.add_argument(
"--list-metrics",
metavar="",
Collaborator

We can also use argparse's choices parameter to easily catch and handle invalid input (i.e. choices=supported_archs.keys())
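
A minimal sketch of that suggestion, assuming a hypothetical `SUPPORTED_ARCHS` mapping in place of the project's `supported_archs`. For simplicity it requires a value and leaves out the `nargs="?"`/`const=""` settings used in the PR; how `const` interacts with `choices` validation would need to be checked separately.

```python
import argparse

# Hypothetical stand-in for the project's supported_archs mapping.
SUPPORTED_ARCHS = {
    "gfx906": None,
    "gfx908": None,
    "gfx90a": None,
    "gfx940": None,
    "gfx941": None,
    "gfx942": None,
}


def build_parser() -> argparse.ArgumentParser:
    """Build a demo parser whose --list-metrics only accepts known archs."""
    parser = argparse.ArgumentParser(prog="profile-demo")
    parser.add_argument(
        "--list-metrics",
        dest="list_metrics",
        # argparse rejects any value outside this list with exit code 2.
        choices=sorted(SUPPORTED_ARCHS),
        help="List available metrics for the given arch.",
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(["--list-metrics", "gfx90a"])
    print(args.list_metrics)
```

With `choices` set, an unknown value such as `gfx000` makes argparse print an "invalid choice" error and exit, so no custom validation code is needed.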

@vedithal-amd
Contributor Author

Need to add title to src/rocprof_compute_soc/analysis_configs/gfx90a/0400_roofline_chart.yml or remove the file, otherwise we'll fail with

This file does not have any metrics, so we can delete it.

I could be confused, but my expectation would be that based on these commands, the analysis output would print my SOL table... Thoughts?

Yes, that is a bug, which I have now fixed. Thanks for testing thoroughly, @coleramos425

* Profiling mode changes

- `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23)
    - Only counters mentioned in the selected analysis report blocks will be collected
        - Add parsing logic to identify hardware counters from analysis report blocks
        - Add filtering logic to only write filtered counters in perfmon files
        - Log not collected counters in one line
- `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode
- Write arguments provided during profiling in profiling_configuration.yaml file
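
The counter selection described in the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not the PR's implementation: the regular expression, the `REPORT_BLOCKS` sample data, and both function names are hypothetical, and the metric expressions are simplified stand-ins.

```python
import re

# Hypothetical pattern: a hardware block prefix, an underscore, then
# upper-case identifier characters.
COUNTER_RE = re.compile(r"\b(?:SQ|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF)_[A-Z0-9_]+\b")

# Hypothetical, simplified report blocks keyed by metric id.
REPORT_BLOCKS = {
    "10.1": ["AVG(SQ_INSTS_VALU / SQ_WAVES)", "AVG(SQ_INSTS_SALU / SQ_WAVES)"],
    "17.3": ["AVG(TCC_HIT / (TCC_HIT + TCC_MISS))"],
}


def counters_for_blocks(selected_ids, report_blocks):
    """Collect counter names referenced by the selected report blocks."""
    counters = set()
    for metric_id, expressions in report_blocks.items():
        # A top-level id such as "10" selects every sub-block "10.x".
        if any(metric_id == s or metric_id.startswith(s + ".") for s in selected_ids):
            for expr in expressions:
                counters.update(COUNTER_RE.findall(expr))
    return counters


def filter_perfmon_counters(perfmon_counters, wanted):
    """Split a perfmon counter list into kept and skipped counters."""
    kept = [c for c in perfmon_counters if c in wanted]
    skipped = [c for c in perfmon_counters if c not in wanted]
    return kept, skipped


wanted = counters_for_blocks(["10"], REPORT_BLOCKS)
kept, skipped = filter_perfmon_counters(["SQ_WAVES", "SQ_INSTS_VALU", "TCC_HIT"], wanted)
# Skipped counters are reported on a single line.
print("Not collected:", " ".join(skipped))
```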

* Analysis mode changes

- During analysis mode, only show report blocks selected during profiling
    - If `-b` option is provided in analysis mode, then follow provided filters
- Do not show empty tables in analysis report

* Miscellaneous changes

- Update CHANGELOG
- Add test cases
    - Instruction mix report block filter
    - Instruction mix and Memory chart report block filter
    - Instruction mix report block filter and CPC hardware block filter
    - TA hardware block filter
    - --list-metrics in profile mode should work
- Move binary handler fixtures to conftest.py to avoid importing
  fixtures

* Public documentation changes

- Use the term "Hardware report block" instead of "Hardware block"
- Add documentation for "--list-metrics" option in profile mode
- Add example of filtering by hardware report block such as instruction
  mix and wavefront launch statistics
- Add deprecation warning for hardware component (sq, tcc) based filtering
@vedithal-amd vedithal-amd force-pushed the vedithal/selective-counter branch from 952ce42 to 75213ad Compare March 7, 2025 17:31
|_|  \___/ \___| .__/|_|  \___/|_|        \___\___/|_| |_| |_| .__/ \__,_|\__\___|
               |_|                                           |_|

rocprofiler-compute version: 2.0.0
Contributor

Maybe change 2.0.0 --> x.x.x

args = self.__args
for section in self.__filter_metric_ids:
section_num = convert_metric_id_to_panel_idx(section)
file_id = str(section_num // 100)
Contributor

Eventually, we might want to centralize this code in one place, but not now.
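
If that logic is centralized later, one possible shape is a single helper shared by both modes. The sketch below is hypothetical: the function names and the assumed mapping (block `4` to panel index 400, sub-block `4.3` to 403, inferred from config file names such as `0400_roofline_chart.yml` and from the `// 100` in the snippet above) are illustrations, not the project's actual `convert_metric_id_to_panel_idx`.

```python
def metric_id_to_panel_idx(metric_id: str) -> int:
    """Map "4" -> 400 and "4.3" -> 403 (assumed convention)."""
    parts = metric_id.split(".")
    if not 1 <= len(parts) <= 2:
        raise ValueError(f"unsupported metric id: {metric_id!r}")
    block = int(parts[0])
    # A bare block id refers to the whole block, i.e. table 0.
    table = int(parts[1]) if len(parts) == 2 else 0
    return block * 100 + table


def panel_file_id(metric_id: str) -> str:
    """Return the report block number that owns the given metric id."""
    return str(metric_id_to_panel_idx(metric_id) // 100)


if __name__ == "__main__":
    for mid in ("4", "4.3", "10.2"):
        print(mid, metric_id_to_panel_idx(mid), panel_file_id(mid))
```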

@vedithal-amd vedithal-amd merged commit 55cf0e2 into ROCm:develop Mar 10, 2025
11 checks passed
5 participants