Analysis report block based filtering for profiling #566
@skyreflectedinmirrors, @gsitaram, please help review the new profiling option:
Since there are quite a lot of changes, we must test the tool thoroughly before deployment. For now, I have submitted some comments from our earlier team discussion.
I feel that we should keep |
@feizheng10 and I discussed this feature today. Currently, the '-b' or '--block' option works differently in 'analyze' and 'profile' mode, as shown below
The former filters the analysis report by report block, such as 'System Speed of Light', 'Memory Chart', or 'Wavefront Launch statistics'. The latter filters the profiling operation by hardware IP block, such as TA, TCP, or SQ. This behavior is inconsistent, and we would like to remove hardware IP block based filtering in 'profile' mode in favor of report block based filtering. Hardware IP block filtering is less useful for kernel developers and profilers because there is no one-to-one correspondence between hardware IP blocks and analysis report blocks. For example, filtering by only TCP (L1 cache) or TCC (L2 cache) will affect the 'System Speed of Light', 'Memory Chart', and 'Instruction Cache' report blocks. Both methods of filtering reduce profiling time, so we lose nothing there.
We are thinking of supporting all 19 yaml files for report blocks through the '-b' option in 'profile' mode (instead of specifying hardware IP blocks). Users can filter on multiple report blocks and sub-blocks using block numbers (instead of ambiguous acronyms), for example, 'rocprof-compute profile -b 4, 4.5, 5, 5.6'.
To get the report block numbers that correspond to report block titles, you can use the '--list-metrics' option in 'analyze' mode. We want to replicate this in 'profile' mode, so that users can grep for the report block title and obtain the report block numbers to use for filtering. For example:
--list-metrics will take an optional argument for the GPU GFX architecture, since report blocks may be different per architecture. If no argument is provided, the architecture will be detected automatically using the 'rocm-smi' tool. To summarize:
Note that this will break backward compatibility, because the '-b' option in profile mode will behave differently. @gsitaram, @skyreflectedinmirrors, could you please comment on the implementation suggestions above?
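As a rough illustration of the proposed command line, here is a minimal argparse sketch. It is not the code from this PR: the parser name, the "auto" sentinel, and the comma stripping are assumptions made for the example, and architecture detection itself is left out.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser; only the option names come from the discussion."""
    parser = argparse.ArgumentParser(prog="profile-sketch")
    # -b accepts one or more report block or sub-block ids, e.g. 4 4.5 5 5.6.
    parser.add_argument("-b", "--block", nargs="+", default=[], metavar="BLOCK")
    # --list-metrics takes an optional architecture. "auto" is a placeholder
    # meaning "detect the architecture"; detection is not implemented here.
    parser.add_argument(
        "--list-metrics", nargs="?", const="auto", default=None, metavar="ARCH"
    )
    return parser


def parse(argv):
    """Return (block ids, list-metrics value) for a list of CLI tokens."""
    args = build_parser().parse_args(argv)
    # Tolerate the comma-separated spelling "-b 4, 4.5, 5" (assumed behavior).
    blocks = [token.strip(",") for token in args.block if token.strip(",")]
    return blocks, args.list_metrics
```

With this shape, passing '--list-metrics' without a value yields the "auto" placeholder, while '--list-metrics gfx90a' yields the explicit architecture string.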
I like the idea of unifying what |
Sure, we will add a warning upon usage of One thing to note, in this PR, I have updated profile mode to dump the profiling filters in the workload folder so that when analyze mode is run it will only show the report blocks that have been filtered during profiling. If you want to see other report blocks you will explicitly have to mention them in the |
I like the idea, but I think this whole concept will need accompanying docs updates. A few specific comments:
One question there: does 4.5 match 14.5 and 4.5 (e.g.)? I.e., is this an exact match, or a regex search, etc.?
I would like to see what that looks like, but I like the general concept.
Instead of changing the default, I'd suggest you simply expand the list of choices ( rocprofiler-compute/src/argparser.py Line 187 in 649660d
That way you don't break anyone's existing workflow, while also ensuring anyone using this option will see the warning. |
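The suggestion above (accept both kinds of value and warn on the old kind) could look roughly like the sketch below. The function name and the set of IP block names are assumptions for illustration; the set lists only the blocks mentioned in this thread and is not the full list the tool accepts.

```python
import warnings

# Assumed, incomplete set of hardware IP block names taken from this thread.
HARDWARE_IP_BLOCKS = {"TA", "TCP", "TCC", "SQ"}


def split_block_filters(values):
    """Separate report block ids from legacy hardware IP block names."""
    report_blocks, ip_blocks = [], []
    for value in values:
        if value.upper() in HARDWARE_IP_BLOCKS:
            ip_blocks.append(value.upper())
        else:
            report_blocks.append(value)
    if ip_blocks:
        # Existing workflows keep working, but the user sees a warning.
        warnings.warn(
            "hardware IP block filtering is deprecated; use report block numbers",
            DeprecationWarning,
            stacklevel=2,
        )
    return report_blocks, ip_blocks
```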
It is going to be an exact match, not a regex. The default is to filter for all report blocks and IP blocks. If you want to filter for all report blocks but one, you need to specify all but one on the command line. I think adding regex support would be confusing for both developers and users, even though it is more flexible. Analyze mode also does exact matching.
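To make the exact-match behavior concrete, here is a small sketch. The function name is hypothetical, and treating a bare block number such as 4 as selecting all of its sub-blocks is an assumption, not something confirmed in this thread.

```python
def select_blocks(available, requested):
    """Keep ids that equal a requested id, or whose parent block was requested.

    Matching is exact on the whole id, so "4.5" never selects "14.5".
    """
    wanted = set(requested)
    selected = []
    for block_id in available:
        parent = block_id.split(".")[0]  # "4.5" -> "4"
        # Assumption: requesting block "4" also selects 4.1, 4.5, and so on.
        if block_id in wanted or parent in wanted:
            selected.append(block_id)
    return selected
```

A substring or unanchored regex test would behave differently: the string "4.5" occurs inside "14.5", which is exactly the ambiguity the question above was about.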
It would look like what I mentioned above :)
I like the idea of phased deprecation. In first phase (ROCm 6.5)
Thanks for your feedback. I will add a checklist item to update the rocprof-compute public docs and also update the changelog.
@gsitaram @skyreflectedinmirrors I have implemented the changes discussed above and tested them thoroughly (both manually and automatically) in this feature branch. Please let me know if you have any more comments before I merge this PR. @feizheng10 @coleramos425, there have been considerable changes to the code since our discussion; could you please review and approve? Thanks!
@vedithal-amd, this looks great! Two bugs I caught while testing:
- Need to add 'title' to src/rocprof_compute_soc/analysis_configs/gfx90a/0400_roofline_chart.yml or remove the file, otherwise we'll fail with
$ ./src/rocprof-compute analyze -p workloads/mix/MI200/
...
INFO Not showing table not selected during profiling: 2.1 Speed-of-Light
INFO Not showing table not selected during profiling: 3.1 Memory Chart
Traceback (most recent call last):
File "/work1/amd/colramos/audacious/omniperf/./src/rocprof-compute", line 156, in <module>
main()
File "/work1/amd/colramos/audacious/omniperf/./src/rocprof-compute", line 148, in main
rocprof_compute.run_analysis()
File "/work1/amd/colramos/audacious/omniperf/src/utils/utils.py", line 53, in wrap_function
result = function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work1/amd/colramos/audacious/omniperf/src/rocprof_compute_base.py", line 423, in run_analysis
analyzer.run_analysis()
File "/work1/amd/colramos/audacious/omniperf/src/utils/utils.py", line 53, in wrap_function
result = function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/work1/amd/colramos/audacious/omniperf/src/rocprof_compute_analyze/analysis_cli.py", line 96, in run_analysis
tty.show_all(
File "/work1/amd/colramos/audacious/omniperf/src/utils/tty.py", line 102, in show_all
f"Not showing table not selected during profiling: {table_id_str} {table_config['title']}"
~~~~~~~~~~~~^^^^^^^^^
KeyError: 'title'
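Besides fixing the yaml file, the log message could be built defensively so that a config without 'title' does not crash analysis. The helper below is only a sketch: the function name and the "<untitled>" placeholder are made up, and only the variable names table_id_str and table_config come from the traceback above.

```python
def table_label(table_id_str, table_config):
    """Build the label used in the log message, tolerating a missing 'title'."""
    # dict.get returns the fallback instead of raising KeyError.
    title = table_config.get("title", "<untitled>")
    return f"{table_id_str} {title}"
```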
- I could be confused, but my expectation would be that based on these commands, the analysis output would print my SOL table... Thoughts?
Log
$ ./src/rocprof-compute profile -n mixbench_test_sol -b 2 --no-roof -- $WORK/dev/mixbench/build/mixbench-hip
__ _
_ __ ___ ___ _ __ _ __ ___ / _| ___ ___ _ __ ___ _ __ _ _| |_ ___
| '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
| | | (_) | (__| |_) | | | (_) | _|_____| (_| (_) | | | | | | |_) | |_| | || __/
|_| \___/ \___| .__/|_| \___/|_| \___\___/|_| |_| |_| .__/ \__,_|\__\___|
|_| |_|
INFO Not collecting following counters per provided filter: TCP_GATE_EN1_sum, TCP_GATE_EN2_sum, TCP_TD_TCP_STALL_CYCLES_sum, TCP_TCR_TCP_STALL_CYCLES_sum, TCP_READ_TAGCONFLICT_STALL_CYCLES_sum, TCP_WRITE_TAGCONFLICT_STALL_CYCLES_sum, TCP_ATOMIC_TAGCONFLICT_STALL_CYCLES_sum, TCP_TA_TCP_STATE_READ_sum, TCP_VOLATILE_sum, TCP_TOTAL_ACCESSES_sum, TCP_TOTAL_READ_sum, TCP_TOTAL_WRITE_sum, TCP_TOTAL_ATOMIC_WITH_RET_sum, TCP_TOTAL_ATOMIC_WITHOUT_RET_sum, TCP_TOTAL_WRITEBACK_INVALIDATES_sum, TCP_UTCL1_TRANSLATION_MISS_sum, TCP_UTCL1_TRANSLATION_HIT_sum, TCP_UTCL1_PERMISSION_MISS_sum, TCP_UTCL1_REQUEST_sum, TCP_TCP_LATENCY_sum, TCP_TCC_READ_REQ_LATENCY_sum, TCP_TCC_WRITE_REQ_LATENCY_sum, TCP_TCC_NC_READ_REQ_sum, TCP_TCC_NC_WRITE_REQ_sum, TCP_TCC_NC_ATOMIC_REQ_sum, TCP_TCC_UC_READ_REQ_sum, TCP_TCC_UC_WRITE_REQ_sum, TCP_TCC_UC_ATOMIC_REQ_sum, TCP_TCC_CC_READ_REQ_sum, TCP_TCC_CC_WRITE_REQ_sum, TCP_TCC_CC_ATOMIC_REQ_sum, TCP_TCC_RW_READ_REQ_sum, TCP_TCC_RW_WRITE_REQ_sum, TCP_TCC_RW_ATOMIC_REQ_sum, TCP_PENDING_STALL_CYCLES_sum, TCC_CYCLE_sum, TCC_BUSY_sum, TCC_PROBE_sum, TCC_PROBE_ALL_sum, TCC_NC_REQ_sum, TCC_UC_REQ_sum, TCC_CC_REQ_sum, TCC_RW_REQ_sum, TCC_STREAMING_REQ_sum, TCC_READ_sum, TCC_WRITE_sum, TCC_ATOMIC_sum, TCC_WRITEBACK_sum, TCC_EA_WR_UNCACHED_32B_sum, TCC_EA_WRREQ_DRAM_sum, TCC_EA_WRREQ_STALL_sum, TCC_EA_RD_UNCACHED_32B_sum, TCC_EA_RDREQ_DRAM_sum, TCC_TAG_STALL_sum, TCC_NORMAL_WRITEBACK_sum, TCC_ALL_TC_OP_WB_WRITEBACK_sum, TCC_NORMAL_EVICT_sum, TCC_ALL_TC_OP_INV_EVICT_sum, TCC_TOO_MANY_EA_WRREQS_STALL_sum, TCC_EA_ATOMIC_sum, TCC_EA_ATOMIC_LEVEL_sum, TA_TA_BUSY_sum, TA_BUFFER_WAVEFRONTS_sum, TA_BUFFER_READ_WAVEFRONTS_sum, TA_BUFFER_WRITE_WAVEFRONTS_sum, TA_BUFFER_ATOMIC_WAVEFRONTS_sum, TA_BUFFER_TOTAL_CYCLES_sum, TA_BUFFER_COALESCED_READ_CYCLES_sum, TA_BUFFER_COALESCED_WRITE_CYCLES_sum, TA_ADDR_STALLED_BY_TC_CYCLES_sum, TA_TOTAL_WAVEFRONTS_sum, TA_ADDR_STALLED_BY_TD_CYCLES_sum, TA_DATA_STALLED_BY_TC_CYCLES_sum, TA_FLAT_WAVEFRONTS_sum, TA_FLAT_READ_WAVEFRONTS_sum, 
TA_FLAT_WRITE_WAVEFRONTS_sum, TA_FLAT_ATOMIC_WAVEFRONTS_sum, CPF_CPF_STAT_BUSY, CPF_CPF_STAT_STALL, CPF_CPF_TCIU_BUSY, CPF_CPF_TCIU_STALL, CPF_CPF_STAT_IDLE, CPF_CPF_TCIU_IDLE, CPF_CMP_UTCL1_STALL_ON_TRANSLATION, TD_TD_BUSY_sum, TD_TC_STALL_sum, TD_SPI_STALL_sum, TD_LOAD_WAVEFRONT_sum, TD_ATOMIC_WAVEFRONT_sum, TD_STORE_WAVEFRONT_sum, TD_COALESCABLE_WAVEFRONT_sum, SQC_TC_INST_REQ, SQC_TC_DATA_READ_REQ, SQC_TC_DATA_WRITE_REQ, SQC_TC_DATA_ATOMIC_REQ, SQC_TC_STALL, SQC_TC_REQ, SQC_DCACHE_REQ_READ_16, SQC_ICACHE_MISSES_DUPLICATE, SQC_DCACHE_INPUT_VALID_READYB, SQC_DCACHE_ATOMIC, SQC_DCACHE_REQ_READ_8, SQC_DCACHE_MISSES_DUPLICATE, SQC_DCACHE_REQ_READ_1, SQC_DCACHE_REQ_READ_2, SQC_DCACHE_REQ_READ_4, SQ_INSTS_VALU_CVT, SQ_INSTS_VMEM_WR, SQ_INSTS_VMEM_RD, SQ_INSTS_SALU, SQ_INSTS_VSKIPPED, SQ_INSTS_VALU, SQ_INSTS_FLAT, SQ_INSTS_GDS, SQ_INSTS_EXP_GDS, SQ_INSTS_BRANCH, SQ_INSTS_SENDMSG, SQ_WAIT_ANY, SQ_WAIT_INST_ANY, SQ_ACTIVE_INST_ANY, SQ_ACTIVE_INST_LDS, SQ_ACTIVE_INST_EXP_GDS, SQ_INST_CYCLES_VMEM_WR, SQ_INST_CYCLES_VMEM_RD, SQ_INST_CYCLES_SMEM, SQ_INST_CYCLES_SALU, SQ_LDS_ADDR_CONFLICT, SQ_LDS_UNALIGNED_STALL, SQ_WAVES_EQ_64, SQ_WAVES_LT_64, SQ_WAVES_LT_48, SQ_WAVES_LT_32, SQ_WAVES_LT_16, SQ_ITEMS, SQ_LDS_MEM_VIOLATIONS, SQ_LDS_ATOMIC_RETURN, SQ_WAVES_RESTORED, SQ_WAVES_SAVED, SQ_INSTS_SMEM_NORM, SQ_INSTS_MFMA, SQ_INSTS_VALU_MFMA_I8, SQ_INSTS_VALU_MFMA_F16, SQ_INSTS_VALU_MFMA_BF16, SQ_INSTS_VALU_MFMA_F32, SQ_INSTS_VALU_MFMA_F64, SQ_INSTS_FLAT_LDS_ONLY, CPC_CPC_STAT_BUSY, CPC_CPC_STAT_IDLE, CPC_CPC_TCIU_BUSY, CPC_CPC_TCIU_IDLE, CPC_CPC_STAT_STALL, CPC_UTCL1_STALL_ON_TRANSLATION, CPC_CPC_UTCL2IU_BUSY, CPC_CPC_UTCL2IU_IDLE, CPC_CPC_UTCL2IU_STALL, CPC_ME1_DC0_SPI_BUSY, TCC_CYCLE_expand, TCC_RW_REQ_expand, TCC_READ_expand, TCC_WRITE_expand, TCC_ATOMIC_expand, TCC_EA_ATOMIC_expand, TCC_EA_ATOMIC_LEVEL_expand, TCC_EA_RDREQ_IO_CREDIT_STALL_expand, TCC_EA_RDREQ_GMI_CREDIT_STALL_expand, TCC_EA_RDREQ_DRAM_CREDIT_STALL_expand, TCC_EA_WRREQ_IO_CREDIT_STALL_expand, 
TCC_EA_WRREQ_GMI_CREDIT_STALL_expand, TCC_EA_WRREQ_DRAM_CREDIT_STALL_expand, TCC_TOO_MANY_EA_WRREQS_STALL_expand, GRBM_SPI_BUSY, SPI_CSN_WINDOW_VALID, SPI_CSN_BUSY, SPI_CSN_NUM_THREADGROUPS, SPI_CSN_WAVE, SPI_RA_REQ_NO_ALLOC, SPI_RA_REQ_NO_ALLOC_CSN, SPI_RA_RES_STALL_CSN, SPI_RA_TMP_STALL_CSN, SPI_RA_WAVE_SIMD_FULL_CSN, SPI_RA_VGPR_SIMD_FULL_CSN, SPI_RA_SGPR_SIMD_FULL_CSN, SPI_RA_LDS_CU_FULL_CSN, SPI_RA_BAR_CU_FULL_CSN, SPI_RA_TGLIM_CU_FULL_CSN, SPI_RA_WVLIM_STALL_CSN, SPI_SWC_CSC_WR, SPI_VWC_CSC_WR, SPI_RA_BULKY_CU_FULL_CSN
INFO Rocprofiler-Compute version: 3.1.0
INFO Profiler choice: rocprofv1
INFO Path: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200
INFO Target: MI200
INFO Command: /work1/amd/colramos/dev/mixbench/build/mixbench-hip
INFO Kernel Selection: None
INFO Dispatch Selection: None
INFO Hardware Blocks: []
INFO Report Sections: ['2']
INFO
INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO Collecting Performance Counters
INFO ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
INFO
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_IFETCH_LEVEL.txt
INFO |-> [rocprof] RPL: on '250307_101513' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_IFETCH_LEVEL.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101513_2796904'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101513_2796904/input0_results_250307_101513'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101513_2796904/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 15 metrics
INFO |-> [rocprof] SQ_WAVES, SQ_IFETCH, SQ_IFETCH_LEVEL, SQ_ACCUM_PREV_HIRES, SQC_DCACHE_HITS, SQC_DCACHE_MISSES, SQ_INSTS, SQ_INSTS_VALU_ADD_F16, TCP_TCC_ATOMIC_WITHOUT_RET_REQ_sum, TCC_EA_WRREQ_64B_sum, TCC_EA_RDREQ_sum, TCC_EA_RDREQ_32B_sum, TCC_EA_RDREQ_LEVEL_sum, GRBM_COUNT, GRBM_GUI_ACTIVE
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 364.71,1458.86, 0.250, 0.18, 375.83,1503.30, 0.125, 0.18, 187.08,1496.60, 0.500, 0.09, 709.68,1419.36, 0.250, 0.09, 358.48,1433.92
INFO |-> [rocprof] 1, 0.750, 0.09, 1069.95,1426.60, 0.750, 0.18, 1115.48,1487.31, 0.375, 0.18, 564.75,1506.00, 1.500, 0.09, 2169.42,1446.28, 0.750, 0.09, 1071.78,1429.03
INFO |-> [rocprof] 2, 1.250, 0.09, 1804.74,1443.79, 1.250, 0.18, 1875.78,1500.62, 0.625, 0.18, 939.56,1503.30, 2.500, 0.09, 3584.80,1433.92, 1.250, 0.09, 1798.55,1438.84
INFO |-> [rocprof] 3, 1.750, 0.09, 2509.36,1433.92, 1.750, 0.18, 2619.04,1496.60, 0.875, 0.18, 1314.22,1501.96, 3.500, 0.09, 5018.77,1433.93, 1.750, 0.09, 2505.08,1431.47
INFO |-> [rocprof] 4, 2.250, 0.09, 3220.81,1431.47, 2.250, 0.18, 3385.48,1504.66, 1.125, 0.18, 1686.68,1499.27, 4.500, 0.09, 6474.77,1438.84, 2.250, 0.09, 3242.95,1441.31
INFO |-> [rocprof] 5, 2.750, 0.09, 3923.16,1426.60, 2.750, 0.18, 4097.36,1489.95, 1.375, 0.18, 2065.19,1501.96, 5.500, 0.09, 7900.06,1436.37, 2.750, 0.09, 3950.03,1436.37
INFO |-> [rocprof] 6, 3.250, 0.09, 4741.28,1458.86, 3.250, 0.18, 4863.94,1496.60, 1.625, 0.18, 2440.69,1501.96, 6.500, 0.09, 9368.52,1441.31, 3.250, 0.09, 4708.53,1448.78
INFO |-> [rocprof] 7, 3.750, 0.09, 5395.64,1438.84, 3.750, 0.18, 5567.57,1484.68, 1.875, 0.18, 2818.69,1503.30, 7.500, 0.09,10699.53,1426.60, 3.750, 0.09, 5349.76,1426.60
INFO |-> [rocprof] 8, 4.250, 0.09, 6073.39,1429.03, 4.250, 0.18, 6349.21,1493.93, 2.125, 0.18, 3185.95,1499.27, 8.500, 0.09,12085.03,1421.77, 4.250, 0.09, 6042.51,1421.77
INFO |-> [rocprof] 9, 4.750, 0.09, 6799.49,1431.47, 4.750, 0.18, 7096.17,1493.93, 2.375, 0.18, 3554.48,1496.62, 9.500, 0.09,13692.60,1441.33, 4.750, 0.09, 6893.68,1451.30
INFO |-> [rocprof] 10, 5.250, 0.09, 7619.33,1451.30, 5.250, 0.18, 7836.29,1492.63, 2.625, 0.18, 3925.13,1495.29, 10.500, 0.10,14753.83,1405.13, 5.250, 0.09, 7528.16,1433.93
INFO |-> [rocprof] 11, 5.750, 0.09, 8175.25,1421.78, 5.750, 0.18, 8544.59,1486.02, 2.875, 0.18, 4314.33,1500.64, 11.500, 0.09,16350.50,1421.78, 5.750, 0.09, 8133.98,1414.61
INFO |-> [rocprof] 12, 6.250, 0.09, 8962.09,1433.93, 6.250, 0.18, 9262.99,1482.08, 3.125, 0.18, 4660.31,1491.30, 12.500, 0.09,17712.43,1416.99, 6.250, 0.10, 8796.78,1407.48
INFO |-> [rocprof] 13, 6.750, 0.09, 9679.16,1433.95, 6.750, 0.18,10012.87,1483.39, 3.375, 0.18, 5006.44,1483.39, 13.500, 0.09,19194.07,1421.78, 6.750, 0.09, 9728.95,1441.33
INFO |-> [rocprof] 14, 7.250, 0.09,10576.83,1458.87, 7.250, 0.18,10716.66,1478.16, 3.625, 0.18, 5353.62,1476.86, 14.500, 0.10,20442.83,1409.85, 7.250, 0.10,10170.14,1402.78
INFO |-> [rocprof] 15, 7.750, 0.09,11094.03,1431.49, 7.750, 0.18,11395.51,1470.39, 3.875, 0.18, 5794.24,1495.29, 15.500, 0.10,21742.82,1402.76, 7.750, 0.10,10853.26,1400.42
INFO |-> [rocprof] 16, 8.250, 0.09,11789.78,1429.06, 8.250, 0.19,11962.95,1450.05, 4.125, 0.18, 6146.11,1489.97, 16.500, 0.10,23145.83,1402.78, 8.250, 0.10,11534.34,1398.10
INFO |-> [rocprof] 17, 8.750, 0.09,12461.72,1424.20, 8.750, 0.19,12655.16,1446.30, 4.375, 0.18, 6524.44,1491.30, 17.500, 0.10,24385.23,1393.44, 8.750, 0.10,12274.17,1402.76
INFO |-> [rocprof] 18, 9.250, 0.09,13401.34,1448.79, 9.250, 0.19,13332.34,1441.33, 4.625, 0.18, 6897.26,1491.30, 18.500, 0.10,25907.78,1400.42, 9.250, 0.09,13084.96,1414.59
INFO |-> [rocprof] 20, 10.250, 0.09,14672.91,1431.50, 10.250, 0.19,14710.40,1435.16, 5.125, 0.18, 7649.71,1492.63, 20.500, 0.10,28237.22,1377.43, 10.250, 0.10,14450.96,1409.85
INFO |-> [rocprof] 22, 11.250, 0.09,16215.09,1441.34, 11.250, 0.20,15432.69,1371.79, 5.625, 0.18, 8381.11,1489.98, 22.500, 0.10,29537.06,1312.76, 11.250, 0.10,15369.85,1366.21
INFO |-> [rocprof] 24, 12.250, 0.09,17446.41,1424.20, 12.250, 0.20,16337.03,1333.64, 6.125, 0.18, 9021.94,1472.97, 24.500, 0.11,30046.64,1226.39, 12.250, 0.11,15546.06,1269.07
INFO |-> [rocprof] 28, 14.250, 0.09,20503.89,1438.87, 14.250, 0.22,17199.59,1206.99, 7.125, 0.18,10504.13,1474.26, 28.500, 0.12,31961.67,1121.46, 14.250, 0.12,16602.31,1165.07
INFO |-> [rocprof] 32, 16.250, 0.09,23064.88,1419.38, 16.250, 0.24,17947.90,1104.49, 8.125, 0.18,11873.99,1461.41, 32.500, 0.13,33825.03,1040.77, 16.250, 0.13,17233.23,1060.51
INFO |-> [rocprof] 40, 20.250, 0.10,27576.19,1361.79, 20.250, 0.29,18759.60, 926.40, 10.125, 0.19,14518.67,1433.94, 40.500, 0.15,35426.11, 874.72, 20.250, 0.15,17694.60, 873.81
INFO |-> [rocprof] 48, 24.250, 0.11,29185.62,1203.53, 24.250, 0.34,19082.79, 786.92, 12.125, 0.20,16234.77,1338.95, 48.500, 0.18,36586.80, 754.37, 24.250, 0.18,18326.36, 755.73
INFO |-> [rocprof] 56, 28.250, 0.12,31554.75,1116.98, 28.250, 0.39,19250.72, 681.44, 14.125, 0.22,17247.24,1221.04, 56.500, 0.20,37496.36, 663.65, 28.250, 0.20,18644.83, 659.99
INFO |-> [rocprof] 64, 32.250, 0.13,33689.98,1044.65, 32.250, 0.45,19427.70, 602.41, 16.125, 0.24,17975.44,1114.76, 64.500, 0.23,38156.76, 591.58, 32.250, 0.24,17904.06, 555.16
INFO |-> [rocprof] 80, 40.250, 0.16,34665.23, 861.25, 40.250, 0.55,19635.88, 487.85, 20.125, 0.29,18541.42, 921.31, 80.500, 0.28,38390.02, 476.89, 40.250, 0.30,18300.23, 454.66
INFO |-> [rocprof] 96, 48.250, 0.18,36529.40, 757.09, 48.250, 0.65,19792.04, 410.20, 24.125, 0.34,18940.01, 785.08, 96.500, 0.33,39162.83, 405.83, 48.250, 0.35,18669.19, 386.93
INFO |-> [rocprof] 128, 64.250, 0.43,20170.82, 313.94, 64.250, 0.87,19932.11, 310.23, 32.125, 0.44,19436.15, 605.02, 128.500, 0.43,39775.97, 309.54, 64.250, 0.46,18720.55, 291.37
INFO |-> [rocprof] 256, 128.250, 0.83,20649.35, 161.01, 128.250, 1.70,20228.10, 157.72, 64.125, 0.86,20049.04, 312.66, 256.500, 0.84,41054.43, 160.06, 128.250, 0.90,19170.20, 149.48
INFO |-> [rocprof] 512, 256.250, 1.64,20936.66, 81.70, 256.250, 3.37,20384.70, 79.55, 128.125, 1.69,20326.81, 158.65, 512.500, 1.65,41617.81, 81.21, 256.250, 1.77,19381.12, 75.63
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101513_2796904/input0_results_250307_101513
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_IFETCH_LEVEL.csv' is generating
INFO |-> [rocprof]
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_LDS.txt
INFO |-> [rocprof] RPL: on '250307_101514' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_LDS.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101514_2797119'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101514_2797119/input0_results_250307_101514'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101514_2797119/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 16 metrics
INFO |-> [rocprof] SQ_INSTS_LDS, SQ_INST_LEVEL_LDS, SQ_ACCUM_PREV_HIRES, SQ_BUSY_CU_CYCLES, SQC_ICACHE_REQ, SQC_ICACHE_HITS, SQC_ICACHE_MISSES, SQC_DCACHE_REQ, TCP_TOTAL_CACHE_ACCESSES_sum, TCP_TCC_READ_REQ_sum, TCP_TCC_WRITE_REQ_sum, TCP_TCC_ATOMIC_WITH_RET_REQ_sum, TCC_REQ_sum, TCC_HIT_sum, TCC_MISS_sum, TCC_EA_WRREQ_sum
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 365.99,1463.95, 0.250, 0.18, 375.83,1503.31, 0.125, 0.18, 187.91,1503.31, 0.500, 0.09, 715.74,1431.49, 0.250, 0.09, 359.71,1438.85
INFO |-> [rocprof] 1, 0.750, 0.09, 1082.84,1443.79, 0.750, 0.18, 1122.45,1496.60, 0.375, 0.18, 561.73,1497.94, 1.500, 0.09, 2158.26,1438.84, 0.750, 0.09, 1073.62,1431.49
INFO |-> [rocprof] 2, 1.250, 0.09, 1801.66,1441.33, 1.250, 0.18, 1874.10,1499.28, 0.625, 0.18, 937.89,1500.62, 2.500, 0.09, 3590.94,1436.37, 1.250, 0.09, 1804.74,1443.79
INFO |-> [rocprof] 3, 1.750, 0.09, 2517.97,1438.84, 1.750, 0.18, 2626.09,1500.62, 0.875, 0.18, 1313.04,1500.62, 3.500, 0.09, 5018.77,1433.93, 1.750, 0.09, 2509.39,1433.93
INFO |-> [rocprof] 4, 2.250, 0.09, 3237.39,1438.84, 2.250, 0.18, 3370.37,1497.94, 1.125, 0.18, 1694.26,1506.01, 4.500, 0.09, 6542.10,1453.80, 2.250, 0.09, 3254.13,1446.28
INFO |-> [rocprof] 5, 2.750, 0.09, 3977.31,1446.30, 2.750, 0.18, 4186.67,1522.42, 1.375, 0.18, 2067.06,1503.32, 5.500, 0.09, 7900.15,1436.39, 2.750, 0.09, 3943.32,1433.93
INFO |-> [rocprof] 6, 3.250, 0.09, 4692.37,1443.81, 3.250, 0.18, 4825.28,1484.70, 1.625, 0.18, 2436.36,1499.30, 6.500, 0.09, 9384.74,1443.81, 3.250, 0.09, 4692.42,1443.82
INFO |-> [rocprof] 7, 3.750, 0.09, 5404.97,1441.33, 3.750, 0.18, 5543.10,1478.16, 1.875, 0.18, 2816.21,1501.98, 7.500, 0.09,10847.34,1446.31, 3.750, 0.09, 5442.38,1451.30
INFO |-> [rocprof] 8, 4.250, 0.09, 6136.18,1443.81, 4.250, 0.18, 6349.28,1493.95, 2.125, 0.18, 3174.64,1493.95, 8.500, 0.09,12272.49,1443.82, 4.250, 0.09, 6094.22,1433.93
INFO |-> [rocprof] 9, 4.750, 0.09, 6869.91,1446.30, 4.750, 0.18, 7083.67,1491.30, 2.375, 0.18, 3557.65,1497.96, 9.500, 0.09,13739.96,1446.31, 4.750, 0.09, 6846.30,1441.33
INFO |-> [rocprof] 10, 5.250, 0.09, 7593.14,1446.31, 5.250, 0.18, 7836.25,1492.62, 2.625, 0.18, 3921.63,1493.96, 10.500, 0.09,15056.31,1433.93, 5.250, 0.09, 7566.96,1441.33
INFO |-> [rocprof] 11, 5.750, 0.09, 8231.06,1431.49, 5.750, 0.18, 8537.03,1484.70, 2.875, 0.18, 4295.12,1493.96, 11.500, 0.09,16378.26,1424.20, 5.750, 0.09, 8147.63,1416.98
INFO |-> [rocprof] 12, 6.250, 0.09, 8977.44,1436.39, 6.250, 0.18, 9206.06,1472.97, 3.125, 0.18, 4635.59,1483.39, 12.500, 0.09,17772.29,1421.78, 6.250, 0.09, 9070.63,1451.30
INFO |-> [rocprof] 13, 6.750, 0.09, 9695.74,1436.41, 6.750, 0.18,10084.14,1493.95, 3.375, 0.18, 5019.75,1487.33, 13.500, 0.09,19226.66,1424.20, 6.750, 0.10, 9437.09,1398.09
INFO |-> [rocprof] 14, 7.250, 0.09,10342.99,1426.62, 7.250, 0.18,10697.81,1475.56, 3.625, 0.18, 5363.06,1479.46, 14.500, 0.09,20756.57,1431.49, 7.250, 0.10,10238.51,1412.21
INFO |-> [rocprof] 15, 7.750, 0.09,11151.24,1438.87, 7.750, 0.18,11425.55,1474.26, 3.875, 0.18, 5778.79,1491.30, 15.500, 0.09,22302.23,1438.85, 7.750, 0.10,10944.62,1412.21
INFO |-> [rocprof] 16, 8.250, 0.09,11829.96,1433.93, 8.250, 0.18,12025.31,1457.61, 4.125, 0.18, 6184.56,1499.29, 16.500, 0.10,23184.35,1405.11, 8.250, 0.09,11729.71,1421.78
INFO |-> [rocprof] 17, 8.750, 0.09,12461.72,1424.20, 8.750, 0.19,12676.88,1448.79, 4.375, 0.18, 6535.98,1493.94, 17.500, 0.10,24630.72,1407.47, 8.750, 0.09,12377.53,1414.58
INFO |-> [rocprof] 18, 9.250, 0.09,13241.26,1431.49, 9.250, 0.19,13343.73,1442.57, 4.625, 0.18, 6897.22,1491.29, 18.500, 0.10,25864.34,1398.07, 9.250, 0.10,12997.29,1405.11
INFO |-> [rocprof] 20, 10.250, 0.09,14597.86,1424.18, 10.250, 0.19,14597.94,1424.19, 5.125, 0.18, 7636.04,1489.96, 20.500, 0.10,28613.09,1395.76, 10.250, 0.09,14524.04,1416.98
INFO |-> [rocprof] 22, 11.250, 0.09,16214.92,1441.33, 11.250, 0.19,15559.83,1383.10, 5.625, 0.18, 8321.89,1479.45, 22.500, 0.10,29353.32,1304.59, 11.250, 0.10,14932.11,1327.30
INFO |-> [rocprof] 24, 12.250, 0.09,17416.84,1421.78, 12.250, 0.20,16119.05,1315.84, 6.125, 0.18, 8998.19,1469.09, 24.500, 0.11,29871.95,1219.26, 12.250, 0.11,15569.62,1270.99
INFO |-> [rocprof] 28, 14.250, 0.09,20157.70,1414.58, 14.250, 0.22,17261.52,1211.33, 7.125, 0.18,10430.69,1463.96, 28.500, 0.12,31961.41,1121.45, 14.250, 0.11,16883.55,1184.81
INFO |-> [rocprof] 32, 16.250, 0.09,23103.97,1421.78, 16.250, 0.24,17935.87,1103.75, 8.125, 0.18,11822.50,1455.08, 32.500, 0.13,33993.47,1045.95, 16.250, 0.13,16996.73,1045.95
INFO |-> [rocprof] 40, 20.250, 0.10,27575.91,1361.77, 20.250, 0.29,18790.59, 927.93, 10.125, 0.19,14456.73,1427.83, 40.500, 0.16,35060.07, 865.68, 20.250, 0.15,17787.01, 878.37
INFO |-> [rocprof] 48, 24.250, 0.11,29101.85,1200.08, 24.250, 0.34,19038.03, 785.07, 12.125, 0.20,15992.20,1318.94, 48.500, 0.18,36003.60, 742.34, 24.250, 0.18,18409.08, 759.14
INFO |-> [rocprof] 56, 28.250, 0.12,31937.22,1130.52, 28.250, 0.39,19235.00, 680.88, 14.125, 0.22,17024.06,1205.24, 56.500, 0.21,36969.53, 654.33, 28.250, 0.20,18630.18, 659.48
INFO |-> [rocprof] 64, 32.250, 0.13,34028.99,1055.16, 32.250, 0.45,19434.55, 602.62, 16.125, 0.24,17903.91,1110.32, 64.500, 0.23,38156.25, 591.57, 32.250, 0.24,17880.25, 554.43
INFO |-> [rocprof] 80, 40.250, 0.16,34243.12, 850.76, 40.250, 0.55,19641.63, 487.99, 20.125, 0.29,18654.10, 926.91, 80.500, 0.28,38521.29, 478.53, 40.250, 0.29,18349.96, 455.90
INFO |-> [rocprof] 96, 48.250, 0.18,35945.66, 744.99, 48.250, 0.65,19796.91, 410.30, 24.125, 0.34,19020.05, 788.40, 96.500, 0.33,39068.33, 404.85, 48.250, 0.35,18660.53, 386.75
INFO |-> [rocprof] 128, 64.250, 0.43,20163.27, 313.83, 64.250, 0.86,19961.64, 310.69, 32.125, 0.44,19450.18, 605.45, 128.500, 0.43,39834.76, 310.00, 64.250, 0.46,18740.07, 291.67
INFO |-> [rocprof] 256, 128.250, 0.83,20681.11, 161.26, 128.250, 1.70,20237.62, 157.80, 64.125, 0.86,20030.38, 312.36, 256.500, 0.84,41046.65, 160.03, 128.250, 0.90,19176.99, 149.53
INFO |-> [rocprof] 512, 256.250, 1.64,20938.64, 81.71, 256.250, 3.38,20379.82, 79.53, 128.125, 1.69,20324.90, 158.63, 512.500, 1.65,41597.70, 81.17, 256.250, 1.78,19368.91, 75.59
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101514_2797119/input0_results_250307_101514
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_INST_LEVEL_LDS.csv' is generating
INFO |-> [rocprof]
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_SMEM.txt
INFO |-> [rocprof] RPL: on '250307_101515' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_SMEM.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101515_2797331'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101515_2797331/input0_results_250307_101515'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101515_2797331/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 105 metrics
INFO |-> [rocprof] SQ_INSTS_SMEM, SQ_INST_LEVEL_SMEM, SQ_ACCUM_PREV_HIRES, SQ_INSTS_VALU_MUL_F16, SQ_INSTS_VALU_FMA_F16, SQ_INSTS_VALU_TRANS_F16, SQ_INSTS_VALU_ADD_F32, SQ_INSTS_VALU_MUL_F32, TCC_HIT[0], TCC_MISS[0], TCC_REQ[0], TCC_HIT[1], TCC_MISS[1], TCC_REQ[1], TCC_HIT[2], TCC_MISS[2], TCC_REQ[2], TCC_HIT[3], TCC_MISS[3], TCC_REQ[3], TCC_HIT[4], TCC_MISS[4], TCC_REQ[4], TCC_HIT[5], TCC_MISS[5], TCC_REQ[5], TCC_HIT[6], TCC_MISS[6], TCC_REQ[6], TCC_HIT[7], TCC_MISS[7], TCC_REQ[7], TCC_HIT[8], TCC_MISS[8], TCC_REQ[8], TCC_HIT[9], TCC_MISS[9], TCC_REQ[9], TCC_HIT[10], TCC_MISS[10], TCC_REQ[10], TCC_HIT[11], TCC_MISS[11], TCC_REQ[11], TCC_HIT[12], TCC_MISS[12], TCC_REQ[12], TCC_HIT[13], TCC_MISS[13], TCC_REQ[13], TCC_HIT[14], TCC_MISS[14], TCC_REQ[14], TCC_HIT[15], TCC_MISS[15], TCC_REQ[15], TCC_HIT[16], TCC_MISS[16], TCC_REQ[16], TCC_HIT[17], TCC_MISS[17], TCC_REQ[17], TCC_HIT[18], TCC_MISS[18], TCC_REQ[18], TCC_HIT[19], TCC_MISS[19], TCC_REQ[19], TCC_HIT[20], TCC_MISS[20], TCC_REQ[20], TCC_HIT[21], TCC_MISS[21], TCC_REQ[21], TCC_HIT[22], TCC_MISS[22], TCC_REQ[22], TCC_HIT[23], TCC_MISS[23], TCC_REQ[23], TCC_HIT[24], TCC_MISS[24], TCC_REQ[24], TCC_HIT[25], TCC_MISS[25], TCC_REQ[25], TCC_HIT[26], TCC_MISS[26], TCC_REQ[26], TCC_HIT[27], TCC_MISS[27], TCC_REQ[27], TCC_HIT[28], TCC_MISS[28], TCC_REQ[28], TCC_HIT[29], TCC_MISS[29], TCC_REQ[29], TCC_HIT[30], TCC_MISS[30], TCC_REQ[30], TCC_HIT[31], TCC_MISS[31], TCC_REQ[31], TCC_EA_WRREQ_LEVEL_sum
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 362.20,1448.79, 0.250, 0.18, 377.52,1510.07, 0.125, 0.18, 188.42,1507.37, 0.500, 0.09, 724.40,1448.79, 0.250, 0.09, 361.57,1446.28
INFO |-> [rocprof] 1, 0.750, 0.09, 1082.86,1443.81, 0.750, 0.18, 1133.58,1511.44, 0.375, 0.18, 565.77,1508.72, 1.500, 0.09, 2184.49,1456.32, 0.750, 0.09, 1084.72,1446.30
INFO |-> [rocprof] 2, 1.250, 0.09, 1814.11,1451.28, 1.250, 0.18, 1892.70,1514.16, 0.625, 0.18, 943.80,1510.08, 2.500, 0.09, 3590.98,1436.39, 1.250, 0.09, 1795.49,1436.39
INFO |-> [rocprof] 3, 1.750, 0.09, 2539.78,1451.30, 1.750, 0.18, 2642.63,1510.07, 0.875, 0.18, 1322.51,1511.44, 3.500, 0.09, 5070.78,1448.79, 1.750, 0.09, 2531.02,1446.30
INFO |-> [rocprof] 4, 2.250, 0.09, 3276.76,1456.34, 2.250, 0.18, 3397.67,1510.07, 1.125, 0.18, 1700.36,1511.44, 4.500, 0.09, 6497.06,1443.79, 2.250, 0.09, 3271.05,1453.80
INFO |-> [rocprof] 5, 2.750, 0.09, 3991.08,1451.30, 2.750, 0.18, 4152.71,1510.07, 1.375, 0.18, 2072.62,1507.36, 5.500, 0.09, 7982.15,1451.30, 2.750, 0.09, 3997.99,1453.82
INFO |-> [rocprof] 6, 3.250, 0.09, 4708.58,1448.79, 3.250, 0.18, 4833.78,1487.32, 1.625, 0.18, 2460.53,1514.17, 6.500, 0.09, 9384.64,1443.79, 3.250, 0.09, 4700.41,1446.28
INFO |-> [rocprof] 7, 3.750, 0.09, 5442.32,1451.28, 3.750, 0.18, 5597.29,1492.61, 1.875, 0.18, 2833.96,1511.44, 7.500, 0.09,10865.95,1448.79, 3.750, 0.09, 5423.61,1446.30
INFO |-> [rocprof] 8, 4.250, 0.09, 6157.31,1448.78, 4.250, 0.18, 6400.54,1506.01, 2.125, 0.18, 3208.93,1510.08, 8.500, 0.09,12272.36,1443.81, 4.250, 0.09, 6073.46,1429.05
INFO |-> [rocprof] 9, 4.750, 0.09, 6858.08,1443.81, 4.750, 0.18, 7134.33,1501.96, 2.375, 0.18, 3583.24,1508.73, 9.500, 0.09,13645.85,1436.41, 4.750, 0.09, 6822.85,1436.39
INFO |-> [rocprof] 10, 5.250, 0.09, 7632.53,1453.82, 5.250, 0.18, 7871.26,1499.29, 2.625, 0.18, 3946.21,1503.32, 10.500, 0.09,15108.13,1438.87, 5.250, 0.09, 7553.98,1438.85
INFO |-> [rocprof] 11, 5.750, 0.09, 8359.44,1453.82, 5.750, 0.18, 8590.25,1493.96, 2.875, 0.18, 4337.61,1508.73, 11.500, 0.09,16603.78,1443.81, 5.750, 0.09, 8245.21,1433.95
INFO |-> [rocprof] 12, 6.250, 0.09, 9023.89,1443.82, 6.250, 0.18, 9295.83,1487.33, 3.125, 0.18, 4676.91,1496.61, 12.500, 0.09,18016.58,1441.33, 6.250, 0.09, 8946.80,1431.49
INFO |-> [rocprof] 13, 6.750, 0.09, 9745.70,1443.81, 6.750, 0.18,10165.67,1506.03, 3.375, 0.18, 5033.14,1491.30, 13.500, 0.09,19358.33,1433.95, 6.750, 0.09, 9597.03,1421.78
INFO |-> [rocprof] 14, 7.250, 0.09,10449.73,1441.34, 7.250, 0.18,10764.08,1484.70, 3.625, 0.18, 5425.25,1496.62, 14.500, 0.09,20792.05,1433.93, 7.250, 0.09,10343.10,1426.63
INFO |-> [rocprof] 15, 7.750, 0.09,11094.15,1431.50, 7.750, 0.18,11506.37,1484.69, 3.875, 0.18, 5851.57,1510.08, 15.500, 0.09,22112.83,1426.63, 7.750, 0.09,10981.71,1416.99
INFO |-> [rocprof] 16, 8.250, 0.09,11789.78,1429.06, 8.250, 0.18,12046.24,1460.15, 4.125, 0.18, 6195.67,1501.98, 16.500, 0.09,23340.98,1414.61, 8.250, 0.09,11690.21,1416.99
INFO |-> [rocprof] 17, 8.750, 0.09,12676.95,1448.79, 8.750, 0.18,12765.20,1458.88, 4.375, 0.18, 6594.78,1507.38, 17.500, 0.10,24713.91,1412.22, 8.750, 0.09,12419.55,1419.38
INFO |-> [rocprof] 18, 9.250, 0.09,13589.25,1469.11, 9.250, 0.18,13494.64,1458.88, 4.625, 0.18, 6984.14,1510.08, 18.500, 0.10,25951.11,1402.76, 9.250, 0.09,13084.96,1414.59
INFO |-> [rocprof] 20, 10.250, 0.09,14723.00,1436.39, 10.250, 0.19,14811.77,1445.05, 5.125, 0.18, 7711.46,1504.67, 20.500, 0.10,28756.63,1402.76, 10.250, 0.10,14402.40,1405.11
INFO |-> [rocprof] 22, 11.250, 0.09,16243.00,1443.82, 11.250, 0.20,15483.33,1376.30, 5.625, 0.18, 8410.99,1495.29, 22.500, 0.10,29959.02,1331.51, 11.250, 0.10,15147.97,1346.49
INFO |-> [rocprof] 24, 12.250, 0.09,17595.78,1436.39, 12.250, 0.20,16169.94,1319.99, 6.125, 0.18, 9109.92,1487.33, 24.500, 0.11,30858.70,1259.54, 12.250, 0.11,15616.94,1274.85
INFO |-> [rocprof] 28, 14.250, 0.09,20363.95,1429.05, 14.250, 0.22,17361.96,1218.38, 7.125, 0.18,10569.08,1483.38, 28.500, 0.12,32438.71,1138.20, 14.250, 0.12,16579.28,1163.46
INFO |-> [rocprof] 32, 16.250, 0.09,23064.88,1419.38, 16.250, 0.24,17971.49,1105.94, 8.125, 0.18,11967.88,1472.97, 32.500, 0.13,34206.73,1052.51, 16.250, 0.13,17103.36,1052.51
INFO |-> [rocprof] 40, 20.250, 0.10,28311.26,1398.09, 20.250, 0.29,18811.53, 928.96, 10.125, 0.19,14656.46,1447.55, 40.500, 0.15,35799.41, 883.94, 20.250, 0.15,17824.57, 880.23
INFO |-> [rocprof] 48, 24.250, 0.11,29610.44,1221.05, 24.250, 0.34,19100.71, 787.66, 12.125, 0.20,16055.46,1324.16, 48.500, 0.18,36196.20, 746.31, 24.250, 0.18,18326.36, 755.73
INFO |-> [rocprof] 56, 28.250, 0.12,32154.16,1138.20, 28.250, 0.39,19242.95, 681.17, 14.125, 0.22,17209.59,1218.38, 56.500, 0.20,37085.42, 656.38, 28.250, 0.20,18644.93, 660.00
INFO |-> [rocprof] 64, 32.250, 0.13,34288.04,1063.19, 32.250, 0.45,19392.89, 601.33, 16.125, 0.24,17856.79,1107.40, 64.500, 0.23,38237.49, 592.83, 32.250, 0.24,17880.39, 554.43
INFO |-> [rocprof] 80, 40.250, 0.16,34665.23, 861.25, 40.250, 0.55,19635.88, 487.85, 20.125, 0.29,18716.14, 929.99, 80.500, 0.28,38543.13, 478.80, 40.250, 0.29,18359.87, 456.15
INFO |-> [rocprof] 96, 48.250, 0.18,36041.48, 746.97, 48.250, 0.66,19762.96, 409.60, 24.125, 0.34,19037.89, 789.14, 96.500, 0.33,39086.96, 405.05, 48.250, 0.35,18651.93, 386.57
INFO |-> [rocprof] 128, 64.250, 0.43,20163.18, 313.82, 64.250, 0.86,19939.42, 310.34, 32.125, 0.44,19436.06, 605.01, 128.500, 0.43,39923.10, 310.69, 64.250, 0.46,18726.97, 291.47
INFO |-> [rocprof] 256, 128.250, 0.83,20657.21, 161.07, 128.250, 1.70,20216.61, 157.63, 64.125, 0.86,20026.55, 312.30, 256.500, 0.84,40976.10, 159.75, 128.250, 0.90,19142.83, 149.26
INFO |-> [rocprof] 512, 256.250, 1.64,20918.23, 81.63, 256.250, 3.38,20372.05, 79.50, 128.125, 1.69,20305.69, 158.48, 512.500, 1.65,41585.63, 81.14, 256.250, 1.78,19361.93, 75.56
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101515_2797331/input0_results_250307_101515
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_INST_LEVEL_SMEM.csv' is generating
INFO |-> [rocprof]
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_VMEM.txt
INFO |-> [rocprof] RPL: on '250307_101516' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_INST_LEVEL_VMEM.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101516_2797554'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101516_2797554/input0_results_250307_101516'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101516_2797554/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 136 metrics
INFO |-> [rocprof] SQ_INSTS_VMEM, SQ_INST_LEVEL_VMEM, SQ_ACCUM_PREV_HIRES, SQ_INSTS_VALU_FMA_F32, SQ_INSTS_VALU_TRANS_F32, SQ_INSTS_VALU_ADD_F64, SQ_INSTS_VALU_MUL_F64, SQ_INSTS_VALU_FMA_F64, TCC_EA_RDREQ[0], TCC_EA_RDREQ_32B[0], TCC_EA_WRREQ[0], TCC_EA_WRREQ_64B[0], TCC_EA_RDREQ[1], TCC_EA_RDREQ_32B[1], TCC_EA_WRREQ[1], TCC_EA_WRREQ_64B[1], TCC_EA_RDREQ[2], TCC_EA_RDREQ_32B[2], TCC_EA_WRREQ[2], TCC_EA_WRREQ_64B[2], TCC_EA_RDREQ[3], TCC_EA_RDREQ_32B[3], TCC_EA_WRREQ[3], TCC_EA_WRREQ_64B[3], TCC_EA_RDREQ[4], TCC_EA_RDREQ_32B[4], TCC_EA_WRREQ[4], TCC_EA_WRREQ_64B[4], TCC_EA_RDREQ[5], TCC_EA_RDREQ_32B[5], TCC_EA_WRREQ[5], TCC_EA_WRREQ_64B[5], TCC_EA_RDREQ[6], TCC_EA_RDREQ_32B[6], TCC_EA_WRREQ[6], TCC_EA_WRREQ_64B[6], TCC_EA_RDREQ[7], TCC_EA_RDREQ_32B[7], TCC_EA_WRREQ[7], TCC_EA_WRREQ_64B[7], TCC_EA_RDREQ[8], TCC_EA_RDREQ_32B[8], TCC_EA_WRREQ[8], TCC_EA_WRREQ_64B[8], TCC_EA_RDREQ[9], TCC_EA_RDREQ_32B[9], TCC_EA_WRREQ[9], TCC_EA_WRREQ_64B[9], TCC_EA_RDREQ[10], TCC_EA_RDREQ_32B[10], TCC_EA_WRREQ[10], TCC_EA_WRREQ_64B[10], TCC_EA_RDREQ[11], TCC_EA_RDREQ_32B[11], TCC_EA_WRREQ[11], TCC_EA_WRREQ_64B[11], TCC_EA_RDREQ[12], TCC_EA_RDREQ_32B[12], TCC_EA_WRREQ[12], TCC_EA_WRREQ_64B[12], TCC_EA_RDREQ[13], TCC_EA_RDREQ_32B[13], TCC_EA_WRREQ[13], TCC_EA_WRREQ_64B[13], TCC_EA_RDREQ[14], TCC_EA_RDREQ_32B[14], TCC_EA_WRREQ[14], TCC_EA_WRREQ_64B[14], TCC_EA_RDREQ[15], TCC_EA_RDREQ_32B[15], TCC_EA_WRREQ[15], TCC_EA_WRREQ_64B[15], TCC_EA_RDREQ[16], TCC_EA_RDREQ_32B[16], TCC_EA_WRREQ[16], TCC_EA_WRREQ_64B[16], TCC_EA_RDREQ[17], TCC_EA_RDREQ_32B[17], TCC_EA_WRREQ[17], TCC_EA_WRREQ_64B[17], TCC_EA_RDREQ[18], TCC_EA_RDREQ_32B[18], TCC_EA_WRREQ[18], TCC_EA_WRREQ_64B[18], TCC_EA_RDREQ[19], TCC_EA_RDREQ_32B[19], TCC_EA_WRREQ[19], TCC_EA_WRREQ_64B[19], TCC_EA_RDREQ[20], TCC_EA_RDREQ_32B[20], TCC_EA_WRREQ[20], TCC_EA_WRREQ_64B[20], TCC_EA_RDREQ[21], TCC_EA_RDREQ_32B[21], TCC_EA_WRREQ[21], TCC_EA_WRREQ_64B[21], TCC_EA_RDREQ[22], TCC_EA_RDREQ_32B[22], TCC_EA_WRREQ[22], TCC_EA_WRREQ_64B[22], TCC_EA_RDREQ[23], TCC_EA_RDREQ_32B[23], TCC_EA_WRREQ[23], TCC_EA_WRREQ_64B[23], TCC_EA_RDREQ[24], TCC_EA_RDREQ_32B[24], TCC_EA_WRREQ[24], TCC_EA_WRREQ_64B[24], TCC_EA_RDREQ[25], TCC_EA_RDREQ_32B[25], TCC_EA_WRREQ[25], TCC_EA_WRREQ_64B[25], TCC_EA_RDREQ[26], TCC_EA_RDREQ_32B[26], TCC_EA_WRREQ[26], TCC_EA_WRREQ_64B[26], TCC_EA_RDREQ[27], TCC_EA_RDREQ_32B[27], TCC_EA_WRREQ[27], TCC_EA_WRREQ_64B[27], TCC_EA_RDREQ[28], TCC_EA_RDREQ_32B[28], TCC_EA_WRREQ[28], TCC_EA_WRREQ_64B[28], TCC_EA_RDREQ[29], TCC_EA_RDREQ_32B[29], TCC_EA_WRREQ[29], TCC_EA_WRREQ_64B[29], TCC_EA_RDREQ[30], TCC_EA_RDREQ_32B[30], TCC_EA_WRREQ[30], TCC_EA_WRREQ_64B[30], TCC_EA_RDREQ[31], TCC_EA_RDREQ_32B[31], TCC_EA_WRREQ[31], TCC_EA_WRREQ_64B[31]
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 357.27,1429.06, 0.250, 0.18, 374.82,1499.30, 0.125, 0.18, 189.10,1512.82, 0.500, 0.09, 715.75,1431.50, 0.250, 0.09, 358.49,1433.95
INFO |-> [rocprof] 1, 0.750, 0.09, 1103.75,1471.67, 0.750, 0.18, 1132.56,1510.08, 0.375, 0.18, 565.78,1508.73, 1.500, 0.09, 2180.75,1453.83, 0.750, 0.09, 1082.87,1443.82
INFO |-> [rocprof] 2, 1.250, 0.09, 1810.99,1448.79, 1.250, 0.18, 1882.53,1506.03, 0.625, 0.18, 942.11,1507.38, 2.500, 0.09, 3584.88,1433.95, 1.250, 0.09, 1820.44,1456.36
INFO |-> [rocprof] 3, 1.750, 0.09, 2522.32,1441.33, 1.750, 0.18, 2642.65,1510.08, 0.875, 0.18, 1315.41,1503.33, 3.500, 0.09, 5053.32,1443.81, 1.750, 0.09, 2526.66,1443.81
INFO |-> [rocprof] 4, 2.250, 0.09, 3265.43,1451.30, 2.250, 0.18, 3388.56,1506.03, 1.125, 0.18, 1694.28,1506.03, 4.500, 0.09, 6497.20,1443.82, 2.250, 0.09, 3237.46,1438.87
INFO |-> [rocprof] 5, 2.750, 0.09, 3956.85,1438.85, 2.750, 0.18, 4134.12,1503.32, 1.375, 0.18, 2076.38,1510.09, 5.500, 0.09, 7900.15,1436.39, 2.750, 0.09, 3984.23,1448.81
INFO |-> [rocprof] 6, 3.250, 0.09, 4692.42,1443.82, 3.250, 0.18, 4829.55,1486.02, 1.625, 0.18, 2451.69,1508.73, 6.500, 0.09, 9384.74,1443.81, 3.250, 0.09, 4692.37,1443.81
INFO |-> [rocprof] 7, 3.750, 0.09, 5432.98,1448.79, 3.750, 0.18, 5592.37,1491.30, 1.875, 0.18, 2826.33,1507.38, 7.500, 0.09,10847.22,1446.30, 3.750, 0.09, 5404.97,1441.33
INFO |-> [rocprof] 8, 4.250, 0.09, 6136.25,1443.82, 4.250, 0.18, 6394.83,1504.67, 2.125, 0.18, 3200.30,1506.03, 8.500, 0.09,12167.78,1431.50, 4.250, 0.09, 6125.64,1441.33
INFO |-> [rocprof] 9, 4.750, 0.09, 6858.16,1443.82, 4.750, 0.18, 7140.80,1503.33, 2.375, 0.18, 3570.40,1503.33, 9.500, 0.09,13692.60,1441.33, 4.750, 0.09, 6869.91,1446.30
INFO |-> [rocprof] 10, 5.250, 0.09, 7541.05,1436.39, 5.250, 0.18, 7871.30,1499.30, 2.625, 0.18, 3953.32,1506.03, 10.500, 0.09,15160.13,1443.82, 5.250, 0.09, 7566.96,1441.33
INFO |-> [rocprof] 11, 5.750, 0.09, 8287.71,1441.34, 5.750, 0.18, 8559.76,1488.65, 2.875, 0.18, 4333.69,1507.37, 11.500, 0.09,16490.43,1433.95, 5.750, 0.09, 8231.06,1431.49
INFO |-> [rocprof] 12, 6.250, 0.09, 9039.45,1446.31, 6.250, 0.18, 9304.08,1488.65, 3.125, 0.18, 4652.04,1488.65, 12.500, 0.09,17863.12,1429.05, 6.250, 0.09, 8856.22,1416.99
INFO |-> [rocprof] 13, 6.750, 0.09, 9779.36,1448.79, 6.750, 0.18,10093.13,1495.28, 3.375, 0.18, 5019.75,1487.33, 13.500, 0.09,19292.37,1429.06, 6.750, 0.10, 9500.42,1407.47
INFO |-> [rocprof] 14, 7.250, 0.09,10485.65,1446.30, 7.250, 0.18,10745.01,1482.07, 3.625, 0.18, 5382.04,1484.70, 14.500, 0.09,20792.05,1433.93, 7.250, 0.09,10290.48,1419.38
INFO |-> [rocprof] 15, 7.750, 0.09,11132.14,1436.41, 7.750, 0.18,11486.04,1482.07, 3.875, 0.18, 5814.97,1500.64, 15.500, 0.09,22188.30,1431.50, 7.750, 0.09,11037.64,1424.21
INFO |-> [rocprof] 16, 8.250, 0.09,11850.34,1436.41, 8.250, 0.18,11983.66,1452.56, 4.125, 0.18, 6184.60,1499.30, 16.500, 0.09,23459.67,1421.80, 8.250, 0.10,11611.63,1407.47
INFO |-> [rocprof] 17, 8.750, 0.09,12547.06,1433.95, 8.750, 0.18,12787.44,1461.42, 4.375, 0.18, 6559.42,1499.30, 17.500, 0.09,24755.59,1414.61, 8.750, 0.09,12377.66,1414.59
INFO |-> [rocprof] 18, 9.250, 0.09,13332.27,1441.33, 9.250, 0.18,13436.23,1452.56, 4.625, 0.18, 6952.85,1503.32, 18.500, 0.10,26082.23,1409.85, 9.250, 0.10,13019.23,1407.48
INFO |-> [rocprof] 20, 10.250, 0.09,14824.54,1446.30, 10.250, 0.19,14799.02,1443.81, 5.125, 0.18, 7697.65,1501.98, 20.500, 0.10,28853.13,1407.47, 10.250, 0.10,14306.55,1395.76
INFO |-> [rocprof] 22, 11.250, 0.09,16104.24,1431.49, 11.250, 0.19,15637.26,1389.98, 5.625, 0.18, 8396.02,1492.63, 22.500, 0.10,30054.43,1335.75, 11.250, 0.10,15099.34,1342.16
INFO |-> [rocprof] 24, 12.250, 0.09,17446.41,1424.20, 12.250, 0.20,16144.45,1317.91, 6.125, 0.18, 9061.72,1479.46, 24.500, 0.11,30402.22,1240.91, 12.250, 0.11,15499.31,1265.25
INFO |-> [rocprof] 28, 14.250, 0.09,20329.32,1426.62, 14.250, 0.22,17324.14,1215.73, 7.125, 0.18,10606.60,1488.65, 28.500, 0.12,32704.96,1147.54, 14.250, 0.11,16812.46,1179.82
INFO |-> [rocprof] 32, 16.250, 0.09,23064.88,1419.38, 16.250, 0.24,18007.10,1108.13, 8.125, 0.18,11936.37,1469.09, 32.500, 0.13,34163.86,1051.20, 16.250, 0.13,17146.39,1055.16
INFO |-> [rocprof] 40, 20.250, 0.10,28170.41,1391.13, 20.250, 0.29,18811.53, 928.96, 10.125, 0.19,14656.38,1447.54, 40.500, 0.15,35686.60, 881.15, 20.250, 0.15,17768.64, 877.46
INFO |-> [rocprof] 48, 24.250, 0.11,29610.17,1221.04, 24.250, 0.34,19082.73, 786.92, 12.125, 0.20,16106.31,1328.36, 48.500, 0.18,36488.15, 752.33, 24.250, 0.18,18376.03, 757.77
INFO |-> [rocprof] 56, 28.250, 0.12,32507.02,1150.69, 28.250, 0.39,19227.29, 680.61, 14.125, 0.22,17424.71,1233.61, 56.500, 0.20,37056.61, 655.87, 28.250, 0.20,18718.47, 662.60
INFO |-> [rocprof] 64, 32.250, 0.13,34114.81,1057.82, 32.250, 0.45,19413.76, 601.98, 16.125, 0.24,17915.92,1111.06, 64.500, 0.23,37995.82, 589.08, 32.250, 0.24,17892.29, 554.80
INFO |-> [rocprof] 80, 40.250, 0.16,34594.19, 859.48, 40.250, 0.55,19601.68, 487.00, 20.125, 0.29,18664.41, 927.42, 80.500, 0.28,38455.61, 477.71, 40.250, 0.29,18330.03, 455.40
INFO |-> [rocprof] 96, 48.250, 0.18,36430.76, 755.04, 48.250, 0.65,19787.23, 410.10, 24.125, 0.34,18922.30, 784.34, 96.500, 0.33,38974.28, 403.88, 48.250, 0.35,18651.98, 386.57
INFO |-> [rocprof] 128, 64.250, 0.43,20201.06, 314.41, 64.250, 0.86,19939.49, 310.34, 32.125, 0.44,19436.15, 605.02, 128.500, 0.43,39834.76, 310.00, 64.250, 0.46,18727.01, 291.47
INFO |-> [rocprof] 256, 128.250, 0.83,20641.43, 160.95, 128.250, 1.70,20231.90, 157.75, 64.125, 0.86,20015.47, 312.13, 256.500, 0.84,41015.35, 159.90, 128.250, 0.90,19183.88, 149.58
INFO |-> [rocprof] 512, 256.250, 1.64,20926.45, 81.66, 256.250, 3.38,20371.17, 79.50, 128.125, 1.69,20309.58, 158.51, 512.500, 1.65,41597.78, 81.17, 256.250, 1.78,19367.20, 75.58
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101516_2797554/input0_results_250307_101516
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_INST_LEVEL_VMEM.csv' is generating
INFO |-> [rocprof]
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_LEVEL_WAVES.txt
INFO |-> [rocprof] RPL: on '250307_101517' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/SQ_LEVEL_WAVES.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101517_2797786'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101517_2797786/input0_results_250307_101517'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101517_2797786/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 75 metrics
INFO |-> [rocprof] SQ_CYCLES, SQ_WAVES, SQ_WAVE_CYCLES, SQ_BUSY_CYCLES, SQ_LEVEL_WAVES, SQ_ACCUM_PREV_HIRES, SQ_INSTS_VALU_TRANS_F64, SQ_INSTS_VALU_INT32, TCC_EA_RDREQ_LEVEL[0], TCC_EA_WRREQ_LEVEL[0], TCC_EA_RDREQ_LEVEL[1], TCC_EA_WRREQ_LEVEL[1], TCC_EA_RDREQ_LEVEL[2], TCC_EA_WRREQ_LEVEL[2], TCC_EA_RDREQ_LEVEL[3], TCC_EA_WRREQ_LEVEL[3], TCC_EA_RDREQ_LEVEL[4], TCC_EA_WRREQ_LEVEL[4], TCC_EA_RDREQ_LEVEL[5], TCC_EA_WRREQ_LEVEL[5], TCC_EA_RDREQ_LEVEL[6], TCC_EA_WRREQ_LEVEL[6], TCC_EA_RDREQ_LEVEL[7], TCC_EA_WRREQ_LEVEL[7], TCC_EA_RDREQ_LEVEL[8], TCC_EA_WRREQ_LEVEL[8], TCC_EA_RDREQ_LEVEL[9], TCC_EA_WRREQ_LEVEL[9], TCC_EA_RDREQ_LEVEL[10], TCC_EA_WRREQ_LEVEL[10], TCC_EA_RDREQ_LEVEL[11], TCC_EA_WRREQ_LEVEL[11], TCC_EA_RDREQ_LEVEL[12], TCC_EA_WRREQ_LEVEL[12], TCC_EA_RDREQ_LEVEL[13], TCC_EA_WRREQ_LEVEL[13], TCC_EA_RDREQ_LEVEL[14], TCC_EA_WRREQ_LEVEL[14], TCC_EA_RDREQ_LEVEL[15], TCC_EA_WRREQ_LEVEL[15], TCC_EA_RDREQ_LEVEL[16], TCC_EA_WRREQ_LEVEL[16], TCC_EA_RDREQ_LEVEL[17], TCC_EA_WRREQ_LEVEL[17], TCC_EA_RDREQ_LEVEL[18], TCC_EA_WRREQ_LEVEL[18], TCC_EA_RDREQ_LEVEL[19], TCC_EA_WRREQ_LEVEL[19], TCC_EA_RDREQ_LEVEL[20], TCC_EA_WRREQ_LEVEL[20], TCC_EA_RDREQ_LEVEL[21], TCC_EA_WRREQ_LEVEL[21], TCC_EA_RDREQ_LEVEL[22], TCC_EA_WRREQ_LEVEL[22], TCC_EA_RDREQ_LEVEL[23], TCC_EA_WRREQ_LEVEL[23], TCC_EA_RDREQ_LEVEL[24], TCC_EA_WRREQ_LEVEL[24], TCC_EA_RDREQ_LEVEL[25], TCC_EA_WRREQ_LEVEL[25], TCC_EA_RDREQ_LEVEL[26], TCC_EA_WRREQ_LEVEL[26], TCC_EA_RDREQ_LEVEL[27], TCC_EA_WRREQ_LEVEL[27], TCC_EA_RDREQ_LEVEL[28], TCC_EA_WRREQ_LEVEL[28], TCC_EA_RDREQ_LEVEL[29], TCC_EA_WRREQ_LEVEL[29], TCC_EA_RDREQ_LEVEL[30], TCC_EA_WRREQ_LEVEL[30], TCC_EA_RDREQ_LEVEL[31], TCC_EA_WRREQ_LEVEL[31], CPC_ME1_BUSY_FOR_PACKET_DECODE, GRBM_COUNT, GRBM_GUI_ACTIVE
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 359.71,1438.85, 0.250, 0.18, 377.52,1510.08, 0.125, 0.18, 189.96,1519.66, 0.500, 0.09, 719.43,1438.85, 0.250, 0.09, 359.71,1438.85
INFO |-> [rocprof] 1, 0.750, 0.09, 1079.14,1438.85, 0.750, 0.18, 1129.51,1506.02, 0.375, 0.18, 566.79,1511.44, 1.500, 0.09, 2173.19,1448.79, 0.750, 0.09, 1082.86,1443.81
INFO |-> [rocprof] 2, 1.250, 0.09, 1810.99,1448.79, 1.250, 0.18, 1880.83,1504.67, 0.625, 0.18, 943.80,1510.08, 2.500, 0.09, 3609.52,1443.81, 1.250, 0.09, 1804.76,1443.81
INFO |-> [rocprof] 3, 1.750, 0.09, 2575.42,1471.67, 1.750, 0.18, 2633.17,1504.67, 0.875, 0.18, 1316.58,1504.67, 3.500, 0.09, 5044.64,1441.33, 1.750, 0.09, 2535.36,1448.78
INFO |-> [rocprof] 4, 2.250, 0.09, 3259.79,1448.79, 2.250, 0.18, 3382.47,1503.32, 1.125, 0.18, 1694.27,1506.02, 4.500, 0.09, 6497.13,1443.81, 2.250, 0.09, 3242.98,1441.33
INFO |-> [rocprof] 5, 2.750, 0.09, 3970.47,1443.81, 2.750, 0.18, 4156.47,1511.44, 1.375, 0.18, 2070.77,1506.02, 5.500, 0.09, 7982.15,1451.30, 2.750, 0.09, 3956.85,1438.85
INFO |-> [rocprof] 6, 3.250, 0.09, 4700.46,1446.30, 3.250, 0.18, 4829.53,1486.01, 1.625, 0.18, 2445.08,1504.67, 6.500, 0.09, 9352.55,1438.85, 3.250, 0.09, 4676.28,1438.85
INFO |-> [rocprof] 7, 3.750, 0.09, 5395.70,1438.85, 3.750, 0.18, 5592.34,1491.29, 1.875, 0.18, 2836.51,1512.81, 7.500, 0.09,10772.93,1436.39, 3.750, 0.09, 5377.26,1433.93
INFO |-> [rocprof] 8, 4.250, 0.09, 6125.64,1441.33, 4.250, 0.18, 6400.57,1506.02, 2.125, 0.18, 3211.82,1511.44, 8.500, 0.09,12272.36,1443.81, 4.250, 0.09, 6115.13,1438.85
INFO |-> [rocprof] 9, 4.750, 0.09, 6893.68,1451.30, 4.750, 0.18, 7108.91,1496.61, 2.375, 0.18, 3589.68,1511.44, 9.500, 0.09,13716.17,1443.81, 4.750, 0.09, 6834.56,1438.85
INFO |-> [rocprof] 10, 5.250, 0.09, 7579.99,1443.81, 5.250, 0.18, 7878.34,1500.64, 2.625, 0.18, 3949.75,1504.67, 10.500, 0.09,15030.62,1431.49, 5.250, 0.09, 7554.06,1438.87
INFO |-> [rocprof] 11, 5.750, 0.09, 8287.71,1441.34, 5.750, 0.18, 8574.97,1491.30, 2.875, 0.18, 4341.51,1510.09, 11.500, 0.09,16462.29,1431.50, 5.750, 0.09, 8203.06,1426.62
INFO |-> [rocprof] 12, 6.250, 0.09, 8992.93,1438.87, 6.250, 0.18, 9295.83,1487.33, 3.125, 0.18, 4672.77,1495.29, 12.500, 0.09,17985.87,1438.87, 6.250, 0.09, 8931.65,1429.06
INFO |-> [rocprof] 13, 6.750, 0.09, 9728.95,1441.33, 6.750, 0.18,10111.21,1497.96, 3.375, 0.18, 5015.30,1486.02, 13.500, 0.09,19292.16,1429.05, 6.750, 0.09, 9580.80,1419.38
INFO |-> [rocprof] 14, 7.250, 0.09,10413.94,1436.41, 7.250, 0.18,10783.17,1487.33, 3.625, 0.18, 5401.16,1489.98, 14.500, 0.09,20685.97,1426.62, 7.250, 0.09,10290.48,1419.38
INFO |-> [rocprof] 15, 7.750, 0.09,11075.13,1429.05, 7.750, 0.18,11465.78,1479.46, 3.875, 0.18, 5846.34,1508.73, 15.500, 0.09,22000.58,1419.39, 7.750, 0.09,11000.17,1419.38
INFO |-> [rocprof] 16, 8.250, 0.09,11830.09,1433.95, 8.250, 0.18,12162.68,1474.26, 4.125, 0.18, 6206.78,1504.67, 16.500, 0.09,23380.41,1416.99, 8.250, 0.09,11709.86,1419.38
INFO |-> [rocprof] 17, 8.750, 0.09,12568.41,1436.39, 8.750, 0.18,12765.20,1458.88, 4.375, 0.18, 6559.42,1499.30, 17.500, 0.09,24755.33,1414.59, 8.750, 0.09,12419.55,1419.38
INFO |-> [rocprof] 18, 9.250, 0.09,13309.54,1438.87, 9.250, 0.18,13518.15,1461.42, 4.625, 0.18, 6940.45,1500.64, 18.500, 0.09,26214.40,1416.99, 9.250, 0.09,13085.10,1414.61
INFO |-> [rocprof] 20, 10.250, 0.09,14723.00,1436.39, 10.250, 0.19,14735.69,1437.63, 5.125, 0.18, 7683.89,1499.30, 20.500, 0.10,28377.30,1384.26, 10.250, 0.10,14426.72,1407.48
INFO |-> [rocprof] 22, 11.250, 0.09,16159.39,1436.39, 11.250, 0.20,15369.93,1366.22, 5.625, 0.18, 8381.11,1489.98, 22.500, 0.10,29864.21,1327.30, 11.250, 0.10,15147.82,1346.47
INFO |-> [rocprof] 24, 12.250, 0.09,17565.70,1433.93, 12.250, 0.20,16246.63,1326.26, 6.125, 0.18, 9101.85,1486.02, 24.500, 0.11,30223.66,1233.62, 12.250, 0.11,15429.35,1259.54
INFO |-> [rocprof] 28, 14.250, 0.09,20398.92,1431.50, 14.250, 0.22,17236.79,1209.60, 7.125, 0.18,10541.18,1479.46, 28.500, 0.12,32526.98,1141.30, 14.250, 0.12,16602.31,1165.07
INFO |-> [rocprof] 32, 16.250, 0.09,23104.22,1421.80, 16.250, 0.24,17971.49,1105.94, 8.125, 0.18,12010.06,1478.16, 32.500, 0.13,34249.70,1053.84, 16.250, 0.13,17081.93,1051.20
INFO |-> [rocprof] 40, 20.250, 0.10,27847.14,1375.17, 20.250, 0.29,18759.60, 926.40, 10.125, 0.19,14469.20,1429.06, 40.500, 0.15,35278.96, 871.09, 20.250, 0.15,17824.57, 880.23
INFO |-> [rocprof] 48, 24.250, 0.11,29354.07,1210.48, 24.250, 0.34,19109.68, 788.03, 12.125, 0.20,16068.15,1325.21, 48.500, 0.18,36035.89, 743.01, 24.250, 0.18,18359.44, 757.09
INFO |-> [rocprof] 56, 28.250, 0.12,31851.91,1127.50, 28.250, 0.39,19227.29, 680.61, 14.125, 0.22,17322.89,1226.40, 56.500, 0.21,36854.72, 652.30, 28.250, 0.20,18644.93, 660.00
INFO |-> [rocprof] 64, 32.250, 0.13,33858.64,1049.88, 32.250, 0.45,19385.94, 601.11, 16.125, 0.24,17786.35,1103.03, 64.500, 0.23,38291.78, 593.67, 32.250, 0.24,17880.47, 554.43
INFO |-> [rocprof] 80, 40.250, 0.16,34488.18, 856.85, 40.250, 0.55,19613.11, 487.28, 20.125, 0.29,18674.79, 927.94, 80.500, 0.28,38433.59, 477.44, 40.250, 0.29,18339.99, 455.65
INFO |-> [rocprof] 96, 48.250, 0.18,35977.61, 745.65, 48.250, 0.66,19767.91, 409.70, 24.125, 0.34,19011.18, 788.03, 96.500, 0.33,39011.73, 404.27, 48.250, 0.35,18626.18, 386.03
INFO |-> [rocprof] 128, 64.250, 0.43,20163.23, 313.82, 64.250, 0.86,19961.57, 310.69, 32.125, 0.44,19429.06, 604.80, 128.500, 0.43,39790.46, 309.65, 64.250, 0.46,18726.97, 291.47
INFO |-> [rocprof] 256, 128.250, 0.83,20641.38, 160.95, 128.250, 1.70,20207.13, 157.56, 64.125, 0.86,20007.93, 312.01, 256.500, 0.84,40991.71, 159.81, 128.250, 0.90,19132.64, 149.18
INFO |-> [rocprof] 512, 256.250, 1.64,20920.26, 81.64, 256.250, 3.38,20370.12, 79.49, 128.125, 1.69,20309.50, 158.51, 512.500, 1.65,41633.91, 81.24, 256.250, 1.78,19358.37, 75.54
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101517_2797786/input0_results_250307_101517
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/SQ_LEVEL_WAVES.csv' is generating
INFO |-> [rocprof]
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_0.txt
INFO |-> [rocprof] RPL: on '250307_101518' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_0.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101518_2797975'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101518_2797975/input0_results_250307_101518'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101518_2797975/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 8 metrics
INFO |-> [rocprof] SQ_INSTS_VALU_INT64, SQ_ACTIVE_INST_VMEM, SQ_ACTIVE_INST_VALU, SQ_ACTIVE_INST_SCA, SQ_ACTIVE_INST_MISC, SQ_ACTIVE_INST_FLAT, SQ_THREAD_CYCLES_VALU, SQ_LDS_BANK_CONFLICT
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 359.71,1438.85, 0.250, 0.18, 375.83,1503.33, 0.125, 0.18, 188.08,1504.67, 0.500, 0.09, 720.67,1441.34, 0.250, 0.09, 360.34,1441.34
INFO |-> [rocprof] 1, 0.750, 0.09, 1080.99,1441.33, 0.750, 0.18, 1125.48,1500.64, 0.375, 0.18, 566.28,1510.09, 1.500, 0.09, 2169.47,1446.31, 0.750, 0.09, 1082.87,1443.82
INFO |-> [rocprof] 2, 1.250, 0.09, 1798.57,1438.85, 1.250, 0.18, 1889.32,1511.45, 0.625, 0.18, 942.11,1507.38, 2.500, 0.09, 3609.56,1443.82, 1.250, 0.09, 1811.01,1448.81
INFO |-> [rocprof] 3, 1.750, 0.09, 2518.02,1438.87, 1.750, 0.18, 2642.66,1510.09, 0.875, 0.18, 1321.33,1510.09, 3.500, 0.09, 5035.99,1438.85, 1.750, 0.09, 2526.69,1443.82
INFO |-> [rocprof] 4, 2.250, 0.09, 3243.02,1441.34, 2.250, 0.18, 3391.60,1507.38, 1.125, 0.18, 1700.38,1511.45, 4.500, 0.09, 6508.33,1446.30, 2.250, 0.09, 3242.98,1441.33
INFO |-> [rocprof] 5, 2.750, 0.09, 4054.20,1474.26, 2.750, 0.18, 4145.27,1507.37, 1.375, 0.18, 2074.51,1508.73, 5.500, 0.09, 7913.78,1438.87, 2.750, 0.09, 3950.11,1436.41
INFO |-> [rocprof] 6, 3.250, 0.09, 4700.46,1446.30, 3.250, 0.18, 4838.12,1488.65, 1.625, 0.18, 2442.91,1503.33, 6.500, 0.09, 9336.64,1436.41, 3.250, 0.09, 4700.46,1446.30
INFO |-> [rocprof] 7, 3.750, 0.09, 5404.97,1441.33, 3.750, 0.18, 5577.50,1487.33, 1.875, 0.18, 2823.80,1506.03, 7.500, 0.09,10809.95,1441.33, 3.750, 0.09, 5405.03,1441.34
INFO |-> [rocprof] 8, 4.250, 0.09, 6146.82,1446.31, 4.250, 0.18, 6389.14,1503.33, 2.125, 0.18, 3191.71,1501.98, 8.500, 0.09,12209.45,1436.41, 4.250, 0.09, 6073.46,1429.05
INFO |-> [rocprof] 9, 4.750, 0.09, 6869.91,1446.30, 4.750, 0.18, 7147.16,1504.67, 2.375, 0.18, 3576.79,1506.02, 9.500, 0.09,13692.60,1441.33, 4.750, 0.09, 6788.06,1429.06
INFO |-> [rocprof] 10, 5.250, 0.09, 7593.14,1446.31, 5.250, 0.18, 7871.26,1499.29, 2.625, 0.18, 3949.77,1504.67, 10.500, 0.09,15082.26,1436.41, 5.250, 0.09, 7541.13,1436.41
INFO |-> [rocprof] 11, 5.750, 0.09, 8259.33,1436.41, 5.750, 0.18, 8590.25,1493.96, 2.875, 0.18, 4337.61,1508.73, 11.500, 0.09,16546.82,1438.85, 5.750, 0.09, 8175.34,1421.80
INFO |-> [rocprof] 12, 6.250, 0.09, 8946.89,1431.50, 6.250, 0.18, 9304.08,1488.65, 3.125, 0.18, 4656.15,1489.97, 12.500, 0.09,17985.67,1438.85, 6.250, 0.09, 8916.46,1426.63
INFO |-> [rocprof] 13, 6.750, 0.09, 9729.06,1441.34, 6.750, 0.18,10093.19,1495.29, 3.375, 0.18, 5019.72,1487.33, 13.500, 0.09,19325.29,1431.50, 6.750, 0.10, 9500.42,1407.47
INFO |-> [rocprof] 14, 7.250, 0.09,10431.69,1438.85, 7.250, 0.18,10745.07,1482.08, 3.625, 0.18, 5386.81,1486.02, 14.500, 0.09,20686.19,1426.63, 7.250, 0.09,10290.59,1419.39
INFO |-> [rocprof] 15, 7.750, 0.09,11113.11,1433.95, 7.750, 0.18,11455.75,1478.16, 3.875, 0.18, 5835.85,1506.03, 15.500, 0.09,21926.15,1414.59, 7.750, 0.10,10944.73,1412.22
INFO |-> [rocprof] 16, 8.250, 0.09,11769.73,1426.63, 8.250, 0.18,12151.93,1472.96, 4.125, 0.18, 6195.67,1501.98, 16.500, 0.10,23262.28,1409.84, 8.250, 0.09,11709.99,1419.39
INFO |-> [rocprof] 17, 8.750, 0.09,12590.11,1438.87, 8.750, 0.18,12720.96,1453.82, 4.375, 0.18, 6571.16,1501.98, 17.500, 0.09,24797.41,1416.99, 8.750, 0.09,12398.57,1416.98
INFO |-> [rocprof] 18, 9.250, 0.09,13332.41,1441.34, 9.250, 0.18,13447.87,1453.82, 4.625, 0.18, 6940.45,1500.64, 18.500, 0.10,26081.95,1409.84, 9.250, 0.09,13129.24,1419.38
INFO |-> [rocprof] 20, 10.250, 0.09,14647.75,1429.05, 10.250, 0.19,14863.05,1450.05, 5.125, 0.18, 7683.89,1499.30, 20.500, 0.10,28330.26,1381.96, 10.250, 0.10,14426.57,1407.47
INFO |-> [rocprof] 22, 11.250, 0.09,16159.39,1436.39, 11.250, 0.19,15534.38,1380.83, 5.625, 0.18, 8388.56,1491.30, 22.500, 0.10,29817.03,1325.20, 11.250, 0.10,15003.32,1333.63
INFO |-> [rocprof] 24, 12.250, 0.09,17446.60,1424.21, 12.250, 0.20,16081.29,1312.76, 6.125, 0.18, 9093.74,1484.69, 24.500, 0.11,30312.82,1237.26, 12.250, 0.11,15569.62,1270.99
INFO |-> [rocprof] 28, 14.250, 0.09,20329.32,1426.62, 14.250, 0.22,17336.78,1216.62, 7.125, 0.18,10606.60,1488.65, 28.500, 0.12,32975.62,1157.04, 14.250, 0.11,16741.82,1174.86
INFO |-> [rocprof] 32, 16.250, 0.09,23026.16,1416.99, 16.250, 0.24,17900.76,1101.59, 8.125, 0.18,11999.48,1476.86, 32.500, 0.13,34422.95,1059.17, 16.250, 0.13,17017.95,1047.26
INFO |-> [rocprof] 40, 20.250, 0.10,28077.57,1386.55, 20.250, 0.29,18790.72, 927.94, 10.125, 0.18,14732.73,1455.08, 40.500, 0.15,35837.17, 884.87, 20.250, 0.15,17843.30, 881.15
INFO |-> [rocprof] 48, 24.250, 0.11,29567.41,1219.27, 24.250, 0.34,19073.84, 786.55, 12.125, 0.20,16068.15,1325.21, 48.500, 0.18,36196.20, 746.31, 24.250, 0.18,18293.40, 754.37
INFO |-> [rocprof] 56, 28.250, 0.12,32329.63,1144.41, 28.250, 0.39,19211.75, 680.06, 14.125, 0.22,17284.98,1223.72, 56.500, 0.21,36854.90, 652.30, 28.250, 0.20,18703.70, 662.08
INFO |-> [rocprof] 64, 32.250, 0.13,33774.36,1047.27, 32.250, 0.45,19379.00, 600.90, 16.125, 0.24,17904.06,1110.33, 64.500, 0.23,38076.03, 590.33, 32.250, 0.24,17915.99, 555.53
INFO |-> [rocprof] 80, 40.250, 0.16,34665.23, 861.25, 40.250, 0.55,19607.38, 487.14, 20.125, 0.29,18654.10, 926.91, 80.500, 0.28,38521.29, 478.53, 40.250, 0.30,18310.15, 454.91
INFO |-> [rocprof] 96, 48.250, 0.18,35881.72, 743.66, 48.250, 0.66,19767.87, 409.70, 24.125, 0.34,18975.47, 786.55, 96.500, 0.33,39049.48, 404.66, 48.250, 0.35,18643.34, 386.39
INFO |-> [rocprof] 128, 64.250, 0.43,20201.06, 314.41, 64.250, 0.86,19939.46, 310.34, 32.125, 0.44,19436.15, 605.02, 128.500, 0.43,39775.97, 309.54, 64.250, 0.46,18733.56, 291.57
INFO |-> [rocprof] 256, 128.250, 0.83,20645.39, 160.98, 128.250, 1.70,20211.00, 157.59, 64.125, 0.86,19996.87, 311.84, 256.500, 0.84,40945.05, 159.63, 128.250, 0.90,19139.49, 149.24
INFO |-> [rocprof] 512, 256.250, 1.64,20912.20, 81.61, 256.250, 3.38,20361.52, 79.46, 128.125, 1.69,20305.74, 158.48, 512.500, 1.65,41585.53, 81.14, 256.250, 1.78,19354.90, 75.53
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101518_2797975/input0_results_250307_101518
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/pmc_perf_0.csv' is generating
INFO |-> [rocprof]
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_1.txt
INFO |-> [rocprof] RPL: on '250307_101519' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/pmc_perf_1.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101519_2798199'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101519_2798199/input0_results_250307_101519'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101519_2798199/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 7 metrics
INFO |-> [rocprof] SQ_LDS_IDX_ACTIVE, SQ_VALU_MFMA_BUSY_CYCLES, SQ_INSTS_VALU_MFMA_MOPS_I8, SQ_INSTS_VALU_MFMA_MOPS_F16, SQ_INSTS_VALU_MFMA_MOPS_BF16, SQ_INSTS_VALU_MFMA_MOPS_F32, SQ_INSTS_VALU_MFMA_MOPS_F64
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 360.33,1441.33, 0.250, 0.18, 377.18,1508.73, 0.125, 0.18, 188.59,1508.73, 0.500, 0.09, 715.74,1431.49, 0.250, 0.09, 360.96,1443.82
INFO |-> [rocprof] 1, 0.750, 0.09, 1086.60,1448.79, 0.750, 0.18, 1131.54,1508.73, 0.375, 0.18, 565.26,1507.37, 1.500, 0.09, 2161.99,1441.33, 0.750, 0.09, 1086.60,1448.79
INFO |-> [rocprof] 2, 1.250, 0.09, 1814.13,1451.30, 1.250, 0.18, 1882.52,1506.02, 0.625, 0.18, 943.80,1510.08, 2.500, 0.09, 3621.98,1448.79, 1.250, 0.09, 1807.87,1446.30
INFO |-> [rocprof] 3, 1.750, 0.09, 2522.32,1441.33, 1.750, 0.18, 2635.53,1506.02, 0.875, 0.18, 1318.95,1507.37, 3.500, 0.09, 5079.55,1451.30, 1.750, 0.09, 2531.02,1446.30
INFO |-> [rocprof] 4, 2.250, 0.09, 3259.79,1448.79, 2.250, 0.18, 3394.63,1508.73, 1.125, 0.18, 1694.28,1506.03, 4.500, 0.09, 6530.85,1451.30, 2.250, 0.09, 3248.57,1443.81
INFO |-> [rocprof] 5, 2.750, 0.09, 3977.36,1446.31, 2.750, 0.18, 4149.00,1508.73, 1.375, 0.18, 2074.50,1508.73, 5.500, 0.09, 7940.94,1443.81, 2.750, 0.09, 3970.47,1443.81
INFO |-> [rocprof] 6, 3.250, 0.09, 4708.58,1448.79, 3.250, 0.18, 4833.81,1487.33, 1.625, 0.18, 2462.75,1515.54, 6.500, 0.09, 9320.68,1433.95, 3.250, 0.09, 4700.46,1446.30
INFO |-> [rocprof] 7, 3.750, 0.09, 5432.98,1448.79, 3.750, 0.18, 5577.50,1487.33, 1.875, 0.18, 2818.72,1503.32, 7.500, 0.09,10809.95,1441.33, 3.750, 0.09, 5395.70,1438.85
INFO |-> [rocprof] 8, 4.250, 0.09, 6136.18,1443.81, 4.250, 0.18, 6383.38,1501.97, 2.125, 0.18, 3203.16,1507.37, 8.500, 0.09,12209.32,1436.39, 4.250, 0.09, 6115.13,1438.85
INFO |-> [rocprof] 9, 4.750, 0.09, 6869.91,1446.30, 4.750, 0.18, 7147.16,1504.67, 2.375, 0.18, 3586.45,1510.08, 9.500, 0.09,13739.81,1446.30, 4.750, 0.09, 6834.56,1438.85
INFO |-> [rocprof] 10, 5.250, 0.09, 7619.33,1451.30, 5.250, 0.18, 7885.35,1501.97, 2.625, 0.18, 3946.23,1503.33, 10.500, 0.09,15107.97,1438.85, 5.250, 0.09, 7553.98,1438.85
INFO |-> [rocprof] 11, 5.750, 0.09, 8316.20,1446.30, 5.750, 0.18, 8613.21,1497.95, 2.875, 0.18, 4329.80,1506.02, 11.500, 0.09,16462.11,1431.49, 5.750, 0.09, 8245.12,1433.93
INFO |-> [rocprof] 12, 6.250, 0.09, 9039.35,1446.30, 6.250, 0.18, 9328.86,1492.62, 3.125, 0.18, 4672.75,1495.28, 12.500, 0.09,18016.58,1441.33, 6.250, 0.09, 8886.14,1421.78
INFO |-> [rocprof] 13, 6.750, 0.09, 9745.70,1443.81, 6.750, 0.18,10147.40,1503.32, 3.375, 0.18, 5006.44,1483.39, 13.500, 0.09,19424.53,1438.85, 6.750, 0.09, 9580.80,1419.38
INFO |-> [rocprof] 14, 7.250, 0.09,10431.80,1438.87, 7.250, 0.18,10773.56,1486.01, 3.625, 0.18, 5386.78,1486.01, 14.500, 0.09,20686.19,1426.63, 7.250, 0.09,10308.03,1421.80
INFO |-> [rocprof] 15, 7.750, 0.09,11151.12,1438.85, 7.750, 0.18,11506.37,1484.69, 3.875, 0.18, 5830.61,1504.67, 15.500, 0.09,22000.35,1419.38, 7.750, 0.09,11000.17,1419.38
INFO |-> [rocprof] 16, 8.250, 0.09,11809.77,1431.49, 8.250, 0.18,12077.77,1463.97, 4.125, 0.18, 6195.67,1501.98, 16.500, 0.09,23340.74,1414.59, 8.250, 0.10,11611.63,1407.47
INFO |-> [rocprof] 17, 8.750, 0.09,12568.41,1436.39, 8.750, 0.18,12709.94,1452.56, 4.375, 0.18, 6553.56,1497.96, 17.500, 0.09,24839.36,1419.39, 8.750, 0.09,12419.55,1419.38
INFO |-> [rocprof] 18, 9.250, 0.09,13332.27,1441.33, 9.250, 0.18,13506.39,1460.15, 4.625, 0.18, 6934.24,1499.30, 18.500, 0.10,25907.78,1400.42, 9.250, 0.10,13062.93,1412.21
INFO |-> [rocprof] 20, 10.250, 0.09,14748.25,1438.85, 10.250, 0.19,14799.10,1443.81, 5.125, 0.18, 7704.55,1503.33, 20.500, 0.10,28613.09,1395.76, 10.250, 0.09,14598.02,1424.20
INFO |-> [rocprof] 22, 11.250, 0.09,16159.39,1436.39, 11.250, 0.19,15496.04,1377.43, 5.625, 0.18, 8381.11,1489.98, 22.500, 0.10,29353.60,1304.60, 11.250, 0.10,15270.37,1357.37
INFO |-> [rocprof] 24, 12.250, 0.09,17565.70,1433.93, 12.250, 0.20,16337.03,1333.64, 6.125, 0.18, 9101.85,1486.02, 24.500, 0.11,30812.44,1257.65, 12.250, 0.11,15593.39,1272.93
INFO |-> [rocprof] 28, 14.250, 0.09,20364.17,1429.06, 14.250, 0.22,17349.29,1217.49, 7.125, 0.18,10597.25,1487.33, 28.500, 0.12,32438.71,1138.20, 14.250, 0.11,16859.89,1183.15
INFO |-> [rocprof] 32, 16.250, 0.09,23026.16,1416.99, 16.250, 0.24,18078.75,1112.54, 8.125, 0.18,12010.06,1478.16, 32.500, 0.13,34292.78,1055.16, 16.250, 0.13,17103.36,1052.51
INFO |-> [rocprof] 40, 20.250, 0.10,28217.49,1393.46, 20.250, 0.29,18769.96, 926.91, 10.125, 0.19,14555.99,1437.63, 40.500, 0.15,35611.78, 879.30, 20.250, 0.15,17918.47, 884.86
INFO |-> [rocprof] 48, 24.250, 0.11,29653.34,1222.82, 24.250, 0.34,19082.79, 786.92, 12.125, 0.20,16144.58,1331.51, 48.500, 0.18,36228.43, 746.98, 24.250, 0.18,18293.40, 754.37
INFO |-> [rocprof] 56, 28.250, 0.12,32507.29,1150.70, 28.250, 0.39,19242.91, 681.16, 14.125, 0.22,17373.61,1229.99, 56.500, 0.20,37056.61, 655.87, 28.250, 0.20,18644.83, 659.99
INFO |-> [rocprof] 64, 32.250, 0.13,34201.07,1060.50, 32.250, 0.45,19385.90, 601.11, 16.125, 0.24,17939.68,1112.54, 64.500, 0.23,38291.61, 593.67, 32.250, 0.24,17915.92, 555.53
INFO |-> [rocprof] 80, 40.250, 0.16,34665.23, 861.25, 40.250, 0.55,19618.77, 487.42, 20.125, 0.29,18643.73, 926.40, 80.500, 0.28,38499.19, 478.25, 40.250, 0.29,18329.97, 455.40
INFO |-> [rocprof] 96, 48.250, 0.18,36105.78, 748.31, 48.250, 0.65,19782.37, 410.00, 24.125, 0.34,19046.91, 789.51, 96.500, 0.33,39049.36, 404.66, 48.250, 0.35,18626.18, 386.03
INFO |-> [rocprof] 128, 64.250, 0.43,20163.23, 313.82, 64.250, 0.86,19954.23, 310.57, 32.125, 0.44,19436.10, 605.01, 128.500, 0.43,39805.34, 309.77, 64.250, 0.46,18779.21, 292.28
INFO |-> [rocprof] 256, 128.250, 0.83,20653.29, 161.04, 128.250, 1.70,20209.08, 157.58, 64.125, 0.86,20008.00, 312.02, 256.500, 0.84,40984.00, 159.78, 128.250, 0.90,19146.28, 149.29
INFO |-> [rocprof] 512, 256.250, 1.64,20912.17, 81.61, 256.250, 3.38,20365.35, 79.47, 128.125, 1.69,20303.79, 158.47, 512.500, 1.65,41589.60, 81.15, 256.250, 1.78,19353.19, 75.52
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101519_2798199/input0_results_250307_101519
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/pmc_perf_1.csv' is generating
INFO |-> [rocprof]
INFO [profiling] Current input file: /work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/timestamps.txt
INFO |-> [rocprof] RPL: on '250307_101520' from '/opt/rocm-6.3.1' in '/work1/amd/colramos/audacious/omniperf'
INFO |-> [rocprof] RPL: profiling '""/work1/amd/colramos/dev/mixbench/build/mixbench-hip""'
INFO |-> [rocprof] RPL: input file '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/perfmon/timestamps.txt'
INFO |-> [rocprof] RPL: output dir '/tmp/rpl_data_250307_101520_2798403'
INFO |-> [rocprof] RPL: result dir '/tmp/rpl_data_250307_101520_2798403/input0_results_250307_101520'
INFO |-> [rocprof] mixbench-hip (v0.04-14-g3dc1cdc)
INFO |-> [rocprof] ROCProfiler: input from "/tmp/rpl_data_250307_101520_2798403/input0.xml"
INFO |-> [rocprof] gpu_index =
INFO |-> [rocprof] kernel =
INFO |-> [rocprof] range =
INFO |-> [rocprof] 0 metrics
INFO |-> [rocprof] ------------------------ Device specifications ------------------------
INFO |-> [rocprof] Device:
INFO |-> [rocprof] CUDA driver version: 60342.133
INFO |-> [rocprof] GPU clock rate: 1700 MHz
INFO |-> [rocprof] WarpSize: 64
INFO |-> [rocprof] L2 cache size: 8192 KB
INFO |-> [rocprof] Total global mem: 65520 MB
INFO |-> [rocprof] Total SPs: 13312 (104 MPs x 128 SPs/MP)
INFO |-> [rocprof] Compute throughput: 45260.80 GFlops (theoretical single precision FMAs)
INFO |-> [rocprof] Memory bandwidth: 1638.40 GB/sec
INFO |-> [rocprof] -----------------------------------------------------------------------
INFO |-> [rocprof] Total GPU memory 68702699520, free 67905781760
INFO |-> [rocprof] Buffer size: 256MB
INFO |-> [rocprof] Trade-off type: compute with global memory (block strided)
INFO |-> [rocprof] Elements per thread: 8
INFO |-> [rocprof] Thread fusion degree: 1
INFO |-> [rocprof] ----------------------------------------------------------------------------- CSV data -------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof] Experiment ID, Single Precision ops,,,, Packed Single Precision ops,,,, Double precision ops,,,, Half precision ops,,,, Integer operations,,,
INFO |-> [rocprof] Compute iters, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Flops/byte, ex.time, GFLOPS, GB/sec, Iops/byte, ex.time, GIOPS, GB/sec
INFO |-> [rocprof] 0, 0.250, 0.09, 372.49,1489.97, 0.250, 0.18, 383.39,1533.56, 0.125, 0.18, 191.52,1532.16, 0.500, 0.09, 746.32,1492.63, 0.250, 0.09, 373.15,1492.62
INFO |-> [rocprof] 1, 0.750, 0.09, 1123.46,1497.95, 0.750, 0.18, 1147.02,1529.36, 0.375, 0.18, 573.51,1529.36, 1.500, 0.09, 2242.94,1495.30, 0.750, 0.09, 1123.46,1497.95
INFO |-> [rocprof] 2, 1.250, 0.09, 1869.12,1495.30, 1.250, 0.18, 1906.48,1525.18, 0.625, 0.18, 956.72,1530.76, 2.500, 0.09, 3738.20,1495.28, 1.250, 0.09, 1872.46,1497.97
INFO |-> [rocprof] 3, 1.750, 0.09, 2612.08,1492.62, 1.750, 0.17, 2686.18,1534.96, 0.875, 0.18, 1340.64,1532.16, 3.500, 0.09, 5252.26,1500.65, 1.750, 0.09, 2616.74,1495.28
INFO |-> [rocprof] 4, 2.250, 0.09, 3376.45,1500.65, 2.250, 0.18, 3441.07,1529.36, 1.125, 0.18, 1723.67,1532.15, 4.500, 0.09, 6740.77,1497.95, 2.250, 0.09, 3364.38,1495.28
INFO |-> [rocprof] 5, 2.750, 0.09, 4112.02,1495.28, 2.750, 0.18, 4205.73,1529.36, 1.375, 0.17, 2112.49,1536.36, 5.500, 0.09, 8209.49,1492.63, 2.750, 0.09, 4082.91,1484.69
INFO |-> [rocprof] 6, 3.250, 0.09, 4868.39,1497.97, 3.250, 0.18, 4894.58,1506.03, 1.625, 0.18, 2492.02,1533.55, 6.500, 0.09, 9650.50,1484.69, 3.250, 0.09, 4877.10,1500.65
INFO |-> [rocprof] 7, 3.750, 0.09, 5597.32,1492.62, 3.750, 0.18, 5652.67,1507.38, 1.875, 0.17, 2878.05,1534.96, 7.500, 0.09,11214.59,1495.28, 3.750, 0.09, 5607.30,1495.28
INFO |-> [rocprof] 8, 4.250, 0.09, 6355.01,1495.30, 4.250, 0.18, 6493.88,1527.97, 2.125, 0.17, 3261.79,1534.96, 8.500, 0.09,12664.72,1489.97, 4.250, 0.09, 6343.63,1492.62
INFO |-> [rocprof] 9, 4.750, 0.09, 7090.02,1492.63, 4.750, 0.18, 7251.26,1526.58, 2.375, 0.18, 3635.55,1530.76, 9.500, 0.09,14179.87,1492.62, 4.750, 0.09, 7090.02,1492.63
INFO |-> [rocprof] 10, 5.250, 0.09, 7850.21,1495.28, 5.250, 0.18, 7992.73,1522.42, 2.625, 0.18, 4014.58,1529.36, 10.500, 0.09,15644.83,1489.98, 5.250, 0.09, 7808.46,1487.33
INFO |-> [rocprof] 11, 5.750, 0.09, 8628.71,1500.65, 5.750, 0.18, 8714.40,1515.55, 2.875, 0.18, 4400.91,1530.75, 11.500, 0.09,17073.97,1484.69, 5.750, 0.09, 8491.99,1476.87
INFO |-> [rocprof] 12, 6.250, 0.09, 9328.86,1492.62, 6.250, 0.18, 9345.55,1495.29, 3.125, 0.18, 4706.33,1506.03, 12.500, 0.09,18493.20,1479.46, 6.250, 0.09, 9263.04,1482.09
INFO |-> [rocprof] 13, 6.750, 0.09,10057.28,1489.97, 6.750, 0.18,10276.37,1522.42, 3.375, 0.18, 5101.12,1511.44, 13.500, 0.09,20008.16,1482.09, 6.750, 0.09, 9986.33,1479.46
INFO |-> [rocprof] 14, 7.250, 0.09,10840.89,1495.30, 7.250, 0.18,10977.81,1514.18, 3.625, 0.18, 5498.83,1516.92, 14.500, 0.09,21528.05,1484.69, 7.250, 0.09,10707.29,1476.87
INFO |-> [rocprof] 15, 7.750, 0.09,11547.37,1489.98, 7.750, 0.18,11671.63,1506.02, 3.875, 0.18, 5931.66,1530.75, 15.500, 0.09,22972.08,1482.07, 7.750, 0.09,11365.68,1466.54
INFO |-> [rocprof] 16, 8.250, 0.09,12270.43,1487.33, 8.250, 0.18,12004.45,1455.08, 4.125, 0.18, 6291.42,1525.19, 16.500, 0.09,24113.33,1461.41, 8.250, 0.09,12120.01,1469.09
INFO |-> [rocprof] 17, 8.750, 0.09,13060.41,1492.62, 8.750, 0.18,12787.44,1461.42, 4.375, 0.18, 6666.62,1523.80, 17.500, 0.09,25754.50,1471.69, 8.750, 0.09,12832.08,1466.52
INFO |-> [rocprof] 18, 9.250, 0.09,13782.20,1489.97, 9.250, 0.18,13565.42,1466.53, 4.625, 0.18, 7079.72,1530.75, 18.500, 0.09,27226.18,1471.69, 9.250, 0.09,13565.35,1466.52
INFO |-> [rocprof] 20, 10.250, 0.09,15272.16,1489.97, 10.250, 0.18,15018.82,1465.25, 5.125, 0.18, 7781.24,1518.29, 20.500, 0.09,29907.21,1458.89, 10.250, 0.09,14979.49,1461.41
INFO |-> [rocprof] 22, 11.250, 0.09,16732.60,1487.34, 11.250, 0.19,15954.58,1418.18, 5.625, 0.18, 8524.96,1515.55, 22.500, 0.10,30890.63,1372.92, 11.250, 0.10,15728.64,1398.10
INFO |-> [rocprof] 24, 12.250, 0.09,18123.34,1479.46, 12.250, 0.20,16804.48,1371.79, 6.125, 0.18, 9257.65,1511.45, 24.500, 0.10,31716.19,1294.54, 12.250, 0.10,16233.72,1325.20
INFO |-> [rocprof] 28, 14.250, 0.09,21082.25,1479.46, 14.250, 0.22,17735.48,1244.59, 7.125, 0.18,10711.20,1503.33, 28.500, 0.12,33250.80,1166.69, 14.250, 0.11,17199.51,1206.98
INFO |-> [rocprof] 32, 16.250, 0.09,23872.75,1469.09, 16.250, 0.24,18408.34,1132.82, 8.125, 0.18,12149.14,1495.28, 32.500, 0.12,35223.20,1083.79, 16.250, 0.12,17703.09,1089.42
INFO |-> [rocprof] 40, 20.250, 0.09,29237.09,1443.81, 20.250, 0.29,18874.24, 932.06, 10.125, 0.18,14758.33,1457.61, 40.500, 0.15,36374.10, 898.13, 20.250, 0.15,18226.20, 900.06
INFO |-> [rocprof] 48, 24.250, 0.11,30728.66,1267.16, 24.250, 0.34,19091.69, 787.29, 12.125, 0.20,16592.39,1368.44, 48.500, 0.18,37154.81, 766.08, 24.250, 0.17,18714.13, 771.72
INFO |-> [rocprof] 56, 28.250, 0.12,32731.51,1158.64, 28.250, 0.39,19297.75, 683.11, 14.125, 0.22,17605.94,1246.44, 56.500, 0.20,37946.48, 671.62, 28.250, 0.20,18988.54, 672.16
INFO |-> [rocprof] 64, 32.250, 0.12,34952.25,1083.79, 32.250, 0.44,19455.65, 603.28, 16.125, 0.24,18242.10,1131.29, 64.500, 0.22,38758.08, 600.90, 32.250, 0.24,18083.58, 560.73
INFO |-> [rocprof] 80, 40.250, 0.15,35244.18, 875.63, 40.250, 0.55,19698.89, 489.41, 20.125, 0.28,19054.12, 946.79, 80.500, 0.28,38853.74, 482.66, 40.250, 0.29,18480.52, 459.14
INFO |-> [rocprof] 96, 48.250, 0.17,37166.94, 770.30, 48.250, 0.65,19850.31, 411.41, 24.125, 0.34,19273.65, 798.91, 96.500, 0.33,39353.22, 407.81, 48.250, 0.35,18764.34, 388.90
INFO |-> [rocprof] 128, 64.250, 0.43,20269.43, 315.48, 64.250, 0.86,19991.26, 311.15, 32.125, 0.44,19541.86, 608.31, 128.500, 0.43,40027.06, 311.49, 64.250, 0.46,18825.17, 293.00
INFO |-> [rocprof] 256, 128.250, 0.83,20704.99, 161.44, 128.250, 1.70,20250.94, 157.90, 64.125, 0.86,20064.00, 312.89, 256.500, 0.84,41109.34, 160.27, 128.250, 0.90,19197.57, 149.69
INFO |-> [rocprof] 512, 256.250, 1.64,20952.97, 81.77, 256.250, 3.37,20387.59, 79.56, 128.125, 1.69,20332.64, 158.69, 512.500, 1.65,41658.26, 81.28, 256.250, 1.77,19388.17, 75.66
INFO |-> [rocprof] ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
INFO |-> [rocprof]
INFO |-> [rocprof] ROCPRofiler: 497 contexts collected, output directory /tmp/rpl_data_250307_101520_2798403/input0_results_250307_101520
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:134: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:135: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:136: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ts_pattern = re.compile(", time\((\d*),(\d*),(\d*),(\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:140: SyntaxWarning: invalid escape sequence '\s'
INFO |-> [rocprof] var_pattern = re.compile("^\s*([a-zA-Z0-9_]+(?:\[\d+\])?)\s+\((\d+(?:\.\d+)?)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:141: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] pid_pattern = re.compile("pid\((\d*)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:419: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] ptrn1_field = re.compile(r"^.* " + field + "\(")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:432: SyntaxWarning: invalid escape sequence '\('
INFO |-> [rocprof] field + "\(\w+\)([ \)])", field + "(" + str(val) + ")\\1", args, count=1
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:496: SyntaxWarning: invalid escape sequence '\w'
INFO |-> [rocprof] prop_pattern = re.compile("([\w-]+)\((\w+)\)")
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/tblextr.py:497: SyntaxWarning: invalid escape sequence '\['
INFO |-> [rocprof] beg_pattern = re.compile('^dispatch\[(\d*)\], (.*) kernel-name\("([^"]*)"\)')
INFO |-> [rocprof] /opt/rocm-6.3.1/libexec/rocprofiler/mem_manager.py:124: SyntaxWarning: invalid escape sequence '\d'
INFO |-> [rocprof] size_ptrn = re.compile(DELIM + "Size=(\d+)" + DELIM)
INFO |-> [rocprof] File '/work1/amd/colramos/audacious/omniperf/workloads/mixbench_test_sol/MI200/timestamps.csv' is generating
INFO |-> [rocprof]
INFO [roofline] Skipping roofline
$ ./src/rocprof-compute analyze -p workloads/mixbench_test_sol/MI200/
__ _
_ __ ___ ___ _ __ _ __ ___ / _| ___ ___ _ __ ___ _ __ _ _| |_ ___
| '__/ _ \ / __| '_ \| '__/ _ \| |_ _____ / __/ _ \| '_ ` _ \| '_ \| | | | __/ _ \
| | | (_) | (__| |_) | | | (_) | _|_____| (_| (_) | | | | | | |_) | |_| | || __/
|_| \___/ \___| .__/|_| \___/|_| \___\___/|_| |_| |_| .__/ \__,_|\__\___|
|_| |_|
INFO Analysis mode = cli
INFO [analysis] deriving rocprofiler-compute metrics...
WARNING Couldn't load roofline.csv. This may result in missing analysis data.
--------------------------------------------------------------------------------
0. Top Stats
0.1 Top Kernels
╒════╤══════════════════════════════════════════╤═════════╤═════════════╤════════════╤══════════════╤═══════╕
│ │ Kernel_Name │ Count │ Sum(ns) │ Mean(ns) │ Median(ns) │ Pct │
╞════╪══════════════════════════════════════════╪═════════╪═════════════╪════════════╪══════════════╪═══════╡
│ 0 │ void benchmark_func<HIP_vector_type<floa │ 3.00 │ 10122632.00 │ 3374210.67 │ 3373944.00 │ 8.24 │
│ │ t, 2u>, 256, 8u, 512u>(HIP_vector_type<f │ │ │ │ │ │
│ │ loat, 2u>, HIP_vector_type<float, 2u>... │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 1 │ void benchmark_func<int, 256, 8u, 512u>( │ 3.00 │ 5322116.00 │ 1774038.67 │ 1773932.00 │ 4.33 │
│ │ int, int*) [clone .kd] │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 2 │ void benchmark_func<HIP_vector_type<floa │ 3.00 │ 5100516.00 │ 1700172.00 │ 1700172.00 │ 4.15 │
│ │ t, 2u>, 256, 8u, 256u>(HIP_vector_type<f │ │ │ │ │ │
│ │ loat, 2u>, HIP_vector_type<float, 2u>... │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 3 │ void benchmark_func<double, 256, 8u, 512 │ 3.00 │ 5075234.00 │ 1691744.67 │ 1691692.00 │ 4.13 │
│ │ u>(double, double*) [clone .kd] │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 4 │ void benchmark_func<__half2, 256, 8u, 51 │ 3.00 │ 4954273.00 │ 1651424.33 │ 1651371.00 │ 4.03 │
│ │ 2u>(__half2, __half2*) [clone .kd] │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 5 │ void benchmark_func<float, 256, 8u, 512u │ 3.00 │ 4925155.00 │ 1641718.33 │ 1641452.00 │ 4.01 │
│ │ >(float, float*) [clone .kd] │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 6 │ void benchmark_func<int, 256, 8u, 256u>( │ 3.00 │ 2690098.00 │ 896699.33 │ 896646.00 │ 2.19 │
│ │ int, int*) [clone .kd] │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 7 │ void benchmark_func<HIP_vector_type<floa │ 3.00 │ 2588978.00 │ 862992.67 │ 862726.00 │ 2.11 │
│ │ t, 2u>, 256, 8u, 128u>(HIP_vector_type<f │ │ │ │ │ │
│ │ loat, 2u>, HIP_vector_type<float, 2u>... │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 8 │ void benchmark_func<double, 256, 8u, 256 │ 3.00 │ 2574098.00 │ 858032.67 │ 857926.00 │ 2.10 │
│ │ u>(double, double*) [clone .kd] │ │ │ │ │ │
├────┼──────────────────────────────────────────┼─────────┼─────────────┼────────────┼──────────────┼───────┤
│ 9 │ void benchmark_func<__half2, 256, 8u, 25 │ 3.00 │ 2513777.00 │ 837925.67 │ 837926.00 │ 2.05 │
│ │ 6u>(__half2, __half2*) [clone .kd] │ │ │ │ │ │
╘════╧══════════════════════════════════════════╧═════════╧═════════════╧════════════╧══════════════╧═══════╛
0.2 Dispatch List
╒════╤═══════════════╤══════════════════════════════════════════════════════════════════════════════════╤══════════╕
│ │ Dispatch_ID │ Kernel_Name │ GPU_ID │
╞════╪═══════════════╪══════════════════════════════════════════════════════════════════════════════════╪══════════╡
│ 0 │ 0 │ __amd_rocclr_fillBufferAligned.kd │ 2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 1 │ 1 │ void benchmark_func<short, 256, 8u, 0u>(short, short*) [clone .kd] │ 2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 2 │ 2 │ void benchmark_func<float, 256, 8u, 0u>(float, float*) [clone .kd] │ 2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 3 │ 3 │ void benchmark_func<float, 256, 8u, 0u>(float, float*) [clone .kd] │ 2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 4 │ 4 │ void benchmark_func<float, 256, 8u, 0u>(float, float*) [clone .kd] │ 2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 5 │ 5 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 0u>(HIP_vector_type<flo │ 2 │
│ │ │ at, 2u>, HIP_vector_type<float, 2u>*) [clone .kd] │ │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 6 │ 6 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 0u>(HIP_vector_type<flo │ 2 │
│ │ │ at, 2u>, HIP_vector_type<float, 2u>*) [clone .kd] │ │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 7 │ 7 │ void benchmark_func<HIP_vector_type<float, 2u>, 256, 8u, 0u>(HIP_vector_type<flo │ 2 │
│ │ │ at, 2u>, HIP_vector_type<float, 2u>*) [clone .kd] │ │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 8 │ 8 │ void benchmark_func<double, 256, 8u, 0u>(double, double*) [clone .kd] │ 2 │
├────┼───────────────┼──────────────────────────────────────────────────────────────────────────────────┼──────────┤
│ 9 │ 9 │ void benchmark_func<double, 256, 8u, 0u>(double, double*) [clone .kd] │ 2 │
╘════╧═══════════════╧══════════════════════════════════════════════════════════════════════════════════╧══════════╛
--------------------------------------------------------------------------------
1. System Info
╒════════════════════════╤═════════════════════════════════════════════════════╕
│ │ Info │
╞════════════════════════╪═════════════════════════════════════════════════════╡
│ workload_name │ mixbench_test_sol │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ command │ /work1/amd/colramos/dev/mixbench/build/mixbench-hip │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ ip_blocks │ SQ|LDS|SQC|TA|TD|TCP|TCC|SPI|CPC|CPF │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ timestamp │ Fri 07 Mar 2025 10:15:12 AM (CST) │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ version │ 3 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ hostname │ login1.hpcfund │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cpu_model │ AMD EPYC 7V13 64-Core Processor │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ sbios │ American Megatrends Inc.0602 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ linux_distro │ Rocky Linux 9.4 (Blue Onyx) │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ linux_kernel_version │ 5.14.0-162.18.1.el9_1.x86_64 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ amd_gpu_kernel_version │ nan │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cpu_memory │ 527651060 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_memory │ nan │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ rocm_version │ 6.3.1-48 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ vbios │ 113-D67301V-073 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ compute_partition │ nan │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ memory_partition │ nan │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_series │ MI200 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_model │ MI200 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_arch │ gfx90a │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_l1 │ 16 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ gpu_l2 │ 8192 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cu_per_gpu │ 104 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ simd_per_cu │ 4 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ se_per_gpu │ 8 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ wave_size │ 64 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ workgroup_max_size │ 1024 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ chip_id │ 29711 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ max_waves_per_cu │ 32 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ max_sclk │ 1700 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ max_mclk │ 1600 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cur_sclk │ 1700 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ cur_mclk │ 1600 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ total_l2_chan │ 32 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ lds_banks_per_cu │ 32 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ sqc_per_gpu │ 56 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ pipes_per_gpu │ 4 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ hbm_bw │ 1638.4 │
├────────────────────────┼─────────────────────────────────────────────────────┤
│ num_xcd │ 1 │
╘════════════════════════╧═════════════════════════════════════════════════════╛
INFO Not showing table not selected during profiling: 2.1 Speed-of-Light
INFO Not showing table not selected during profiling: 3.1 Memory Chart
INFO Not showing table not selected during profiling: 4.1 Roofline
INFO Not showing table not selected during profiling: 5.1 Command Processor Fetcher
INFO Not showing table not selected during profiling: 5.2 Packet Processor
INFO Not showing table not selected during profiling: 6.1 Workgroup Manager Utilizations
INFO Not showing table not selected during profiling: 6.2 Workgroup Manager - Resource Allocation
INFO Not showing table not selected during profiling: 7.1 Wavefront Launch Stats
INFO Not showing table not selected during profiling: 7.2 Wavefront Runtime Stats
INFO Not showing table not selected during profiling: 10.1 Overall Instruction Mix
INFO Not showing table not selected during profiling: 10.2 VALU Arithmetic Instr Mix
INFO Not showing table not selected during profiling: 10.3 VMEM Instr Mix
INFO Not showing table not selected during profiling: 10.4 MFMA Arithmetic Instr Mix
INFO Not showing table not selected during profiling: 11.1 Speed-of-Light
INFO Not showing table not selected during profiling: 11.2 Pipeline Stats
INFO Not showing table not selected during profiling: 11.3 Arithmetic Operations
INFO Not showing table not selected during profiling: 12.1 Speed-of-Light
INFO Not showing table not selected during profiling: 12.2 LDS Stats
INFO Not showing table not selected during profiling: 13.1 Speed-of-Light
INFO Not showing table not selected during profiling: 13.2 Instruction Cache Accesses
INFO Not showing table not selected during profiling: 13.3 Instruction Cache - L2 Interface
INFO Not showing table not selected during profiling: 14.1 Speed-of-Light
INFO Not showing table not selected during profiling: 14.2 Scalar L1D Cache Accesses
INFO Not showing table not selected during profiling: 14.3 Scalar L1D Cache - L2 Interface
INFO Not showing table not selected during profiling: 15.1 Address Processing Unit
INFO Not showing table not selected during profiling: 15.2 Data-Return Path
INFO Not showing table not selected during profiling: 16.1 Speed-of-Light
INFO Not showing table not selected during profiling: 16.2 L1D Cache Stalls (%)
INFO Not showing table not selected during profiling: 16.3 L1D Cache Accesses
INFO Not showing table not selected during profiling: 16.4 L1D - L2 Transactions
INFO Not showing table not selected during profiling: 16.5 L1D Addr Translation
INFO Not showing table not selected during profiling: 17.1 Speed-of-Light
INFO Not showing table not selected during profiling: 17.2 L2 - Fabric Transactions
INFO Not showing table not selected during profiling: 17.3 L2 Cache Accesses
INFO Not showing table not selected during profiling: 17.4 L2 - Fabric Interface Stalls
INFO Not showing table not selected during profiling: 17.5 L2 - Fabric Detailed Transaction Breakdown
INFO Not showing table not selected during profiling: 18.1 Aggregate Stats (All channels)
INFO Not showing table not selected during profiling: 18.2 L2 Cache Hit Rate (pct)
INFO Not showing table not selected during profiling: 18.3 L2 Requests (per normUnit)
INFO Not showing table not selected during profiling: 18.4 L2 Requests (per normUnit)
INFO Not showing table not selected during profiling: 18.5 L2-Fabric Requests (per normUnit)
INFO Not showing table not selected during profiling: 18.6 L2-Fabric Read Latency (Cycles)
INFO Not showing table not selected during profiling: 18.7 L2-Fabric Write and Atomic Latency (Cycles)
INFO Not showing table not selected during profiling: 18.8 L2-Fabric Atomic Latency (Cycles)
INFO Not showing table not selected during profiling: 18.9 L2-Fabric Read Stall (Cycles per normUnit)
INFO Not showing table not selected during profiling: 18.10 L2-Fabric Write and Atomic Stall (Cycles per normUnit)
Other than this, it looks good, aside from the few minor comments below.
analysis report which focuses on metrics associated with a hardware component or
a group of hardware components. All profiling results are accumulated in the same
target directory without overwriting those for other hardware components.
This enables incremental profiling and analysis.
Since this is potentially the first time a new user is exposed to the idea of "report block filtering", it may be useful to add a sentence that explains how to map analysis blocks to section IDs. If we already explain it elsewhere, we can just link to it, but it's not obvious to a new user which number to use.
docs/how-to/use.rst
Outdated
On line 83, in the "Analyze in the command line" section, we may want to change "hardware block filters" -> "hardware report block filters" for consistency
Want to change the example we print in `rocprof-compute profile --help` to use the new `-b`/`--block` formatting? It's still showing users the soon-to-be-deprecated format.
metavar="",
nargs="?",
const="",
help=print_avail_arch(supported_archs.keys()),
Nitpicky, but I see `print_avail_arch()` only prints with two tabs, while all other sections use three. This makes for a slightly uglier printout, i.e.,
(omniperf) [colramos@t008-006 omniperf]$ ./src/rocprof-compute profile -h
usage:
rocprof-compute profile --name <workload_name> [profile options] [roofline options] -- <profile_cmd>
---------------------------------------------------------------------------------
Examples:
rocprof-compute profile -n vcopy_all -- ./vcopy -n 1048576 -b 256
rocprof-compute profile -n vcopy_SPI_TCC -b SQ TCC -- ./vcopy -n 1048576 -b 256
rocprof-compute profile -n vcopy_kernel -k vecCopy -- ./vcopy -n 1048576 -b 256
rocprof-compute profile -n vcopy_disp -d 0 -- ./vcopy -n 1048576 -b 256
rocprof-compute profile -n vcopy_roof --roof-only -- ./vcopy -n 1048576 -b 256
---------------------------------------------------------------------------------
Help:
-h, --help show this help message and exit
General Options:
-v, --version show program's version number and exit
-V, --verbose Increase output verbosity (use multiple times for higher levels)
-q, --quiet Reduce output and run quietly.
-s, --specs Print system specs and exit.
Profile Options:
-n , --name Assign a name to workload.
-p , --path Specify path to save workload.
(DEFAULT: /work1/amd/colramos/audacious/omniperf/workloads/<name>)
--subpath Specify the type of subpath to save workload: node_name, gpu_model.
--hip-trace HIP trace, execturion trace for the entire application at the HIP level.
-k [ ...], --kernel [ ...] Kernel filtering.
-d [ ...], --dispatch [ ...] Dispatch ID filtering.
-b [ ...], --block [ ...] Specify metric id(s) from --list-metrics for filtering (e.g. 10, 4, 4.3).
Can provide multiple space separated arguments.
Can also accept Hardware blocks.
Hardware block filtering (to be deprecated soon):
SQ
SQC
TA
TD
TCP
TCC
SPI
CPC
CPF
--list-metrics [] List all available metrics for analysis on specified arch:
gfx906
gfx908
gfx90a
gfx940
gfx941
gfx942
--config-dir Specify the directory of customized report section configs.
--join-type Choose how to join rocprof runs: (DEFAULT: grid)
kernel (i.e. By unique kernel name dispatches)
grid (i.e. By unique kernel name + grid size dispatches)
--no-roof Profile without collecting roofline data.
-- [ ...] Provide command for profiling after double dash.
--spatial-multiplexing [ ...] Provide Node ID and GPU number per node.
--format-rocprof-output Set the format of output file of rocprof.
Standalone Roofline Options:
--roof-only Profile roofline data only.
--sort Overlay top kernels or top dispatches: (DEFAULT: kernels)
kernels
dispatches
-m [ ...], --mem-level [ ...] Filter by memory level: (DEFAULT: ALL)
HBM
L2
vL1D
LDS
--device Target GPU device ID. (DEFAULT: ALL)
--kernel-names Include kernel names in roofline plot.
That being said, I checked how we format `rocprof-compute analyze -h` and it uses two tabs. Since `print_avail_arch()` is used in both analyze and profile mode, it may be worth checking whether we can bump all profile options down to two tabs.
src/argparser.py
Outdated
"--config-dir",
dest="config_dir",
metavar="",
help="\t\tSpecify the directory of customized report section configs.",
This should also use three tabs for the same reason cited above
One other high-level comment for the arg parser. Since we've added some new options recently, it may make sense to audit the order in which we print out options. For example, move `--subpath` lower, next to `--spatial-multiplexing`, and move `-- [...]` higher up since it's more commonly used.
)
profile_group.add_argument(
"--list-metrics",
metavar="",
We can also use argparse's `choices` parameter to easily catch and handle invalid input (i.e. `choices=supported_archs.keys()`).
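To make the `choices` suggestion concrete, here is a minimal, simplified sketch. It is not the code in `src/argparser.py`: the `supported_archs` dictionary below is a hypothetical stand-in, the helper names `build_parser` and `parse_arch` are made up for illustration, and the real option also sets `nargs="?"` and `const=""`, which would need to be reconciled with `choices` separately.

```python
import argparse

# Hypothetical stand-in for the tool's table of supported architectures.
supported_archs = {"gfx906": None, "gfx908": None, "gfx90a": None}


def build_parser():
    """Build a tiny parser whose --list-metrics only accepts known archs."""
    parser = argparse.ArgumentParser(prog="profile-sketch")
    parser.add_argument(
        "--list-metrics",
        dest="list_metrics",
        # argparse rejects any value that is not in this list.
        choices=list(supported_archs.keys()),
        help="List all available metrics for analysis on specified arch.",
    )
    return parser


def parse_arch(argv):
    """Return the selected arch, or None if the flag is absent or rejected."""
    try:
        return build_parser().parse_args(argv).list_metrics
    except SystemExit:
        # argparse prints an "invalid choice" error and exits on bad input.
        return None
```

With this in place, an unsupported architecture is reported by argparse itself, so no extra validation code is needed downstream.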
This file does not have any metrics, so we can delete that
Yes, that is a bug, which I have now fixed. Thanks for testing thoroughly, @coleramos425.
* Profiling mode changes
  - `-b` option now additionally accepts metric id(s), similar to `-b` option in analyze mode (e.g. 6, 6.2, 6.23)
  - Only counters mentioned in the selected analysis report blocks will be collected
  - Add parsing logic to identify hardware counters from analysis report blocks
  - Add filtering logic to only write filtered counters in perfmon files
  - Log not collected counters in one line
  - `--list-metrics` option added in profile mode to list possible metric id(s) similar to analyze mode
  - Write arguments provided during profiling in profiling_configuration.yaml file
* Analysis mode changes
  - During analysis mode, only show report blocks selected during profiling
  - If `-b` option is provided in analysis mode, then follow provided filters
  - Do not show empty tables in analysis report
* Miscellaneous changes
  - Update CHANGELOG
  - Add test cases
    - Instruction mix report block filter
    - Instruction mix and Memory chart report block filter
    - Instruction mix report block filter and CPC hardware block filter
    - TA hardware block filter
    - `--list-metrics` in profile mode should work
  - Move binary handler fixtures to conftest.py to avoid importing fixtures
* Public documentation changes
  - Use the term "Hardware report block" instead of "Hardware block"
  - Add documentation for "--list-metrics" option in profile mode
  - Add example of filtering by hardware report block such as instruction mix and wavefront launch statistics
  - Add deprecation warning for hardware component (sq, tcc) based filtering
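The analysis-mode rules above (show only what was profiled, unless `-b` is given at analysis time) can be sketched roughly as follows. This is only an illustration of the precedence, not the code in this PR: the function name `blocks_to_show` and its arguments are hypothetical, and it assumes a table id such as `10.1` belongs to top-level block `10`.

```python
def blocks_to_show(available, profiled, analyze_filter):
    """Pick which report tables to display, keeping their original order.

    available      -- all table ids, e.g. ["7.1", "10.1", "10.2"]
    profiled       -- block ids given to `profile -b` (empty: no filter)
    analyze_filter -- block ids given to `analyze -b` (empty: not given)
    """
    # An explicit analyze-time filter takes precedence over the profile-time one.
    selected = analyze_filter or profiled
    if not selected:
        # Nothing was filtered at either stage, so show everything.
        return list(available)
    wanted = set(selected)
    # Show a table if it, or its top-level block, was selected.
    return [t for t in available if t in wanted or t.split(".")[0] in wanted]
```

For example, profiling with only block `10` and then analyzing without `-b` would, under these assumptions, show tables `10.1` and `10.2` and skip the rest.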
952ce42
to
75213ad
Compare
|_| \___/ \___| .__/|_| \___/|_| \___\___/|_| |_| |_| .__/ \__,_|\__\___|
|_| |_|

rocprofiler-compute version: 2.0.0
Maybe 2.0.0 --> x.x.x?
args = self.__args
for section in self.__filter_metric_ids:
section_num = convert_metric_id_to_panel_idx(section)
file_id = str(section_num // 100)
Eventually, we might want to centralize this code in one place, but not now.
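For readers following the snippet above, here is a rough sketch of how a metric id could map to a report file id. It is an assumption inferred from the `// 100` in the diff, not the actual `convert_metric_id_to_panel_idx` implementation: the names `panel_idx_sketch` and `file_ids_for` are hypothetical, and only ids of the form `N` or `N.M` are handled.

```python
def panel_idx_sketch(metric_id):
    """Map "4" -> 400 and "4.5" -> 405 (assumed encoding, for illustration)."""
    parts = metric_id.split(".")
    if len(parts) == 1:
        return int(parts[0]) * 100
    if len(parts) == 2:
        return int(parts[0]) * 100 + int(parts[1])
    raise ValueError("unsupported metric id: " + metric_id)


def file_ids_for(metric_ids):
    """Return the distinct top-level file ids, in first-seen order."""
    seen = []
    for metric_id in metric_ids:
        # Integer division by 100 drops the sub-block part of the index.
        file_id = str(panel_idx_sketch(metric_id) // 100)
        if file_id not in seen:
            seen.append(file_id)
    return seen
```

Under this encoding, a filter such as `-b 4 4.5 5 5.6` touches only the report files for blocks 4 and 5.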