Cbench is the worst of the bunch, taking the CI more than a minute to complete:
8: Test command: /home/runner/work/charmlite/charmlite/charm/bin/charmrun "/home/runner/work/charmlite/charmlite/build/bin/pgm_cbench_benchmark" "+p2" "++ppn2"
8: Test timeout computed to be: 120
8:
8: Running as 1 OS processes: /home/runner/work/charmlite/charmlite/build/bin/pgm_cbench_benchmark ++ppn2
8: charmrun> /usr/bin/setarch x86_64 -R mpirun -np 1 /home/runner/work/charmlite/charmlite/build/bin/pgm_cbench_benchmark ++ppn2
8: Charm++> Running in SMP mode: 1 processes, 2 worker threads (PEs) + 1 comm threads per process, 2 PEs total
8: Charm++> The comm. thread both sends and receives messages
8: Converse/Charm++ Commit ID: v7.1.0-devel-122-g064b48915
8: Charm++> Using STL-based msgQ:
8: Charm++> Message priorities have been turned off and will not be respected.
8: main> rep 1 of 16
8: main> rep 2 of 16
8: main> rep 3 of 16
8: main> rep 4 of 16
8: main> rep 5 of 16
8: main> rep 6 of 16
8: main> rep 7 of 16
8: main> rep 8 of 16
8: main> rep 9 of 16
8: main> rep 10 of 16
8: main> rep 11 of 16
8: main> rep 12 of 16
8: main> rep 13 of 16
8: main> rep 14 of 16
8: main> rep 15 of 16
8: main> rep 16 of 16
8: info> interleaved 129 broadcasts and reductions across 8 chares
8: info> average time per repetition: 4453.8 ms
8: info> average time per broadcast+reduction: 34525.6 ns
8: [Partition 0][Node 0] End of program
8/10 Test #8: pgm_cbench_benchmark_pe2 ......... Passed 72.52 sec
It's not uncommon to see broadcast+reduction times like the 34525.6 ns reported above on an over-subscribed PC either! We should determine what is going on here and why performance is so poor for these configurations.
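As a sanity check, the figures in the log can be cross-checked against each other. The sketch below assumes the 129 broadcasts and reductions are counted per repetition (the log does not say so explicitly), and all variable names are illustrative. Under that assumption, the per-operation figure only matches 4453.8 ms / 129 if it is read in microseconds, which suggests the `ns` label in the benchmark output may be off by a factor of 1000, i.e. roughly 34.5 ms per broadcast+reduction.

```python
# Values copied from the CI log; names are illustrative only.
reps = 16                    # "rep 16 of 16"
avg_rep_ms = 4453.8          # "average time per repetition: 4453.8 ms"
ops_per_rep = 129            # assumed to be per repetition
reported_per_op = 34525.6    # "average time per broadcast+reduction", labelled ns
wall_clock_s = 72.52         # ctest wall-clock time for the whole test

# Total time spent in repetitions; about 71.3 s, consistent with the 72.52 s test.
total_s = reps * avg_rep_ms / 1000.0

# Time per broadcast+reduction implied by the per-repetition average.
per_op_ms = avg_rep_ms / ops_per_rep
per_op_us = per_op_ms * 1000.0

print(f"total time in reps: {total_s:.2f} s (test took {wall_clock_s} s)")
print(f"implied per-op time: {per_op_ms:.2f} ms = {per_op_us:.1f} us")
print(f"reported per-op value: {reported_per_op}")
```

If the per-repetition assumption holds, the reported value lines up with the implied time in microseconds rather than nanoseconds, so the slowdown would be even larger than the label suggests.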
What I've tried so far:
Enabling or disabling +CmiSleepOnIdle.
Enabling or disabling cpu topology/affinity.
Using the lockless queue (--enable-lockless-queue).
Nothing seemed to improve the situation.
In particular, for jacobi, cbench, and pingpong.