When running the compute example on an M4 Pro, the displayed duration is always 0ns, regardless of the input size (I modified the code to print nanoseconds instead of microseconds). The example works fine on M1-M3. I assume this generalizes to all M4 chips, but I couldn't test that.
Does anyone have an idea of what could explain this (and ideally how to fix it)? I didn't find much about it online.
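One way to narrow this down is to check whether the raw begin/end timestamps are already identical, or whether the value is lost in the conversion. Below is a minimal sketch of that conversion in plain Rust. It assumes the duration is obtained by scaling the GPU tick difference by the ratio of a CPU time span to a GPU time span measured over the same interval; that is an assumption about how the example computes it, and the name `gpu_ticks_to_nanos` and the numbers are made up for illustration.

```rust
/// Hypothetical helper: convert a raw GPU timestamp pair to nanoseconds.
/// `gpu_span` and `cpu_span` are the GPU-tick and CPU-nanosecond deltas
/// observed over the same wall-clock interval (assumed to be available).
fn gpu_ticks_to_nanos(begin: u64, end: u64, gpu_span: u64, cpu_span: u64) -> Option<f64> {
    // Reject inputs that cannot produce a meaningful duration.
    if end < begin || gpu_span == 0 {
        return None;
    }
    let ticks = (end - begin) as f64;
    Some(ticks / gpu_span as f64 * cpu_span as f64)
}

fn main() {
    // Made-up spans: 1024 GPU ticks elapsed while the CPU clock advanced 1 ms.
    let (gpu_span, cpu_span) = (1024u64, 1_000_000u64);

    // Distinct timestamps give a non-zero duration.
    println!("{:?}", gpu_ticks_to_nanos(100, 356, gpu_span, cpu_span));

    // Identical timestamps give exactly zero, whatever the scaling is.
    println!("{:?}", gpu_ticks_to_nanos(100, 100, gpu_span, cpu_span));
}
```

If the raw values printed before conversion are equal, the scaling is not the culprit and the sampling itself is returning the same timestamp twice.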
That's unfortunate. I implemented a profiler to find the operators that consume the most GPU time in graphs, but the numbers seem wrong.
Total GPU time adds up to total CPU time, but per-operator times are wrong (for example, matmul reports 3% of the GPU usage when I know it is closer to 60%, judging by what happens when I remove the matmuls). This behavior is not observed on M1.
My first intuition was that thread concurrency or pipeline cascading somehow messes up the counter sampling, but I cannot find anything about it. It also feels strange that I am the first to notice this.
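To test the concurrency hypothesis, it may help to compare the sum of the per-operator intervals against the overall span they cover. The sketch below is plain Rust over already-resolved timestamps; the `Sample` type, the function names, and the numbers are hypothetical and not taken from any profiler mentioned here. A coverage well below 1 would suggest intervals that collapse to (near) zero, while a coverage above 1 would suggest overlapping execution.

```rust
use std::collections::BTreeMap;

/// Hypothetical record: one begin/end GPU timestamp pair per operator dispatch.
struct Sample {
    op: &'static str,
    begin: u64,
    end: u64,
}

/// Fraction of the summed interval time attributed to each operator.
fn shares(samples: &[Sample]) -> BTreeMap<&'static str, f64> {
    let mut ticks: BTreeMap<&'static str, u64> = BTreeMap::new();
    for s in samples {
        *ticks.entry(s.op).or_insert(0) += s.end.saturating_sub(s.begin);
    }
    let total: u64 = ticks.values().sum();
    ticks
        .into_iter()
        .map(|(op, t)| (op, if total == 0 { 0.0 } else { t as f64 / total as f64 }))
        .collect()
}

/// Sum of all intervals divided by the span from the first begin to the last end.
fn coverage(samples: &[Sample]) -> Option<f64> {
    let first = samples.iter().map(|s| s.begin).min()?;
    let last = samples.iter().map(|s| s.end).max()?;
    let wall = last.saturating_sub(first);
    if wall == 0 {
        return None;
    }
    let sum: u64 = samples.iter().map(|s| s.end.saturating_sub(s.begin)).sum();
    Some(sum as f64 / wall as f64)
}

fn main() {
    // Made-up timestamps for two back-to-back operators.
    let samples = [
        Sample { op: "matmul", begin: 0, end: 768 },
        Sample { op: "add", begin: 768, end: 1024 },
    ];
    println!("{:?}", shares(&samples));
    println!("{:?}", coverage(&samples));
}
```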