Compute example counter sampling not working on M4 #346

Open
LouisChouraki opened this issue Jan 22, 2025 · 2 comments

Comments

@LouisChouraki

LouisChouraki commented Jan 22, 2025

Hello,

When running the compute example on an M4 Pro, the displayed duration is always 0 ns, regardless of input size (I modified the code to print nanoseconds instead of microseconds). The example works fine on M1 through M3. I assume this applies to all M4 chips, but I couldn't test that.

I wanted to know if someone has an idea of what could explain this (and ideally how to fix this) as I didn't find much about it online.
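
For anyone trying to narrow this down, here is a minimal sketch of the tick-to-nanosecond arithmetic that a timing path like this typically relies on. It is not the example's actual code: the function name and parameters are hypothetical, and it assumes the CPU timestamps are already in nanoseconds and that one CPU/GPU timestamp pair is captured before the work and one after. Logging the raw begin/end ticks next to the converted value would show whether the 0 ns comes from identical samples or from the conversion step.

```rust
/// Convert a span of raw GPU ticks to nanoseconds using two CPU/GPU
/// timestamp pairs captured before and after the work.
/// Hypothetical helper; assumes CPU timestamps are in nanoseconds.
/// Returns None if any clock went backwards or the GPU clock did not
/// advance between the two correlation samples.
fn gpu_ticks_to_nanos(
    cpu_start: u64,
    cpu_end: u64,
    gpu_start: u64,
    gpu_end: u64,
    begin_ticks: u64,
    end_ticks: u64,
) -> Option<f64> {
    let cpu_span = cpu_end.checked_sub(cpu_start)? as f64;
    let gpu_span = gpu_end.checked_sub(gpu_start)?;
    if gpu_span == 0 {
        // No usable scale factor: the GPU clock did not move.
        return None;
    }
    let ticks = end_ticks.checked_sub(begin_ticks)? as f64;
    // Scale in floating point so short spans are not truncated to zero.
    Some(ticks * cpu_span / gpu_span as f64)
}
```

With this shape, `Some(0.0)` means the begin and end samples were identical, while `None` means the correlation samples themselves were unusable. Those two cases point at different problems.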

@cwfitzgerald
Member

Hmm, nothing comes to mind, but I haven't personally played around with an M4

@LouisChouraki
Author

That's unfortunate. I implemented a profiler to find the operators that consume the most GPU time in a graph, but the numbers seem wrong.
Total GPU time adds up to total CPU time, but per-operator times are off: for example, matmul is reported as 3% of GPU usage, whereas removing the matmuls suggests it is closer to 60%. This behavior is not observed on M1.
My first guess was that thread concurrency or pipeline cascading interferes with counter sampling, but I cannot find anything about it. It also seems strange that I would be the first to notice this.
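
To make the 3% versus 60% comparison concrete, here is a small sketch of the two estimates being compared. The function names are hypothetical and not part of any profiler API. The ablation estimate is only a rough cross-check: it assumes operators run back to back, so if the GPU overlaps work from different operators, the two figures need not agree.

```rust
/// Share of GPU time attributed to an operator by ablation: run the graph
/// with and without the operator and compare total durations.
/// Hypothetical helper; only meaningful when operators do not overlap.
fn ablation_share(total_with_ns: f64, total_without_ns: f64) -> Option<f64> {
    if total_with_ns <= 0.0 || total_without_ns > total_with_ns {
        return None;
    }
    Some((total_with_ns - total_without_ns) / total_with_ns)
}

/// Share of GPU time attributed to an operator by per-operator samples:
/// the operator's measured time divided by the sum over all operators.
fn sampled_share(op_ns: f64, all_ops_ns: &[f64]) -> Option<f64> {
    let total: f64 = all_ops_ns.iter().sum();
    if total <= 0.0 {
        None
    } else {
        Some(op_ns / total)
    }
}
```

A large gap between `ablation_share` and `sampled_share` for the same operator, as described above, suggests that the per-operator samples are not bracketing the work they are supposed to measure.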
