IOByte computation in benchmarks #3721

Open
Priya2698 opened this issue Jan 16, 2025 · 3 comments

Comments

@Priya2698
Collaborator

Priya2698 commented Jan 16, 2025

We currently use the inputs and outputs consumed by the nvFuser definitions as the reference for the IOBytes computation for all executors (a sketch of this accounting follows the list).
This has certain limitations:

  1. It requires manual effort to identify the reference IOBytes from nvFuser definitions when adding Thunder-nvfuser benchmarks (PR rope_benchmark #3550).
  2. There is also the possibility of the IOBytes being different between executors (torch.compile, eager and Thunder).
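
For concreteness, here is a minimal sketch of that accounting, assuming IOBytes is simply `numel() * element_size()` summed over the definition's input and output tensors. The helper name and the deduplication detail are hypothetical, not the actual benchmark code:

```python
import torch

def reference_iobytes(inputs, outputs):
    # Hypothetical helper: bytes read (inputs) plus bytes written (outputs),
    # counting each distinct tensor buffer once. This mirrors taking the
    # numel * element_size total from the nvFuser definition's I/O and
    # reusing it as the reference for every executor.
    seen, total = set(), 0
    for t in list(inputs) + list(outputs):
        if isinstance(t, torch.Tensor) and t.data_ptr() not in seen:
            seen.add(t.data_ptr())
            total += t.numel() * t.element_size()
    return total
```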
@naoyam
Collaborator

naoyam commented Jan 21, 2025

> There is also the possibility of the IOBytes being different between executors (torch.compile, eager and Thunder).

Why is this?

@Priya2698
Collaborator Author

> > There is also the possibility of the IOBytes being different between executors (torch.compile, eager and Thunder).
>
> Why is this?

If the executors save different variables for the backward pass, or choose to rematerialize some intermediate variables, that strategy may differ across executors, particularly for larger fusions. I don't think we see this in our current benchmarks, though.
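
As an illustration, the set of tensors an executor saves for backward can be observed in plain PyTorch with `torch.autograd.graph.saved_tensors_hooks`. This is a minimal sketch, not benchmark code, and the measured function is an arbitrary example:

```python
import torch

def saved_for_backward_bytes(fn, *args):
    # Sketch: count the bytes of distinct tensors saved for the backward
    # pass while running fn. Executors that save different intermediates
    # (or rematerialize them instead) would report different totals, which
    # is what can make the effective IOBytes diverge between executors.
    seen, total = set(), 0

    def pack(t):
        nonlocal total
        if t.data_ptr() not in seen:
            seen.add(t.data_ptr())
            total += t.numel() * t.element_size()
        return t

    with torch.autograd.graph.saved_tensors_hooks(pack, lambda t: t):
        out = fn(*args)
    return out, total

x = torch.randn(1024, 1024, requires_grad=True)
_, nbytes = saved_for_backward_bytes(lambda t: torch.softmax(t @ t, dim=-1), x)
print(f"saved for backward: {nbytes} bytes")
```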

@naoyam
Collaborator

naoyam commented Jan 22, 2025

> > > There is also the possibility of the IOBytes being different between executors (torch.compile, eager and Thunder).
> >
> > Why is this?
>
> If the executors save different variables for the backward pass, or choose to rematerialize some intermediate variables, that strategy may differ across executors, particularly for larger fusions. I don't think we see this in our current benchmarks, though.

If that's the case, I wonder whether it still makes sense to compare performance between the backends, since it wouldn't be an apples-to-apples comparison.

Can this happen only between the torch.compile and thunder backends, but not between the thunder-torch.compile and thunder backends?
