
Optimize for_each_element folded loops #602

Merged
dsharlet merged 3 commits into main from ds/opt-for-each-folded on Feb 13, 2025


dsharlet (Owner)

Currently, we evaluate folded loops very simply, by evaluating each iteration independently. This has high overhead, especially for `for_each_contiguous_slice` when the folded dimension is contiguous, because the callback is then invoked once per iteration instead of once per contiguous span.
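As a rough illustration of the per-iteration approach described above, here is a minimal C++ sketch. It assumes a simplified model in which index `i` of a folded dimension is stored at `euclidean_mod(i, fold_factor) * stride`; the `folded_dim` struct and the `for_each_folded_naive` function are hypothetical and are not slinky's actual types or implementation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model of one folded dimension (not slinky's types).
struct folded_dim {
  std::ptrdiff_t min;          // first index of the loop
  std::ptrdiff_t extent;       // number of iterations
  std::ptrdiff_t stride;       // elements between consecutive storage slots
  std::ptrdiff_t fold_factor;  // storage slots before wrapping around (> 0)
};

// Modulo that is never negative, so negative indices fold correctly.
inline std::ptrdiff_t euclidean_mod(std::ptrdiff_t a, std::ptrdiff_t b) {
  std::ptrdiff_t r = a % b;
  return r < 0 ? r + b : r;
}

// Naive evaluation: one callback per iteration, each recomputing the folded
// offset and covering exactly one element.
template <typename T, typename F>
void for_each_folded_naive(T* base, const folded_dim& d, F&& f) {
  for (std::ptrdiff_t i = d.min; i < d.min + d.extent; ++i) {
    f(base + euclidean_mod(i, d.fold_factor) * d.stride, std::ptrdiff_t{1});
  }
}
```

Every iteration pays for a modulo and a callback, even though consecutive indices usually map to adjacent storage.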

This PR adds logic to call the callbacks on linear chunks of iterations at a time when possible, that is, on ranges that:

  • lie between folding boundaries
  • do not cross a buffer boundary
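The chunking idea can be sketched as follows, again under a simplified, hypothetical model rather than slinky's real code. Several buffers share one loop index, each with its own stride and fold factor (a fold factor of 0 marks an unfolded dimension). A chunk ends at the nearest fold boundary of any buffer, which is this sketch's reading of the two conditions above; within a chunk every buffer's address advances linearly, so one callback can cover the whole run. The names `folded_dim`, `next_fold_boundary` and `for_each_folded_chunked` are illustrative only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical per-buffer description of the shared loop dimension.
struct folded_dim {
  std::ptrdiff_t stride;       // elements between consecutive indices
  std::ptrdiff_t fold_factor;  // 0 means the dimension is not folded
};

inline std::ptrdiff_t euclidean_mod(std::ptrdiff_t a, std::ptrdiff_t b) {
  std::ptrdiff_t r = a % b;
  return r < 0 ? r + b : r;
}

// Smallest multiple of fold_factor strictly greater than i.
inline std::ptrdiff_t next_fold_boundary(std::ptrdiff_t i, std::ptrdiff_t fold_factor) {
  return i - euclidean_mod(i, fold_factor) + fold_factor;
}

// Chunked evaluation over [begin, end): f(ptrs, n) receives one pointer per
// buffer and a run length n over which every pointer advances linearly.
// Assumes bases.size() == dims.size().
template <typename T, typename F>
void for_each_folded_chunked(std::ptrdiff_t begin, std::ptrdiff_t end,
                             const std::vector<T*>& bases,
                             const std::vector<folded_dim>& dims, F&& f) {
  std::vector<T*> ptrs(bases.size());
  for (std::ptrdiff_t i = begin; i < end;) {
    std::ptrdiff_t chunk_end = end;
    for (std::size_t b = 0; b < dims.size(); ++b) {
      const folded_dim& d = dims[b];
      if (d.fold_factor > 0) {
        // Stop before this buffer wraps around to the start of its storage.
        chunk_end = std::min(chunk_end, next_fold_boundary(i, d.fold_factor));
        ptrs[b] = bases[b] + euclidean_mod(i, d.fold_factor) * d.stride;
      } else {
        ptrs[b] = bases[b] + i * d.stride;
      }
    }
    f(ptrs, chunk_end - i);
    i = chunk_end;
  }
}
```

With fold factors 4 and 6 over indices [2, 14), this makes 5 callbacks (chunk sizes 2, 2, 2, 4, 2) instead of the 12 a per-iteration loop would make.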

dsharlet requested a review from vksnk February 12, 2025 23:06
dsharlet merged commit afc5627 into main Feb 13, 2025
1 check passed
dsharlet deleted the ds/opt-for-each-folded branch February 13, 2025 00:21
dsharlet added a commit that referenced this pull request Feb 13, 2025
…603)

#602 allowed contiguous folded innermost dimensions to be passed to
`for_each_contiguous_slice` callbacks, but not contiguous folded
non-innermost dimensions. This PR fixes that.

This causes a small performance regression in unaffected cases, because
the parameter is passed through each loop implementation function even
when it isn't used.
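A similar hypothetical sketch for the follow-up commit's case: a folded outer dimension that is contiguous with the innermost one. It assumes a 2D layout where `x` spans `[0, width)` with stride 1 and `y` is folded every `fold_factor` rows with stride `width`. Rows between two fold boundaries are then adjacent in memory, so they can be handed to a `for_each_contiguous_slice`-style callback as a single slice. `for_each_contiguous_rows` is an illustrative name, not slinky's API.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

inline std::ptrdiff_t euclidean_mod(std::ptrdiff_t a, std::ptrdiff_t b) {
  std::ptrdiff_t r = a % b;
  return r < 0 ? r + b : r;
}

// Hypothetical layout: x in [0, width) with stride 1; y folded every
// fold_factor (> 0) rows with stride width, so each storage row starts right
// after the previous one ends. Calls f(ptr, n) once per contiguous slice
// covering rows [y_begin, y_end).
template <typename T, typename F>
void for_each_contiguous_rows(T* base, std::ptrdiff_t width, std::ptrdiff_t fold_factor,
                              std::ptrdiff_t y_begin, std::ptrdiff_t y_end, F&& f) {
  for (std::ptrdiff_t y = y_begin; y < y_end;) {
    std::ptrdiff_t slot = euclidean_mod(y, fold_factor);
    // Rows from y up to the next fold boundary occupy adjacent storage rows.
    std::ptrdiff_t rows = std::min(y_end, y - slot + fold_factor) - y;
    f(base + slot * width, rows * width);
    y += rows;
  }
}
```

For `width = 3`, `fold_factor = 4` and rows `[1, 7)`, this yields two slices of 9 elements each instead of six slices of 3.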