
Fix the fixed warp tile used in global to shared memory load/store. #18

Open
lcy-seso opened this issue Dec 11, 2024 · 0 comments · May be fixed by #47
lcy-seso (Contributor) commented:
The current global-to-shared memory load uses a fixed 16x16 base tile to match TensorCore's warp tile requirement. This is inefficient when the overall problem size is large enough to support a wider warp tile, which would allow better coalescing of global memory accesses.

@lcy-seso lcy-seso self-assigned this Dec 17, 2024
@haruhi55 haruhi55 added the enhancement New feature or request label Dec 18, 2024
@haruhi55 changed the title from "fix the fixed warp tile used in global to shared memory load/store." to "Fix the fixed warp tile used in global to shared memory load/store." Dec 18, 2024