-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce memory use in transeq for both backends #130
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good to me beside a minor change and addition of some comments. Overall I think it would be good to document what all the arguments of these functions are supposed to be.
Also, in the issue you mention we might get a 7% performance uplift, is that about what you experienced?
src/omp/exec_dist.f90
Outdated
real(dp), dimension(:, :, :), intent(out) :: rhs, du, dud, d2u | ||
real(dp), dimension(:, :, :), intent(inout) :: rhs_du, dud, d2u |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understand dud
and d2u
are still intent(out)
only.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They're actually all intent(out)
as before, fixed.
@@ -61,7 +61,7 @@ subroutine exec_dist_tds_compact( & | |||
end subroutine exec_dist_tds_compact | |||
|
|||
subroutine exec_dist_transeq_compact( & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment explaining that rhs_du is where the results are also stored.
@@ -233,30 +233,29 @@ subroutine transeq_halo_exchange(self, u, v, w, dir) | |||
|
|||
end subroutine transeq_halo_exchange | |||
|
|||
subroutine transeq_dist_component(self, rhs, u, conv, & | |||
subroutine transeq_dist_component(self, rhs_du, u, conv, & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same, add a comment explaining that rhs_du is where the output is stored.
@@ -56,24 +56,24 @@ subroutine exec_dist_tds_compact( & | |||
end subroutine exec_dist_tds_compact | |||
|
|||
subroutine exec_dist_transeq_3fused( & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a comment explaining that r_du is where the output is
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! I would only suggest some updates to comments to match the code changes
Thanks guys, some of the About the speedup, I haven't tested this in isolation, but tested with implementing #66 as well, and the combined speedup from the two is as expected. |
Anything else here @Nanoseb @pbartholomew08? |
Nope, seems ready to go to me |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Closes #40.