Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reduce memory use in transeq for both backends #130

Merged
merged 3 commits into from
Dec 5, 2024

Conversation

semi-h
Copy link
Member

@semi-h semi-h commented Nov 22, 2024

Closes #40.

Copy link
Collaborator

@Nanoseb Nanoseb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking good to me beside a minor change and addition of some comments. Overall I think it would be good to document what all the arguments of these functions are supposed to be.

Also, in the issue you mention we might get a 7% performance uplift, is that about what you experienced?

real(dp), dimension(:, :, :), intent(out) :: rhs, du, dud, d2u
real(dp), dimension(:, :, :), intent(inout) :: rhs_du, dud, d2u
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understand dud and d2u are still intent(out) only.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They're actually all intent(out) as before, fixed.

@@ -61,7 +61,7 @@ subroutine exec_dist_tds_compact( &
end subroutine exec_dist_tds_compact

subroutine exec_dist_transeq_compact( &
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe add a comment explaining that rhs_du is where the results are also stored.

@@ -233,30 +233,29 @@ subroutine transeq_halo_exchange(self, u, v, w, dir)

end subroutine transeq_halo_exchange

subroutine transeq_dist_component(self, rhs, u, conv, &
subroutine transeq_dist_component(self, rhs_du, u, conv, &
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same, add a comment explaining that rhs_du is where the output is stored.

@@ -56,24 +56,24 @@ subroutine exec_dist_tds_compact( &
end subroutine exec_dist_tds_compact

subroutine exec_dist_transeq_3fused( &
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a comment explaining that r_du is where the output is

Copy link
Member

@pbartholomew08 pbartholomew08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! I would only suggest some updates to comments to match the code changes

src/cuda/exec_dist.f90 Show resolved Hide resolved
src/omp/backend.f90 Outdated Show resolved Hide resolved
@semi-h
Copy link
Member Author

semi-h commented Nov 27, 2024

Thanks guys, some of the intents were indeed misleading, fixed them all and added all the recommended comments.

About the speedup, I haven't tested this in isolation, but tested with implementing #66 as well, and the combined speedup from the two is as expected.

@semi-h
Copy link
Member Author

semi-h commented Dec 5, 2024

Anything else here @Nanoseb @pbartholomew08?

@Nanoseb
Copy link
Collaborator

Nanoseb commented Dec 5, 2024

Nope, seems ready to go to me

Copy link
Member

@pbartholomew08 pbartholomew08 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@semi-h semi-h merged commit 3a37d48 into xcompact3d:main Dec 5, 2024
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Reuse result field in transeq_component as a temporary field
3 participants