
Complete rewrite of the legacy fortran solver #38

Open · wants to merge 11 commits into main

Conversation

loiseaujc

Hej,

I stumbled upon this repo while looking for some benchmarks. I really like the idea. Given that stdlib is reaching some level of maturity, I believe it is a good time to resurrect the benchmarks. Let me contribute my two cents on the Poisson benchmark.

Updates

  • legacy.f90: Rewrote the reference Jacobi solver using only legacy Fortran constructs. Note that the residual is now evaluated on the fly rather than as maxval(abs(phi - phi_prime)) after the i, j loops, which allocates a temporary array and loops once more over all the entries of phi - phi_prime (see the sketch after this list). I've also fixed the definition of dx so that, as m changes, we actually solve the discretized form of the same continuous problem (i.e. the Poisson equation on the unit square).
  • naive.f90/optimized.f90: I took the liberty of removing both of these implementations. Thanks to the on-the-fly computation of the residual, the solver in legacy.f90 is drastically faster (even when I kept the original definition of dx to have a fair comparison). To give you an idea, with dx = 0.01_dp and m = 300, here are the timings on my laptop:
    • legacy.f90: converged in 426 580 iterations, wall-clock time of 35 seconds.
    • naive.f90: converged in 426 326 iterations, wall-clock time of 85 seconds.
    • optimized.f90: converged in 426 580 iterations, wall-clock time of 44 seconds.
  • jacobi/: I moved all the solvers into the jacobi folder in case, in the not-so-distant future, we want to add other solvers for this benchmark (e.g. Gauss-Seidel, conjugate gradient, etc.).
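For concreteness, here is a minimal sketch of the on-the-fly residual evaluation inside the nested loops (variable names follow the snippets later in this thread; it illustrates the idea, not the actual PR code):

residual = 0.0_dp
do j = 2, m-1
    do i = 2, m-1
        phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
        ! Track the largest update inside the sweep itself: no temporary
        ! array and no second pass over phi - phi_prime.
        residual = max(residual, abs(phi(i, j) - phi_prime(i, j)))
    end do
end do

Keeping i as the inner loop index also keeps the sweep contiguous in memory, since Fortran arrays are column-major.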

If you like what you see, I can take some time over the next few days to port the same changes to the C and Python implementations. I do have a few questions, though:

  • How extensive should we be for each language? A relatively naïve (yet efficient) implementation + an optimized one using only standard constructs + an optimized one using existing, well-established libraries? For Python, this could include, for instance, a pure Python implementation with loops (which should really be avoided), a vectorized implementation using NumPy, and possibly one implementation using Numba and/or Cython?
  • Should we also add a Julia version of the solver (both the relatively naïve one and another version using one of the Julia packages providing a Jacobi solver)?


loiseaujc commented May 21, 2024

I didn't take the time before submitting this PR to go extensively through the existing issues. I just found out about #10 and the fascinating discussion therein. I don't have enough perspective yet to fully grasp all the points that have been raised (in particular those by @rouson), although I agree that both legacy and "modern" versions of the codes should be provided. By legacy, I mean pretty much what I wrote in this PR, i.e. a piece of code that could be written by someone with minimal programming experience (e.g. an undergrad or grad student) who has just started learning Fortran, while "modern" would be the same algorithm leveraging more recent constructs (e.g. array syntax, do concurrent, etc.).

For this particular problem, however, I'm not sure how to adapt the on-the-fly computation of the residual (which really is one of the bottlenecks) to the array-syntax version. In such an implementation, the nested do loops that evaluate the Jacobi iteration are replaced by the one-liner

phi_prime(2:m-1,2:m-1) = (phi(3:,2:m-1) + phi(:m-2,2:m-1) + phi(2:m-1,3:) + phi(2:m-1,1:m-2))/4 + rhs(2:m-1, 2:m-1)

Evaluating the residual, however, still requires

residual = maxval(abs(phi - phi_prime))

which (if my understanding is correct) allocates a temporary array and traverses it once more. I'm not sure how to make this more efficient (in terms of spatial and temporal data locality) other than evaluating the residual only every n iterations. Likewise, if the nested do loops are replaced with

do concurrent (i=2:m-1, j=2:m-1)
    phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
enddo

my understanding is that the on-the-fly evaluation of the residual with residual = max(residual, abs(phi(i, j) - phi_prime(i, j))) cannot be included in the loop either, as it would break the underlying assumption of do concurrent that the iterations are independent. I believe this goes beyond this simple PR with a legacy implementation, but I'm curious whether any of you has ideas on how to combine the best of both worlds (i.e. array syntax or do concurrent with on-the-fly evaluation of the residual).
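To make the "every n iterations" idea above concrete, here is a sketch with array syntax; max_iter, n_check, and tolerance are illustrative names, and phi_prime is assumed to carry the same boundary values as phi:

do iter = 1, max_iter
    phi_prime(2:m-1,2:m-1) = (phi(3:,2:m-1) + phi(:m-2,2:m-1) + phi(2:m-1,3:) + phi(2:m-1,1:m-2))/4 + rhs(2:m-1, 2:m-1)
    ! Pay for the temporary array and the extra traversal only once
    ! every n_check sweeps instead of on every sweep.
    if (mod(iter, n_check) == 0) then
        if (maxval(abs(phi - phi_prime)) < tolerance) exit
    end if
    phi = phi_prime
end do

This only amortizes the cost rather than removing it, and the loop can overshoot convergence by up to n_check - 1 sweeps.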

PS: The only option I can think of for do concurrent is to pre-allocate a 2D array delta, evaluate delta(i, j) = abs(phi(i, j) - phi_prime(i, j)) in the do concurrent loop, and then call residual = maxval(delta) after the loop, as sketched below. If I run this code, approximately half of the time is spent in maxval, which is not acceptable.
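In code, that option would look something like the following sketch (delta pre-allocated with the same shape as phi and initialized to zero, so the untouched boundary entries don't affect maxval):

do concurrent (i=2:m-1, j=2:m-1)
    phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
    ! Each iteration writes its own entry of delta, so the iterations
    ! remain independent as do concurrent requires.
    delta(i, j) = abs(phi(i, j) - phi_prime(i, j))
enddo
residual = maxval(delta)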

PPS: If I use do concurrent with the on-the-fly computation of the residual, the code compiles correctly and runs just as fast as the legacy version. Yet, my understanding is that it breaks the independence of the iterations, and the compiler probably falls back to something similar to the nested do loops version. Is my understanding correct? If so, would that be considered "valid" Fortran code? By valid I mean using a do concurrent construct that hints at iteration independence even though the iterations are not really independent, which might lead the user to think the code is doing something it actually isn't.
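For reference, the pattern in question is the one below, where every iteration of the construct reads and updates the same scalar residual:

do concurrent (i=2:m-1, j=2:m-1)
    phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
    ! residual is read and written by every iteration, so the iterations
    ! are no longer independent, contrary to what do concurrent advertises.
    residual = max(residual, abs(phi(i, j) - phi_prime(i, j)))
enddo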

@loiseaujc changed the title from Complete rewrite of the fortran solver to Complete rewrite of the legacy fortran solver on May 21, 2024