
Complete rewrite of the legacy fortran solver #38

Open · wants to merge 11 commits into main

Conversation

loiseaujc

Hej,

I stumbled upon this repo while looking for some benchmarks. I really like the idea. Given that stdlib is reaching some level of maturity, I believe it is a good time to resurrect the benchmarks. Let me contribute my two cents on the Poisson benchmark.

Updates

  • legacy.f90: Rewrote the reference Jacobi solver using only legacy Fortran constructs. Note that the residual is now evaluated on the fly rather than as maxval(abs(phi - phi_prime)) after the i, j loops, which allocates a temporary array and loops once more over all the entries of phi - phi_prime (see the sketch after this list). I've also fixed the definition of dx so that, as m changes, we actually solve the discretized form of the same continuous problem (i.e. the Poisson equation on the unit square).
  • naive.f90/optimized.f90: I took the liberty of removing both of these implementations. Thanks to the on-the-fly computation of the residual, the solver in legacy.f90 is drastically faster (even when I kept the original definition of dx to have a fair comparison). To give you an idea, with dx = 0.01_dp and m = 300, here are the timings on my laptop:
    • legacy.f90: converged in 426 580 iterations, wall-clock time of 35 seconds.
    • naive.f90: converged in 426 326 iterations, wall-clock time of 85 seconds.
    • optimized.f90: converged in 426 580 iterations, wall-clock time of 44 seconds.
  • jacobi/: I moved all the solvers into the jacobi folder in case, in the not-so-distant future, we want to add other solvers for this benchmark (e.g. Gauss-Seidel, conjugate gradient, etc.).
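For concreteness, here is a minimal sketch of the on-the-fly residual evaluation inside the nested loops (variable names follow the snippets later in this thread; it illustrates the idea, not the actual PR code):

residual = 0.0_dp
do j = 2, m-1
    do i = 2, m-1
        phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
        ! Track the largest update inside the sweep itself: no temporary
        ! array and no second pass over phi - phi_prime.
        residual = max(residual, abs(phi(i, j) - phi_prime(i, j)))
    end do
end do

Keeping i as the inner loop index also keeps the sweep contiguous in memory, since Fortran arrays are column-major.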

If you like what you see, I can take some time over the next few days to port the same changes to the C and Python implementations. I do have a few questions, though:

  • How extensive should we be for each language? A relatively naïve (yet efficient) implementation + an optimized one using only standard constructs + an optimized one using existing, well-established libraries? For Python, this could include, for instance, a pure Python implementation with loops (which should really be avoided), a vectorized implementation using NumPy, and possibly one implementation using Numba and/or Cython?
  • Should we also add a Julia version of the solver (both the relatively naïve one and another version using one of the Julia packages providing a Jacobi solver)?


loiseaujc commented May 21, 2024

I didn't take the time before submitting this PR to go extensively through the existing issues. I just found out about #10 and the fascinating discussion therein. I don't have enough perspective yet to fully grasp all the points that have been raised (in particular those by @rouson), although I agree that both legacy and "modern" versions of the codes should be provided. By legacy, I mean pretty much what I wrote in this PR, i.e. a piece of code that could be written by someone with minimal programming experience (e.g. an undergrad or grad student) who has just started learning Fortran, while "modern" would be the same algorithm leveraging more recent constructs (e.g. array syntax, do concurrent, etc.).

For this particular problem, however, I'm not sure how to adapt the on-the-fly computation of the residual (which really is one of the bottlenecks) to the array-syntax version. In such an implementation, the nested do loops that evaluate the Jacobi iteration are replaced by the one-liner

phi_prime(2:m-1,2:m-1) = (phi(3:,2:m-1) + phi(:m-2,2:m-1) + phi(2:m-1,3:) + phi(2:m-1,1:m-2))/4 + rhs(2:m-1, 2:m-1)

Evaluating the residual, however, still requires

residual = maxval(abs(phi - phi_prime))

which (if my understanding is correct) allocates a temporary array and traverses it once more. I'm not sure how to make this more efficient (in terms of spatial and temporal data locality) other than evaluating the residual only every n iterations. Likewise, if the nested do loops are replaced with

do concurrent (i=2:m-1, j=2:m-1)
    phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
enddo

my understanding is that the on-the-fly evaluation of the residual with residual = max(residual, abs(phi(i, j) - phi_prime(i, j))) cannot be included in the loop either, as it would break the underlying assumption of do concurrent that the iterations are independent. I believe this goes beyond this simple PR with a legacy implementation, but I'm curious whether any of you has ideas on how to combine the best of both worlds (i.e. array syntax or do concurrent with on-the-fly evaluation of the residual).
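To make the "every n iterations" idea above concrete, here is a sketch with array syntax; max_iter, n_check, and tolerance are illustrative names, and phi_prime is assumed to carry the same boundary values as phi:

do iter = 1, max_iter
    phi_prime(2:m-1,2:m-1) = (phi(3:,2:m-1) + phi(:m-2,2:m-1) + phi(2:m-1,3:) + phi(2:m-1,1:m-2))/4 + rhs(2:m-1, 2:m-1)
    ! Pay for the temporary array and the extra traversal only once
    ! every n_check sweeps instead of on every sweep.
    if (mod(iter, n_check) == 0) then
        if (maxval(abs(phi - phi_prime)) < tolerance) exit
    end if
    phi = phi_prime
end do

This only amortizes the cost rather than removing it, and the loop can overshoot convergence by up to n_check - 1 sweeps.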

PS: The only option I can think of for do concurrent is to pre-allocate a 2D array delta, evaluate delta(i, j) = abs(phi(i, j) - phi_prime(i, j)) in the do concurrent loop, and then call residual = maxval(delta) after the loop, as sketched below. If I run this code, approximately half of the time is spent in maxval, which is not acceptable.
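In code, that option would look something like the following sketch (delta pre-allocated with the same shape as phi and initialized to zero, so the untouched boundary entries don't affect maxval):

do concurrent (i=2:m-1, j=2:m-1)
    phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
    ! Each iteration writes its own entry of delta, so the iterations
    ! remain independent as do concurrent requires.
    delta(i, j) = abs(phi(i, j) - phi_prime(i, j))
enddo
residual = maxval(delta)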

PPS: If I use do concurrent with the on-the-fly computation of the residual, the code compiles correctly and runs just as fast as the legacy version. Yet, my understanding is that it breaks the independence of the iterations, and the compiler probably falls back to something similar to the nested do loops version. Is my understanding correct? If so, would that be considered "valid" Fortran code? By valid I mean using a do concurrent construct that hints at iteration independence even though the iterations are not really independent, which might lead the user to think the code is doing something it actually isn't.
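For reference, the pattern in question is the one below, where every iteration of the construct reads and updates the same scalar residual:

do concurrent (i=2:m-1, j=2:m-1)
    phi_prime(i, j) = (phi(i+1, j) + phi(i-1, j) + phi(i, j+1) + phi(i, j-1))/4 + rhs(i, j)
    ! residual is read and written by every iteration, so the iterations
    ! are no longer independent, contrary to what do concurrent advertises.
    residual = max(residual, abs(phi(i, j) - phi_prime(i, j)))
enddo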

@loiseaujc changed the title from Complete rewrite of the fortran solver to Complete rewrite of the legacy fortran solver on May 21, 2024