Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add modern poisson2d solver & rename optimized.f90 #23

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

rouson
Copy link

@rouson rouson commented Jun 29, 2021

I hope the benefit of this pull request extends beyond this specific refactoring and serves as a guide for other codes contributed to this project. The aim is to demonstrate what modern Fortran can do to improve the expressiveness of code while maintaining and potentially improving performance.

In my local tests compiling with gfortran -Ofast, what was previously named optimized.f90 only performs about 5% faster than modern.f90 on average. In 2021, that hardly justifies restricting oneself to a small subset of Fortran 95 that is very nearly Fortran 77 + free format even in the name of performance. The modern version is more compact in some places, adds safety in some places (e.g., tighter variable scoping means fewer opportunities for mistakenly changing variables), and adds additional optimization opportunities now that compilers such as NVIDIA's compiler can offload do concurrent to a GPU. This means there's even more that can be explored with this new code than I've been able to investigate so far.

Refactorings

  1. Use block for tighter variable scoping (reduces some clutter).
  2. Use array statements with vector subscripts to write compact initializations.
  3. Use do concurrent to express fine-grained parallelism and expose opportunities for optimizations such as vectorization, multithreading, or GPU offloading.
  4. Replace some [magic numbers], e.g., initial delta.
  5. Eliminate some unnecessary computation by initializing and updating only values that actually need initializing and updating.
  6. Use more compact conditional logic.
  7. Use a real kind specification based on precision requirements rather than the ambiguously defined double-precision representation of any one given compiler on any one given platform.
  8. Eliminate unnecessary precision specifiers (_dp) when an expression is simply a constant that is exactly representable in any reasonable floating-point representation.
  9. Separate the procedure interface into a module and the procedure definition into a submodule.**
  10. Use a more descriptive variable names in a few places.
  11. Use associate to ensure immutable state for what is essentially a runtime constant.
  12. Add comments! (including some in FORD style)

** In larger projects, item 9 makes code much more user-friendly by isolation the application programmer interface (API) and reduces compile times by eliminating unnecessary compilation cascades.
[magic numbers]: https://en.wikipedia.org/wiki/Magic_number_(programming)#Unnamed_numerical_constants

Refactorings
------------
1. Use BLOCK for tighter variable scoping (reduces some clutter).
2. Use array statements with vector subscripts to write compact
   initializations.
3. Use DO CONCURRENT to express fine-grained parallelism and
   expose opportunities for optimizations such as vectorization,
   multithreading, or  GPU offloading.
4. Replace some magic values, e.g., initial delta.
5. Eliminate some unnecessary computation by initializing and
   updating only values that actually need initializing and
   updating.
6. Use more compact conditional logic.
7. Use a real kind specificaiton based on precision requirements
   rather than the ambiguously defined double-precision
   representation of any one given compiler on any one given
   platform.
8. Eliminate unnecessary precision specifiers ("_dp") when an
   expression is simply a constant that is exactly representable
   in any reasonable floating-point representation.
9. Separate the procedure interface into a module and the
   procedure definition into a submodule.**
10. Use a more descriptive variable names in a few places.
11. Add comments! (includeing some in FORD style)

** In larger projects, item 9 makes code much more user-friendly
by isolation the application programmer interface (API) and
reduces compile times by eliminating unnecessary compilation
cascades.
This code is old-fashioned, verbose, and most importantly, less
clealry expressive, particularly in terms of exposing optimization
opportunties, which makes the file name a misnomer.  By contrast,
commit a7ccf54 introduces a modern.f90 refactored to expose
concurrency more explicitly through the use of `do concurrent`
and array statements, both of which express sets of operations
that can be safely executed in any order, opening up opportunities
for vectorization, multithreading, and GPU offloading.

In timings on a 2.3 GHz 8-Core Intel Core i9, modern.f90 executes
in roughly the same time -- possibly 5-10% longer based on informal
ad hoc testing -- but is considerably more compact in many places,
is arguably more clear and expressive.
poisson2d/modern.f90 Outdated Show resolved Hide resolved
poisson2d/modern.f90 Outdated Show resolved Hide resolved
poisson2d/modern.f90 Outdated Show resolved Hide resolved
@certik
Copy link
Member

certik commented Jun 29, 2021

Thank you for this! Looks nice.

Eliminate unnecessary precision specifiers (_dp) when an expression is simply a constant that is exactly representable in any reasonable floating-point representation.

I see --- I think you are taking advantage of this here, so you wouldn't need my suggested changes above? But is that really true? I know that 1.0 is exactly representable, but I thought the compilers will still only treat it as single precision when assigning and not set the higher bits. But maybe not. But 0.01 I think is not exactly representable.

rouson and others added 2 commits June 28, 2021 21:53
Either is fine. I'm a minimalist so I use the first one because the `_dp` isn't required when the real literal is exactly representable in the chosen floating-post representation, which presumably is the case for `1.` in any floating-point scheme.

Co-authored-by: Ondřej Čertík <[email protected]>
I concur that `_dp` is required in this one.

Co-authored-by: Ondřej Čertík <[email protected]>
@certik
Copy link
Member

certik commented Jun 29, 2021

I see. I am a minimalist too, that's why I just use rho = 1.

Copy link
Member

@certik certik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is +1 from me. @milancurcic would you mind also please reviewing this?

@rouson
Copy link
Author

rouson commented Jun 29, 2021

@certik good points. I think I just accepted your changes, but I'm not sure. If so, feel free to merge this or let me know if you want me to merge it. If I didn't accept your changes correctly, let me know and I'll see if I can fix it.

Co-authored-by: Ondřej Čertík <[email protected]>
poisson2d/modern.f90 Outdated Show resolved Hide resolved
interface
pure real(dp) module function rho(x,y)
!! Poisson equation inhomogeneous term.
implicit none
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question for my knowledge: why is there an implicit none here? Is the implicit none in the module not covering it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is a common mistake that I also learned the hard way --- implicit none in a module propagates and applies to everything except interface, where you have to repeat it.... :(

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @certik for your answer. I will have to check all my codes ;)

@arunningcroc
Copy link
Member

I don't know how to suggest changes to files that weren't changed in the pull request, so I'll just write here. Since you renamed one of the Fortran files, can you also rename the corresponding row in the table in README.md?

Also, you could add a row to the table, with the following numbers:

M=100: 0.43 M=200: 8.24 M=300: 38.39

which are the numbers I got on my machine with the same settings as the other files (-Ofast, WSL2, same CPU, etc.)

@certik
Copy link
Member

certik commented Jun 29, 2021

Ah, I didn't realize the file got renamed from optimized to archaic. @rouson, do you have a better name? The term archaic suggests (to me) that that style should not be used anymore, but I actually like that style, that's how I write most of my codes. I don't know if you like any of the terms such as simple, traditional, plain, direct, ... I don't want to pass judgement yet, I am hoping to first get all the examples up with lots of versions and later I think it will be easier to discuss the different styles and to agree (or agree to disagree) on the recommended "idiomatic style(s)". We can also just number the different Fortran versions/styles.

(To rephrase what I am trying to achieve: I absolutely want to have such a discussion about how modern Fortran codes should be written and arrive at more opinionated community agreed upon style(s). But in order to have a good discussion about this, I am hoping to have a few dozen examples, each with different styles. And only then have that discussion. I am afraid the discussion would not be productive if we do it now.)

@rouson
Copy link
Author

rouson commented Jun 29, 2021

@certik I like your suggestion to rename archaic.f90 to traditional.f90. I'll push a new commit that makes that change. On a related note, you can think of a better name for modern.f90, please let me know. There can of course be many different modern approaches. I chose modern.f90 for two reasons: (1) it uses features from more recent standards and (2) I'm hoping more Fortran programming moves in this direction for several reasons.

poisson2d/modern.f90 Outdated Show resolved Hide resolved
Comment on lines 69 to 70
phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4._dp &
+ (dx/2._dp)*(dy/2._dp)/epsilon0*rho_sampled(i,j)
Copy link
Member

@milancurcic milancurcic Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The _dp suffix for a few literal constants here is unnecessary (let the compiler coerce to correct precision). And, if you wanted to switch to single precision, you'd only need to change it in the declaration of phi and other arrays, and not fiddle with literal constants' precisions here.

Suggested change
phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4._dp &
+ (dx/2._dp)*(dy/2._dp)/epsilon0*rho_sampled(i,j)
phi_prime(i,j) = (phi(i+1,j) + phi(i-1,j) + phi(i,j+1) + phi(i,j-1))/4 &
+ (dx/2)*(dy/2)/epsilon0*rho_sampled(i,j)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know the rules on expression evaluation well. I was guessing that the _dp is required to avoid getting a default-precision result for the division. I'm fine with changing it. but GitHub is marking this review comment as outdated so I think I'll have to make the change locally and push it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am a fan of concise code, so I agree with Milan. :)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the above cases where 2 is an integer it doesn't matter. However when you are dealing with 0.213535 then things gets interesting since it first gets interpreted as a real(single), then gets cast to a real(double) which is not what you want. :)

But here 2 is sufficient int -> double is perfect representation.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I recommend to always write floating point numbers as 0.5_dp, whether or not they can be represented exactly. That way one cannot make a mistake and it's a simple rule to follow. Floating point divided by an integer is guaranteed to be a floating point, so there I prefer to just use an integer 2 instead of 2.0_dp.

poisson2d/modern.f90 Outdated Show resolved Hide resolved
@milancurcic
Copy link
Member

If you like the whole-array form over the do concurrent one, you could rename the sources as "archaic" -> "imperative" and "modern" -> "array-oriented". I think the style-based naming communicates more than the era-based naming. Just an idea.

@milancurcic
Copy link
Member

I don't know why yet, but modern.f90 in this PR, using gfortran-9.3.0 converges only for -O[01], but does not converge for -O{2,3,fast}. I think we need to fix it before merging.

Copy link
Member

@awvwgk awvwgk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately, the code fails to compile with two of four compilers I tested (gfortran 11.1 and nvfortran 21.5 are okay, while nagfor 7.0 and ifort 2021.2 fail). At least the build with ifort seems mendable by putting the submodule in a separate file.

@arunningcroc
Copy link
Member

arunningcroc commented Jun 29, 2021

I don't know why yet, but modern.f90 in this PR, using gfortran-9.3.0 converges only for -O[01], but does not converge for -O{2,3,fast}. I think we need to fix it before merging.

I seem to be able to converge it for M=100, 200, 300 with -Ofast, or at least it outputs a number of iterations in line with the C code (and Cython/python where available) using gfortran-9.3.0

@milancurcic
Copy link
Member

I apologize, I gave incomplete information. I don't converge for M=50 (I made it small for shorter tests). Do the method and the problem require a minimum value of M for the method to converge?

@arunningcroc
Copy link
Member

I apologize, I gave incomplete information. I don't converge for M=50 (I made it small for shorter tests). Do the method and the problem require a minimum value of M for the method to converge?

I get the same problem for M=50. I'm not sure if there's any general proof for what kind of grid is too small, but the other Fortran codes seem to work. So clearly the modern code is somehow generating a different code. The basic logic of it seems to be the same, though, and works for bigger M, so I wonder if the compiler isn't doing some wonky optimization because of the modern features that is simply buggy?

rouson and others added 2 commits June 29, 2021 10:40
Co-authored-by: Milan Curcic <[email protected]>
Co-authored-by: Ondřej Čertík <[email protected]>
Co-authored-by: Milan Curcic <[email protected]>
@rouson
Copy link
Author

rouson commented Jun 29, 2021

When compiled with gfortran (installed via Homebrew on macOS Big Sur) 11.1.0 with no optimization , the code converges for me for all the M values that make sense: M>2. When compiled with gfortran -Ofast, the code converges for M>90.

I also tried the NAG compiler version 7.0, Build 7044, and found a compiler bug that I just reported to NAG. I'm pasting the bug report below.

I'm not sure what to do from here. Before refactoring code, it's always safest to have autoamated tests in place. I hope that any code contributed to this repository will include tests, which will probably need to be internal to the code, given the difficulty of setting up a multi-language test harness that covers so many languages.

Bug Report

% cat interface-not-found.f90 

module f_interface
contains
  pure real function f()
    f = 1.
  end function
end module 

  use f_interface
  implicit none

  integer i
  real f_sampled(1)
  
  associate(dy => 0.) 
    do concurrent(i=1:1)
      f_sampled(i) = f()
    end do
  end associate
end

% nagfor interface-not-found.f90 
NAG Fortran Compiler Release 7.0(Yurakucho) Build 7044
Error: interface-not-found.f90, line 19: Implicit type for F
Questionable: interface-not-found.f90, line 19: Variable F_SAMPLED set but never referenced
[NAG Fortran Compiler pass 1 error termination, 1 error, 1 warning]

@arunningcroc
Copy link
Member

arunningcroc commented Jun 29, 2021

I've managed to isolate the problem. In a puzzling turn of events, it appears gfortran doesn't like the rho function and the submodule for some reason. Removing both the rewritten if-else as well as the submodule seems to fix this problem (but neither one on their own fixes it). That is, just use the old rho function in a module instead of the submodule, and for some reason the code works again.

Edit: but also, simply putting rho in a submodule and then calling it from a test program still doesn't cause any issues, and the code executes correctly. The loop seems to be required for the convergence issue to arise.

@rouson
Copy link
Author

rouson commented Jun 29, 2021

@arunningcroc could you either insert code edits on GitHub or submit a new pull request with working code? Most importantly, if you were the original author (or simply if you have thoughts on how to do so), could you include some sort of results-verification code? Possibly the best thing might be for each of the Fortran versions to be refactored into being functions instead of main programs. Assuming someone has verified that some version produces the desire solution, that version could be the reference version of the code. Then there can be just one Fortran main program that calls each version with cpu_time calls in-between for timings. The result produced by each refactored version can then be checked against the result of the reference version.

@certik if you don't mind introducing object-oriented programming, @everythingfunctional and I have developed a design pattern that captures everything I just wrote and specifies requirements in the form of deferred bindings on an abstract type we call the Oracle pattern, inspired by the common term "test oracle." It's actually a special case of a Template Method pattern. In addition to saving time explaining the same concepts in English, requiring contributors extend an oracle abstract derived type enforces the project requirements formally and operational and ensures uniformity across the project.

@certik
Copy link
Member

certik commented Jun 29, 2021

@certik if you don't mind introducing object-oriented programming,

Absolutely. Please submit another PR with the OO code!

@arunningcroc
Copy link
Member

@rouson
@arunningcroc could you either insert code edits on GitHub or submit a new pull request with working code? Most importantly, if you were the original author (or simply if you have thoughts on how to do so), could you include some sort of results-verification code? Possibly the best thing might be for each of the Fortran versions to be refactored into being functions instead of main programs. Assuming someone has verified that some version produces the desire solution, that version could be the reference version of the code. Then there can be just one Fortran main program that calls each version with cpu_time calls in-between for timings. The result produced by each refactored version can then be checked against the result of the reference version.

Yes, I'll try to get this done this afternoon. Also, I think I should probably write some sort of problem description pdf, explaining the theory and the numerical method in detail, so that anyone may easily implement it in their language of choice.

For the test cases, I know that the Python code should produce correct results, seeing as it is lifted almost wholesale from a computational physics textbook. However, there are also known analytical solutions to 2D Poisson and Laplace equations, and these might be valuable for testing as well. Presumably finding the correct solution to a few such cases will be a good guarantee that the code works as expected.

If I separate the Fortran code in to modules/functions and call them all from a single main program, should I do the same for the Python, C and Cython codes as well, so that each language is called from some file like "main.f90", "main.py", etc? Then anyone who wants to see all the timings of a particular language could always just execute the main file for that language, and one could imagine adding some bash/powershell script to run all of them consequentially.

@rouson
Copy link
Author

rouson commented Jun 30, 2021

@arunningcroc that all sounds great. The C code could be called by the Fortran main program via C bindings. I've heard that other languages also support C bindings so it might also be possible to call other languages from one Fortran main program.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants