From 0f5ea94763c05e4d4b7689818bacdd76f2614933 Mon Sep 17 00:00:00 2001 From: Morten Hjorth-Jensen Date: Fri, 22 Mar 2024 13:01:59 +0100 Subject: [PATCH] Update week10.ipynb --- doc/pub/week10/ipynb/week10.ipynb | 1256 ++++++++--------------------- 1 file changed, 328 insertions(+), 928 deletions(-) diff --git a/doc/pub/week10/ipynb/week10.ipynb b/doc/pub/week10/ipynb/week10.ipynb index f9ca718d..414c550c 100644 --- a/doc/pub/week10/ipynb/week10.ipynb +++ b/doc/pub/week10/ipynb/week10.ipynb @@ -3,9 +3,7 @@ { "cell_type": "markdown", "id": "4e09aa18", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "\n", @@ -15,9 +13,7 @@ { "cell_type": "markdown", "id": "412f3206", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "# March 18-22: Optimization and Parallelization with MPI and OpenMP\n", "**Morten Hjorth-Jensen Email morten.hjorth-jensen@fys.uio.no**, Department of Physics and Center for Computing in Science Education, University of Oslo, Oslo, Norway and Department of Physics and Astronomy and Facility for Rare Isotope Beams, Michigan State University, East Lansing, Michigan, USA\n", @@ -28,9 +24,7 @@ { "cell_type": "markdown", "id": "93bb21e5", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Overview of the week March 18-22, 2024\n", "**Topics.**\n", @@ -52,9 +46,7 @@ { "cell_type": "markdown", "id": "805dcdc9", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Alternatives for project 2\n", "1. 
Fermion VMC, continuation of project 1\n", @@ -73,9 +65,7 @@ { "cell_type": "markdown", "id": "436531d4", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Content\n", "* Simple compiler options \n", @@ -98,9 +88,7 @@ { "cell_type": "markdown", "id": "30eeb8d8", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Optimization and profiling\n", "\n", @@ -111,9 +99,7 @@ { "cell_type": "markdown", "id": "9ff09f9f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " c++ -c mycode.cpp\n", " c++ -o mycode.exe mycode.o\n" @@ -122,9 +108,7 @@ { "cell_type": "markdown", "id": "84db6a69", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "For Fortran replace with for example **gfortran** or **ifort**.\n", "This is what we call a flat compiler option and should be used when we develop the code.\n", @@ -138,9 +122,7 @@ { "cell_type": "markdown", "id": "d4b00414", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " man c++\n" ] @@ -148,9 +130,7 @@ { "cell_type": "markdown", "id": "5555e17e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## More on optimization\n", "We have additional compiler options for optimization. These may include procedure inlining where \n", @@ -162,9 +142,7 @@ { "cell_type": "markdown", "id": "d0128d64", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " c++ -O3 -c mycode.cpp\n", " c++ -O3 -o mycode.exe mycode.o\n" @@ -173,9 +151,7 @@ { "cell_type": "markdown", "id": "1b8ff090", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "This (other options are -O2 or -Ofast) is the recommended option." 
] @@ -183,9 +159,7 @@ { "cell_type": "markdown", "id": "e648a452", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Optimization and profiling\n", "It is also useful to profile your program under the development stage.\n", @@ -195,9 +169,7 @@ { "cell_type": "markdown", "id": "4be15364", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " c++ -pg -O3 -c mycode.cpp\n", " c++ -pg -O3 -o mycode.exe mycode.o\n" @@ -206,9 +178,7 @@ { "cell_type": "markdown", "id": "00ca8da4", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "After you have run the code you can obtain the profiling information via" ] @@ -216,9 +186,7 @@ { "cell_type": "markdown", "id": "f6aa648c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " gprof mycode.exe > ProfileOutput\n" ] @@ -226,9 +194,7 @@ { "cell_type": "markdown", "id": "213f9f7b", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "When you have profiled properly your code, you must take out this option as it \n", "slows down performance.\n", @@ -238,9 +204,7 @@ { "cell_type": "markdown", "id": "0170fcbc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Optimization and debugging\n", "Adding debugging options is a very useful alternative under the development stage of a program.\n", @@ -250,9 +214,7 @@ { "cell_type": "markdown", "id": "7a453247", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " c++ -g -O0 -c mycode.cpp\n", " c++ -g -O0 -o mycode.exe mycode.o\n" @@ -261,9 +223,7 @@ { "cell_type": "markdown", "id": "1398bec5", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "This option generates debugging information allowing you to trace for example if an array is properly allocated. 
Some compilers work best with the no optimization option **-O0**.\n", "\n", @@ -276,9 +236,7 @@ { "cell_type": "markdown", "id": "f5b174ae", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Other hints\n", "In general, irrespective of compiler options, it is useful to\n", @@ -292,9 +250,7 @@ { "cell_type": "markdown", "id": "5596fa3a", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " k = n-1;\n", " for (i = 0; i < n; i++){\n", @@ -306,9 +262,7 @@ { "cell_type": "markdown", "id": "12909f4c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "A better code is" ] @@ -316,9 +270,7 @@ { "cell_type": "markdown", "id": "603b0ee0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " temp = c*d;\n", " for (i = 0; i < n; i++){\n", @@ -330,9 +282,7 @@ { "cell_type": "markdown", "id": "28be21af", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "Here we avoid a repeated multiplication inside a loop. \n", "Most compilers, depending on compiler flags, identify and optimize such bottlenecks on their own, without requiring any particular action by the programmer. However, it is always useful to single out and avoid code examples like the first one discussed here." @@ -341,9 +291,7 @@ { "cell_type": "markdown", "id": "b0dbe1ab", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Vectorization and the basic idea behind parallel computing\n", "Present CPUs are highly parallel processors with varying levels of parallelism. 
The typical situation can be described via the following three statements.\n", @@ -359,9 +307,7 @@ { "cell_type": "markdown", "id": "d38114e2", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## A rough classification of hardware models\n", "\n", @@ -375,9 +321,7 @@ { "cell_type": "markdown", "id": "35eef157", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Shared memory and distributed memory\n", "One way of categorizing modern parallel computers is to look at the memory configuration.\n", @@ -391,9 +335,7 @@ { "cell_type": "markdown", "id": "e004bf61", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Different parallel programming paradigms\n", "\n", @@ -405,9 +347,7 @@ { "cell_type": "markdown", "id": "9ed41b29", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Different parallel programming paradigms\n", "\n", @@ -419,9 +359,7 @@ { "cell_type": "markdown", "id": "426a62a2", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## What is vectorization?\n", "Vectorization is a special\n", @@ -438,9 +376,7 @@ { "cell_type": "markdown", "id": "dd666c54", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (i = 0; i < n; i++){\n", " a[i] = b[i] + c[i];\n", @@ -450,9 +386,7 @@ { "cell_type": "markdown", "id": "09970af6", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "If the code is not vectorized, the compiler will simply start with the first element and \n", "then perform subsequent additions operating on one address in memory at the time." 
@@ -461,9 +395,7 @@ { "cell_type": "markdown", "id": "88c052c8", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Number of elements that can be acted upon\n", "A SIMD instruction can operate on multiple data elements in one single instruction.\n", @@ -490,9 +422,7 @@ { "cell_type": "markdown", "id": "12bc6dec", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Number of elements that can be acted upon, examples\n", "We start with the simple scalar operations given by" @@ -501,9 +431,7 @@ { "cell_type": "markdown", "id": "e0b0aed2", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (i = 0; i < n; i++){\n", " a[i] = b[i] + c[i];\n", @@ -513,9 +441,7 @@ { "cell_type": "markdown", "id": "0253b89e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "If the code is not vectorized and we have a 128-bit register to store a 32-bit floating point number,\n", "it means that we have $3\\times 32$ bits that are not used. \n", @@ -526,9 +452,7 @@ { "cell_type": "markdown", "id": "d5ab01bb", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Operation counts for scalar operation\n", "The code" @@ -537,9 +461,7 @@ { "cell_type": "markdown", "id": "f1c4eb73", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (i = 0; i < n; i++){\n", " a[i] = b[i] + c[i];\n", @@ -549,9 +471,7 @@ { "cell_type": "markdown", "id": "4a244c6e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "has for $n$ repeats\n", "1. 
one load for $c[i]$ in address 1\n", @@ -566,9 +486,7 @@ { "cell_type": "markdown", "id": "929879a7", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Number of elements that can be acted upon, examples\n", "If we vectorize the code, we can perform, with a 128-bit register, four simultaneous operations, that is\n", @@ -578,9 +496,7 @@ { "cell_type": "markdown", "id": "8d020287", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (i = 0; i < n; i+=4){\n", " a[i] = b[i] + c[i];\n", @@ -593,9 +509,7 @@ { "cell_type": "markdown", "id": "5a1d7c82", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "Four additions are now done in a single step." ] @@ -603,9 +517,7 @@ { "cell_type": "markdown", "id": "79e9991d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Number of operations when vectorized\n", "For $n/4$ repeats assuming floats or integers\n", @@ -621,9 +533,7 @@ { "cell_type": "markdown", "id": "c71b0b43", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [A simple test case with and without vectorization](https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program7.cpp)\n", "We implement these operations in a simple C++ program that computes at the end the norm of a vector." 
@@ -632,9 +542,7 @@ { "cell_type": "markdown", "id": "99f7abae", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n", " #include \n", @@ -691,9 +599,7 @@ { "cell_type": "markdown", "id": "e76a53a1", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Compiling with and without vectorization\n", "We can compile and link without vectorization using the clang C++ compiler" @@ -702,9 +608,7 @@ { "cell_type": "markdown", "id": "b4cae8a8", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " clang++ -o novec.x vecexample.cpp\n" ] @@ -712,9 +616,7 @@ { "cell_type": "markdown", "id": "0a260598", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and with vectorization (and additional optimizations)" ] @@ -722,9 +624,7 @@ { "cell_type": "markdown", "id": "80bb119a", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp \n" ] @@ -732,9 +632,7 @@ { "cell_type": "markdown", "id": "51bdd37c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "The speedup depends on the size of the vectors. In the example here we have run with $10^7$ elements.\n", "The example here was run on an IMac17.1 with OSX El Capitan (10.11.4) as operating system and an Intel i5 3.3 GHz CPU." @@ -743,9 +641,7 @@ { "cell_type": "markdown", "id": "c5ced92e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " Compphys:~ hjensen$ ./vec.x 10000000\n", " Time used for norm computation=0.04720500000\n", @@ -756,9 +652,7 @@ { "cell_type": "markdown", "id": "d04f9072", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "This particular C++ compiler speeds up the above loop operations by a factor of 1.5.\n", "Performing the same operations for $10^9$ elements results in a smaller speedup since reading from main memory is required. The non-vectorized code is seemingly faster." 
@@ -767,9 +661,7 @@ { "cell_type": "markdown", "id": "e7195a54", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " Compphys:~ hjensen$ ./vec.x 1000000000\n", " Time used for norm computation=58.41391100\n", @@ -780,9 +672,7 @@ { "cell_type": "markdown", "id": "0dc46387", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "We will discuss these issues further in the next slides." ] @@ -790,9 +680,7 @@ { "cell_type": "markdown", "id": "53828efb", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Compiling with and without vectorization using clang\n", "We can compile and link without vectorization with the clang compiler" @@ -801,9 +689,7 @@ { "cell_type": "markdown", "id": "636e5084", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " clang++ -fno-vectorize -o novec.x vecexample.cpp\n" ] @@ -811,9 +697,7 @@ { "cell_type": "markdown", "id": "c1a74833", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and with vectorization" ] @@ -821,9 +705,7 @@ { "cell_type": "markdown", "id": "8b53cdca", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " clang++ -O3 -Rpass=loop-vectorize -o vec.x vecexample.cpp \n" ] @@ -831,9 +713,7 @@ { "cell_type": "markdown", "id": "e5a73e92", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "We can also add vectorization analysis, see for example" ] @@ -841,9 +721,7 @@ { "cell_type": "markdown", "id": "c1b48043", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " clang++ -O3 -Rpass-analysis=loop-vectorize -o vec.x vecexample.cpp \n" ] @@ -851,9 +729,7 @@ { "cell_type": "markdown", "id": "222eda16", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "or figure out if vectorization was missed" ] @@ -861,9 +737,7 @@ { "cell_type": "markdown", "id": "f136c7dc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " clang++ -O3 
-Rpass-missed=loop-vectorize -o vec.x vecexample.cpp \n" ] @@ -871,9 +745,7 @@ { "cell_type": "markdown", "id": "f88fc827", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, criteria\n", "\n", @@ -885,9 +757,7 @@ { "cell_type": "markdown", "id": "0eee3124", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (int j = 0; j < n; j++) {\n", " a[j] = cos(j*1.0);\n", @@ -897,9 +767,7 @@ { "cell_type": "markdown", "id": "fa513fe8", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "The variable $n$ does not need to be known at compile time. However, this variable must stay the same for the entire duration of the loop. It implies that an exit statement inside the loop cannot be data dependent." ] @@ -907,9 +775,7 @@ { "cell_type": "markdown", "id": "8f93381d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, exit criteria\n", "\n", @@ -921,9 +787,7 @@ { "cell_type": "markdown", "id": "48b6cdfa", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (int j = 0; j < n; j++) {\n", " a[j] = cos(j*1.0);\n", @@ -934,9 +798,7 @@ { "cell_type": "markdown", "id": "13c853d0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "Avoid loop termination conditions and opt for a single entry loop variable $n$. The lower and upper bounds have to be kept fixed within the loop." 
] @@ -944,9 +806,7 @@ { "cell_type": "markdown", "id": "10544a27", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, straight-line code\n", "\n", @@ -958,9 +818,7 @@ { "cell_type": "markdown", "id": "31ae931f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (int j = 0; j < n; j++) {\n", " double x = cos(j*1.0);\n", @@ -976,9 +834,7 @@ { "cell_type": "markdown", "id": "bcb1cde0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "These operations can be performed for all data elements but only those elements which the mask evaluates as true are stored. In general, one should avoid branches such as **switch**, **go to**, or **return** statements or **if** constructs that cannot be treated as masked assignments." ] @@ -986,9 +842,7 @@ { "cell_type": "markdown", "id": "4be96ac3", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, nested loops\n", "\n", @@ -998,9 +852,7 @@ { "cell_type": "markdown", "id": "ad21e609", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (int i = 0; i < n; i++) {\n", " for (int j = 0; j < n; j++) {\n", @@ -1012,9 +864,7 @@ { "cell_type": "markdown", "id": "d7824ee5", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "The exception is if an original outer loop is transformed into an inner loop as the result of compiler optimizations." 
] @@ -1022,9 +872,7 @@ { "cell_type": "markdown", "id": "c25634af", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, function calls\n", "\n", @@ -1036,9 +884,7 @@ { "cell_type": "markdown", "id": "88445657", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (int i = 0; i < n; i++) {\n", " a[i] = log10(i)*cos(i);\n", @@ -1048,9 +894,7 @@ { "cell_type": "markdown", "id": "cc3d4d6f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "Similarly, **inline** functions defined by the programmer, allow for vectorization since the function statements are glued into the actual place where the function is called." ] @@ -1058,9 +902,7 @@ { "cell_type": "markdown", "id": "09e39571", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, data dependencies\n", "\n", @@ -1071,9 +913,7 @@ { "cell_type": "markdown", "id": "0fed6a0b", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " double b = 15.;\n", " for (int i = 1; i < n; i++) {\n", @@ -1084,9 +924,7 @@ { "cell_type": "markdown", "id": "e0fd7e91", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "is an example of flow dependency and results in wrong numerical results if vectorized. For a scalar operation, the value $a[i-1]$ computed during the iteration is loaded into the right-hand side and the results are fine. In vector mode however, with a vector length of four, the values $a[0]$, $a[1]$, $a[2]$ and $a[3]$ from the previous loop will be loaded into the right-hand side and produce wrong results. 
That is, we have" ] @@ -1094,9 +932,7 @@ { "cell_type": "markdown", "id": "c1c90069", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " a[1] = a[0] + b;\n", " a[2] = a[1] + b;\n", @@ -1107,9 +943,7 @@ { "cell_type": "markdown", "id": "f97a137c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and if the first two iterations are executed at the same time by the SIMD instruction, the value of say $a[1]$ could be used by the second iteration before it has been calculated by the first iteration, thereby leading to wrong results." ] @@ -1117,9 +951,7 @@ { "cell_type": "markdown", "id": "572312c2", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, more data dependencies\n", "\n", @@ -1130,9 +962,7 @@ { "cell_type": "markdown", "id": "b001d418", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " double b = 15.;\n", " for (int i = 1; i < n; i++) {\n", @@ -1143,9 +973,7 @@ { "cell_type": "markdown", "id": "318116de", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "is an example of flow dependency that can be vectorized since no iteration with a higher value of $i$\n", "can complete before an iteration with a lower value of $i$. However, such code leads to problems with parallelization." 
@@ -1154,9 +982,7 @@ { "cell_type": "markdown", "id": "b1b7fde6", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Automatic vectorization and vectorization inhibitors, memory stride\n", "\n", @@ -1168,9 +994,7 @@ { "cell_type": "markdown", "id": "cf475fc8", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " for (int i = 0; i < n; i++) {\n", " for (int j = 0; j < n; j++) {\n", @@ -1182,9 +1006,7 @@ { "cell_type": "markdown", "id": "7077e205", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Memory management\n", "The main memory contains the program data\n", @@ -1210,9 +1032,7 @@ { "cell_type": "markdown", "id": "56042e32", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Memory and communication\n", "\n", @@ -1230,9 +1050,7 @@ { "cell_type": "markdown", "id": "ceeac209", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Measuring performance\n", "\n", @@ -1242,9 +1060,7 @@ { "cell_type": "markdown", "id": "12f14dd3", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " clock_t start, finish;\n", " start = clock();\n", @@ -1258,9 +1074,7 @@ { "cell_type": "markdown", "id": "86e68710", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Problems with measuring time\n", "1. Timers are not infinitely accurate\n", @@ -1277,9 +1091,7 @@ { "cell_type": "markdown", "id": "254b47e0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Problems with cold start\n", "\n", @@ -1297,9 +1109,7 @@ { "cell_type": "markdown", "id": "7c386419", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Problems with smart compilers\n", "\n", @@ -1315,9 +1125,7 @@ { "cell_type": "markdown", "id": "f7ff7326", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Problems with interference\n", "1. 
Other activities are sharing your processor\n", @@ -1336,9 +1144,7 @@ { "cell_type": "markdown", "id": "1cc12840", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Problems with measuring performance\n", "1. Accurate, reproducible performance measurement is hard\n", @@ -1353,9 +1159,7 @@ { "cell_type": "markdown", "id": "66754866", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Thomas algorithm for tridiagonal linear algebra equations" ] @@ -1363,9 +1167,7 @@ { "cell_type": "markdown", "id": "134fcdf2", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "\\left( \\begin{array}{ccccc}\n", @@ -1394,9 +1196,7 @@ { "cell_type": "markdown", "id": "6e912b42", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Thomas algorithm, forward substitution\n", "The first step is to multiply the first row by $a_0/b_0$ and subtract it from the second row. This is known as the forward substitution step. We then obtain" @@ -1405,9 +1205,7 @@ { "cell_type": "markdown", "id": "bead360b", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "a_i = 0,\n", @@ -1417,9 +1215,7 @@ { "cell_type": "markdown", "id": "500ad7db", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "b_i = b_i - \\frac{a_{i-1}}{b_{i-1}}c_{i-1},\n", @@ -1429,9 +1225,7 @@ { "cell_type": "markdown", "id": "ec0bf988", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and" ] @@ -1439,9 +1233,7 @@ { "cell_type": "markdown", "id": "934e2d03", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "f_i = f_i - \\frac{a_{i-1}}{b_{i-1}}f_{i-1}.\n", @@ -1451,9 +1243,7 @@ { "cell_type": "markdown", "id": "a568ef79", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "At this point the simplified equation, with only an upper triangular matrix, takes the form" ] @@ -1461,9 +1251,7 @@ { "cell_type": "markdown", "id": 
"163cb6ff", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "\\left( \\begin{array}{ccccc}\n", @@ -1491,9 +1279,7 @@ { "cell_type": "markdown", "id": "17e7b86d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Thomas algorithm, backward substitution\n", "The next step is the backward substitution step. The last row is multiplied by $c_{N-3}/b_{N-2}$ and subtracted from the second to last row, thus eliminating $c_{N-3}$ from the second to last row. The general backward substitution procedure is" @@ -1502,9 +1288,7 @@ { "cell_type": "markdown", "id": "2363fd2a", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "c_i = 0,\n", @@ -1514,9 +1298,7 @@ { "cell_type": "markdown", "id": "0862b71c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and" ] @@ -1524,9 +1306,7 @@ { "cell_type": "markdown", "id": "859a1873", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "f_{i-1} = f_{i-1} - \\frac{c_{i-1}}{b_i}f_i\n", @@ -1536,9 +1316,7 @@ { "cell_type": "markdown", "id": "a54a9bda", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "All that remains to be computed is the solution, which is the very straightforward process of" ] @@ -1546,9 +1324,7 @@ { "cell_type": "markdown", "id": "496ced14", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "x_i = \\frac{f_i}{b_i}\n", @@ -1558,9 +1334,7 @@ { "cell_type": "markdown", "id": "edbaac8a", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Thomas algorithm and counting of operations (floating point and memory)\n", "\n", @@ -1580,9 +1354,7 @@ { "cell_type": "markdown", "id": "bb1cc369", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " // Forward substitution \n", " // Note that we can simplify by precalculating a[i-1]/b[i-1]\n", @@ -1601,9 +1373,7 @@ { "cell_type": "markdown", "id": "fb42c035", - "metadata": { - 
"editable": true - }, + "metadata": {}, "source": [ "## [Example: Transpose of a matrix](https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program8.cpp)" ] @@ -1611,9 +1381,7 @@ { "cell_type": "markdown", "id": "c9cfb890", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n", " #include \n", @@ -1668,9 +1436,7 @@ { "cell_type": "markdown", "id": "5bb563cc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [Matrix-matrix multiplication](https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/LecturePrograms/programs/Classes/cpp/program9.cpp)\n", "This is the matrix-matrix multiplication code with plain C++ memory allocation. It computes at the end the Frobenius norm." @@ -1679,9 +1445,7 @@ { "cell_type": "markdown", "id": "50514fd1", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n", " #include \n", @@ -1753,9 +1517,7 @@ { "cell_type": "markdown", "id": "e731c20f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## How do we define speedup? Simplest form\n", "* Speedup measures the ratio of performance between two objects\n", @@ -1772,9 +1534,7 @@ { "cell_type": "markdown", "id": "db46422c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## How do we define speedup? 
Correct baseline\n", "The key is choosing the correct baseline for comparison\n", @@ -1788,9 +1548,7 @@ { "cell_type": "markdown", "id": "4fa31944", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Parallel speedup\n", "For parallel applications, speedup is typically defined as\n", @@ -1803,9 +1561,7 @@ { "cell_type": "markdown", "id": "160fe438", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Speedup and memory\n", "The speedup on $p$ processors can\n", @@ -1821,9 +1577,7 @@ { "cell_type": "markdown", "id": "f0273e8e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Upper bounds on speedup\n", "Assume that almost all parts of a code are perfectly\n", @@ -1838,9 +1592,7 @@ { "cell_type": "markdown", "id": "0c71ec1b", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Amdahl's law\n", "On one processor we have" @@ -1849,9 +1601,7 @@ { "cell_type": "markdown", "id": "f8768017", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "T_1 = (1-f)W + fW = W\n", @@ -1861,9 +1611,7 @@ { "cell_type": "markdown", "id": "b9cdd972", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "On $p$ processors we have" ] @@ -1871,9 +1619,7 @@ { "cell_type": "markdown", "id": "765fffd0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "T_p = (1-f)W + \\frac{fW}{p},\n", @@ -1883,9 +1629,7 @@ { "cell_type": "markdown", "id": "c78cc68e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "resulting in a speedup of" ] @@ -1893,9 +1637,7 @@ { "cell_type": "markdown", "id": "7cf3afbb", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "\\frac{T_1}{T_p} = \\frac{W}{(1-f)W+fW/p}\n", @@ -1905,9 +1647,7 @@ { "cell_type": "markdown", "id": "0e03f343", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "As $p$ goes to infinity, $fW/p$ goes to zero, and the 
maximum speedup is" ] @@ -1915,9 +1655,7 @@ { "cell_type": "markdown", "id": "0fd403f1", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "\\frac{1}{1-f},\n", @@ -1927,9 +1665,7 @@ { "cell_type": "markdown", "id": "3e724c74", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "meaning that \n", "if $f = 0.99$ (all but $1\\%$ parallelizable), the maximum speedup\n", @@ -1939,9 +1675,7 @@ { "cell_type": "markdown", "id": "6c0eb811", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## How much is parallelizable\n", "If any non-parallel code slips into the\n", @@ -1955,9 +1689,7 @@ { "cell_type": "markdown", "id": "7a798569", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Today's situation of parallel computing\n", "\n", @@ -1973,9 +1705,7 @@ { "cell_type": "markdown", "id": "9bc86e86", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Overhead present in parallel computing\n", "\n", @@ -1994,9 +1724,7 @@ { "cell_type": "markdown", "id": "ec41b9cc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Parallelizing a sequential algorithm\n", "\n", @@ -2008,9 +1736,7 @@ { "cell_type": "markdown", "id": "d21f2ecf", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Strategies\n", "* Develop codes locally, run with a few processes and test your codes. Do benchmarking, timing and so forth on local nodes, for example your laptop or PC. \n", @@ -2021,9 +1747,7 @@ { "cell_type": "markdown", "id": "16f91552", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## How do I run MPI on a PC/Laptop? MPI\n", "Installing MPI is rather easy on hardware running Unix/Linux as operating system; simply follow the instructions from the [OpenMPI website](https://www.open-mpi.org/). 
See also subsequent slides.\n", @@ -2034,9 +1758,7 @@ { "cell_type": "markdown", "id": "fb249946", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " # Compile and link\n", " mpic++ -O3 -o nameofprog.x nameofprog.cpp\n", @@ -2047,9 +1769,7 @@ { "cell_type": "markdown", "id": "9abfee58", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Can I do it on my own PC/laptop? OpenMP installation\n", "If you wish to install MPI and OpenMP \n", @@ -2063,9 +1783,7 @@ { "cell_type": "markdown", "id": "c3a82ba8", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " brew install libomp\n" ] @@ -2073,9 +1791,7 @@ { "cell_type": "markdown", "id": "15fa2856", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and compile and link as" ] @@ -2083,9 +1799,7 @@ { "cell_type": "markdown", "id": "5091d16c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " c++ -o -lomp\n" ] @@ -2093,9 +1807,7 @@ { "cell_type": "markdown", "id": "84c6e059", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Installing MPI\n", "For linux/ubuntu users, you need to install two packages (alternatively use the synaptic package manager)" @@ -2104,9 +1816,7 @@ { "cell_type": "markdown", "id": "2c076927", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " sudo apt-get install libopenmpi-dev\n", " sudo apt-get install openmpi-bin\n" @@ -2115,9 +1825,7 @@ { "cell_type": "markdown", "id": "380349b2", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "For OS X users, install brew (after having installed xcode and gcc, needed for the \n", "gfortran compiler of openmpi) and then install with brew" @@ -2126,9 +1834,7 @@ { "cell_type": "markdown", "id": "66f4b160", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " brew install openmpi\n" ] @@ -2136,9 +1842,7 @@ { "cell_type": "markdown", "id": "ee914a99", - "metadata": { - 
"editable": true - }, + "metadata": {}, "source": [ "When running an executable (code.x), run as" ] @@ -2146,9 +1850,7 @@ { "cell_type": "markdown", "id": "69dd142d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " mpirun -n 10 ./code.x\n" ] @@ -2156,9 +1858,7 @@ { "cell_type": "markdown", "id": "6dd79aa0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "where we indicate that we want the number of processes to be 10." ] @@ -2166,9 +1866,7 @@ { "cell_type": "markdown", "id": "4b0be777", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Installing MPI and using Qt\n", "With openmpi installed, when using Qt, add to your .pro file the instructions [here](http://dragly.org/2012/03/14/developing-mpi-applications-in-qt-creator/)\n", @@ -2179,9 +1877,7 @@ { "cell_type": "markdown", "id": "47f9628e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## What is Message Passing Interface (MPI)?\n", "\n", @@ -2199,9 +1895,7 @@ { "cell_type": "markdown", "id": "5e572566", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Going Parallel with MPI\n", "**Task parallelism**: the work of a global problem can be divided\n", @@ -2215,9 +1909,7 @@ { "cell_type": "markdown", "id": "40f6f392", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " MPI_Command_name\n" ] @@ -2225,9 +1917,7 @@ { "cell_type": "markdown", "id": "65dc2a10", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and Fortran-binding (routine names are in uppercase, but can also be in lower case)" ] @@ -2235,9 +1925,7 @@ { "cell_type": "markdown", "id": "288fffcc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " MPI_COMMAND_NAME\n" ] @@ -2245,9 +1933,7 @@ { "cell_type": "markdown", "id": "1f90cffc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## MPI is a library\n", "MPI is a library specification for the message 
passing interface,\n", @@ -2268,9 +1954,7 @@ { "cell_type": "markdown", "id": "e34cf1af", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Bindings to MPI routines\n", "\n", @@ -2281,9 +1965,7 @@ { "cell_type": "markdown", "id": "36236bfc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " MPI_Command_name\n" ] @@ -2291,9 +1973,7 @@ { "cell_type": "markdown", "id": "dec0c5fe", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "and Fortran-binding (routine names are in uppercase, but can also be in lower case)" ] @@ -2301,9 +1981,7 @@ { "cell_type": "markdown", "id": "004b34b9", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " MPI_COMMAND_NAME\n" ] @@ -2311,9 +1989,7 @@ { "cell_type": "markdown", "id": "785531a4", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "The discussion in these slides focuses on the C++ binding." ] @@ -2321,9 +1997,7 @@ { "cell_type": "markdown", "id": "eb26e014", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Communicator\n", "* A group of MPI processes with a name (context).\n", @@ -2336,9 +2010,7 @@ { "cell_type": "markdown", "id": "0e318b0e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " MPI_COMM_WORLD \n" ] @@ -2346,9 +2018,7 @@ { "cell_type": "markdown", "id": "58912dc9", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* Mechanism to identify subset of processes.\n", "\n", @@ -2358,9 +2028,7 @@ { "cell_type": "markdown", "id": "a2395f3d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Some of the most important MPI functions\n", "\n", @@ -2382,9 +2050,7 @@ { "cell_type": "markdown", "id": "d7c2bce6", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [The first MPI C/C++ 
program](https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program2.cpp)\n", "\n", @@ -2394,9 +2060,7 @@ { "cell_type": "markdown", "id": "319a505b", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " using namespace std;\n", " #include \n", @@ -2417,9 +2081,7 @@ { "cell_type": "markdown", "id": "7f5c9ed7", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## The Fortran program" ] @@ -2427,9 +2089,7 @@ { "cell_type": "markdown", "id": "aeec3c45", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " PROGRAM hello\n", " INCLUDE \"mpif.h\"\n", @@ -2447,9 +2107,7 @@ { "cell_type": "markdown", "id": "5e15fb41", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Note 1\n", "\n", @@ -2463,9 +2121,7 @@ { "cell_type": "markdown", "id": "42ff7898", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [Ordered output with MPIBarrier](https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program3.cpp)" ] @@ -2473,9 +2129,7 @@ { "cell_type": "markdown", "id": "2e99dabf", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " int main (int nargs, char* args[])\n", " {\n", @@ -2494,9 +2148,7 @@ { "cell_type": "markdown", "id": "c6b861fa", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Note 2\n", "* Here we have used the $MPI\\_Barrier$ function to ensure that that every process has completed its set of instructions in a particular order.\n", @@ -2513,9 +2165,7 @@ { "cell_type": "markdown", "id": "9146b2c3", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [Ordered output](https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program4.cpp)" ] @@ -2523,9 +2173,7 @@ { "cell_type": "markdown", "id": "701a0919", - 
"metadata": { - "editable": true - }, + "metadata": {}, "source": [ " .....\n", " int numprocs, my_rank, flag;\n", @@ -2547,9 +2195,7 @@ { "cell_type": "markdown", "id": "16de9d9d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Note 3\n", "\n", @@ -2560,9 +2206,7 @@ { "cell_type": "markdown", "id": "e7b0fc33", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " int MPI_Send(void *buf, int count, \n", " MPI_Datatype datatype, \n", @@ -2572,9 +2216,7 @@ { "cell_type": "markdown", "id": "895a5c9e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "This single command allows the passing of any kind of variable, even a large array, to any group of tasks. \n", "The variable **buf** is the variable we wish to send while **count**\n", @@ -2588,9 +2230,7 @@ { "cell_type": "markdown", "id": "d6c6f15f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Note 4\n", "\n", @@ -2601,9 +2241,7 @@ { "cell_type": "markdown", "id": "63989105", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " int MPI_Recv( void *buf, int count, MPI_Datatype datatype, \n", " int source, \n", @@ -2613,9 +2251,7 @@ { "cell_type": "markdown", "id": "d9b933dd", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "The arguments that are different from those in MPI\\_SEND are\n", "**buf** which is the name of the variable where you will be storing the received data, \n", @@ -2632,9 +2268,7 @@ { "cell_type": "markdown", "id": "8f12f852", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [Numerical integration in parallel](https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program6.cpp)\n", "**Integrating $\\pi$.**\n", @@ -2647,9 +2281,7 @@ { "cell_type": "markdown", "id": "e66212c0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "$$\n", "I=\\int_a^bf(x) dx\\approx 
h\\left(f(a)/2 + f(a+h) +f(a+2h)+\\dots +f(b-h)+ f(b)/2\\right).\n", @@ -2659,9 +2291,7 @@ { "cell_type": "markdown", "id": "fa251800", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "Click [on this link](https://github.com/CompPhysics/ComputationalPhysics2/blob/gh-pages/doc/Programs/LecturePrograms/programs/MPI/chapter07/program6.cpp) for the full program." ] @@ -2669,9 +2299,7 @@ { "cell_type": "markdown", "id": "9d240e35", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Dissection of trapezoidal rule with $MPI\\_reduce$" ] @@ -2679,9 +2307,7 @@ { "cell_type": "markdown", "id": "e0a2c3c4", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " // Trapezoidal rule and numerical integration usign MPI\n", " using namespace std;\n", @@ -2704,9 +2330,7 @@ { "cell_type": "markdown", "id": "5b4df5be", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Dissection of trapezoidal rule" ] @@ -2714,9 +2338,7 @@ { "cell_type": "markdown", "id": "5abef253", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " // MPI initializations\n", " MPI_Init (&nargs, &args);\n", @@ -2737,9 +2359,7 @@ { "cell_type": "markdown", "id": "56b097b7", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Integrating with **MPI**" ] @@ -2747,9 +2367,7 @@ { "cell_type": "markdown", "id": "adb8e118", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " total_sum = 0.0;\n", " local_sum = trapezoidal_rule(local_a, local_b, local_n, \n", @@ -2772,9 +2390,7 @@ { "cell_type": "markdown", "id": "733964b4", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## How do I use $MPI\\_reduce$?\n", "\n", @@ -2784,9 +2400,7 @@ { "cell_type": "markdown", "id": "d4b84936", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " MPI_reduce( void *senddata, void* resultdata, int count, \n", " MPI_Datatype datatype, MPI_Op, int root, 
MPI_Comm comm)\n" @@ -2795,9 +2409,7 @@ { "cell_type": "markdown", "id": "d521318f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "The two variables $senddata$ and $resultdata$ are obvious, besides the fact that one sends the address\n", "of the variable or the first element of an array. If they are arrays they need to have the same size. \n", @@ -2812,9 +2424,7 @@ { "cell_type": "markdown", "id": "841af2bf", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## More on $MPI\\_Reduce$\n", "In our case, since we are summing\n", @@ -2829,9 +2439,7 @@ { "cell_type": "markdown", "id": "e61ceb8c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " MPI_Allreduce( void *senddata, void* resultdata, int count, \n", " MPI_Datatype datatype, MPI_Op, MPI_Comm comm) \n" @@ -2840,9 +2448,7 @@ { "cell_type": "markdown", "id": "b982f6a3", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Dissection of trapezoidal rule\n", "\n", @@ -2853,9 +2459,7 @@ { "cell_type": "markdown", "id": "d86099fc", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " // this function defines the function to integrate\n", " double int_function(double x)\n", @@ -2869,9 +2473,7 @@ { "cell_type": "markdown", "id": "3fc023c7", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Dissection of trapezoidal rule" ] @@ -2879,9 +2481,7 @@ { "cell_type": "markdown", "id": "ddc06bee", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " // this function defines the trapezoidal rule\n", " double trapezoidal_rule(double a, double b, int n, \n", @@ -2906,9 +2506,7 @@ { "cell_type": "markdown", "id": "a10d70fa", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [The quantum dot program for two electrons](https://github.com/CompPhysics/ComputationalPhysics2/blob/master/doc/Programs/ParallelizationMPI/MPIvmcqdot.cpp)" ] @@ -2916,9 +2514,7 @@ { 
"cell_type": "markdown", "id": "b25cf48b", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " // Variational Monte Carlo for atoms with importance sampling, slater det\n", " // Test case for 2-electron quantum dot, no classes using Mersenne-Twister RNG\n", @@ -3382,9 +2978,7 @@ { "cell_type": "markdown", "id": "44203cc9", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## What is OpenMP\n", "* OpenMP provides high-level thread programming\n", @@ -3412,9 +3006,7 @@ { "cell_type": "markdown", "id": "36e5b24c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Getting started, things to remember\n", " * Remember the header file" @@ -3423,9 +3015,7 @@ { "cell_type": "markdown", "id": "9867f65d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n" ] @@ -3433,9 +3023,7 @@ { "cell_type": "markdown", "id": "8dd7425f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* Insert compiler directives in C++ syntax as" ] @@ -3443,9 +3031,7 @@ { "cell_type": "markdown", "id": "7689bd82", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp...\n" ] @@ -3453,9 +3039,7 @@ { "cell_type": "markdown", "id": "918398ae", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* Compile with for example *c++ -fopenmp code.cpp*\n", "\n", @@ -3469,9 +3053,7 @@ { "cell_type": "markdown", "id": "9cf48e9c", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## OpenMP syntax\n", "* Mostly directives" @@ -3480,9 +3062,7 @@ { "cell_type": "markdown", "id": "c664027f", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp construct [ clause ...]\n" ] @@ -3490,9 +3070,7 @@ { "cell_type": "markdown", "id": "d8629aa5", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* Some functions and types" ] @@ -3500,9 +3078,7 @@ { "cell_type": "markdown", "id": "0a8305d0", - 
"metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n" ] @@ -3510,9 +3086,7 @@ { "cell_type": "markdown", "id": "e22dd6c3", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* Most apply to a block of code\n", "\n", @@ -3524,9 +3098,7 @@ { "cell_type": "markdown", "id": "0e58f028", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Different OpenMP styles of parallelism\n", "OpenMP supports several different ways to specify thread parallelism\n", @@ -3545,9 +3117,7 @@ { "cell_type": "markdown", "id": "a1d07f29", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## General code structure" ] @@ -3555,9 +3125,7 @@ { "cell_type": "markdown", "id": "936d678d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n", " main ()\n", @@ -3583,9 +3151,7 @@ { "cell_type": "markdown", "id": "068be97e", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Parallel region\n", "* A parallel region is a block of code that is executed by a team of threads\n", @@ -3596,9 +3162,7 @@ { "cell_type": "markdown", "id": "7ce97e01", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel { ... }\n" ] @@ -3606,9 +3170,7 @@ { "cell_type": "markdown", "id": "3ec44770", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* Clauses can be added at the end of the directive\n", "\n", @@ -3624,9 +3186,7 @@ { "cell_type": "markdown", "id": "c855e394", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Hello world, not again, please!" 
] @@ -3634,9 +3194,7 @@ { "cell_type": "markdown", "id": "5c6a68f4", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n", " #include \n", @@ -3660,9 +3218,7 @@ { "cell_type": "markdown", "id": "e4fad1b0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Hello world, yet another variant" ] @@ -3670,9 +3226,7 @@ { "cell_type": "markdown", "id": "3c0516fb", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n", " #include \n", @@ -3692,9 +3246,7 @@ { "cell_type": "markdown", "id": "55b0e1f0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "Variables declared outside of the parallel region are shared by all threads\n", "If a variable like **id** is declared outside of the" @@ -3703,9 +3255,7 @@ { "cell_type": "markdown", "id": "e5a4912b", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel, \n" ] @@ -3713,9 +3263,7 @@ { "cell_type": "markdown", "id": "58c4b368", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "it would have been shared by various the threads, possibly causing erroneous output\n", " * Why? What would go wrong? Why do we add possibly?" 
@@ -3724,9 +3272,7 @@ { "cell_type": "markdown", "id": "3af5d943", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Important OpenMP library routines\n", "\n", @@ -3742,9 +3288,7 @@ { "cell_type": "markdown", "id": "601d3e58", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Private variables\n", "Private clause can be used to make thread- private versions of such variables:" @@ -3753,9 +3297,7 @@ { "cell_type": "markdown", "id": "3407c652", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel private(id)\n", " {\n", @@ -3767,9 +3309,7 @@ { "cell_type": "markdown", "id": "f1df3870", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* What is their value on entry? Exit?\n", "\n", @@ -3781,9 +3321,7 @@ { "cell_type": "markdown", "id": "fb155680", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Master region\n", "It is often useful to have only one thread execute some of the code in a parallel region. 
I/O statements are a common example" @@ -3792,9 +3330,7 @@ { "cell_type": "markdown", "id": "acf50b56", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel \n", " {\n", @@ -3809,9 +3345,7 @@ { "cell_type": "markdown", "id": "90d669be", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Parallel for loop\n", " * Inside a parallel region, the following compiler directive can be used to parallelize a for-loop:" @@ -3820,9 +3354,7 @@ { "cell_type": "markdown", "id": "0a8613d0", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp for\n" ] @@ -3830,9 +3362,7 @@ { "cell_type": "markdown", "id": "c449f7fd", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "* Clauses can be added, such as\n", "\n", @@ -3854,9 +3384,7 @@ { "cell_type": "markdown", "id": "cfab7ab9", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Parallel computations and loops\n", "\n", @@ -3866,9 +3394,7 @@ { "cell_type": "markdown", "id": "6e4ba0f5", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel for\n", " for (i=0; i\n", " #define CHUNKSIZE 100\n", @@ -3941,9 +3459,7 @@ { "cell_type": "markdown", "id": "d31bf577", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Example code for loop scheduling, guided instead of dynamic" ] @@ -3951,9 +3467,7 @@ { "cell_type": "markdown", "id": "81fb3ac1", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #include \n", " #define CHUNKSIZE 100\n", @@ -3975,9 +3489,7 @@ { "cell_type": "markdown", "id": "a8b85e83", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## More on Parallel for loop\n", "* The number of loop iterations cannot be non-deterministic; break, return, exit, goto not allowed inside the for-loop\n", @@ -3996,9 +3508,7 @@ { "cell_type": "markdown", "id": "54dae10e", - "metadata": { - "editable": true - }, 
+ "metadata": {}, "source": [ " // #pragma omp parallel and #pragma omp for\n" ] @@ -4006,9 +3516,7 @@ { "cell_type": "markdown", "id": "b33f2f30", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "can be combined into" ] @@ -4016,9 +3524,7 @@ { "cell_type": "markdown", "id": "1d459dd2", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel for\n" ] @@ -4026,9 +3532,7 @@ { "cell_type": "markdown", "id": "10d6a997", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## What can happen with this loop?\n", "\n", @@ -4038,9 +3542,7 @@ { "cell_type": "markdown", "id": "110a5f02", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel for\n", " for (i=0; i maxval) {\n", @@ -4543,9 +3967,7 @@ { "cell_type": "markdown", "id": "f1978f07", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## Not all computations are simple, competing threads\n", "All threads are potentially accessing and changing the same values, **maxloc** and **maxval**.\n", @@ -4555,9 +3977,7 @@ { "cell_type": "markdown", "id": "0a24031d", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp atomic\n" ] @@ -4565,9 +3985,7 @@ { "cell_type": "markdown", "id": "e8ef8abb", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "1. Only one thread at a time can execute the following statement (not block). We can use the critical option" ] @@ -4575,9 +3993,7 @@ { "cell_type": "markdown", "id": "45f0a7bf", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp critical\n" ] @@ -4585,9 +4001,7 @@ { "cell_type": "markdown", "id": "39858691", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "1. 
Only one thread at a time can execute the following block\n", "\n", @@ -4597,9 +4011,7 @@ { "cell_type": "markdown", "id": "134e0582", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## How to find the max value using OpenMP\n", "Write down the simplest algorithm and look carefully for race conditions. How would you handle them? \n", @@ -4609,9 +4021,7 @@ { "cell_type": "markdown", "id": "04e7f12a", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " #pragma omp parallel for\n", " for (i=0; i\n", @@ -4846,9 +4232,7 @@ { "cell_type": "markdown", "id": "dd5b6bd1", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ "## [Matrix-matrix multiplication](https://github.com/CompPhysics/ComputationalPhysicsMSU/blob/master/doc/Programs/ParallelizationOpenMP/OpenMPmatrixmatrixmult.cpp)\n", "This the matrix-matrix multiplication code with plain c++ memory allocation using OpenMP" @@ -4857,9 +4241,7 @@ { "cell_type": "markdown", "id": "3da37543", - "metadata": { - "editable": true - }, + "metadata": {}, "source": [ " // Matrix-matrix multiplication and Frobenius norm of a matrix with OpenMP\n", " #include \n", @@ -4942,7 +4324,25 @@ ] } ], - "metadata": {}, + "metadata": { + "kernelspec": { + "display_name": "Python 3 (ipykernel)", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.18" + } + }, "nbformat": 4, "nbformat_minor": 5 }