Commit
1 parent
16f5465
commit 133b505
Showing
997 changed files
with
80,734 additions
and
491,541 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
See _README.clang |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
Using the DWD's ICON forecast grib files with wgrib2 updated 12/2019, 4/2020 | ||
|
||
|
||
|
||
The DWD is making global forecasts using the ICON model. This model uses a triangular | ||
mesh, and the forecast quantities are valid for the center of the triangles. The DWD | ||
opendata server is distributing data in grib format for the forecast values from | ||
the center of the triangles. This note shows how to process the grib data using | ||
wgrib2. | ||
|
||
|
||
Basics that DWD may change: | ||
|
||
https://opendata.dwd.de/weather/nwp/icon/grib/HH | ||
|
||
HH = 00, 06, 12 or 18 | ||
|
||
Step 1: Download the CLAT and CLON files | ||
|
||
CLAT=latitude of the center of the triangles | ||
CLON=longitude of the center of the triangles | ||
|
||
https://opendata.dwd.de/weather/nwp/icon/grib/00/clat/icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLAT.grib2.bz2 | ||
https://opendata.dwd.de/weather/nwp/icon/grib/00/clon/icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLON.grib2.bz2 | ||
|
||
Step 2: Download some forecast files | ||
|
||
Example | ||
|
||
https://opendata.dwd.de/weather/nwp/icon/grib/00/t_2m/icon_global_icosahedral_single-level_YYYYMMDDHH_000_T_2M.grib2.bz2 | ||
https://opendata.dwd.de/weather/nwp/icon/grib/00/t_2m/icon_global_icosahedral_single-level_YYYYMMDDHH_001_T_2M.grib2.bz2 | ||
.. | ||
https://opendata.dwd.de/weather/nwp/icon/grib/00/t_2m/icon_global_icosahedral_single-level_YYYYMMDDHH_180_T_2M.grib2.bz2 | ||
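The file names in Steps 1 and 2 follow a regular pattern, so the download list can be generated. Below is a minimal C sketch that builds the URL of one single-level forecast file. The function name icon_single_level_url and its arguments are hypothetical, and the sketch assumes that the directory name is the lower-case form of the variable name used in the file name, as in the t_2m example. DWD may change this layout.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Build the URL of one single-level ICON forecast file (hypothetical helper).
 * yyyymmdd: run date, e.g. "20190409"; run_hour: 0, 6, 12 or 18;
 * var: lower-case variable/directory name, e.g. "t_2m"; fhour: forecast hour.
 * Returns the URL length, or -1 on bad input or if buf is too small. */
int icon_single_level_url(char *buf, size_t buflen, const char *yyyymmdd,
                          int run_hour, const char *var, int fhour)
{
    char upper[64];
    size_t i, len;
    int n;

    assert(buf != NULL && yyyymmdd != NULL && var != NULL);
    len = strlen(var);
    if (len == 0 || len >= sizeof(upper)) return -1;
    if (run_hour < 0 || run_hour > 23) return -1;
    if (fhour < 0 || fhour > 999) return -1;

    /* Assumption: the file name uses the upper-case form of the directory name. */
    for (i = 0; i < len; i++) upper[i] = (char) toupper((unsigned char) var[i]);
    upper[len] = '\0';

    n = snprintf(buf, buflen,
                 "https://opendata.dwd.de/weather/nwp/icon/grib/%02d/%s/"
                 "icon_global_icosahedral_single-level_%s%02d_%03d_%s.grib2.bz2",
                 run_hour, var, yyyymmdd, run_hour, fhour, upper);
    return (n < 0 || (size_t) n >= buflen) ? -1 : n;
}
```

Calling the function in a loop over the forecast hours gives the list of files to fetch with the download tool of your choice.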
|
||
Step 3: Uncompress the data | ||
Using bunzip2: bunzip2 *.grib2.bz2 | ||
|
||
Step 4: Combining the files | ||
Bash: | ||
cat icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLAT.grib2 \ | ||
icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLON.grib2 \ | ||
icon_global_icosahedral_single-level_YYYYMMDDHH_006_TMAX_2M.grib2 >icon.grb | ||
|
||
Windows: | ||
copy /b icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLAT.grib2 + | ||
icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLON.grib2 + | ||
icon_global_icosahedral_single-level_YYYYMMDDHH_006_TMAX_2M.grib2 icon.grb | ||
|
||
(all of above on one line) | ||
|
||
Contents of icon.grb | ||
$ wgrib2 icon.grb | ||
1:0:d=2019040900:GEOLON:surface:anl: | ||
2:5898409:d=2019040900:GEOLAT:surface:anl: | ||
3:11796818:d=2019040900:TMP:2 m above ground:0-360 min max fcst: | ||
|
||
Comment: | ||
|
||
Regridding takes a long time for the first field because wgrib2 searches | ||
every grid point to find the nearest neighbor. The remaining fields are | ||
processed much faster because wgrib2 retains the list of nearest neighbors. | ||
So processing is faster if all the fields that need regridding are put | ||
into one file. (The unix cat command works for grib files.) The slow | ||
first field also applies to the -lon option. The nearest neighbor search | ||
is faster when using multiple cores and the OpenMP version of wgrib2. | ||
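The "slow first field, fast later fields" behavior can be illustrated with a small C sketch. This is not the wgrib2 source. The type unit_vec and the three function names are hypothetical, and the sketch assumes that the triangle centers and the target points have already been converted from latitude/longitude to vectors on the unit sphere, so the nearest center is the one with the largest dot product.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical type: a point on the unit sphere (not a wgrib2 structure). */
typedef struct { double x, y, z; } unit_vec;

/* Linear search over all cell centers; the closest center on the sphere
 * is the one with the largest dot product with the target. */
size_t nearest_cell(const unit_vec *cells, size_t ncells, unit_vec target)
{
    size_t i, best = 0;
    double dot, best_dot;

    assert(cells != NULL && ncells > 0);
    best_dot = cells[0].x * target.x + cells[0].y * target.y + cells[0].z * target.z;
    for (i = 1; i < ncells; i++) {
        dot = cells[i].x * target.x + cells[i].y * target.y + cells[i].z * target.z;
        if (dot > best_dot) { best_dot = dot; best = i; }
    }
    return best;
}

/* Slow step, done once: one full search per target point.
 * The targets are independent, so the loop can be spread over cores. */
void build_neighbor_list(const unit_vec *cells, size_t ncells,
                         const unit_vec *targets, size_t ntargets, size_t *index)
{
    size_t t;
#pragma omp parallel for
    for (t = 0; t < ntargets; t++) index[t] = nearest_cell(cells, ncells, targets[t]);
}

/* Fast step, repeated for every field: copy values using the saved list. */
void regrid_field(const float *field, const size_t *index, size_t ntargets, float *out)
{
    size_t t;
    for (t = 0; t < ntargets; t++) out[t] = field[index[t]];
}
```

The cost of build_neighbor_list grows with (number of cells) x (number of targets), while regrid_field only grows with the number of targets. Without an OpenMP compiler flag the pragma is ignored and the code runs on one core.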
|
||
|
||
Example 1: Obtaining values for (10E, 20N) and (15E, 30S) | ||
|
||
wgrib2 v2.0.9 (in development) | ||
v2.0.9 adds -else, -elseif and -endif | ||
v2.0.9 updates -grid_def to use GEOLAT and GEOLON | ||
|
||
$ wgrib2 icon.grb -if "^(1|2):" -grid_def -else -s -lon 10 20 -lon 15 -30 -endif | ||
1:0 | ||
2:5898409 | ||
3:11796818:d=2019040900:TMP:2 m above ground:0-6 hour max fcst::lon=9.968750,lat=20.012680, | ||
val=296.588:lon=15.078125,lat=-30.069351,val=290.953 | ||
|
||
|
||
|
||
wgrib2 v2.0.6 - v2.0.8 (earlier versions of wgrib2 had a bug in -grid_def) | ||
|
||
$ wgrib2 icon.grb \ | ||
-if ":GEOLAT:" -set center 7 -set_var NLAT -fi \ | ||
-if ":GEOLON:" -set center 7 -set_var ELON -fi \ | ||
-grid_def -s \ | ||
-not_if "^(1|2):" -lon 10 20 -lon 15 -30 -fi | ||
1:0:d=2019040900:ELON:surface:anl: | ||
2:5898409:d=2019040900:NLAT:surface:anl: | ||
3:11796818:d=2019040900:TMP:2 m above ground:0-360 min max fcst::lon=9.968750,lat=20.012680,val=296.588: | ||
lon=15.078125,lat=-30.069351,val=290.953 | ||
|
||
|
||
Example 2: a 1x1 degree global grid by nearest neighbor interpolation | ||
|
||
|
||
wgrib2 v2.0.9 (in development) | ||
|
||
$ wgrib2 icon.grb -if "^(1|2):" -grid_def -else -s -lola 0:360:1 -90:181:1 1x1.grb grib -endif | ||
1:0 | ||
2:5898409 | ||
3:11796818:d=2019040900:TMP:2 m above ground:0-6 hour max fcst: | ||
|
||
wgrib2 v2.0.6 - v2.0.8 (earlier versions of wgrib2 had a bug in -grid_def) | ||
|
||
$ wgrib2 icon.grb \ | ||
-if ":GEOLAT:" -set center 7 -set_var NLAT -fi \ | ||
-if ":GEOLON:" -set center 7 -set_var ELON -fi \ | ||
-grid_def -s \ | ||
-not_if "^(1|2):" -lola 0:360:1 -90:181:1 1x1.grb grib | ||
1:0:d=2019040900:ELON:local level type 1 0:anl: | ||
2:5898409:d=2019040900:NLAT:local level type 1 0:anl: | ||
3:11796818:d=2019040900:TMP:local level type 103 2:0-6 hour max fcst: | ||
|
||
|
||
Example 3: Making a netcdf file | ||
|
||
The raw ICON grib files do not have latitude and longitude information. By prepending | ||
the CLON and CLAT files, the file has the longitude and latitude information. However, | ||
wgrib2 cannot make a netcdf file because the data are not on a lat-lon grid. One | ||
could update the netcdf converter to output the ICON data on a triangular mesh, but | ||
how many visualization codes could read that netcdf file and make a plot? | ||
|
||
The suggested method to make a netcdf file using wgrib2 is by making a lat-lon grib | ||
file. See example 2. Once you have made the lat-lon file, you can make a netcdf | ||
file using the grib2->netcdf utility of your choice. | ||
|
||
The conversion from the triangular mesh to a lat-lon grid is slow because a linear search | ||
is used to find the nearest neighbor. The conversion can be made faster by using more cores | ||
and setting the number of threads to use (export OMP_NUM_THREADS=n). This is why | ||
you want more cores! |
File renamed without changes.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,123 @@ | ||
SIMD | ||
|
||
OpenMP v4.0 explicitly enables SIMD code generation. Earlier versions | ||
of OpenMP were concerned with using multiple cores to speed up | ||
processing; SIMD allows explicit generation of vector instructions. The | ||
current Zen 4 and p-core Xeon cpus support avx-512. That means that | ||
a single instruction can operate on a vector of 16 single precision | ||
numbers or 8 double precision numbers. | ||
|
||
Older Zen, most current Intel client, and e-core Intel servers support | ||
avx-2 which has vectors that are half the width of avx-512 vectors. | ||
|
||
The current (64-bit x86) wgrib2 build defaults to the original | ||
64-bit x86 specification, SSE, which uses a vector of 4 single precision | ||
floats. The alignment restrictions for SSE are pretty severe, which makes | ||
the vector instructions difficult to use. So one should probably restrict | ||
SIMD to cpus that support a variant of avx, which allows non-aligned | ||
memory access. | ||
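The vector sizes quoted above follow from the register width divided by the element size. A minimal C sketch of that arithmetic is shown below. The function name simd_lanes is hypothetical, and the register widths (128 bits for SSE, 256 for avx-2, 512 for avx-512) are the usual values for these instruction sets.

```c
#include <assert.h>
#include <stddef.h>

/* Number of elements of a given size that fit in one vector register. */
size_t simd_lanes(size_t register_bits, size_t element_bytes)
{
    assert(element_bytes > 0);
    return register_bits / (8 * element_bytes);
}
```

For example, simd_lanes(512, 4) is 16 single precision numbers, simd_lanes(512, 8) is 8 double precision numbers, simd_lanes(256, 4) is 8 and simd_lanes(128, 4) is 4.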
|
||
|
||
What version of SIMD? SSE, AVX, AVX-2, AVX-512 | ||
|
||
This is not an easy question to answer. From the wiki | ||
|
||
AVX-512 | ||
AMD: Zen 4 (2022) | ||
Intel Xeon: Skylake-SP (2017) | ||
Intel client: Rocket Lake (2021) i11xxx | ||
The Intel Core replacements for Rocket Lake | ||
do not have support for avx-512 because it is | ||
not supported by the e-cores in Alder Lake. | ||
|
||
AVX-2 | ||
AMD: Excavator (2015) | ||
Intel server: Haswell (2013) | ||
Intel client: Haswell (2013) | ||
note: pentium and lower may not have avx-2 | ||
Intel client: Alder Lake (2021) | ||
core, pentium and celeron have avx-2 | ||
|
||
AVX | ||
AMD: Bulldozer (2011), Puma, Jaguar | ||
Intel: Sandy Bridge (2011) | ||
from wiki: | ||
"Not all CPUs from the listed families support AVX. Generally, CPUs with | ||
the commercial denomination Core i3/i5/i7/i9 support them, whereas Pentium | ||
and Celeron CPUs before Tiger Lake[12] do not." | ||
|
||
There was an effort to make AVX-2 the default for linux builds as AVX-2 was | ||
introduced in 2013 by Intel and copied by AMD in 2015. However, that effort | ||
failed as people pointed out that some Intel cpus didn't have support. Much | ||
anguish was directed towards Intel marketing. | ||
|
||
Will the future avx-10 make a difference? No. AVX-10 has three subsets: one | ||
that supports 512-bit registers, another for 256-bit registers and | ||
a third for 128-bit registers. This is no different from the current avx-512 vs | ||
avx-2 problem. AVX-10 does bring some new capabilities, however they are more | ||
AI related (reduced precision support). | ||
|
||
In summary, there isn't a good universal SIMD configuration. My desktop at | ||
work is a 4 core Xeon with AVX-512. My desktop at home is newer but only | ||
has AVX-2. Laptops in my family have SSE (no avx), avx-2 and avx-512. | ||
Servers that I use are either avx-2 or avx-512. | ||
|
||
Wgrib2 and SIMD | ||
|
||
Wgrib2 v3.1.3 adds support for OpenMP v4.0, and SIMD options were added to | ||
unpk_complex.c and Ens_processing.c. The conversion added OpenMP simd | ||
pragmas and did not replace any existing threading pragmas. | ||
|
||
Test command: time wgrib2.v? -ens_processing x.v? 0 ensemble.grb | ||
|
||
configuration: cpu AMD 5600G (6 cores, 12 threads), NVMe PCIe-3 | ||
wgrib2 v3.1.3 beta 9/2023 | ||
|
||
v0 | ||
default build no simd optimizations | ||
real 0m4.490s | ||
user 0m16.958s | ||
sys 0m0.442s | ||
|
||
v1 | ||
cpu opt (avx2) -march=native -mtune=native | ||
real 0m4.384s | ||
user 0m16.808s | ||
sys 0m0.458s | ||
|
||
v2 | ||
cpu opt (avx2) + omp simd .. one loop rewritten, | ||
ens_processing, unpk_complex have simd pragmas | ||
real 0m4.256s | ||
user 0m16.618s | ||
sys 0m0.385s | ||
|
||
|
||
v0 with OMP_NUM_THREADS=1 | ||
real 0m11.077s | ||
user 0m10.914s | ||
sys 0m0.140s | ||
|
||
The "native" optimizations gave about a 2.4% speed improvement (v1 vs v0), | ||
and adding the simd pragmas gave about 5% in total (v2 vs v0), with a huge sampling error. | ||
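The percentages can be checked from the "real" times listed above. The following C sketch shows the arithmetic. The function names percent_faster and speedup are hypothetical, and each timing is a single run, so the results carry the sampling error already mentioned.

```c
#include <assert.h>

/* Percent reduction in elapsed time relative to a baseline run. */
double percent_faster(double baseline_s, double test_s)
{
    assert(baseline_s > 0.0);
    return 100.0 * (baseline_s - test_s) / baseline_s;
}

/* Ratio of elapsed times, e.g. single-thread time over multi-thread time. */
double speedup(double slow_s, double fast_s)
{
    assert(fast_s > 0.0);
    return slow_s / fast_s;
}
```

With the times above, percent_faster(4.490, 4.384) is about 2.4 (v1 vs v0) and percent_faster(4.490, 4.256) is about 5.2 (v2 vs v0). For comparison, speedup(11.077, 4.490) is about 2.5, which is the gain from threading in the default build.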
|
||
Should thread parallelism be replaced by SIMD? | ||
|
||
For short loops, yes. The overhead for setting up the threads is huge. | ||
More testing is needed to give an answer for other instances. | ||
For example, consider the following | ||
|
||
#pragma omp simd | ||
for (i = 0; i < HUGE; i++) x[i] = x[i] * factor; | ||
|
||
The program will be limited by how fast the system can read/write memory. | ||
|
||
#pragma omp parallel for | ||
for (i = 0; i < HUGE; i++) x[i] = x[i] * factor; | ||
|
||
Again, the loop speed will be limited by memory bandwidth. Suppose we have 8 | ||
threads running the above loop on a server with 8 memory controllers, each | ||
connected to a different ram stick. The threaded loop could potentially be much | ||
faster than the simd loop. | ||
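One way to act on both observations is to pick the pragma by loop length. The C sketch below is only an illustration and is not taken from wgrib2. The function name scale and the cutoff SHORT_LOOP are hypothetical, and the cutoff value is a guess that would have to be tuned by timing on the target machine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cutoff: loops shorter than this skip the thread start-up cost. */
#define SHORT_LOOP 100000

/* Multiply every element of x by factor. */
void scale(float *x, size_t n, float factor)
{
    size_t i;

    assert(x != NULL || n == 0);
    if (n < SHORT_LOOP) {
        /* short loop: one thread, vector instructions only */
#pragma omp simd
        for (i = 0; i < n; i++) x[i] = x[i] * factor;
    } else {
        /* long loop: split across threads, which may use more memory controllers */
#pragma omp parallel for
        for (i = 0; i < n; i++) x[i] = x[i] * factor;
    }
}
```

Both branches give the same result, so the choice only affects speed. If the code is compiled without OpenMP support, the pragmas are ignored and the loops run serially.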
|
||
|