Merge pull request #63 from NOAA-EMC/ejh_try2
converting to match current tarball
edwardhartnett authored Dec 7, 2023
2 parents 16f5465 + 133b505 commit 386dbd6
Showing 997 changed files with 80,734 additions and 491,541 deletions.
45 changes: 27 additions & 18 deletions README
@@ -1,4 +1,4 @@
wgrib2 revised 7.2016
wgrib2 revised 7.2016, 2.2021

wgrib2 is a program to read/write grib2 files.

@@ -29,22 +29,25 @@ Default makefile options
USE_REGEX=1
USE_TIGGE=1
USE_MYSQL=0
USE_IPOLATES=0
USE_IPOLATES=3
USE_SPECTRAL=1
USE_UDF=0
USE_OPENMP=1
USE_PROJ4=0
USE_WMO_VALIDATION=0
DISABLE_TIMEZONE=0
MAKE_FTN_API=0
MAKE_SHARED_LIB=0

USE_G2CLIB=0
USE_PNG=1
USE_JASPER=1
USE_OPENJPEG=1
USE_AEC=1

To compile without netcdf, set USE_NETCDF3=0 and USE_NETCDF4=0
To compile without netcdf v3, set USE_NETCDF3=1 and USE_NETCDF4=0
To compile without netcdf v4, set USE_NETCDF3=0 and USE_NETCDF4=1
To compile with netcdf v3, set USE_NETCDF3=1 and USE_NETCDF4=0
To compile with netcdf v4, set USE_NETCDF3=0 and USE_NETCDF4=1
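
The options above are plain NAME=value lines, so they can be changed with an editor or scripted. The following is a minimal sketch, assuming each option starts in column 1 of its own line in the makefile. set_make_option is a hypothetical helper, not part of wgrib2, and the demonstration works on a scratch file rather than the real makefile.

```shell
# set_make_option FILE NAME VALUE
# Rewrites the line "NAME=..." in FILE to "NAME=VALUE".
# Hypothetical helper; assumes the option starts in column 1.
set_make_option() {
    file=$1; name=$2; value=$3
    tmp=$(mktemp) || return 1
    # Write to a temporary file first, then copy back so that the
    # permissions of FILE are preserved.
    sed "s/^${name}=.*/${name}=${value}/" "$file" > "$tmp" &&
        cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return $status
}

# Demonstration on a scratch file, not the real makefile.
demo=$(mktemp)
printf 'USE_NETCDF3=1\nUSE_NETCDF4=0\nUSE_PNG=1\n' > "$demo"
set_make_option "$demo" USE_NETCDF3 0    # compile without netcdf
cat "$demo"
```

Only the named option is touched; every other line is copied through unchanged.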

For netcdf4, the netcdf4 and hdf5 libraries are usually not included
in the wgrib2.tgz file to save space. The makefile suggests
@@ -59,11 +62,12 @@ Support for Mysql is an option. You have to modify the makefile to indicate
the locations of the mysql includes and libraries and set USE_MYSQL=1

The option -new_grid uses the ipolates library to do the interpolation.
The required libraries are written in fortran and a few compilers are
already supported in the makefile. For other compilers, you are on
your own. The source code and makefile will have to be modified to use
the ipolates option. Consult a local expert if you want to install this
optional package. No help is available from NCEP for installing the package.
The default is to USE_IPOLATES=3, and USE_SPECTRAL=1. The interpolation
libraries are written in fortran and work with the supported compilers.
For other compilers, you are on your own. The source code and makefile
will have to be modified to use the ipolates option. Consult a local
expert if you want to install this optional package. No help is available
from NCEP for installing the package.

User Defined Functions (UDF) allow you to run shell commands from
within wgrib2. UDF are not available on windows machines unless
@@ -95,14 +99,22 @@ or supports time zones in a non-POSIX manner, then you have to set
DISABLE_TIMEZONE to 1.

Wgrib2 is both a stand alone utility and a library that is callable
from both C and Fortran. To enable the making the wgrib2 library, you
have to set
from C, Fortran and Python. To enable building the wgrib2 library,
you have to set

MAKE_FTN_API=1

For use with python, you have to make a shared library using

MAKE_SHARED_LIB=1

To make a library, you have to compile by

$ make lib

In older versions of wgrib2, the g2clib was the default decoder of grib files.
In the current version, you can use g2clib as an optional decoder. The main
use of compiling wgrib2 with g2clib is for testing g2clib.
use of compiling wgrib2 with g2clib is for testing g2clib.

USE_G2CLIB=1

@@ -111,12 +123,9 @@ the following options.

USE_PNG=0
USE_JASPER=0
USE_OPENJPEG=0
USE_AEC=0


You might want to turn off the various compressions because

1) libraries do not compile correctly (icc and pgcc have problems with Jasper)
2) reduce the executable size and compile time
3) problems with cross-compiling
Some of the optional libraries require CMake. Some of the optional libraries will
run a configure script. Both features can make cross compiling difficult.

1 change: 1 addition & 0 deletions README.AOCC
@@ -0,0 +1 @@
See _README.clang
File renamed without changes.
133 changes: 133 additions & 0 deletions README.ICON.DWD
@@ -0,0 +1,133 @@
Using the DWD's ICON forecast grib files with wgrib2 updated 12/2019, 4/2020



The DWD is making global forecasts using the ICON model. This model uses a triangular
mesh, and the forecast quantities are valid for the center of the triangles. The DWD
opendata server is distributing data in grib format for the forecast values from
the center of the triangles. This note shows how to process the grib data using
wgrib2.


Basics that DWD may change:

https://opendata.dwd.de/weather/nwp/icon/grib/HH

HH = 00, 06, 12 or 18

Step 1: Download the CLAT and CLON files

CLAT=latitude of the center of the triangles
CLON=longitude of the center of the triangles

https://opendata.dwd.de/weather/nwp/icon/grib/00/clat/icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLAT.grib2.bz2
https://opendata.dwd.de/weather/nwp/icon/grib/00/clon/icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLON.grib2.bz2

Step 2: Download some forecast files

Example

https://opendata.dwd.de/weather/nwp/icon/grib/00/t_2m/icon_global_icosahedral_single-level_YYYYMMDDHH_000_T_2M.grib2.bz2
https://opendata.dwd.de/weather/nwp/icon/grib/00/t_2m/icon_global_icosahedral_single-level_YYYYMMDDHH_001_T_2M.grib2.bz2
..
https://opendata.dwd.de/weather/nwp/icon/grib/00/t_2m/icon_global_icosahedral_single-level_YYYYMMDDHH_180_T_2M.grib2.bz2
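
The file names above follow a fixed pattern, so the URLs can be generated rather than typed. The following is a sketch, assuming the pattern seen in the T_2M examples also holds for other single-level variables and run hours (DWD may change it). icon_url is a hypothetical helper that only prints a URL; it does not download anything.

```shell
# icon_url YYYYMMDDHH FHOUR VAR
# Prints the URL of one single-level ICON forecast file.
# FHOUR is the forecast hour as a plain integer (1, not 001).
# Hypothetical helper; the URL layout is taken from the examples above.
icon_url() {
    stamp=$1; fhour=$2; var=$3
    run=${stamp#????????}                 # last two digits: the run hour HH
    dir=$(printf '%s' "$var" | tr '[:upper:]' '[:lower:]')
    printf 'https://opendata.dwd.de/weather/nwp/icon/grib/%s/%s/icon_global_icosahedral_single-level_%s_%03d_%s.grib2.bz2\n' \
        "$run" "$dir" "$stamp" "$fhour" "$var"
}

# Example: the analysis-time (hour 0) file of a 00Z run.
icon_url 2019040900 0 T_2M
```

The printed URL can be handed to a downloader such as curl -O or wget.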

Step 3: Uncompress the data
bunzip2 *.grib2.bz2

Step 4: Combining the files
Bash:
cat icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLAT.grib2 \
icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLON.grib2 \
icon_global_icosahedral_single-level_YYYYMMDDHH_006_TMAX_2M.grib2 >icon.grb

Windows:
copy /b icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLAT.grib2 +
icon_global_icosahedral_time-invariant_YYYYMMDDHH_CLON.grib2 +
icon_global_icosahedral_single-level_YYYYMMDDHH_006_TMAX_2M.grib2 icon.grb

(all of above on one line)

Contents of icon.grb
$ wgrib2 icon.grb
1:0:d=2019040900:GEOLON:surface:anl:
2:5898409:d=2019040900:GEOLAT:surface:anl:
3:11796818:d=2019040900:TMP:2 m above ground:0-360 min max fcst:

Comment:

Regridding takes a long time for the first field because wgrib2 searches
each grid point to find the nearest neighbor. The rest of the fields
are much faster because wgrib2 retains a list of the nearest neighbors.
So processing is faster if all the fields that need regridding are put
into one file. (The unix cat command works for grib files.) This
slow first field behavior also applies to the -lon option. The
nearest neighbor search is faster when using multiple cores and the
OpenMP version of wgrib2.


Example 1: Obtaining values for (10E, 20N) and (15E, 30S)

wgrib2 v2.0.9 (in development)
v2.0.9 adds -else, -elseif and -endif
v2.0.9 updates -grid_def to use GEOLAT and GEOLON

$ wgrib2 icon.grb -if "^(1|2):" -grid_def -else -s -lon 10 20 -lon 15 -30 -endif
1:0
2:5898409
3:11796818:d=2019040900:TMP:2 m above ground:0-6 hour max fcst::lon=9.968750,lat=20.012680,
val=296.588:lon=15.078125,lat=-30.069351,val=290.953



wgrib2 v2.0.6 - v2.0.8 (earlier versions of wgrib2 had a bug in -grid_def)

$ wgrib2 icon.grb \
-if ":GEOLAT:" -set center 7 -set_var NLAT -fi \
-if ":GEOLON:" -set center 7 -set_var ELON -fi \
-grid_def -s \
-not_if "^(1|2):" -lon 10 20 -lon 15 -30 -fi
1:0:d=2019040900:ELON:surface:anl:
2:5898409:d=2019040900:NLAT:surface:anl:
3:11796818:d=2019040900:TMP:2 m above ground:0-360 min max fcst::lon=9.968750,lat=20.012680,val=296.588:
lon=15.078125,lat=-30.069351,val=290.953


Example 2: a 1x1 degree global grid by nearest neighbor interpolation


wgrib2 v2.0.9 (in development)

$ wgrib2 icon.grb -if "^(1|2):" -grid_def -else -s -lola 0:360:1 -90:181:1 1x1.grb grib -endif
1:0
2:5898409
3:11796818:d=2019040900:TMP:2 m above ground:0-6 hour max fcst:

wgrib2 v2.0.6 - v2.0.8 (earlier versions of wgrib2 had a bug in -grid_def)

$ wgrib2 icon.grb \
-if ":GEOLAT:" -set center 7 -set_var NLAT -fi \
-if ":GEOLON:" -set center 7 -set_var ELON -fi \
-grid_def -s \
-not_if "^(1|2):" -lola 0:360:1 -90:181:1 1x1.grb grib
1:0:d=2019040900:ELON:local level type 1 0:anl:
2:5898409:d=2019040900:NLAT:local level type 1 0:anl:
3:11796818:d=2019040900:TMP:local level type 103 2:0-6 hour max fcst:


Example 3: Making a netcdf file

The raw ICON grib files do not have latitude and longitude information. By prepending
the CLON and CLAT files, the file has the longitude and latitude information. However,
wgrib2 cannot make a netcdf file because the data are not on a lat-lon grid. One
could update the netcdf converter to output the ICON data on a triangular mesh, but
how many visualization codes could read that netcdf file and make a plot?

The suggested method to make a netcdf file using wgrib2 is by making a lat-lon grib
file. See example 2. Once you have made the lat-lon file, you can make a netcdf
file using the grib2->netcdf utility of your choice.

The conversion from the triangular mesh to a lat-lon grid is slow because a linear search
is used to find the nearest neighbor. The conversion can be made faster by using more cores
and setting the appropriate number of cores to use (export OMP_NUM_THREADS=n). This is why
you want more cores!
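
The thread setting described above can be scripted. The following is a minimal sketch, assuming that using every online core is acceptable on the machine and that the installed wgrib2 was built with OpenMP. The wgrib2 command is the v2.0.9 form from example 2 and is only run if wgrib2 and icon.grb are present.

```shell
# Use as many threads as there are online cores (assumption: the machine
# is not shared, so taking all cores is fine).
ncores=$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)
case $ncores in
    ''|*[!0-9]*) ncores=1 ;;    # fall back to 1 if the answer is not a number
esac
export OMP_NUM_THREADS="$ncores"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"

# Regrid to a 1x1 degree lat-lon grid (command from example 2, v2.0.9).
if command -v wgrib2 >/dev/null 2>&1 && [ -f icon.grb ]; then
    wgrib2 icon.grb -if "^(1|2):" -grid_def -else -s \
        -lola 0:360:1 -90:181:1 1x1.grb grib -endif
fi
```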
File renamed without changes.
123 changes: 123 additions & 0 deletions README.SIMD
@@ -0,0 +1,123 @@
SIMD

OpenMP v4.0 explicitly enables SIMD code generation. Earlier versions
of OpenMP were concerned with using multiple cores to speed up
processing; SIMD allows explicit generation of vector instructions. The
current Zen 4 and p-core Xeon cpus support avx-512. That means that
a single instruction can operate on vectors of 16 single precision
numbers or 8 double precision numbers.

Older Zen, most current Intel client, and e-core Intel servers support
avx-2 which has vectors that are half the width of avx-512 vectors.

The current default (64-bit x86) wgrib2 build defaults to the original
64-bit x86 specification, SSE, which uses a vector of 4 single precision
floats. The alignment restrictions for SSE are pretty severe, which makes
the vector instructions difficult to use. So one should probably restrict
SIMD to cpus that support a variant of avx, which allows non-aligned
memory access.


What version of SIMD? SSE, AVX, AVX-2, AVX-512

This is not an easy question to answer. From the wiki

AVX-512
AMD: Zen 4 (2022)
Intel Xeon: Skylake (2015)
Intel client: Rocket Lake (2021) i11xxx
The Intel Core replacements for Rocket Lake
do not have support for avx-512 because it is
not supported by the e-cores in Alder Lake.

AVX-2
AMD: excavator (2015)
Intel server: Haswell (2013)
Intel client: Haswell (2013)
note: pentium and lower may not have avx-2
Intel client: Alder Lake (2021)
core, pentium and celeron have avx-2

AVX
AMD: Bulldozer (2011), puma, jaguar
Intel: Sandy Bridge (2011)
from wiki:
"Not all CPUs from the listed families support AVX. Generally, CPUs with
the commercial denomination Core i3/i5/i7/i9 support them, whereas Pentium
and Celeron CPUs before Tiger Lake[12] do not."

There was an effort to make AVX-2 the default for linux builds as AVX-2 was
introduced in 2013 by Intel and copied by AMD in 2015. However, that effort
failed as people pointed out that some Intel cpus didn't have support. Much
anguish was directed towards Intel marketing.

Will the future avx-10 make a difference? No. AVX-10 has three subsets: one
that supports 512-bit registers, another for 256-bit registers and
a third for 128-bit registers. This is no different from the current avx-512 vs
avx-2 problem. AVX-10 does bring some new capabilities; however, they are more
AI related (reduced precision support).

In summary, there isn't a good universal SIMD configuration. My desktop at
work is a 4 core Xeon with AVX-512. My desktop at home is newer but only
has AVX-2. Laptops in my family have SSE (no avx), avx-2 and avx-512.
Servers that I use are either avx-2 or avx-512.

Wgrib2 and SIMD

Wgrib2 v3.1.3 adds support for OpenMP v4.0, and SIMD options were added to
unpk_complex.c and Ens_processing.c. The conversion was to add OpenMP simd
pragmas and not replace any existing threading pragmas.

ran time wgrib2.v? -ens_processing x.v? 0 ensemble.grb

configuration: cpu amd 5600g (6 cores, 12 threads), nvme pcie-3
wgrib2 v3.1.3 beta 9/2023

v0
default build no simd optimizations
real 0m4.490s
user 0m16.958s
sys 0m0.442s

v1
cpu opt (avx2) -march=native -mtune=native
real 0m4.384s
user 0m16.808s
sys 0m0.458s

v2
cpu opt (avx2) + omp simd .. one loop rewritten,
ens_processing, unpk_complex have simd pragmas
real 0m4.256s
user 0m16.618s
sys 0m0.385s


v0 with OMP_NUM_THREADS=1
real 0m11.077s
user 0m10.914s
sys 0m0.140s

The "native" optimizations give about a 2.5% speed improvement,
and the simd pragmas gave a 5% improvement, with a huge sampling error.
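
The percentages can be checked from the "real" times listed above. The following is a small sketch using awk; pct_faster is a hypothetical helper name, and the inputs are the v0, v1 and v2 wall-clock times from the table.

```shell
# pct_faster BASELINE_SECONDS NEW_SECONDS
# Prints how much faster NEW is than BASELINE, in percent of BASELINE.
pct_faster() {
    awk -v base="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (base - new) / base * 100 }'
}

pct_faster 4.490 4.384    # v0 -> v1 (native): prints 2.4
pct_faster 4.490 4.256    # v0 -> v2 (native + omp simd): prints 5.2
```

These are single runs, so differences of a few percent are within the sampling error mentioned above.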

Should thread parallelism be replaced by SIMD?

For short loops, yes. The overhead for setting up the threads is huge.
More testing is needed to give an answer for other instances.
For example, consider the following

#pragma omp simd
for (i = 0; i < HUGE; i++) x[i] = x[i] * factor;

The program will be limited by how fast the system can read/write memory.

#pragma omp parallel for
for (i = 0; i < HUGE; i++) x[i] = x[i] * factor;

Again, the loop speed will be limited by memory bandwidth. Suppose we have 8
threads running the above loop on a server with 8 memory controllers each
connected to a different ram stick. The threaded loop could potentially be much
faster than the simd loop.
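
Whether extra threads help on a given machine is easiest to settle by measurement. The following is a rough sketch that reruns one command with several OMP_NUM_THREADS values. scan_threads is a hypothetical helper, and the timing has a resolution of one second, so it is only useful for runs that take several seconds.

```shell
# scan_threads "COUNTS" COMMAND [ARGS...]
# Runs COMMAND once per thread count and prints the elapsed wall time.
# Hypothetical helper; the command's own output is discarded.
scan_threads() {
    counts=$1
    shift
    for n in $counts; do
        start=$(date +%s)
        env OMP_NUM_THREADS="$n" "$@" >/dev/null
        end=$(date +%s)
        echo "threads=$n elapsed=$((end - start))s"
    done
}

# Example with a placeholder command; substitute the wgrib2 command line
# to be tested.
scan_threads "1 2 4" true
```

If the elapsed time stops dropping as the thread count rises, the loop is probably limited by memory bandwidth rather than by the number of cores.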

