add README.md in examples
wkliao committed Oct 19, 2023
1 parent 20e57de commit 69948af
Showing 5 changed files with 126 additions and 86 deletions.
88 changes: 3 additions & 85 deletions README.md
@@ -153,91 +153,9 @@ concatenation is important.
have been created previously in the output file and their names must be the
same as the one used in '-k' option.

## Sample input and output files
* There are four sample input HDF5 files provided in folder `examples`.
+ examples/sample_r11981_s06.gz
+ examples/sample_r11981_s07.gz
+ examples/sample_r11981_s08.gz
+ examples/sample_r11981_s09.gz
* These sample files are previously compressed. Run command 'make' will
uncompress them into HDF5 files, or run command below to uncompress them.
```console
% gzip -dc examples/sample_r11981_s06.gz > examples/sample_r11981_s06.h5
% gzip -dc examples/sample_r11981_s07.gz > examples/sample_r11981_s07.h5
% gzip -dc examples/sample_r11981_s08.gz > examples/sample_r11981_s08.h5
% gzip -dc examples/sample_r11981_s09.gz > examples/sample_r11981_s09.h5
```
* Sample run commands
```console
% mpiexec -n 2 ./ph5_concat -i examples/sample_list.txt -o sample_output.h5
% mpiexec -n 4 ./ph5_concat -i examples/sample_list.txt -o sample_output.h5 -k evt
```
The output shown on screen is stored in `examples/sample_stdout.txt`.
* Sample output files
+ The output files from concatenating the 4 sample files are available in
`examples/sample_output.h5.gz` whose metadata dumped from command below is
also available in `examples/sample_output.metadata`.
```console
% gzip -dc examples/sample_output.h5.gz > sample_output.h5
% h5dump -Hp sample_output.h5
```

## An example timing output from a run on Cori using 128 MPI processes.
```console
% srun -n 128 ./ph5_concat -i ./nd_list_128.txt -o /scratch1/FS_1M_128/nd_out.h5 -b 512 -k evt

Number of input HDF5 files: 128
Input directory name: /global/cscratch1/sd/wkliao/FS_1M_8
Output file name: /global/cscratch1/sd/wkliao/FS_1M_128/nd_out.h5
Output datasets are compressed with level 6
Read metadata from input files takes 1.2776 seconds
Create output file + datasets takes 25.7466 seconds
Concatenating 1D datasets takes 158.8101 seconds
Writ partition key datasets takes 14.0372 seconds
Concatenating 2D datasets takes 114.4464 seconds
Close input files takes 0.0037 seconds
Close output files takes 0.4797 seconds
-------------------------------------------------------------
Input directory name: /scratch/FS_1M_8
Number of input HDF5 files: 128
Output HDF5 file name: /scratch1/FS_1M_128/nd_out.h5
Parallel I/O strategy: 2
Use POSIX I/O to open file: ON
POSIX In-memory I/O: ON
1-process-create-followed-by-all-open: OFF
Chunk caching for raw data: ON
GZIP level: 6
Internal I/O buffer size: 512.0 MiB
Dataset used to produce partition key: evt
Name of partition key datasets: evt.seq
-------------------------------------------------------------
Number of groups: 999
Number of non-zero-sized groups: 108
Number of groups have partition key: 108
Total number of datasets: 17971
Total number of non-zero datasets: 2795
-------------------------------------------------------------
Number of MPI processes: 128
Number calls to MPI_Allreduce: 3
Number calls to MPI_Exscan: 2
-------------------------------------------------------------
H5Dcreate: 25.6772
H5Dread for 1D datasets: 1.8583
H5Dwrite for 1D datasets: 170.2729
H5Dread for 2D datasets: 19.9304
H5Dwrite for 2D datasets: 93.9265
H5Dclose for input datasets: 0.0737
H5Dclose for output datasets: 0.0516
-------------------------------------------------------------
Read metadata from input files: 1.2782
Create output file + datasets: 25.7466
Concatenate small datasets: 158.8102
Write to partition key datasets: 14.0372
Concatenate large datasets: 114.4520
Close input files: 0.0124
Close output files: 0.4799
End-to-end: 314.8095
```
## Run example
An example run with small input files for illustration is available in
folder [examples](./examples).

## Publications
* Sunwoo Lee, Kai-yuan Hou, Kewei Wang, Saba Sehrish, Marc Paterno,
122 changes: 122 additions & 0 deletions examples/README.md
@@ -0,0 +1,122 @@
# Examples of running 'ph5concat'

This folder contains examples of running `ph5concat`, including a set of small
input files, run commands, the concatenated output file, and metadata of the
input and output files.

# Input files
* There are four input HDF5 files, each containing a small number of datasets
extracted from neutrino simulation data for illustrative purposes.
+ sample_r11981_s06.gz
+ sample_r11981_s07.gz
+ sample_r11981_s08.gz
+ sample_r11981_s09.gz

* These input files are compressed. Running command 'make' will uncompress them
into HDF5 files. They can also be uncompressed manually with the commands below.
```console
% gzip -dc sample_r11981_s06.gz > sample_r11981_s06.h5
% gzip -dc sample_r11981_s07.gz > sample_r11981_s07.h5
% gzip -dc sample_r11981_s08.gz > sample_r11981_s08.h5
% gzip -dc sample_r11981_s09.gz > sample_r11981_s09.h5
```
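
The same decompression can also be scripted. Below is a minimal Python sketch, assuming it runs in the folder holding the `.gz` files; the helper names `gunzip_to_h5` and `gunzip_all` are hypothetical and not part of `ph5concat`.

```python
import gzip
import shutil
from pathlib import Path
from typing import List


def gunzip_to_h5(gz_path: Path) -> Path:
    """Decompress gz_path (e.g. sample_r11981_s06.gz) into a sibling .h5 file."""
    h5_path = gz_path.with_suffix(".h5")
    with gzip.open(gz_path, "rb") as src, open(h5_path, "wb") as dst:
        # Stream the data so the whole file never has to fit in memory.
        shutil.copyfileobj(src, dst)
    return h5_path


def gunzip_all(folder: Path, pattern: str = "sample_r11981_s*.gz") -> List[Path]:
    """Decompress every file in folder matching pattern, in sorted order."""
    return [gunzip_to_h5(p) for p in sorted(folder.glob(pattern))]
```

Calling `gunzip_all(Path("."))` mirrors the four `gzip -dc` commands above and leaves the original `.gz` files in place.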

* Metadata of input files
* Metadata of individual files can be retrieved by running the command
'h5ls -r' or 'h5dump -H'.
* The metadata of files 'sample_r11981_s06.h5' and 'sample_r11981_s07.h5' is
shown below.
<p align="left">
<img align="center" src="./s06_s07.tiff" width="1000">
</p>
* In these examples, each file contains 4 groups at the root level, namely
'neutrino', 'rec.me.trkkalman', 'rec.training.cvnmaps', and 'spill'. The
number of groups and their names must be identical among all input files to
be concatenated.
* Within an input file, the number of datasets can differ from one group to
another.
* For a given group, the number of datasets it contains and their names must be
identical among all input files. However, the size of the first dimension of
a dataset can differ from file to file.
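
The layout rules above can be illustrated with a small Python sketch. It assumes each file's layout has already been collected into a dictionary mapping group names to sets of dataset names (for instance from `h5ls -r` output; that step is not shown). The function `layout_mismatches` is hypothetical, and apart from the group names and `evt`, the dataset names are made up for illustration.

```python
from typing import Dict, List, Set

# A "layout" maps each root-level group name to the set of dataset names in it.
Layout = Dict[str, Set[str]]


def layout_mismatches(reference: Layout, other: Layout) -> List[str]:
    """Return human-readable differences between two file layouts."""
    problems = []
    for group in sorted(reference.keys() - other.keys()):
        problems.append(f"group '{group}' is missing")
    for group in sorted(other.keys() - reference.keys()):
        problems.append(f"group '{group}' is unexpected")
    for group in sorted(reference.keys() & other.keys()):
        if reference[group] != other[group]:
            diff = sorted(reference[group] ^ other[group])
            problems.append(f"group '{group}' differs in datasets: {diff}")
    return problems


# Hypothetical layouts: the dataset names other than 'evt' are illustrative only.
s06 = {
    "neutrino": {"evt", "run"},
    "rec.me.trkkalman": {"evt"},
    "rec.training.cvnmaps": {"evt", "cvnmap"},
    "spill": {"evt"},
}
s07 = {name: set(datasets) for name, datasets in s06.items()}

print(layout_mismatches(s06, s07))  # identical layouts, so nothing is reported
```

Dataset sizes are deliberately not compared, because the first dimension is allowed to differ between input files.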


* Run commands
```console
% mpiexec -n 2 ../ph5_concat -i sample_list.txt -o sample_output.h5
% mpiexec -n 4 ../ph5_concat -i sample_list.txt -o sample_output.h5 -k evt
```
The output printed on screen by these runs is available in
[sample_stdout.txt](./sample_stdout.txt).

* Concatenated output file
+ The concatenated output file is provided in `sample_output.h5.gz`.
+ The metadata retrieved by the 'h5dump' command is also available in
`sample_output.metadata`.
```console
% gzip -dc sample_output.h5.gz > sample_output.h5
% h5dump -Hp sample_output.h5
```
* A short version of the metadata is shown below.
<p align="left">
<img align="center" src="./concated.tiff" width="400">
</p>

## An example output from a run concatenating 128 files
Below is an example timing output from a larger run on Cori using 128 MPI
processes.
```console
% srun -n 128 ./ph5_concat -i ./nd_list_128.txt -o /scratch1/FS_1M_128/nd_out.h5 -b 512 -k evt

Number of input HDF5 files: 128
Input directory name: /global/cscratch1/sd/wkliao/FS_1M_8
Output file name: /global/cscratch1/sd/wkliao/FS_1M_128/nd_out.h5
Output datasets are compressed with level 6
Read metadata from input files takes 1.2776 seconds
Create output file + datasets takes 25.7466 seconds
Concatenating 1D datasets takes 158.8101 seconds
Write partition key datasets takes 14.0372 seconds
Concatenating 2D datasets takes 114.4464 seconds
Close input files takes 0.0037 seconds
Close output files takes 0.4797 seconds
-------------------------------------------------------------
Input directory name: /scratch/FS_1M_8
Number of input HDF5 files: 128
Output HDF5 file name: /scratch1/FS_1M_128/nd_out.h5
Parallel I/O strategy: 2
Use POSIX I/O to open file: ON
POSIX In-memory I/O: ON
1-process-create-followed-by-all-open: OFF
Chunk caching for raw data: ON
GZIP level: 6
Internal I/O buffer size: 512.0 MiB
Dataset used to produce partition key: evt
Name of partition key datasets: evt.seq
-------------------------------------------------------------
Number of groups: 999
Number of non-zero-sized groups: 108
Number of groups have partition key: 108
Total number of datasets: 17971
Total number of non-zero datasets: 2795
-------------------------------------------------------------
Number of MPI processes: 128
Number calls to MPI_Allreduce: 3
Number calls to MPI_Exscan: 2
-------------------------------------------------------------
H5Dcreate: 25.6772
H5Dread for 1D datasets: 1.8583
H5Dwrite for 1D datasets: 170.2729
H5Dread for 2D datasets: 19.9304
H5Dwrite for 2D datasets: 93.9265
H5Dclose for input datasets: 0.0737
H5Dclose for output datasets: 0.0516
-------------------------------------------------------------
Read metadata from input files: 1.2782
Create output file + datasets: 25.7466
Concatenate small datasets: 158.8102
Write to partition key datasets: 14.0372
Concatenate large datasets: 114.4520
Close input files: 0.0124
Close output files: 0.4799
End-to-end: 314.8095
```
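
To see where the time goes in the run above, the per-phase timings in the final summary can be turned into percentages of the end-to-end time. The Python sketch below copies the numbers from that summary; the function `phase_shares` is a hypothetical helper, not part of `ph5concat`.

```python
# Per-phase timings in seconds, copied from the final summary of the log above.
phases = {
    "Read metadata from input files": 1.2782,
    "Create output file + datasets": 25.7466,
    "Concatenate small datasets": 158.8102,
    "Write to partition key datasets": 14.0372,
    "Concatenate large datasets": 114.4520,
    "Close input files": 0.0124,
    "Close output files": 0.4799,
}
end_to_end = 314.8095


def phase_shares(timings, total):
    """Return each phase's percentage of total, largest share first."""
    shares = {name: 100.0 * secs / total for name, secs in timings.items()}
    return dict(sorted(shares.items(), key=lambda item: item[1], reverse=True))


for name, pct in phase_shares(phases, end_to_end).items():
    print(f"{name:33s} {pct:6.2f}%")
```

In this run the two concatenation phases together account for roughly 87% of the end-to-end time, with creating the output file and datasets a distant third.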

Binary file added examples/concated.tiff
Binary file not shown.
Binary file added examples/s06_s07.tiff
Binary file not shown.
2 changes: 1 addition & 1 deletion examples/sample_stdout.txt
@@ -5,7 +5,7 @@ Output datasets are compressed with level 6
Read metadata from input files takes 0.0096 seconds
Create output file + datasets takes 0.0149 seconds
Concatenating 1D datasets takes 0.0480 seconds
Writ partition key datasets takes 0.0240 seconds
Write partition key datasets takes 0.0240 seconds
Concatenating 2D datasets takes 0.0081 seconds
Close input files takes 0.0005 seconds
Close output files takes 0.0010 seconds