Merge pull request #58 from enram/documentation
Consistent use of H5, HDF5, VP, VPTS, PVOL in documentation
peterdesmet authored Aug 23, 2023
2 parents e78f50f + e6f71f8 commit 324a180
Showing 16 changed files with 111 additions and 125 deletions.
3 changes: 2 additions & 1 deletion AUTHORS.md
@@ -1,4 +1,5 @@
# Contributors

-* Nicola Noé
* Stijn Van Hoey
+* Nicolas Noé
+* Peter Desmet
10 changes: 5 additions & 5 deletions CHANGELOG.md
@@ -17,9 +17,9 @@

## Version 0.1.0

-- Integrate functions from [odimh5](https://pypi.org/project/odimh5) to read odim5 files
-- Support for converting ODIm hdf5 files to the vpts-csv data standard
-- s3 data storage integration
-- CLI endpoint for the transfer of ODIM hdf5 files from Baltrad to the aloft S3 bucket
-- CLI endpoint for the conversion from ODIM hdf5 files to daily/monthly aggregates as vpts-csv format
+- Integrate functions from [odimh5](https://pypi.org/project/odimh5) to read ODIM HDF5 files
+- Support for converting ODIM HDF5 files to the VPTS CSV data standard
+- S3 data storage integration
+- CLI endpoint for the transfer of ODIM HDF5 files from Baltrad to the Aloft S3 bucket
+- CLI endpoint for the conversion from ODIM HDF5 files to daily/monthly aggregates as VPTS CSV format
- Setup CI with Github Actions
2 changes: 1 addition & 1 deletion LICENSE.txt
@@ -1,6 +1,6 @@
The MIT License (MIT)

-Copyright (c) 2021 ENRAM
+Copyright (c) 2023 Research Institute for Nature and Forest (INBO)

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
46 changes: 16 additions & 30 deletions README.md
@@ -4,9 +4,7 @@
[![PyPI-Server](https://img.shields.io/pypi/v/vptstools.svg)](https://pypi.org/project/vptstools/)
[![.github/workflows/release.yml](https://github.com/enram/vptstools/actions/workflows/release.yml/badge.svg)](https://github.com/enram/vptstools/actions/workflows/release.yml)

-vptstools is a Python library to transfer and convert vpts data. VPTS (vertical profile time series) express the
-density, speed and direction of biological signals such as birds, bats and insects within a weather radar volume,
-grouped into altitude layers (height) and measured over time (datetime).
+vptstools is a Python library to transfer and convert VPTS data. VPTS (vertical profile time series) express the density, speed and direction of biological signals such as birds, bats and insects within a weather radar volume, grouped into altitude layers (height) and measured over time (datetime).

## Installation

@@ -24,31 +22,30 @@ pip install vptstools\[transfer\]

## Usage

-As a library user interested in working with ODIM h5 and vpts files, the most important functions provided by the
-package are {py:func}`vptstools.vpts.vp`, {py:func}`vptstools.vpts.vpts` and {py:func}`vptstools.vpts.vpts_to_csv`,
-which can be used respectively to convert a single `h5` file, a set of `h5` files and save a `vpts` DataFrame
-to a csv-file:
+As a library user interested in working with ODIM HDF5 and VPTS files, the most important functions provided by the package are {py:func}`vptstools.vpts.vp`, {py:func}`vptstools.vpts.vpts` and {py:func}`vptstools.vpts.vpts_to_csv`, which can be used respectively to convert a single HDF5 file, a set of HDF5 files and save a VPTS DataFrame to a CSV file:

-- Convert a single local ODIM h5 file to a vp DataFrame:
+- Convert a single local ODIM HDF5 file to a VP DataFrame:

```python
from vptstools.vpts import vp

-file_path_h5 = "./NLDBL_vp_20080215T0010_NL50_v0-3-20.h5"
+# Download https://aloftdata.s3-eu-west-1.amazonaws.com/baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0000Z.h5
+file_path_h5 = "./nldbl_vp_20131123T0000Z.h5"
df_vp = vp(file_path_h5)
```

-- Convert a set of locally stored ODIM h5 files to a vpts DataFrame:
+- Convert a set of locally stored ODIM HDF5 files to a VPTS DataFrame:

```python
from pathlib import Path
from vptstools.vpts import vpts

-file_paths = sorted(Path("./data").rglob("*.h5")) # Get all h5 files within the data directory
+# Download files to data directory from e.g. https://aloftdata.eu/browse/?prefix=baltrad/hdf5/nldbl/2013/11/23/
+file_paths = sorted(Path("./data").rglob("*.h5")) # Get all HDF5 files within the data directory
df_vpts = vpts(file_paths)
```

-- Store a `vp` or `vpts` DataFrame to a [VPTS CSV](https://aloftdata.eu/vpts-csv/) file:
+- Store a VP or VPTS DataFrame to a [VPTS CSV](https://aloftdata.eu/vpts-csv/) file:

```python
from vptstools.vpts import vpts_to_csv
@@ -57,15 +54,10 @@ vpts_to_csv(df_vpts, "vpts.csv")
```

```{note}
-Both {py:func}`vptstools.vpts.vp` and {py:func}`vptstools.vpts.vpts` have 2 other optional parameters related to the
-[VPTS-CSV data exchange format](https://aloftdata.eu/vpts-csv/). The `vpts_csv_version` parameter defines the version of the
-[VPTS-CSV data exchange standard](https://aloftdata.eu/vpts-csv/) (default v1) whereas the `source_file` provides a way to define
-a custom [source_file](https://aloftdata.eu/vpts-csv/#source_file) field to reference the source from which the
-data were derived.
+Both {py:func}`vptstools.vpts.vp` and {py:func}`vptstools.vpts.vpts` have 2 other optional parameters related to the [VPTS CSV data exchange format](https://aloftdata.eu/vpts-csv/). The `vpts_csv_version` parameter defines the version of the VPTS CSV data exchange standard (default v1) whereas the `source_file` provides a way to define a custom [source_file](https://aloftdata.eu/vpts-csv/#source_file) field to reference the source from which the data were derived.
```
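A minimal sketch of these two optional parameters in use; the version label `"v1.0"` and the bucket-relative `source_file` value below are assumptions for illustration, not verified against the released package:

```python
from vptstools.vpts import vp

# Hypothetical call: the version label and the source_file value are
# assumptions; check the signature of vp() in your installed vptstools.
df_vp = vp(
    "./nldbl_vp_20131123T0000Z.h5",
    vpts_csv_version="v1.0",
    source_file="baltrad/hdf5/nldbl/2013/11/23/nldbl_vp_20131123T0000Z.h5",
)
```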

-To validate a vpts DataFrame against the frictionless data schema as defined by the VPTS-CSV data exchange
-format and return a report, use the {py:func}`vptstools.vpts.validate_vpts`:
+To validate a VPTS DataFrame against the frictionless data schema as defined by the VPTS CSV data exchange format and return a report, use the {py:func}`vptstools.vpts.validate_vpts`:

```python
from vptstools.vpts import validate_vpts
@@ -76,27 +68,21 @@ report.stats["errors"]

Other modules in the package are:

-- {py:mod}`vptstools.odimh5`: This module extents the implementation of the original
-[odimh5 package](https://pypi.org/project/odimh5/) which is now deprecated.
-- {py:mod}`vptstools.vpts_csv`: This module contains - for each version of the VPTS-CSV exchange format - the
-corresponding implementation which can be used to generate a `vp` or `vpts` DataFrame. For more information on how to
-support a new version of the VPTS-CSV format, see [contributing docs](#new-vptscsv-version).
-- {py:mod}`vptstools.s3`: This module contains the functions to manage the
-aloft data repository](https://aloftdata.eu/browse/) S3 Bucket.
+- {py:mod}`vptstools.odimh5`: This module extents the implementation of the original [odimh5 package](https://pypi.org/project/odimh5/) which is now deprecated.
+- {py:mod}`vptstools.vpts_csv`: This module contains - for each version of the VPTS CSV exchange format - the corresponding implementation which can be used to generate a VP or VPTS DataFrame. For more information on how to support a new version of the VPTS CSV format, see [contributing docs](#new-vptscsv-version).
+- {py:mod}`vptstools.s3`: This module contains the functions to manage the [Aloft data repository](https://aloftdata.eu/browse/) S3 bucket.

## CLI endpoints

-In addition to using functions in Python scripts, two vptstools routines are available to be called from the command line
-after installing the package:
+In addition to using functions in Python scripts, two vptstools routines are available to be called from the command line after installing the package:

```{eval-rst}
.. include:: click.rst
```
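For example, the two routines could be invoked roughly as follows once the package is installed; the entry-point names and the flag spelling are assumptions inferred from the module names and click option above, so verify them with `--help`:

```shell
# Hypothetical invocations; confirm the actual console-script names
# registered in setup.cfg before relying on them.
transfer_baltrad --help
vph5_to_vpts --modified-days-ago 2
```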

## Development instructions

-See [contributing](docs/contributing.md) for a detailed overview and set of guidelines. If familiar with `tox`,
-the setup of a development environment boils down to:
+See [contributing](docs/contributing.md) for a detailed overview and set of guidelines. If familiar with `tox`, the setup of a development environment boils down to:

```shell
tox -e dev # Create development environment with venv and register an ipykernel.
3 changes: 1 addition & 2 deletions setup.cfg
@@ -6,7 +6,7 @@
[metadata]
name = vptstools
description = Tools to work with vertical profile time series.
-author = enram
+author = INBO
license = MIT
license_files = LICENSE.txt
long_description = file: README.md
@@ -19,7 +19,6 @@ project_urls =
Changelog = https://enram.github.io/vptstools/changelog.html
Tracker = https://github.com/enram/vptstools/issues
# Download = "https://pypi.org/project/vptstools/#files"
-Twitter = https://twitter.com/enram_network

# Change if running only on Windows, Mac or Linux (comma-separated)
platforms = any
12 changes: 6 additions & 6 deletions src/vptstools/bin/transfer_baltrad.py
@@ -21,7 +21,7 @@
# Update reporting to SNS functionality
report_sns = partial(report_click_exception_to_sns,
aws_sns_topic=AWS_SNS_TOPIC,
-subject=f"Transfer from Baltrad FTP to s3 bucket {DESTINATION_BUCKET} failed.",
+subject=f"Transfer from Baltrad FTP to S3 bucket {DESTINATION_BUCKET} failed.",
profile_name=AWS_PROFILE,
region_name=AWS_REGION
)
@@ -56,7 +56,7 @@ def extract_metadata_from_filename(filename: str) -> tuple:
Parameters
----------
filename : str
-Filename of a h5 incoming file from FTP
+Filename of a HDF5 incoming file from FTP
"""
elems = filename.split("_")
radar_code = elems[0]
@@ -71,10 +71,10 @@

@click.command(cls=catch_all_exceptions(click.Command, handler=report_sns)) # Add SNS-reporting to exception
def cli():
-"""Sync files from Baltrad FTP server to the aloft s3 bucket.
+"""Sync files from Baltrad FTP server to the Aloft S3 bucket.
-This function connects via SFTP to the BALTRAD server, downloads the available ``vp`` files (``pvol`` gets ignored),
-from the FTP server and upload the h5 file to the 'aloft' S3 bucket according to the defined folder path name
+This function connects via SFTP to the BALTRAD server, downloads the available VP files (PVOL gets ignored),
+from the FTP server and upload the HDF5 file to the Aloft S3 bucket according to the defined folder path name
convention. Existing files are ignored.
Designed to be executed via a simple scheduled job like cron or scheduled cloud function. Remark that
@@ -128,7 +128,7 @@ def cli():
for entry in sftp.listdir_attr():
if "_vp_" in entry.filename: # PVOLs and other files are ignored
click.echo(
-f"{entry.filename} is a vp file, we need to consider it... "
+f"{entry.filename} is a VP file, we need to consider it... "
)

radar_code, year, month_str, day_str = extract_metadata_from_filename(
50 changes: 25 additions & 25 deletions src/vptstools/bin/vph5_to_vpts.py
@@ -33,7 +33,7 @@
# Prepare SNS report handler
sns_report_exception = partial(report_click_exception_to_sns,
aws_sns_topic=AWS_SNS_TOPIC,
-subject="Conversion from hdf5 files to daily/monthly vpts-files failed.",
+subject="Conversion from HDF5 files to daily/monthly VPTS files failed.",
profile_name=AWS_PROFILE,
region_name=AWS_REGION
)
@@ -45,21 +45,21 @@
"modified_days_ago",
default=2,
type=int,
-help="Range of h5 vp files to include, i.e. files modified between now and N"
-"modified-days-ago. If 0, all h5 files in the bucket will be included.",
+help="Range of HDF5 VP files to include, i.e. files modified between now and N"
+"modified-days-ago. If 0, all HDF5 files in the bucket will be included.",
)
def cli(modified_days_ago):
"""Convert and aggregate h5 vp files to daily and monthly vpts-csv files on S3 bucket
"""Convert and aggregate HDF5 VP files to daily and monthly VPTS CSV files on S3 bucket
Check the latest modified
-`ODIM h5 bird vp profile <https://github.com/adokter/vol2bird/wiki/ODIM-bird-profile-format-specification>`_ on the
-aloft S3 bucket (as generated by `vol2bird <https://github.com/adokter/vol2bird>`_ and transferred using the
+`ODIM HDF5 bird VP profile <https://github.com/adokter/vol2bird/wiki/ODIM-bird-profile-format-specification>`_ on the
+Aloft S3 bucket (as generated by `vol2bird <https://github.com/adokter/vol2bird>`_ and transferred using the
:py:mod:`vpts.bin.transfer_baltrad` CLI routine). Using an
`s3 inventory bucket <https://docs.aws.amazon.com/AmazonS3/latest/userguide/storage-inventory.html>`_, check which
-h5 files were recently added and convert those files from ODIM bird profile to the
-`VPTS-CSV format <https://github.com/enram/vpts-csv>`_. Finally, upload the generated daily/monthly vpts files to S3.
+HDF5 files were recently added and convert those files from ODIM bird profile to the
+`VPTS CSV format <https://github.com/enram/vpts-csv>`_. Finally, upload the generated daily/monthly VPTS files to S3.
-Besides, while scanning the s3 inventory to define the files to convert,
+Besides, while scanning the S3 inventory to define the files to convert,
the CLI routine creates the ``coverage.csv`` file and uploads it to the bucket.
Configuration is loaded from the following environmental variables:
@@ -108,20 +108,20 @@ def cli(modified_days_ago):
f"s3://{S3_BUCKET}/coverage.csv", index=False, storage_options=storage_options
)

-# Run vpts daily conversion for each radar-day with modified files
+# Run VPTS daily conversion for each radar-day with modified files
inbo_s3 = s3fs.S3FileSystem(**storage_options)
# PATCH TO OVERCOME RECURSIVE s3fs in wrapped context
session = boto3.Session(**boto3_options)
s3_client = session.client("s3")

-click.echo(f"Create {days_to_create_vpts.shape[0]} daily vpts files.")
+click.echo(f"Create {days_to_create_vpts.shape[0]} daily VPTS files.")
for j, daily_vpts in enumerate(days_to_create_vpts["directory"]):
try:
# Enlist files of the day to rerun (all the given day)
source, _, radar_code, year, month, day = daily_vpts
odim_path = OdimFilePath(source, radar_code, "vp", year, month, day)
odim5_files = inbo_s3.ls(f"{S3_BUCKET}/{odim_path.s3_folder_path_h5}")
-click.echo(f"Create daily vpts file {odim_path.s3_file_path_daily_vpts}.")
+click.echo(f"Create daily VPTS file {odim_path.s3_file_path_daily_vpts}.")
# - create tempdir
temp_folder_path = Path(tempfile.mkdtemp())

@@ -139,13 +139,13 @@
)
h5_file_local_paths.append(h5_local_path)

-# - run vpts on all locally downloaded files
+# - run VPTS on all locally downloaded files
df_vpts = vpts(h5_file_local_paths)

-# - save vpts file locally
+# - save VPTS file locally
vpts_to_csv(df_vpts, temp_folder_path / odim_path.daily_vpts_file_name)

-# - copy vpts file to S3
+# - copy VPTS file to S3
inbo_s3.put(
str(temp_folder_path / odim_path.daily_vpts_file_name),
f"{S3_BUCKET}/{odim_path.s3_file_path_daily_vpts}",
@@ -154,12 +154,12 @@
# - remove tempdir with local files
shutil.rmtree(temp_folder_path)
except Exception as exc:
-click.echo(f"[WARNING] - During conversion from h5 files of {source}/{radar_code} at "
-f"{year}-{month}-{day} to daily vpts file, the following error occurred: {exc}.")
+click.echo(f"[WARNING] - During conversion from HDF5 files of {source}/{radar_code} at "
+f"{year}-{month}-{day} to daily VPTS file, the following error occurred: {exc}.")

-click.echo("Finished creating daily vpts files.")
+click.echo("Finished creating daily VPTS files.")

-# Run vpts monthly conversion for each radar-day with modified files
+# Run VPTS monthly conversion for each radar-day with modified files
# TODO - abstract monthly procedure to separate functionality
months_to_create_vpts = days_to_create_vpts
months_to_create_vpts["directory"] = months_to_create_vpts["directory"].apply(
@@ -169,13 +169,13 @@
months_to_create_vpts.groupby("directory").size().reset_index()
)

-click.echo(f"Create {months_to_create_vpts.shape[0]} monthly vpts files.")
+click.echo(f"Create {months_to_create_vpts.shape[0]} monthly VPTS files.")
for j, monthly_vpts in enumerate(months_to_create_vpts["directory"]):
try:
source, _, radar_code, year, month = monthly_vpts
odim_path = OdimFilePath(source, radar_code, "vp", year, month, "01")

-click.echo(f"Create monthly vpts file {odim_path.s3_file_path_monthly_vpts}.")
+click.echo(f"Create monthly VPTS file {odim_path.s3_file_path_monthly_vpts}.")
file_list = inbo_s3.ls(f"{S3_BUCKET}/{odim_path.s3_path_setup('daily')}")
files_to_concat = sorted(
[
@@ -202,11 +202,11 @@
storage_options=storage_options,
)
except Exception as exc:
-click.echo(f"[WARNING] - During conversion from h5 files of {source}/{radar_code} at "
-f"{year}-{month}-{day} to monthly vpts file, the following error occurred: {exc}.")
+click.echo(f"[WARNING] - During conversion from HDF5 files of {source}/{radar_code} at "
+f"{year}-{month}-{day} to monthly VPTS file, the following error occurred: {exc}.")

-click.echo("Finished creating monthly vpts files.")
-click.echo("Finished vpts update procedure.")
+click.echo("Finished creating monthly VPTS files.")
+click.echo("Finished VPTS update procedure.")


if __name__ == "__main__":
2 changes: 1 addition & 1 deletion src/vptstools/odimh5.py
@@ -17,7 +17,7 @@ class ODIMReader(object):
"""Read ODIM (HDF5) files with context manager
Should be used with the "with" statement (context manager) to
-properly close the h5 file.
+properly close the HDF5 file.
Attributes
----------

