Feat io tests #21

Merged
merged 39 commits into from
Nov 21, 2023
Commits
b73f324
fix tests after refactor
sphamba Oct 26, 2023
77e4df6
add gpm_api_test_data as submodule
sphamba Oct 26, 2023
0cf01ad
add granule data
sphamba Oct 27, 2023
1127a87
Merge branch 'main' into fix-tests
sphamba Oct 30, 2023
4bb451e
fix some download tests
sphamba Oct 30, 2023
a441759
add granule data pull in GH action
sphamba Oct 30, 2023
c93f62a
add hdf5 file locking in gh action tests
sphamba Oct 30, 2023
3650cac
add tests for new methods of io/checks
sphamba Oct 30, 2023
42ccd5c
add tests for new methods of io/filter
sphamba Oct 31, 2023
5405ca5
add git lfs in github actions
sphamba Oct 31, 2023
9921010
add manual lfs pull in GH action
sphamba Oct 31, 2023
0f6ed22
add tests for io/info
sphamba Oct 31, 2023
d1f3a5e
remove unused test data generation scripts
sphamba Oct 31, 2023
ea2359d
fix tests for io/filter with updated conftest
sphamba Oct 31, 2023
7c32bba
remove lfs usage for test data
sphamba Nov 2, 2023
aba73e5
add tests for io/find local
sphamba Nov 6, 2023
b2f30f1
add tests for io/find pps
sphamba Nov 6, 2023
4723de7
remove io tests relying on network
sphamba Nov 6, 2023
f36a5b8
run io/find tests on all products
sphamba Nov 6, 2023
1be2640
fix (temporary) of io/download test
sphamba Nov 6, 2023
94cc5d0
fix test typings for 3.8
sphamba Nov 6, 2023
29c1fb5
fix url slashes in windows for pps
sphamba Nov 6, 2023
5d5793d
fix pps test on windows (slashes in url)
sphamba Nov 6, 2023
2386c9d
fix test granule files in python 3.8
sphamba Nov 6, 2023
6dc27b7
add tests for io/find ges_disc
sphamba Nov 6, 2023
d6fa57d
put kwargs in test functions calls
sphamba Nov 9, 2023
922ecba
lint
sphamba Nov 9, 2023
704a82f
add test io/filter case over two days
sphamba Nov 9, 2023
551ceac
split dataset test_granule finalize test
sphamba Nov 9, 2023
35237b4
fix: set granule data path relative to root
sphamba Nov 9, 2023
253eaac
fix caught exception in io/info
sphamba Nov 14, 2023
be2b26e
add io/find/find_daily_filepath test
sphamba Nov 14, 2023
02430b9
add non-failling asserts in tests and split tests
sphamba Nov 16, 2023
b7d476e
add io/download tests (and pps, ges_disc)
sphamba Nov 16, 2023
557b355
add data integrity test on real hdf5 file
sphamba Nov 16, 2023
8b46bf5
move test for io/checks to avoid conflicts
sphamba Nov 20, 2023
d83acf8
remove io/checks is_empty method
sphamba Nov 20, 2023
724e546
fix: remove unused variables and inexistent returns
sphamba Nov 21, 2023
6437e4f
lint
sphamba Nov 21, 2023
7 changes: 6 additions & 1 deletion .coveragerc
@@ -5,8 +5,13 @@ omit =
*dev*
*docs*
*tutorials*
- gpm_api/tests/*
+ gpm_api/bucket/*
+ gpm_api/cli/*
+ gpm_api/encoding/*
+ gpm_api/etc/*
+ gpm_api/retrieval/*
+ gpm_api/tests/*
gpm_api/_version.py

[report]
exclude_lines =
4 changes: 4 additions & 0 deletions .github/workflows/tests.yml
@@ -31,6 +31,8 @@ jobs:

steps:
- uses: actions/checkout@v3
+ with:
+ submodules: 'recursive'

- name: Set up micromamba
uses: mamba-org/setup-micromamba@v1
@@ -48,6 +50,8 @@ jobs:
- name: Test with pytest
run: |
pytest
+ env:
+ HDF5_USE_FILE_LOCKING: FALSE

- name: Upload coverage reports to Codecov
uses: codecov/codecov-action@v3
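The `HDF5_USE_FILE_LOCKING: FALSE` entry above only affects the CI step. A minimal sketch of the same setting for a local Python session follows, assuming the variable is set before any HDF5 file is opened. The helper name `disable_hdf5_file_locking` is hypothetical and not part of gpm_api.

```python
import os


def disable_hdf5_file_locking():
    """Set the environment variable that the HDF5 library consults for file locking."""
    # Hypothetical helper: call it before importing or using any HDF5-backed reader.
    os.environ["HDF5_USE_FILE_LOCKING"] = "FALSE"
    return os.environ["HDF5_USE_FILE_LOCKING"]


value = disable_hdf5_file_locking()
print(value)
```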
3 changes: 3 additions & 0 deletions .gitmodules
@@ -0,0 +1,3 @@
+ [submodule "gpm_api/tests/data"]
+ path = gpm_api/tests/data
+ url = git@github.com:ghiggi/gpm_api_test_data.git
2 changes: 1 addition & 1 deletion gpm_api/dataset/datatree.py
@@ -72,7 +72,7 @@ def _identify_error(e, filepath):
msg = f"The file {filepath} is corrupted and is being removed. It must be redownload."
raise ValueError(msg)
elif "[Errno -51] NetCDF: Unknown file format" in error_str:
- msg = "The GPM-API is not currently able to read the file format of {filepath}. Report the issue please."
+ msg = f"The GPM-API is not currently able to read the file format of {filepath}. Report the issue please."
raise ValueError(msg)
elif "lock" in error_str:
msg = "Unfortunately, HDF locking is occurring."
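The datatree.py change adds the missing `f` prefix to the error message. A small illustration of why this matters, using a made-up file path:

```python
filepath = "/tmp/example.HDF5"  # hypothetical path, for illustration only

# Without the f prefix, the braces are kept literally in the message.
plain_msg = "The GPM-API is not currently able to read the file format of {filepath}."
# With the f prefix, the variable value is interpolated.
formatted_msg = f"The GPM-API is not currently able to read the file format of {filepath}."

print(plain_msg)
print(formatted_msg)
```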
4 changes: 4 additions & 0 deletions gpm_api/etc/product_def.yml
@@ -1667,6 +1667,8 @@
pps_nrt_dir: null
pps_rs_dir: radar
ges_disc_dir: null
+ start_time: null
+ end_time: null
available_versions:
- 4
scan_modes:
@@ -1680,6 +1682,8 @@
pps_nrt_dir: null
pps_rs_dir: radar
ges_disc_dir: null
+ start_time: null
+ end_time: null
available_versions:
- 4
scan_modes:
10 changes: 1 addition & 9 deletions gpm_api/io/checks.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python3

[CodeScene Delta Analysis (main), gpm_api/io/checks.py, line 1] Getting worse: Overall Code Complexity. The mean cyclomatic complexity increases from 5.40 to 5.89, threshold = 4. This file has many conditional statements (e.g. if, for, while) across its implementation, leading to lower code health. Avoid adding more conditionals.
"""
Created on Sun Aug 14 20:02:18 2022
@author: ghiggi
@@ -9,14 +9,6 @@
import numpy as np


- def is_not_empty(x):
- return bool(x)
-
-
- def is_empty(x):
- return not x
-
-
def check_base_dir(base_dir):
"""Check base directory path.

@@ -192,7 +184,7 @@
if isinstance(time, np.ndarray):
if np.issubdtype(time.dtype, np.datetime64):
if time.size == 1:
- time = time.astype("datetime64[s]").tolist()
+ time = time[0].astype("datetime64[s]").tolist()
else:
raise ValueError("Expecting a single timestep!")
else:
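The `time[0]` change in checks.py affects what `tolist()` returns for a size-1 array. The sketch below illustrates the difference for a one-dimensional array; the timestamp is arbitrary. Note that `time[0]` assumes the array has at least one dimension.

```python
import datetime

import numpy as np

time = np.array(["2020-07-05T02:23:00"], dtype="datetime64[ns]")

# Converting the whole size-1 array yields a one-element list.
as_list = time.astype("datetime64[s]").tolist()
# Indexing the first element before converting yields a single datetime object.
as_datetime = time[0].astype("datetime64[s]").tolist()

print(type(as_list).__name__, type(as_datetime).__name__)
```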
9 changes: 5 additions & 4 deletions gpm_api/io/download.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python3

[CodeScene Delta Analysis (main), gpm_api/io/download.py, line 1] Getting worse: Lines of Code in a Single File. The lines of code increases from 727 to 729, improve code health by reducing it to 600. The number of Lines of Code in a single file. More Lines of Code lowers the code health.
[CodeScene Delta Analysis (main), gpm_api/io/download.py, line 1] Getting worse: Overall Code Complexity. The mean cyclomatic complexity increases from 4.65 to 4.74, threshold = 4. This file has many conditional statements (e.g. if, for, while) across its implementation, leading to lower code health. Avoid adding more conditionals.
"""
Created on Mon Aug 15 00:18:33 2022

@@ -30,7 +30,6 @@
check_remote_storage,
check_start_end_time,
check_valid_time_request,
- is_empty,
)
from gpm_api.io.data_integrity import (
check_archive_integrity,
@@ -267,6 +266,8 @@
"pps": {"wget": wget_pps_cmd, "curl": curl_pps_cmd},
"ges_disc": {"wget": wget_ges_disc_cmd, "curl": curl_ges_disc_cmd},
}
+ if transfer_tool not in dict_fun[storage].keys():
+ raise NotImplementedError(f"Unsupported transfer tool: {transfer_tool}")
func = dict_fun[storage][transfer_tool]
return func
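The new guard turns an unsupported transfer tool into an explicit `NotImplementedError` instead of a bare `KeyError`. A self-contained sketch of the pattern follows; the function name `get_command_builder` and the lambda builders are placeholders standing in for the real `wget_pps_cmd`, `curl_pps_cmd`, and related functions.

```python
def get_command_builder(transfer_tool, storage):
    """Return the command builder registered for a storage/tool pair."""
    # Placeholder builders: the real module maps to functions that build shell commands.
    dict_fun = {
        "pps": {"wget": lambda: "wget-pps", "curl": lambda: "curl-pps"},
        "ges_disc": {"wget": lambda: "wget-ges-disc", "curl": lambda: "curl-ges-disc"},
    }
    if transfer_tool not in dict_fun[storage]:
        raise NotImplementedError(f"Unsupported transfer tool: {transfer_tool}")
    return dict_fun[storage][transfer_tool]


print(get_command_builder("curl", "ges_disc")())
```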

@@ -513,7 +514,7 @@
remote_filepaths=remote_filepaths,
force_download=force_download,
)
- if is_empty(new_remote_filepaths):
+ if len(new_remote_filepaths) == 0:
if verbose:
print(f"The requested files are already on disk at {local_filepaths}.")
return None
@@ -716,7 +717,7 @@
)
# -------------------------------------------------------------------------.
## If no file to retrieve on NASA PPS, return None
- if is_empty(remote_filepaths):
+ if len(remote_filepaths) == 0:
if warn_missing_files:
msg = f"No data found on PPS on date {date} for product {product}"
warnings.warn(msg, GPMDownloadWarning)
@@ -735,7 +736,7 @@
remote_filepaths=remote_filepaths,
force_download=force_download,
)
- if is_empty(remote_filepaths):
+ if len(remote_filepaths) == 0:
return [-1], available_version # flag for already on disk

# -------------------------------------------------------------------------.
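The `is_empty(x)` helper (`not x`) is replaced throughout by `len(x) == 0`. One plausible benefit, not stated in the PR, is that the length check behaves the same for lists and NumPy arrays, whereas truthiness raises for arrays with more than one element. A small sketch with a hypothetical helper name:

```python
import numpy as np


def has_no_files(filepaths):
    """Return True when the sequence of file paths has no element."""
    return len(filepaths) == 0


print(has_no_files([]))
print(has_no_files(np.array([])))
print(has_no_files(np.array(["a.HDF5", "b.HDF5"])))
```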
5 changes: 0 additions & 5 deletions gpm_api/io/filter.py
@@ -1,4 +1,4 @@
#!/usr/bin/env python3

[CodeScene Delta Analysis (main), gpm_api/io/filter.py, line 1] Getting better: Overall Code Complexity. The mean cyclomatic complexity decreases from 5.29 to 5.14, threshold = 4. This file has many conditional statements (e.g. if, for, while) across its implementation, leading to lower code health. Avoid adding more conditionals.
"""
Created on Thu Oct 13 11:30:46 2022

@@ -272,11 +272,6 @@
# - Retrieve start_time and end_time of GPM granules
l_start_time, l_end_time = get_start_end_time_from_filepaths(filepaths)

- # -------------------------------------------------------------------------.
- # Check file are available
- if len(l_start_time) == 0:
- return []
-
# -------------------------------------------------------------------------.
# Select granules with data within the start and end time
# - Case 1
9 changes: 4 additions & 5 deletions gpm_api/io/find.py
@@ -20,7 +20,6 @@
check_start_end_time,
check_storage,
check_valid_time_request,
- is_empty,
)
from gpm_api.io.filter import filter_filepaths
from gpm_api.io.ges_disc import get_gesdisc_daily_filepaths
@@ -96,7 +95,7 @@
return filepaths, files_version


- def ensure_valid_start_date(start_date, product):
+ def _ensure_valid_start_date(start_date, product):
[CodeScene Delta Analysis (main), gpm_api/io/find.py, line 98] No longer an issue: Complex Method. ensure_valid_start_date is no longer above the threshold for cyclomatic complexity. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
[CodeScene Delta Analysis (main), gpm_api/io/find.py, line 98] New issue: Complex Method. _ensure_valid_start_date has a cyclomatic complexity of 9, threshold = 9. This function has many conditional statements (e.g. if, for, while), leading to lower code health. Avoid adding more conditionals and code to it without refactoring.
"""Ensure that the product directory exists for start_date."""
if product == "2A-SAPHIR-MT1-CLIM":
min_start_date = "2011-10-13 00:00:00"
@@ -169,7 +168,7 @@
version=version,
verbose=verbose,
)
- if is_empty(filepaths):
+ if len(filepaths) == 0:
if storage == "local" and verbose:
version_str = str(int(version))
print(
@@ -188,7 +187,7 @@
start_time=start_time,
end_time=end_time,
)
- if is_empty(filepaths):
+ if len(filepaths) == 0:
return [], []

## -----------------------------------------------------------------------.
@@ -250,7 +249,7 @@
# --> Example granules starting at 23:XX:XX in the day before and extending to 01:XX:XX
start_date = datetime.datetime(start_time.year, start_time.month, start_time.day)
start_date = start_date - datetime.timedelta(days=1)
- start_date = ensure_valid_start_date(start_date=start_date, product=product)
+ start_date = _ensure_valid_start_date(start_date=start_date, product=product)
end_date = datetime.datetime(end_time.year, end_time.month, end_time.day)
date_range = pd.date_range(start=start_date, end=end_date, freq="D")
dates = list(date_range.to_pydatetime())
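The context lines of the last find.py hunk build the list of days to search, starting one day before `start_time` so that granules crossing midnight are included. A standard-library sketch of that step follows. It omits the `_ensure_valid_start_date` clamp, replaces `pd.date_range` with a list comprehension, and the function name `list_search_dates` is hypothetical.

```python
import datetime


def list_search_dates(start_time, end_time):
    """List the days to search, starting one day before start_time."""
    start_date = datetime.datetime(start_time.year, start_time.month, start_time.day)
    # Granules that start late on the previous day may extend into the requested period.
    start_date = start_date - datetime.timedelta(days=1)
    end_date = datetime.datetime(end_time.year, end_time.month, end_time.day)
    n_days = (end_date - start_date).days
    return [start_date + datetime.timedelta(days=i) for i in range(n_days + 1)]


dates = list_search_dates(
    start_time=datetime.datetime(2020, 7, 5, 2, 0, 0),
    end_time=datetime.datetime(2020, 7, 6, 1, 0, 0),
)
print(dates)
```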
23 changes: 12 additions & 11 deletions gpm_api/io/ges_disc.py
@@ -5,7 +5,6 @@
@author: ghiggi
"""
import datetime
- import os
import re
import subprocess

@@ -42,7 +41,7 @@ def _get_href_value(input_string):
return href_value


- def _get_gesc_disc_list_path(url):
+ def _get_ges_disc_list_path(url):
# Retrieve url content
# - If it returns something, means url is correct
wget_output = _get_ges_disc_url_content(url)
@@ -51,7 +50,7 @@ def _get_gesc_disc_list_path(url):
list_content = [s for s in list_content if s != ""]
if len(list_content) == 0:
raise ValueError(f"The GES DISC {url} directory is empty.")
- list_path = [os.path.join(url, s) for s in list_content]
+ list_path = [f"{url}/{s}" for s in list_content]
return list_path


@@ -72,11 +71,11 @@ def _get_gesc_disc_list_path(url):
def _get_ges_disc_server(product):
# TRMM
if is_trmm_product(product):
- ges_disc_base_url = "https://disc2.gesdisc.eosdis.nasa.gov/data/"
+ ges_disc_base_url = "https://disc2.gesdisc.eosdis.nasa.gov/data"

# GPM
else:
- ges_disc_base_url = "https://gpm1.gesdisc.eosdis.nasa.gov/data"
+ # ges_disc_base_url = "https://gpm1.gesdisc.eosdis.nasa.gov/data"
ges_disc_base_url = "https://gpm2.gesdisc.eosdis.nasa.gov/data"
return ges_disc_base_url

@@ -114,9 +113,11 @@ def _get_ges_disc_product_directory_tree(product, date, version):

# Specify the directory tree
# --> TODO: currently specified only for L1 and L2
- directory_tree = os.path.join(
- folder_name,
- datetime.datetime.strftime(date, "%Y/%j"),
+ directory_tree = "/".join(
+ [
+ folder_name,
+ datetime.datetime.strftime(date, "%Y/%j"),
+ ]
)
return directory_tree

@@ -148,7 +149,7 @@ def get_ges_disc_product_directory(product, date, version):
product=product, date=date, version=version
)
# Define product directory where data are listed
- url_product_dir = os.path.join(url_server, dir_structure)
+ url_product_dir = f"{url_server}/{dir_structure}"
return url_product_dir


@@ -178,7 +179,7 @@ def _get_gesdisc_file_list(url_product_dir, product, date, version, verbose=True
Default is False. Whether to specify when data are not available for a specific date.
"""
try:
- filepaths = _get_gesc_disc_list_path(url_product_dir)
+ filepaths = _get_ges_disc_list_path(url_product_dir)
except Exception as e:
# If url not exist, raise an error
if "was not found on the GES DISC server" in str(e):
@@ -248,5 +249,5 @@ def define_gesdisc_filepath(product, product_type, date, version, filename):
# Retrieve product directory url
url_product_dir = get_ges_disc_product_directory(product=product, date=date, version=version)
# Define GES DISC filepath
- fpath = os.path.join(url_product_dir, filename)
+ fpath = f"{url_product_dir}/{filename}"
return fpath
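The ges_disc.py changes replace `os.path.join` with explicit `/` joins, which matches the "fix url slashes in windows" commits. The sketch below uses `ntpath` and `posixpath` (the Windows and POSIX implementations behind `os.path`) to show the difference on any platform. The directory and file names are made up.

```python
import ntpath
import posixpath

url_product_dir = "https://gpm2.gesdisc.eosdis.nasa.gov/data/SOME_DIR"  # directory name is made up
filename = "example.HDF5"  # hypothetical file name

windows_join = ntpath.join(url_product_dir, filename)   # os.path.join behaviour on Windows
posix_join = posixpath.join(url_product_dir, filename)  # os.path.join behaviour on Linux/macOS
explicit = f"{url_product_dir}/{filename}"              # same result on every platform

print(windows_join)
print(posix_join)
print(explicit)
```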
5 changes: 1 addition & 4 deletions gpm_api/io/info.py
@@ -116,10 +116,7 @@ def _get_info_from_filename(fname):

# Add product information
# - ATTENTION: can not be inferred for products not defined in etc/product.yml
- try:
- info_dict["product"] = get_product_from_filepath(fname)
- except Exception:
- pass
+ info_dict["product"] = get_product_from_filepath(fname)

# Return info dictionary
return info_dict
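Removing the `try`/`except Exception: pass` in info.py means a failed product lookup now propagates to the caller instead of silently leaving the `"product"` key out. A sketch of the two behaviours follows. `get_product` is a hypothetical stand-in for `get_product_from_filepath`, and the prefix-to-product mapping is invented for illustration.

```python
def get_product(fname):
    """Hypothetical stand-in for get_product_from_filepath."""
    known = {"2A.GPM.DPR": "2A-DPR"}  # invented mapping, for illustration only
    for prefix, product in known.items():
        if fname.startswith(prefix):
            return product
    raise ValueError(f"Unknown product for {fname}")


def info_swallowing(fname):
    """Old behaviour: errors are hidden and the key is silently missing."""
    info = {}
    try:
        info["product"] = get_product(fname)
    except Exception:
        pass
    return info


def info_propagating(fname):
    """New behaviour: a failed lookup raises to the caller."""
    return {"product": get_product(fname)}


print(info_swallowing("unknown_file.HDF5"))
print(info_propagating("2A.GPM.DPR.example.HDF5"))
```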
33 changes: 18 additions & 15 deletions gpm_api/io/pps.py
@@ -5,7 +5,6 @@
@author: ghiggi
"""
import datetime
- import os
import subprocess

from dateutil.relativedelta import relativedelta
@@ -77,7 +76,7 @@ def _get_pps_nrt_product_dir(product, date):
folder_name = _get_pps_nrt_product_folder_name(product)
# Specify the directory tree
if product in available_products(product_type="NRT", product_category="IMERG"):
- directory_tree = os.path.join(folder_name, datetime.datetime.strftime(date, "%Y%m"))
+ directory_tree = f"{folder_name}/{datetime.datetime.strftime(date, '%Y%m')}"
else:
directory_tree = folder_name
return directory_tree
@@ -104,20 +103,24 @@ def _get_pps_rs_product_dir(product, date, version):

# Specify the directory tree for current RS version
if version == 7:
- directory_tree = os.path.join(
- "gpmdata",
- datetime.datetime.strftime(date, "%Y/%m/%d"),
- folder_name,
+ directory_tree = "/".join(
+ [
+ "gpmdata",
+ datetime.datetime.strftime(date, "%Y/%m/%d"),
+ folder_name,
+ ]
)

# Specify the directory tree for old RS version
else: # version in [4, 5, 6]:
version_str = "V0" + str(int(version))
- directory_tree = os.path.join(
- "gpmallversions",
- version_str,
- datetime.datetime.strftime(date, "%Y/%m/%d"),
- folder_name,
+ directory_tree = "/".join(
+ [
+ "gpmallversions",
+ version_str,
+ datetime.datetime.strftime(date, "%Y/%m/%d"),
+ folder_name,
+ ]
)

# Return the directory tree
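The two branches above build the PPS research (RS) directory tree with `"/".join`. A condensed, standalone sketch of that logic follows. The function name `rs_product_dir` is hypothetical, and the `folder_name` value `"radar"` is taken from the `pps_rs_dir` entries in product_def.yml.

```python
import datetime


def rs_product_dir(folder_name, date, version):
    """Build the RS directory tree using forward slashes on every platform."""
    date_str = date.strftime("%Y/%m/%d")
    if version == 7:
        # Current RS version
        parts = ["gpmdata", date_str, folder_name]
    else:
        # Older RS versions, e.g. 4, 5, 6
        version_str = "V0" + str(int(version))
        parts = ["gpmallversions", version_str, date_str, folder_name]
    return "/".join(parts)


date = datetime.datetime(2020, 7, 5)
print(rs_product_dir("radar", date, version=7))
print(rs_product_dir("radar", date, version=6))
```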
@@ -194,7 +197,7 @@ def get_pps_product_directory(product, product_type, date, version, server_type)
product=product, product_type=product_type, date=date, version=version
)
# Define product directory where data are listed
- url_product_dir = os.path.join(url_server, dir_structure)
+ url_product_dir = f"{url_server}/{dir_structure}"
return url_product_dir


@@ -306,9 +309,9 @@ def get_pps_daily_filepaths(product, product_type, date, version, verbose=True):
verbose=verbose,
)
# Define the complete url of pps filepaths
- # - Need to remove the starting "/" to each filepath
+ # Filepaths start with a "/"
url_data_server = _get_pps_data_server(product_type)
- filepaths = [os.path.join(url_data_server, filepath[1:]) for filepath in filepaths]
+ filepaths = [f"{url_data_server}{filepath}" for filepath in filepaths]
return filepaths
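The old code stripped the leading `/` with `filepath[1:]` because a path join discards everything before an absolute component. Plain concatenation avoids the problem since the listed file paths already start with `/`. A sketch with placeholder server and path values:

```python
import posixpath

url_data_server = "https://example.org"  # placeholder, not the real PPS server
filepath = "/gpmdata/2020/07/05/radar/example.HDF5"  # placeholder path starting with "/"

# Joining with an absolute second component drops the server part entirely.
joined = posixpath.join(url_data_server, filepath)
# Concatenation keeps both parts, as in the updated code.
concatenated = f"{url_data_server}{filepath}"

print(joined)
print(concatenated)
```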


@@ -323,7 +326,7 @@ def define_pps_filepath(product, product_type, date, version, filename):
server_type="data",
)
# Define PPS filepath
- fpath = os.path.join(url_product_dir, filename)
+ fpath = f"{url_product_dir}/{filename}"
return fpath


5 changes: 3 additions & 2 deletions gpm_api/io/products.py
@@ -99,8 +99,9 @@ def get_product_start_time(product):

def get_product_end_time(product):
"""Provide the product end_time."""
- end_time = get_product_info(product)["end_time"]
- end_time = datetime.datetime.utcnow()
+ end_time = get_info_dict()[product]["end_time"]
+ if end_time is None:
+ end_time = datetime.datetime.utcnow()
return end_time


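The products.py change stops overwriting `end_time` unconditionally: the current UTC time is used only when the product definition has `end_time: null`. A standalone sketch of that fallback follows. The `info_dict` contents and product names are invented, and the naive UTC timestamp is built with `datetime.now(timezone.utc)` rather than `utcnow()`, which is deprecated in recent Python versions.

```python
import datetime

# Invented excerpt standing in for the dictionary loaded from product_def.yml.
info_dict = {
    "PRODUCT_ENDED": {"end_time": datetime.datetime(2015, 4, 8)},
    "PRODUCT_ONGOING": {"end_time": None},
}


def get_end_time(product):
    """Return the product end_time, falling back to the current UTC time."""
    end_time = info_dict[product]["end_time"]
    if end_time is None:
        # Naive UTC timestamp, equivalent in value to datetime.datetime.utcnow()
        end_time = datetime.datetime.now(datetime.timezone.utc).replace(tzinfo=None)
    return end_time


print(get_end_time("PRODUCT_ENDED"))
print(get_end_time("PRODUCT_ONGOING"))
```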