Add support for input retrieval from CDSE #101

Open · wants to merge 53 commits into base: main

Changes from all commits · 53 commits
ce5fb25
fix of GPF graphs from pull request by Aleksandr Tulenkov
martin-boettcher Oct 4, 2024
366b40b
make Dockerfile work on Ubuntu 20
martin-boettcher Oct 4, 2024
c30ed83
add support for CDSE as replacement for SciHub
martin-boettcher Oct 4, 2024
cc5fba9
ignore IDEA ide files
martin-boettcher Oct 4, 2024
508a5e6
add first application script for pre-processing
martin-boettcher Oct 8, 2024
cc708ec
add first application script for pre-processing
martin-boettcher Oct 8, 2024
fb44518
add first application script for pre-processing
martin-boettcher Oct 8, 2024
ca4b547
add first application script for pre-processing
martin-boettcher Oct 8, 2024
98867ce
add first application script for pre-processing
martin-boettcher Oct 8, 2024
98ff312
add first application script for pre-processing
martin-boettcher Oct 8, 2024
3babde9
add first application script for pre-processing
martin-boettcher Oct 8, 2024
c8ead14
Add CWL file for Notebook 1 workflow
pont-us Oct 8, 2024
d1030b9
attempt to fix conversion from DIMAP to TIFF
martin-boettcher Oct 9, 2024
990f8c0
avoid access to CDSE if input is already available
martin-boettcher Oct 9, 2024
5f5bd7d
run in mounted external dir, create input dir structure, attempt to g…
martin-boettcher Oct 9, 2024
d1e4963
run in mounted external dir, create input dir structure, attempt to g…
martin-boettcher Oct 9, 2024
45f1f63
run in mounted external dir, create input dir structure, attempt to g…
martin-boettcher Oct 9, 2024
8b7d2df
run in mounted external dir, create input dir structure, attempt to g…
martin-boettcher Oct 9, 2024
a5edd99
run in mounted external dir, create input dir structure, attempt to g…
martin-boettcher Oct 9, 2024
bf13b78
run in mounted external dir, create input dir structure, attempt to g…
martin-boettcher Oct 9, 2024
343289e
run in mounted external dir, create input dir structure, attempt to g…
martin-boettcher Oct 9, 2024
0a37ca5
generate tiff, place DEMs in working dir outside of container
martin-boettcher Oct 9, 2024
2eaef49
Add STAC input and output to preprocessing.py
pont-us Oct 11, 2024
c7f77b4
preprocessing.py: reformat code
pont-us Oct 30, 2024
eea0796
Refactor and tidy up preprocessing.py
pont-us Oct 30, 2024
254befb
preprocessing.py: more refactoring
pont-us Oct 30, 2024
85d6dae
preprocessing.py: improve logging
pont-us Oct 30, 2024
a908e70
preprocessing.py: output to CWD, not /home/ost/shared
pont-us Oct 30, 2024
6fe38a3
preprocessing.py: output TIFF, not DIMAP
pont-us Oct 30, 2024
80228c2
preprocessing: copy input data if link fails
pont-us Nov 5, 2024
7cd322a
preprocessing: add --dry-run option for testing
pont-us Nov 5, 2024
4f89480
preprocessing: fix boolean argument handling
pont-us Nov 5, 2024
ba545ba
preprocessing: fix a typo
pont-us Nov 5, 2024
18cb512
Several improvements to preprocessing.py
pont-us Nov 5, 2024
d87dfa4
preprocessing: explicitly specify output STAC root
pont-us Nov 5, 2024
05345c6
Updates to Dockerfile and context
pont-us Nov 6, 2024
060e3f6
Updates to CWL file
pont-us Nov 6, 2024
829ed86
Dockerfile: improve wget progress for OTB download
pont-us Nov 6, 2024
9dc9716
Add an example for Application Package execution
pont-us Nov 6, 2024
0c69345
Merge pull request #1 from bcdev/version2
pont-us Nov 6, 2024
e5bda67
Dockerfile: build from bcdev repo default branch
pont-us Nov 6, 2024
e4ddfb3
Start adding support for non-zipped input
pont-us Dec 12, 2024
74b79c5
Preprocessor: minor refactoring
pont-us Dec 13, 2024
d794102
Preprocessor: handle SAFE directory input
pont-us Dec 13, 2024
e92c93e
Preprocessor: add a logging message
pont-us Dec 16, 2024
74bbec4
Add more logging to s1scene
pont-us Dec 17, 2024
fa31ffc
Some updates to CWL file
pont-us Dec 18, 2024
6dc44d8
Dockerfile: add ost_branch argument
pont-us Dec 18, 2024
52ada17
Merge pull request #2 from bcdev/pont-safe-directory
pont-us Jan 3, 2025
a672a65
CWL file: set Docker image tag to "version3"
pont-us Jan 3, 2025
2a7f536
Merge branch 'main' of github.com:bcdev/OpenSarToolkit into main
pont-us Jan 3, 2025
8ae2cb3
proposed changes to CWL - remove cdse and rename workflow ID
simonevaccari Jan 15, 2025
0db963d
Merge pull request #3 from simonevaccari/main
pont-us Jan 20, 2025
1 change: 1 addition & 0 deletions .gitignore
@@ -134,3 +134,4 @@ dmypy.json

# IDE
.vscode
.idea
22 changes: 16 additions & 6 deletions Dockerfile
@@ -19,6 +19,13 @@ ENV TBX="esa-snap_sentinel_unix_${TBX_VERSION}_${TBX_SUBVERSION}.sh" \
HOME=/home/ost \
PATH=$PATH:/home/ost/programs/snap/bin:/home/ost/programs/OTB-${OTB_VERSION}-Linux64/bin

RUN apt-get update && apt-get install -yq wget libquadmath0

RUN wget http://archive.ubuntu.com/ubuntu/pool/universe/g/gcc-6/gcc-6-base_6.4.0-17ubuntu1_amd64.deb && \
dpkg -i gcc-6-base_6.4.0-17ubuntu1_amd64.deb && \
wget http://archive.ubuntu.com/ubuntu/pool/universe/g/gcc-6/libgfortran3_6.4.0-17ubuntu1_amd64.deb && \
dpkg -i libgfortran3_6.4.0-17ubuntu1_amd64.deb

# install all dependencies
RUN groupadd -r ost && \
useradd -r -g ost ost && \
@@ -29,7 +36,6 @@ RUN groupadd -r ost && \
libgdal-dev \
python3-gdal \
libspatialindex-dev \
libgfortran3 \
wget \
unzip \
imagemagick \
@@ -46,7 +52,7 @@ RUN alias python=python3 && \
rm $TBX && \
rm snap.varfile && \
cd /home/ost/programs && \
wget https://www.orfeo-toolbox.org/packages/${OTB} && \
wget https://www.orfeo-toolbox.org/packages/archives/OTB/${OTB} && \
chmod +x $OTB && \
./${OTB} && \
rm -f OTB-${OTB_VERSION}-Linux64.run
@@ -60,11 +66,15 @@ RUN /home/ost/programs/snap/bin/snap --nosplash --nogui --modules --update-all 2
# set usable memory to 12G
RUN echo "-Xmx12G" > /home/ost/programs/snap/bin/gpt.vmoptions

COPY requirements.txt $HOME

# get OST and tutorials
RUN python3 -m pip install git+https://github.com/ESA-PhiLab/OpenSarToolkit.git && \
git clone https://github.com/ESA-PhiLab/OST_Notebooks && \
jupyter labextension install @jupyter-widgets/jupyterlab-manager && \
jupyter nbextension enable --py widgetsnbextension
RUN python3 -m pip install git+https://github.com/ESA-PhiLab/OpenSarToolkit.git -c requirements.txt && \
git clone https://github.com/ESA-PhiLab/OST_Notebooks

#RUN jupyter labextension install @jupyter-widgets/jupyterlab-manager
#RUN jupyter nbextension enable --py widgetsnbextension
RUN pip install widgetsnbextension

EXPOSE 8888
CMD jupyter lab --ip='0.0.0.0' --port=8888 --no-browser --allow-root
9 changes: 9 additions & 0 deletions examples/application-package/README
@@ -0,0 +1,9 @@
The contents of this directory support testing of the OpenSarToolkit
CWL Workflow as an OGC EO Application Package. The JSON files
(which in a real deployment would be generated by the EOAP platform's
data stage-in process) provide a STAC catalogue for a specified input
item. The file inputs.yaml specifies the parameters for the workflow.
After setting the parameters appropriately in inputs.yaml, the workflow
can be executed in the manner of an Application Package by running

cwltool opensar.cwl inputs.yaml
24 changes: 24 additions & 0 deletions examples/application-package/SAR/GRD/2022/10/04/S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB.json
@@ -0,0 +1,24 @@
{
"stac_version": "1.1.0",
"type": "Feature",
"id": "S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB",
"geometry": null,
"properties": {
"datetime": "2022-10-04T16:43:16Z",
"platform": "sentinel-1a",
"constellation": "sentinel-1"
},
"assets": {
"GRD": {
"type": "application/zip",
"roles": [ "data" ],
"href": "S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB.zip"
}
},
"links": [
{
"rel": "parent",
"href": "../catalog.json"
}
]
}
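The stage-in item above is what the preprocessing step consumes: it locates the single `application/zip` asset and treats its `href` as the zipped SAFE archive. As an illustration, here is a minimal stdlib-only sketch of that lookup (the PR's `preprocessing.py` does this with pystac; `find_zip_href` is a hypothetical helper, not part of the PR):

```python
import json

# A stage-in STAC item like the one above, inlined for the example.
item_json = """
{
  "stac_version": "1.1.0",
  "type": "Feature",
  "id": "S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB",
  "geometry": null,
  "properties": {"datetime": "2022-10-04T16:43:16Z"},
  "assets": {
    "GRD": {
      "type": "application/zip",
      "roles": ["data"],
      "href": "S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB.zip"
    }
  },
  "links": [{"rel": "parent", "href": "../catalog.json"}]
}
"""


def find_zip_href(item: dict) -> str:
    # Select the unique asset whose media type marks it as a zipped SAFE
    # archive; fail loudly if the item is ambiguous or has no zip asset.
    zips = [
        asset["href"]
        for asset in item["assets"].values()
        if asset.get("type") == "application/zip"
    ]
    if len(zips) != 1:
        raise RuntimeError(f"Expected exactly one zip asset, found {len(zips)}")
    return zips[0]


href = find_zip_href(json.loads(item_json))
print(href)
```

In the actual workflow the href is resolved relative to the catalog directory before use; the sketch only shows the asset-selection logic.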
13 changes: 13 additions & 0 deletions examples/application-package/SAR/GRD/2022/10/04/catalog.json
@@ -0,0 +1,13 @@
{
"stac_version": "1.1.0",
"id": "catalog",
"type": "Catalog",
"description": "Root catalog",
"links": [
{
"type": "application/json",
"rel": "item",
"href": "./S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB.json"
}
]
}
11 changes: 11 additions & 0 deletions examples/application-package/inputs.yaml
@@ -0,0 +1,11 @@
---
input:
class: Directory
path: /data/opensar/SAR/GRD/2022/10/04
resolution: 100
ard-type: Earth-Engine
with-speckle-filter: false
resampling-method: BILINEAR_INTERPOLATION
cdse-user: <USERNAME>
cdse-password: <PASSWORD>
dry-run: true
Empty file added ost/app/__init__.py
Empty file.
281 changes: 281 additions & 0 deletions ost/app/preprocessing.py
@@ -0,0 +1,281 @@
from datetime import datetime
import sys
import os
import pathlib
from pathlib import Path
import pprint
import logging
import shutil

from ost import Sentinel1Scene
import click
import pystac
import rasterio

LOGGER = logging.getLogger(__name__)


@click.command()
@click.argument("input_", metavar="input")
@click.option("--resolution", default=100)
@click.option(
"--ard-type",
type=click.Choice(["OST_GTC", "OST-RTC", "CEOS", "Earth-Engine"]),
default="Earth-Engine",
)
@click.option("--with-speckle-filter", is_flag=True, default=False)
@click.option(
"--resampling-method",
type=click.Choice(["BILINEAR_INTERPOLATION", "BICUBIC_INTERPOLATION"]),
default="BILINEAR_INTERPOLATION",
)
@click.option("--cdse-user", default="dummy")
@click.option("--cdse-password", default="dummy")
@click.option(
"--dry-run", is_flag=True, default=False,
help="Skip processing and write a placeholder output file instead. "
"Useful for testing."
)
def run(
input_: str,
resolution: int,
ard_type: str,
with_speckle_filter: bool,
resampling_method: str,
cdse_user: str,
cdse_password: str,
dry_run: bool
):
horizontal_line = "-" * 79 # Used in log output

logging.basicConfig(level=logging.INFO)
# from ost.helpers.settings import set_log_level
# import logging
# set_log_level(logging.DEBUG)

scene_presets = {
# very first IW (VV/VH) S1 image available over Istanbul/Turkey
# NOTE:only available via ASF data mirror
"istanbul": "S1A_IW_GRDH_1SDV_20141003T040550_20141003T040619_002660_002F64_EC04",
# ???
"unknown": "S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB",
# IW scene (dual-polarised HH/HV) over Norway/Spitzbergen
"spitzbergen": "S1B_IW_GRDH_1SDH_20200325T150411_20200325T150436_020850_02789D_2B85",
# IW scene (single-polarised VV) over Ecuadorian Amazon
"ecuador": "S1A_IW_GRDH_1SSV_20150205T232009_20150205T232034_004494_00583A_1C80",
# EW scene (dual-polarised VV/VH) over Azores
# (needs a different DEM,see ARD parameters below)
"azores": "S1B_EW_GRDM_1SDV_20200303T193150_20200303T193250_020532_026E82_5CE9",
# EW scene (dual-polarised HH/HV) over Greenland
"greenland": "S1B_EW_GRDM_1SDH_20200511T205319_20200511T205419_021539_028E4E_697E",
# Stripmap mode S5 scene (dual-polarised VV/VH) over Germany
"germany": "S1B_S5_GRDH_1SDV_20170104T052519_20170104T052548_003694_006587_86AB",
}

# "When executed, the Application working directory is also the Application
# output directory. Any file created by the Application should be added
# under that directory." -- https://docs.ogc.org/bp/20-089r1.html#toc20
output_dir = os.getcwd()
output_path = Path(output_dir)

# We expect input to be the path to a directory containing a STAC catalog
# containing an item which contains an asset for either a zip file
# (zipped SAFE archive) or a SAFE manifest (which is used to determine
# the location of a non-zipped SAFE directory). The returned path is
# either the zip file or the SAFE directory
input_path = get_input_path_from_stac(input_)

# We assume that any file input path is a zip, and any non-file input
# path is a SAFE directory.
zip_input = pathlib.Path(input_path).is_file()
LOGGER.info(f"Input is {'zip' if zip_input else 'SAFE directory'}")

scene_id = input_path[input_path.rfind("/") + 1 : input_path.rfind(".")]
if zip_input:
copy_zip_input(input_path, output_dir, scene_id)

# Instantiate a Sentinel1Scene from the specified scene identifier
s1 = Sentinel1Scene(scene_id)
s1.info() # write scene summary information to stdout
if zip_input:
s1.download(
output_path, mirror="5", uname=cdse_user, pword=cdse_password
)

single_ard = s1.ard_parameters["single_ARD"]
# Set ARD type. Choices: "OST_GTC", "OST-RTC", "CEOS", "Earth-Engine"
s1.update_ard_parameters(ard_type)
LOGGER.info(
f"{horizontal_line}\n"
f"Dictionary of Earth Engine ARD parameters:\n"
f"{horizontal_line}\n"
f"{pprint.pformat(single_ard)}\n"
f"{horizontal_line}"
)

# Customize ARD parameters
single_ard["resolution"] = resolution
single_ard["remove_speckle"] = with_speckle_filter
single_ard["dem"][
"image_resampling"
] = resampling_method # default: BICUBIC_INTERPOLATION
single_ard["to_tif"] = True
# single_ard['product_type'] = 'RTC-gamma0'

# uncomment this for the Azores EW scene
# s1.ard_parameters['single_ARD']['dem']['dem_name'] = 'GETASSE30'

LOGGER.info(
f"{horizontal_line}\n"
"Dictionary of customized ARD parameters for final scene processing:\n"
f"{horizontal_line}\n"
f"{pprint.pformat(single_ard)}\n"
f"{horizontal_line}"
)

if dry_run:
tiff_path = output_path / f"{s1.start_date}.tif"
LOGGER.info(f"Dry run -- creating dummy output at {tiff_path}")
create_dummy_tiff(tiff_path)
else:
LOGGER.info(f"Creating ARD at {output_path}")
# create_ard seems to be a prerequisite for create_rgb.
if zip_input:
s1.create_ard(
infile=s1.get_path(output_path), out_dir=output_path, overwrite=True
)
else:
s1.create_ard(
infile=input_path, out_dir=output_path, overwrite=True
)

LOGGER.info(f"Path to newly created ARD product: {s1.ard_dimap}")
LOGGER.info(f"Creating RGB at {output_path}")
s1.create_rgb(outfile=output_path.joinpath(f"{s1.start_date}.tif"))
tiff_path = s1.ard_rgb
LOGGER.info(f"Path to newly created RGB product: {tiff_path}")

# Write a STAC catalog and item pointing to the output product.
LOGGER.info("Writing STAC catalogue and item")
write_stac_for_tiff(str(output_path), str(tiff_path), scene_id)


def copy_zip_input(input_path, output_dir, scene_id):
year = scene_id[17:21]
month = scene_id[21:23]
day = scene_id[23:25]
output_subdir = f"{output_dir}/SAR/GRD/{year}/{month}/{day}"
os.makedirs(output_subdir, exist_ok=True)
try:
scene_path = f"{output_subdir}/{scene_id}"
try:
os.link(input_path, f"{scene_path}.zip")
except OSError as e:
LOGGER.warning("Exception linking input data", exc_info=e)
LOGGER.warning("Attempting to copy instead.")
shutil.copy2(input_path, f"{scene_path}.zip")
with open(f"{scene_path}.downloaded", mode="w") as f:
f.write("successfully found here")
except Exception as e:
LOGGER.warning("Exception linking/copying input data", exc_info=e)


def create_dummy_tiff(path: Path) -> None:
import numpy as np
import rasterio

data = np.linspace(np.arange(100), 50 * np.sin(np.arange(100)), 100)
with rasterio.open(
str(path),
'w',
driver='GTiff',
height=data.shape[0],
width=data.shape[1],
count=1,
dtype=data.dtype,
crs="+proj=latlong",
transform=rasterio.transform.Affine.scale(0.1, 0.1),
) as dst:
dst.write(data, 1)

def get_input_path_from_stac(stac_root: str) -> str:
stac_path = pathlib.Path(stac_root)
catalog = pystac.Catalog.from_file(str(stac_path / "catalog.json"))
item_links = [link for link in catalog.links if link.rel == "item"]
assert len(item_links) == 1
item_link = item_links[0]
item = pystac.Item.from_file(str(stac_path / item_link.href))
if "manifest" in item.assets:
LOGGER.info(f"Found manifest asset in {catalog}")
manifest_asset = item.assets["manifest"]
if "filename" in manifest_asset.extra_fields:
filename = pathlib.Path(manifest_asset.extra_fields["filename"])
safe_dir = stac_path / filename.parent
LOGGER.info(f"Found SAFE directory at {safe_dir}")
return str(safe_dir)
else:
raise RuntimeError(
f"No filename for manifest asset in {catalog}"
)
else:
LOGGER.info("No manifest asset found; looking for zip asset")
zip_assets = [
asset
for asset in item.assets.values()
if asset.media_type == "application/zip"
]
if len(zip_assets) < 1:
raise RuntimeError(
f"No manifest assets or zip assets found in {catalog}"
)
elif len(zip_assets) > 1:
raise RuntimeError(
f"No manifest assets and multiple zip assets found in "
f"{stac_root}, so it's not clear which zip asset to use."
)
else:
zip_path = stac_path / zip_assets[0].href
LOGGER.info(f"Found input zip at {zip_path}")
return str(zip_path)


def write_stac_for_tiff(stac_root: str, asset_path: str, scene_id: str) -> None:
LOGGER.info(f"Writing STAC for asset {asset_path} to {stac_root}")
ds = rasterio.open(asset_path)
asset = pystac.Asset(
roles=["data"],
href=asset_path,
media_type="image/tiff; application=geotiff;",
)
bb = ds.bounds
s = scene_id
item = pystac.Item(
id="result-item",
# GeoJSON geometry must be an object, not a bare coordinate list
geometry={
"type": "Polygon",
"coordinates": [[
[bb.left, bb.bottom],
[bb.left, bb.top],
[bb.right, bb.top],
[bb.right, bb.bottom],
[bb.left, bb.bottom],
]],
},
bbox=[bb.left, bb.bottom, bb.right, bb.top],
datetime=None,
start_datetime=datetime(*map(int, (
s[17:21], s[21:23], s[23:25], s[26:28], s[28:30], s[30:32]))),
end_datetime=datetime(*map(int, (
s[33:37], s[37:39], s[39:41], s[42:44], s[44:46], s[46:48]))),
properties={}, # datetime values will be filled in automatically
assets={"TIFF": asset},
)
catalog = pystac.Catalog(
id="catalog",
description="Root catalog",
href=f"{stac_root}/catalog.json",
)
catalog.add_item(item)
catalog.make_all_asset_hrefs_relative()
catalog.save(catalog_type=pystac.CatalogType.SELF_CONTAINED)


if __name__ == "__main__":
sys.exit(run())
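The `start_datetime`/`end_datetime` values in `write_stac_for_tiff` are derived by slicing the Sentinel-1 scene identifier at fixed character offsets. A standalone sketch of that slicing, using the same offsets as the PR code (`scene_id_datetimes` is a hypothetical helper, not part of the PR):

```python
from datetime import datetime


def scene_id_datetimes(scene_id: str) -> tuple:
    # Sentinel-1 GRD scene IDs embed acquisition start and stop times at
    # fixed offsets, e.g. ..._20221004T164316_20221004T164341_...
    s = scene_id
    start = datetime(*map(int, (
        s[17:21], s[21:23], s[23:25], s[26:28], s[28:30], s[30:32])))
    end = datetime(*map(int, (
        s[33:37], s[37:39], s[39:41], s[42:44], s[44:46], s[46:48])))
    return start, end


start, end = scene_id_datetimes(
    "S1A_IW_GRDH_1SDV_20221004T164316_20221004T164341_045295_056A44_13CB")
print(start.isoformat(), end.isoformat())
```

Note the fixed offsets assume a GRD-style identifier with a four-character mode/type field; they would not hold for product types whose IDs are laid out differently.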