Refactor to python package and restructure project directory (#16)
* rename generate.py to example.py and mv to directory with STAC generation scripts

* mkdir tests and mv validation.py to test_validation.py

* make dir for ci scripts; add convert.py to this dir.

* etl and coclico stac configs into src/coclicodata directory that can be packaged

* refactor

* add poetry for package management

* add azure vars to env

* mv keys function to cloud utils

* snakecase convention

* mv scripts to subdirs

* setup deltares drive configs

* refactor to python package

* usage instructions and integrated readme in package dirs here

* usage instructions and integrated readme in package dirs here

* mv readme to proj root

* coastal mask with coclicodata and coastmonitor package

* do not require deltares fields

* change id to raw gh path

* test schema

* change path

* deltares props no longer as required

* fp to href

* description for all files in repo

* bash script to upload stacs to azure cloud

* change schema uri to update json schema

* make stacs without deltares properties as required

* test to avoid collection id duplication

* mv root inside test as convention

* black formatting

* pre commit config

* pre commit cleanup

* add pre commit instructions to readme

* change href extension to main branch

* coastal mask stacs without redundant frontend properties

* added coastal mask to stac catalog

* sync instead of uploading

* load catalog outside function for CI tests

* fix href to root in tests

* reset hrefs to feature branch

---------

Co-authored-by: floriscalkoen <[email protected]>
FlorisCalkoen and floriscalkoen authored Aug 31, 2023
1 parent 4210fb8 commit f88c04d
Showing 358 changed files with 73,076 additions and 72,805 deletions.
3 changes: 3 additions & 0 deletions .env.example
@@ -1 +1,4 @@
MAPBOX_ACCESS_TOKEN=""
AZURE_STORAGE_ACCOUNT=""
AZURE_STORAGE_SAS_TOKEN=""
GH_COASTMONITOR_TOKEN=""
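
These variables configure access to the Azure storage that backs the catalog. A minimal sketch of how they might be consumed, assuming `python-dotenv` (the package's actual loading mechanism is not shown in this commit):

```python
# Sketch only: reads a .env file (copied from .env.example) and exposes the
# Azure credentials to the process environment. python-dotenv is an assumption.
import os

from dotenv import load_dotenv

load_dotenv()  # looks for a .env file in the current directory or its parents
account = os.getenv("AZURE_STORAGE_ACCOUNT")
sas_token = os.getenv("AZURE_STORAGE_SAS_TOKEN")
```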
2 changes: 1 addition & 1 deletion .github/workflows/release.yml
@@ -36,7 +36,7 @@ jobs:
path: live
- name: Release
run: |
python ci/convert.py
rm -rf ../live/current
cp -rp live ../live/current
cd ../live
23 changes: 23 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,23 @@
ci:
  autofix_prs: false
  autoupdate_schedule: weekly

repos:
  - repo: https://github.com/pre-commit/pre-commit-hooks
    rev: v4.4.0
    hooks:
      - id: trailing-whitespace
      - id: debug-statements
      - id: check-yaml
      - id: check-added-large-files

  - repo: https://github.com/psf/black
    rev: 23.7.0
    hooks:
      - id: black
        language_version: python3.11

  - repo: https://github.com/kynan/nbstripout
    rev: 0.6.1
    hooks:
      - id: nbstripout
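
Once installed (see the pre-commit instructions in the README below), these hooks run on every `git commit`; they can also be run across the whole repository with `pre-commit run --all-files`.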
133 changes: 119 additions & 14 deletions README.md
@@ -1,36 +1,141 @@
# coclicodata
STAC catalog for CoCliCo

This repository contains code to maintain the CoCliCo STAC catalog. Please note that
this is a **relative** STAC catalog for development purposes.

## Usage

Given that `coclicodata` is under active development, it's recommended to clone the repository and then install it in 'editable' mode. This way, any changes made to the package are immediately available without needing a reinstall.

Follow these steps for installation:

1. **Clone the repository**:

``` bash
git clone https://github.com/openearth/coclicodata.git
```

2. **Install the environment**:

``` bash
mamba env create -f /path/to/coclicodata/environment.yaml
```

3. **Activate the environment**:

``` bash
mamba activate coclico
```

4. **Install the package in editable mode**:

``` bash
pip install -e /path/to/coclicodata
```

After installation, you can easily import and use any module or function from the
`coclicodata` package in your Python scripts or interactive sessions:

```python
from coclicodata.coclico_stac import utils
# Further code utilizing the utils module...
```

## Use pre-commit locally

Use pre-commit to ensure consistent code formatting and to keep the repository small by stripping notebook output before it is committed.

In the root of the repository run:

```bash
pre-commit install
```

If the hooks catch issues when you commit your changes, most of them will be fixed automatically:

```bash
git commit -m "Your message"
```
Once the hooks pass, push your changes.

## Test

You can run `pytest` to check whether your current STAC collection is valid. The command
will automatically run the test scripts that are maintained in `tests/test_*.py`.
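
A minimal sketch of the kind of check such a test might perform, using `pystac`; the catalog path and the use of `pystac` are assumptions, not taken from the repository's test code:

```python
# Hypothetical validation test: load the development catalog and validate it
# and all of its children against the STAC JSON schemas.
import pystac

catalog = pystac.Catalog.from_file("current/catalog.json")  # path is an assumption
catalog.validate_all()  # raises pystac.STACValidationError if anything is invalid
```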

## Release

On successful validation of the STAC catalog in the main branch, an **absolute** version
of the catalog will be published in the `live` branch that can be used externally.
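
A hedged sketch of what such a relative-to-absolute conversion could look like with `pystac`; the paths and root URL below are placeholders, not the values used by `ci/convert.py`:

```python
# Illustrative only: rewrite the relative development catalog as an absolute,
# published catalog suitable for external use.
import pystac

catalog = pystac.Catalog.from_file("current/catalog.json")  # placeholder path
catalog.normalize_and_save(
    root_href="https://example.com/live",  # placeholder root URL
    catalog_type=pystac.CatalogType.ABSOLUTE_PUBLISHED,
)
```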

## CoCliCoData repository structure

- **ci**
- `convert.py`: CI script to convert current to live stacs.

- **current**: STAC catalog that is used for web portal development.

- **docs**: Various documentation images like flowcharts and diagrams representing data formats and workflows.

- **json-schema**
- `schema.json`: JSON schema definition for the frontend Deltares STAC extension.

- **live**: STAC catalog that is used by the web-portal to serve end users.

- **notebooks**: Jupyter notebooks used to load, explore and transform the data;
typically one per dataset, to make it CF compliant.

- **scripts**
- **bash**: Shell scripts, like `build-stacs.sh` and `upload-stacs-to-azure.sh`, for various automation tasks.
- **create_stacs**: Python scripts for creating STACs, each typically corresponding to a specific dataset or processing step.
- **utils**: Utility scripts, like `coclico_common_vocab_from_stac.py` and `upload_and_generate_geojson.py`, for various data operations.

- **src/coclicodata**
- `__init__.py`: Main package initialization.
- `drive_config.py`: Configuration settings for the drive or storage medium.
- **etl**
- `__init__.py`: Subpackage initialization.
- `cf_compliancy_checker.py`: Checks for compliancy with the Climate and Forecast (CF) conventions.
- `cloud_utils.py`: Utilities for cloud-based operations and data processing.
- `extract.py`: Data extraction and transform functionalities.

- **coclico_stac**
- `__init__.py`: Subpackage initialization.
- `datacube.py`: Functions for extracting dimension shapes and metadata from zarr stores.
- `extension.py`: CoCliCo STAC extension that is used for frontend visualization.
- `io.py`: Defines the CoCliCo JSON I/O strategy for STAC catalogs.
- `layouts.py`: Provides CoCliCo layout strategies for STAC for the data formats used.
- `templates.py`: Defines CoCliCo templates for generating STAC items, assets and collections.
- `utils.py`: Utility functions for data migration and other STAC-related operations.

- **stories**: Contains narrative data and associated images.

- **tests**: Contains test scripts to ensure code quality and functionality.

- `.pre-commit-config.yaml`: Hooks that will be run when making a commit.
- `metadata_template.json`: Template file for a STAC collection from a dataset.
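
Because the package lives under `src/`, the tree above maps directly onto import paths. A sketch, with module names taken from the tree; what each module exports is an assumption, so only module-level imports are shown:

```python
# Module names mirror the repository structure above.
from coclicodata import drive_config
from coclicodata.etl import cloud_utils, extract
from coclicodata.coclico_stac import templates, utils
```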

## Metadata

The following attributes are required at dataset level (a hypothetical example follows this list):

- title -
- title abbreviation -
- description - description that will be used as the dataset explanation in the web portal.
- short description - description which is convenient when loading the data into a
programming environment
- institution - data producer
- providers - data host (Deltares / CoCliCo)
- name
- url
- roles - e.g., providers, licensor
- description -
- history - list of institutions and people who have processed the data
- media_type - [also known as mime type](https://www.iana.org/assignments/media-types/media-types.xhtml)
- spatial extent - bbox [minx, miny, maxx, maxy]
- temporal extent - time interval in [iso 8601](https://en.wikipedia.org/wiki/ISO_8601), i.e., YYYY-MM-DDTHH:mm:ssZ
- license -
- author -
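
A hypothetical dataset-level metadata record covering the required attributes; every key and value below is invented for illustration and may not match `metadata_template.json` exactly:

```python
# All names and values are placeholders, not a real CoCliCo dataset.
metadata = {
    "title": "Example Coastal Dataset",
    "title_abbreviation": "ecd",
    "description": "Longer description shown as the dataset explanation in the web portal.",
    "short_description": "Short description for use in a programming environment.",
    "institution": "Example Institute",  # data producer
    "providers": [
        {"name": "Deltares", "url": "https://www.deltares.nl", "roles": ["host"], "description": ""}
    ],
    "history": ["2023: processed by Example Institute"],
    "media_type": "application/x-netcdf",  # IANA media type
    "spatial_extent": [-180.0, -90.0, 180.0, 90.0],  # bbox [minx, miny, maxx, maxy]
    "temporal_extent": ["2000-01-01T00:00:00Z", "2020-12-31T23:59:59Z"],  # ISO 8601
    "license": "CC-BY-4.0",
    "author": "Jane Doe",
}
```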

The following attributes are optional at dataset level:
- keywords - these can be used to search using the STAC API
@@ -39,13 +144,13 @@ The following attributes are optional at dataset level:
Publisher. (resourceTypeGeneral). Identifier format (Zenodo specification)
- doi - following [Zenodo specification](https://about.zenodo.org/principles/)
- thumbnail asset image - image that will be shown to represent the dataset
- columns - when data is tabular and has column names

The following attributes are required at variable level (a sketch follows this list):

- long_name - descriptive name
- standard_name - iff available in [CF convention standard table](https://cfconventions.org/Data/cf-standard-names/current/build/cf-standard-name-table.html)
- units - follow CF conventions where possible; leave blank when no units.
- cell_bnds
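
A small illustration of these variable-level attributes written as CF-style `xarray` attrs; the variable and its values are invented:

```python
# Placeholder variable; the standard_name is taken from the CF standard table.
import numpy as np
import xarray as xr

esl = xr.DataArray(np.zeros(3), dims=["time"], name="esl")
esl.attrs = {
    "long_name": "extreme sea level",
    "standard_name": "sea_surface_height_above_mean_sea_level",
    "units": "m",  # follow CF conventions; leave blank when the variable has no units
}
```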

The following attributes are optional at variable level:
@@ -54,7 +159,7 @@
The following coordinate labels are required:

- crs or spatial_ref
- time

### Controlled vocabulary
| **name** | **long_name** | **standard_name** | **data_structure_type** | **dtype** |
File renamed without changes.
