## A default suite

This tutorial will cover creating a wellies suite, from the creation of the default configuration files to the customisation and deployment of the suite. We'll use the `wellies-quickstart` tool to create all of the files and folders we need to build a wellies suite. Our suite will be named `efas_report` and its code will live in `~/projects/efas_report`.

```console
$ wellies-quickstart -p efas_report ~/projects/efas_report
```

Let's have a look at the folder that was created:

```tree
efas_report/
├── configs
│   ├── config.yaml
│   ├── data.yaml
│   ├── execution_contexts.yaml
│   └── tools.yaml
├── deploy.py
├── Makefile
└── suite
    └── nodes.py
```

We can see that the suite configuration files, the deployment Python script, the Makefile and the suite customisation code have been created for us. Before we start, we'll initialise a git repository so we can keep track of our changes.

```console
$ cd ~/projects/efas_report
$ git init
$ echo "__pycache__" > .gitignore
$ git add --all
$ git commit -m "Start of project from wellies-quickstart"
```

The example suite is ready to deploy. We can do this with `deploy.py`, passing it the paths to our configuration files:

```console
$ ./deploy.py configs/*.yaml
```

We've deployed the default suite. Before looking at how we can configure it to do what we want, let's load it onto an ecflow server.

### Deploying the suite

We can deploy this default suite to an ecflow server using `ecflow_client`.
You'll need to either load the `ecflow` module or [install ecflow](https://ecflow.readthedocs.io/en/latest/install/index.html) first.

```console
$ ecflow_client --host ecflow_server.example.com --port 3141 --load /perm/username/pyflow/efas_report/efas_report.def
```

## Customising the suite

When we ran the deploy script we passed it the path to the configuration files `configs/*.yaml`.
For a quick overview of what these files do:

- `config.yaml` - handles the main options of the suite (paths, hosts, user, etc.)
- [`tools.yaml`](./config/tools_config.md) - for conda environment creation and loading, environment variable handling, etc.

Click on the filenames above for more information on the available configuration options.

To start, let's take a look at `config.yaml`.

### `config.yaml`

As mentioned above, this file handles the main options of the suite. We'll start with the minimal example generated by wellies.

```yaml title="config.yaml"
# Configuration file for pyflow suite.
# ... (lines elided) ...
workdir: "$TMPDIR"
output_root: "{SCRATCH}/efas_report"
```

To deploy the updated suite we run:

```console
$ ./deploy.py configs/*.yaml
```

### `data.yaml`

This file configures data retrieval and handling. Our workflow needs two *static* datasets, that is, data that only need to be fetched once for our computations to work. Within `configs/data.yaml` we will add entries to transfer the latest station file from the EFAS repository and the computed flood thresholds that we know are available in a shared directory.

We'll start by creating an `outlets` section in our `configs/data.yaml` file.
The EFAS station file is tracked within the EFAS suite repository, so we tell wellies to clone the repository, use the `develop` branch and keep only the files specified in the `files` list. Next we add a `static_maps` section for the thresholds and upstream area files.
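The "keep only the files in the `files` list" step can be pictured with a short standard-library sketch. This is not how wellies implements it: `keep_only` is a hypothetical helper, the clone and `develop` checkout are assumed to have already happened, and the file names in the example are made up.

```python
from pathlib import Path
import tempfile


def keep_only(repo_dir: Path, files: list[str]) -> list[Path]:
    """Delete everything under repo_dir except the listed relative paths.

    Hypothetical helper mimicking "keep just the files in the `files` list"
    after a repository has been cloned.
    """
    missing = [f for f in files if not (repo_dir / f).is_file()]
    if missing:
        raise FileNotFoundError(f"not found in clone: {missing}")
    keep = {(repo_dir / f).resolve() for f in files}

    # Remove every file that was not asked for (snapshot the walk first).
    for path in list(repo_dir.rglob("*")):
        if path.is_file() and path.resolve() not in keep:
            path.unlink()

    # Remove directories that are now empty, deepest first.
    dirs = [p for p in repo_dir.rglob("*") if p.is_dir()]
    for d in sorted(dirs, key=lambda p: len(p.parts), reverse=True):
        if not any(d.iterdir()):
            d.rmdir()

    # Report what survived, relative to the clone.
    return sorted(p.relative_to(repo_dir) for p in repo_dir.rglob("*") if p.is_file())


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        repo = Path(tmp)
        (repo / "stations").mkdir()
        (repo / "stations" / "outlets.csv").write_text("id,lat,lon\n")  # hypothetical file
        (repo / "README.md").write_text("suite docs\n")
        print(keep_only(repo, ["stations/outlets.csv"]))
```

Checking for missing files before deleting anything means a typo in the `files` list fails loudly instead of leaving an empty directory behind.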

When we run the deploy command

```console
$ ./deploy.py configs/*.yaml
```

wellies reads and parses the YAML files at the paths given. The configuration settings are stored in a `Config` object created from the class that's defined in `deploy.py`.

```python title="deploy.py"
class Config:
    ...  # class body elided in this excerpt
```
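To make the idea concrete, here is a minimal, hypothetical stand-in for such a `Config` object, using plain dictionaries in place of the parsed YAML files so it has no dependencies. The merge rule (later files override earlier ones on the same top-level key) and the attribute names are assumptions for illustration; the real class in `deploy.py` decides which options it exposes.

```python
def merge_options(*sources: dict) -> dict:
    """Combine the top-level keys of several parsed YAML files.

    Assumption for this sketch: later files override earlier ones on a clash.
    """
    merged: dict = {}
    for source in sources:
        merged.update(source)
    return merged


class Config:
    """Hypothetical, minimal stand-in for the Config class in deploy.py."""

    def __init__(self, *sources: dict):
        options = merge_options(*sources)
        # Frequently used settings become attributes, read as `config.<name>`.
        self.workdir = options["workdir"]
        self.output_root = options["output_root"]
        # Keep the full mapping for anything not promoted to an attribute.
        self.options = options


# Plain dicts standing in for the parsed config.yaml and data.yaml (keys illustrative).
config_yaml = {"workdir": "$TMPDIR", "output_root": "{SCRATCH}/efas_report"}
data_yaml = {"outlets": {"type": "git", "branch": "develop"}}

config = Config(config_yaml, data_yaml)
print(config.output_root)
print(config.options["outlets"]["branch"])
```

Passing a single object around means suite classes such as `MainFamily` only need one `config` argument to reach every setting.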

To add such a node to our suite we will modify the main family definition in `suite/nodes.py`.
At the moment we have the `MainFamily` definition with a single placeholder task in it.

```python title="suite/nodes.py"
class MainFamily(pf.AnchorFamily):
    ...  # single placeholder task, elided in this excerpt
```

Let's replace this with a repeating node that will run every day, retrieve the input data for each cycle and then run our processing. The `MainFamily` class receives a `config` argument, so we can use it to pass in data such as the start and end dates and the keys for our data retrieval.

```python title="suite/nodes.py"
class IssueFamily(pf.Family):
    ...  # (lines elided)


class MainFamily(pf.AnchorFamily):
    ...  # (lines elided)
            f_previous = f_issue
```

For readability we also define an `IssueFamily` class that holds the logic of a single run of our analysis and configures how many cycles we are going to run.
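The day-by-day structure described above can be sketched with the standard library alone, independently of pyflow. In this hypothetical sketch each issue day gets its own entry, each issue waits for the one before it (which we assume is what the `f_previous = f_issue` bookkeeping in `MainFamily` is for), and each day is split into cycles. The dates, the `"00"`/`"12"` cycles and the helper names are illustrative assumptions, not values from wellies.

```python
from datetime import date, timedelta


def issue_dates(start: date, end: date) -> list[date]:
    """Every issue day from start to end, inclusive (one IssueFamily each)."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


def chain(dates: list[date]) -> list[tuple]:
    """Pair each issue with the previous issue it has to wait for."""
    links = []
    previous = None
    for issue in dates:
        links.append((issue, previous))
        previous = issue  # mirrors the `f_previous = f_issue` bookkeeping
    return links


def cycle_labels(issue: date, cycles: list[str]) -> list[str]:
    """Label each forecast cycle of one issue day, e.g. 2024010100."""
    return [f"{issue:%Y%m%d}{cycle}" for cycle in cycles]


if __name__ == "__main__":
    # Hypothetical start/end dates and cycles; in the suite they would come from config.
    for issue, previous in chain(issue_dates(date(2024, 1, 1), date(2024, 1, 3))):
        print(issue, "waits for", previous, "->", cycle_labels(issue, ["00", "12"]))
```

Chaining each day to its predecessor keeps the issues running in order, even if several days become eligible at once.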

The `post_script` added to our retrievals in `data.yaml` uses an external conversion tool provided by the `ecmwf-toolbox` module. We need to let wellies know we're going to use this module by making sure it's listed in our `tools.yaml` file:

```yaml title="tools.yaml"
tools:
  modules:
    python:
      name: python3
      version: 3.10.10-01
    ecmwf-toolbox:
      version: 2023.10.0.0
      depends: [python]
  packages:
    earthkit:
      type: git
      source: git@github.com:ecmwf/earthkit-data.git
      branch: develop
      post_script: "pip install . --no-deps"
  environments:
    suite_env:
      type: system_venv
      depends: [python, ecmwf-toolbox]
      packages: [earthkit]
```
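The `depends` lists in `tools.yaml` form a small dependency graph: `suite_env` needs `python` and `ecmwf-toolbox`, and `ecmwf-toolbox` itself needs `python`. The sketch below uses the standard library's `graphlib` (Python 3.9+) to show one way such a graph can be turned into a load order with dependencies first. It is an illustration under our own assumptions, not how wellies resolves tools internally; `DEPENDS` and `load_order` are hypothetical names, and `packages` are ignored.

```python
from graphlib import TopologicalSorter

# `depends` entries copied from tools.yaml above (modules and environments).
DEPENDS = {
    "python": [],
    "ecmwf-toolbox": ["python"],
    "suite_env": ["python", "ecmwf-toolbox"],
}


def load_order(target: str, depends: dict[str, list[str]]) -> list[str]:
    """Return `target` and everything it needs, with dependencies first."""
    # Collect only the part of the graph reachable from `target`.
    graph: dict[str, list[str]] = {}
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in graph:
            graph[name] = depends.get(name, [])
            stack.extend(graph[name])
    # static_order() yields each node after all of its predecessors
    # (and raises graphlib.CycleError if the dependencies form a loop).
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    print(load_order("suite_env", DEPENDS))
```

Resolving only the reachable sub-graph means loading `ecmwf-toolbox` pulls in `python` but not unrelated environments.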

We can now load the `ecmwf-toolbox` module in our task script with the `config.tools.load('ecmwf-toolbox')` call:
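
```python
# `config` is the Config object created from the class in deploy.py
n_ret = pf.Task(
    name='retrieve',
    script=[
        # Load the ecmwf-toolbox module first...
        config.tools.load('ecmwf-toolbox'),
        # ...then run the script of each forecast retrieval
        [dd.script for dd in config.fc_retrievals],
    ],
)
```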
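The `script` argument above mixes a single item with a nested list. The sketch below shows, under our own assumptions, how such nested fragments can end up as one ordered job script, with the module loaded before the retrievals run. `fake_tools_load` is a hypothetical stand-in that returns an Environment Modules style `module load` line, which is not necessarily what `config.tools.load` produces, and pyflow does its own script assembly.

```python
def flatten_script(part) -> list[str]:
    """Flatten nested lists of script fragments into one ordered list of lines."""
    if isinstance(part, str):
        return [part]
    lines: list[str] = []
    for item in part:
        lines.extend(flatten_script(item))
    return lines


def fake_tools_load(name: str) -> str:
    """Hypothetical stand-in for config.tools.load(name)."""
    versions = {"ecmwf-toolbox": "2023.10.0.0"}  # version from tools.yaml above
    return f"module load {name}/{versions[name]}"


# Hypothetical commands standing in for [dd.script for dd in config.fc_retrievals].
retrievals = ["echo retrieving forecast A", "echo retrieving forecast B"]

script = [
    fake_tools_load("ecmwf-toolbox"),
    retrievals,
]

if __name__ == "__main__":
    print("\n".join(flatten_script(script)))
```

Because list order is preserved, putting the tool load first guarantees the conversion tool is available before any retrieval's `post_script` runs.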

For more detail on using tools, see the [tools documentation](./config/tools_config.md).