Commit

Fix merge conflict with Julien's changes
gareth-j committed Oct 18, 2024
1 parent 8e65a47 commit 4a5d71d
Showing 1 changed file with 80 additions and 30 deletions.
110 changes: 80 additions & 30 deletions docs/quickstart_guide.md
Expand Up @@ -2,9 +2,8 @@

## A default suite

In this guide we will go through the steps of creating a suite from the very beginning.
This suite will be named `efas_report` and its code will live in a directory named
`projects`.
This tutorial will cover creating a wellies suite, from the creation of the default configuration files to the customisation
and deployment of the suite. We'll use the `wellies-quickstart` tool to create all of the files and folders we need to build a wellies suite.

```shell
$ wellies-quickstart ~/projects/efas_report -p efas_report
Expand All @@ -15,13 +14,35 @@ The command will start the project with associated base configuration files in t
Before we do any further changes, it's always good to keep track of our changes. So, let's initialize a local git repository

```console
$ wellies-quickstart -p efas_report ~/projects/efas_report
```

Let's have a look at the folder created

```tree
efas_report/
├── configs
│   ├── config.yaml
│   ├── data.yaml
│   ├── execution_contexts.yaml
│   └── tools.yaml
├── deploy.py
├── Makefile
└── suite
    └── nodes.py
```

We can see that the suite configuration files, deployment Python script, Makefile and suite customisation code have been created for us. Before we start, we'll initialise a git repository so we can keep track of our changes.

```console
$ cd ~/projects/efas_report
$ git init
$ echo "__pycache__" > .gitignore
$ git add --all
$ git commit -m "Start of project from wellies-quickstart"
```

This example suite is ready to deploy and we can do this by running
The example suite is ready to deploy and we can do this using `deploy.py` and passing it the paths to our configuration files.

```console
$ ./deploy.py configs/*.yaml
Expand Down Expand Up @@ -86,19 +107,18 @@ Paths to the temporary directories have been changed for brevity

We've deployed the default suite, now let's look at how we can configure it to do what we want.

## Customising the suite
### Deploying the suite

When we ran the deploy script we passed it the path to the configuration files `configs/*.yaml`.
Within that directory there are four files
We can deploy this default suite to an ecflow server using `ecflow_client`.
You'll need to either load the `ecflow` module or [install ecflow](https://ecflow.readthedocs.io/en/latest/install/index.html).

```tree
configs/
config.yaml
data.yaml
execution_contexts.yaml
tools.yaml
```console
$ ecflow_client --host ecflow_server.example.com --port 3141 --load /perm/username/pyflow/efas_report/efas_report.def
```

## Customising the suite

When we ran the deploy script we passed it the path to the configuration files `configs/*.yaml`.
For a quick overview of what these files do:

- `config.yaml` - handles the main options of the suite (paths, hosts, user, etc.)
Expand All @@ -107,13 +127,13 @@ For a quick overview of what these files do:
- [`tools.yaml`](./config/tools_config.md) - for conda environment creation and loading, environment variable handling etc


In this tutorial we'll only cover making changes to `config.yaml` and `data.yaml`, click on filenames above for more information on making changes to the others.
Click on the filenames above for more information on the available configuration options.

To start, let's take a look at `config.yaml`

### `config.yaml`

As mentioned above, this file handles the main options of the suite. We'll start with the minmial example generated by wellies.
As mentioned above, this file handles the main options of the suite. We'll start with the minimal example generated by wellies.

```yaml title="config.yaml"
# Configuration file for pyflow suite.
Expand Down Expand Up @@ -178,7 +198,7 @@ workdir: "$TMPDIR"
output_root: "{SCRATCH}/efas_report"
```
And to deploy the updated suite we do
To deploy the updated suite we do
```console
$ ./deploy.py configs/*.yaml
Expand Down Expand Up @@ -220,7 +240,7 @@ With this deployment wellies detects that changes have been made to the configur
### `data.yaml`

This file configures data retrieval and handling. In our workflow we will need two datasets that are *static*, or they are data that need to be fetched just once for our computations to work. Within the `configs/data.yaml` we will add entries to transfer the latest station file from the EFAS repository and the computed flood thresholds that we know are available in a shared directory.
This file configures data retrieval and handling. In our workflow we will need two datasets that are *static*, meaning they only need to be fetched once for our computations to work. Within `configs/data.yaml` we will add entries to transfer the latest station file from the EFAS repository and the computed flood thresholds that we know are available in a shared directory.

We'll start by creating an `outlets` section in our `configs/data.yaml` file.
The EFAS station file is tracked within the EFAS suite repository, so we tell wellies to clone the repo, use the `develop` branch and keep only the files specified in the `files` list. Next we add a `static_maps` section for the thresholds and upstream area files.
Expand Down Expand Up @@ -304,8 +324,7 @@ When we run the deploy command
$ ./deploy.py configs/*.yaml
```

wellies reads and parses the YAML files from path given. The configuration settings are stored in a `Config` object created from the class that's defined
in `deploy.py`.
wellies reads and parses the YAML files from the path given. The configuration settings are stored in a `Config` object created from the class that's defined in `deploy.py`.

```python title="suite/nodes.py"
class Config:
Expand Down Expand Up @@ -359,8 +378,7 @@ class Config:
```
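
As a rough illustration of the parse-and-store step described above, the sketch below merges several already-parsed configuration mappings and wraps the result in a small class. It is not the wellies implementation: `merge_options`, `SketchConfig` and the `name` and `static_data` keys are hypothetical, and plain dictionaries stand in for whatever a YAML parser would return.

```python
from typing import Any, Dict


def merge_options(*sources: Dict[str, Any]) -> Dict[str, Any]:
    """Merge parsed configuration mappings; later sources override earlier ones."""
    merged: Dict[str, Any] = {}
    for source in sources:
        for key, value in source.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Recurse so nested sections are combined rather than replaced.
                merged[key] = merge_options(merged[key], value)
            else:
                merged[key] = value
    return merged


class SketchConfig:
    """Hypothetical stand-in for the Config class (assumption, not wellies code)."""

    def __init__(self, options: Dict[str, Any]) -> None:
        self.name = options["name"]
        self.output_root = options["output_root"]
        self.static_data = options.get("static_data", {})


# Stand-ins for the mappings parsed from two configuration files.
config_yaml = {"name": "efas_report", "output_root": "{SCRATCH}/efas_report"}
data_yaml = {"static_data": {"outlets": {"type": "git"}}}

config = SketchConfig(merge_options(config_yaml, data_yaml))
print(config.name, sorted(config.static_data))
```

Keeping the merge separate from the class means the same object can be built from any number of files, which is the behaviour implied by passing `configs/*.yaml` on the command line.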

To add such a node to our suite we will modify the main family definition in `suite/nodes.py`.
At the moment we have the `MainFamily` definition with a single
placeholder task in it.
At the moment we have the `MainFamily` definition with a single placeholder task in it.


```python title="suite/nodes.py"
Expand All @@ -375,8 +393,8 @@ class MainFamily(pf.AnchorFamily):

Let's replace this with a repeating node that will run
every day, retrieve the input data for each cycle and then run our processing. We see
that the `MainFamily` class receives a config argument so we can use that to carry
on data like, start and end dates and the keys for our data retrieval.
that the `MainFamily` class receives a config argument so we can use that to pass data such as
start and end dates and the keys for our data retrieval.

```python title="suite/nodes.py"
class IssueFamily(pf.Family):
Expand Down Expand Up @@ -411,16 +429,48 @@ class MainFamily(pf.AnchorFamily):
f_previous = f_issue
```

For readability we also define a `IssueFamily` class that holds the logic of a
single run of our analysis and transfer the configuration of how many cycles we
are going to run.
For readability we also define an `IssueFamily` class that holds the logic of a
single run of our analysis and configures how many cycles we are going to run.
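
The cycle chaining can be sketched in plain Python, without pyflow. This is only an illustration and assumes, as the `f_previous = f_issue` line suggests, that each daily cycle is linked to the one before it. The helper names `daily_cycles` and `chain_cycles` and the example dates are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Optional


def daily_cycles(start: date, end: date) -> List[date]:
    """Return every date from start to end inclusive, one per daily cycle."""
    n_days = (end - start).days
    return [start + timedelta(days=offset) for offset in range(n_days + 1)]


def chain_cycles(cycles: List[date]) -> Dict[str, Optional[str]]:
    """Map each cycle name to the cycle it follows (None for the first one)."""
    dependencies: Dict[str, Optional[str]] = {}
    previous: Optional[str] = None
    for cycle in cycles:
        name = cycle.strftime("%Y%m%d")
        dependencies[name] = previous
        # Remember this cycle so the next iteration can refer back to it.
        previous = name
    return dependencies


deps = chain_cycles(daily_cycles(date(2024, 1, 1), date(2024, 1, 3)))
print(deps)
# {'20240101': None, '20240102': '20240101', '20240103': '20240102'}
```

In the suite itself the same loop shape would create one `IssueFamily` per date rather than a dictionary entry.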

The `post_script` added to our retrievals in `data.yaml` uses an external conversion tool
provided by the `ecmwf-toolbox` module. We need to add such runtime dependency on top of
our script. The way to do it is again via the `config` object which has a `tools`
attribute pointing to a `ToolStore` object. Using this `load` function we can load
any of the tools defined in our `tool.yaml`.
provided by the `ecmwf-toolbox` module. We need to let wellies know we're going to use this module by making sure it's in our `tools.yaml` file.

```yaml title="tools.yaml"
tools:
  modules:
    python:
      name: python3
      version: 3.10.10-01
    ecmwf-toolbox:
      version: 2023.10.0.0
      depends: [python]
  packages:
    earthkit:
      type: git
      source: git@github.com:ecmwf/earthkit-data.git
      branch: develop
      post_script: "pip install . --no-deps"
  environments:
    suite_env:
      type: system_venv
      depends: [python, ecmwf-toolbox]
      packages: [earthkit]

We can now load the `ecmwf-toolbox` module in our script by calling `config.tools.load('ecmwf-toolbox')`.

```python
n_ret = pf.Task(
    name='retrieve',
    script=[
        config.tools.load('ecmwf-toolbox'),
        [dd.script for dd in config.fc_retrievals],
    ],
)

For more detail on using tools see the [tools documentation](./config/tools_config.md).
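
To make the role of `depends` more concrete, here is a minimal sketch of how a tool's dependencies could be resolved into an ordered list of load commands. It is an assumption-laden illustration, not how wellies implements `config.tools.load`: the functions `load_order` and `load_lines` are hypothetical, the `module load name/version` format is assumed, and the dependency graph is assumed to have no cycles.

```python
from typing import Dict, List, Optional

# Mirrors the `modules` section of the tools.yaml shown above.
MODULES: Dict[str, dict] = {
    "python": {"name": "python3", "version": "3.10.10-01"},
    "ecmwf-toolbox": {"version": "2023.10.0.0", "depends": ["python"]},
}


def load_order(tool: str, modules: Dict[str, dict],
               ordered: Optional[List[str]] = None) -> List[str]:
    """Return tool names with every dependency placed before its dependant."""
    if ordered is None:
        ordered = []
    # Visit dependencies first (assumes the graph is acyclic).
    for dependency in modules[tool].get("depends", []):
        load_order(dependency, modules, ordered)
    if tool not in ordered:
        ordered.append(tool)
    return ordered


def load_lines(tool: str, modules: Dict[str, dict]) -> List[str]:
    """Build hypothetical `module load` lines; real wellies output may differ."""
    lines = []
    for entry in load_order(tool, modules):
        spec = modules[entry]
        # Fall back to the entry key when no explicit module name is given.
        module_name = spec.get("name", entry)
        lines.append(f"module load {module_name}/{spec['version']}")
    return lines


print(load_lines("ecmwf-toolbox", MODULES))
# ['module load python3/3.10.10-01', 'module load ecmwf-toolbox/2023.10.0.0']
```

Resolving dependencies before emitting any line is what lets a task ask for `ecmwf-toolbox` alone and still get `python` loaded first.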

# TODO's
