diff --git a/docs/quickstart_guide.md b/docs/quickstart_guide.md
index 91f9160..78e1202 100644
--- a/docs/quickstart_guide.md
+++ b/docs/quickstart_guide.md
@@ -2,9 +2,8 @@
 
 ## A default suite
 
-In this guide we will go through the steps of creating a suite from the very beginning.
-This suite will be named `efas_report` and its code will live in a directory named
-`projects`.
+This tutorial will cover creating a wellies suite, from the creation of the default configuration files to the customisation
+and deployment of the suite. We'll use the `wellies-quickstart` tool to create all of the files and folders we need to build a wellies suite.
 
 ```shell
 $ wellies-quickstart ~/projects/efas_report -p efas_report
@@ -15,13 +14,35 @@ The command will start the project with associated base configuration files in t
 Before we do any further changes, it's always good to keep track of our changes. So, let's initialize a local git repository
 
 ```console
+$ wellies-quickstart -p efas_report ~/projects/efas_report
+```
+
+Let's have a look at the folder created
+
+```tree
+efas_report/
+├── configs
+│   ├── config.yaml
+│   ├── data.yaml
+│   ├── execution_contexts.yaml
+│   └── tools.yaml
+├── deploy.py
+├── Makefile
+└── suite
+    └── nodes.py
+```
+
+We can see that the suite configuration files, deployment Python script, Makefile and suite customisation code has been created for us. Before we start we'll initialise a git repository so we can keep track of our changes.
+
+```console
+$ cd ~/projects/efas_report
 $ git init
 $ echo "__pycache__" > .gitignore
 $ git add --all
 $ git commit -m "Start of project from wellies-quickstart"
 ```
 
-This example suite is ready to deploy and we can do this by running
+The example suite is ready to deploy and we can do this using `deploy.py` and passing it the paths to our configuration files.
 
 ```console
 $ ./deploy configs/*.yaml
@@ -86,19 +107,18 @@ Paths to the temporary directories have been changed for brevity
 
 We've deployed the default suite, now let's look at how we can configure it to do what we want.
 
-## Customising the suite
+### Deploying the suite
 
-When we ran the deploy script we passed it the path to the configuration files `configs/*.yaml`.
-Within that directory there are four files
+We can deploy this default suite to an ecflow server using `ecflow_client`.
+You'll need to either load the `ecflow` module or [install ecflow](https://ecflow.readthedocs.io/en/latest/install/index.html).
 
-```tree
-configs/
-    config.yaml
-    data.yaml
-    execution_contexts.yaml
-    tools.yaml
+```console
+$ ecflow_client --host ecflow_server.example.com --port 3141 --load /perm/username/pyflow/efas_report/efas_report.def
 ```
 
+## Customising the suite
+
+When we ran the deploy script we passed it the path to the configuration files `configs/*.yaml`.
 For a quick overview of what these files do:
 
 - `config.yaml` - handles the main options of the suite (paths, hosts, user, etc.)
@@ -107,13 +127,13 @@ For a quick overview of what these files do:
 - [`tools.yaml`](./config/tools_config.md) - for conda environment creation and loading, environment variable handling etc
 
 
-In this tutorial we'll only cover making changes to `config.yaml` and `data.yaml`, click on filenames above for more information on making changes to the others.
+Click on filenames above for more information on available configuration options.
 
 To start, let's take a look at `config.yaml`
 
 ### `config.yaml`
 
-As mentioned above, this file handles the main options of the suite. We'll start with the minmial example generated by wellies.
+As mentioned above, this file handles the main options of the suite. We'll start with the minimal example generated by wellies.
 
 ```yaml title="config.yaml"
 # Configuration file for pyflow suite.
@@ -178,7 +198,7 @@ workdir: "$TMPDIR"
 output_root: "{SCRATCH}/efas_report"
 ```
 
-And to deploy the updated suite we do
+To deploy the updated suite we do
 
 ```console
 $ ./deploy configs/*.yaml
@@ -220,7 +240,7 @@ With this deployment wellies detects that changes have been made to the configur
 
 ### `data.yaml`
 
-This file configures data retrieval and handling. In our workflow we will need two datasets that are *static*, or they are data that need to be fetched just once for our computations to work. Within the `configs/data.yaml` we will add entries to transfer the latest station file from the EFAS repository and the computed flood thresholds that we know are available in a shared directory.
+This file configures data retrieval and handling. In our workflow we will need two datasets that are *static*, or they are data that need to be fetched just once for our computations to work. Within `configs/data.yaml` we will add entries to transfer the latest station file from the EFAS repository and the computed flood thresholds that we know are available in a shared directory.
 
 We'll start by creating an `outlets` section in our `configs/data.yaml` file. The EFAS station file is tracked within EFAS suite repository, we tell wellies to clone the repo, use the `develop` branch and just keep the files specified in the `files` list. Next we add a `static_maps` section for the thresholds and upstream area files.
 
@@ -304,8 +324,7 @@ When we run the deploy command
 $ ./deploy config/*.yaml
 ```
 
-wellies reads and parses the YAML files from path given. The configuration settings are stored in a `Config` object created from the class that's defined
-in `deploy.py`.
+wellies reads and parses the YAML files from path given. The configuration settings are stored in a `Config` object created from the class that's defined in `deploy.py`.
 
 ```python title="suite/nodes.py"
 class Config:
@@ -359,8 +378,7 @@ class Config:
 ```
 
 To add such a node to our suite we will modify the main family definition in `suite/nodes.py`.
-At the moment we have the `MainFamily` definition with a single
-placeholder task in it.
+At the moment we have the `MainFamily` definition with a single placeholder task in it.
 
 
 ```python title="suite/nodes.py"
@@ -375,8 +393,8 @@ class MainFamily(pf.AnchorFamily):
 
 Let's replace this by a repeating node that will run every day, retrieve the
 input data for each cycle and then run our processing. We see
-that the `MainFamily` class receives a config argument so we can use that to carry
-on data like, start and end dates and the keys for our data retrieval.
+that the `MainFamily` class receives a config argument so we can use that to pass data such as
+start and end dates and the keys for our data retrieval.
 
 ```python title="suite/nodes.py"
 class IssueFamily(pf.Family):
@@ -411,16 +429,48 @@ class MainFamily(pf.AnchorFamily):
                 f_previous = f_issue
 ```
 
-For readability we also define a `IssueFamily` class that holds the logic of a
-single run of our analysis and transfer the configuration of how many cycles we
-are going to run.
+For readability we also define an `IssueFamily` class that holds the logic of a
+single run of our analysis and configures how many cycles we are going to run.
 
 
 The `post_script` added to our retrievals in `data.yaml` uses an external conversion tool
-provided by the `ecmwf-toolbox` module. We need to add such runtime dependency on top of
-our script. The way to do it is again via the `config` object which has a `tools`
-attribute pointing to a `ToolStore` object. Using this `load` function we can load
-any of the tools defined in our `tool.yaml`.
+provided by the `ecmwf-toolbox` module. We need to let wellies know we're going to use this module by making sure it's in our `tools.yaml` file.
+
+```yaml title="tools.yaml"
+tools:
+  modules:
+    python:
+      name: python3
+      version: 3.10.10-01
+    ecmwf-toolbox:
+      version: 2023.10.0.0
+      depends: [python]
+  packages:
+    earthkit:
+      type: git
+      source: git@github.com:ecmwf/earthkit-data.git
+      branch: develop
+      post_script: "pip install . --no-deps"
+  environments:
+    suite_env:
+      type: system_venv
+      depends: [python, ecmwf-toolbox]
+      packages: [earthkit]
+```
+
+We're now able to add the loading of the `ecmwf-toolbox` module to our script using the `config.tools.load('ecmwf-toolbox')`
+call.
+
+```python
+n_ret = pf.Task(
+    name='retrieve',
+    script=[
+        config.tools.load('ecmwf-toolbox'),
+        [dd.script for dd in config.fc_retrievals],
+    ],
+    )
+```
+For more detail on using tools see the [tools documentation](./config/tools_config.md).
 
 # TODO's
 