GH11 Add an "update_resource" command #12

Merged: 20 commits merged on Mar 18, 2024
Commits
a731c30
Add skeleton of a new update_resource command
IanHopkinson Mar 9, 2024
2fb7ad9
Add a minimal integration test - does not yet address update_resource
IanHopkinson Mar 9, 2024
a6603f6
Passing actual test of of update_resource
IanHopkinson Mar 9, 2024
6997c5a
Do setup/teardown with pytest
IanHopkinson Mar 10, 2024
733e7be
Make update_resource more verbose in use
IanHopkinson Mar 10, 2024
dbd75a9
Update documentation
IanHopkinson Mar 10, 2024
069a1d0
Change dry_run flag to live
IanHopkinson Mar 10, 2024
4880f0b
Update version
IanHopkinson Mar 10, 2024
50291a3
Include hdx_key_stage in pipeline
IanHopkinson Mar 10, 2024
f3b63cb
Add in a diagnostic to try to fix GitHub Action issue
IanHopkinson Mar 10, 2024
984c1b7
Update configuration command to show hdx_key_stage
IanHopkinson Mar 10, 2024
1fe2ea8
A couple more tweaks to try to get the test to run
IanHopkinson Mar 10, 2024
f507bc2
It seems that HDX_KEY_STAGE might not be supported
IanHopkinson Mar 10, 2024
ce02b62
Add a missing test fixture
IanHopkinson Mar 10, 2024
53a6132
update_resource will now add a resource if it does not exist but refu…
IanHopkinson Mar 11, 2024
c5a1331
Detect Authorization Errors and report update status correctly on error
IanHopkinson Mar 14, 2024
c7a390f
Finally get the reordering to work!
IanHopkinson Mar 14, 2024
3f719a4
Small fix to pipeline
IanHopkinson Mar 14, 2024
e657587
Minor cosmetic fixes
IanHopkinson Mar 14, 2024
8e94909
Clarify the "add resource" test a little bit
IanHopkinson Mar 15, 2024
3 changes: 2 additions & 1 deletion .github/workflows/dev_pipeline.yaml
@@ -29,9 +29,10 @@ jobs:
make lint
- name: Running unit tests
env:
HDX_KEY: ${{ secrets.HDX_KEY }}
HDX_KEY_STAGE: ${{ secrets.HDX_KEY_STAGE }}
HDX_SITE: ${{ vars.HDX_SITE }}
USER_AGENT: ${{ vars.USER_AGENT }}
PREPREFIX: ${{ vars.PREPREFIX }}
run: |
hdx-toolkit configuration
make unit_tests
2 changes: 2 additions & 0 deletions .gitignore
@@ -10,3 +10,5 @@ cov.xml


tests/fixtures/test.csv

src/temp/
54 changes: 40 additions & 14 deletions DEMO.md
@@ -27,77 +27,94 @@ Understanding the `Configuration` used by `hdx-python-api` can be challenging fo
hdx-toolkit configuration
```

The `list` and `update` commands are designed to be used together, using `list` to check what a potentially destructive `update` will do, and then simply repeating the same commandline with `list` replaced with `update`:
The `list` and `update` commands are designed to be used together, using `list` to check what a potentially destructive `update` will do, and then simply repeating the same commandline with `list` replaced with `update`. This commandline selects a single dataset, `mali-healthsites`:

```
```shell
hdx-toolkit list --organization=healthsites --dataset_filter=mali-healthsites --hdx_site=stage --key=private --value=True
```

The command to `update` these datasets with the supplied `--value` is simply:

```shell
hdx-toolkit update --organization=healthsites --dataset_filter=mali-healthsites --hdx_site=stage --key=private --value=True
```

For this action an organization is required unless an exact dataset name is supplied.

The `list` command can output multiple comma separated keys to a table, and also to a CSV file specified using the `--output_path` keyword.
The `list` and `update` commands take wildcard arguments, i.e.:

```shell
hdx-toolkit list --organization=healthsites --dataset_filter=*la* --hdx_site=stage --key=private --value=True
```

which selects 29 datasets matching the filter `*la*`, or
```shell
hdx-toolkit list --organization=healthsites --dataset_filter=* --hdx_site=stage --key=private --value=True
```
which selects all the datasets of an organization.


The `list` command can output multiple comma separated keys to a table, and also to a CSV file specified using the `--output_path` keyword.

```shell
hdx-toolkit list --organization=international-organization-for-migration --key=data_update_frequency,dataset_date --output_path=2024-02-05-iom-dtm.csv
```

If the `query` keyword is supplied then `organization` and `dataset_filter` keywords are ignored and the `query` is passed to CKAN:

```
```shell
hdx-toolkit list --query=archived:true --key=owner_org
```

Another pain point for me is getting an organization id, the `get_organization_metadata` command fixes this. We can just get the id with an organization name, note wildcards are implicit in the organization specification since this is how the CKAN API works:

```
```shell
hdx-toolkit get_organization_metadata --organization=zurich
```

We can get the full organization record using the `--verbose` flag:

```
```shell
hdx-toolkit get_organization_metadata --organization=eth-zurich-weather-and-climate-risks --verbose
```

Similarly we can get user ids:

```
```shell
hdx-toolkit get_user_metadata --user=hopkinson
```

And see the complete records:

```
```shell
hdx-toolkit get_user_metadata --user=hopkinson --verbose
```

Note I first joined HDX in March 2015!

Finally, you can print the metadata for a dataset:

```
```shell
hdx-toolkit print --dataset_filter=climada-litpop-dataset
```

This output is valid JSON and can be piped into a file to use as a test fixture or template.

It is possible to include resource, showcase and QuickChart (resource_view) metadata into the `print` view using the `--with_extras` flag:

```
```shell
hdx-toolkit print --dataset_filter=wfp-food-prices-for-nigeria --with_extras
```

This adds resources under a `resources` key which includes a `quickcharts` key and showcases under a `showcases` key. These new keys mean that the output JSON cannot be created directly in HDX. The `fs_check_info` and `hxl_preview_config` keys which previously contained a JSON object serialised as a single string are expanded as dictionaries so that they are printed out in an easy to read format.

A Quick Chart can be uploaded from a JSON file using a commandline like where the `dataset_filter`
specifies a single dataset and the `resource_name` specifies the resource to which the Quick Chart is attached:
A Quick Chart can be uploaded from a JSON file using a commandline like where the `dataset_filter` specifies a single dataset and the `resource_name` specifies the resource to which the Quick Chart is attached:

```
hdx-toolkit quickcharts --dataset_filter=climada-flood-dataset --hdx_site=stage --resource_name=admin1-summaries-flood.csv --hdx_hxl_preview_file_path=quickchart-flood.json
```

The `hdx_hxl_preview_file_path` points to a JSON format file with the key `hxl_preview_config` which
contains the Quick Chart definition. This file is converted to a single string via a temporary yaml file so should be easily readable. Quick Chart recipe documentation can be found [here](https://github.com/OCHA-DAP/hxl-recipes?tab=readme-ov-file). There is an example file in the `hdx-cli-toolkit` [repo](https://github.com/OCHA-DAP/hdx-cli-toolkit/blob/main/tests/fixtures/quickchart-flood.json).
The `hdx_hxl_preview_file_path` points to a JSON format file with the key `hxl_preview_config` which contains the Quick Chart definition. This file is converted to a single string via a temporary yaml file so should be easily readable. Quick Chart recipe documentation can be found [here](https://github.com/OCHA-DAP/hxl-recipes?tab=readme-ov-file). There is an example file in the `hdx-cli-toolkit` [repo](https://github.com/OCHA-DAP/hdx-cli-toolkit/blob/main/tests/fixtures/quickchart-flood.json).

A showcase can be uploaded from attributes found in either a CSV format file like this:
```
@@ -137,6 +154,14 @@ Using a commandline like:
```
hdx-toolkit showcase --showcase_name=climada-litpop-showcase --hdx_site=stage --attributes_file_path=attributes.csv
```

An individual resource can be updated with a commandline like:
```
hdx-toolkit update_resource --dataset_name=hdx_cli_toolkit_test --resource_name="test_resource_1" --hdx_site=stage --resource_file_path=test-2.csv --live
```

Without the `--live` flag no update on HDX is made.

## Future Work

Potential new features can be found in the [GitHub issue tracker](https://github.com/OCHA-DAP/hdx-cli-toolkit/issues)
@@ -158,4 +183,5 @@ hdx-toolkit print --dataset_filter=climada-litpop-dataset
hdx-toolkit print --dataset_filter=wfp-food-prices-for-nigeria --with_extras
hdx-toolkit quickcharts --dataset_filter=climada-flood-dataset --hdx_site=stage --resource_name=admin1-summaries-flood.csv --hdx_hxl_preview_file_path=quickchart-flood.json
hdx-toolkit showcase --showcase_name=climada-litpop-showcase --hdx_site=stage --attributes_file_path=attributes.csv
hdx-toolkit update_resource --dataset_name=hdx_cli_toolkit_test --resource_name="test_resource_1" --hdx_site=stage --resource_file_path=test-2.csv --live
```
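The DEMO.md changes describe `--dataset_filter` values such as `*la*` and `*` as wildcards. The sketch below assumes shell-style glob semantics and uses Python's standard `fnmatch` module to show what such patterns select; how `get_filtered_datasets` actually matches names is not shown in this diff and may differ. The helper `filter_dataset_names` and the dataset names are hypothetical.

```python
from fnmatch import fnmatchcase


def filter_dataset_names(names: list[str], pattern: str) -> list[str]:
    """Return the names matching a shell-style wildcard pattern.

    Assumption: glob semantics as in fnmatch; the toolkit's own
    get_filtered_datasets may implement matching differently.
    """
    # fnmatchcase is case-sensitive on every platform, unlike fnmatch.fnmatch
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical dataset names, for illustration only
names = [
    "mali-healthsites",
    "malawi-healthsites",
    "bangladesh-healthsites",
    "chad-healthsites",
]

print(filter_dataset_names(names, "mali-healthsites"))  # exact name: ['mali-healthsites']
print(filter_dataset_names(names, "*la*"))  # ['malawi-healthsites', 'bangladesh-healthsites']
print(filter_dataset_names(names, "*"))  # every name
```

Note that `*la*` does not match `mali-healthsites`: the pattern needs the letters "la" adjacent, which is why a quick `list` run before `update` is worthwhile.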
32 changes: 5 additions & 27 deletions README.md
@@ -41,7 +41,7 @@ hdx-cli-toolkit:

## Usage

The `hdx-toolkit` is built using the Python `click` library. Details of the currently implemented commands can be revealed by running:
The `hdx-toolkit` is built using the Python `click` library. Details of the currently implemented commands can be revealed by running `hdx-toolkit --help`:

```
$ hdx-toolkit --help
@@ -59,39 +59,16 @@ Commands:
list List datasets in HDX
print Print datasets in HDX to the terminal
quickcharts Upload QuickChart JSON description to HDX
showcase Upload showcase to HDX
update Update datasets in HDX
update_resource Update a resource in HDX
```

The output from the `print` command is designed to be piped to file to make a valid JSON fixture.

And details of the arguments for a command can be found using:

```
hdx-toolkit [COMMAND] --help
```

`update` is clearly an operation with potential negative side-effects. Commands can be tested on the HDX `stage` site by setting `--hdx_site=stage`. In addition the `list` command can be used to check the datasets to be affected since `list` and `update` both take the same arguments and use the same filtering function although for `list` the `--value` argument is ignored:

The original purpose of the `hdx-cli-toolkit` was to quarantine the Healthsites datasets, for which the process was a cautious single dataset update
```shell
hdx-toolkit list --organization=healthsites --dataset_filter=mali-healthsites --hdx_site=stage --key=private --value=True
hdx-toolkit update --organization=healthsites --dataset_filter=mali-healthsites --hdx_site=stage --key=private --value=True
```

A slightly more adventurous update that selects 29 datasets using the `*la*` wildcard:

```shell
hdx-toolkit list --organization=healthsites --dataset_filter=*la* --hdx_site=stage --key=private --value=True
hdx-toolkit update --organization=healthsites --dataset_filter=*la* --hdx_site=stage --key=private --value=True
```

Then applying to all the datasets in the organization, those already updated are skipped:

```shell
hdx-toolkit list --organization=healthsites--dataset_filter=* --hdx_site=stage --key=private --value=True
hdx-toolkit update --organization=healthsites --dataset_filter=* --hdx_site=stage --key=private --value=True
hdx-toolkit [COMMAND] --help
```
The initial update takes approximately 10 seconds but subsequent updates in a list take only a couple of seconds.

A detailed walk through of commands can be found in the [DEMO.md](DEMO.md) file

@@ -101,6 +78,7 @@ This project users a GitHub Action to run tests and linting. It requires the fol

```
HDX_KEY - secret. Value: fake secret
HDX_KEY_STAGE - secret. Value: a live API key for the stage server
HDX_SITE - environment variable. Value: stage
USER_AGENT - environment variable. Value: hdx_cli_toolkit_gha
PREPREFIX - - environment variable. Value: [YOUR_organization]
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -1,6 +1,6 @@
[project]
name = "hdx_cli_toolkit"
version = "2024.2.1"
version = "2024.3.1"
description = "HDX CLI tool kit for commandline interaction with HDX"
readme = "README.md"
requires-python = ">=3.11"
106 changes: 93 additions & 13 deletions src/hdx_cli_toolkit/cli.py
@@ -6,9 +6,11 @@
import json
import os
import time
import yaml
import traceback

from collections.abc import Callable

import yaml
import click
from click.decorators import FC

@@ -27,7 +29,11 @@
make_conversion_func,
)

from hdx_cli_toolkit.hdx_utilities import add_showcase, configure_hdx_connection
from hdx_cli_toolkit.hdx_utilities import (
add_showcase,
configure_hdx_connection,
update_resource_in_hdx,
)


@click.group()
@@ -193,14 +199,27 @@ def update(
skip_validation=True,
ignore_check=True,
)
print(
f"{dataset['name']:<70.70}{old_value:<20.20}{str(dataset[key]):<20.20}"
f"{time.time()-t0:0.2f}",
flush=True,
)
except (HDXError, KeyError):
n_failures += 0
print(f"Could not update {dataset['name']}")
print(
f"{dataset['name']:<70.70}{old_value:<20.20}{str(dataset[key]):<20.20}"
f"{time.time()-t0:0.2f}",
flush=True,
)
if "Authorization Error" in traceback.format_exc():
print(
f"Could not update {dataset['name']} on '{hdx_site}' "
"because of an Authorization Error",
flush=True,
)
else:
print(f"Could not update {dataset['name']} on '{hdx_site}'", flush=True)
n_failures += 1

print(
f"{dataset['name']:<70.70}{old_value:<20.20}{old_value:<20.20}"
f"{time.time()-t0:0.2f}",
flush=True,
)

print(f"Changed {n_changed} values", flush=True)
print(f"{n_failures} failures as evidenced by HDXError", flush=True)
@@ -354,14 +373,14 @@ def show_configuration():
print(user_agents_file_contents, flush=True)

# Check Environment variables
environment_variables = ["HDX_KEY", "HDX_SITE", "HDX_URL"]
environment_variables = ["HDX_KEY", "HDX_KEY_STAGE", "HDX_SITE", "HDX_URL"]
click.secho(
"Values of relevant environment variables (used in absence of supplied values):", bold=True
)
for variable in environment_variables:
env_variable = os.getenv(variable)
if env_variable is not None:
if variable == "HDX_KEY":
if "HDX_KEY" in variable:
env_variable = censor_secret(env_variable)
print(f"{variable}:{env_variable}", flush=True)
else:
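The `show_configuration` change replaces the exact match `variable == "HDX_KEY"` with the substring test `"HDX_KEY" in variable`, so the new `HDX_KEY_STAGE` value is censored too. A minimal sketch of that loop; `censor_secret` is not shown in this diff, so the version here (mask all but the last four characters) is a hypothetical implementation, as are `describe_environment` and its "is not set" message.

```python
import os
from collections.abc import Mapping


def censor_secret(secret: str) -> str:
    """Hypothetical censoring: mask all but the last four characters."""
    if len(secret) <= 4:
        return "*" * len(secret)
    return "*" * (len(secret) - 4) + secret[-4:]


def describe_environment(
    variables: list[str], env: Mapping[str, str] = os.environ
) -> list[str]:
    """Report each variable, censoring any whose name contains HDX_KEY."""
    lines = []
    for variable in variables:
        value = env.get(variable)
        if value is None:
            # Hypothetical wording; the toolkit's message is not shown in the diff
            lines.append(f"{variable} is not set")
            continue
        # Substring test covers both HDX_KEY and HDX_KEY_STAGE
        if "HDX_KEY" in variable:
            value = censor_secret(value)
        lines.append(f"{variable}:{value}")
    return lines


# Fake values, for illustration only
fake_env = {"HDX_KEY": "abcdef123456", "HDX_KEY_STAGE": "stagekey9876", "HDX_SITE": "stage"}
for line in describe_environment(["HDX_KEY", "HDX_KEY_STAGE", "HDX_SITE", "HDX_URL"], fake_env):
    print(line)
```

Passing the environment as a mapping (defaulting to `os.environ`) keeps the function testable without touching real secrets.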
@@ -483,6 +502,67 @@ def showcase(
print(f"Showcase update took {time.time() - t0:.2f} seconds")


@hdx_toolkit.command(name="update_resource")
@click.option(
"--dataset_name",
is_flag=False,
default="*",
help="name of the dataset to update",
)
@click.option(
"--resource_name",
is_flag=False,
default="*",
help="name of the resource in the dataset to update",
)
@click.option(
"--hdx_site",
is_flag=False,
default="stage",
help="an hdx_site value {stage|prod}",
)
@click.option(
"--resource_file_path",
is_flag=False,
default="stage",
help="path to the resource file to upload",
)
@click.option(
"--live",
is_flag=True,
default=False,
help="if present then update to HDX is made, if absent then a dry run is done",
)
@click.option(
"--description",
is_flag=False,
default="new resource",
help="if the resource is to be added, rather than updated this provides the description",
)
def update_resource(
dataset_name: str = "",
resource_name: str = "",
hdx_site: str = "stage",
resource_file_path: str = "",
live: bool = False,
description: str = "new resource",
):
"""Update a resource in HDX"""
print_banner("Update resource")
print(
f"Updating/adding '{resource_name}' in '{dataset_name}' "
f"with file at '{resource_file_path}'"
)
t0 = time.time()
statuses = update_resource_in_hdx(
dataset_name, resource_name, hdx_site, resource_file_path, live, description=description
)
for status in statuses:
print(status, flush=True)

print(f"Resource update took {time.time() - t0:.2f} seconds")


def get_filtered_datasets(
organization: str = "",
dataset_filter: str = "*",
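The new `update_resource` command makes no change on HDX unless `--live` is given, so the safe behaviour is the default. The sketch below shows the same dry-run-by-default flag using the standard library's `argparse` (`action="store_true"`) instead of `click`, only so that it runs without extra dependencies; in `click` the equivalent is `is_flag=True, default=False`, as in the diff. `build_parser` and `perform_update` are hypothetical and do not call `update_resource_in_hdx`.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="update_resource_sketch")
    parser.add_argument("--dataset_name", default="*")
    parser.add_argument("--resource_name", default="*")
    # A file path has no sensible default, so this sketch makes it required
    parser.add_argument("--resource_file_path", required=True)
    # store_true: False unless the flag is present, i.e. a dry run by default
    parser.add_argument("--live", action="store_true")
    return parser


def perform_update(args: argparse.Namespace) -> list[str]:
    """Hypothetical placeholder that returns status messages instead of calling HDX."""
    statuses = [
        f"Updating/adding '{args.resource_name}' in '{args.dataset_name}' "
        f"with file at '{args.resource_file_path}'"
    ]
    if args.live:
        statuses.append("Live run: the update would be sent to HDX here")
    else:
        statuses.append("Dry run: no update made on HDX")
    return statuses


argv = [
    "--dataset_name=hdx_cli_toolkit_test",
    "--resource_name=test_resource_1",
    "--resource_file_path=test-2.csv",
]
for status in perform_update(build_parser().parse_args(argv)):
    print(status)
```

Appending `--live` to `argv` switches the same command line from a dry run to a live update.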
@@ -560,10 +640,10 @@ def decorate_dataset_with_extras(dataset: Dataset) -> dict:
resource_dict = resource.data
if "fs_check_info" in resource_dict:
resource_dict["fs_check_info"] = json.loads(resource_dict["fs_check_info"])
quickcharts = ResourceView.get_all_for_resource(resource_dict["id"])
dataset_quickcharts = ResourceView.get_all_for_resource(resource_dict["id"])
resource_dict["quickcharts"] = []
if quickcharts is not None:
for quickchart in quickcharts:
for quickchart in dataset_quickcharts:
quickchart_dict = quickchart.data
if "hxl_preview_config" in quickchart_dict:
quickchart_dict["hxl_preview_config"] = json.loads(
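`decorate_dataset_with_extras` turns `fs_check_info` and `hxl_preview_config`, which arrive as JSON serialised into a single string, back into dictionaries with `json.loads`, so that the `print --with_extras` output is easy to read. A minimal sketch of that step on a plain dictionary; the helper `expand_json_string_fields` and the record values are hypothetical.

```python
import json
from typing import Any


def expand_json_string_fields(
    record: dict[str, Any], keys: tuple[str, ...]
) -> dict[str, Any]:
    """Return a copy of record with the named JSON-string fields parsed into objects."""
    expanded = dict(record)  # shallow copy, the input record is left untouched
    for key in keys:
        value = expanded.get(key)
        # Only parse strings: absent keys are skipped and already-parsed values are kept
        if isinstance(value, str):
            expanded[key] = json.loads(value)
    return expanded


# Made-up resource record, for illustration only
resource = {
    "name": "admin1-summaries-flood.csv",
    "fs_check_info": '{"status": "ok"}',
}

expanded = expand_json_string_fields(resource, ("fs_check_info", "hxl_preview_config"))
print(json.dumps(expanded, indent=2))
```

Because only string values are parsed, running the helper twice on the same record is harmless.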