GTC-3081: Add political/id-lookup endpoint #616
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,2 @@ | ||
"""Here live a number of endpoints which provide higher-level services | ||
mostly intended to make life easier for consumers of the Data API.""" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,216 @@ | ||
import re | ||
from typing import Any, Dict, List, Optional | ||
|
||
from fastapi import APIRouter, HTTPException, Query | ||
from unidecode import unidecode | ||
|
||
from app.crud.versions import get_version, get_version_names | ||
from app.errors import RecordNotFoundError | ||
from app.models.pydantic.responses import Response | ||
from app.routes import VERSION_REGEX | ||
from app.routes.datasets.queries import _query_dataset_json | ||
|
||
router = APIRouter() | ||
|
||
|
||
@router.get( | ||
"/geoencode", | ||
tags=["Geoencoder"], | ||
status_code=200, | ||
) | ||
async def geoencode( | ||
*, | ||
admin_source: str = Query( | ||
"GADM", description="The source of administrative boundaries to use." | ||
Review comment: Should we document that right now the only option is "GADM"?
Reply: Done! |
||
), | ||
admin_version: str = Query( | ||
..., | ||
description="Version of the administrative boundaries dataset to use.", | ||
Review comment: Similarly, is there a way to document the choices available? I guess this may get more confusing if we ever have multiple providers.
Reply: Wild idea: should we consolidate admin version and admin dataset into one field, and have the options be like "GADM 3.6", "GADM 4.1", "geoBoundaries 1.0", "middleEarth 3.2"? Then it'll be clear what you're getting from a set of options.
Reply: Combining the provider and version has merit ([...]). However, we'd be encoding a string with special information (i.e., provider [...]). That's my two cents. |
||
), | ||
country: str = Query( | ||
description="Name of the country to match.", | ||
), | ||
region: Optional[str] = Query( | ||
None, | ||
description="Name of the region to match.", | ||
), | ||
subregion: Optional[str] = Query( | ||
None, | ||
description="Name of the subregion to match.", | ||
), | ||
normalize_search: bool = Query( | ||
True, | ||
description="Whether or not to perform a case- and accent-insensitive search.", | ||
Review comment: Add "Default is to perform case- and accent-insensitive search".
Reply: Think I need to even with the [...]
Reply: OK, no, no change needed. I see now that the default value (if not None) is displayed in the documentation. |
||
), | ||
): | ||
"""Look up administrative boundary IDs matching a specified country name | ||
(and region and subregion names, if specified). | ||
""" | ||
admin_source_to_dataset: Dict[str, str] = {"GADM": "gadm_administrative_boundaries"} | ||
|
||
try: | ||
dataset: str = admin_source_to_dataset[admin_source.upper()] | ||
except KeyError: | ||
raise HTTPException( | ||
status_code=400, | ||
detail=( | ||
"Invalid admin boundary source. Valid sources:" | ||
f" {list(admin_source_to_dataset)}" | ||
), | ||
) | ||
|
||
version_str: str = "v" + str(admin_version).lstrip("v") | ||
|
||
await version_is_valid(dataset, version_str) | ||
Review comment: As mentioned about the documentation above, should it be well known in advance which versions we support for providers, rather than just trying and throwing back an error?
Reply: I fundamentally agree that your suggestion is a good one, but it turns out to be difficult to do in practice, especially considering the limited use of this endpoint and the fact that we will be omitting it from the docs. |
||
|
||
names: List[str | None] = sanitize_names( | ||
normalize_search, country, region, subregion | ||
) | ||
|
||
adm_level: int = determine_admin_level(*names) | ||
|
||
sql: str = _admin_boundary_lookup_sql( | ||
adm_level, normalize_search, admin_source, *names | ||
) | ||
|
||
json_data: List[Dict[str, Any]] = await _query_dataset_json( | ||
dataset, version_str, sql, None | ||
) | ||
|
||
return Response( | ||
data={ | ||
Review comment: This might be cleaner as a pydantic model instead of a dict.
Reply: Made a model, and a helper function to construct the response in a hopefully clear way. |
||
"adminSource": admin_source, | ||
"adminVersion": admin_version, | ||
"matches": [ | ||
{ | ||
"country": { | ||
"id": match["gid_0"].rsplit("_")[0], | ||
Review comment: A function for getting the ID here would be easier to read.
Reply: Done! |
||
"name": match["country"], | ||
}, | ||
"region": { | ||
"id": ( | ||
(match["gid_1"].rsplit("_")[0]).split(".")[1] | ||
if adm_level >= 1 | ||
Review comment: Nitpick: I personally find ternary operators declared inside dicts confusing to read.
Reply: I think they're all gone! |
||
else None | ||
), | ||
"name": match["name_1"] if adm_level >= 1 else None, | ||
}, | ||
"subregion": { | ||
"id": ( | ||
(match["gid_2"].rsplit("_")[0]).split(".")[2] | ||
if adm_level >= 2 | ||
else None | ||
), | ||
"name": match["name_2"] if adm_level >= 2 else None, | ||
}, | ||
} | ||
for match in json_data | ||
Review comment: Same with nested list comprehensions, as above.
Reply: Banished! |
||
], | ||
} | ||
) | ||
|
||
|
||
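The review threads above ask for a helper that extracts the ID and for a response built without inline ternaries. A minimal sketch of what that could look like follows. The names `extract_level_id` and `build_match` are hypothetical, the sample GIDs are made up, and the sketch assumes GADM-style GIDs of the form `AAA`, `AAA.1_1`, or `AAA.1.2_1`. It is not the code the author ultimately merged.

```python
from typing import Any, Dict, Optional


def extract_level_id(gid: Optional[str], adm_level: int) -> Optional[str]:
    """Return the ID component for one admin level of a GADM-style GID.

    Assumes dot-separated components followed by an optional '_<suffix>',
    e.g. 'AAA.1.2_1' -> components ['AAA', '1', '2'].
    """
    if gid is None:
        return None
    # Drop the trailing '_<suffix>' if present, then split into components.
    components = gid.rsplit("_", 1)[0].split(".")
    if adm_level >= len(components):
        return None
    return components[adm_level]


def build_match(row: Dict[str, Any], adm_level: int) -> Dict[str, Any]:
    """Build one entry of the 'matches' list using plain if-statements."""
    match: Dict[str, Any] = {
        "country": {"id": extract_level_id(row["gid_0"], 0), "name": row["country"]},
        "region": {"id": None, "name": None},
        "subregion": {"id": None, "name": None},
    }
    if adm_level >= 1:
        match["region"] = {
            "id": extract_level_id(row["gid_1"], 1),
            "name": row["name_1"],
        }
    if adm_level >= 2:
        match["subregion"] = {
            "id": extract_level_id(row["gid_2"], 2),
            "name": row["name_2"],
        }
    return match
```

With a helper like this, the endpoint body could reduce to a single comprehension over `json_data` that calls `build_match(row, adm_level)`.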
def sanitize_names( | ||
normalize_search: bool, | ||
country: str | None, | ||
region: str | None, | ||
subregion: str | None, | ||
) -> List[str | None]: | ||
"""Turn any empty strings into None, enforce the admin level hierarchy, | ||
and optionally unaccent and lowercase names. | ||
""" | ||
names = [] | ||
Review comment: The type checker is happier if you change this to: names: List[str | None] = []
Reply: Done, thanks! |
||
|
||
if subregion and not region: | ||
raise HTTPException( | ||
status_code=400, | ||
detail="If subregion is specified, region must be specified as well.", | ||
) | ||
|
||
for name in (country, region, subregion): | ||
if name and normalize_search: | ||
names.append(unidecode(name).lower()) | ||
elif name: | ||
names.append(name) | ||
else: | ||
names.append(None) | ||
return names | ||
|
||
|
||
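To illustrate what the normalization in `sanitize_names` does to its inputs, here is a dependency-free sketch. It substitutes the standard library's `unicodedata` for `unidecode`, so it only strips combining accents. `unidecode` also transliterates characters that have no decomposition, which this stand-in does not attempt. The function names are hypothetical.

```python
import unicodedata
from typing import List, Optional


def normalize_name(name: str) -> str:
    """Strip accents and lowercase: a stdlib approximation of unidecode(name).lower()."""
    decomposed = unicodedata.normalize("NFKD", name)
    # Remove combining marks (accents) left behind by the decomposition.
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return without_marks.lower()


def sanitize(normalize_search: bool, *names: Optional[str]) -> List[Optional[str]]:
    """Map empty or missing names to None and optionally normalize the rest."""
    cleaned: List[Optional[str]] = []
    for name in names:
        if not name:
            cleaned.append(None)
        elif normalize_search:
            cleaned.append(normalize_name(name))
        else:
            cleaned.append(name)
    return cleaned
```

For example, `sanitize(True, "Côte d'Ivoire", "", None)` yields one normalized name followed by two `None` values, which is the shape `determine_admin_level` expects.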
def determine_admin_level( | ||
country: str | None, region: str | None, subregion: str | None | ||
) -> int: | ||
"""Infer the native admin level of a request based on the presence of | ||
non-empty fields. | ||
""" | ||
if subregion: | ||
return 2 | ||
elif region: | ||
return 1 | ||
elif country: | ||
return 0 | ||
else: # Shouldn't get here if FastAPI route definition worked | ||
raise HTTPException(status_code=400, detail="Country MUST be specified.") | ||
|
||
|
||
def _admin_boundary_lookup_sql( | ||
adm_level: int, | ||
normalize_search: bool, | ||
dataset: str, | ||
country_name: str, | ||
region_name: str | None, | ||
subregion_name: str | None, | ||
) -> str: | ||
"""Generate the SQL required to look up administrative boundary | ||
IDs by name. | ||
""" | ||
name_fields: List[str] = ["country", "name_1", "name_2"] | ||
if normalize_search: | ||
match_name_fields = [name_field + "_normalized" for name_field in name_fields] | ||
else: | ||
match_name_fields = name_fields | ||
|
||
sql = ( | ||
f"SELECT gid_0, gid_1, gid_2, {name_fields[0]}, {name_fields[1]}, {name_fields[2]}" | ||
f" FROM {dataset} WHERE {match_name_fields[0]}='{country_name}'" | ||
) | ||
if region_name is not None: | ||
sql += f" AND {match_name_fields[1]}='{region_name}'" | ||
if subregion_name is not None: | ||
sql += f" AND {match_name_fields[2]}='{subregion_name}'" | ||
|
||
sql += f" AND adm_level='{adm_level}'" | ||
|
||
return sql | ||
|
||
|
||
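The lookup SQL above interpolates the names directly between single quotes, so a name containing an apostrophe (for example "Côte d'Ivoire") would end the string literal early. Whether `_query_dataset_json` sanitizes the query downstream is not visible in this diff. As a hedged sketch, standard SQL escapes an embedded single quote by doubling it. The helper names below are hypothetical, and a parameterized query would be preferable wherever the query layer supports one.

```python
def quote_sql_literal(value: str) -> str:
    """Wrap a value in single quotes, doubling any embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"


def name_filter(column: str, name: str) -> str:
    """Build a 'column=literal' fragment like the f-strings in the lookup SQL."""
    return f"{column}={quote_sql_literal(name)}"
```

Used in place of the raw f-string, `name_filter("country_normalized", "cote d'ivoire")` produces a fragment whose literal stays intact.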
async def version_is_valid( | ||
dataset: str, | ||
version: str, | ||
) -> None: | ||
"""Validate a version string for a given dataset.""" | ||
# Note: At some point I intend to change the version validator to | ||
# use messaging like this. However, that is out of scope for the | ||
# current ticket, and I want to see what people think, so I'll | ||
# keep it here for now. | ||
if re.match(VERSION_REGEX, version) is None: | ||
Review comment: Per above, maybe just have pre-validated versions rather than checking the version.
Reply: Now specified on a per-env basis! |
||
raise HTTPException( | ||
status_code=400, | ||
detail=( | ||
"Invalid version name. Version names begin with a 'v' and " | ||
"consist of one to three integers separated by periods. " | ||
"e.g. 'v1', 'v7.1', 'v4.1.0', 'v20240801'" | ||
), | ||
) | ||
|
||
try: | ||
_ = await get_version(dataset, version) | ||
except RecordNotFoundError: | ||
raise HTTPException( | ||
status_code=400, | ||
detail=( | ||
"Version not found. Existing versions for this dataset " | ||
f"include {[v[0] for v in await get_version_names(dataset)]}" | ||
# FIXME: Maybe change get_version_names to do unpacking? ^ | ||
), | ||
) |
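The version handling above can be exercised in isolation with a small sketch. The real `VERSION_REGEX` lives in `app.routes` and is not shown in this diff, so the pattern below is an assumption inferred only from the error message ("begin with a 'v' and consist of one to three integers separated by periods"). The function names are hypothetical.

```python
import re

# Assumed stand-in for app.routes.VERSION_REGEX; the real pattern may differ.
ASSUMED_VERSION_REGEX = r"^v\d+(\.\d+){0,2}$"


def to_version_string(admin_version: str) -> str:
    """Mirror the endpoint's normalization: strip leading 'v's, then add one."""
    return "v" + str(admin_version).lstrip("v")


def looks_like_version(version: str) -> bool:
    """Check the version string against the assumed pattern."""
    return re.match(ASSUMED_VERSION_REGEX, version) is not None
```

Note that `lstrip("v")` removes every leading `v`, so both `"4.1"` and `"v4.1"` normalize to the same string before validation.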
Review comment: @dmannarino I know you said you didn't like thematic. I was thinking of other ideas: since these are admin-area-specific endpoints, maybe we just put them under something like "political", e.g. /political/geoencoder. Then in the future it can include additional GADM endpoints, WDPA, concessions, etc. Not sure if we need to distinguish it from the rest of the API, or whether we can just have it all under a header in the docs.
Review comment: But I do think geoencoder is a little vague if it only works for admin areas. Geoencoding implies converting any text description of a place into coordinates: https://en.wikipedia.org/wiki/Address_geocoding
Reply: Changed to /political/id-lookup
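For a consumer of the renamed endpoint, a request URL could be assembled roughly as follows. This is a sketch under stated assumptions: the host is a placeholder, the path is taken from the reply above without knowing any router prefix, and the query parameter names are copied from the `Query(...)` arguments in the diff. The function name `build_lookup_url` is hypothetical.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

BASE_URL = "https://data-api.example.org"  # placeholder host, not the real service


def build_lookup_url(
    admin_version: str,
    country: str,
    region: Optional[str] = None,
    subregion: Optional[str] = None,
    admin_source: str = "GADM",
    normalize_search: bool = True,
) -> str:
    """Assemble a GET URL for the ID lookup, omitting unset optional parameters."""
    params: Dict[str, str] = {
        "admin_source": admin_source,
        "admin_version": admin_version,
        "country": country,
    }
    if region:
        params["region"] = region
    if subregion:
        params["subregion"] = subregion
    if not normalize_search:
        # The endpoint defaults to True, so only send the flag when disabling it.
        params["normalize_search"] = "false"
    return f"{BASE_URL}/political/id-lookup?{urlencode(params)}"
```

`urlencode` percent-encodes accented characters and spaces, so names can be passed as typed.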