Merge pull request #84 from minus34/202405
202405
minus34 authored May 21, 2024
2 parents 051097e + 8a2aed8 commit f102ea7
Showing 41 changed files with 177 additions and 171 deletions.
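Most of the 41 changed files differ only in the release identifier, with `202402` replaced by `202405`. A minimal Python sketch of automating that kind of bump follows. The helper `bump_release` and its list of file patterns are hypothetical and not part of gnaf-loader, and the commit does not say how the edits were made. A blind replace also rewrites comments that refer to a specific historical release, so the resulting diff is worth reviewing before committing.

```python
import re
from pathlib import Path

# Hypothetical set of file types to touch; extend as needed.
DEFAULT_PATTERNS = ("*.md", "*.py", "*.sql", "*.sh", "*.bat", "Dockerfile")


def bump_release(root: Path, old: str, new: str, patterns=DEFAULT_PATTERNS) -> list[Path]:
    """Replace release id `old` with `new` (both YYYYMM) in matching files under `root`.

    Returns the files that were rewritten.
    """
    # Only match the id as a standalone run of digits, so e.g. 2024021 is left alone.
    release = re.compile(rf"(?<!\d){re.escape(old)}(?!\d)")
    changed: set[Path] = set()
    for pattern in patterns:
        for path in root.rglob(pattern):
            if not path.is_file():
                continue
            text = path.read_text(encoding="utf-8")
            updated = release.sub(new, text)
            if updated != text:
                path.write_text(updated, encoding="utf-8")
                changed.add(path)
    return sorted(changed)
```

Run from the repository root, `bump_release(Path("."), "202402", "202405")` returns the list of rewritten files.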
16 changes: 8 additions & 8 deletions README.md
@@ -51,8 +51,8 @@ The behaviour of gnaf-loader can be controlled by specifying various command lin

#### Optional Arguments
* `--srid` Sets the coordinate system of the input data. Valid values are `4283` (the default: GDA94 lat/long) and `7844` (GDA2020 lat/long).
* `--geoscape-version` Geoscape version number in YYYYMM format. Defaults to current year and last release month. e.g. `202402`.
* `--previous-geoscape-version` Previous Geoscape release version number as YYYYMM; used for QA comparison. e.g. `202311`.
* `--geoscape-version` Geoscape version number in YYYYMM format. Defaults to current year and last release month. e.g. `202405`.
* `--previous-geoscape-version` Previous Geoscape release version number as YYYYMM; used for QA comparison. e.g. `202402`.
* `--raw-gnaf-schema` schema name to store raw GNAF tables in. Defaults to `raw_gnaf_<geoscape_version>`.
* `--raw-admin-schema` schema name to store raw admin boundary tables in. Defaults to `raw_admin_bdys_<geoscape_version>`.
* `--gnaf-schema` destination schema name to store final GNAF tables in. Defaults to `gnaf_<geoscape_version>`.
@@ -69,7 +69,7 @@ The behaviour of gnaf-loader can be controlled by specifying various command lin
* `--no-boundary-tag` DO NOT tag all addresses with some of the key admin boundary IDs for creating aggregates and choropleth maps.

### Example Command Line Arguments
* Local Postgres server: `python load-gnaf.py --gnaf-tables-path="C:\temp\geoscape_202402\G-NAF" --admin-bdys-path="C:\temp\geoscape_202402\Administrative Boundaries"` Loads the GNAF tables to a Postgres server running locally. GNAF archives have been extracted to the folder `C:\temp\geoscape_202402\G-NAF`, and admin boundaries have been extracted to the `C:\temp\geoscape_202402\Administrative Boundaries` folder.
* Local Postgres server: `python load-gnaf.py --gnaf-tables-path="C:\temp\geoscape_202405\G-NAF" --admin-bdys-path="C:\temp\geoscape_202405\Administrative Boundaries"` Loads the GNAF tables to a Postgres server running locally. GNAF archives have been extracted to the folder `C:\temp\geoscape_202405\G-NAF`, and admin boundaries have been extracted to the `C:\temp\geoscape_202405\Administrative Boundaries` folder.
* Remote Postgres server: `python load-gnaf.py --gnaf-tables-path="\\svr\shared\gnaf" --local-server-dir="f:\shared\gnaf" --admin-bdys-path="c:\temp\unzipped\AdminBounds_ESRI"` Loads the GNAF tables which have been extracted to the shared folder `\\svr\shared\gnaf`. This shared folder corresponds to the local `f:\shared\gnaf` folder on the Postgres server. Admin boundaries have been extracted to the `c:\temp\unzipped\AdminBounds_ESRI` folder.
* Loading only selected states: `python load-gnaf.py --states VIC TAS NT ...` Loads only the data for Victoria, Tasmania and Northern Territory

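The default for `--geoscape-version` ("current year and last release month") can be sketched as follows, assuming Geoscape publishes quarterly in February, May, August and November, which is consistent with the `202311`, `202402` and `202405` ids in this commit. The function names are hypothetical, and this is not necessarily how gnaf-loader computes its default; in particular, rolling back to the previous November in January is an assumption.

```python
from datetime import date

# Assumption: one release per quarter, in these months.
RELEASE_MONTHS = (2, 5, 8, 11)


def latest_release(today: date) -> str:
    """Most recent release id (YYYYMM) whose month is on or before `today`."""
    past = [m for m in RELEASE_MONTHS if m <= today.month]
    if past:
        return f"{today.year}{max(past):02d}"
    # January: the latest release was the previous November.
    return f"{today.year - 1}{RELEASE_MONTHS[-1]:02d}"


def previous_release(version: str) -> str:
    """Release id one quarter before `version`, e.g. for --previous-geoscape-version."""
    year, month = int(version[:4]), int(version[4:])
    i = RELEASE_MONTHS.index(month)  # ValueError for a month outside the schedule
    if i == 0:
        return f"{year - 1}{RELEASE_MONTHS[-1]:02d}"
    return f"{year}{RELEASE_MONTHS[i - 1]:02d}"
```

For example, `latest_release(date(2024, 5, 21))` returns `"202405"` and `previous_release("202405")` returns `"202402"`, matching the example values above. The sketch works at month granularity only, so it cannot tell whether a release has actually been published yet within its month.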
@@ -117,8 +117,8 @@ Should take 15-60 minutes.
- A knowledge of [Postgres pg_restore parameters](https://www.postgresql.org/docs/14/app-pgrestore.html)

### Process
1. Download the [GNAF dump file](https://minus34.com/opendata/geoscape-202402/gnaf-202402.dmp) or [GNAF GDA2020 dump file](https://minus34.com/opendata/geoscape-202402-gda2020/gnaf-202402.dmp) (~2.0Gb)
2. Download the [Admin Bdys dump file](https://minus34.com/opendata/geoscape-202402/admin-bdys-202402.dmp) or [Admin Bdys GDA2020 dump file](https://minus34.com/opendata/geoscape-202402-gda2020/admin-bdys-202402.dmp) (~2.8Gb)
1. Download the [GNAF dump file](https://minus34.com/opendata/geoscape-202405/gnaf-202405.dmp) or [GNAF GDA2020 dump file](https://minus34.com/opendata/geoscape-202405-gda2020/gnaf-202405.dmp) (~2.0Gb)
2. Download the [Admin Bdys dump file](https://minus34.com/opendata/geoscape-202405/admin-bdys-202405.dmp) or [Admin Bdys GDA2020 dump file](https://minus34.com/opendata/geoscape-202405-gda2020/admin-bdys-202405.dmp) (~2.8Gb)
3. Edit the _restore-gnaf-admin-bdys.bat_ or _.sh_ script in the supporting-files folder for your dump file names, database parameters and for the location of pg_restore
4. Run the script, come back in 15-60 minutes and enjoy!

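The download links above follow a single pattern, and the restore step comes down to one `pg_restore` call per dump file. The sketch below builds both in Python. The URL template is inferred from the links above, and the connection defaults (`postgres` database and user on `localhost:5432`) are borrowed from `docker/Dockerfile` in this commit, so adjust them for your server; the restore scripts in supporting-files may use different flags. The function names are hypothetical.

```python
import shlex

BASE_URL = "https://minus34.com/opendata"


def dump_urls(version: str, gda2020: bool = False) -> dict[str, str]:
    """Download URLs for the GNAF and admin boundary dumps of a release (pattern from the links above)."""
    folder = f"geoscape-{version}-gda2020" if gda2020 else f"geoscape-{version}"
    return {
        "gnaf": f"{BASE_URL}/{folder}/gnaf-{version}.dmp",
        "admin_bdys": f"{BASE_URL}/{folder}/admin-bdys-{version}.dmp",
    }


def restore_command(dump_file: str, database: str = "postgres", host: str = "localhost",
                    port: int = 5432, user: str = "postgres") -> list[str]:
    """pg_restore argument list with the Dockerfile's flags (-Fc = custom-format archive)."""
    return ["pg_restore", "-Fc", "-d", database, "-h", host, "-p", str(port), "-U", user, dump_file]


if __name__ == "__main__":
    for name, url in dump_urls("202405").items():
        print(name, url)
        # Print the restore command for the downloaded file name.
        print(shlex.join(restore_command(url.rsplit("/", 1)[-1])))
```

Passing a command list to `subprocess.run(cmd, check=True)` would run the restore and raise if `pg_restore` exits with an error.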
@@ -127,11 +127,11 @@ Geoparquet versions of the spatial tables, as well as parquet versions of the no

Geometries have WGS84 lat/long coordinates (SRID/EPSG:4326). A sample query for analysing the data using [Apache Sedona](https://sedona.apache.org/), the spatial extension to [Apache Spark](https://spark.apache.org/) is in the `spark` folder.

The files are here: `s3://minus34.com/opendata/geoscape-202402/geoparquet/`
The files are here: `s3://minus34.com/opendata/geoscape-202405/geoparquet/`

### AWS CLI Examples:
- List all datasets: `aws s3 ls s3://minus34.com/opendata/geoscape-202402/geoparquet/`
- Copy all datasets: `aws s3 sync s3://minus34.com/opendata/geoscape-202402/geoparquet/ <my-local-folder>`
- List all datasets: `aws s3 ls s3://minus34.com/opendata/geoscape-202405/geoparquet/`
- Copy all datasets: `aws s3 sync s3://minus34.com/opendata/geoscape-202405/geoparquet/ <my-local-folder>`

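GeoParquet 1.0 stores geometries as well-known binary (WKB), with coordinates in (x, y) = (longitude, latitude) order. Assuming the address tables hold plain 2D WKB points (check the file metadata to confirm), a point can be decoded with only the standard library, as sketched below. Reading the Parquet files themselves needs a library such as `pyarrow`, which is not shown, and the helper name `decode_wkb_point` is hypothetical.

```python
import struct


def decode_wkb_point(wkb: bytes) -> tuple[float, float]:
    """Return (longitude, latitude) from a 2D WKB Point (21 bytes)."""
    if len(wkb) != 21:
        raise ValueError("expected a 21-byte 2D WKB point")
    # Byte 0 is the byte order: 1 = little-endian (NDR), 0 = big-endian (XDR).
    if wkb[0] not in (0, 1):
        raise ValueError(f"invalid WKB byte-order flag {wkb[0]}")
    order = "<" if wkb[0] == 1 else ">"
    # Bytes 1-4: geometry type as uint32; 1 is a 2D Point.
    (geom_type,) = struct.unpack_from(order + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError(f"not a 2D Point (WKB type {geom_type})")
    # Bytes 5-20: x then y as float64, i.e. longitude then latitude.
    lon, lat = struct.unpack_from(order + "dd", wkb, 5)
    return lon, lat
```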
## DATA LICENSES

20 changes: 10 additions & 10 deletions docker/Dockerfile
@@ -1,7 +1,7 @@
FROM debian:bookworm-slim

# replaced the downloading of the Postgres dump files to use local files instead (for performance)
# ARG BASE_URL="https://minus34.com/opendata/geoscape-202402"
# ARG BASE_URL="https://minus34.com/opendata/geoscape-202405"
# ENV BASE_URL ${BASE_URL}

# Postgres user password - WARNING: change this to something a lot more secure
@@ -13,7 +13,7 @@ ENV DEBIAN_FRONTEND=noninteractive
RUN apt-get update \
&& apt-get install -y sudo wget gnupg2 \
&& wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - \
&& echo "deb http://apt.postgresql.org/pub/repos/apt/ buster-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list \
&& echo "deb http://apt.postgresql.org/pub/repos/apt/ bookworm-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list \
&& apt-get update \
&& apt-get install -y postgresql-15 postgresql-client-15 postgis postgresql-15-postgis-3 \
&& apt-get autoremove -y --purge \
@@ -33,23 +33,23 @@ RUN echo "listen_addresses='*'" >> /etc/postgresql/15/main/postgresql.conf
RUN mkdir -p /data
WORKDIR /data

ADD gnaf-202402.dmp .
ADD admin-bdys-202402.dmp .
ADD gnaf-202405.dmp .
ADD admin-bdys-202405.dmp .

# replace the add statements above if wanting to download Postgres dump files
# RUN /data \
# && wget --quiet ${BASE_URL}/gnaf-202402.dmp \
# && wget --quiet ${BASE_URL}/admin-bdys-202402.dmp
# && wget --quiet ${BASE_URL}/gnaf-202405.dmp \
# && wget --quiet ${BASE_URL}/admin-bdys-202405.dmp

RUN /etc/init.d/postgresql start \
&& pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/gnaf-202402.dmp \
&& pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/gnaf-202405.dmp \
&& /etc/init.d/postgresql stop \
&& rm /data/gnaf-202402.dmp
&& rm /data/gnaf-202405.dmp

RUN /etc/init.d/postgresql start \
&& pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/admin-bdys-202402.dmp \
&& pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/admin-bdys-202405.dmp \
&& /etc/init.d/postgresql stop \
&& rm /data/admin-bdys-202402.dmp
&& rm /data/admin-bdys-202405.dmp

EXPOSE 5432

2 changes: 1 addition & 1 deletion docker/xx_code_snippets.sh
@@ -2,7 +2,7 @@
cd /Users/$(whoami)/git/minus34/gnaf-loader/docker

# build gnaf loader image
docker build --squash --tag minus34/gnafloader:latest --tag minus34/gnafloader:202402 .
docker build --squash --tag minus34/gnafloader:latest --tag minus34/gnafloader:202405 .

# run gnaf loader container
docker run --name=gnafloader --publish=5433:5432 minus34/gnafloader:latest
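With `--publish=5433:5432`, the container's Postgres is reachable on host port 5433. A small standard-library sketch that polls until that port accepts TCP connections is shown below; `wait_for_port` is a hypothetical helper. An open port only shows that the server is listening, so `pg_isready -h localhost -p 5433` is the more direct readiness check when the Postgres client tools are installed.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Poll until host:port accepts a TCP connection; False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; we only care that the port is listening.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:  # refused, unreachable or timed out
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For the container above, `wait_for_port("localhost", 5433)` returns `True` once the server is listening.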
4 changes: 3 additions & 1 deletion load-gnaf.py
@@ -214,7 +214,7 @@ def populate_raw_gnaf(pg_cur):
# load all PSV files using multiprocessing
geoscape.multiprocess_list("sql", sql_list, logger)

# fix missing geocodes (added due to missing data in 202402 release)
# fix missing geocodes (added due to missing data in 202405 release)
sql = geoscape.open_sql_file("01-04-raw-gnaf-fix-missing-geocodes.sql")
pg_cur.execute(sql)

@@ -382,6 +382,8 @@ def load_raw_admin_boundaries(pg_cur):
# are there any files to load?
if len(create_list) == 0:
logger.fatal("No admin boundary files found\nACTION: Check your 'admin-bdys-path' argument")
pg_cur.close()
quit()
else:
# load files in separate processes
geoscape.multiprocess_shapefile_load(create_list, logger)
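The two added lines close the cursor and then call `quit()`. Python's documentation describes `quit()` and `exit()` as helpers for the interactive shell that should not be used in programs, and `quit()` with no argument exits with status 0, so a calling script would see the failed load as a success. A sketch of the same check using `sys.exit(1)` follows; `find_admin_bdy_files` and its `*.shp` glob are hypothetical stand-ins for however gnaf-loader actually builds `create_list`.

```python
import logging
import sys
from pathlib import Path

logger = logging.getLogger(__name__)


def find_admin_bdy_files(admin_bdys_path: str) -> list[Path]:
    """Hypothetical stand-in: every shapefile under the admin boundaries folder."""
    return sorted(Path(admin_bdys_path).rglob("*.shp"))


def require_admin_bdy_files(admin_bdys_path: str, pg_cur=None) -> list[Path]:
    """Return the files to load, or log and exit with status 1 if there are none."""
    files = find_admin_bdy_files(admin_bdys_path)
    if not files:
        logger.critical("No admin boundary files found\nACTION: Check your 'admin-bdys-path' argument")
        if pg_cur is not None:
            pg_cur.close()
        # sys.exit raises SystemExit(1); the non-zero status signals failure to the caller.
        sys.exit(1)
    return files
```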
2 changes: 1 addition & 1 deletion postgres-scripts/01-04-raw-gnaf-fix-missing-geocodes.sql
@@ -1,4 +1,4 @@
-- workaround for missing default coordinates - 202402 release issue
-- workaround for missing default coordinates - 202405 release issue
with missing as (
select address_detail_pid
from raw_gnaf.address_default_geocode
4 changes: 2 additions & 2 deletions postgres-scripts/02-02a-prep-admin-bdys-tables.sql
@@ -203,10 +203,10 @@ UPDATE admin_bdys.locality_bdys
;


-- -- add old locality_pids to unedited localities -- need to rollover old locality pids from GNAF 202402 release - not supplied in 202402 release
-- -- add old locality_pids to unedited localities -- need to rollover old locality pids from GNAF 202405 release - not supplied in 202405 release
-- UPDATE admin_bdys.locality_bdys as new
-- SET old_locality_pid = old.old_locality_pid
-- FROM admin_bdys_202402.locality_bdys AS old
-- FROM admin_bdys_202405.locality_bdys AS old
-- WHERE new.locality_pid = old.locality_pid;


2 changes: 1 addition & 1 deletion postgres-scripts/xx-04-02-manual-bdy-tags.sql
@@ -4,7 +4,7 @@


-- fix 35 boatsheds
update gnaf_202402.address_principal_admin_boundaries
update gnaf_202405.address_principal_admin_boundaries
set lga_pid = 'lgacbffb11990f2',
lga_name = 'Hobart City'
where locality_pid = 'loc0f7a581b85b7'
4 changes: 2 additions & 2 deletions postgres-scripts/xx-add-elevation-to-gnaf.sql
@@ -43,7 +43,7 @@ DROP TABLE IF EXISTS temp_gnaf_100m_points;
--
-- SELECT ST_Value(dem.rast, gnaf.geom) as elevation,
-- *
-- FROM gnaf_202402.address_principals as gnaf
-- INNER JOIN gnaf_202402.srtm_3s_dem as dem on ST_Intersects(gnaf.geom, dem.rast) limit 100;
-- FROM gnaf_202405.address_principals as gnaf
-- INNER JOIN gnaf_202405.srtm_3s_dem as dem on ST_Intersects(gnaf.geom, dem.rast) limit 100;


Original file line number Diff line number Diff line change
@@ -6,9 +6,9 @@ SELECT als.gnaf_pid, als.street_locality_pid, als.locality_pid, als.alias_princi
ST_MakePoint(als.longitude, als.latitude)::geography,
ST_MakePoint(gnaf.longitude, gnaf.latitude)::geography
) as distance
FROM gnaf_202402.address_aliases as als
INNER JOIN gnaf_202402.address_alias_lookup as lkp on als.gnaf_pid = lkp.alias_pid
INNER JOIN gnaf_202402.address_principals as gnaf on lkp.principal_pid = gnaf.gnaf_pid
FROM gnaf_202405.address_aliases as als
INNER JOIN gnaf_202405.address_alias_lookup as lkp on als.gnaf_pid = lkp.alias_pid
INNER JOIN gnaf_202405.address_principals as gnaf on lkp.principal_pid = gnaf.gnaf_pid
WHERE als.latitude <> gnaf.latitude
OR als.longitude <> gnaf.longitude
order by ST_Distance(
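The query above ranks alias addresses by how far they sit from their principal address, using `ST_Distance` on `geography` values, which PostGIS computes on the spheroid by default. For a quick check outside the database, a spherical haversine distance is a close approximation, with errors of up to roughly 0.5%; the sketch below is that swapped-in formula, not what PostGIS runs. The argument order mirrors `ST_MakePoint(longitude, latitude)`, and the function name is hypothetical.

```python
import math

# Mean Earth radius in metres; this treats the Earth as a sphere, not the WGS84 spheroid.
EARTH_RADIUS_M = 6_371_008.8


def haversine_m(lon1: float, lat1: float, lon2: float, lat2: float) -> float:
    """Great-circle distance in metres between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    # min() guards against rounding pushing `a` just above 1 for near-antipodal points.
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))
```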
2 changes: 1 addition & 1 deletion postgres-scripts/xx-export-address-principals-to-csv.sql
@@ -6,5 +6,5 @@ COPY (
address, locality_name, postcode, state, locality_postcode, confidence,
legal_parcel_id, mb_2016_code, mb_2021_code, latitude, longitude,
geocode_type, reliability
FROM gnaf_202402.address_principals
FROM gnaf_202405.address_principals
) TO '/Users/hugh.saalmans/tmp/address_principals.psv' HEADER CSV;
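The `COPY` options shown are only `HEADER CSV`, so Postgres writes a comma-delimited file (the CSV-format default) despite the `.psv` extension; pipe delimiters would need a `DELIMITER '|'` option. A minimal reader for the export is sketched below. It takes the delimiter as a parameter, assumes the file was written as UTF-8, and converts the `latitude` and `longitude` columns to floats; the helper name `read_export` is hypothetical.

```python
import csv
from typing import Iterator


def read_export(path: str, delimiter: str = ",") -> Iterator[dict]:
    """Yield rows from the COPY export as dicts, with coordinates parsed as floats."""
    # newline="" lets the csv module handle quoted fields that contain line breaks.
    with open(path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f, delimiter=delimiter):
            for col in ("latitude", "longitude"):
                if row.get(col):  # NULLs are exported as empty fields
                    row[col] = float(row[col])
            yield row
```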
12 changes: 6 additions & 6 deletions postgres-scripts/xx-get-population-per-gnafpid.sql
@@ -22,7 +22,7 @@
--WITH counts AS (
-- SELECT mb_2016_code,
-- count(*) AS address_count
-- FROM gnaf_202402.address_principals
-- FROM gnaf_202405.address_principals
-- GROUP BY mb_2016_code
--)
--UPDATE testing.mb_2016_counts AS mb
@@ -35,7 +35,7 @@
---- add geoms
--UPDATE testing.mb_2016_counts AS mb
-- SET geom = bdys.geom
-- FROM admin_bdys_202402.abs_2016_mb as bdys
-- FROM admin_bdys_202405.abs_2016_mb as bdys
-- WHERE mb.mb_2016_code = bdys.mb_16code::bigint;
--
--ANALYSE testing.mb_2016_counts;
@@ -58,7 +58,7 @@ SELECT gnaf.gnaf_pid,
mb.person,
mb.address_count,
gnaf.geom
FROM gnaf_202402.address_principals as gnaf
FROM gnaf_202405.address_principals as gnaf
INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code
WHERE mb.address_count >= mb.dwelling
AND mb.dwelling > 0
@@ -92,7 +92,7 @@ SELECT gnaf.gnaf_pid,
mb.address_count,
gnaf.geom,
generate_series(1, ceiling(mb.dwelling::float / mb.address_count::float)::integer) as duplicate_number
FROM gnaf_202402.address_principals as gnaf
FROM gnaf_202405.address_principals as gnaf
INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code
WHERE mb.address_count < mb.dwelling
AND address_count > 0
@@ -219,7 +219,7 @@ WITH adr AS (
mb.person,
mb.address_count,
gnaf.geom
FROM gnaf_202402.address_principals as gnaf
FROM gnaf_202405.address_principals as gnaf
INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code
WHERE mb.address_count >= mb.person
AND mb.dwelling = 0
@@ -253,7 +253,7 @@ WITH adr AS (
mb.address_count,
gnaf.geom,
generate_series(1, ceiling(mb.person::float / mb.address_count::float)::integer) as duplicate_number
FROM gnaf_202402.address_principals as gnaf
FROM gnaf_202405.address_principals as gnaf
INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code
WHERE mb.address_count < mb.person
AND mb.address_count > 0
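Where a mesh block has fewer addresses than dwellings, the query repeats each address `ceiling(dwelling / address_count)` times with `generate_series`; where it has at least as many, each address appears once. The same pattern is applied to `person` counts when `dwelling = 0`. The arithmetic is sketched below with hypothetical helper names. Rounding up can produce more rows than dwellings (5 dwellings over 2 addresses gives 6 rows), and the hunks shown do not say whether a later step trims the surplus.

```python
import math


def copies_per_address(dwellings: int, address_count: int) -> int:
    """How many times each address in a mesh block is emitted."""
    if address_count <= 0:
        raise ValueError("address_count must be positive")
    if address_count >= dwellings:
        return 1  # enough addresses: each one appears once
    # Mirrors ceiling(mb.dwelling::float / mb.address_count::float).
    return math.ceil(dwellings / address_count)


def expand(gnaf_pids: list[str], dwellings: int) -> list[tuple[str, int]]:
    """(gnaf_pid, duplicate_number) rows, like generate_series(1, n) per address."""
    n = copies_per_address(dwellings, len(gnaf_pids))
    return [(pid, i) for pid in gnaf_pids for i in range(1, n + 1)]
```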
10 changes: 5 additions & 5 deletions postgres-scripts/xx_calculate_partitions.sql
@@ -5,13 +5,13 @@ CREATE TABLE testing2.gnaf_partitions AS
WITH parts AS(
SELECT unnest((select array_agg(counter) from generate_series(1, 99, 1) AS counter)) as partition_id,
unnest((select array_agg(fraction) from generate_series(0.01, 0.99, 0.01) AS fraction)) as percentile,
unnest((select percentile_cont((select array_agg(s) from generate_series(0.01, 0.99, 0.01) as s)) WITHIN GROUP (ORDER BY longitude) FROM gnaf_202402.address_principals)) as longitude
unnest((select percentile_cont((select array_agg(s) from generate_series(0.01, 0.99, 0.01) as s)) WITHIN GROUP (ORDER BY longitude) FROM gnaf_202405.address_principals)) as longitude
), parts2 AS (
SELECT 0 AS partition_id, 0.0 AS percentile, min(longitude) - 0.0001 AS longitude FROM gnaf_202402.address_principals
SELECT 0 AS partition_id, 0.0 AS percentile, min(longitude) - 0.0001 AS longitude FROM gnaf_202405.address_principals
UNION ALL
SELECT * FROM parts
UNION ALL
SELECT 100 AS partition_id, 1.0 AS percentile, max(longitude) + 0.0001 AS longitude FROM gnaf_202402.address_principals
SELECT 100 AS partition_id, 1.0 AS percentile, max(longitude) + 0.0001 AS longitude FROM gnaf_202405.address_principals
)
SELECT partition_id,
percentile,
@@ -43,7 +43,7 @@ WITH merge AS (
name,
state,
st_intersection(bdy.geom, part.geom) AS geom
FROM admin_bdys_202402.commonwealth_electorates as bdy
FROM admin_bdys_202405.commonwealth_electorates as bdy
INNER JOIN testing2.gnaf_partitions as part ON st_intersects(bdy.geom, part.geom)
)
INSERT INTO testing2.commonwealth_electorates_partitioned (partition_id, ce_pid, name, state, geom)
@@ -65,4 +65,4 @@ commit;

select count(*) from testing2.commonwealth_electorates_partitioned;

select count(*) from admin_bdys_202402.commonwealth_electorates_analysis;
select count(*) from admin_bdys_202405.commonwealth_electorates_analysis;
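The partition table splits addresses into 100 longitude bands: the 1st to 99th percentiles from `percentile_cont`, plus padded minimum and maximum bounds. `percentile_cont` interpolates linearly at position `fraction * (n - 1)` in the sorted values, and the sketch below reproduces that in Python. Padding the maximum upwards, so the easternmost address falls inside the last band, is an assumption about the intent, and the function names are hypothetical.

```python
import math


def percentile_cont(sorted_values: list[float], fraction: float) -> float:
    """Linear interpolation at position fraction * (n - 1), as Postgres percentile_cont does."""
    pos = fraction * (len(sorted_values) - 1)
    lower = math.floor(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    return sorted_values[lower] + (sorted_values[upper] - sorted_values[lower]) * (pos - lower)


def longitude_breaks(longitudes, pad: float = 0.0001) -> list[float]:
    """101 band edges: padded minimum, the 1st to 99th percentiles, padded maximum."""
    xs = sorted(longitudes)
    inner = [percentile_cont(xs, i / 100) for i in range(1, 100)]
    # Assumption: pad outwards at both ends so every address lies inside a band.
    return [xs[0] - pad] + inner + [xs[-1] + pad]
```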
4 changes: 2 additions & 2 deletions postgres-scripts/xx_qa_table_counts.sql
@@ -3,14 +3,14 @@ SELECT new.table_name,
new.aus - old.aus as difference,
new.aus as new_aus,
old.aus as old_aus
FROM gnaf_202402.qa as new
FROM gnaf_202405.qa as new
INNER JOIN gnaf_202102.qa as old ON new.table_name = old.table_name
;

SELECT new.table_name,
new.aus - old.aus as difference,
new.aus as new_aus,
old.aus as old_aus
FROM admin_bdys_202402.qa as new
FROM admin_bdys_202405.qa as new
INNER JOIN admin_bdys_202102.qa as old ON new.table_name = old.table_name
;
14 changes: 7 additions & 7 deletions postgres-scripts/xx_test_state_electorates.sql
@@ -2,27 +2,27 @@



DROP VIEW IF EXISTS raw_admin_bdys_202402.vw_tenp_state_electorates;
CREATE VIEW raw_admin_bdys_202402.vw_tenp_state_electorates AS
DROP VIEW IF EXISTS raw_admin_bdys_202405.vw_tenp_state_electorates;
CREATE VIEW raw_admin_bdys_202405.vw_tenp_state_electorates AS
SELECT dat.*,
aut.name,
bdy.se_ply_pid,
bdy.geom
FROM raw_admin_bdys_202402.aus_state_electoral as dat
INNER JOIN raw_admin_bdys_202402.aus_state_electoral_class_aut as aut on dat.secl_code = aut.code
INNER JOIN raw_admin_bdys_202402.aus_state_electoral_polygon as bdy on dat.se_pid = bdy.se_pid
FROM raw_admin_bdys_202405.aus_state_electoral as dat
INNER JOIN raw_admin_bdys_202405.aus_state_electoral_class_aut as aut on dat.secl_code = aut.code
INNER JOIN raw_admin_bdys_202405.aus_state_electoral_polygon as bdy on dat.se_pid = bdy.se_pid
-- where name = 'KEW'
;

select * from raw_admin_bdys_202402.vw_tenp_state_electorates
select * from raw_admin_bdys_202405.vw_tenp_state_electorates
where name = 'KEW'
order by se_pid,
dt_create
;



select * from raw_admin_bdys_202402.aus_state_electoral_polygon
select * from raw_admin_bdys_202405.aus_state_electoral_polygon
where se_pid = 'VIC292'
order by se_pid,
dt_create