diff --git a/CHANGELOG.md b/CHANGELOG.md index 4545d97..03e4f67 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -1,3 +1,7 @@ +### February 2023 Release +- Postgres dump files are now built on Postgres 14, requiring Postgres 14+ to restore them +- Docker images have been upgraded to Postgres 15 + ### August 2022 Release - Docker images have been upgraded to Postgres 14 diff --git a/README.md b/README.md index bb45018..1204775 100644 --- a/README.md +++ b/README.md @@ -7,7 +7,7 @@ Have a look at [these intro slides](https://minus34.com/opendata/intro-to-gnaf.p ### There are 4 options for loading the data 1. [Run](https://github.com/minus34/gnaf-loader#option-1---run-loadgnafpy) the load-gnaf Python script and build the database yourself in a single step 2. [Pull](https://github.com/minus34/gnaf-loader#option-2---run-the-database-in-a-docker-container) the database from Docker Hub and run it in a container -3. [Download](https://github.com/minus34/gnaf-loader#option-3---load-pg_dump-files) the GNAF and/or Admin Bdys Postgres dump files & restore them in your Postgres 13+ database +3. [Download](https://github.com/minus34/gnaf-loader#option-3---load-pg_dump-files) the GNAF and/or Admin Bdys Postgres dump files & restore them in your Postgres 14+ database 4. [Use or download](https://github.com/minus34/gnaf-loader#option-4---parquet-files-in-s3) Parquet Files in S3 for your data & analytics workflows; either in AWS or your own platform. ## Option 1 - Run load.gnaf.py @@ -51,7 +51,7 @@ The behaviour of gnaf-loader can be controlled by specifying various command lin #### Optional Arguments * `--srid` Sets the coordinate system of the input data. Valid values are `4283` (the default: GDA94 lat/long) and `7844` (GDA2020 lat/long). -* `--geoscape-version` Geoscape version number in YYYYMM format. Defaults to current year and last release month. e.g. `202211`. +* `--geoscape-version` Geoscape version number in YYYYMM format. Defaults to the current year and latest release month, e.g. `202302`. * `--raw-gnaf-schema` schema name to store raw GNAF tables in. Defaults to `raw_gnaf_`. * `--raw-admin-schema` schema name to store raw admin boundary tables in. Defaults to `raw_admin_bdys_`. * `--gnaf-schema` destination schema name to store final GNAF tables in. Defaults to `gnaf_`. @@ -66,7 +66,7 @@ The behaviour of gnaf-loader can be controlled by specifying various command lin * `--no-boundary-tag` DO NOT tag all addresses with some of the key admin boundary IDs for creating aggregates and choropleth maps. ### Example Command Line Arguments -* Local Postgres server: `python load-gnaf.py --gnaf-tables-path="C:\temp\geoscape_202211\G-NAF" --admin-bdys-path="C:\temp\geoscape_202211\Administrative Boundaries"` Loads the GNAF tables to a Postgres server running locally. GNAF archives have been extracted to the folder `C:\temp\geoscape_202211\G-NAF`, and admin boundaries have been extracted to the `C:\temp\geoscape_202211\Administrative Boundaries` folder. +* Local Postgres server: `python load-gnaf.py --gnaf-tables-path="C:\temp\geoscape_202302\G-NAF" --admin-bdys-path="C:\temp\geoscape_202302\Administrative Boundaries"` Loads the GNAF tables to a Postgres server running locally. GNAF archives have been extracted to the folder `C:\temp\geoscape_202302\G-NAF`, and admin boundaries have been extracted to the `C:\temp\geoscape_202302\Administrative Boundaries` folder.
* Remote Postgres server: `python load-gnaf.py --gnaf-tables-path="\\svr\shared\gnaf" --local-server-dir="f:\shared\gnaf" --admin-bdys-path="c:\temp\unzipped\AdminBounds_ESRI"` Loads the GNAF tables which have been extracted to the shared folder `\\svr\shared\gnaf`. This shared folder corresponds to the local `f:\shared\gnaf` folder on the Postgres server. Admin boundaries have been extracted to the `c:\temp\unzipped\AdminBounds_ESRI` folder. * Loading only selected states: `python load-gnaf.py --states VIC TAS NT ...` Loads only the data for Victoria, Tasmania and Northern Territory @@ -110,12 +110,12 @@ Download Postgres dump files and restore them in your database. Should take 15-60 minutes. ### Pre-requisites -- Postgres 13+ with PostGIS 3.0+ -- A knowledge of [Postgres pg_restore parameters](https://www.postgresql.org/docs/13/app-pgrestore.html) +- Postgres 14+ with PostGIS 3.0+ +- Knowledge of [Postgres pg_restore parameters](https://www.postgresql.org/docs/14/app-pgrestore.html) ### Process -1. Download the [GNAF dump file](https://minus34.com/opendata/geoscape-202211/gnaf-202211.dmp) or [GNAF GDA2020 dump file](https://minus34.com/opendata/geoscape-202211-gda2020/gnaf-202211.dmp) (~2.0Gb) -2. Download the [Admin Bdys dump file](https://minus34.com/opendata/geoscape-202211/admin-bdys-202211.dmp) or [Admin Bdys GDA2020 dump file](https://minus34.com/opendata/geoscape-202211-gda2020/admin-bdys-202211.dmp) (~2.8Gb) +1. Download the [GNAF dump file](https://minus34.com/opendata/geoscape-202302/gnaf-202302.dmp) or [GNAF GDA2020 dump file](https://minus34.com/opendata/geoscape-202302-gda2020/gnaf-202302.dmp) (~2.0Gb) +2. Download the [Admin Bdys dump file](https://minus34.com/opendata/geoscape-202302/admin-bdys-202302.dmp) or [Admin Bdys GDA2020 dump file](https://minus34.com/opendata/geoscape-202302-gda2020/admin-bdys-202302.dmp) (~2.8Gb) 3. Edit the _restore-gnaf-admin-bdys.bat_ or _.sh_ script in the supporting-files folder for your dump file names, database parameters and for the location of pg_restore 4. Run the script, come back in 15-60 minutes and enjoy! @@ -124,11 +124,11 @@ Parquet versions of all the tables are in a public S3 bucket for use directly in Geometries are stored as Well Known Text (WKT) strings with WGS84 lat/long coordinates (SRID/EPSG:4326). They can be queried using spatial extensions to analytical platforms, such as [Apache Sedona](https://sedona.apache.org/) running on [Apache Spark](https://spark.apache.org/).
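For example, a minimal PySpark + Apache Sedona session can read one of these Parquet datasets straight from the public bucket (listed just below) and turn the WKT strings into geometries. This is an illustrative sketch only: the `address_principals` dataset name, the `wkt_geom` column name and the anonymous S3 access setting are assumptions, so inspect the bucket and schema before relying on them.

```python
# Illustrative sketch: query the GNAF Parquet files in S3 with Apache Sedona.
# Assumes pyspark and apache-sedona are installed and the Sedona + hadoop-aws
# jars are on Spark's classpath; dataset and column names are assumptions.
from sedona.spark import SedonaContext

config = (
    SedonaContext.builder()
    .appName("gnaf-parquet-example")
    # public bucket, so read without AWS credentials (assumption)
    .config(
        "spark.hadoop.fs.s3a.aws.credentials.provider",
        "org.apache.hadoop.fs.s3a.AnonymousAWSCredentialsProvider",
    )
    .getOrCreate()
)
sedona = SedonaContext.create(config)

# read one dataset and expose it to Sedona SQL
df = sedona.read.parquet("s3a://minus34.com/opendata/geoscape-202302/parquet/address_principals")
df.createOrReplaceTempView("address_principals")

# convert the WKT strings to geometries and take a quick sample
sedona.sql(
    "SELECT gnaf_pid, ST_GeomFromWKT(wkt_geom) AS geom FROM address_principals LIMIT 10"
).show()
```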
-The files are here: `s3://minus34.com/opendata/geoscape-202211/parquet/` or `s3://minus34.com/opendata/geoscape-202211-gda2020/parquet/` +The files are here: `s3://minus34.com/opendata/geoscape-202302/parquet/` or `s3://minus34.com/opendata/geoscape-202302-gda2020/parquet/` ### AWS CLI Examples: -- List all datasets: `aws s3 ls s3://minus34.com/opendata/geoscape-202211/parquet/` -- Copy all datasets: `aws s3 sync s3://minus34.com/opendata/geoscape-202211/parquet/ ` +- List all datasets: `aws s3 ls s3://minus34.com/opendata/geoscape-202302/parquet/` +- Copy all datasets: `aws s3 sync s3://minus34.com/opendata/geoscape-202302/parquet/ ` ## DATA LICENSES diff --git a/docker/Dockerfile b/docker/Dockerfile index b263879..7a33ef5 100644 --- a/docker/Dockerfile +++ b/docker/Dockerfile @@ -1,6 +1,6 @@ FROM debian:buster-slim -ARG BASE_URL="https://minus34.com/opendata/geoscape-202211" +ARG BASE_URL="https://minus34.com/opendata/geoscape-202302" ENV BASE_URL ${BASE_URL} # Postgres user password - WARNING: change this to something a lot more secure @@ -14,7 +14,7 @@ RUN apt-get update \ && wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add - \ && echo "deb http://apt.postgresql.org/pub/repos/apt/ buster-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list \ && apt-get update \ - && apt-get install -y postgresql-14 postgresql-client-14 postgis postgresql-14-postgis-3 \ + && apt-get install -y postgresql-15 postgresql-client-15 postgis postgresql-15-postgis-3 \ && apt-get autoremove -y --purge \ && apt-get clean && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* @@ -24,21 +24,26 @@ RUN /etc/init.d/postgresql start \ && sudo -u postgres psql -c "CREATE EXTENSION postgis;" \ && /etc/init.d/postgresql stop +# enable external access to postgres - WARNING: these are insecure settings! Edit these to restrict access +RUN echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/15/main/pg_hba.conf +RUN echo "listen_addresses='*'" >> /etc/postgresql/15/main/postgresql.conf + # download and restore GNAF & Admin Boundary Postgres dump files RUN mkdir -p /data \ && cd /data \ - && wget --quiet ${BASE_URL}/gnaf-202211.dmp \ - && wget --quiet ${BASE_URL}/admin-bdys-202211.dmp \ - && /etc/init.d/postgresql start \ - && pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/gnaf-202211.dmp \ - && pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/admin-bdys-202211.dmp \ + && wget --quiet ${BASE_URL}/gnaf-202302.dmp \ + && wget --quiet ${BASE_URL}/admin-bdys-202302.dmp + +RUN /etc/init.d/postgresql start \ + && pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/gnaf-202302.dmp \ && /etc/init.d/postgresql stop \ - && rm /data/gnaf-202211.dmp \ - && rm /data/admin-bdys-202211.dmp + && rm /data/gnaf-202302.dmp + +RUN /etc/init.d/postgresql start \ + && pg_restore -Fc -d postgres -h localhost -p 5432 -U postgres /data/admin-bdys-202302.dmp \ + && /etc/init.d/postgresql stop \ + && rm /data/admin-bdys-202302.dmp -# enable external access to postgres - WARNING: these are insecure settings! 
Edit these to restrict access -RUN echo "host all all 0.0.0.0/0 md5" >> /etc/postgresql/14/main/pg_hba.conf -RUN echo "listen_addresses='*'" >> /etc/postgresql/14/main/postgresql.conf EXPOSE 5432 # set user for postgres startup @@ -48,4 +53,4 @@ USER postgres # VOLUME ["/etc/postgresql", "/var/log/postgresql", "/var/lib/postgresql"] # Start postgres when starting the container -CMD ["/usr/lib/postgresql/14/bin/postgres", "-D", "/var/lib/postgresql/14/main", "-c", "config_file=/etc/postgresql/14/main/postgresql.conf"] +CMD ["/usr/lib/postgresql/15/bin/postgres", "-D", "/var/lib/postgresql/15/main", "-c", "config_file=/etc/postgresql/15/main/postgresql.conf"] diff --git a/docker/xx_code_snippets.sh b/docker/xx_code_snippets.sh index 5046d50..1fce327 100644 --- a/docker/xx_code_snippets.sh +++ b/docker/xx_code_snippets.sh @@ -2,7 +2,7 @@ cd /Users/$(whoami)/git/minus34/gnaf-loader/docker # build gnaf loader image -docker build --squash --tag minus34/gnafloader:latest --tag minus34/gnafloader:202211 . +docker build --squash --tag minus34/gnafloader:latest --tag minus34/gnafloader:202302 . # run gnaf loader container docker run --name=gnafloader --publish=5433:5432 minus34/gnafloader:latest diff --git a/load-gnaf.py b/load-gnaf.py index 2e2a5ae..87777da 100644 --- a/load-gnaf.py +++ b/load-gnaf.py @@ -214,7 +214,7 @@ def populate_raw_gnaf(pg_cur): # load all PSV files using multiprocessing geoscape.multiprocess_list("sql", sql_list, logger) - # fix missing geocodes (added due to missing data in 202211 release) + # fix missing geocodes (added due to missing data in 202302 release) sql = geoscape.open_sql_file("01-04-raw-gnaf-fix-missing-geocodes.sql") pg_cur.execute(sql) diff --git a/postgres-scripts/01-04-raw-gnaf-fix-missing-geocodes.sql b/postgres-scripts/01-04-raw-gnaf-fix-missing-geocodes.sql index ec9a690..0eebc7b 100644 --- a/postgres-scripts/01-04-raw-gnaf-fix-missing-geocodes.sql +++ b/postgres-scripts/01-04-raw-gnaf-fix-missing-geocodes.sql @@ -1,4 +1,4 @@ --- workaround for missing default coordinates - 202211 release issue +-- workaround for missing default coordinates - 202302 release issue with missing as ( select address_detail_pid from raw_gnaf.address_default_geocode diff --git a/postgres-scripts/02-02a-prep-admin-bdys-tables.sql b/postgres-scripts/02-02a-prep-admin-bdys-tables.sql index d1758ee..3d42912 100644 --- a/postgres-scripts/02-02a-prep-admin-bdys-tables.sql +++ b/postgres-scripts/02-02a-prep-admin-bdys-tables.sql @@ -203,10 +203,10 @@ UPDATE admin_bdys.locality_bdys ; --- -- add old locality_pids to unedited localities -- need to rollover old locality pids from GNAF 202211 release - not supplied in 202211 release +-- -- add old locality_pids to unedited localities -- need to rollover old locality pids from GNAF 202302 release - not supplied in 202302 release -- UPDATE admin_bdys.locality_bdys as new -- SET old_locality_pid = old.old_locality_pid --- FROM admin_bdys_202211.locality_bdys AS old +-- FROM admin_bdys_202302.locality_bdys AS old -- WHERE new.locality_pid = old.locality_pid; diff --git a/postgres-scripts/xx-04-02-manual-bdy-tags.sql b/postgres-scripts/xx-04-02-manual-bdy-tags.sql index f03a512..ceccdeb 100644 --- a/postgres-scripts/xx-04-02-manual-bdy-tags.sql +++ b/postgres-scripts/xx-04-02-manual-bdy-tags.sql @@ -4,7 +4,7 @@ -- fix 35 boatsheds -update gnaf_202211.address_principal_admin_boundaries +update gnaf_202302.address_principal_admin_boundaries set lga_pid = 'lgacbffb11990f2', lga_name = 'Hobart City' where locality_pid = 'loc0f7a581b85b7' diff 
--git a/postgres-scripts/xx-add-elevation-to-gnaf.sql b/postgres-scripts/xx-add-elevation-to-gnaf.sql index 508c01b..3c64488 100644 --- a/postgres-scripts/xx-add-elevation-to-gnaf.sql +++ b/postgres-scripts/xx-add-elevation-to-gnaf.sql @@ -43,7 +43,7 @@ DROP TABLE IF EXISTS temp_gnaf_100m_points; -- -- SELECT ST_Value(dem.rast, gnaf.geom) as elevation, -- * --- FROM gnaf_202211.address_principals as gnaf --- INNER JOIN gnaf_202211.srtm_3s_dem as dem on ST_Intersects(gnaf.geom, dem.rast) limit 100; +-- FROM gnaf_202302.address_principals as gnaf +-- INNER JOIN gnaf_202302.srtm_3s_dem as dem on ST_Intersects(gnaf.geom, dem.rast) limit 100; diff --git a/postgres-scripts/xx-alias-principals-with-different-coordinates.sql b/postgres-scripts/xx-alias-principals-with-different-coordinates.sql index 57a6c19..cb9fdf7 100644 --- a/postgres-scripts/xx-alias-principals-with-different-coordinates.sql +++ b/postgres-scripts/xx-alias-principals-with-different-coordinates.sql @@ -6,9 +6,9 @@ SELECT als.gnaf_pid, als.street_locality_pid, als.locality_pid, als.alias_princi ST_MakePoint(als.longitude, als.latitude)::geography, ST_MakePoint(gnaf.longitude, gnaf.latitude)::geography ) as distance - FROM gnaf_202211.address_aliases as als - INNER JOIN gnaf_202211.address_alias_lookup as lkp on als.gnaf_pid = lkp.alias_pid - INNER JOIN gnaf_202211.address_principals as gnaf on lkp.principal_pid = gnaf.gnaf_pid + FROM gnaf_202302.address_aliases as als + INNER JOIN gnaf_202302.address_alias_lookup as lkp on als.gnaf_pid = lkp.alias_pid + INNER JOIN gnaf_202302.address_principals as gnaf on lkp.principal_pid = gnaf.gnaf_pid WHERE als.latitude <> gnaf.latitude OR als.longitude <> gnaf.longitude order by ST_Distance( diff --git a/postgres-scripts/xx-export-address-principals-to-csv.sql b/postgres-scripts/xx-export-address-principals-to-csv.sql index dac99fb..187a74c 100644 --- a/postgres-scripts/xx-export-address-principals-to-csv.sql +++ b/postgres-scripts/xx-export-address-principals-to-csv.sql @@ -6,5 +6,5 @@ COPY ( address, locality_name, postcode, state, locality_postcode, confidence, legal_parcel_id, mb_2016_code, mb_2021_code, latitude, longitude, geocode_type, reliability - FROM gnaf_202211.address_principals + FROM gnaf_202302.address_principals ) TO '/Users/hugh.saalmans/tmp/address_principals.psv' HEADER CSV; diff --git a/postgres-scripts/xx-get-population-per-gnafpid.sql b/postgres-scripts/xx-get-population-per-gnafpid.sql index 33c84de..2e58f3b 100644 --- a/postgres-scripts/xx-get-population-per-gnafpid.sql +++ b/postgres-scripts/xx-get-population-per-gnafpid.sql @@ -22,7 +22,7 @@ --WITH counts AS ( -- SELECT mb_2016_code, -- count(*) AS address_count --- FROM gnaf_202211.address_principals +-- FROM gnaf_202302.address_principals -- GROUP BY mb_2016_code --) --UPDATE testing.mb_2016_counts AS mb @@ -35,7 +35,7 @@ ---- add geoms --UPDATE testing.mb_2016_counts AS mb -- SET geom = bdys.geom --- FROM admin_bdys_202211.abs_2016_mb as bdys +-- FROM admin_bdys_202302.abs_2016_mb as bdys -- WHERE mb.mb_2016_code = bdys.mb_16code::bigint; -- --ANALYSE testing.mb_2016_counts; @@ -58,7 +58,7 @@ SELECT gnaf.gnaf_pid, mb.person, mb.address_count, gnaf.geom -FROM gnaf_202211.address_principals as gnaf +FROM gnaf_202302.address_principals as gnaf INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code WHERE mb.address_count >= mb.dwelling AND mb.dwelling > 0 @@ -92,7 +92,7 @@ SELECT gnaf.gnaf_pid, mb.address_count, gnaf.geom, generate_series(1, ceiling(mb.dwelling::float /
mb.address_count::float)::integer) as duplicate_number -FROM gnaf_202211.address_principals as gnaf +FROM gnaf_202302.address_principals as gnaf INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code WHERE mb.address_count < mb.dwelling AND address_count > 0 @@ -219,7 +219,7 @@ WITH adr AS ( mb.person, mb.address_count, gnaf.geom - FROM gnaf_202211.address_principals as gnaf + FROM gnaf_202302.address_principals as gnaf INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code WHERE mb.address_count >= mb.person AND mb.dwelling = 0 @@ -253,7 +253,7 @@ WITH adr AS ( mb.address_count, gnaf.geom, generate_series(1, ceiling(mb.person::float / mb.address_count::float)::integer) as duplicate_number - FROM gnaf_202211.address_principals as gnaf + FROM gnaf_202302.address_principals as gnaf INNER JOIN testing.mb_2016_counts AS mb on gnaf.mb_2016_code = mb.mb_2016_code WHERE mb.address_count < mb.person AND mb.address_count > 0 diff --git a/postgres-scripts/xx_calculate_partitions.sql b/postgres-scripts/xx_calculate_partitions.sql index 91f4e32..84578f1 100644 --- a/postgres-scripts/xx_calculate_partitions.sql +++ b/postgres-scripts/xx_calculate_partitions.sql @@ -5,13 +5,13 @@ CREATE TABLE testing2.gnaf_partitions AS WITH parts AS( SELECT unnest((select array_agg(counter) from generate_series(1, 99, 1) AS counter)) as partition_id, unnest((select array_agg(fraction) from generate_series(0.01, 0.99, 0.01) AS fraction)) as percentile, - unnest((select percentile_cont((select array_agg(s) from generate_series(0.01, 0.99, 0.01) as s)) WITHIN GROUP (ORDER BY longitude) FROM gnaf_202211.address_principals)) as longitude + unnest((select percentile_cont((select array_agg(s) from generate_series(0.01, 0.99, 0.01) as s)) WITHIN GROUP (ORDER BY longitude) FROM gnaf_202302.address_principals)) as longitude ), parts2 AS ( -SELECT 0 AS partition_id, 0.0 AS percentile, min(longitude) - 0.0001 AS longitude FROM gnaf_202211.address_principals +SELECT 0 AS partition_id, 0.0 AS percentile, min(longitude) - 0.0001 AS longitude FROM gnaf_202302.address_principals UNION ALL SELECT * FROM parts UNION ALL -SELECT 100 AS partition_id, 1.0 AS percentile, max(longitude) - 0.0001 AS longitude FROM gnaf_202211.address_principals +SELECT 100 AS partition_id, 1.0 AS percentile, max(longitude) - 0.0001 AS longitude FROM gnaf_202302.address_principals ) SELECT partition_id, percentile, @@ -43,7 +43,7 @@ WITH merge AS ( name, state, st_intersection(bdy.geom, part.geom) AS geom - FROM admin_bdys_202211.commonwealth_electorates as bdy + FROM admin_bdys_202302.commonwealth_electorates as bdy INNER JOIN testing2.gnaf_partitions as part ON st_intersects(bdy.geom, part.geom) ) INSERT INTO testing2.commonwealth_electorates_partitioned (partition_id, ce_pid, name, state, geom) @@ -65,4 +65,4 @@ commit; select count(*) from testing2.commonwealth_electorates_partitioned; -select count(*) from admin_bdys_202211.commonwealth_electorates_analysis; +select count(*) from admin_bdys_202302.commonwealth_electorates_analysis; diff --git a/postgres-scripts/xx_qa_table_counts.sql b/postgres-scripts/xx_qa_table_counts.sql index 13647a0..fd302e2 100644 --- a/postgres-scripts/xx_qa_table_counts.sql +++ b/postgres-scripts/xx_qa_table_counts.sql @@ -3,7 +3,7 @@ SELECT new.table_name, new.aus - old.aus as difference, new.aus as new_aus, old.aus as old_aus - FROM gnaf_202211.qa as new + FROM gnaf_202302.qa as new INNER JOIN gnaf_202102.qa as old ON new.table_name = old.table_name ; @@ -11,6 +11,6 @@ SELECT 
new.table_name, new.aus - old.aus as difference, new.aus as new_aus, old.aus as old_aus - FROM admin_bdys_202211.qa as new + FROM admin_bdys_202302.qa as new INNER JOIN admin_bdys_202102.qa as old ON new.table_name = old.table_name ; \ No newline at end of file diff --git a/postgres-scripts/xx_test_state_electorates.sql b/postgres-scripts/xx_test_state_electorates.sql index 02c2da4..126468f 100644 --- a/postgres-scripts/xx_test_state_electorates.sql +++ b/postgres-scripts/xx_test_state_electorates.sql @@ -2,19 +2,19 @@ -DROP VIEW IF EXISTS raw_admin_bdys_202211.vw_tenp_state_electorates; -CREATE VIEW raw_admin_bdys_202211.vw_tenp_state_electorates AS +DROP VIEW IF EXISTS raw_admin_bdys_202302.vw_tenp_state_electorates; +CREATE VIEW raw_admin_bdys_202302.vw_tenp_state_electorates AS SELECT dat.*, aut.name, bdy.se_ply_pid, bdy.geom - FROM raw_admin_bdys_202211.aus_state_electoral as dat - INNER JOIN raw_admin_bdys_202211.aus_state_electoral_class_aut as aut on dat.secl_code = aut.code - INNER JOIN raw_admin_bdys_202211.aus_state_electoral_polygon as bdy on dat.se_pid = bdy.se_pid + FROM raw_admin_bdys_202302.aus_state_electoral as dat + INNER JOIN raw_admin_bdys_202302.aus_state_electoral_class_aut as aut on dat.secl_code = aut.code + INNER JOIN raw_admin_bdys_202302.aus_state_electoral_polygon as bdy on dat.se_pid = bdy.se_pid -- where name = 'KEW' ; -select * from raw_admin_bdys_202211.vw_tenp_state_electorates +select * from raw_admin_bdys_202302.vw_tenp_state_electorates where name = 'KEW' order by se_pid, dt_create @@ -22,7 +22,7 @@ select * from raw_admin_bdys_202211.vw_tenp_state_electorates -select * from raw_admin_bdys_202211.aus_state_electoral_polygon +select * from raw_admin_bdys_202302.aus_state_electoral_polygon where se_pid = 'VIC292' order by se_pid, dt_create diff --git a/postgres-scripts/xx_testing.sql b/postgres-scripts/xx_testing.sql index 248cf55..09af719 100644 --- a/postgres-scripts/xx_testing.sql +++ b/postgres-scripts/xx_testing.sql @@ -1,25 +1,25 @@ select * -from admin_bdys_202211.locality_bdys_display; +from admin_bdys_202302.locality_bdys_display; select count(*) -from gnaf_202211.address_principals; +from gnaf_202302.address_principals; -- addresses missing bdy tags -drop view if exists gnaf_202211.vw_address_principal_admin_boundaries; -create view gnaf_202211.vw_address_principal_admin_boundaries as +drop view if exists gnaf_202302.vw_address_principal_admin_boundaries; +create view gnaf_202302.vw_address_principal_admin_boundaries as select bdy.*, geom -from gnaf_202211.address_principal_admin_boundaries as bdy -inner join gnaf_202211.address_principals as gnaf on gnaf.gnaf_pid = bdy.gnaf_pid +from gnaf_202302.address_principal_admin_boundaries as bdy +inner join gnaf_202302.address_principals as gnaf on gnaf.gnaf_pid = bdy.gnaf_pid where bdy.lga_pid is null and bdy.state <> 'ACT' ; select * -from gnaf_202211.address_principal_admin_boundaries +from gnaf_202302.address_principal_admin_boundaries ; @@ -30,7 +30,7 @@ select count(*) as address_count, locality_name, postcode, state -from gnaf_202211.address_principal_admin_boundaries +from gnaf_202302.address_principal_admin_boundaries where ce_pid is null and state <> 'ACT' group by locality_pid, @@ -44,42 +44,42 @@ order by address_count desc -- REINDEX DATABASE geo; -select count(*) from gnaf_202211.address_principals; -- 14404238 +select count(*) from gnaf_202302.address_principals; -- 14404238 -- find geoms that don't match select count(*) -from gnaf_202211.address_principals as old -inner join 
gnaf_202211_gda94.address_principals as new on old.gnaf_pid = new.gnaf_pid +from gnaf_202302.address_principals as old +inner join gnaf_202302_gda94.address_principals as new on old.gnaf_pid = new.gnaf_pid and not st_equals(old.geom, new.geom) ; --- root : INFO SQL FAILED! : ALTER TABLE ONLY gnaf_202211.locality_neighbour_lookup ADD CONSTRAINT locality_neighbour_lookup_pk PRIMARY KEY (locality_pid, neighbour_locality_pid); : could not create unique index "locality_neighbour_lookup_pk" +-- root : INFO SQL FAILED! : ALTER TABLE ONLY gnaf_202302.locality_neighbour_lookup ADD CONSTRAINT locality_neighbour_lookup_pk PRIMARY KEY (locality_pid, neighbour_locality_pid); : could not create unique index "locality_neighbour_lookup_pk" -- DETAIL: Key (locality_pid, neighbour_locality_pid)=(loc46e919f53d9f, loc5ecbe4a59b8c) is duplicated. -select * from gnaf_202211.locality_neighbour_lookup +select * from gnaf_202302.locality_neighbour_lookup where locality_pid = 'loc46e919f53d9f' and neighbour_locality_pid = 'loc5ecbe4a59b8c' ; -select * from gnaf_202211.localities +select * from gnaf_202302.localities where locality_pid = 'loc46e919f53d9f' ; -select count(*) from gnaf_202211.locality_neighbour_lookup -- 88868 +select count(*) from gnaf_202302.locality_neighbour_lookup -- 88868 -- 88284 (584 duplicates) with fred as ( - select distinct locality_pid, neighbour_locality_pid from gnaf_202211.locality_neighbour_lookup + select distinct locality_pid, neighbour_locality_pid from gnaf_202302.locality_neighbour_lookup ) select count(*) from fred ; -select * from admin_bdys_202211.qa_comparison; +select * from admin_bdys_202302.qa_comparison; select gid, @@ -106,7 +106,7 @@ select gid, mb21_pop, loci21_uri, geom -from raw_admin_bdys_202211.aus_mb_2021; +from raw_admin_bdys_202302.aus_mb_2021; -- yes, you can transform a geom to its own SRID!
(simplifies supporting 2 coord systems in one set of code) @@ -114,4 +114,4 @@ select 'yep' where ST_SetSRID(ST_MakePoint(115.81778, -31.98092), 4283) = ST_transform(ST_SetSRID(ST_MakePoint(115.81778, -31.98092), 4283), 4283); -select Find_SRID('admin_bdys_202211', 'locality_bdys', 'geom'); +select Find_SRID('admin_bdys_202302', 'locality_bdys', 'geom'); diff --git a/postgres-scripts/xx_testing_missing_default_geocode_coordinates.sql b/postgres-scripts/xx_testing_missing_default_geocode_coordinates.sql index b01cfb1..89e8cdc 100644 --- a/postgres-scripts/xx_testing_missing_default_geocode_coordinates.sql +++ b/postgres-scripts/xx_testing_missing_default_geocode_coordinates.sql @@ -6,7 +6,7 @@ -- find default geocodes with no lat/longs -- 10 records select * -from raw_gnaf_202211.address_default_geocode +from raw_gnaf_202302.address_default_geocode where latitude is null or longitude is null; --GASA_424662224 @@ -22,7 +22,7 @@ where latitude is null or longitude is null; -- get address_site_pids for gnaf_pids with no coords -select address_detail_pid, address_site_pid from raw_gnaf_202211.address_detail +select address_detail_pid, address_site_pid from raw_gnaf_202302.address_detail where address_detail_pid in ( 'GASA_424662224', 'GASA_424664998', @@ -51,7 +51,7 @@ where address_detail_pid in ( -- check if lat/longs exist in full geocode table using address_site_pids: all 10 have coords & good geocodes select * -from raw_gnaf_202211.address_site_geocode +from raw_gnaf_202302.address_site_geocode where address_site_pid in ( '424747613', '424750387', @@ -71,22 +71,22 @@ and geocode_type_code = 'PAPS' -- workaround for missing default coordinates with missing as ( select address_detail_pid - from raw_gnaf_202211.address_default_geocode + from raw_gnaf_202302.address_default_geocode where latitude is null or longitude is null ), site as ( select gnaf.address_detail_pid, gnaf.address_site_pid - from raw_gnaf_202211.address_detail as gnaf + from raw_gnaf_202302.address_detail as gnaf inner join missing on gnaf.address_detail_pid = missing.address_detail_pid ), coords as ( select site.address_detail_pid, geo.latitude, geo.longitude - from raw_gnaf_202211.address_site_geocode as geo + from raw_gnaf_202302.address_site_geocode as geo inner join site on geo.address_site_pid = site.address_site_pid where geocode_type_code = 'PAPS' ) -update raw_gnaf_202211.address_default_geocode as def +update raw_gnaf_202302.address_default_geocode as def set latitude = coords.latitude, longitude = coords.longitude from coords diff --git a/spark/02_export_gnaf_and_admin_bdys_to_s3.py b/spark/02_export_gnaf_and_admin_bdys_to_s3.py index ac034fd..2af3984 100644 --- a/spark/02_export_gnaf_and_admin_bdys_to_s3.py +++ b/spark/02_export_gnaf_and_admin_bdys_to_s3.py @@ -67,7 +67,7 @@ def get_password(connection_name): # # aws details # s3_bucket = "minus34.com" -# s3_folder = "opendata/geoscape-202211/parquet" +# s3_folder = "opendata/geoscape-202302/parquet" # get runtime arguments parser = argparse.ArgumentParser(description="Converts Postgres/PostGIS tables to Parquet files with WKT geometries.") @@ -103,7 +103,7 @@ def main(): .config("spark.sql.adaptive.enabled", "true") .config("spark.executor.cores", 1) .config("spark.cores.max", num_processors) - .config("spark.driver.memory", "8g") + .config("spark.driver.memory", "24g") .config("spark.driver.maxResultSize", "2g") .getOrCreate() ) diff --git a/supporting-files/dump-gnaf-admin-bdys.bat b/supporting-files/dump-gnaf-admin-bdys.bat index 87ddab2..3c79ab2 100644 ---
a/supporting-files/dump-gnaf-admin-bdys.bat +++ b/supporting-files/dump-gnaf-admin-bdys.bat @@ -1,11 +1,11 @@ -"C:\Program Files\PostgreSQL\12\bin\pg_dump" -Fc -d geo -n gnaf_202211 -p 5432 -U postgres > "C:\git\minus34\gnaf-202211.dmp" -"C:\Program Files\PostgreSQL\12\bin\pg_dump" -Fc -d geo -n admin_bdys_202211 -p 5432 -U postgres > "C:\git\minus34\admin-bdys-202211.dmp" +"C:\Program Files\PostgreSQL\12\bin\pg_dump" -Fc -d geo -n gnaf_202302 -p 5432 -U postgres > "C:\git\minus34\gnaf-202302.dmp" +"C:\Program Files\PostgreSQL\12\bin\pg_dump" -Fc -d geo -n admin_bdys_202302 -p 5432 -U postgres > "C:\git\minus34\admin-bdys-202302.dmp" REM OPTIONAL - copy files to AWS S3 and allow public read access (requires awscli installed) -REM aws --profile=default s3 cp "C:\git\minus34\gnaf-202211.dmp" s3://minus34.com/opendata/geoscape-202211/gnaf-202211.dmp -REM aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key opendata/geoscape-202211/gnaf-202211.dmp +REM aws --profile=default s3 cp "C:\git\minus34\gnaf-202302.dmp" s3://minus34.com/opendata/geoscape-202302/gnaf-202302.dmp +REM aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key opendata/geoscape-202302/gnaf-202302.dmp -REM aws --profile=default s3 cp "C:\git\minus34\admin-bdys-202211.dmp" s3://minus34.com/opendata/geoscape-202211/admin-bdys-202211.dmp -REM aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key opendata/geoscape-202211/admin-bdys-202211.dmp +REM aws --profile=default s3 cp "C:\git\minus34\admin-bdys-202302.dmp" s3://minus34.com/opendata/geoscape-202302/admin-bdys-202302.dmp +REM aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key opendata/geoscape-202302/admin-bdys-202302.dmp pause \ No newline at end of file diff --git a/supporting-files/dump-gnaf-admin-bdys.sh b/supporting-files/dump-gnaf-admin-bdys.sh index b4fa986..e1db6f7 100644 --- a/supporting-files/dump-gnaf-admin-bdys.sh +++ b/supporting-files/dump-gnaf-admin-bdys.sh @@ -3,18 +3,18 @@ # set this to taste - NOTE: you can't use "~" for your home folder output_folder="/Users/$(whoami)/tmp" -/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202211 -p 5432 -U postgres -f ${output_folder}/gnaf-202211.dmp --no-owner +/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202302 -p 5432 -U postgres -f ${output_folder}/gnaf-202302.dmp --no-owner echo "GNAF schema exported to dump file" -/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n admin_bdys_202211 -p 5432 -U postgres -f ${output_folder}/admin-bdys-202211.dmp --no-owner +/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n admin_bdys_202302 -p 5432 -U postgres -f ${output_folder}/admin-bdys-202302.dmp --no-owner echo "Admin Bdys schema exported to dump file" # OPTIONAL - copy files to AWS S3 and allow public read access (requires AWSCLI installed and your AWS credentials setup) cd ${output_folder} -for f in *-202211.dmp; +for f in *-202302.dmp; do - aws --profile=default s3 cp --storage-class REDUCED_REDUNDANCY ./${f} s3://minus34.com/opendata/geoscape-202211/${f}; - aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key opendata/geoscape-202211/${f} + aws --profile=default s3 cp --storage-class REDUCED_REDUNDANCY ./${f} s3://minus34.com/opendata/geoscape-202302/${f}; + aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key
opendata/geoscape-202302/${f} echo "${f} uploaded to AWS S3" done diff --git a/supporting-files/quarterly_processing/01_setup_conda_env.sh b/supporting-files/quarterly_processing/01_setup_conda_env.sh index e1afe74..6b22906 100644 --- a/supporting-files/quarterly_processing/01_setup_conda_env.sh +++ b/supporting-files/quarterly_processing/01_setup_conda_env.sh @@ -26,7 +26,7 @@ conda create -y -n ${ENV_NAME} python=${PYTHON_VERSION} # activate and setup env conda activate ${ENV_NAME} -conda env config vars set JAVA_HOME="/usr/local/opt/openjdk@11" +#conda env config vars set JAVA_HOME="/opt/homebrew/opt/openjdk@11" conda config --env --add channels conda-forge conda config --env --set channel_priority strict diff --git a/supporting-files/quarterly_processing/02-run-gnaf-loader-locality-clean-and-copy-to-aws-s3.sh b/supporting-files/quarterly_processing/02-run-gnaf-loader-locality-clean-and-copy-to-aws-s3.sh index 6f41f13..0c33a77 100644 --- a/supporting-files/quarterly_processing/02-run-gnaf-loader-locality-clean-and-copy-to-aws-s3.sh +++ b/supporting-files/quarterly_processing/02-run-gnaf-loader-locality-clean-and-copy-to-aws-s3.sh @@ -12,9 +12,9 @@ SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" # --------------------------------------------------------------------------------------------------------------------- AWS_PROFILE="minus34" -OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202211" -GNAF_PATH="/Users/$(whoami)/Downloads/g-naf_nov22_allstates_gda94_psv_109" -BDYS_PATH="/Users/$(whoami)/Downloads/NOV22_AdminBounds_GDA94_SHP" +OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202302" +GNAF_PATH="/Users/$(whoami)/Downloads/g-naf_feb23_allstates_gda94_psv_1010" +BDYS_PATH="/Users/$(whoami)/Downloads/FEB23_AdminBounds_GDA94_SHP" echo "---------------------------------------------------------------------------------------------------------------------" echo "Run gnaf-loader and locality boundary clean" @@ -24,7 +24,7 @@ python3 /Users/$(whoami)/git/minus34/gnaf-loader/load-gnaf.py --pgport=5432 --pg python3 /Users/$(whoami)/git/iag_geo/psma-admin-bdys/locality-clean.py --pgport=5432 --pgdb=geo --max-processes=6 --output-path=${OUTPUT_FOLDER} # upload locality bdy files to S3 -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202211 --exclude "*" --include "*.zip" --acl public-read +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202302 --exclude "*" --include "*.zip" --acl public-read echo "---------------------------------------------------------------------------------------------------------------------" echo "create concordance file" @@ -34,7 +34,7 @@ echo "-------------------------------------------------------------------------- mkdir -p "${OUTPUT_FOLDER}" python3 /Users/$(whoami)/git/iag_geo/concord/create_concordance_file.py --pgdb=geo --output-path=${OUTPUT_FOLDER} -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202211 --exclude "*" --include "*.csv" --acl public-read +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202302 --exclude "*" --include "*.csv" --acl public-read # copy concordance score file to GitHub repo local files cp ${OUTPUT_FOLDER}/boundary_concordance_score.csv /Users/$(whoami)/git/iag_geo/concord/data/ @@ -43,16 +43,16 @@ echo "-------------------------------------------------------------------------- echo "dump postgres schemas to a local folder" echo 
"---------------------------------------------------------------------------------------------------------------------" -/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202211 -p 5432 -U postgres -f "${OUTPUT_FOLDER}/gnaf-202211.dmp" --no-owner +/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202302 -p 5432 -U postgres -f "${OUTPUT_FOLDER}/gnaf-202302.dmp" --no-owner echo "GNAF schema exported to dump file" -/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n admin_bdys_202211 -p 5432 -U postgres -f "${OUTPUT_FOLDER}/admin-bdys-202211.dmp" --no-owner +/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n admin_bdys_202302 -p 5432 -U postgres -f "${OUTPUT_FOLDER}/admin-bdys-202302.dmp" --no-owner echo "Admin Bdys schema exported to dump file" echo "---------------------------------------------------------------------------------------------------------------------" echo "copy Postgres dump files to AWS S3 and allow public read access (requires AWSCLI installed & AWS credentials setup)" echo "---------------------------------------------------------------------------------------------------------------------" -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202211 --exclude "*" --include "*.dmp" --acl public-read +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202302 --exclude "*" --include "*.dmp" --acl public-read echo "---------------------------------------------------------------------------------------------------------------------" echo "create parquet versions of GNAF and Admin Bdys and upload to AWS S3" @@ -63,6 +63,6 @@ echo "-------------------------------------------------------------------------- conda activate sedona -python ${SCRIPT_DIR}/../../spark/02_export_gnaf_and_admin_bdys_to_s3.py --admin-schema="admin_bdys_202211" --gnaf-schema="gnaf_202211" --output-path="${OUTPUT_FOLDER}/parquet" +python ${SCRIPT_DIR}/../../spark/02_export_gnaf_and_admin_bdys_to_s3.py --admin-schema="admin_bdys_202302" --gnaf-schema="gnaf_202302" --output-path="${OUTPUT_FOLDER}/parquet" -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER}/parquet s3://minus34.com/opendata/geoscape-202211/parquet --acl public-read +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER}/parquet s3://minus34.com/opendata/geoscape-202302/parquet --acl public-read diff --git a/supporting-files/quarterly_processing/03-run-gnaf-loader-locality-clean-and-copy-to-aws-s3-gda2020.sh b/supporting-files/quarterly_processing/03-run-gnaf-loader-locality-clean-and-copy-to-aws-s3-gda2020.sh index 48bc1f3..a6d5a8e 100644 --- a/supporting-files/quarterly_processing/03-run-gnaf-loader-locality-clean-and-copy-to-aws-s3-gda2020.sh +++ b/supporting-files/quarterly_processing/03-run-gnaf-loader-locality-clean-and-copy-to-aws-s3-gda2020.sh @@ -12,19 +12,19 @@ SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )" # --------------------------------------------------------------------------------------------------------------------- AWS_PROFILE="minus34" -OUTPUT_FOLDER_2020="/Users/$(whoami)/tmp/geoscape_202211_gda2020" -GNAF_2020_PATH="/Users/$(whoami)/Downloads/g-naf_nov22_allstates_gda2020_psv_109" -BDYS_2020_PATH="/Users/$(whoami)/Downloads/NOV22_AdminBounds_GDA2020_SHP" +OUTPUT_FOLDER_2020="/Users/$(whoami)/tmp/geoscape_202302_gda2020" +GNAF_2020_PATH="/Users/$(whoami)/Downloads/g-naf_feb23_allstates_gda94_psv_1010" 
+BDYS_2020_PATH="/Users/$(whoami)/Downloads/FEB23_AdminBounds_GDA_2020_SHP" echo "---------------------------------------------------------------------------------------------------------------------" echo "Run gnaf-loader and locality boundary clean" echo "---------------------------------------------------------------------------------------------------------------------" -python3 /Users/$(whoami)/git/minus34/gnaf-loader/load-gnaf.py --pgport=5432 --pgdb=geo --max-processes=6 --gnaf-tables-path="${GNAF_2020_PATH}" --admin-bdys-path="${BDYS_2020_PATH}" --srid=7844 --gnaf-schema gnaf_202211_gda2020 --admin-schema admin_bdys_202211_gda2020 --previous-gnaf-schema gnaf_202211 --previous-admin-schema admin_bdys_202211 -python3 /Users/$(whoami)/git/iag_geo/psma-admin-bdys/locality-clean.py --pgport=5432 --pgdb=geo --max-processes=6 --output-path=${OUTPUT_FOLDER_2020} --admin-schema admin_bdys_202211_gda2020 +python3 /Users/$(whoami)/git/minus34/gnaf-loader/load-gnaf.py --pgport=5432 --pgdb=geo --max-processes=6 --gnaf-tables-path="${GNAF_2020_PATH}" --admin-bdys-path="${BDYS_2020_PATH}" --srid=7844 --gnaf-schema gnaf_202302_gda2020 --admin-schema admin_bdys_202302_gda2020 --previous-gnaf-schema gnaf_202302 --previous-admin-schema admin_bdys_202302 +python3 /Users/$(whoami)/git/iag_geo/psma-admin-bdys/locality-clean.py --pgport=5432 --pgdb=geo --max-processes=6 --output-path=${OUTPUT_FOLDER_2020} --admin-schema admin_bdys_202302_gda2020 # upload locality bdy files to S3 -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202211-gda2020 --exclude "*" --include "*.zip" --acl public-read +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202302-gda2020 --exclude "*" --include "*.zip" --acl public-read echo "---------------------------------------------------------------------------------------------------------------------" echo "create concordance file" @@ -32,23 +32,23 @@ echo "-------------------------------------------------------------------------- # create concordance file and upload to S3 mkdir -p "${OUTPUT_FOLDER_2020}" -python3 /Users/$(whoami)/git/iag_geo/concord/create_concordance_file.py --pgdb=geo --admin-schema="admin_bdys_202211_gda2020" --gnaf-schema="gnaf_202211_gda2020" --output-path=${OUTPUT_FOLDER_2020} -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202211-gda2020 --exclude "*" --include "*.csv" --acl public-read +python3 /Users/$(whoami)/git/iag_geo/concord/create_concordance_file.py --pgdb=geo --admin-schema="admin_bdys_202302_gda2020" --gnaf-schema="gnaf_202302_gda2020" --output-path=${OUTPUT_FOLDER_2020} +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202302-gda2020 --exclude "*" --include "*.csv" --acl public-read echo "---------------------------------------------------------------------------------------------------------------------" echo "dump postgres schemas to a local folder" echo "---------------------------------------------------------------------------------------------------------------------" -/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202211_gda2020 -p 5432 -U postgres -f "${OUTPUT_FOLDER_2020}/gnaf-202211.dmp" --no-owner +/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202302_gda2020 -p 5432 -U postgres -f "${OUTPUT_FOLDER_2020}/gnaf-202302.dmp" --no-owner echo "GNAF schema exported to dump file" 
-/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n admin_bdys_202211_gda2020 -p 5432 -U postgres -f "${OUTPUT_FOLDER_2020}/admin-bdys-202211.dmp" --no-owner +/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n admin_bdys_202302_gda2020 -p 5432 -U postgres -f "${OUTPUT_FOLDER_2020}/admin-bdys-202302.dmp" --no-owner echo "Admin Bdys schema exported to dump file" echo "---------------------------------------------------------------------------------------------------------------------" echo "copy Postgres dump files to AWS S3 and allow public read access (requires AWSCLI installed & AWS credentials setup)" echo "---------------------------------------------------------------------------------------------------------------------" -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202211-gda2020 --exclude "*" --include "*.dmp" --acl public-read +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202302-gda2020 --exclude "*" --include "*.dmp" --acl public-read echo "---------------------------------------------------------------------------------------------------------------------" echo "create parquet versions of GNAF and Admin Bdys and upload to AWS S3" @@ -59,6 +59,6 @@ echo "-------------------------------------------------------------------------- conda activate sedona -python ${SCRIPT_DIR}/../../spark/02_export_gnaf_and_admin_bdys_to_s3.py --admin-schema="admin_bdys_202211_gda2020" --gnaf-schema="gnaf_202211_gda2020" --output-path="${OUTPUT_FOLDER_2020}/parquet" +python ${SCRIPT_DIR}/../../spark/02_export_gnaf_and_admin_bdys_to_s3.py --admin-schema="admin_bdys_202302_gda2020" --gnaf-schema="gnaf_202302_gda2020" --output-path="${OUTPUT_FOLDER_2020}/parquet" -aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020}/parquet s3://minus34.com/opendata/geoscape-202211-gda2020/parquet --acl public-read +aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020}/parquet s3://minus34.com/opendata/geoscape-202302-gda2020/parquet --acl public-read diff --git a/supporting-files/quarterly_processing/04-create-docker-images.sh b/supporting-files/quarterly_processing/04-create-docker-images.sh index 6526993..f3e78a4 100644 --- a/supporting-files/quarterly_processing/04-create-docker-images.sh +++ b/supporting-files/quarterly_processing/04-create-docker-images.sh @@ -9,22 +9,28 @@ echo "-------------------------------------------------------------------------- echo "build gnaf-loader GDA94 docker image" echo "---------------------------------------------------------------------------------------------------------------------" -docker build --tag minus34/gnafloader:latest --tag minus34/gnafloader:202211 --no-cache --build-arg BASE_URL="https://minus34.com/opendata/geoscape-202211" . +docker build --no-cache --squash --tag docker.io/minus34/gnafloader:latest --tag docker.io/minus34/gnafloader:202302 --build-arg BASE_URL="https://minus34.com/opendata/geoscape-202302" . 
+ +echo "---------------------------------------------------------------------------------------------------------------------" +echo "push image (with 2 tags) to Docker Hub" +echo "---------------------------------------------------------------------------------------------------------------------" + +docker push minus34/gnafloader --all-tags echo "---------------------------------------------------------------------------------------------------------------------" echo "build gnaf-loader GDA2020 docker image" echo "---------------------------------------------------------------------------------------------------------------------" -docker build --tag minus34/gnafloader:latest-gda2020 --tag minus34/gnafloader:202211-gda2020 --no-cache --build-arg BASE_URL="https://minus34.com/opendata/geoscape-202211-gda2020" . +docker build --no-cache --squash --tag docker.io/minus34/gnafloader:latest-gda2020 --tag docker.io/minus34/gnafloader:202302-gda2020 --build-arg BASE_URL="https://minus34.com/opendata/geoscape-202302-gda2020" . echo "---------------------------------------------------------------------------------------------------------------------" -echo "push both images (with 4 tags) to Docker Hub" +echo "push images (with 2 new tags) to Docker Hub" echo "---------------------------------------------------------------------------------------------------------------------" docker push minus34/gnafloader --all-tags -#echo "---------------------------------------------------------------------------------------------------------------------" -#echo "clean up Docker locally - warning: this could accidentally destroy other Docker images" -#echo "---------------------------------------------------------------------------------------------------------------------" -# -#echo 'y' | docker system prune +echo "---------------------------------------------------------------------------------------------------------------------" +echo "clean up Docker locally - warning: this could accidentally destroy other Docker images" +echo "---------------------------------------------------------------------------------------------------------------------" + +echo 'y' | docker system prune diff --git a/supporting-files/quarterly_processing/postgresql.conf b/supporting-files/quarterly_processing/postgresql.conf new file mode 100644 index 0000000..ab88936 --- /dev/null +++ b/supporting-files/quarterly_processing/postgresql.conf @@ -0,0 +1,796 @@ +# ----------------------------- +# PostgreSQL configuration file +# ----------------------------- +# +# This file consists of lines of the form: +# +# name = value +# +# (The "=" is optional.) Whitespace may be used. Comments are introduced with +# "#" anywhere on a line. The complete list of parameter names and allowed +# values can be found in the PostgreSQL documentation. +# +# The commented-out settings shown in this file represent the default values. +# Re-commenting a setting is NOT sufficient to revert it to the default value; +# you need to reload the server. +# +# This file is read on server startup and when the server receives a SIGHUP +# signal. If you edit the file on a running system, you have to SIGHUP the +# server for the changes to take effect, run "pg_ctl reload", or execute +# "SELECT pg_reload_conf()". Some parameters, which are marked below, +# require a server shutdown and restart to take effect. +# +# Any parameter can also be given as a command-line option to the server, e.g., +# "postgres -c log_connections=on". 
Some parameters can be changed at run time +# with the "SET" SQL command. +# +# Memory units: B = bytes Time units: us = microseconds +# kB = kilobytes ms = milliseconds +# MB = megabytes s = seconds +# GB = gigabytes min = minutes +# TB = terabytes h = hours +# d = days + + +#------------------------------------------------------------------------------ +# FILE LOCATIONS +#------------------------------------------------------------------------------ + +# The default values of these variables are driven from the -D command-line +# option or PGDATA environment variable, represented here as ConfigDir. + +#data_directory = 'ConfigDir' # use data in another directory + # (change requires restart) +#hba_file = 'ConfigDir/pg_hba.conf' # host-based authentication file + # (change requires restart) +#ident_file = 'ConfigDir/pg_ident.conf' # ident configuration file + # (change requires restart) + +# If external_pid_file is not explicitly set, no extra PID file is written. +#external_pid_file = '' # write an extra PID file + # (change requires restart) + + +#------------------------------------------------------------------------------ +# CONNECTIONS AND AUTHENTICATION +#------------------------------------------------------------------------------ + +# - Connection Settings - + +listen_addresses = 'localhost' # what IP address(es) to listen on; + # comma-separated list of addresses; + # defaults to 'localhost'; use '*' for all + # (change requires restart) +#port = 5432 # (change requires restart) +max_connections = 24 # (change requires restart) +#superuser_reserved_connections = 3 # (change requires restart) +#unix_socket_directories = '/tmp' # comma-separated list of directories + # (change requires restart) +#unix_socket_group = '' # (change requires restart) +#unix_socket_permissions = 0777 # begin with 0 to use octal notation + # (change requires restart) +#bonjour = off # advertise server via Bonjour + # (change requires restart) +#bonjour_name = '' # defaults to the computer name + # (change requires restart) + +# - TCP settings - +# see "man tcp" for details + +#tcp_keepalives_idle = 0 # TCP_KEEPIDLE, in seconds; + # 0 selects the system default +#tcp_keepalives_interval = 0 # TCP_KEEPINTVL, in seconds; + # 0 selects the system default +#tcp_keepalives_count = 0 # TCP_KEEPCNT; + # 0 selects the system default +#tcp_user_timeout = 0 # TCP_USER_TIMEOUT, in milliseconds; + # 0 selects the system default + +#client_connection_check_interval = 0 # time between checks for client + # disconnection while running queries; + # 0 for never + +# - Authentication - + +#authentication_timeout = 1min # 1s-600s +#password_encryption = scram-sha-256 # scram-sha-256 or md5 +#db_user_namespace = off + +# GSSAPI using Kerberos +#krb_server_keyfile = 'FILE:${sysconfdir}/krb5.keytab' +#krb_caseins_users = off + +# - SSL - + +#ssl = off +#ssl_ca_file = '' +#ssl_cert_file = 'server.crt' +#ssl_crl_file = '' +#ssl_crl_dir = '' +#ssl_key_file = 'server.key' +#ssl_ciphers = 'HIGH:MEDIUM:+3DES:!aNULL' # allowed SSL ciphers +#ssl_prefer_server_ciphers = on +#ssl_ecdh_curve = 'prime256v1' +#ssl_min_protocol_version = 'TLSv1.2' +#ssl_max_protocol_version = '' +#ssl_dh_params_file = '' +#ssl_passphrase_command = '' +#ssl_passphrase_command_supports_reload = off + + +#------------------------------------------------------------------------------ +# RESOURCE USAGE (except WAL) +#------------------------------------------------------------------------------ + +# - Memory - + +shared_buffers = 8GB # min 128kB + # (change 
requires restart) +#huge_pages = try # on, off, or try + # (change requires restart) +#huge_page_size = 0 # zero for system default + # (change requires restart) +#temp_buffers = 8MB # min 800kB +#max_prepared_transactions = 0 # zero disables the feature + # (change requires restart) +# Caution: it is not advisable to set max_prepared_transactions nonzero unless +# you actively intend to use prepared transactions. +work_mem = 512MB # min 64kB +#hash_mem_multiplier = 1.0 # 1-1000.0 multiplier on hash table work_mem +maintenance_work_mem = 2GB # min 1MB +#autovacuum_work_mem = -1 # min 1MB, or -1 to use maintenance_work_mem +#logical_decoding_work_mem = 64MB # min 64kB +#max_stack_depth = 2MB # min 100kB +#shared_memory_type = mmap # the default is the first option + # supported by the operating system: + # mmap + # sysv + # windows + # (change requires restart) +dynamic_shared_memory_type = posix # the default is the first option + # supported by the operating system: + # posix + # sysv + # windows + # mmap + # (change requires restart) +#min_dynamic_shared_memory = 0MB # (change requires restart) + +# - Disk - + +#temp_file_limit = -1 # limits per-process temp file space + # in kilobytes, or -1 for no limit + +# - Kernel Resources - + +#max_files_per_process = 1000 # min 64 + # (change requires restart) + +# - Cost-Based Vacuum Delay - + +#vacuum_cost_delay = 0 # 0-100 milliseconds (0 disables) +#vacuum_cost_page_hit = 1 # 0-10000 credits +#vacuum_cost_page_miss = 2 # 0-10000 credits +#vacuum_cost_page_dirty = 20 # 0-10000 credits +#vacuum_cost_limit = 200 # 1-10000 credits + +# - Background Writer - + +#bgwriter_delay = 200ms # 10-10000ms between rounds +#bgwriter_lru_maxpages = 100 # max buffers written/round, 0 disables +#bgwriter_lru_multiplier = 2.0 # 0-10.0 multiplier on buffers scanned/round +#bgwriter_flush_after = 0 # measured in pages, 0 disables + +# - Asynchronous Behavior - + +#backend_flush_after = 0 # measured in pages, 0 disables +effective_io_concurrency = 0 # 1-1000; 0 disables prefetching +#maintenance_io_concurrency = 10 # 1-1000; 0 disables prefetching +max_worker_processes = 8 # (change requires restart) +max_parallel_workers_per_gather = 2 # taken from max_parallel_workers +#max_parallel_maintenance_workers = 2 # taken from max_parallel_workers +max_parallel_workers = 2 # maximum number of max_worker_processes that + # can be used in parallel operations +#parallel_leader_participation = on +#old_snapshot_threshold = -1 # 1min-60d; -1 disables; 0 is immediate + # (change requires restart) + + +#------------------------------------------------------------------------------ +# WRITE-AHEAD LOG +#------------------------------------------------------------------------------ + +# - Settings - + +#wal_level = minimal # minimal, replica, or logical + # (change requires restart) +#fsync = on # flush data to disk for crash safety + # (turning this off can cause + # unrecoverable data corruption) +#synchronous_commit = on # synchronization level; + # off, local, remote_write, remote_apply, or on +#wal_sync_method = fsync # the default is the first option + # supported by the operating system: + # open_datasync + # fdatasync (default on Linux and FreeBSD) + # fsync + # fsync_writethrough + # open_sync +#full_page_writes = on # recover from partial page writes +#wal_log_hints = off # also do full page writes of non-critical updates + # (change requires restart) +#wal_compression = off # enable compression of full-page writes +#wal_init_zero = on # zero-fill new WAL files 
+#wal_recycle = on # recycle WAL files +wal_buffers = -1 # min 32kB, -1 sets based on shared_buffers + # (change requires restart) +#wal_writer_delay = 200ms # 1-10000 milliseconds +#wal_writer_flush_after = 1MB # measured in pages, 0 disables +#wal_skip_threshold = 2MB + +#commit_delay = 0 # range 0-100000, in microseconds +#commit_siblings = 5 # range 1-1000 + +# - Checkpoints - + +#checkpoint_timeout = 5min # range 30s-1d +checkpoint_completion_target = 0.9 # checkpoint target duration, 0.0 - 1.0 +#checkpoint_flush_after = 0 # measured in pages, 0 disables +#checkpoint_warning = 30s # 0 disables +max_wal_size = 3GB +min_wal_size = 2GB + +# - Archiving - + +#archive_mode = off # enables archiving; off, on, or always + # (change requires restart) +#archive_command = '' # command to use to archive a logfile segment + # placeholders: %p = path of file to archive + # %f = file name only + # e.g. 'test ! -f /mnt/server/archivedir/%f && cp %p /mnt/server/archivedir/%f' +#archive_timeout = 0 # force a logfile segment switch after this + # number of seconds; 0 disables + +# - Archive Recovery - + +# These are only used in recovery mode. + +#restore_command = '' # command to use to restore an archived logfile segment + # placeholders: %p = path of file to restore + # %f = file name only + # e.g. 'cp /mnt/server/archivedir/%f %p' +#archive_cleanup_command = '' # command to execute at every restartpoint +#recovery_end_command = '' # command to execute at completion of recovery + +# - Recovery Target - + +# Set these only when performing a targeted recovery. + +#recovery_target = '' # 'immediate' to end recovery as soon as a + # consistent state is reached + # (change requires restart) +#recovery_target_name = '' # the named restore point to which recovery will proceed + # (change requires restart) +#recovery_target_time = '' # the time stamp up to which recovery will proceed + # (change requires restart) +#recovery_target_xid = '' # the transaction ID up to which recovery will proceed + # (change requires restart) +#recovery_target_lsn = '' # the WAL LSN up to which recovery will proceed + # (change requires restart) +#recovery_target_inclusive = on # Specifies whether to stop: + # just after the specified recovery target (on) + # just before the recovery target (off) + # (change requires restart) +#recovery_target_timeline = 'latest' # 'current', 'latest', or timeline ID + # (change requires restart) +#recovery_target_action = 'pause' # 'pause', 'promote', 'shutdown' + # (change requires restart) + + +#------------------------------------------------------------------------------ +# REPLICATION +#------------------------------------------------------------------------------ + +# - Sending Servers - + +# Set these on the primary and on any standby that will send replication data. + +#max_wal_senders = 10 # max number of walsender processes + # (change requires restart) +#max_replication_slots = 10 # max number of replication slots + # (change requires restart) +#wal_keep_size = 0 # in megabytes; 0 disables +#max_slot_wal_keep_size = -1 # in megabytes; -1 disables +#wal_sender_timeout = 60s # in milliseconds; 0 disables +#track_commit_timestamp = off # collect timestamp of transaction commit + # (change requires restart) + +# - Primary Server - + +# These settings are ignored on a standby server. 
+
+#synchronous_standby_names = ''        # standby servers that provide sync rep
+                                       # method to choose sync standbys, number of sync standbys,
+                                       # and comma-separated list of application_name
+                                       # from standby(s); '*' = all
+#vacuum_defer_cleanup_age = 0          # number of xacts by which cleanup is delayed
+
+# - Standby Servers -
+
+# These settings are ignored on a primary server.
+
+#primary_conninfo = ''                 # connection string to sending server
+#primary_slot_name = ''                # replication slot on sending server
+#promote_trigger_file = ''             # file name whose presence ends recovery
+#hot_standby = on                      # "off" disallows queries during recovery
+                                       # (change requires restart)
+#max_standby_archive_delay = 30s       # max delay before canceling queries
+                                       # when reading WAL from archive;
+                                       # -1 allows indefinite delay
+#max_standby_streaming_delay = 30s     # max delay before canceling queries
+                                       # when reading streaming WAL;
+                                       # -1 allows indefinite delay
+#wal_receiver_create_temp_slot = off   # create temp slot if primary_slot_name
+                                       # is not set
+#wal_receiver_status_interval = 10s    # send replies at least this often
+                                       # 0 disables
+#hot_standby_feedback = off            # send info from standby to prevent
+                                       # query conflicts
+#wal_receiver_timeout = 60s            # time that receiver waits for
+                                       # communication from primary
+                                       # in milliseconds; 0 disables
+#wal_retrieve_retry_interval = 5s      # time to wait before retrying to
+                                       # retrieve WAL after a failed attempt
+#recovery_min_apply_delay = 0          # minimum delay for applying changes during recovery
+
+# - Subscribers -
+
+# These settings are ignored on a publisher.
+
+#max_logical_replication_workers = 4   # taken from max_worker_processes
+                                       # (change requires restart)
+#max_sync_workers_per_subscription = 2 # taken from max_logical_replication_workers
+
+
+#------------------------------------------------------------------------------
+# QUERY TUNING
+#------------------------------------------------------------------------------
+
+# - Planner Method Configuration -
+
+#enable_async_append = on
+#enable_bitmapscan = on
+#enable_gathermerge = on
+#enable_hashagg = on
+#enable_hashjoin = on
+#enable_incremental_sort = on
+#enable_indexscan = on
+#enable_indexonlyscan = on
+#enable_material = on
+#enable_memoize = on
+#enable_mergejoin = on
+#enable_nestloop = on
+#enable_parallel_append = on
+#enable_parallel_hash = on
+#enable_partition_pruning = on
+#enable_partitionwise_join = off
+#enable_partitionwise_aggregate = off
+#enable_seqscan = on
+#enable_sort = on
+#enable_tidscan = on
+
+# - Planner Cost Constants -
+
+#seq_page_cost = 1.0                   # measured on an arbitrary scale
+random_page_cost = 1.1                 # same scale as above
+#cpu_tuple_cost = 0.01                 # same scale as above
+#cpu_index_tuple_cost = 0.005          # same scale as above
+#cpu_operator_cost = 0.0025            # same scale as above
+#parallel_setup_cost = 1000.0          # same scale as above
+#parallel_tuple_cost = 0.1             # same scale as above
+#min_parallel_table_scan_size = 8MB
+#min_parallel_index_scan_size = 512kB
+effective_cache_size = 24GB
+
+#jit_above_cost = 100000               # perform JIT compilation if available
+                                       # and query more expensive than this;
+                                       # -1 disables
+#jit_inline_above_cost = 500000        # inline small functions if query is
+                                       # more expensive than this; -1 disables
+#jit_optimize_above_cost = 500000      # use expensive JIT optimizations if
+                                       # query is more expensive than this;
+                                       # -1 disables
+
+# - Genetic Query Optimizer -
+
+#geqo = on
+#geqo_threshold = 12
+#geqo_effort = 5                       # range 1-10
+#geqo_pool_size = 0                    # selects default based on effort
+#geqo_generations = 0                  # selects default based on effort
+#geqo_selection_bias = 2.0             # range 1.5-2.0
+#geqo_seed = 0.0                       # range 0.0-1.0
+
+# - Other Planner Options -
+
+#default_statistics_target = 100       # range 1-10000
+#constraint_exclusion = partition      # on, off, or partition
+#cursor_tuple_fraction = 0.1           # range 0.0-1.0
+#from_collapse_limit = 8
+#jit = on                              # allow JIT compilation
+#join_collapse_limit = 8               # 1 disables collapsing of explicit
+                                       # JOIN clauses
+#plan_cache_mode = auto                # auto, force_generic_plan or
+                                       # force_custom_plan
+
+
+#------------------------------------------------------------------------------
+# REPORTING AND LOGGING
+#------------------------------------------------------------------------------
+
+# - Where to Log -
+
+#log_destination = 'stderr'            # Valid values are combinations of
+                                       # stderr, csvlog, syslog, and eventlog,
+                                       # depending on platform. csvlog
+                                       # requires logging_collector to be on.
+
+# This is used when logging to stderr:
+#logging_collector = off               # Enable capturing of stderr and csvlog
+                                       # into log files. Required to be on for
+                                       # csvlogs.
+                                       # (change requires restart)
+
+# These are only used if logging_collector is on:
+#log_directory = 'log'                 # directory where log files are written,
+                                       # can be absolute or relative to PGDATA
+#log_filename = 'postgresql-%Y-%m-%d_%H%M%S.log'  # log file name pattern,
+                                       # can include strftime() escapes
+#log_file_mode = 0600                  # creation mode for log files,
+                                       # begin with 0 to use octal notation
+#log_rotation_age = 1d                 # Automatic rotation of logfiles will
+                                       # happen after that time. 0 disables.
+#log_rotation_size = 10MB              # Automatic rotation of logfiles will
+                                       # happen after that much log output.
+                                       # 0 disables.
+#log_truncate_on_rotation = off        # If on, an existing log file with the
+                                       # same name as the new log file will be
+                                       # truncated rather than appended to.
+                                       # But such truncation only occurs on
+                                       # time-driven rotation, not on restarts
+                                       # or size-driven rotation. Default is
+                                       # off, meaning append to existing files
+                                       # in all cases.
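+
+# Note: with log_destination left at its 'stderr' default and logging_collector
+# off (as above), the server log simply goes to the postgres process's stderr,
+# which is what container runtimes such as Docker capture (`docker logs`).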
+
+# These are relevant when logging to syslog:
+#syslog_facility = 'LOCAL0'
+#syslog_ident = 'postgres'
+#syslog_sequence_numbers = on
+#syslog_split_messages = on
+
+# This is only relevant when logging to eventlog (Windows):
+# (change requires restart)
+#event_source = 'PostgreSQL'
+
+# - When to Log -
+
+#log_min_messages = warning            # values in order of decreasing detail:
+                                       #   debug5
+                                       #   debug4
+                                       #   debug3
+                                       #   debug2
+                                       #   debug1
+                                       #   info
+                                       #   notice
+                                       #   warning
+                                       #   error
+                                       #   log
+                                       #   fatal
+                                       #   panic
+
+#log_min_error_statement = error       # values in order of decreasing detail:
+                                       #   debug5
+                                       #   debug4
+                                       #   debug3
+                                       #   debug2
+                                       #   debug1
+                                       #   info
+                                       #   notice
+                                       #   warning
+                                       #   error
+                                       #   log
+                                       #   fatal
+                                       #   panic (effectively off)
+
+#log_min_duration_statement = -1       # -1 is disabled, 0 logs all statements
+                                       # and their durations, > 0 logs only
+                                       # statements running at least this number
+                                       # of milliseconds
+
+#log_min_duration_sample = -1          # -1 is disabled, 0 logs a sample of statements
+                                       # and their durations, > 0 logs only a sample of
+                                       # statements running at least this number
+                                       # of milliseconds;
+                                       # sample fraction is determined by log_statement_sample_rate
+
+#log_statement_sample_rate = 1.0       # fraction of logged statements exceeding
+                                       # log_min_duration_sample to be logged;
+                                       # 1.0 logs all such statements, 0.0 never logs
+
+
+#log_transaction_sample_rate = 0.0     # fraction of transactions whose statements
+                                       # are logged regardless of their duration; 1.0 logs all
+                                       # statements from all transactions, 0.0 never logs
+
+# - What to Log -
+
+#debug_print_parse = off
+#debug_print_rewritten = off
+#debug_print_plan = off
+#debug_pretty_print = on
+#log_autovacuum_min_duration = -1      # log autovacuum activity;
+                                       # -1 disables, 0 logs all actions and
+                                       # their durations, > 0 logs only
+                                       # actions running at least this number
+                                       # of milliseconds.
+#log_checkpoints = off
+#log_connections = off
+#log_disconnections = off
+#log_duration = off
+#log_error_verbosity = default         # terse, default, or verbose messages
+#log_hostname = off
+#log_line_prefix = '%m [%p] '          # special values:
+                                       #   %a = application name
+                                       #   %u = user name
+                                       #   %d = database name
+                                       #   %r = remote host and port
+                                       #   %h = remote host
+                                       #   %b = backend type
+                                       #   %p = process ID
+                                       #   %P = process ID of parallel group leader
+                                       #   %t = timestamp without milliseconds
+                                       #   %m = timestamp with milliseconds
+                                       #   %n = timestamp with milliseconds (as a Unix epoch)
+                                       #   %Q = query ID (0 if none or not computed)
+                                       #   %i = command tag
+                                       #   %e = SQL state
+                                       #   %c = session ID
+                                       #   %l = session line number
+                                       #   %s = session start timestamp
+                                       #   %v = virtual transaction ID
+                                       #   %x = transaction ID (0 if none)
+                                       #   %q = stop here in non-session
+                                       #        processes
+                                       #   %% = '%'
+                                       # e.g. '<%u%%%d> '
+#log_lock_waits = off                  # log lock waits >= deadlock_timeout
+#log_recovery_conflict_waits = off     # log standby recovery conflict waits
+                                       # >= deadlock_timeout
+#log_parameter_max_length = -1         # when logging statements, limit logged
+                                       # bind-parameter values to N bytes;
+                                       # -1 means print in full, 0 disables
+#log_parameter_max_length_on_error = 0 # when logging an error, limit logged
+                                       # bind-parameter values to N bytes;
+                                       # -1 means print in full, 0 disables
+#log_statement = 'none'                # none, ddl, mod, all
+#log_replication_commands = off
+#log_temp_files = -1                   # log temporary files equal or larger
+                                       # than the specified size in kilobytes;
+                                       # -1 disables, 0 logs all temp files
+log_timezone = 'Australia/Sydney'
+
+
+#------------------------------------------------------------------------------
+# PROCESS TITLE
+#------------------------------------------------------------------------------
+
+#cluster_name = ''                     # added to process titles if nonempty
+                                       # (change requires restart)
+#update_process_title = on
+
+
+#------------------------------------------------------------------------------
+# STATISTICS
+#------------------------------------------------------------------------------
+
+# - Query and Index Statistics Collector -
+
+#track_activities = on
+#track_activity_query_size = 1024      # (change requires restart)
+#track_counts = on
+#track_io_timing = off
+#track_wal_io_timing = off
+#track_functions = none                # none, pl, all
+#stats_temp_directory = 'pg_stat_tmp'
+
+
+# - Monitoring -
+
+#compute_query_id = auto
+#log_statement_stats = off
+#log_parser_stats = off
+#log_planner_stats = off
+#log_executor_stats = off
+
+
+#------------------------------------------------------------------------------
+# AUTOVACUUM
+#------------------------------------------------------------------------------
+
+#autovacuum = on                       # Enable autovacuum subprocess? 'on'
+                                       # requires track_counts to also be on.
+#autovacuum_max_workers = 3            # max number of autovacuum subprocesses
+                                       # (change requires restart)
+#autovacuum_naptime = 1min             # time between autovacuum runs
+#autovacuum_vacuum_threshold = 50      # min number of row updates before
+                                       # vacuum
+#autovacuum_vacuum_insert_threshold = 1000  # min number of row inserts
+                                       # before vacuum; -1 disables insert
+                                       # vacuums
+#autovacuum_analyze_threshold = 50     # min number of row updates before
+                                       # analyze
+#autovacuum_vacuum_scale_factor = 0.2  # fraction of table size before vacuum
+#autovacuum_vacuum_insert_scale_factor = 0.2  # fraction of inserts over table
+                                       # size before insert vacuum
+#autovacuum_analyze_scale_factor = 0.1 # fraction of table size before analyze
+#autovacuum_freeze_max_age = 200000000 # maximum XID age before forced vacuum
+                                       # (change requires restart)
+#autovacuum_multixact_freeze_max_age = 400000000  # maximum multixact age
+                                       # before forced vacuum
+                                       # (change requires restart)
+#autovacuum_vacuum_cost_delay = 2ms    # default vacuum cost delay for
+                                       # autovacuum, in milliseconds;
+                                       # -1 means use vacuum_cost_delay
+#autovacuum_vacuum_cost_limit = -1     # default vacuum cost limit for
+                                       # autovacuum, -1 means use
+                                       # vacuum_cost_limit
+
+
+#------------------------------------------------------------------------------
+# CLIENT CONNECTION DEFAULTS
+#------------------------------------------------------------------------------
+
+# - Statement Behavior -
+
+#client_min_messages = notice          # values in order of decreasing detail:
+                                       #   debug5
+                                       #   debug4
+                                       #   debug3
+                                       #   debug2
+                                       #   debug1
+                                       #   log
+                                       #   notice
+                                       #   warning
+                                       #   error
+#search_path = '"$user", public'       # schema names
+#row_security = on
+#default_table_access_method = 'heap'
+#default_tablespace = ''               # a tablespace name, '' uses the default
+#default_toast_compression = 'pglz'    # 'pglz' or 'lz4'
+#temp_tablespaces = ''                 # a list of tablespace names, '' uses
+                                       # only default tablespace
+#check_function_bodies = on
+#default_transaction_isolation = 'read committed'
+#default_transaction_read_only = off
+#default_transaction_deferrable = off
+#session_replication_role = 'origin'
+#statement_timeout = 0                 # in milliseconds, 0 is disabled
+#lock_timeout = 0                      # in milliseconds, 0 is disabled
+#idle_in_transaction_session_timeout = 0  # in milliseconds, 0 is disabled
+#idle_session_timeout = 0              # in milliseconds, 0 is disabled
+#vacuum_freeze_table_age = 150000000
+#vacuum_freeze_min_age = 50000000
+#vacuum_failsafe_age = 1600000000
+#vacuum_multixact_freeze_table_age = 150000000
+#vacuum_multixact_freeze_min_age = 5000000
+#vacuum_multixact_failsafe_age = 1600000000
+#bytea_output = 'hex'                  # hex, escape
+#xmlbinary = 'base64'
+#xmloption = 'content'
+#gin_pending_list_limit = 4MB
+
+# - Locale and Formatting -
+
+datestyle = 'iso, mdy'
+#intervalstyle = 'postgres'
+timezone = 'Australia/Sydney'
+#timezone_abbreviations = 'Default'    # Select the set of available time zone
+                                       # abbreviations. Currently, there are
+                                       #   Default
+                                       #   Australia (historical usage)
+                                       #   India
+                                       # You can create your own file in
+                                       # share/timezonesets/.
+#extra_float_digits = 1                # min -15, max 3; any value >0 actually
+                                       # selects precise output mode
+#client_encoding = sql_ascii           # actually, defaults to database
+                                       # encoding
+
+# These settings are initialized by initdb, but they can be changed.
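+# (The en_US.UTF-8 values below are just what initdb chose for this cluster;
+# the GNAF data itself is plain UTF-8 text, so any UTF-8 locale should work.)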
+lc_messages = 'en_US.UTF-8'            # locale for system error message
+                                       # strings
+lc_monetary = 'en_US.UTF-8'            # locale for monetary formatting
+lc_numeric = 'en_US.UTF-8'             # locale for number formatting
+lc_time = 'en_US.UTF-8'                # locale for time formatting
+
+# default configuration for text search
+default_text_search_config = 'pg_catalog.english'
+
+# - Shared Library Preloading -
+
+#local_preload_libraries = ''
+#session_preload_libraries = ''
+#shared_preload_libraries = ''         # (change requires restart)
+#jit_provider = 'llvmjit'              # JIT library to use
+
+# - Other Defaults -
+
+#dynamic_library_path = '$libdir'
+#gin_fuzzy_search_limit = 0
+
+
+#------------------------------------------------------------------------------
+# LOCK MANAGEMENT
+#------------------------------------------------------------------------------
+
+#deadlock_timeout = 1s
+#max_locks_per_transaction = 64        # min 10
+                                       # (change requires restart)
+#max_pred_locks_per_transaction = 64   # min 10
+                                       # (change requires restart)
+#max_pred_locks_per_relation = -2      # negative values mean
+                                       # (max_pred_locks_per_transaction
+                                       #  / -max_pred_locks_per_relation) - 1
+#max_pred_locks_per_page = 2           # min 0
+
+
+#------------------------------------------------------------------------------
+# VERSION AND PLATFORM COMPATIBILITY
+#------------------------------------------------------------------------------
+
+# - Previous PostgreSQL Versions -
+
+#array_nulls = on
+#backslash_quote = safe_encoding       # on, off, or safe_encoding
+#escape_string_warning = on
+#lo_compat_privileges = off
+#quote_all_identifiers = off
+#standard_conforming_strings = on
+#synchronize_seqscans = on
+
+# - Other Platforms and Clients -
+
+#transform_null_equals = off
+
+
+#------------------------------------------------------------------------------
+# ERROR HANDLING
+#------------------------------------------------------------------------------
+
+#exit_on_error = off                   # terminate session on any error?
+#restart_after_crash = on              # reinitialize after backend crash?
+#data_sync_retry = off                 # retry or panic on failure to fsync
+                                       # data?
+                                       # (change requires restart)
+#recovery_init_sync_method = fsync     # fsync, syncfs (Linux 5.8+)
+
+
+#------------------------------------------------------------------------------
+# CONFIG FILE INCLUDES
+#------------------------------------------------------------------------------
+
+# These options allow settings to be loaded from files other than the
+# default postgresql.conf. Note that these are directives, not variable
+# assignments, so they can usefully be given more than once.
+
+#include_dir = '...'                   # include files ending in '.conf' from
+                                       # a directory, e.g., 'conf.d'
+#include_if_exists = '...'             # include file only if it exists
+#include = '...'                       # include file
+
+
+#------------------------------------------------------------------------------
+# CUSTOMIZED OPTIONS
+#------------------------------------------------------------------------------
+
+# Add settings for extensions here
diff --git a/supporting-files/quarterly_processing/xx_aws_upload.sh b/supporting-files/quarterly_processing/xx_aws_upload.sh
index 40deeda..e2e2f47 100644
--- a/supporting-files/quarterly_processing/xx_aws_upload.sh
+++ b/supporting-files/quarterly_processing/xx_aws_upload.sh
@@ -1,9 +1,9 @@
#!/usr/bin/env bash

AWS_PROFILE="minus34"
-OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202211"
-OUTPUT_FOLDER_2020="/Users/$(whoami)/tmp/geoscape_202211_gda2020"
+OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202302"
+OUTPUT_FOLDER_2020="/Users/$(whoami)/tmp/geoscape_202302_gda2020"

-aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202211 --exclude "*" --include "*.dmp" --acl public-read
+aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER} s3://minus34.com/opendata/geoscape-202302 --exclude "*" --include "*.dmp" --acl public-read

-aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202211-gda2020 --exclude "*" --include "*.dmp" --acl public-read
+aws --profile=${AWS_PROFILE} s3 sync ${OUTPUT_FOLDER_2020} s3://minus34.com/opendata/geoscape-202302-gda2020 --exclude "*" --include "*.dmp" --acl public-read
diff --git a/supporting-files/restore-gnaf-admin-bdys.bat b/supporting-files/restore-gnaf-admin-bdys.bat
index b202d84..2b9c9ee 100644
--- a/supporting-files/restore-gnaf-admin-bdys.bat
+++ b/supporting-files/restore-gnaf-admin-bdys.bat
@@ -1,7 +1,7 @@
psql -d geo -p 5432 -U postgres -c "CREATE EXTENSION IF NOT EXISTS postgis;"

-"C:\Program Files\PostgreSQL\12\bin\pg_restore" -Fc -d geo -p 5432 -U postgres "C:\git\minus34\gnaf-202211.dmp"
-"C:\Program Files\PostgreSQL\12\bin\pg_restore" -Fc -d geo -p 5432 -U postgres "C:\git\minus34\admin-bdys-202211.dmp"
+"C:\Program Files\PostgreSQL\14\bin\pg_restore" -Fc -d geo -p 5432 -U postgres "C:\git\minus34\gnaf-202302.dmp"
+"C:\Program Files\PostgreSQL\14\bin\pg_restore" -Fc -d geo -p 5432 -U postgres "C:\git\minus34\admin-bdys-202302.dmp"

pause
\ No newline at end of file
diff --git a/supporting-files/restore-gnaf-admin-bdys.sh b/supporting-files/restore-gnaf-admin-bdys.sh
index 4c46163..2f589af 100644
--- a/supporting-files/restore-gnaf-admin-bdys.sh
+++ b/supporting-files/restore-gnaf-admin-bdys.sh
@@ -10,13 +10,13 @@ psql -d geo -p 5432 -U postgres -c "CREATE EXTENSION IF NOT EXISTS postgis;"

cd /Users/$(whoami)/Downloads

-curl --insecure https://minus34.com/opendata/geoscape-202211/gnaf-202211.dmp --output ./gnaf-202211.dmp
-/Applications/Postgres.app/Contents/Versions/14/bin/pg_restore -Fc -d geo -p 5432 -U postgres ./gnaf-202211.dmp
-rm ./gnaf-202211.dmp
+curl --insecure https://minus34.com/opendata/geoscape-202302/gnaf-202302.dmp --output ./gnaf-202302.dmp
+/Applications/Postgres.app/Contents/Versions/14/bin/pg_restore -Fc -d geo -p 5432 -U postgres ./gnaf-202302.dmp
+rm ./gnaf-202302.dmp

-curl --insecure https://minus34.com/opendata/geoscape-202211/admin-bdys-202211.dmp --output ./admin-bdys-202211.dmp
-/Applications/Postgres.app/Contents/Versions/14/bin/pg_restore -Fc -d geo -p 5432 -U postgres ./admin-bdys-202211.dmp
-rm ./admin-bdys-202211.dmp
+curl --insecure https://minus34.com/opendata/geoscape-202302/admin-bdys-202302.dmp --output ./admin-bdys-202302.dmp
+/Applications/Postgres.app/Contents/Versions/14/bin/pg_restore -Fc -d geo -p 5432 -U postgres ./admin-bdys-202302.dmp
+rm ./admin-bdys-202302.dmp

duration=$SECONDS
diff --git a/supporting-files/xx-export-both-gnaf-schemas.sh b/supporting-files/xx-export-both-gnaf-schemas.sh
index bd30176..cf02752 100644
--- a/supporting-files/xx-export-both-gnaf-schemas.sh
+++ b/supporting-files/xx-export-both-gnaf-schemas.sh
@@ -1,4 +1,4 @@
#!/usr/bin/env bash

-/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202211 -p 5432 -U postgres -f /Users/$(whoami)/git/minus34/gnaf-202211.dmp
-/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n raw_gnaf_202211 -p 5432 -U postgres -f /Users/$(whoami)/git/minus34/raw-gnaf-202211.dmp
\ No newline at end of file
+/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n gnaf_202302 -p 5432 -U postgres -f /Users/$(whoami)/git/minus34/gnaf-202302.dmp
+/Applications/Postgres.app/Contents/Versions/14/bin/pg_dump -Fc -d geo -n raw_gnaf_202302 -p 5432 -U postgres -f /Users/$(whoami)/git/minus34/raw-gnaf-202302.dmp
\ No newline at end of file
diff --git a/supporting-files/xx-test-run-gnaf-loader-locality-clean-both-datums.sh b/supporting-files/xx-test-run-gnaf-loader-locality-clean-both-datums.sh
index 7164032..90981dc 100644
--- a/supporting-files/xx-test-run-gnaf-loader-locality-clean-both-datums.sh
+++ b/supporting-files/xx-test-run-gnaf-loader-locality-clean-both-datums.sh
@@ -11,7 +11,7 @@ SCRIPT_DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" >/dev/null 2>&1 && pwd )"
# ---------------------------------------------------------------------------------------------------------------------

AWS_PROFILE="minus34"
-OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202211"
+OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202302"
GNAF_PATH="/Users/$(whoami)/Downloads/g-naf_feb22_allstates_gda94_psv_105"
BDYS_PATH="/Users/$(whoami)/Downloads/FEB22_AdminBounds_GDA94_SHP"
GNAF_2020_PATH="/Users/$(whoami)/Downloads/g-naf_feb22_allstates_gda2020_psv_105"
@@ -21,12 +21,12 @@ echo "--------------------------------------------------------------------------
echo "Run gnaf-loader and locality boundary clean"
echo "---------------------------------------------------------------------------------------------------------------------"

-python3 /Users/$(whoami)/git/minus34/gnaf-loader/load-gnaf.py --pgport=5432 --pgdb=geo --max-processes=6 --gnaf-tables-path="${GNAF_PATH}" --admin-bdys-path="${BDYS_PATH}" --gnaf-schema gnaf_202211 --admin-schema admin_bdys_202211 --previous-gnaf-schema gnaf_202208 --previous-admin-schema admin_bdys_202208
-python3 /Users/$(whoami)/git/iag_geo/psma-admin-bdys/locality-clean.py --pgport=5432 --pgdb=geo --max-processes=6 --output-path=${OUTPUT_FOLDER} --admin-schema admin_bdys_202211
+python3 /Users/$(whoami)/git/minus34/gnaf-loader/load-gnaf.py --pgport=5432 --pgdb=geo --max-processes=6 --gnaf-tables-path="${GNAF_PATH}" --admin-bdys-path="${BDYS_PATH}" --gnaf-schema gnaf_202302 --admin-schema admin_bdys_202302 --previous-gnaf-schema gnaf_202211 --previous-admin-schema admin_bdys_202211
+python3 /Users/$(whoami)/git/iag_geo/psma-admin-bdys/locality-clean.py --pgport=5432 --pgdb=geo --max-processes=6 --output-path=${OUTPUT_FOLDER} --admin-schema admin_bdys_202302

echo "---------------------------------------------------------------------------------------------------------------------"
echo "Run gnaf-loader and locality boundary clean - GDA2020"
echo "---------------------------------------------------------------------------------------------------------------------"
-python3 /Users/$(whoami)/git/minus34/gnaf-loader/load-gnaf.py --pgport=5432 --pgdb=geo --max-processes=6 --gnaf-tables-path="${GNAF_2020_PATH}" --admin-bdys-path="${BDYS_2020_PATH}" --srid=7844 --gnaf-schema gnaf_202211_gda2020 --admin-schema admin_bdys_202211_gda2020 --previous-gnaf-schema gnaf_202211 --previous-admin-schema admin_bdys_202211
-python3 /Users/$(whoami)/git/iag_geo/psma-admin-bdys/locality-clean.py --pgport=5432 --pgdb=geo --max-processes=6 --output-path=${OUTPUT_FOLDER} --admin-schema admin_bdys_202211_gda2020
+python3 /Users/$(whoami)/git/minus34/gnaf-loader/load-gnaf.py --pgport=5432 --pgdb=geo --max-processes=6 --gnaf-tables-path="${GNAF_2020_PATH}" --admin-bdys-path="${BDYS_2020_PATH}" --srid=7844 --gnaf-schema gnaf_202302_gda2020 --admin-schema admin_bdys_202302_gda2020 --previous-gnaf-schema gnaf_202302 --previous-admin-schema admin_bdys_202302
+python3 /Users/$(whoami)/git/iag_geo/psma-admin-bdys/locality-clean.py --pgport=5432 --pgdb=geo --max-processes=6 --output-path=${OUTPUT_FOLDER} --admin-schema admin_bdys_202302_gda2020
diff --git a/supporting-files/xx_copy_table_between_databases.py b/supporting-files/xx_copy_table_between_databases.py
index 4e147eb..80b239f 100644
--- a/supporting-files/xx_copy_table_between_databases.py
+++ b/supporting-files/xx_copy_table_between_databases.py
@@ -13,7 +13,7 @@
source_platform = "postgres"
source_credentials = "localhost_super"

-source_schema = "gnaf_202211"
+source_schema = "gnaf_202302"
source_table = "boundary_concordance"

target_platform = "postgres"
diff --git a/testing/covid19/xx_5km_within_lga.sql b/testing/covid19/xx_5km_within_lga.sql
index 8815cd1..e66c821 100644
--- a/testing/covid19/xx_5km_within_lga.sql
+++ b/testing/covid19/xx_5km_within_lga.sql
@@ -13,12 +13,12 @@
DROP TABLE IF EXISTS testing.five_km_radius;
CREATE TABLE testing.five_km_radius AS
WITH pnt AS (
    SELECT st_setsrid(st_makepoint(longitude, latitude), 4283) AS geom
-    FROM gnaf_202211.address_principals
+    FROM gnaf_202302.address_principals
    WHERE address = '/dev/null 2>&1 && pwd )"
-OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202211/geoparquet"
+OUTPUT_FOLDER="/Users/$(whoami)/tmp/geoscape_202302/geoparquet"

mkdir -p "${OUTPUT_FOLDER}"
cd "${OUTPUT_FOLDER}"
@@ -17,7 +17,7 @@ cd "${OUTPUT_FOLDER}"

# get list of tables to export
rm tables.txt
-for input_schema in "admin_bdys_202211" "gnaf_202211"
+for input_schema in "admin_bdys_202302" "gnaf_202302"
do
  QUERY="SELECT concat(table_schema, '.', table_name)
         FROM information_schema.tables
diff --git a/testing/geoparquet/xx_setup_conda_env.sh b/testing/geoparquet/xx_setup_conda_env.sh
index 8cd27f2..e8238dc 100644
--- a/testing/geoparquet/xx_setup_conda_env.sh
+++ b/testing/geoparquet/xx_setup_conda_env.sh
@@ -23,7 +23,7 @@ conda create -y -n gdal python=${PYTHON_VERSION}

# activate and setup env
conda activate gdal

-conda env config vars set JAVA_HOME="/usr/local/opt/openjdk@11"
+#conda env config vars set JAVA_HOME="/opt/homebrew/opt/openjdk@11"

conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
@@ -57,7 +57,7 @@ cd
cd geoparquet/validator/python
pip install --no-binary geoparquet_validator .
# sample usage
-#geoparquet_validator /Users/s57405/tmp/geoscape_202211/geoparquet/address_principals.parquet
+#geoparquet_validator /Users/s57405/tmp/geoscape_202302/geoparquet/address_principals.parquet

# --------------------------
# extra bits
diff --git a/testing/geoparquet/xx_test_data_load.py b/testing/geoparquet/xx_test_data_load.py
index 64fbbeb..7ffd423 100644
--- a/testing/geoparquet/xx_test_data_load.py
+++ b/testing/geoparquet/xx_test_data_load.py
@@ -1,7 +1,7 @@
# script to download and load a remote GeoParquet file for when Apache Sedona supports the emerging GeoParquet format
#
-# NOTE: as of 20221120 - geometry field currently loads as binary type; should be geometry type when supported
+# NOTE: as of 20230220 - geometry field currently loads as binary type; should be geometry type when supported
#

import base64
@@ -18,7 +18,7 @@

# input path for parquet file
# input_url = "https://storage.googleapis.com/open-geodata/linz-examples/nz-buildings-outlines.parquet"
# input_path = "/Users/s57405/tmp/nz-building-outlines.parquet"
-input_path = "/Users/s57405/tmp/geoscape_202211/geoparquet/address_principals.parquet"
+input_path = "/Users/s57405/tmp/geoscape_202302/geoparquet/address_principals.parquet"

# number of CPUs to use in processing (defaults to number of local CPUs)
num_processors = cpu_count()
diff --git a/testing/qgis/01_setup_conda_env.sh b/testing/qgis/01_setup_conda_env.sh
index 6d5fde5..a145359 100644
--- a/testing/qgis/01_setup_conda_env.sh
+++ b/testing/qgis/01_setup_conda_env.sh
@@ -26,7 +26,7 @@ conda create -y -n ${ENV_NAME} python=${PYTHON_VERSION}

# activate and setup env
conda activate ${ENV_NAME}

-#conda env config vars set JAVA_HOME="/usr/local/opt/openjdk@11"
+#conda env config vars set JAVA_HOME="/opt/homebrew/opt/openjdk@11"

conda config --env --add channels conda-forge
conda config --env --set channel_priority strict
diff --git a/testing/visualisation/02_create_view.sql b/testing/visualisation/02_create_view.sql
index dddd829..993f23e 100644
--- a/testing/visualisation/02_create_view.sql
+++ b/testing/visualisation/02_create_view.sql
@@ -1,10 +1,10 @@
---DROP TABLE IF EXISTS gnaf_202211.temp_address_principals;
---CREATE TABLE gnaf_202211.temp_address_principals AS
+--DROP TABLE IF EXISTS gnaf_202302.temp_address_principals;
+--CREATE TABLE gnaf_202302.temp_address_principals AS
COPY (
    SELECT longitude AS x,
           latitude AS y
-    FROM gnaf_202211.address_principals
+    FROM gnaf_202302.address_principals
) TO '/Users/hugh.saalmans/tmp/address_principals_point.csv' HEADER CSV;

@@ -13,7 +13,7 @@ COPY (
--SELECT gid,
--       longitude AS x,
--       latitude AS y
---FROM gnaf_202211.address_principals;
+--FROM gnaf_202302.address_principals;
--
---ALTER TABLE ONLY gnaf_202211.temp_address_principals
+--ALTER TABLE ONLY gnaf_202302.temp_address_principals
--    ADD CONSTRAINT temp_address_principals_pk PRIMARY KEY (gid);
diff --git a/testing/weather/02_process_weather.py b/testing/weather/02_process_weather.py
index 845fd73..72d99c5 100644
--- a/testing/weather/02_process_weather.py
+++ b/testing/weather/02_process_weather.py
@@ -134,7 +134,7 @@ def main():

    # select GNAF coordinates - group by 3 decimal places to create a ~100m grid of addresses
    # sql = """SELECT latitude::numeric(5,3) as latitude, longitude::numeric(6,3) as longitude, count(*) as address_count
-    #          FROM gnaf_202211.address_principals
+    #          FROM gnaf_202302.address_principals
    #          GROUP BY latitude::numeric(5,3), longitude::numeric(6,3)"""
    sql = """SELECT * FROM testing.gnaf_points_with_pop_and_height"""
    gnaf_df = pandas.read_sql_query(sql, pg_conn)
diff --git a/testing/weather/xx_process_weather_rain.py b/testing/weather/xx_process_weather_rain.py
index c587006..a5baae5 100644
--- a/testing/weather/xx_process_weather_rain.py
+++ b/testing/weather/xx_process_weather_rain.py
@@ -128,7 +128,7 @@ def main():

    # select GNAF coordinates - group by 3 decimal places to create a ~100m grid of addresses
    # sql = """SELECT latitude::numeric(5,3) as latitude, longitude::numeric(6,3) as longitude, count(*) as address_count
-    #          FROM gnaf_202211.address_principals
+    #          FROM gnaf_202302.address_principals
    #          GROUP BY latitude::numeric(5,3), longitude::numeric(6,3)"""
    # sql = """SELECT * FROM testing.gnaf_points_with_pop_and_height"""
    # gnaf_df = pandas.read_sql_query(sql, pg_conn)
diff --git a/testing/web/flatgeobuf/export_to_flatgeobuf.sh b/testing/web/flatgeobuf/export_to_flatgeobuf.sh
index 3aedaf3..6def337 100644
--- a/testing/web/flatgeobuf/export_to_flatgeobuf.sh
+++ b/testing/web/flatgeobuf/export_to_flatgeobuf.sh
@@ -6,23 +6,23 @@
output_folder="/Users/$(whoami)/tmp"

# full addresses
-ogr2ogr -f FlatGeobuf ${output_folder}/address-principals-202211.fgb \
-PG:"host=localhost dbname=geo user=postgres password=password port=5432" "gnaf_202211.address_principals(geom)"
+ogr2ogr -f FlatGeobuf ${output_folder}/address-principals-202302.fgb \
+PG:"host=localhost dbname=geo user=postgres password=password port=5432" "gnaf_202302.address_principals(geom)"

# just GNAF PIDs and point geometries
ogr2ogr -f FlatGeobuf ${output_folder}/address-principals-lite-202102.fgb \
PG:"host=localhost dbname=geo user=postgres password=password port=5432" -sql "select gnaf_pid, ST_Transform(geom, 4326) as geom from gnaf_202102.address_principals"

# display locality boundaries
-ogr2ogr -f FlatGeobuf ${output_folder}/address-principals-202211.fgb \
-PG:"host=localhost dbname=geo user=postgres password=password port=5432" "admin_bdys_202211.locality_bdys_display(geom)"
+ogr2ogr -f FlatGeobuf ${output_folder}/address-principals-202302.fgb \
+PG:"host=localhost dbname=geo user=postgres password=password port=5432" "admin_bdys_202302.locality_bdys_display(geom)"

# OPTIONAL - copy files to AWS S3 and allow public read access (requires AWSCLI installed and your AWS credentials setup)
cd ${output_folder}

-for f in *-202211.fgb;
+for f in *-202302.fgb;
do
-  aws --profile=default s3 cp --storage-class REDUCED_REDUNDANCY ./${f} s3://minus34.com/opendata/geoscape-202211/flatgeobuf/${f};
-  aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key opendata/geoscape-202211/flatgeobuf/${f}
+  aws --profile=default s3 cp --storage-class REDUCED_REDUNDANCY ./${f} s3://minus34.com/opendata/geoscape-202302/flatgeobuf/${f};
+  aws --profile=default s3api put-object-acl --acl public-read --bucket minus34.com --key opendata/geoscape-202302/flatgeobuf/${f}
  echo "${f} uploaded to AWS S3"
done
diff --git a/testing/xx_export_to_csv.sh b/testing/xx_export_to_csv.sh
index ace9dcb..14074a2 100644
--- a/testing/xx_export_to_csv.sh
+++ b/testing/xx_export_to_csv.sh
@@ -13,7 +13,7 @@ cd "${OUTPUT_FOLDER}"

# convert Postgres table to CSV with CSVT field types file
-input_schema="gnaf_202211"
+input_schema="gnaf_202302"
input_table="address_principals"

echo "Exporting ${input_schema}.${input_table}"