Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MySQL Import script for NLExtract dumps #220

Open
holtkamp opened this issue Jun 13, 2017 · 2 comments
Open

MySQL Import script for NLExtract dumps #220

holtkamp opened this issue Jun 13, 2017 · 2 comments

Comments

@holtkamp
Copy link

holtkamp commented Jun 13, 2017

First of all: thanks a lot for making all this BAG data accessible for noobs 😄

To be able to easily import the .csv dumps listed at http://data.nlextract.nl/bag/csv/ into a MySQL database, the following SQL statements might be useful:

# Create the table to import the data to. Note that the default storage engine, character set and collation will be used
CREATE TABLE bagadres (
  object_id BIGINT(16) NOT NULL,
  openbareruimte VARCHAR(80) DEFAULT NULL,
  huisnummer INT(5) DEFAULT NULL,
  huisletter CHAR(1) DEFAULT NULL,
  huisnummertoevoeging VARCHAR(4) DEFAULT NULL,
  postcode CHAR(6) DEFAULT NULL,
  woonplaats VARCHAR(80) DEFAULT NULL,
  gemeente VARCHAR(80) DEFAULT NULL,
  provincie VARCHAR(16) DEFAULT NULL,
  object_type CHAR(3) NOT NULL,
  nevenadres VARCHAR(1) DEFAULT NULL,
  x DECIMAL(9,3) DEFAULT NULL,
  y DECIMAL(9,3) DEFAULT NULL,
  lon DECIMAL(10,8) NOT NULL,
  lat DECIMAL(11,8) NOT NULL,
  point POINT DEFAULT NULL
);

# Load the data from the CSV file into the table
LOAD DATA LOCAL INFILE '/absolute/path/to/your/file/bagadres.csv'
INTO TABLE bagadres
FIELDS TERMINATED by ';'
LINES TERMINATED BY '\n'
IGNORE 1 LINES;

# MySQL uses 0 and 1 to represent BOOLEAN values
UPDATE bagadres SET nevenadres = 0 WHERE nevenadres = 'f';
UPDATE bagadres SET nevenadres = 1 WHERE nevenadres = 't';
ALTER TABLE bagadres CHANGE nevenadres nevenadres BOOLEAN NOT NULL;

# Replace empty strings with NULL values
UPDATE bagadres SET huisnummertoevoeging = NULL WHERE huisnummertoevoeging = '';
UPDATE bagadres SET huisletter = NULL WHERE huisletter = '';

# Populate the "point" column with values from the "lat" and "lon" columns
UPDATE bagadres SET point = POINT(lat, lon);

# To increase performance of searching, some indices might be useful
ALTER TABLE bagadres ADD INDEX (postcode);
ALTER TABLE bagadres ADD INDEX (object_id);
#ALTER TABLE bagadres ADD UNIQUE INDEX (object_id); #Adding a unique index fails due to a duplicate object_id "363010000785105"...

I would have created a Pull Request, but I am not sure you guys intend to adopt MySQL related stuff in this repository...

@holtkamp holtkamp changed the title MySQL Import script voor NLExtract dumps MySQL Import script for NLExtract dumps Jun 14, 2017
@justb4
Copy link
Contributor

justb4 commented Jun 14, 2017

We welcome contributions like these. There are so many use-cases through which people use BAG data. I was also thinking of alternative DB formats like sqlite and geopackage. We need to find a proper place for the PR code. I suggest under bag/contributions/mysql

Remark: For most other key-registries like BRT-Top10NL, BGT and BRK, NLExtract uses Stetl as the ETL tool. A Stetl config, like for example BRK ETL allows for multiple Output destinations: PostGIS, Shapefile (any OGR) etc. For historic reasons NLExtract-BAG is a custom Python program. In the longer term we plan to rewrite NLExtract-BAG to Stetl. That would make alternative outputs more directly possibly, but this is a 'dot on the horizon'...

@holtkamp
Copy link
Author

holtkamp commented Dec 9, 2019

Even een update om de import wat efficiënter te laten verlopen:

CREATE TABLE bagadres (
  openbareruimte VARCHAR(80) NOT NULL,
  huisnummer INT(5) NOT NULL,
  huisletter CHAR(1) DEFAULT NULL,
  huisnummertoevoeging VARCHAR(4) DEFAULT NULL,
  postcode CHAR(6) NOT NULL,
  woonplaats VARCHAR(80) NOT NULL,
  gemeente VARCHAR(80) NOT NULL,
  provincie VARCHAR(16) NOT NULL,
  object_id BIGINT(16) NOT NULL,
  object_type CHAR(3) NOT NULL,
  nevenadres BOOLEAN NOT NULL,
  x DECIMAL(9,3) NOT NULL,
  y DECIMAL(9,3) NOT NULL,
  lon DECIMAL(10,8) NOT NULL,
  lat DECIMAL(11,8) NOT NULL,
  point POINT NOT NULL,
  KEY object_id (object_id),
  KEY postcode (postcode)
);

LOAD DATA INFILE '/absolute/path/to/your/file/bagadres.csv'
INTO TABLE bagadres
FIELDS TERMINATED by ';'
LINES TERMINATED BY '\n'
IGNORE 1 LINES
(openbareruimte,huisnummer,huisletter,huisnummertoevoeging,postcode,woonplaats,gemeente,provincie,object_id,object_type,@nevenadres,x,y,lon,lat)
SET huisletter = NULLIF(@huisletter, ''),
    huisnummertoevoeging = NULLIF(@huisnummertoevoeging, ''),
    nevenadres = CASE WHEN @nevenadres = 'f' THEN 0
                      WHEN @nevenadres = 't' THEN 1
        END,
    point = POINT(lat,lon);

Let op, minimaal MySQL 8.0.18 vereist omdat de point kolom geen default value heeft wat tot problemen leidt door https://bugs.mysql.com/bug.php?id=91893

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants