Git Product home page Git Product logo

bagconv's Introduction

bagconv

Convert the official Dutch building/dwelling administration into a simpler format. Also includes a small and fast web service for sharing this data.

These tools will allow you do straightforward queries to go from an address to a postcode and vice-versa. You also get the coordinates of an address, and if you want, the contours of the buildings the address is in. Using SQLite R*Tree tables, you can also quickly go from coordinates to an address.

STATUS: Pretty fresh - if you use this in production, do let me know ([email protected])

Also, see this related project from Bart Noordervliet which implements a more generic solution to convert the BAG (and other XML constructs) to PostgreSQL and PostGIS. Jan Derk also wrote a tool (in Python) that converts the BAG XML to SQLite.

It all starts out with the tremendously detailed set of XML files from the Dutch land registry: LV BAG with History. The official documentation (in Dutch) can be found on the GitHub page of the Dutch Land Registry and Mapping Agency.

The BAG files contain data on:

  • Almost every Dutch building (a Pand), and its 2D shape
  • All recognized dwellings (Verblijfsobject), including mooring points for boats and assigned places for mobile homes, and its 2D location
  • All public spaces (Openbare ruimte) that could be a street address
  • All addresses assigned to the dwellings (Nummeraanduiding)

These four entities have a complicated relation. For example, a building (pand) may contain multiple places of dwelling, but a place of dwelling might also extend over multiple buildings. A building might also not be a place of dwelling (a shed, for example). Also, a single place of dwelling might have multiple addresses.

The XML files contain data on over 30 million objects, and their relations. Oh, and the history. This is around 75GB of XML.

Making life simpler

If you have downloaded and unzipped the zip file containing zip files of the XML files you can run the code in this repository like this:

$ ./bagconv 9999{WPL,OPR,NUM,VBO,LIG,STA,PND}*.xml > txt

This will 1) create a text file with debugging output and 2) populate a database called bag.sqlite

Note: the WPL,OPR etc order is important for the tool to function correctly

If you run sqlite3 bag.sqlite < mkindx, this will add useful indexes and views.

This allows stuff like:

sqlite> select * from unilabel where straat ='Binnenhof' and huisnummer='19' and woonplaats="'s-Gravenhage";
|  straat   | huisnummer | huisletter | huistoevoeging |  woonplaats   | postcode |     x     |     y      | oppervlakte |  gebruiksdoel  
|-----------|------------|------------|----------------|---------------|----------|-----------|------------|-------------|----------------
| Binnenhof | 19         |            |                | 's-Gravenhage | 2513AA   | 81431.756 | 455218.921 | 7726        | kantoorfunctie 
Run Time: real 0.000 user 0.000000 sys 0.000681

(there are actually more fields which tell you the status of registrations, plus the BAG identifiers, but I've omitted these here).

Incidentally, this is an example of of the tens of thousands of addresses missing from the 'BAG Geopackage' which you can also download from the official URL above.

The x and y coordinates are according to the Dutch reference grid. Also included in the database are WGS84 coordinates.

The tables in this database correspond 1:1 to the BAG objects, where the relations between entities are described via the intermediate tables vbo_num (from Verblijfsobject (place of dwelling) to Nummeraanduiding (address)) and vbo_pnd (from Verblijfsobject to Pand (building)).

The tables are:

  • oprs - public spaces, like streets and squares.
  • vbos - places of dwelling, but also mooring points and mobile home locations. x and y are coordinates, in the Dutch specific coordinate system. Also, WGS84 latitude and longitude. In addition, includes an indication what this place of dwelling is for (housing, education, office etc)
  • pnds - Panden or buildings. Includes a 2D shape!
  • nums - Nummeraanduidingen - addresses, including postcode

In the mkindx file you'll find useful views that make querying easier.

Geographical tables

If you run the SQL in geo-queries, a new table gets populated using the SQLite R*Tree module. This table (geoindex) can be queried rapidly to find vbos within certain x and y coordinates, or within certain longitudes and lattitudes. Use the vbo_id field to find associated places of dwelling.

Sample query:

select x as rdX, y as rdY,straat,woonplaats,postcode, (lat-?)*(lat-?)+(lon-?)*(lon-?) as deg2dist 
from geoindex,alllabel where alllabel.vbo_id = geoindex.vbo_id and 
minLat > ? and maxLat < ? and minLon > ? and maxLon < ? 
order by deg2dist asc limit 10
{lat, lat, lon, lon, lat-0.005, lat+0.005, lon-0.005, lon+0.005})

The question marks need to be replaced with the things between {curly braces} below the SQL.

Validity periods

When comparing the output of this tool to commercial offerings or the excellent live official database, you can find small discrepancies, mostly related to the validity period.

The montly extract of the BAG pre-announces changes that will happen in the (near) future. When emlconv runs, it checks the validity period of all entries against the current date, and then emits to the CSV file and SQLite database data that is valid at that time.

This means that if you regenerate the CSV file and database after a few weeks, the contents will be different. Conversely, if you do not regenerate, the output created earlier will list data that is by now invalid.

Addresses which have existed, but are no longer valid, or addresses which will exist, are stored in inactnums.

Some painful words on case

In the database, houseletters can be both lowercase and uppercase. Some streets even have both upper and lower case letters. Users of addresses are mostly not aware if an address has an upper or lower case letter. However, the lower and upper case are officially different.

To deal with this, the sqlite database is provisioned with 'collate nocase' for huisletter and huistoevoeging (the additional extra field). This means that lookups for huisletter='a' and huisletter='A' deliver the same result.

Some examples

This gets you everything for the headquarters of the Dutch Kadaster agency, including the shape of their building:

sqlite> select * from alllabel where straat='Hofstraat' and huisnummer=110 and woonplaats='Apeldoorn';
        straat = Hofstraat
    huisnummer = 110
    huisletter = 
huistoevoeging = 
    woonplaats = Apeldoorn
      postcode = 7311KZ
             x = 194315.783
             y = 469449.074
           lon = 5.96244253360916
           lat = 52.2117344207437
   oppervlakte = 8870
gebruiksdoelen = ["kantoorfunctie"]
      bouwjaar = 1985
    num_status = Naamgeving uitgegeven
    vbo_status = Verblijfsobject in gebruik
      vbo_type = vbo
        num_id = 0200200000007079
        vbo_id = 0200010000090244
        opr_id = 0200300022471548
        pnd_id = 0200100000001088

If you just want to get from postcode to address:

sqlite> select * from unilabel where postcode='7311KZ' and huisnummer=110;
        straat = Hofstraat
    huisnummer = 110
    huisletter = 
huistoevoeging = 
    woonplaats = Apeldoorn
      postcode = 7311KZ
             x = 194315.783
             y = 469449.074
           lon = 5.96244253360916
           lat = 52.2117344207437
   oppervlakte = 8870
gebruiksdoelen = ["kantoorfunctie"]
    num_status = Naamgeving uitgegeven
    vbo_status = Verblijfsobject in gebruik
      vbo_type = vbo
        num_id = 0200200000007079
        vbo_id = 0200010000090244
        opr_id = 0200300022471548

Compiling

Make sure you have cmake and SQLite development files installed, and then run:

cmake .
make

The http server

If you run bagserv 2345, you can send it the following queries:

The answer in all cases is straightforward JSON with one or more addresses in there:

[
  {
    "bouwjaar": "1985",
    "gebruiksdoelen": ["kantoorfunctie"],
    "huisletter": "",
    "huisnummer": "110",
    "huistoevoeging": "",
    "lat": "52.2117344207437",
    "lon": "5.96244253360916",
    "num_status": "Naamgeving uitgegeven",
    "oppervlakte": "8870",
    "straat": "Hofstraat",
    "vbo_status": "Verblijfsobject in gebruik",
    "woonplaats": "Apeldoorn"
  }
]

Often, the testing instance on https://berthub.eu/pcode/ will be active. Send it queries like https://berthub.eu/pcode/7311KZ/110.

Fun testing addresses

  • Binnenhof 19, 's-Gravenhage: One VBO with several addresses (Nummerindicaties)
  • Schiedamseweg 56, Rotterdam: One VBO, one Nummerindicatie but 5 buildings (PNDs)

Docker

Building the Docker image requires at least 120GB of free disk space during build time and can take up to 15 minutes to build. The resulting bag.sqlite and docker image combined will be approximately 10GB in size.

Please make sure you are using a recent Docker engine version (for example 24.0.5 or newer).

$ docker build -t bagconv:latest .
$ docker run -p 1234:1234 bagconv:latest

$ curl -s http://127.0.0.1:1234/7311KZ/110 | jq
[
  {
    "bouwjaar": "1985",
    "gebruiksdoelen": ["kantoorfunctie"],
    "huisletter": "",
    "huisnummer": "110",
    "huistoevoeging": "",
    "lat": "52.2117344207437",
    "lon": "5.96244253360916",
    "num_status": "Naamgeving uitgegeven",
    "oppervlakte": "8870",
    "rdX": "194315.783",
    "rdY": "469449.074",
    "straat": "Hofstraat",
    "vbo_status": "Verblijfsobject in gebruik",
    "woonplaats": "Apeldoorn"
  }
]

To clean up build artifacts you can (after running docker run bagconv:latest at least once) run the following command. Warning: this will remove all cache and dangling images and containers from your host. This should free up approximately 110GB of space on your machine.

docker system prune

bagconv's People

Contributors

berthubert avatar lesander avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar

bagconv's Issues

Missing location of ships ("ligplaatsen") and maybe other objects as well

pcodes-geo.zip contains lots of invalid locations (x=-1, y=-1) and it appears that these are (mostly?) ships.
bagserv gives no output at all for such addresses. E.g. 2231BG 293.
(btw the official BAG viewer map also seems to struggle a bit: The location (blue dot) only appears on the map after a search but not after clicking on the object)

Adding zipcode to coordinate method

The method that returns an object given a latitude and longitude (bag.nl/x/y) is really useful, but would hugely benefit from also returning a zip or postcode per object. Perhaps there's some other way to get pcodes from coordinates?

Make it easier to find historical addresses

Sometimes e.g. house numbers are changed. It would be great if it would be easier to find historical addresses (by searching with e.g. the old house number) and also e.g. be able to find the current address for it.

No coordinates for non-vbo vbo_types

There are no coordinates (i.e. -1.0,-1.0) for e.g.:

straat huisnummer huisletter huistoevoeging woonplaats postcode x y lon lat oppervlakte gebruiksdoelen num_status vbo_status vbo_type num_id vbo_id opr_id
Piekenwaardweg 10 a 202 Kerkdriel 5331PD -1.0 -1.0 3.31357161339861 47.9747577455202 -1 [] Naamgeving uitgegeven Plaats aangewezen sta 0263200000366237 0263030000012451 0263300000000417

This seems to be the case for all non-vbo vbo_types, as can be confirmed by the following SQL query returning 0 results:

SELECT * FROM unilabel WHERE vbo_type <> 'vbo' AND x <> -1.0;

date filter is not correct

We now always believe the "newest" version. However, for example for num ID 0302200000188380 the BAG contains news that a house number will change per 1st of July (which it isn't yet). But we believe it already.

RD based coordinate search in bagserv

I think it would be neat if bagserv would also provide a way to search using RD-coordinates.

I've tried implementing it using this code, but I think something is wrong with the SQL query; which I can imagine since I just copied the WGS84 search and changed lat to x and lon to y. What would be the proper way to implement this?

diff --git a/bagserv.cc b/bagserv.cc
index e268c14..8f8dcf4 100644
--- a/bagserv.cc
+++ b/bagserv.cc
@@ -139,7 +139,25 @@ int main(int argc, char**argv)
     }
   });
 
-  
+  // Search based on RD coordinates in the form of /rd/{x}/{y}
+  svr.Get(R"(/rd/(\d*\.\d*)/(\d*\.\d*))", [&sqw, &sqwlock](const httplib::Request &req, httplib::Response &res) {
+    try {
+      auto xstr = (string)req.matches[1];
+      auto ystr = (string)req.matches[2];
+      double x = atof(xstr.c_str());
+      double y = atof(ystr.c_str());
+      cout<<"Query for "<< x <<", "<< y <<endl;
+      vector<unordered_map<string,string>> result;
+      {
+        std::lock_guard<mutex> l(sqwlock);
+        result=sqw.query("select x as rdX, y as rdY,straat,woonplaats,huisnummer,huisletter,huistoevoeging,oppervlakte,bouwjaar,lon,lat,gebruiksdoel,num_status,vbo_status, (x-?)*(x-?)+(y-?)*(y-?) as deg2dist from geoindex,alllabel where alllabel.vbo_id = geoindex.vbo_id and minX > ? and maxX < ? and minY > ? and maxY < ? order by deg2dist asc limit 1", {x, x, y, y, x-0.005, x+0.005, y-0.005, y+0.005});
+      }
+      res.set_content(packResults(result), "application/json");
+    }
+    catch(exception& e) {
+      cerr<<"Error: "<<e.what()<<endl;
+    }
+  });
   
   svr.listen("0.0.0.0", argc==1 ? 8080 : atoi(argv[1]));
 }

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.