marook / osm-read



an openstreetmap XML and PBF data parser for node.js and the browser

License: GNU Lesser General Public License v3.0

HTML 1.52% JavaScript 98.22% Shell 0.26%

osm-read's Introduction

osm-read - an openstreetmap XML and PBF parser for node.js and the browser

Introduction
Usage Examples
1. Simple Usage Example
2. Parse OSM XML from URL Example
3. PBF random access parser
Version Upgrade Guide
TODOs
License
Contact

Introduction

osm-read parses openstreetmap XML and PBF files as described in http://wiki.openstreetmap.org/wiki/OSM_XML and http://wiki.openstreetmap.org/wiki/PBF_Format


Simple Usage Example

The following code parses openstreetmap XML or PBF files using SAX-parser-like callbacks:

var osmread = require('osm-read');

var parser = osmread.parse({
    filePath: 'path/to/osm.xml',
    endDocument: function(){
        console.log('document end');
    },
    bounds: function(bounds){
        console.log('bounds: ' + JSON.stringify(bounds));
    },
    node: function(node){
        console.log('node: ' + JSON.stringify(node));
    },
    way: function(way){
        console.log('way: ' + JSON.stringify(way));
    },
    relation: function(relation){
        console.log('relation: ' + JSON.stringify(relation));
    },
    error: function(msg){
        console.log('error: ' + msg);
    }
});

// you can pause the parser
parser.pause();

// and resume it again
parser.resume();
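pause() and resume() are useful for backpressure, for example when entities are written to a slower store such as a database. The sketch below is a hypothetical helper (createBackpressure, getParser and writeBatch are illustrative names, not part of osm-read). It collects entities into batches and keeps the parser paused while any batch write is still in flight. It assumes callbacks may still arrive shortly after pause(), so those entities are buffered instead of dropped.

```javascript
// Hypothetical helper, not part of osm-read: batches entities and keeps the
// parser paused while at least one batch write is still in flight.
// `getParser` returns any object with pause()/resume() (e.g. the value
// returned by osmread.parse); `writeBatch` receives an array and returns a Promise.
function createBackpressure(getParser, writeBatch, batchSize, onError) {
  let batch = [];
  let inFlight = 0;

  function flush() {
    if (batch.length === 0) return Promise.resolve();
    const items = batch;
    batch = [];
    // Pause only on the transition from "idle" to "writing".
    if (inFlight++ === 0) getParser().pause();
    return new Promise((resolve) => resolve(writeBatch(items)))
      .catch((err) => (onError ? onError(err) : console.error(err)))
      .then(() => {
        // Resume only after every pending write has settled.
        if (--inFlight === 0) getParser().resume();
      });
  }

  return {
    // Entities that arrive after pause() are still buffered, never dropped.
    push(entity) {
      batch.push(entity);
      if (batch.length >= batchSize) flush();
    },
    flush,
  };
}
```

With osm-read, you would create the helper before calling osmread.parse, pass `function(){ return parser; }` as getParser, call push(node) from the node callback, and call flush() from endDocument.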

Parse PBF in the browser

The browser bundle 'osm-read-pbf.js' provides a global variable 'pbfParser' with a 'parse' method.

Example, see also example/pbf.html:

<script src="../osm-read-pbf.js"></script>
<script>
    pbfParser.parse({
        filePath: 'test.pbf',
        endDocument: function(){
            console.log('document end');
        },
        node: function(node){
            console.log('node: ' + JSON.stringify(node));
        },
        way: function(way){
            console.log('way: ' + JSON.stringify(way));
        },
        relation: function(relation){
            console.log('relation: ' + JSON.stringify(relation));
        },
        error: function(msg){
            console.error('error: ' + msg);
            throw msg;
        }
    });
</script>

As an alternative to passing a URL in "filePath", the option "buffer" can be used to pass an already loaded ArrayBuffer object:

var buf = ... // e.g. xhr.response

pbfParser.parse({
    buffer: buf,
...

A third alternative is to let the user choose a local file using the HTML5 File API, passing the file object as "file" option:

<input type="file" id="file" accept=".pbf">
<script>
    document.getElementById("file").addEventListener("change", parse, false);

    function parse(evt) {
        var file = evt.target.files[0];

        pbfParser.parse({
            file: file,
        ...

Build

Build or update the browser bundle osm-read-pbf.js with browserify:

$ npm run browserify

To install browserify (http://browserify.org/):

$ npm install -g browserify

Parse OSM XML from URL Example

Parsing from a URL currently supports OSM XML only. Here's an example:

osmread.parse({
    url: 'http://overpass-api.de/api/interpreter?data=node(51.93315273540566%2C7.567176818847656%2C52.000418429293326%2C7.687854766845703)%5Bhighway%3Dtraffic_signals%5D%3Bout%3B',
    format: 'xml',
    endDocument: function(){
        console.log('document end');
    },
    bounds: function(bounds){
        console.log('bounds: ' + JSON.stringify(bounds));
    },
    node: function(node){
        console.log('node: ' + JSON.stringify(node));
    },
    way: function(way){
        console.log('way: ' + JSON.stringify(way));
    },
    relation: function(relation){
        console.log('relation: ' + JSON.stringify(relation));
    },
    error: function(msg){
        console.log('error: ' + msg);
    }
});
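The long url value above is a percent-encoded Overpass QL query. As a sketch, the same string can be built from a readable query with the standard encodeURIComponent function. The helper name overpassUrl is only illustrative. The decoded query selects traffic signals (highway=traffic_signals) inside a bounding box given as south, west, north, east.

```javascript
// Illustrative helper (not part of osm-read): builds an Overpass API URL
// from a readable Overpass QL query by percent-encoding it.
function overpassUrl(query, endpoint = 'http://overpass-api.de/api/interpreter') {
  return endpoint + '?data=' + encodeURIComponent(query);
}

// Bounding box order in Overpass QL: south, west, north, east.
const query =
  'node(51.93315273540566,7.567176818847656,52.000418429293326,7.687854766845703)' +
  '[highway=traffic_signals];out;';

console.log(overpassUrl(query));
```

The result can be passed as the url option together with format: 'xml'.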

PBF random access parser

The following code creates a random access openstreetmap PBF file parser:

osmread.createPbfParser({
    filePath: 'path/to/osm.pbf',
    callback: function(err, parser){
        var headers;

        if(err){
            // TODO handle error; parser is not usable in this case
            console.error(err);
            return;
        }

        headers = parser.findFileBlocksByBlobType('OSMHeader');

        parser.readBlock(headers[0], function(err, block){
            console.log('header block');
            console.log(block);

            parser.close(function(err){
                if(err){
                    // TODO handle error
                }
            });
        });
    }
});

Don't forget to close the parser after usage!
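To make sure the parser is always closed, the random access parser can be wrapped in a Promise-based helper. This is a hypothetical sketch and not part of osm-read. It only relies on createPbfParser, its callback(err, parser), and parser.close(callback) as shown above. The osmread object is passed in so the helper can be tried with a stub.

```javascript
// Hypothetical helper (not part of osm-read): opens a random access PBF
// parser, runs `work(parser)` (which may return a Promise) and always closes
// the parser afterwards, whether `work` succeeded or failed.
function withPbfParser(osmread, filePath, work) {
  return new Promise((resolve, reject) => {
    osmread.createPbfParser({
      filePath: filePath,
      callback: function (err, parser) {
        if (err) return reject(err); // parser is not usable here

        let result;
        let failure = null;
        Promise.resolve()
          .then(() => work(parser))
          .then((value) => { result = value; }, (e) => { failure = e; })
          .then(() => {
            parser.close(function (closeErr) {
              if (failure) return reject(failure);
              if (closeErr) return reject(closeErr);
              resolve(result);
            });
          });
      },
    });
  });
}
```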

Version Upgrade Guide

Sometimes APIs change... they break your code, but things get easier for the rest of us. I'm sorry if a version upgrade costs you some extra hours. To make things a little less painful, you can find migration instructions in the file ChangeLog.

TODOs

XML parser:

parse timestamps

License

See file COPYING for details.

Contact

author: Markus Peröbner


osm-read's Issues

TypeError: Cannot read property 'position' of undefined

I tried to run the simple example code on a file from here: http://planet.openstreetmap.nl/benelux/

I run the app using 'node app.js', and after about 2.5 seconds the following error message appears.

Do you have any idea what I should do?

Thanks in advance for your help

Error message

node_modules/osm-read/lib/pbfParser.js:419
            return readPBFElement(fd, fileBlock.blobHeader.position, fileBlock
                                                          ^
TypeError: Cannot read property 'position' of undefined
    at readBlob (/Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:419:59)
    at Object.readBlock (/Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:423:20)
    at Object.osmread.createPbfParser.callback (/Users/aboujraf/git/bitbucket/openstreetmap/app/app.js:13:16)
    at /Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:463:25
    at /Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:442:16
    at /Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:132:28
    at /Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:101:20
    at readPBFElementFromBuffer (/Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:63:12)
    at /Users/aboujraf/git/bitbucket/openstreetmap/node_modules/osm-read/lib/pbfParser.js:80:16
    at Object.wrapper [as oncomplete] (fs.js:454:17)

source code: app.js

'use strict';

var osmread = require('osm-read');

osmread.createPbfParser({
    filePath: '/planet-benelux-131006.osm.pbf',
    callback: function(err, parser){
        if(err){
            // TODO handle error
        }

        parser.readBlock(parser.findFileBlocksByBlobType('OSMHeader'), function(err, block){
            console.log('header block');
            console.log(block);

            parser.close(function(err){
                if(err){
                    // TODO handle error
                }
            });
        });
    }
});
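The stack trace points at readBlock (pbfParser.js:419) reading fileBlock.blobHeader.position. In the code above, readBlock receives the whole array returned by findFileBlocksByBlobType('OSMHeader'), while the README example passes a single element (headers[0]). An array has no blobHeader property, which appears to explain this TypeError. Below is a corrected sketch, written as a function so it can be tried with a stub. readFirstHeader is an illustrative name.

```javascript
// Sketch of the fix: pass a single file block to readBlock, not the array.
// `parser` is the object passed to the createPbfParser callback.
function readFirstHeader(parser, done) {
  // findFileBlocksByBlobType returns an array of file blocks.
  const headers = parser.findFileBlocksByBlobType('OSMHeader');
  if (headers.length === 0) {
    return parser.close(function () {
      done(new Error('no OSMHeader block found'));
    });
  }
  parser.readBlock(headers[0], function (err, block) {
    // Close the parser in every case, then report the outcome.
    parser.close(function (closeErr) {
      done(err || closeErr || null, block);
    });
  });
}
```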

Would exposing an async iterator API suit this library?

I have used the random access features in osm-read to build a different way of iterating through the records in an osm.pbf file. My problem before was that reading the OSM data was faster than inserting it into SQLite, so reading got ahead of writing, and pause() did not work as I expected (I expected it to stop the output immediately). I decided to implement an asynchronous iterator interface, not within the osm-read codebase but on top of its API.

The code I use to iterate objects is as follows:

let c = 0;
let type_counts = {};
for await (const item of reader.objects) {
    //console.log(item);
    const {type} = item;
    if (!type_counts[type]) {
        type_counts[type] = 1;
    } else {
        type_counts[type]++;
    }
    c++;
}
console.log('c', c);
console.log('type_counts', type_counts);

For guernsey-and-jersey I get the following output:

c 513686
type_counts { node: 461240, way: 51971, relation: 475 }

This code is concise. It runs quickly when the results are consumed quickly, and it slows down as needed when processing the results takes longer.

Is this a feature that's worth incorporating into the library?

I would like to coordinate with @marook regarding including this and possibly other features in osm-read.
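A pull-based reader along these lines can be sketched on top of the random access API from the README. It uses an async generator that reads one file block at a time with readBlock, and only reads the next block after the consumer's for await body has finished. The generator name and the default 'OSMData' blob type (the data blob type in the PBF format) are assumptions for illustration. How entities are extracted from a decoded block is left out.

```javascript
// Hypothetical sketch, not part of osm-read: yields decoded blocks one at a
// time. The next readBlock call is only issued when the consumer asks for the
// next value, so a slow consumer automatically slows down reading.
async function* readBlocks(parser, blobType = 'OSMData') {
  const fileBlocks = parser.findFileBlocksByBlobType(blobType);
  for (const fileBlock of fileBlocks) {
    // Wrap the callback-style readBlock in a Promise.
    const block = await new Promise((resolve, reject) => {
      parser.readBlock(fileBlock, (err, decoded) => (err ? reject(err) : resolve(decoded)));
    });
    yield block;
  }
}
```

The parser still needs to be closed with parser.close() once the loop has finished.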

Relation members output format

The relation members format introduced in #17 uses a separate array for each member type:
relationsMembers = { nodes: [], ways: [] };

The problem is that the members of a relation form an ordered list, regardless of type, and that order cannot be reconstructed from this output. In other words, this output format loses information.

Therefore I suggest using a single members array in which each element has a type property whose value is one of node, way or relation. Other libraries use this format as well: openstreetmap-json-schema (example), Overpass API JSON, osmtogeojson.

What do you think?
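The order is available in the PBF data itself. Per the PBF_Format wiki page, a Relation carries parallel arrays roles_sid, memids (delta-coded), and types (0 = node, 1 = way, 2 = relation). The sketch below builds the proposed single, ordered members array from those fields. The input shape (plain number arrays plus a string table of already-decoded strings) is an assumption and not osm-read's internal structure. osm-read currently returns ids as strings.

```javascript
// Hypothetical sketch: builds one ordered members array from the parallel
// PBF Relation fields. memids are delta-coded across the whole list, so a
// running sum restores the absolute member ids.
const MEMBER_TYPES = ['node', 'way', 'relation']; // PBF MemberType enum 0, 1, 2

function buildMembers(relation, stringTable) {
  const members = [];
  let ref = 0;
  for (let i = 0; i < relation.memids.length; i++) {
    ref += relation.memids[i];
    members.push({
      type: MEMBER_TYPES[relation.types[i]],
      ref: ref,
      role: stringTable[relation.roles_sid[i]],
    });
  }
  return members;
}
```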

Allocation failed - process out of memory

Hi,

I'd like to use this module to gather data from the pbf extract for Europe (http://download.geofabrik.de/europe-latest.osm.pbf, 20 GB), but the process quickly runs out of memory and fails.
I tried to increase the memory available to nodejs to 8 GB (--max-old-space-size=8192), but it fails anyway. Is there a way to parse large files with this module?

Here is the output:

<--- Last few GCs --->

  264433 ms: Scavenge 8195.2 (8358.6) -> 8195.2 (8358.6) MB, 8.8 / 0 ms (+ 1.8 ms in 1 steps since last GC) [allocation failure] [incremental marking delaying mark-sweep].
  270716 ms: Mark-sweep 8195.2 (8358.6) -> 8189.8 (8354.6) MB, 6283.4 / 0 ms (+ 2.5 ms in 2 steps since start of marking, biggest step 1.8 ms) [last resort gc].
  276850 ms: Mark-sweep 8189.8 (8354.6) -> 8190.2 (8358.6) MB, 6134.3 / 0 ms [last resort gc].


<--- JS stacktrace --->

==== JS stack trace =========================================

Security context: 0x12b629ab4629 <JS Object>
    1: node [/osmread-js/pbfTest.js:~24] [pc=0x17d45f2c517c] (this=0x2360fd08bf11 <an Object with map 0xbe1c1a7d6d9>,node=0x179a95efcaf9 <an Object with map 0xbe1c1a6a8b1>)
    2: visitPrimitiveGroup(aka visitPrimitiveGroup) [/osmread-js/lib/pbfParser.js:~158] [pc=0x17d45f2f64d7] (this=0x12b629a041b9 <undefined>,pg=0x2355483166a1 <JS Object>,o...

FATAL ERROR: CALL_AND_RETRY_LAST Allocation failed - process out of memory
Abandon (core dumped)

nodejs version: 4.2.6
cmd: nodejs --max-old-space-size=8192 ./pbfTest.js

Thanks.

Module not found: Error: Can't resolve '../../node_modules/zlibjs/bin/inflate.min.js'

How do I fix this issue? I want to use this lib in Angular via webpack.

Invalid typed array length in arrayBufferReader Line 12

Hi,
I want to parse a pbf generated by my GeoServer and get an "Invalid typed array length" error in arrayBufferReader line 12; the size variable in line 11 is 445776650.

I tried the same with a Mapbox pbf -> same issue.

Any ideas?
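One way to check whether a file is an OpenStreetMap PBF at all follows from the PBF_Format wiki page. An OSM PBF starts with a 4-byte big-endian BlobHeader length, which should be below 32 KiB and must be below 64 KiB. That length is followed by a BlobHeader whose first field is the string "OSMHeader". The reported size 445776650 is 0x1A92030A, i.e. the bytes 1A 92 03 0A. That is what the start of a Mapbox Vector Tile (also commonly saved as .pbf) could look like: field 3 (layers) with length 402, followed by field 1 (the layer name). This reading is a guess. The sketch below only inspects the first bytes; looksLikeOsmPbf is an illustrative name.

```javascript
// Illustrative check, not part of osm-read: does a byte array start like an
// OSM PBF file? Layout per the PBF format: 4-byte big-endian BlobHeader
// length, then the BlobHeader, whose field 1 (type) is "OSMHeader".
function looksLikeOsmPbf(bytes) {
  if (bytes.length < 15) return false;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  const headerLength = view.getUint32(0, false); // false = big-endian
  if (headerLength === 0 || headerLength >= 64 * 1024) return false;
  // 0x0A = field 1, length-delimited; 9 = length of "OSMHeader"
  if (bytes[4] !== 0x0a || bytes[5] !== 9) return false;
  const type = new TextDecoder().decode(bytes.subarray(6, 15));
  return type === 'OSMHeader';
}
```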

Pause/Resume

Hey Markus,

As we briefly discussed, it would be advantageous for me if we could add a pause/resume feature to osm-read.

The use case is, for instance, a consuming service (such as a database) that is being flooded with requests: you can either buffer those requests in memory or ask the parser to slow down or stop for a short while.

Buffering in memory can be problematic when the dataset is very large (i.e. the planet file), and flood control mechanisms are very important for streaming interfaces.

As I see it, this can be achieved in a couple of different ways:

deferred recursion

Since visitNextBlock is called recursively:

when pause() is called, recursion is stopped
when resume() is called, recursion is started again

explicit `next()`

The consuming service must call next() otherwise the iterator will not advance.

node: function(node,next){
    console.log('node: ' + JSON.stringify(node));
    next(); // this triggers the next recursion
}

Either way we will need to add pause() and resume() methods to the public API.

I'll leave this issue open so we can discuss it further.
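The "deferred recursion" option can be sketched independently of osm-read's internals. The visitor below stops advancing as soon as pause() is called and continues where it left off on resume(). createVisitor, onItem and onEnd are illustrative names. The loop stands in for the recursive visitNextBlock calls and keeps the call stack flat.

```javascript
// Hypothetical sketch of "deferred recursion": visitNext() stops as soon as
// the visitor is paused, and resume() restarts it from the current position.
function createVisitor(items, onItem, onEnd) {
  let index = 0;
  let paused = false;
  let ended = false;

  function visitNext() {
    // A loop instead of a self-call keeps the stack flat for long inputs.
    while (!paused && index < items.length) {
      onItem(items[index++]);
    }
    if (!paused && !ended && index >= items.length) {
      ended = true;
      onEnd();
    }
  }

  return {
    start: visitNext,
    pause() { paused = true; },
    resume() {
      if (!paused) return;
      paused = false;
      visitNext();
    },
  };
}
```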

Get tag "center" from overpass-api response xml

I get the following result from the Overpass API (for ways I use the 'out center;' command):

<way id="43989209">
    <center lat="68.9280397" lon="33.1139458"/>
    <nd ref="559363044"/>
    <nd ref="559362513"/>
    <nd ref="559362515"/>
    <nd ref="559362512"/>
    <nd ref="559363044"/>
    <tag k="addr:city" v="Мурманск"/>
    <tag k="addr:housenumber" v="110"/>
    <tag k="addr:street" v="Кольский проспект"/>
    <tag k="building" v="yes"/>
    <tag k="name" v="Олимп Авто"/>
    <tag k="shop" v="car"/>
    <tag k="website" v="http://olimp-avto.lada.ru/"/>
</way>

Unfortunately, the 'way' callback does not receive this center element. Can you suggest a solution to this problem?

Add browser support to PBF parser

As discussed in #3, support for using pbfParser.js in the browser will be added by separating Node.js dependencies into an abstraction layer. This will be done in step-by-step pull requests, collected in this issue:

Cannot read property 'byteLength' of undefined

I keep running into this error when dealing with large pbf files.
I also tried using the browser zlib and buffer implementations but got a similar error.

The pbf files I'm using are the osm planet file ~25GB and the geofabrik continent files ~15GB.
I don't get any errors when using the mapzen metro extracts.

Any idea what might be causing this @marook?

path/osm-read/lib/nodejs/buffer.js:38
    for(offset = 0; offset < from.byteLength - 1; ++offset){
                                 ^
TypeError: Cannot read property 'byteLength' of undefined
    at Object.blobDataToBuffer (path/osm-read/lib/nodejs/buffer.js:38:34)
    at Object.inflateBlob (path/osm-read/lib/nodejs/zlib.js:5:22)
    at path/osm-read/lib/pbfParser.js:480:22
    at Object.readPBFElementFromBuffer (path/osm-read/lib/nodejs/buffer.js:15:12)
    at path/osm-read/lib/nodejs/fsReader.js:36:20
    at Object.wrapper [as oncomplete] (fs.js:454:17)

lat/lon are not in WGS84 format

e.g.
node: {"id":"148133746","lat":0.0038205,"lon":-0.0039445,"tags":{},"version":4,"changeset":19425811,"uid":"741163","user":"JaLooNz"}
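For reference, the PBF_Format wiki page defines coordinates as lat = 0.000000001 * (lat_offset + granularity * lat), with granularity defaulting to 100 and both offsets defaulting to 0. In DenseNodes, the lat and lon values are also delta-coded, so they must be summed first. Small values like the ones reported are what unsummed deltas might look like, though that is only a guess at the cause. Below is a sketch of the decoding; decodeDenseCoordinates and its input shape are illustrative.

```javascript
// Sketch of PBF DenseNodes coordinate decoding following the PBF format
// formulas. Field names mirror the .proto; the function is illustrative and
// not part of osm-read.
function decodeDenseCoordinates(dense, block) {
  const granularity = block.granularity ?? 100;
  const latOffset = block.lat_offset ?? 0;
  const lonOffset = block.lon_offset ?? 0;
  let lat = 0;
  let lon = 0;
  const out = [];
  for (let i = 0; i < dense.lat.length; i++) {
    // Running sums undo the delta coding.
    lat += dense.lat[i];
    lon += dense.lon[i];
    out.push({
      lat: 1e-9 * (latOffset + granularity * lat),
      lon: 1e-9 * (lonOffset + granularity * lon),
    });
  }
  return out;
}
```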

number instead of string for id, uid and ref?

Is there a reason why id, uid and way node/relation member id refs are returned as string instead of number?

The PBF Format defines them all as numbers:

int64 id
int32 uid
sint64 refs
sint64 memids

I think memory usage could be reduced by using numbers instead of strings.
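If ids were returned as numbers, precision is the main thing to watch. int64 values above Number.MAX_SAFE_INTEGER (2^53 - 1) cannot be represented exactly as JavaScript numbers, although current OSM ids are far below that limit. Below is a minimal sketch of a guarded conversion; parseOsmId is an illustrative name.

```javascript
// Illustrative conversion of a decimal id string to a number, refusing
// values that a JavaScript number cannot represent exactly.
function parseOsmId(idString) {
  if (!/^-?\d+$/.test(idString)) {
    throw new TypeError('not an integer id: "' + idString + '"');
  }
  const value = Number(idString);
  if (!Number.isSafeInteger(value)) {
    throw new RangeError('id ' + idString + ' is not a safe integer');
  }
  return value;
}
```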

pbfParser in node.js and "TypeError: Invalid non-string/buffer chunk"

Has anyone been able to use osm-read-pbf from node.js? Using the same code that runs successfully in the browser, I get a "TypeError: Invalid non-string/buffer chunk" before any nodes, ways or relations are read (i.e. probably right after it starts reading the pbf file). If needed I can upload the pbf file I'm using, but I'm guessing the problem already occurs with example/test.pbf.

Tags unresolved

When I run example/pbf.html, the tags of every object are not decoded; they seem to stay as bytes.
The same happens for user:

{
  "id": "275452090",
  "lat": 51.5075933,
  "lon": -0.1076186,
  "tags": {
    "110,97,109,101": "74,97,109,39,115,32,83,97,110,100,119,105,99,104,32,66,97,114",
    "97,109,101,110,105,116,121": "99,97,102,101"
  },
  "version": 3,
  "timestamp": 1256818475000,
  "changeset": 2980587,
  "uid": "1697",
  "user": "110,105,99,107,98"
}
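The keys and values above look like byte arrays that were turned into strings by their default toString, which joins the bytes with commas. Decoded as UTF-8, they read name = "Jam's Sandwich Bar", amenity = cafe, and user nickb. In PBF, tags and user names are indexes into the block's string table, whose entries are raw bytes. One plausible fix is therefore to decode the string table with TextDecoder before resolving tags. In the sketch below, decodeStringTable and resolveTags are illustrative names, and the keys/vals index arrays follow the PBF Node message.

```javascript
// Illustrative sketch: decode PBF string table entries (raw bytes) as UTF-8
// once per block, then resolve tag index pairs against the decoded table.
const utf8 = new TextDecoder('utf-8');

function decodeStringTable(entries) {
  return entries.map((bytes) => utf8.decode(bytes));
}

function resolveTags(keys, vals, strings) {
  const tags = {};
  for (let i = 0; i < keys.length; i++) {
    tags[strings[keys[i]]] = strings[vals[i]];
  }
  return tags;
}
```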

Vulnerability in protobuf dependencies

This seems to be already work in progress given a1530bf. Would you please post an update here when the fixed version gets published to npm?

Parser getting stuck

Is it possible to get any progress indication?

I can't find a way to output something like 'processed 34534 of 998798 blocks (15%)'.
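With the random access parser from the README, progress in blocks can be reported because the total is known up front: findFileBlocksByBlobType returns all file blocks of a type. Below is a hypothetical sketch that reads the blocks one after another. readAllWithProgress, onBlock and onProgress are illustrative names, and 'OSMData' is the PBF data blob type.

```javascript
// Hypothetical sketch: reads all data blocks sequentially and reports
// progress after each one, e.g. "processed 3 of 20 blocks (15%)".
// Assumes readBlock calls back asynchronously for real files, so next()
// does not grow the stack.
function readAllWithProgress(parser, onBlock, onProgress, done) {
  const fileBlocks = parser.findFileBlocksByBlobType('OSMData');
  const total = fileBlocks.length;
  let index = 0;

  function next() {
    if (index >= total) return done(null);
    parser.readBlock(fileBlocks[index], function (err, block) {
      if (err) return done(err);
      onBlock(block);
      index++;
      // Integer arithmetic avoids floating point surprises such as 28.999...
      const percent = Math.floor((index * 100) / total);
      onProgress('processed ' + index + ' of ' + total + ' blocks (' + percent + '%)');
      next();
    });
  }
  next();
}
```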

primitivegroup.length - Is it always 1?

So far I have found primitivegroup to always be an array with a single item. Is this always the case?
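According to the PBF format description, a PrimitiveBlock has a repeated primitivegroup field. Each group holds only one kind of entity (nodes, dense, ways, relations or changesets). The format allows several groups per block, so it is safer to loop over all of them. The sketch below assumes the decoded block mirrors the .proto field names, as the question's own use of primitivegroup suggests.

```javascript
// Illustrative sketch: counts entities across every primitive group of a
// decoded block instead of assuming primitivegroup[0] is the only one.
// Field names follow the PBF .proto; missing fields are treated as empty.
function countEntities(block) {
  const counts = { node: 0, way: 0, relation: 0 };
  for (const group of block.primitivegroup || []) {
    counts.node += (group.nodes || []).length;
    // DenseNodes store one id per node in a packed id array.
    counts.node += group.dense ? group.dense.id.length : 0;
    counts.way += (group.ways || []).length;
    counts.relation += (group.relations || []).length;
  }
  return counts;
}
```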

Why not store IDs as BigInt?

Once I obtain the records containing string ids (including the referenced nodes in the ways) I create new BigInt objects to replace their string representations.

Has any consideration been made of parsing them into BigInt values within osm-read?

Perhaps it would be useful as an option, if not done by default. Making it an option would avoid breaking changes for those who expect string values.
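An opt-in conversion could look like the sketch below. The option name bigint and the field names id and nodeRefs are assumptions for illustration, not osm-read's confirmed output format. BigInt is built into modern JavaScript engines (e.g. Node.js 10.4 and later).

```javascript
// Hypothetical post-processing step: converts string ids in a parsed record
// to BigInt when options.bigint is set, and returns the record unchanged
// otherwise, so callers that expect strings are not affected.
function convertIds(record, options = {}) {
  if (!options.bigint) return record;
  const converted = { ...record };
  if (typeof record.id === 'string') converted.id = BigInt(record.id);
  if (Array.isArray(record.nodeRefs)) {
    converted.nodeRefs = record.nodeRefs.map((ref) => BigInt(ref));
  }
  return converted;
}
```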

Too Much Recursion

Hi,

I've been playing with the PBF parser to read OSM data into an (experimental) web app. However, reading large files results in a "too much recursion" error:

too much recursion osm-read-pbf.js:2269

This happens, for example, with the South Yorkshire PBF available here.

Is it possible to fix this, or is it an intrinsic problem with trying to read PBF in JavaScript? :)

Relations

Hi,
Is there a reason why you don't handle OSM relations?

It's a big issue, since some buildings are mapped as ways and others as relations.

Example pbf.html is Broken

It is not possible to run example/pbf.html.

I followed the instructions and ran npm run browserify to create the file osm-read-pbf.js. Opening the example HTML throws the following errors:

Uncaught Error: Cannot find module 'bytebuffer' osm-read-pbf.js:1
Uncaught ReferenceError: pbfParser is not defined pbf.html:18
GET http://localhost:8080/inflate.min.js.map 404 (Not Found)