osm-taxonomy's Introduction

Automatically Constructing a Geospatial Feature Taxonomy from OpenStreetMap Data

The osm-taxonomy tool automatically generates a lightweight, structured taxonomy of geographic features from OpenStreetMap (OSM) data. It analyzes an OSM dataset and extracts hierarchical relationships between tags, producing a tree that can be used to categorize and classify geospatial features. This addresses the limitations of OSM's unstructured tags and gives geospatial researchers a resource for organizing, analyzing, and understanding the data.

Install requirements:

pip install -e .

Usage

usage: generate_taxonomy.py [-h] --input INPUT [--output OUTPUT] [--threshold THRESHOLD] [--blacklist BLACKLIST]

Automatically construct a lightweight taxonomy for geographic features using OpenStreetMap (OSM) data.

optional arguments:
  -h, --help            show this help message and exit
  --input INPUT         OSM dump (xml) input filename.
  --output OUTPUT       Taxonomy tree (json) filename.
  --threshold THRESHOLD
                        Minimum frequency threshold per tag.
  --blacklist BLACKLIST
                        (txt) file with tags to ignore (one per line, as seen on OSM).
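
To make the --threshold and --blacklist options more concrete, here is a minimal sketch of how tag frequencies could be counted from an OSM XML stream and then filtered. It is not the code in generate_taxonomy.py. The function names (load_blacklist, count_tags, apply_threshold) are hypothetical, and two behaviors are assumptions: a blacklist entry is taken to suppress a tag when it matches either the key or the value, and the threshold is taken to be inclusive.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def load_blacklist(lines):
    """Return the set of non-empty, stripped entries (one tag per line)."""
    return {line.strip() for line in lines if line.strip()}


def count_tags(osm_source, blacklist=frozenset()):
    """Count (key, value) pairs from <tag k="..." v="..."/> elements.

    osm_source may be a filename or a binary file-like object.
    """
    counts = Counter()
    for _event, elem in ET.iterparse(osm_source, events=("end",)):
        if elem.tag == "tag":
            key, value = elem.get("k"), elem.get("v")
            # Assumption: an entry matching the key OR the value suppresses the pair.
            if key not in blacklist and value not in blacklist:
                counts[(key, value)] += 1
        elif elem.tag in ("node", "way", "relation"):
            # Child tags were already counted, so release the element's memory.
            elem.clear()
    return counts


def apply_threshold(counts, threshold):
    """Keep only pairs seen at least `threshold` times (inclusive, by assumption)."""
    return {pair: n for pair, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = b"""<osm>
      <node id="1"><tag k="building" v="house"/><tag k="source" v="survey"/></node>
      <node id="2"><tag k="building" v="house"/></node>
      <way id="3"><tag k="highway" v="service"/></way>
    </osm>"""
    counts = count_tags(io.BytesIO(sample), load_blacklist(["source\n"]))
    print(apply_threshold(counts, 2))
```

Streaming with iterparse keeps memory use bounded on large dumps, which matters for regional extracts with millions of tagged elements.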

Example

$ python generate_taxonomy.py --input data/osm_example.osm --threshold 10 --output example_tree.json --blacklist data/blacklist_example.txt
Loading additional terms to ignore (from file data/blacklist_example.txt)...
  the following tags will be ignored: [...]
Loading OSM (xml) file data/osm_example.osm...
   100% |████████████████████████████████|   37.55MB/s eta 00:00:00
Setting minimum threshold=10...
   100% |████████████████████████████████|   1865/1865 [00:00<00:00, 909367.24it/s]
Generating taxonomy tree...
node industrial --> [industrial__landuse (under landuse) & industrial__building (under building)]
removed node industrial
****************************************************************************************************
...
├── building
│   ├── house
│   └── industrial__building
├── highway
│   ├── residential
│   ├── service
│   │   ├── driveway
│   │   └── parking_aisle
├── landuse
│   └── industrial__landuse
├── natural
│   ├── coastline
│   └── tree
...
Saving to file: example_tree.json
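
The log lines about node industrial in the example above show a name that occurs under two parents (landuse and building) being split into industrial__landuse and industrial__building. The sketch below reproduces only that naming pattern on a toy mapping. The function split_ambiguous and its input shape (a dict from parent name to list of child names) are hypothetical, and the rule the tool actually applies may differ.

```python
from collections import defaultdict


def split_ambiguous(children_by_parent):
    """Rename any child that appears under several parents to '<child>__<parent>'."""
    # Record every parent under which each child name occurs.
    parents_of = defaultdict(set)
    for parent, children in children_by_parent.items():
        for child in children:
            parents_of[child].add(parent)

    # Rebuild the mapping, suffixing only the ambiguous names.
    result = {}
    for parent, children in children_by_parent.items():
        result[parent] = sorted(
            f"{child}__{parent}" if len(parents_of[child]) > 1 else child
            for child in children
        )
    return result


if __name__ == "__main__":
    toy = {
        "building": ["house", "industrial"],
        "landuse": ["industrial"],
    }
    print(split_ambiguous(toy))
```

On the toy input, house stays unchanged because it has a single parent, while industrial is replaced by one suffixed node per parent, matching the shape of the tree printed above.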

Cite this work

If you would like to cite this work in a paper or a presentation, the following is recommended (BibTeX entry):

@inproceedings{shbita2024automatically,
  title={Automatically Constructing Geospatial Feature Taxonomies from OpenStreetMap Data},
  author={Shbita, Basel and Knoblock, Craig A},
  booktitle={2024 IEEE 18th International Conference on Semantic Computing (ICSC)},
  pages={208--211},
  year={2024},
  organization={IEEE}
}

License

This repository is licensed under the MIT License.

Other Repository Contents

This repository includes the following files:

  • data/osm_example.osm: example OpenStreetMap (OSM) dump file.
  • data/blacklist_example.txt: example text file listing tags to ignore (one per line).
  • data/california_taxonomy.202303.txt: Textual representation of the taxonomy generated from the California .osm dump from March 2023 (based on approximately 10 million tagged instances).
  • data/greece_taxonomy.202303.txt: Textual representation of the taxonomy generated from the Greece .osm snapshot from March 2023 (based on approximately 2 million tagged instances).
