roaddetections's Introduction

Introduction

Bing Maps is releasing mined roads around the world. We have detected 48.9M km of roads in total and 1,165K km of roads missing from OpenStreetMap (OSM). Mining was performed on Bing Maps imagery from 2020 to 2022, including Maxar and Airbus imagery. The data is freely available for download and use under the Open Data Commons Open Database License (ODbL).

Data

Mining status

Date         Region             All ML-derived roads ('000 km)  ML-derived roads missing from OSM ('000 km)
20 May 2020  United States      9,308                           818
21 Mar 2021  South America      4,480                           98
21 Jan 2022  Caribbean Islands  232                             5
03 Mar 2022  Middle East        3,444                           84
05 Apr 2022  Central Asia       1,204                           28
18 Apr 2022  Northern Africa    1,077                           24
28 Apr 2022  Western Africa     982                             32
28 Apr 2022  Central Africa     324                             6
12 May 2022  Eastern Africa     1,151                           31
12 May 2022  Southern Africa    1,506                           40
08 Jun 2022  Europe             10,212                          N/A
03 Jul 2022  Oceania            1,947                           N/A
27 Jul 2022  Central America    1,376                           N/A
03 Aug 2022  Canada             1,832                           N/A
13 Aug 2022  South Asia         3,723                           N/A
12 Sep 2022  Southeastern Asia  2,744                           N/A
19 Sep 2022  North Asia         2,259                           N/A
27 Feb 2023  Japan              1,105                           N/A
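
As a quick consistency check, summing the per-region figures in the mining-status table reproduces the headline totals. The sketch below simply copies the table into a list (the variable and function names are illustrative) and treats N/A as "no figure published".

```python
# Per-region figures from the mining-status table, in thousands of km.
# None marks regions with no published "missing from OSM" figure (N/A).
MINING_STATUS = [
    ("United States", 9308, 818),
    ("South America", 4480, 98),
    ("Caribbean Islands", 232, 5),
    ("Middle East", 3444, 84),
    ("Central Asia", 1204, 28),
    ("Northern Africa", 1077, 24),
    ("Western Africa", 982, 32),
    ("Central Africa", 324, 6),
    ("Eastern Africa", 1151, 31),
    ("Southern Africa", 1506, 40),
    ("Europe", 10212, None),
    ("Oceania", 1947, None),
    ("Central America", 1376, None),
    ("Canada", 1832, None),
    ("South Asia", 3723, None),
    ("Southeastern Asia", 2744, None),
    ("North Asia", 2259, None),
    ("Japan", 1105, None),
]


def totals(rows):
    """Sum all-roads and missing-from-OSM lengths, skipping N/A entries."""
    all_roads = sum(all_km for _, all_km, _ in rows)
    missing = sum(miss_km for _, _, miss_km in rows if miss_km is not None)
    return all_roads, missing


all_roads, missing = totals(MINING_STATUS)
print(f"All roads: {all_roads:,} thousand km (~{all_roads / 1000:.1f}M km)")
print(f"Missing from OSM: {missing:,} thousand km")
```

This gives 48,906 thousand km (about 48.9M km) for all roads, matching the introduction, and 1,166 thousand km for roads missing from OSM. The one-unit difference from the stated 1,165K km is presumably due to rounding of the per-region figures.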

FAQ

What is the GeoJSON format?

GeoJSON is a format for encoding a variety of geographic data structures. For in-depth documentation and tutorials, refer to the GeoJSON blog.
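
For road data, the relevant structure is a Feature whose geometry is a LineString: an array of positions, each given as [longitude, latitude] per the GeoJSON specification (RFC 7946). A minimal sketch of reading such a feature and measuring its length with the haversine formula follows; the spherical Earth radius and the function names are assumptions for illustration, not necessarily how the published kilometre figures were computed.

```python
import json
import math

# Mean Earth radius in km; a spherical approximation chosen for this sketch.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lon1, lat1, lon2, lat2):
    """Great-circle distance in km between two lon/lat points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def linestring_length_km(feature):
    """Length of a GeoJSON LineString Feature; positions are [lon, lat(, alt)]."""
    geometry = feature["geometry"]
    if geometry["type"] != "LineString":
        raise ValueError(f"expected LineString, got {geometry['type']}")
    coords = geometry["coordinates"]
    # Sum the distance between each consecutive pair of positions.
    return sum(
        haversine_km(lon1, lat1, lon2, lat2)
        for (lon1, lat1, *_), (lon2, lat2, *_) in zip(coords, coords[1:])
    )


road = json.loads(
    '{"type": "Feature", "properties": {},'
    ' "geometry": {"type": "LineString", "coordinates": [[0.0, 0.0], [0.0, 1.0]]}}'
)
print(f"{linestring_length_km(road):.2f} km")
```

A segment spanning one degree of latitude comes out at about 111.19 km. Swapping the coordinate order (latitude first) is a common mistake that silently produces wrong lengths.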

Data generation details:

Road extraction is done in four stages (the full dataset went through the first two stages; the set of roads missing from OSM went through all four):

  1. Semantic Segmentation - Recognizing road pixels in the aerial image using a Convolutional Neural Network (CNN).
  2. Geometry Generation - A series of algorithms and processes that transform the output of semantic segmentation into road geometries:
    • Image postprocessing
    • Thinning
    • Connectivity improvement
    • Graph construction
    • Finalizing road shapes and network quality
    • Stitching road GeoJSONs between neighboring images where needed
  3. Conflation & Cutting - Excluding roads and parts of roads that already exist in the road network (OSM).
  4. Classification - A classifier that filters out low-confidence roads and predicts a road type.
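
The README does not describe the exact thinning or graph-construction algorithms. To illustrate what those two sub-steps do, the sketch below swaps in the classic Zhang-Suen thinning algorithm (a stand-in, not necessarily the authors' method) to reduce a binary road mask to a one-pixel-wide skeleton, then labels skeleton pixels with one neighbour as endpoints and three or more as junctions, which are natural candidates for graph nodes. Masks are plain lists of 0/1 rows; the function names are hypothetical.

```python
def zhang_suen_thin(mask):
    """Thin a binary road mask (list of 0/1 rows) to a 1-pixel-wide skeleton.

    Border pixels are never examined, so the mask should have a 0-valued border.
    """
    img = [list(row) for row in mask]
    rows, cols = len(img), len(img[0])

    def neighbours(r, c):
        # P2..P9: clockwise, starting with the pixel directly above.
        return [img[r - 1][c], img[r - 1][c + 1], img[r][c + 1], img[r + 1][c + 1],
                img[r + 1][c], img[r + 1][c - 1], img[r][c - 1], img[r - 1][c - 1]]

    changed = True
    while changed:
        changed = False
        for step in (0, 1):
            to_clear = []
            for r in range(1, rows - 1):
                for c in range(1, cols - 1):
                    if img[r][c] != 1:
                        continue
                    p = neighbours(r, c)
                    p2, _, p4, _, p6, _, p8, _ = p
                    b = sum(p)  # number of foreground neighbours
                    a = sum(p[i] == 0 and p[(i + 1) % 8] == 1 for i in range(8))  # 0->1 transitions
                    if step == 0:
                        side_ok = p2 * p4 * p6 == 0 and p4 * p6 * p8 == 0
                    else:
                        side_ok = p2 * p4 * p8 == 0 and p2 * p6 * p8 == 0
                    if 2 <= b <= 6 and a == 1 and side_ok:
                        to_clear.append((r, c))
            for r, c in to_clear:  # delete only after the full pass, as the algorithm requires
                img[r][c] = 0
            changed = changed or bool(to_clear)
    return img


def skeleton_nodes(skel):
    """Return (endpoints, junctions): skeleton pixels with 1 or >= 3 skeleton neighbours."""
    rows, cols = len(skel), len(skel[0])
    endpoints, junctions = [], []
    for r in range(rows):
        for c in range(cols):
            if skel[r][c] != 1:
                continue
            n = sum(skel[r + dr][c + dc]
                    for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                    if (dr or dc) and 0 <= r + dr < rows and 0 <= c + dc < cols)
            if n == 1:
                endpoints.append((r, c))
            elif n >= 3:
                junctions.append((r, c))
    return endpoints, junctions


# A 3-pixel-thick horizontal "road" in a 7x12 mask with a 0-valued border.
mask = [[1 if 2 <= r <= 4 and 2 <= c <= 9 else 0 for c in range(12)] for r in range(7)]
skel = zhang_suen_thin(mask)
print(sorted((r, c) for r in range(7) for c in range(12) if skel[r][c]))
print(skeleton_nodes(skel))
```

Here the thick bar is reduced to a single line along its middle row with two endpoints and no junctions; the line is slightly shorter than the bar because thinning erodes the ends. A production pipeline would more likely use an optimized library implementation on NumPy arrays and then trace the pixel paths between nodes into polylines.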

Neural network architecture and dataset

Our network is based on U-Net and ResNet, following these papers: U-Net (https://arxiv.org/abs/1505.04597), ResNet (https://arxiv.org/pdf/1512.03385.pdf) and Res U-Net (https://arxiv.org/pdf/1711.10684.pdf). The model was trained on 512x512 images. Because it is fully convolutional, it can process images of any size that is divisible by 64, constrained only by GPU memory (1088x1088 in our case). The training set consists of 20,000 labeled images. The majority of the satellite images cover diverse areas around the world. To make the set representative, we enriched it with samples from varied terrain, including mountains, glaciers, forests, deserts, beaches and coasts. Images in the set are 1088x1088 pixels at 100 cm/pixel resolution. Training was done with the Keras toolkit.
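
The "divisible by 64" constraint is presumably because the encoder downsamples by a total factor of 64 (for example, six stride-2 stages), so each spatial dimension must survive repeated halving and be restored exactly by the decoder; that architectural detail is an assumption, not stated above. A minimal sketch of checking a tile size and computing the zero-padding needed to make an arbitrary image acceptable (helper names are hypothetical):

```python
def padded_size(n, multiple=64):
    """Smallest size >= n that is divisible by `multiple` (ceiling division)."""
    return -(-n // multiple) * multiple


def pad_amounts(height, width, multiple=64):
    """(bottom, right) zero-padding needed so an image fits the model."""
    return padded_size(height, multiple) - height, padded_size(width, multiple) - width


for side in (512, 1000, 1088):
    print(side, side % 64 == 0, padded_size(side))
```

Both sizes mentioned above pass as-is (512 = 8 x 64, 1088 = 17 x 64), while a 1000-pixel side would need padding to 1024; predictions in the padded margin are then cropped away.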

Metrics

We measure intermediate-stage metrics to track the performance of our models. The pixel metric measures the performance of the Convolutional Neural Network, and the APLS (Average Path Length Similarity) metric measures overall connectivity after the geometry generation stage.

Metric  Precision  Recall
Pixel   85.24%     82.81%
APLS    87.53%     79.33%
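
The README does not define how precision and recall are computed for each metric. The sketch below assumes pixel precision and recall are computed over binary road masks, and implements a simplified one-directional APLS in the spirit of the SpaceNet definition: for every pair of nodes, compare the shortest-path length L in the reference graph with L' in the other graph, penalize min(1, |L - L'| / L) (a missing route costs 1), and report 1 minus the mean penalty. Unlike the full metric, it assumes both graphs already share node IDs. Reading the proposal-to-truth direction as "precision" and truth-to-proposal as "recall" is our guess, and all names here are hypothetical.

```python
import heapq


def pixel_precision_recall(pred, truth):
    """Pixel-wise precision and recall of a predicted 0/1 road mask vs. ground truth."""
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1
            elif p:
                fp += 1
            elif t:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def _shortest_paths(graph, source):
    """Dijkstra over an adjacency dict {node: {neighbour: length}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            if d + w < dist.get(v, float("inf")):
                dist[v] = d + w
                heapq.heappush(heap, (d + w, v))
    return dist


def apls_one_way(reference, candidate):
    """Simplified APLS: 1 - mean over node pairs of min(1, |L - L'| / L).

    Assumes both graphs already share node IDs; the full metric first snaps
    control points from one graph onto the other. Missing routes cost 1.
    """
    penalties = []
    for a in reference:
        ref_len = _shortest_paths(reference, a)
        cand_len = _shortest_paths(candidate, a) if a in candidate else {}
        for b, length in ref_len.items():
            if b == a:
                continue
            other = cand_len.get(b)
            penalties.append(1.0 if other is None else min(1.0, abs(length - other) / length))
    return 1.0 - sum(penalties) / len(penalties) if penalties else 0.0


truth = {"A": {"B": 1.0}, "B": {"A": 1.0, "C": 1.0}, "C": {"B": 1.0}}
proposal = {"A": {"B": 1.0}, "B": {"A": 1.0}}  # the B-C segment was missed
print(pixel_precision_recall([[1, 1, 0], [0, 1, 0]], [[1, 0, 0], [0, 1, 1]]))
print(apls_one_way(truth, proposal), apls_one_way(proposal, truth))
```

In the toy example, the truth-to-proposal score is 1/3 because the one missed segment breaks four of the six routes, while proposal-to-truth is 1.0 because every proposed route exists in the truth. This illustrates why a connectivity metric can punish a small gap far more than a pixel metric would.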

Data Vintage

The vintage of the roads depends on the vintage of the underlying imagery. Because Bing Imagery is a composite of multiple sources it is difficult to know the exact dates for individual pieces of data.

How good is the data?

The OSM-missing data went through a final classifier to ensure that precision is at least 95% (90% for the USA for now, to be raised to 95% in 2022). After the classifier filters out potentially bad roads, we re-measure precision and make sure it is at least 95% before releasing results.
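
The README does not say how the classifier's cut-off is chosen. One common way to enforce a precision floor, sketched here as an assumption, is to score a manually reviewed validation sample and pick the lowest confidence threshold whose kept roads reach the target precision; re-measuring on a fresh sample then plays the role of the final check described above. The data and function names below are hypothetical.

```python
def precision_at(scores, labels, threshold):
    """Precision of the roads kept when keeping every score >= threshold."""
    kept = [label for score, label in zip(scores, labels) if score >= threshold]
    return sum(kept) / len(kept) if kept else None


def lowest_threshold_for_precision(scores, labels, target=0.95):
    """Smallest candidate threshold whose kept set reaches `target` precision.

    Scanning from the lowest threshold upward keeps as many roads as possible.
    Returns None if no threshold reaches the target.
    """
    for threshold in sorted(set(scores)):
        precision = precision_at(scores, labels, threshold)
        if precision is not None and precision >= target:
            return threshold
    return None


# Hypothetical validation data: classifier scores and 1/0 labels from manual review.
scores = [0.95, 0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 1, 1, 0, 0]
print(lowest_threshold_for_precision(scores, labels))
```

Here the threshold is 0.7: keeping everything scored 0.5 or 0.6 and above yields 67% and 80% precision respectively, while at 0.7 all four kept roads are correct. Choosing the lowest passing threshold trades nothing on precision while maximizing the number of released roads.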

Why is the data being released?

Microsoft has a continued interest in supporting a thriving OpenStreetMap ecosystem.

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.opensource.microsoft.com.

When you submit a pull request, a CLA bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., status check, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information, see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Legal Notices

Microsoft, Windows, Microsoft Azure and/or other Microsoft products and services referenced in the documentation may be either trademarks or registered trademarks of Microsoft in the United States and/or other countries. The licenses for this project do not grant you rights to use any Microsoft names, logos, or trademarks. Microsoft's general trademark guidelines can be found at http://go.microsoft.com/fwlink/?LinkID=254653.

Privacy information can be found at https://privacy.microsoft.com/en-us/

Microsoft and any contributors reserve all other rights, whether under their respective copyrights, patents, or trademarks, whether by implication, estoppel or otherwise.

roaddetections's People

Contributors

microsoft-github-operations[bot], microsoftopensource, missingroadsdiscoverymicrosoft, usmissingroadsdiscovery

roaddetections's Issues

Consider partitioning countries at the file level rather than marking countries in a TSV row

Oceania-Full.zip is 282 MB at the moment. If its GeoJSON file were partitioned by country and sorted, the ZIP file would be 244 MB instead. This would let people download the ZIP file faster, and they would also use less space when picking out the countries they're interested in. The GeoJSON would open right away in QGIS and other GIS software without first needing to ETL the TSV.

$ vi a.sh
sort AUS.geojson > AUS.sorted.geojson
sort NZL.geojson > NZL.sorted.geojson
sort PNG.geojson > PNG.sorted.geojson
sort VUT.geojson > VUT.sorted.geojson
sort FJI.geojson > FJI.sorted.geojson
sort SLB.geojson > SLB.sorted.geojson
sort TON.geojson > TON.sorted.geojson
sort WSM.geojson > WSM.sorted.geojson
sort FSM.geojson > FSM.sorted.geojson
sort KIR.geojson > KIR.sorted.geojson
sort PLW.geojson > PLW.sorted.geojson
sort MHL.geojson > MHL.sorted.geojson
sort TUV.geojson > TUV.sorted.geojson
sort NRU.geojson > NRU.sorted.geojson
$ xargs -P4 -I% bash -xc '%' < a.sh
$ zip -9 Oceania.sorted.zip \
    AUS.sorted.geojson \
    NZL.sorted.geojson \
    PNG.sorted.geojson \
    VUT.sorted.geojson \
    FJI.sorted.geojson \
    SLB.sorted.geojson \
    TON.sorted.geojson \
    WSM.sorted.geojson \
    FSM.sorted.geojson \
    KIR.sorted.geojson \
    PLW.sorted.geojson \
    MHL.sorted.geojson \
    TUV.sorted.geojson \
    NRU.sorted.geojson

$ unzip -l Oceania.sorted.zip
Archive:  Oceania.sorted.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
1071521607  2023-04-10 18:58   AUS.sorted.geojson
185466598  2023-04-10 18:57   NZL.sorted.geojson
 28007237  2023-04-10 18:57   PNG.sorted.geojson
  6470562  2023-04-10 18:57   VUT.sorted.geojson
  5832797  2023-04-10 18:57   FJI.sorted.geojson
  4423195  2023-04-10 18:57   SLB.sorted.geojson
  1047604  2023-04-10 18:57   TON.sorted.geojson
  1066450  2023-04-10 18:57   WSM.sorted.geojson
   307308  2023-04-10 18:57   FSM.sorted.geojson
   190892  2023-04-10 18:57   KIR.sorted.geojson
   242639  2023-04-10 18:57   PLW.sorted.geojson
   119872  2023-04-10 18:57   MHL.sorted.geojson
    44300  2023-04-10 18:57   TUV.sorted.geojson
    38006  2023-04-10 18:57   NRU.sorted.geojson
---------                     -------
1304779067                     14 files
$ unzip Oceania.sorted.zip NZL.sorted.geojson

For some of the largest datasets, like Canada and Japan, the 3-letter country identifier is redundant since every record in those ZIPs is for the respective country.
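
The proposed split can be sketched in a few lines, assuming each TSV row is a three-letter country code, whitespace, then one GeoJSON Feature, and that rows starting directly with "{" have no country column (as in the USA file). The function names, the "UNK" fallback code and the .geojsonl output extension (for newline-delimited GeoJSON) are hypothetical choices.

```python
from collections import defaultdict
from pathlib import Path


def split_by_country(lines, default_code="UNK"):
    """Group TSV rows by their leading country code.

    Assumes each row is '<ISO3 code><whitespace><GeoJSON Feature>'. Rows that
    start directly with '{' (no country column) are filed under `default_code`.
    """
    groups = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("{"):
            code, feature = default_code, line
        else:
            code, feature = line.split(None, 1)
        groups[code].append(feature)
    return groups


def write_country_files(groups, out_dir):
    """Write one sorted, newline-delimited GeoJSON file per country."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for code, features in groups.items():
        # Sorting mirrors the `sort` step in the proposal, which helped compression there.
        text = "\n".join(sorted(features)) + "\n"
        (out / f"{code}.geojsonl").write_text(text, encoding="utf-8")
```

For example, `write_country_files(split_by_country(open("Oceania-Full.tsv", encoding="utf-8")), "oceania")` would produce one file per country code. This holds a whole region in memory; for the largest regions, streaming rows straight into per-country file handles would be the safer design.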

tsv file issue

It's not a valid GeoJSON file, so I cannot open it in any GIS software.

Is there a simpler way to convert the downloaded TSV file into a valid GeoJSON file?
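
A minimal conversion sketch, assuming each row holds an optional country-code column, whitespace, then one GeoJSON Feature: strip the code and wrap all features in a FeatureCollection, which is a valid GeoJSON document. The function names are hypothetical.

```python
import json


def tsv_to_feature_collection(lines):
    """Build a GeoJSON FeatureCollection from RoadDetections-style TSV rows.

    Assumes each row is an optional country-code column, whitespace, then one
    GeoJSON Feature; rows beginning with '{' are treated as having no code column.
    """
    features = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if not line.startswith("{"):
            line = line.split(None, 1)[1]  # drop the leading country code
        features.append(json.loads(line))
    return {"type": "FeatureCollection", "features": features}


def convert(tsv_path, geojson_path):
    """Read a TSV file and write a single GeoJSON FeatureCollection file."""
    with open(tsv_path, encoding="utf-8") as src:
        collection = tsv_to_feature_collection(src)
    with open(geojson_path, "w", encoding="utf-8") as dst:
        json.dump(collection, dst)
```

For example, `convert("AfricaEast-Full.tsv", "AfricaEast-Full.geojson")`. Because the whole collection is built in memory, very large regions may be better served by writing newline-delimited GeoJSON (one feature per line, code column removed) instead.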

Codes for Geometry Generation

Nice work! Would you be kind enough to share the code for Geometry Generation? I will only use it for academic purposes. Thanks in advance.

geojson issues

I tried to load it into QGIS as is; it wouldn't open. I renamed the file extension to .geojson; it still wouldn't open.

I replaced the leading "AIA" and the whitespace after it on every line with '' and it worked.

I don't know whether these dumps have full regional coverage as expected, though. Can anyone assist?

Road widths

Any chance that the approach used here could generate estimates of additional attributes for road segments? I'm particularly interested in road widths as a starter, which are important for prioritising locations for sustainable transport interventions, e.g. a new cycleway protected from potentially fast-moving motor traffic with light separation (wands, bollards etc).

Many thanks!

Point Geometry in USA data

The USA data here has the following issue:

  • Unlike all other regions' data, of the 54,484,737 GeoJSON entries in the USA data, 204,789 are invalid: they are single-coordinate, zero-length LineString elements rather than Point geometries, which causes issues when trying to process the files.

In addition, there are the following inconsistencies compared to the other region data files:

  1. The region file name is USA.zip, whereas all other region files are called <region>-Full.zip; for example, the East Africa region is AfricaEast-Full.zip.
  2. The file name in the ZIP archive is _USA.tsv, whereas all other region files are called <region>-Full.tsv; for example, the East Africa region is AfricaEast-Full.tsv.
  3. Unlike all other regions' data, the first column in the USA region TSV file does not contain a three-letter country code, for example GBR for Great Britain.

Outwith this, thank you for making this excellent data set available.
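
RFC 7946 requires a LineString to have two or more positions, which is why single-coordinate entries trip up parsers. A minimal cleanup sketch, assuming the features have already been parsed into dicts: rewrite zero-length LineStrings (fewer than two positions, or all positions identical) as Points, and drop empty geometries. Whether to convert or drop such entries is a judgment call; the function name is hypothetical.

```python
def fix_degenerate_linestring(feature):
    """Return the feature with zero-length LineStrings rewritten as Points.

    A LineString is treated as degenerate when it has fewer than two positions
    or all of its positions are identical. Returns None for empty geometries.
    """
    geometry = feature.get("geometry") or {}
    if geometry.get("type") != "LineString":
        return feature
    coords = geometry.get("coordinates") or []
    if not coords:
        return None
    if len(coords) >= 2 and any(c != coords[0] for c in coords[1:]):
        return feature  # a real line with non-zero length
    fixed = dict(feature)  # shallow copy so the input is left untouched
    fixed["geometry"] = {"type": "Point", "coordinates": coords[0]}
    return fixed
```

Applied row by row while streaming the USA file, this keeps every record parseable without changing any genuine road geometry.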

Code for geometry generation from semantic segmentation

Is the code available for the process of geometry generation from semantic segmentation? In other applications I use watershed segmentation and other morphological operations to do the same. I'd be interested to experiment with your approach.

Dolt database import of this data

Hi, I work for DoltHub, the maintainers of Dolt, a version controlled database (think Git and MySQL had a baby).
This is shameless self-promotion, but we took the liberty of importing the data into Dolt and posting it on DoltHub.

By importing to Dolt, I was able to address the following issues:

  1. #6
  2. #11

The data is also easier to query, update, and modify.

My dream would be for you all to move this project to DoltHub, but short of that maybe put a link in your README to the Dolt version?

Image segmentation output

Would it be possible to access the semantic segmentation TIFFs for specific locations? I would like to use them as an input for predicting neighboring objects in other EO segmentation tasks.

Mexico and China data?

Is the data for Mexico and China to be published? I see North America has been broken down into countries, but there are only files for the USA and Canada.

Border cutoffs not accurate

A lot of roads on the Latvian side of the border with Estonia are marked as Estonian.

[Screenshot: ESTLVA]

Most of Monaco's roads are marked as French. Only a very small part of Monaco is marked properly.

[Screenshot: FRAMCO]

Spanish data cuts off 2 km before Gibraltar, leaving almost no data for La Línea de la Concepción.

[Screenshot: ESPGIB]

Roads well inside Spain are marked as Portuguese.

[Screenshot: PTRESP]

Austria, Liechtenstein and Switzerland all have overlapping issues.

[Screenshot: CH-AT]

There is a stepping pattern where the French data overlaps much of western Switzerland.

[Screenshot: FR-CH]

Luxembourg and its neighbors.

[Screenshot: LUX]
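
To flag candidate mislabelled rows like these, one option, assuming you have a boundary polygon for the country from an external source (it is not part of this dataset), is an even-odd ray-casting point-in-polygon test on each road's first vertex. The (code, feature) row shape and the function names are hypothetical, and holes, MultiPolygons and roads that straddle the border are ignored in this sketch.

```python
def point_in_polygon(lon, lat, ring):
    """Even-odd ray-casting test: is (lon, lat) inside the closed ring?

    `ring` is a list of [lon, lat] positions, like a GeoJSON Polygon's outer ring.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i][:2]
        x2, y2 = ring[(i + 1) % n][:2]
        # Only edges that straddle the horizontal line through the point can cross the ray.
        if (y1 > lat) != (y2 > lat):
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def mislabelled_roads(rows, code, ring):
    """Yield features tagged `code` whose first vertex lies outside `ring`."""
    for row_code, feature in rows:
        if row_code != code:
            continue
        lon, lat = feature["geometry"]["coordinates"][0][:2]
        if not point_in_polygon(lon, lat, ring):
            yield feature
```

A real check would use accurate boundary data and probably a geometry library, but the flagged rows give a quick list of where the country assignment disagrees with the boundary.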
