NYC OpenStreetMap

This project wrangles OpenStreetMap data for New York City. The map area covers Manhattan, New York, NY, USA. It is my submission for the Udacity Data Analyst Nanodegree project "Wrangle OpenStreetMap Data".

Introduction

In this report, I wrangle the OpenStreetMap data for Manhattan, New York, United States.

First, I audit the dataset to find problems that need to be fixed. Next, I use SQL queries to obtain an overview of the dataset. Last, I suggest some ideas for further improving and analyzing the dataset.

Map Area

New York (Manhattan), New York, United States

I obtained a custom extract covering the Manhattan borough of New York City through Mapzen. I chose this area because I live in Jersey City and regularly visit NYC on weekends. I would like to see whether investigating the OpenStreetMap data reveals some interesting facts about the city I love.

Overview Statistics of the Dataset

File Size

File Name         File Size (MB)
NYC.osm           477.565
manhattan.db      280.372
nodes.csv         169.271
nodes_tags.csv    10.277
ways.csv          21.063
ways_tags.csv     64.331
ways_nodes.csv    54.527
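
A table like this can be produced with a few lines of Python. The following is a minimal sketch, not the script used for this report; it assumes 1 MB = 1,000,000 bytes, and the helper name `file_sizes_mb` is made up for illustration.

```python
import os

def file_sizes_mb(paths):
    """Return {path: size in MB}, taking 1 MB = 1,000,000 bytes (an assumption)."""
    return {p: round(os.path.getsize(p) / 1e6, 3) for p in paths}

if __name__ == "__main__":
    # File names are taken from the table above; files that are missing are skipped.
    names = ["NYC.osm", "manhattan.db", "nodes.csv", "nodes_tags.csv",
             "ways.csv", "ways_tags.csv", "ways_nodes.csv"]
    existing = [n for n in names if os.path.exists(n)]
    for name, size in file_sizes_mb(existing).items():
        print(f"{name}\t{size}")
```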

Number of Unique Users: 2163

Number of Nodes: 1884748

Number of Ways: 320048

Number of Subway Stations: 378
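
Counts like the users, nodes, and ways above can be obtained with simple SQL queries. The sketch below is one possible way to do it, assuming tables named `nodes` and `ways` (matching the CSV file names) that each carry a `uid` column; the column name is an assumption, and the demo rows are made up. The subway station count is left out because the tag filter used for it is not stated here.

```python
import sqlite3

def overview_counts(conn):
    """Count nodes, ways, and distinct contributors in an OSM SQLite database."""
    cur = conn.cursor()
    n_nodes = cur.execute("SELECT COUNT(*) FROM nodes").fetchone()[0]
    n_ways = cur.execute("SELECT COUNT(*) FROM ways").fetchone()[0]
    # A user may have edited both nodes and ways, so combine before DISTINCT.
    n_users = cur.execute(
        "SELECT COUNT(DISTINCT uid) FROM "
        "(SELECT uid FROM nodes UNION ALL SELECT uid FROM ways)"
    ).fetchone()[0]
    return {"users": n_users, "nodes": n_nodes, "ways": n_ways}

def demo_connection():
    """Tiny in-memory database standing in for manhattan.db (made-up rows)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE nodes (id INTEGER PRIMARY KEY, uid INTEGER);
        CREATE TABLE ways  (id INTEGER PRIMARY KEY, uid INTEGER);
        INSERT INTO nodes VALUES (1, 10), (2, 10), (3, 11);
        INSERT INTO ways  VALUES (100, 11), (101, 12);
    """)
    return conn

print(overview_counts(demo_connection()))
# prints {'users': 3, 'nodes': 3, 'ways': 2}
```

Against the real database, `sqlite3.connect("manhattan.db")` would replace `demo_connection()`.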

Top 10 cuisines

    Cuisine Types    Num
0   italian          134
1   pizza            88
2   american         85
3   mexican          79
4   chinese          62
5   japanese         46
6   french           44
7   indian           43
8   thai             43
9   burger           42
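
A ranking like this can come from a grouped count over the node tags. The sketch below assumes a `nodes_tags` table with `id`, `key`, `value`, and `type` columns and counts every node tagged `cuisine`; both the schema and the lack of a restaurant filter are assumptions, so the exact query behind the table above may differ. The demo rows are made up.

```python
import sqlite3

def top_tag_values(conn, key, limit=10):
    """Return the most frequent values of one tag key among node tags."""
    query = """
        SELECT value, COUNT(*) AS num
        FROM nodes_tags
        WHERE key = ?
        GROUP BY value
        ORDER BY num DESC, value ASC
        LIMIT ?
    """
    return conn.execute(query, (key, limit)).fetchall()

# Made-up rows in an in-memory database, standing in for manhattan.db.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE nodes_tags (id INTEGER, key TEXT, value TEXT, type TEXT)")
conn.executemany(
    "INSERT INTO nodes_tags VALUES (?, ?, ?, ?)",
    [(1, "cuisine", "italian", "regular"),
     (2, "cuisine", "italian", "regular"),
     (3, "cuisine", "pizza", "regular"),
     (4, "amenity", "cafe", "regular")],
)
print(top_tag_values(conn, "cuisine"))
# prints [('italian', 2), ('pizza', 1)]
```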

Top 10 cafes

    Cafe Shop Names               Num
0   Starbucks                     114
1   Dunkin' Donuts                33
2   Starbucks Coffee              24
3   Le Pain Quotidien             13
4   Cafe Grumpy                   4
5   Dunkin Donuts                 4
6   Piccolo Cafe                  4
7   Pinkberry                     4
8   Pret a Manger                 4
9   The Coffee Bean & Tea Leaf    4
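
The table lists "Starbucks" and "Starbucks Coffee", and "Dunkin' Donuts" and "Dunkin Donuts", as separate rows. The sketch below shows one way such spelling variants could be merged before counting. This step is not part of the cleaning done in this project; the `CANONICAL` mapping is hypothetical and would need to be built by hand, and the input rows are copied from the table above.

```python
from collections import Counter

# Hypothetical mapping from a spelling variant to the name it should merge into.
CANONICAL = {
    "Starbucks Coffee": "Starbucks",
    "Dunkin Donuts": "Dunkin' Donuts",
}

def merge_name_counts(rows):
    """Sum the counts of rows whose names map to the same canonical name."""
    merged = Counter()
    for name, num in rows:
        merged[CANONICAL.get(name, name)] += num
    return merged.most_common()

rows = [("Starbucks", 114), ("Dunkin' Donuts", 33), ("Starbucks Coffee", 24),
        ("Le Pain Quotidien", 13), ("Dunkin Donuts", 4)]
print(merge_name_counts(rows))
# prints [('Starbucks', 138), ("Dunkin' Donuts", 37), ('Le Pain Quotidien', 13)]
```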

Other Ideas about the Datasets

User Ratings

One crucial piece of information missing from the dataset is ratings of places. Adding a node tag with user ratings could help users answer questions such as "What are some of the best restaurants in town?" or "Which doctor in my neighborhood should I go to?"

I can think of two ways to gather this rating information:

  1. User contribution. This is easy to implement, but the number of active contributors to this OpenStreetMap data is low, so the ratings would not have a sample size large enough to be representative.
  2. Aggregation from other web sources. This approach could yield ratings faster and more accurately than the first. The problem is obtaining permission from those sources to use their rating data, especially since they are probably direct or indirect competitors of OpenStreetMap.

Conclusion

This analysis of the OpenStreetMap Manhattan extract helped me dig into the problems and inconsistencies of the OpenStreetMap data. After cleaning the zip codes, address types, and phone numbers, I imported the dataset into a SQL database for further exploration. I obtained some statistics and answered some questions using SQL queries, but I also found questions that could not be answered without incorporating user ratings into the dataset.
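
As an illustration of the zip code and phone number cleaning mentioned above, here is a minimal sketch. It is not the exact cleaning code from this project: the function names are made up, and the target formats (a bare 5-digit ZIP code and `+1-XXX-XXX-XXXX` for phone numbers) are assumptions.

```python
import re

# Five digits, optionally followed by a ZIP+4 suffix that is discarded.
ZIP_RE = re.compile(r"\b(\d{5})(?:-\d{4})?\b")

def clean_zip(raw):
    """Return the first 5-digit ZIP code found in raw, or None."""
    match = ZIP_RE.search(raw)
    return match.group(1) if match else None

def clean_phone(raw):
    """Format a US phone number as +1-XXX-XXX-XXXX, or return None."""
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]          # drop the country code
    if len(digits) != 10:
        return None
    return f"+1-{digits[:3]}-{digits[3:6]}-{digits[6:]}"

print(clean_zip("NY 10001-1234"))     # prints 10001
print(clean_phone("(212) 555-0100"))  # prints +1-212-555-0100
```

Returning `None` for values that do not fit the expected pattern keeps them visible for manual review instead of silently guessing a correction.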

I really liked this project. If all Udacians contribute their cleaned data and other ideas for improving the dataset back to OpenStreetMap, I believe it will make OpenStreetMap cleaner and more popular.
