whosonfirst-sources

Where things come from in Who's On First.

A full list of Who's On First sources is available in sources/README.md.

Adding a new source

  1. Create a new {SOURCE}.json file from the template file, where {SOURCE} is the same as the prefix property of the source itself.
  2. Fill out all required properties and optional properties, if available.
  3. Run the Makefile with the make all command.

Source Properties

While a source .json file in the whosonfirst-sources repository does not require all properties listed below, the more information we are able to gather about a source, the better. When adding a new source, please provide as much current, available information about that specific source as possible.

  • "id": A unique numeric integer identifier, typically derived from Brooklyn Integers (integer, required property).

  • "fullname": The full name of the source (string, required property).

  • "name": The user-derived, abbreviated name of a given source (string, required property).

  • "prefix": The user-derived prefix for a given source. This value is typically two to ten characters in length (string, required property).

  • "key": A list of data properties used from the source. Typically left empty (string, optional property).

  • "url": An http link to the source, preferably the homepage (string, optional property).

  • "license": A link to the license or terms of service page, if available, for the source (string, optional property).

  • "license_type": The license or equivalent license type for the source's data (string, optional property).

  • "license_text": A one to two sentence description of what the license allows (string, optional property).

  • "license_text_eng": A one to two sentence description of what the license allows, in English. Used when the license_text is non-English (string, optional property).

  • "src:via": A list of sources used by a source. A list of key/value pairs that includes the source context, source name, link to the source website, and a note about the source (list, optional property). See the template file for an example.

  • "usage_concordance": Represents whether or not this source is used for concordance values (integer, required property). A value of 1 indicates use, 0 indicates no use, and -1 indicates that use is uncertain.

  • "usage_property": Represents whether or not this source is used for property values (integer, required property). A value of 1 indicates use, 0 indicates no use, and -1 indicates that use is uncertain.

  • "usage_geometry": Represents whether or not this source is used for geometries (integer, required property). A value of 1 indicates use, 0 indicates no use, and -1 indicates that use is uncertain.

  • "description": A one to two sentence description of the source (string, optional property).

  • "mz:is_current": Represents whether or not a source is currently in use (integer, optional property). 0 signifies "not current".

  • "mz:associated": Indicates whether a source is associated with work at Mapzen (integer, optional property). 1 signifies "Mapzen associated".

  • "edtf:deprecated": Indicates the date when a source was determined to be invalid or never a "going concern" (string, optional property). Format: YYYY-MM-DD (though these dates can be encoded with any valid EDTF syntax).

  • "edtf:inception": Indicates the date when a source was added to Who's On First (string, required property). Format: YYYY-MM-DD (though these dates can be encoded with any valid EDTF syntax).
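The required properties and usage values listed above can be checked before running make all. Below is a minimal Python sketch, assuming each source file parses to a single JSON object; validate_source is a hypothetical helper, not part of this repository. It checks only the required properties and their types, the usage_* values, and (optionally) that the file name matches the prefix. It does not parse EDTF dates.

```python
import os

# Required properties and their expected JSON types, per the list above.
REQUIRED = {
    "id": int,
    "fullname": str,
    "name": str,
    "prefix": str,
    "usage_concordance": int,
    "usage_property": int,
    "usage_geometry": int,
    "edtf:inception": str,
}

# Allowed values for usage_*: 1 = used, 0 = not used, -1 = uncertain.
USAGE_VALUES = {1, 0, -1}


def validate_source(source, filename=None):
    """Return a list of problems found in one parsed source record."""
    problems = []
    for key, expected in REQUIRED.items():
        if key not in source:
            problems.append(f"missing required property {key!r}")
        # bool is a subclass of int in Python, so reject it explicitly.
        elif not isinstance(source[key], expected) or isinstance(source[key], bool):
            problems.append(f"{key!r} should be {expected.__name__}")
    for key in ("usage_concordance", "usage_property", "usage_geometry"):
        if isinstance(source.get(key), int) and source[key] not in USAGE_VALUES:
            problems.append(f"{key!r} must be 1, 0 or -1")
    # Step 1 of "Adding a new source": the file name should match the prefix.
    if filename is not None and "prefix" in source:
        stem = os.path.splitext(os.path.basename(filename))[0]
        if stem != source["prefix"]:
            problems.append(f"file name {stem!r} does not match prefix {source['prefix']!r}")
    return problems
```

For example, validate_source(json.load(fh), "sources/example.json") returns an empty list when the file passes (the path is illustrative).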

whosonfirst-sources's People

Contributors

botsonfirst, dphiffer, mfogel, nvkelso, riordan, stepps00, thisisaaronland, tomtaylor, vicchi, zbsingleton

whosonfirst-sources's Issues

Reconcile WOF license page with sources page

Reconcile the WOF license page with the WOF sources README file. The information on the sources page is machine-generated, via the makefile, and updated each time a new source is added to the sources repo. The WOF license page, though, is a static page and not updated each time a source is added to Who's On First.

Each time a source is added to the whosonfirst-sources repo, it should also be added to the list of sources in the README file.

What is simplegeo?

I cannot determine what the source called "simplegeo" is because their domain name leads to a GoDaddy parking page.

Steps to replicate:

  1. Click on random item in Who's On First (okay, this won't actually work for you but it's how I started)
  2. Arrive on this personal injury attorney venue's page.
  3. Note that its source is called "simplegeo" and that its source page is on the simplegeo.com domain.
  4. Find a GoDaddy domain parking page.

Indicate if source uses geometry, properties, &/or concordances

Sometimes we include a source just for concordance IDs because the license is otherwise locked down for geometry &/or properties. But right now that's not documented, and the license may make it seem like we shouldn't use it (but concordances alone is fine).

Add a hasc.json source file

That way it can be imported into py-mapzen-whosonfirst-sources, which is used by the Spelunker to expand prefixes into something humans can understand.

Add LocalWiki?

I haven't researched how projects are included here, but it seems LocalWiki would be a good addition to this database. It contains ~100k pages, most of which are places, and almost all of which aren't on Wikipedia. All map data is ODbL and written content & images are CC-BY. We don't have database dumps at the moment, but it's easy to paginate through the API and get whatever you want (e.g. https://localwiki.org/api/v4/maps/)

Standardize properties and values

When preparing a license table for #60, I noticed a few issues in the source json files. File a PR to fix the following:

  • Change any "assumed" to "(assumed)" in the license_type property
  • Change any "equivalent" to "(equivalent)" in the license_type property
  • Add license_type and license_text properties and values to amsgis.json, figov.json, companieshouse.json
  • Update the license_type property in can-vicodc.json (remove bunk dash)
  • Update any "CC-BY" license to "CC BY"
  • We have both "ODC-By v1.0" and "ODC-By, v1.0" license types, prefer "ODC-By, v1.0"
  • Standardize the OGL licenses - we have a mix of commas, dashes, and "license" vs "licence"
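Most of the string changes in the list above are mechanical. A Python sketch that applies them to one license_type value; normalize_license_type is a hypothetical helper. The OGL and can-vicodc fixes are left out because the issue does not say which spelling is preferred.

```python
import re

# Rewrite rules taken from the list above, applied in order.
RULES = [
    # Wrap a bare "assumed"/"equivalent", but leave already-parenthesized ones alone.
    (re.compile(r"(?<!\()\bassumed\b(?!\))"), "(assumed)"),
    (re.compile(r"(?<!\()\bequivalent\b(?!\))"), "(equivalent)"),
    (re.compile(r"\bCC-BY\b"), "CC BY"),
    (re.compile(r"\bODC-By v1\.0"), "ODC-By, v1.0"),
]


def normalize_license_type(value):
    """Apply the standardization rules to one license_type string."""
    for pattern, replacement in RULES:
        value = pattern.sub(replacement, value)
    return value
```

Note that the CC-BY rule also turns "CC-BY-SA" into "CC BY-SA", which matches Creative Commons' own spelling.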

Add source information for upcoming concordance additions

Add source files for any concordances that we decide to import from the upcoming Wikidata work.
Only add them if we decide to import concordances and licensing permits.

List of existing concordance sources:

  • Deutsche Nationalbibliothek (German National Library)
  • Virtual International Authority File
  • Historical Dictionary of Switzerland
  • National Library of Israel
  • STW Thesaurus for Economics
  • Quora
  • Getty Thesaurus of Geographic Names
  • Bibliothèque Nationale de France
  • National Diet Library of Japan
  • International Standard Name Identifier
  • French Collaborative Library Catalog
  • MusicBrainz
  • NUTS
  • National Library of Australia
  • Facebook Places
  • WorldCat's “FAST Linked Data” Authority
  • General Finnish Upper Ontology YSO

Create records for all sources used during the mesoshapes import

The recent import of admin1 and admin2 features included sources from AOTM, as well as additional authoritative sources. Scrub all "meso:source" property values in countries that had new meso features added and create new source files, as needed.

Example:

The county feature of Samba has a "meso:source" value of "EDP", though no EDP.json file is currently in the sources folder.
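Scrubbing meso:source values amounts to comparing them with the prefixes of existing source files. A Python sketch, assuming records are available as WOF property dictionaries and that the comparison should be case-insensitive (the prefixes in this repository appear to be lower case, while the example value is "EDP"). Both helper names are hypothetical.

```python
import os


def source_prefixes(sources_dir):
    """Collect lower-case prefixes (file names without .json) of existing source files."""
    return {
        os.path.splitext(name)[0].lower()
        for name in os.listdir(sources_dir)
        if name.endswith(".json")
    }


def missing_meso_sources(records, existing_prefixes):
    """Return meso:source values that have no matching source file.

    records: iterable of WOF property dictionaries.
    existing_prefixes: lower-case prefixes, e.g. from source_prefixes("sources").
    """
    missing = set()
    for props in records:
        value = props.get("meso:source")
        # Case-insensitive match is an assumption, not a documented rule.
        if value and str(value).lower() not in existing_prefixes:
            missing.add(value)
    return sorted(missing)
```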

Add sources used in new Statoids concordance work

Ensure that we have each source catalogued correctly. See this PR for the new concordance values:

whosonfirst-data/whosonfirst-data#893

Notes:
ITU: Codes assigned by the International Telecommunication Union
GEC: Codes from the U.S. standard GEC
IOC: Codes assigned by the International Olympic Committee. These codes identify the nationality of athletes and teams during Olympic events.
FIFA: Codes assigned by the Fédération Internationale de Football Association
DS: Distinguishing signs of vehicles in international traffic (oval bumper sticker codes)
WMO: Country abbreviations used in weather reports from the World Meteorological Organization
GAUL: Global Administrative Unit Layers from the Food and Agriculture Organization
MARC: MAchine-Readable Cataloging codes from the Library of Congress
Dial: Country code from ITU-T recommendation E.164 (international dialing code), sometimes followed by area code

Deprecate LocalWiki source

This wasn't actually imported into Who's On First and its license causes confusion. Let's mark it as deprecated.

Standardize on unknown instead of missing

We've used two different "sources" to indicate missing &/or unknown information in Who's On First. We should standardize on just one. I propose keeping unknown and marking missing as deprecated, with a related PR in the data repo to toggle them around.

It would also be helpful to report how many records (count and as % of project) and what type of properties they're associated with.

Related: #65.
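The report requested above could be gathered along these lines. A Python sketch that assumes the placeholders appear as values of src:* properties such as src:geom; placeholder_report is a hypothetical name, and records are plain property dictionaries.

```python
from collections import Counter


def placeholder_report(records, placeholders=("missing", "unknown")):
    """Count records per placeholder, their share of all records, and where they appear."""
    records = list(records)
    per_record = Counter()    # placeholder -> number of records using it
    per_property = Counter()  # (placeholder, property name) -> occurrences
    for props in records:
        seen = set()
        for key, value in props.items():
            # Assumption: placeholders are stored as src:* property values.
            if key.startswith("src:") and value in placeholders:
                per_property[(value, key)] += 1
                seen.add(value)
        per_record.update(seen)  # count each record at most once per placeholder
    total = len(records) or 1
    percent = {p: 100.0 * per_record[p] / total for p in placeholders}
    return per_record, percent, per_property
```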

Updates to urbis-adm source license and desc

For this source:

https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/README.md#urbis-adm

  • Set license_text to: Realized by means of Brussels UrbIS®© - Distribution & Copyright CIRB
  • Set license_type to: CC-BY.
  • Modify the description to emphasize that this source excludes data from UrbIS-P&B, which is governed by a different license.

Research:

3.4.1. Rights, licence and copyright
Since 1 April 2013, access to UrbIS products has not only been free, their use is now subject to an Open Data licence.
Access to the UrbIS-P&B product is strictly limited to the administrations of the Brussels-Capital Region. However, these data may be obtained under certain conditions. Access authorisation may be granted by the General Administrator of Heritage Documentation in response to a reasoned request submitted to him or her.
The user agrees to include the BRIC logo (downloadable from the UrbIS-Solution pages of the BRIC website) as well as the following message in any information, application programmes or third-party product it is authorised to transmit to a third party, regardless of the type of carrier used to transmit the data:
« Réalisé avec Brussels UrbIS®© - Distribution & Copyright CIRB »
or
« Verwezenlijkt door middel van Brussels UrbIS®© - Verdeling & Copyright CIBG »
or
« Realized by means of Brussels UrbIS®© - Distribution & Copyright CIRB »

Recast Mapzen as Metazen

Keep the prefix as mz but rename the fullname to Metazen. Long live Mapzen!

    "description": "Mapzen is an open, sustainable, and accessible mapping platform. Our tools let you display, search, and navigate your world.", 
    "fullname": "Mapzen", 

Then also update the description to say "was" instead of "is".

Metazen is the generic WOFism for property prefixes related to Mapzen and original data sourced via Mapzen. Mapzen was an open, sustainable, and accessible mapping platform.

Cross-publish generated sources README content to whosonfirst.org

A condensed version of https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/README.md should probably go here: https://whosonfirst.org/docs/licenses/. Probably requires work in this repo and the www repo.

Proposed condensed text:

For each source, link to the sources repo README info for the full source name, list the license type and usages, and provide a link back to the original data (whatever is in our src json, which is often a general landing page and sometimes a specific download page).
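Each condensed entry needs the full name, license type, usages, and a link back to the data. A Python sketch of how one line could be built from a parsed source record; condensed_entry, the usage labels, and the " | " separator are illustrative choices, not an existing format.

```python
# Order in which usages are listed; labels are illustrative.
USAGE_LABELS = [
    ("usage_concordance", "concordances"),
    ("usage_property", "properties"),
    ("usage_geometry", "geometries"),
]


def condensed_entry(source):
    """Build one plain-text line: full name | license type | usages | link."""
    used = [label for key, label in USAGE_LABELS if source.get(key) == 1]
    unsure = [label for key, label in USAGE_LABELS if source.get(key) == -1]
    usage = "used for " + ", ".join(used) if used else "no confirmed usage"
    if unsure:
        usage += " (unsure: " + ", ".join(unsure) + ")"
    parts = [
        source.get("fullname") or source.get("name", ""),
        source.get("license_type") or "license type not recorded",
        usage,
    ]
    # Link back to the original data, if the source record has one.
    if source.get("url"):
        parts.append(source["url"])
    return " | ".join(parts)
```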

WOE DB airport campuses credit

Looks like the geometries are coming from Alphashapes + Flickr to me. We should indicate that in the source's src:via property.

Indicate original source of data (and via what aggregator)

Right now we have data from Quattroshapes, which actually originates from multiple different sources. Each source needs to be credited, so we need a consistent WOF property to deal with this.

I propose a new property like src:via (was src_via originally) where the src should state the original source, and then we should credit the data aggregator in src:via as well.

Examples:

  • Quattroshapes:
    • The city of San Francisco has a "qs:source" value of "AUS Census" (should just be US Census, oops) and "src:geom" of quattroshapes.
    • Propose that the "src:geom" should be uscensus instead, with "src_via" set to quattroshapes
  • Mesoshapes:
    • The county feature of Samba has a "meso:source" value of "EDP", though no EDP.json file is currently in the sources folder.
    • Propose that the "src:geom" should be eep instead, with "src_via" set to meso

Add better credit text for Flickr

For qs, ys, and zs we list a src:via for Flickr. In the description property for that item we need to add:

This product uses the Flickr API but is not endorsed or certified by Flickr.

Update hasc source file

Update the hasc source file to include a note about the hasc:id concordance value being a "variable" property mapping.

Depending on the placetype, Statoids will either call this the "hasc" code (admin0/admin1) or the "statoid" code (admin2).

Add 'source_data' and 'alt' properties

Like the uscensus.json source file, go through each source file and determine if more than one data file was used from that source. If so, do the following (if necessary):

  • Add a source_data property that lists out the direct URL for each data file.
  • Add an alt property that lists what extras and functions are being used from that source, if being used for alt geometries.

Look at the uscensus file for an example.

Source count followup

We say 146 sources now, but that doesn't include our "source via" sources.

Let's add language and stats that say something like "X total sources (146 direct and X secondary sources)" (eg for Mesoshapes, etc)
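One way to compute that figure is to count every distinct src:via name that is not itself a direct source as a secondary source. A Python sketch; the key holding the name inside a src:via entry is not given here, so name_key defaults to a hypothetical "source_name" and should be set to whatever the template file uses.

```python
def source_count_summary(sources, name_key="source_name"):
    """Return e.g. "X total sources (N direct and M secondary sources)"."""
    sources = list(sources)
    # A src:via name counts as direct if it matches any direct prefix or name.
    direct_names = set()
    for s in sources:
        direct_names.add(s["prefix"].lower())
        direct_names.add(s["name"].lower())
    secondary = set()
    for s in sources:
        for via in s.get("src:via", []):
            # name_key is a hypothetical key; check the template file.
            name = str(via.get(name_key, "")).strip().lower()
            if name and name not in direct_names:
                secondary.add(name)
    n_direct, n_secondary = len(sources), len(secondary)
    return (f"{n_direct + n_secondary} total sources "
            f"({n_direct} direct and {n_secondary} secondary sources)")
```

Counting distinct names means a secondary source credited by several direct sources (Flickr under qs, ys, and zs, for instance) is counted once.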

Port over license info from whosonfirst-data/LICENSE.md

We note the equivalent license "code" (e.g.: CC-BY) and provide a copy of the actual license text in the whosonfirst-data/LICENSE.md file. Let's move that info here (and point to this repo instead of duplicating the info).

For instance:

And for GeoNames:

GeoNames database is licenced as follows:

The GeoNames geographical database is available for download free of
charge under a creative commons attribution license. 

Follow-up standardizing, consistent info with wof-data records

  • Source is named ar-caba, but WOF uses arg-caba in records.
  • Untangle:
    • mz vs mapzen vs whosonfirst (not to be confused with wof:)
    • qs_pg vs quattroshapes_pg
    • gn vs geonames
    • tmpgis vs tmpgov
  • Source is incorrectly named ico.json, should be ioc.json
  • Add SIJ source, or use it as src:via for meso and update wof-data records
  • Add URLs to any src:via properties

Add license table

Add a license table for CC equivalences and allowed / disallowed use-cases. This may need new properties in the source JSONs, too, so it can be auto-generated like the big README.md.

Example:

license type | license equivalent | details
OGL, British Columbia | CC0 | you can do:...
OGL Flanders | CC0 | you can do:...
CC0 | CC0 | you can do:…
Public Domain | CC0 | you can do:…
CC-BY | CC-BY | you can do:...
restricted | concordance only | you can only...
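If the table is auto-generated like the big README.md, it could be built by grouping source records. A Python sketch; license_equivalent is a hypothetical new property (the issue suggests new properties may be needed), and the free-text details column is left for a person to write.

```python
from collections import defaultdict


def license_table(sources, equivalent_key="license_equivalent"):
    """Group source prefixes by (license_type, equivalent) as plain-text rows."""
    rows = defaultdict(list)
    for s in sources:
        ltype = s.get("license_type") or "unknown"
        # equivalent_key is a hypothetical property; "?" marks it as not yet filled in.
        equiv = s.get(equivalent_key) or "?"
        rows[(ltype, equiv)].append(s["prefix"])
    lines = ["license type | license equivalent | sources"]
    for (ltype, equiv), prefixes in sorted(rows.items()):
        lines.append(f"{ltype} | {equiv} | {', '.join(sorted(prefixes))}")
    return "\n".join(lines)
```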

Create new Statoids source

Add email permission statement from Gwillim Law to the hasc.json file:

Yes. As far as I'm concerned, HASC codes are in the public domain - to encourage people or organizations to use them for data communication.

Zetashapes providing some shady-af geometries

I appreciate that neighborhood-level gazetteer information that's not restrictively-licensed (scowling at you, Zillow) is hard to come by, but I've come across some examples in Zetashapes that throw the entire credibility of the scale level into question. I might not have noticed at all if whosonfirst didn't claim that this monstrosity is my home neighborhood:

Zetashapes does not appear to be accepting edits. @blackmad is a busy guy, so I don't have a lot of hope for the todo list getting addressed. Is there any downside to Mapzen adopting the zetashapes backend, at least insofar as allowing users to fix nonsense like this?

Don't put Markdown in description fields

This breaks code that needs to load the spec dynamically. For example:

    $> cd go-whosonfirst-sources
    $> make test
    if test -d pkg; then rm -rf pkg; fi
    if test -d src/github.com/whosonfirst/go-whosonfirst-sources; then rm -rf src/github.com/whosonfirst/go-whosonfirst-sources; fi
    mkdir -p src/github.com/whosonfirst/go-whosonfirst-sources/sources
    cp sources/*.go src/github.com/whosonfirst/go-whosonfirst-sources/sources
    cp *.go src/github.com/whosonfirst/go-whosonfirst-sources/
    # github.com/whosonfirst/go-whosonfirst-sources/sources
    src/github.com/whosonfirst/go-whosonfirst-sources/sources/spec.go:6: syntax error: unexpected re after top level declaration
    src/github.com/whosonfirst/go-whosonfirst-sources/sources/spec.go:6: missing '

This is caused by typos like "...This means that you`re free..." and stuff like "...and three-character feature code with a `-` deliminator (like `CC-az1`)..."

Also, look at the difference between the raw text of the example above and the way it gets formatted by GitHub.

The values of the .json files should be plain text with no additional (or specific) formatting.
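A check like the following could catch these before they reach go-whosonfirst-sources. The spec.go error above is consistent with a backtick ending a Go raw string literal, so this Python sketch flags backticks along with a few other common Markdown markers. The pattern list and markdown_problems are assumptions, not an existing tool.

```python
import re

# Patterns that suggest Markdown; the backtick is the one that breaks spec.go.
MARKDOWN_PATTERNS = [
    (re.compile(r"`"), "backtick"),
    (re.compile(r"\*\*|__"), "bold marker"),
    (re.compile(r"\[[^\]]+\]\([^)]+\)"), "Markdown link"),
    (re.compile(r"^\s*#", re.MULTILINE), "heading"),
]


def markdown_problems(source, fields=("description",)):
    """Return (field, problem) pairs for Markdown-looking text in the given fields."""
    found = []
    for field in fields:
        text = source.get(field)
        if not isinstance(text, str):
            continue
        for pattern, label in MARKDOWN_PATTERNS:
            if pattern.search(text):
                found.append((field, label))
    return found
```

Passing fields=("description", "license_text", "license_text_eng") extends the same check to the other free-text properties.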
