
Introduction

A modular, open-source search engine for our world.

Pelias is a geocoder powered completely by open data, available freely to everyone.

Local Installation · Cloud Webservice · Documentation · Community Chat

What is Pelias?
Pelias is a search engine for places worldwide, powered by open data. It turns addresses and place names into geographic coordinates, and turns geographic coordinates into places and addresses. With Pelias, you’re able to turn your users’ place searches into actionable geodata and transform your geodata into real places.

We think open data, open source, and open strategy win over proprietary solutions at any part of the stack and we want to ensure the services we offer are in line with that vision. We believe that an open geocoder improves over the long-term only if the community can incorporate truly representative local knowledge.

Pelias

A modular, open-source geocoder built on top of Elasticsearch for fast and accurate global search.

What's a geocoder do anyway?

Geocoding is the process of taking input text, such as an address or the name of a place, and returning a latitude/longitude location on the Earth's surface for that place.

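In practice, a forward geocoding request is just an HTTP call to the Pelias `/v1/search` endpoint with the input text as a query parameter. A minimal sketch (the host and port are placeholders for your own Pelias instance, and `buildSearchUrl` is our illustrative helper, not part of Pelias):

```javascript
// Build a forward geocoding URL for the Pelias /v1/search endpoint.
// The host is a placeholder -- substitute your own Pelias instance.
function buildSearchUrl(host, text) {
  const url = new URL('/v1/search', host);
  url.searchParams.set('text', text); // handles URL-encoding for us
  return url.toString();
}

console.log(buildSearchUrl('http://localhost:4000', '30 W 26th St, New York, NY'));
// http://localhost:4000/v1/search?text=30+W+26th+St%2C+New+York%2C+NY
```

Fetching that URL returns a GeoJSON FeatureCollection whose features carry the matching coordinates.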

... and a reverse geocoder, what's that?

Reverse geocoding is the opposite: returning a list of places near a given latitude/longitude point.


What are the most interesting features of Pelias?

  • Completely open-source and MIT licensed
  • A powerful data import architecture: Pelias supports many open-data projects out of the box but also works great with private data
  • Support for searching and displaying results in many languages
  • Fast and accurate autocomplete for user-facing geocoding
  • Support for many result types: addresses, venues, cities, countries, and more
  • Modular design, so you don't need to be an expert in everything to make changes
  • Easy installation with minimal external dependencies

What are the main goals of the Pelias project?

  • Provide accurate search results
  • Work equally well for a small city and the entire planet
  • Be highly configurable, so different use cases can be handled easily and efficiently
  • Provide a friendly, welcoming, helpful community that takes input from people all over the world

Where did Pelias come from?

Pelias was created in 2014 as an early project at Mapzen. Since Mapzen's shutdown in 2017, Pelias has been part of the Linux Foundation.

How does it work?

Magic! (Just kidding) Like any geocoder, Pelias combines full text search techniques with knowledge of geography to quickly search over many millions of records, each representing some sort of location on Earth.

The Pelias architecture has three main components and several smaller pieces.

A diagram of the Pelias architecture.

Data importers

The importers filter, normalize, and ingest geographic datasets into the Pelias database. Currently there are six officially supported importers.

We are always discussing supporting additional datasets. Pelias users can also write their own importers, for example to import proprietary data into their own Pelias instances.

Database

The underlying datastore that does most of the query heavy-lifting and powers our search results. We use Elasticsearch. Currently versions 7 and 8 are supported.

We've built a tool called pelias-schema that sets up Elasticsearch indices properly for Pelias.

Frontend services

This is where the actual geocoding process happens, and includes the components that users interact with when performing geocoding queries. The services are:

  • API: The API service defines the Pelias API, and talks to Elasticsearch or other services as needed to perform queries.
  • Placeholder: A service built specifically to capture the relationships between administrative areas (a catch-all term for anything like a city, state, or country). Elasticsearch does not handle relational data well, so we built Placeholder specifically to manage this piece.
  • PIP: For reverse geocoding, it's important to be able to perform point-in-polygon (PIP) calculations quickly. The PIP service is very good at quickly determining which admin-area polygons a given point lies in.
  • Libpostal: Pelias uses the libpostal project for parsing addresses using the power of machine learning. We use a Go service built by the Who's on First team to make this happen quickly and efficiently.
  • Interpolation: This service knows all about addresses and streets. With that knowledge, it is able to supplement the known addresses that are stored directly in Elasticsearch and return fairly accurate estimated address results for many more queries than would otherwise be possible.

Dependencies

These are software projects that are not used directly but are used by other components of Pelias.

There are lots of these, but here are some important ones:

  • model: Provides a single library for creating documents that fit the Pelias Elasticsearch schema. This is a core component of our flexible importer architecture.
  • wof-admin-lookup: A library for performing administrative lookup using point-in-polygon math. Previously included in each of the importers, but now only used by the PIP service.
  • query: This is where most of our actual Elasticsearch query generation happens.
  • config: Pelias is very configurable, and all of it is driven from a single JSON file which we call pelias.json. This package provides a library for reading, validating, and working with this configuration. It is used by almost every other Pelias component.
  • dbclient: A Node.js stream library for quickly and efficiently importing records into Elasticsearch.
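As an illustrative sketch of the configuration the config package reads (keys abbreviated; consult the pelias/config repository for the full schema), a minimal pelias.json might look like this, using the Elasticsearch and service ports mentioned elsewhere in this document:

```json
{
  "esclient": {
    "hosts": [{ "host": "localhost", "port": 9200 }]
  },
  "api": {
    "services": {
      "placeholder": { "url": "http://localhost:3000" },
      "libpostal": { "url": "http://localhost:8080" }
    }
  }
}
```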

Helpful tools

Finally, while not part of Pelias proper, we have built several useful tools for working with and testing Pelias.

Notable examples include:

  • acceptance-tests: A Node.js command line tool for testing a full planet build of Pelias and ensuring everything works. Familiarity with this tool is very important for ensuring Pelias is working. It supports all Pelias features and has special facilities for testing autocomplete queries.
  • compare: A web-based tool for comparing different instances of Pelias (for example a production and staging environment). We have a reference instance at pelias.github.io/compare/
  • dashboard: Another web-based tool for providing statistics about the contents of a Pelias Elasticsearch index such as import speed, number of total records, and a breakdown of records of various types.

Documentation

The main documentation lives in the pelias/documentation repository.

Additionally, the README file in each of the component repositories listed above provides more detail on that piece.

Here's an example API response for a reverse geocoding query:
$ curl -s "search.mapzen.com/v1/reverse?size=1&point.lat=40.74358294846026&point.lon=-73.99047374725342&api_key={YOUR_API_KEY}" | json
{
    "geocoding": {
        "attribution": "https://search.mapzen.com/v1/attribution",
        "engine": {
            "author": "Mapzen",
            "name": "Pelias",
            "version": "1.0"
        },
        "query": {
            "boundary.circle.lat": 40.74358294846026,
            "boundary.circle.lon": -73.99047374725342,
            "boundary.circle.radius": 500,
            "point.lat": 40.74358294846026,
            "point.lon": -73.99047374725342,
            "private": false,
            "querySize": 1,
            "size": 1
        },
        "timestamp": 1460736907438,
        "version": "0.1"
    },
    "type": "FeatureCollection",
    "features": [
        {
            "geometry": {
                "coordinates": [
                    -73.99051,
                    40.74361
                ],
                "type": "Point"
            },
            "properties": {
                "borough": "Manhattan",
                "borough_gid": "whosonfirst:borough:421205771",
                "confidence": 0.9,
                "country": "United States",
                "country_a": "USA",
                "country_gid": "whosonfirst:country:85633793",
                "county": "New York County",
                "county_gid": "whosonfirst:county:102081863",
                "distance": 0.004,
                "gid": "geonames:venue:9851011",
                "id": "9851011",
                "label": "Arlington, Manhattan, NY, USA",
                "layer": "venue",
                "locality": "New York",
                "locality_gid": "whosonfirst:locality:85977539",
                "name": "Arlington",
                "neighbourhood": "Flatiron District",
                "neighbourhood_gid": "whosonfirst:neighbourhood:85869245",
                "region": "New York",
                "region_a": "NY",
                "region_gid": "whosonfirst:region:85688543",
                "source": "geonames"
            },
            "type": "Feature"
        }
    ],
    "bbox": [
        -73.99051,
        40.74361,
        -73.99051,
        40.74361
    ]
}
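Because every Pelias response is a standard GeoJSON FeatureCollection like the one above, extracting results requires no special client library. A minimal sketch (the `summarize` helper name is ours, not part of Pelias; the sample data is trimmed from the response above):

```javascript
// Extract human-readable labels and coordinates from a Pelias
// GeoJSON FeatureCollection (the structure shown in the response above).
function summarize(featureCollection) {
  return featureCollection.features.map((f) => ({
    label: f.properties.label,
    lon: f.geometry.coordinates[0], // GeoJSON order is [longitude, latitude]
    lat: f.geometry.coordinates[1],
    confidence: f.properties.confidence,
  }));
}

// Trimmed version of the reverse geocoding response shown above.
const response = {
  type: 'FeatureCollection',
  features: [{
    type: 'Feature',
    geometry: { type: 'Point', coordinates: [-73.99051, 40.74361] },
    properties: { label: 'Arlington, Manhattan, NY, USA', confidence: 0.9 },
  }],
};

console.log(summarize(response));
// → one entry: { label: 'Arlington, Manhattan, NY, USA', lon: -73.99051, lat: 40.74361, confidence: 0.9 }
```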

How can I install my own instance of Pelias?

To try out Pelias quickly, use our Docker setup. It uses Docker and docker-compose to allow you to quickly set up a Pelias instance for a small area (by default Portland, Oregon) in under 30 minutes.

Do you offer a free geocoding API?

You can sign up for a trial API key at Geocode Earth. A commercial service has been operated by the core development team behind Pelias since 2014 (previously at search.mapzen.com). Discounts and free plans are available for free and open-source software projects.

What's it built with?

Pelias itself (the import pipelines and API) is written in Node.js, which makes it highly accessible for other developers and performant under heavy I/O. It aims to be modular and is distributed across a number of Node packages, each with its own repository under the Pelias GitHub organization.

For a select few components that have performance requirements that Node.js cannot meet, we prefer to write things in Go. A good example of this is the pbf2json tool that quickly converts OSM PBF files to JSON for our OSM importer.

Elasticsearch is our datastore of choice because of its unparalleled full text search functionality, scalability, and sufficiently robust geospatial support.

Contributing


We built Pelias as an open source project not just because we believe that users should be able to view and play with the source code of tools they use, but to get the community involved in the project itself.

Especially with a geocoder with global coverage, it's just not possible for a small team to do it alone. We need you.

Anything that we can do to make contributing easier, we want to know about. Feel free to reach out to us via GitHub, Gitter, email, or Twitter. We'd love to help people get started working on Pelias, especially if you're new to open source or programming in general.

We have a list of Good First Issues for new contributors.

Both this meta-repo and the API service repo are worth looking at, as they're where most issues live. We also welcome reporting issues or suggesting improvements to our documentation.

The current Pelias team can be found on GitHub as missinglink and orangejulius.

Members emeritus include:

People

Contributors

abbe98, acaloiaro, acondrat, annadeu3, bradh, dianashk, dominikkukacka, easherma, glennon, jeremy-rutman, joxit, kat09kat09, krizleebear, louh, migurski, mihneadb, missinglink, msmollin, orangejulius, pastcompute, riordan, rmglennon, steventwheeler, trescube, waxpancake


Issues

Use of predicates

Hi!

Can I search Pelias for places by OSM tag (e.g. tourism=camp_site)? I'd like to search, for example, for gas stations or campsites near a point.

Thanks in advance!

Specification driven API documentation

In addition to narrative documentation, we should write a specification based documentation, giving maximum and minimums, types, etc to serve as quick documentation and to form the basis for an API explorer.

Search / Autocomplete / UX Guidelines are incorrect

https://mapzen.com/documentation/search/autocomplete/#user-experience-guidelines

The documented rate limits should be more accurate than "a maximum of one or two requests per second." The Mapzen Search demo does not throttle at all, and the API seems to support this just fine. If a throttling interval is recommended, it should be lower than 500-1000ms, because that is a very long delay relative to typing speed.

Secondly, the requested_at parameter is not returned in the API response. There is a response.geocoding.timestamp attribute, but it does not seem to act in the same manner as requested_at in the docs.

Loading data in the browser without jQuery.

I am trying to load data in the browser as outline in the documentation.

I use xhr requests pretty much every day and never had a problem with the structure.

    let xhr = new XMLHttpRequest();
    xhr.open('GET', 'http://matrix.mapzen.com/isochrone?json={%22locations%22:[{%22lat%22:52.62972886718355,%22lon%22:-1.0107421875000002}],%22costing%22:%22multimodal%22,%22contours%22:[{%22time%22:45}]}&api_key=xxx');
    xhr.setRequestHeader('Content-Type', 'application/json');
    xhr.onload = function () {
        if (xhr.status === 200) {
            console.log(xhr.responseText);
        }
    };
    xhr.send();

Yet, the Mapzen API throws me {"error_code":101,"error":"Try a POST or GET request instead","status_code":405,"status":"Method Not Allowed"}.


I do use 'GET', so I am not sure what can be done here. Surely there must be a way to use the API without jQuery or Angular?

Add fancy images for search topic

Putting in a "to do" to update and resize the images in the main search topic (now called search.md). These are large and there's work going on with graphics folks.

See also #11.

Rename this repository to `documentation`

Having the repeated pelias in pelias/pelias-doc feels redundant, doesn't it? It's a bit of a pain to rename repos, but GitHub handles redirects for us, so not too much should break. pelias/documentation feels a lot nicer to me. Thoughts?

Docs show different example results than what you see in the service now

These are my notes as I went through documentation as part of #193.

Some examples in the documentation list the results that are returned from the request. Because these were added ages ago, they may differ from what is returned by the service right now. It's not too big of a deal in most cases and people should expect these to change over time, but the reverse with Tour Eiffel and autocomplete in Union Square are different enough to cause confusion if someone tried to compare the docs with the current results.

place
old href
https://search.mapzen.com/v1/place?
new href
https://mapzen.github.io/search-sandbox/?query=place&

reverse
old href
https://search.mapzen.com/v1/reverse?
new href
https://mapzen.github.io/search-sandbox/?query=reverse&

Tour Eiffel - # 3 (not # 1) result now for https://mapzen.github.io/search-sandbox/?query=reverse&point.lat=48.858268&point.lon=2.294471

search
old href
https://search.mapzen.com/v1/search?
new href
https://mapzen.github.io/search-sandbox/?query=search&

https://mapzen.github.io/search-sandbox/?query=search&text=YMCA gives different top 10 results (and 1 now is in Slovakia)

Every query on page has different top 10 result list

These results are a lot different: https://mapzen.github.io/search-sandbox/?query=search&text=YMCA&sources=oa

structured
old href
http://search.mapzen.com/v1/search/structured?
new href
https://mapzen.github.io/search-sandbox/?query=search/structured&

Note: + in URL is converted straight to + https://mapzen.github.io/search-sandbox/?query=search/structured&address=Rue+de+Rivoli&locality=Paris&region=France (and search is for Rue+de+Rivoli, so changed + to space so they show up correctly in text box)... I recall we used the + originally because of cached results.

so many examples in this section

autocomplete
old href
https://search.mapzen.com/v1/autocomplete?
new href
https://mapzen.github.io/search-sandbox/?query=autocomplete&

Union Square from SF is not returned at all
https://mapzen.github.io/search-sandbox/?query=autocomplete&focus.point.lat=37.7&focus.point.lon=-122.4&text=union square

Now gives all results in NY
https://mapzen.github.io/search-sandbox/?query=autocomplete&focus.point.lat=40.7&focus.point.lon=-73.9&text=union square

Different results with and without focus point (no Malta) https://mapzen.github.io/search-sandbox/?query=autocomplete&focus.point.lat=52.5&focus.point.lon=13.3&text=hard rock cafe

Different results
https://mapzen.github.io/search-sandbox/?query=autocomplete&sources=openaddresses&text=pennsylvania

Different results
https://mapzen.github.io/search-sandbox/?query=autocomplete&text=starbuck&layers=venue

Incorrect list of sources in /reverse documentation

I am not sure whether is an issue of the documentation or the service itself, but:

As of now, when trying to request reverse geocoding using geonames source, I get following message:

/reverse does not support geonames

even though documentation specifies geonames as one of the supported sources.

Whole response:

{
    "geocoding": {
        "version": "0.2",
        "attribution": "https://search.mapzen.com/v1/attribution",
        "query": {
            "layers": [
                "locality",
                "county",
                "region",
                "country"
            ],
            "sources": [
                "geonames"
            ],
            "size": 5,
            "private": false,
            "point.lat": 19.062198992675675,
            "point.lon": 72.86285161972046,
            "boundary.circle.radius": 1,
            "boundary.circle.lat": 19.062198992675675,
            "boundary.circle.lon": 72.86285161972046,
            "lang": {
                "name": "English",
                "iso6391": "en",
                "iso6393": "eng",
                "defaulted": false
            },
            "querySize": 20
        },
        "errors": [
            "/reverse does not support geonames"
        ],
        "engine": {
            "name": "Pelias",
            "author": "Mapzen",
            "version": "1.0"
        },
        "timestamp": 1498680013867
    },
    "type": "FeatureCollection",
    "features": []
}

Document production caching

While it's not a core part of Pelias, the Mapzen Search implementation uses caching to help our performance. There's a few user visible aspects of this and to avoid any confusion we should write a small note about it.

Add section explaining when to use different endpoints

This should be largely non-technical and explain the use cases, not so much the inner workings of, the search, autocomplete and reverse endpoints. Possibly the nearby endpoint as well, although it's not finished and officially supported AFAIK.

Add instructions to start up libpostal + placeholder before starting pelias

Start the API

After installing placeholder, start the service by navigating to the directory you installed it in, and running:

PORT=3000 npm start;

The port number 3000 corresponds to the port you specified in api.services.placeholder.url.

After installing go-whosonfirst-libpostal, start the service by navigating to the directory you installed it in, and running:

./bin/wof-libpostal-server -port 8080

The port number 8080 corresponds to the port you specified in api.services.libpostal.url (defaults to 8080).

Now that the API knows how to connect to Elasticsearch and all other Pelias services, all that is
required to start the API is:

npm start

What does the confidence score represent?

And can you provide some context / examples for interpreting confidence scores? Some results that appear to be dead on share very similar scores in the mid-80s with results 100 miles+ off.

Document nearby endpoint

The nearby endpoint is mysterious and undocumented. It is definitely in an alpha state, but should still be documented as such.

One problem is that I cannot find much in the way of a design document or even the original PR which added the nearby endpoint.

We may have to reverse engineer the functionality from the acceptance tests.

Placement of Terminology section

The Terminology section is currently the 4th from last section, where it introduces terms already used. It would be much more helpful to put it in the first few sections.

Document Confidence Scores

Confidence scores are calculated by the Pelias API for individual records and displayed with results as a value from 0 to 1. They are separate from the sorting and scoring done by Elasticsearch and other services, and have specific meanings for different types of queries.

These meanings, the rationale behind confidence scores, and in particular why records returned from Pelias often have confidence scores that do not follow the same sort order as the records, have caused considerable confusion.

Update, combine and improve dev instance docs

We have the start of some documentation on how to install everything Pelias related for development purposes on one's local machine, but it's a bit scattered. Part is in pelias/pelias, part is in each of the importer docs. It would be awesome to combine it all and brush it up, here in the pelias-doc repo.

Some search query examples are broken in the docs

The documentation has examples of building different queries throughout. Some work and some do not.

On the responses page: https://mapzen.com/documentation/search/response/

This query
https://search.mapzen.com/v1/search?api_key=search-XXXXXXX&text=YMCA&size=25

results in 'search-XXXXXXX' is not a valid key.

Some other queries do work, like this one:
https://search.mapzen.com/v1/reverse?api_key=search-XXXXXXX&point.lat=48.858268&point.lon=2.294471

Perhaps it's a caching thing, where the cached locations work? Either way, let's fix.

Note Geonames Admin Lookup caveat

Geonames is not part of the administrative hierarchy lookup tool (it's full of points representing whole countries, how can that get a point-based lookup??) and therefore is excluded from layer-based lookups.

Note this in the documentation. Perhaps in a Caveats section.

Replace most uses of Mapzen words

This documentation contains numerous references to the Mapzen Search product, the search.mapzen.com domain, and Mapzen-specific concerns like API keys.

I’m working on a geocoder project for OpenAddresses using Pelias, and I’d like to include this documentation as part of the build process. I would like to modify the documentation here to treat Pelias as a standalone project, and use a post-processing step in mapzen-docs-generator to add the Mapzen-specific words back in.

Does this seem like a desirable change?

glossary and terms and clarifications

build a glossary! here are a few to start with

  • coarse
  • fine
  • geocoding
  • forward geocoding (from Rhonda - this term is in the get started topic now)
  • reverse geocoding
  • bounding box
  • endpoint (API)
  • place (API)
  • gazetteer
  • CORS
  • layer(s)
  • point
  • latitude
  • longitude
  • Coordinate System / WGS84
    ...

Multiple boundaries GIF image has incorrect final boundary

(GIF: multiple boundaries)

I hate to criticize this otherwise amazing GIF, but in the final frame, the red shape for the final searched area extends below the southern end of the bounding box (green). It has the effect of suggesting that we would ignore the bounding box in this case, which is of course false.

Update docs to reflect changes for Who's on First dataset

Update existing topics on data sources, endpoints with sources, place, etc. to cover quattroshapes and geonames changes

Add new content about WoF - including description, adding geometry, bounding box polygons for search and autocomplete

Add what's new section with deprecation notice + new source

Update data sources topic for more on Mapzen, licensing

See the comments in this closed PR: #9

  1. Some background on each data source is good (as some people may not have heard of all of these), but what info is Mapzen Search using from them? For example, what are some common items that we pull from OSM?

  2. We should also add descriptions of the licensing regime for each data source as well as a general license descriptor.
