cheep's People

Contributors

pnorman

cheep's Issues

Build indexes and finalize DB

After import, some tasks will need doing:

  • Build UNIQUE indexes on id columns
  • Set id columns as PRIMARY KEYs
  • If necessary, build indexes on ways.nodes and relations.members
    • How do we do this for the typed column?
  • ANALYZE all tables
  • Reset autovacuum
  • Set id indexes as cluster indexes, but don't actually cluster
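
The checklist above could be turned into an ordered list of SQL statements, roughly as in the sketch below. The helper name `finalize_statements` and the index names are hypothetical. The GIN index on ways.nodes is shown only as one possibility for the "if necessary" item, and relations.members is left out because how to index the typed column is still an open question.

```python
def finalize_statements(tables=("nodes", "ways", "relations"), index_way_nodes=False):
    """Return the post-import SQL statements, in order (hypothetical helper)."""
    stmts = []
    for t in tables:
        idx = f"{t}_pkey"  # assumed index name, not decided anywhere yet
        # Build the unique index first, then promote it to the primary key.
        stmts.append(f"CREATE UNIQUE INDEX {idx} ON {t} (id)")
        stmts.append(
            f"ALTER TABLE {t} ADD CONSTRAINT {idx} PRIMARY KEY USING INDEX {idx}"
        )
        # Record the index as the cluster index without rewriting the table.
        stmts.append(f"ALTER TABLE {t} CLUSTER ON {idx}")
    if index_way_nodes:
        stmts.append("CREATE INDEX ways_nodes_idx ON ways USING gin (nodes)")
    for t in tables:
        # Drop the autovacuum_enabled = false storage parameter set at creation.
        stmts.append(f"ALTER TABLE {t} RESET (autovacuum_enabled)")
        stmts.append(f"ANALYZE {t}")
    return stmts
```

ALTER TABLE ... CLUSTER ON only marks the index for future CLUSTER runs, which matches the "don't actually cluster" item.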

Load data

cheep needs to load data into the nodes/ways/relations tables. This must be done with COPY statements for performance reasons. It should also be multi-threaded.
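
As a rough sketch of the COPY path, rows can be serialized into PostgreSQL's COPY text format and streamed in one statement. The function names here are hypothetical, `cursor` is assumed to be a psycopg2 cursor (its `copy_expert` method takes a COPY statement and a file-like object), and tags are assumed to arrive already formatted as hstore text.

```python
import io


def copy_escape(value):
    """Encode one value for PostgreSQL's COPY text format."""
    if value is None:
        return r"\N"  # COPY's default NULL marker
    text = str(value)
    # Backslash must be escaped first, then the delimiter and line breaks.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_copy_buffer(rows):
    """Serialize rows into a file-like object that COPY ... FROM STDIN can read."""
    buf = io.StringIO()
    for row in rows:
        buf.write("\t".join(copy_escape(v) for v in row) + "\n")
    buf.seek(0)
    return buf


def copy_nodes(cursor, rows):
    """Load node rows with a single COPY (cursor: assumed psycopg2 cursor)."""
    cursor.copy_expert(
        "COPY nodes (id, long, lat, tags) FROM STDIN", rows_to_copy_buffer(rows)
    )
```

For the multi-threaded part, one plausible arrangement is for each worker thread to hold its own connection and call something like `copy_nodes` on its own batch. That part is not shown here.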

Manage database schemas

We should have some way of migrating database schemas. Ruby does this with ActiveRecord::Migration, but what is the equivalent in Python?

Create database tables

Before importing data, the tables need to exist. The first draft is:

CREATE EXTENSION IF NOT EXISTS hstore; -- needed for the hstore columns
CREATE TABLE nodes (
  id bigint NOT NULL,
  long int NOT NULL,
  lat int NOT NULL,
  tags hstore);
CREATE TABLE ways (
  id bigint NOT NULL,
  nodes bigint[] NOT NULL,
  tags hstore);
-- relation members are (type, id) pairs
CREATE TYPE relation_type AS ENUM ('node','way','relation');
CREATE TYPE relation_member AS (type relation_type, id bigint);
CREATE TABLE relations (
  id bigint NOT NULL,
  members relation_member[],
  tags hstore);

The tables should be created WITH (autovacuum_enabled = false), which then needs to be reset post-import.

Node to way mapping

With a nodes bigint[] column in the ways table, going from ways to child nodes is easy, but going from nodes to parent ways is harder. There are three ways to do it.

  1. Have a way_nodes table with way_id, node_id and sequence columns and an index on node_id, and SELECT based on node_id. The ways.nodes column could be removed if there was also an index on way_id, but fetching a way's nodes would then be slower than with a ways.nodes column. This is what the pgsnapshot schema does.
    Advantages: Fastest for a single lookup.
    Disadvantages: Amazingly large table + index, with ~4 billion rows. Bad for cache contention.
  2. Have a GIN index on ways.nodes bigint[]. Lookups can be done with the && (overlap) operator, but should be structured to minimize the number of queries. osm2pgsql slim tables do this.
    Advantages: Much less disk space for the index than a way_nodes table. Avoids duplicate data.
    Disadvantages: A 170 GB GIN index with fastupdate off is slower to update and bloats quickly.
  3. Have a ways bigint[] column in the nodes table and put IDs of parent ways there. Lookups can be done by querying nodes.ways. AFAIK, this is untested.
    Advantages: Uses an already existing table and index. Smallest cache contention impact with the indexes. Probably fastest.
    Disadvantages: Untested. Requires more management of data to update the nodes table every time a way is created. Makes an already large nodes table even larger.
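
To make the comparison concrete, the sketch below lists what the lookup query might look like under each option and shows, in plain Python, the inversion that option 3 would have to maintain. The query strings assume the table and column names used above (way_nodes, ways.nodes, nodes.ways), and `build_parent_ways` is a hypothetical helper, not something cheep has.

```python
from collections import defaultdict

# Possible lookup queries for the three options; %s is the node id parameter.
PARENT_WAY_QUERIES = {
    "way_nodes table": "SELECT way_id FROM way_nodes WHERE node_id = %s",
    "GIN on ways.nodes": "SELECT id FROM ways WHERE nodes && ARRAY[%s]::bigint[]",
    "nodes.ways column": "SELECT ways FROM nodes WHERE id = %s",
}


def build_parent_ways(ways):
    """Invert {way_id: [node_id, ...]} into {node_id: [way_id, ...]} (option 3)."""
    parents = defaultdict(set)
    for way_id, node_ids in ways.items():
        for node_id in node_ids:
            # A set avoids duplicates when a closed way repeats its first node.
            parents[node_id].add(way_id)
    return {node_id: sorted(way_ids) for node_id, way_ids in parents.items()}


ways = {100: [1, 2, 3, 1], 101: [3, 4]}
print(build_parent_ways(ways))
```

The inversion has to be redone for the affected nodes whenever a way is created or changed, which is the extra data management listed as a disadvantage of option 3.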

Parse OSM PBF

We know we're going to need to be able to import an OSM PBF. This should be isolated enough from the database to be testable by itself. Ideally it will be multi-threaded.
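
A PBF file is a sequence of independently compressed blobs, so one plausible place for threads is the decompression step. The sketch below assumes the blobs have already been split out of the file and are zlib-compressed. Reading the blob framing and decoding the protobuf messages are not shown, and `decompress_blobs` is a hypothetical name. Nothing here touches the database, so it can be tested by itself.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def decompress_blobs(compressed_blobs, workers=4):
    """Decompress zlib-compressed blobs in parallel, keeping file order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order, so blobs stay in sequence.
        return list(pool.map(zlib.decompress, compressed_blobs))
```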

Make available on PyPI

We should plan from the start to put cheep on PyPI. Other software like MapProxy does this and it makes installing much easier.
