Comments (7)

3nprob commented on June 11, 2024

So I think that as long as endpoints are straightforward enough to configure at both build time and runtime, offline geocoding and routing are not that crucial from a pure privacy perspective, since people whose threat models are strict enough for this to be a concern can also figure out something that works.

Not having to trust any server is less important if it's easy enough to set up a new server or use a friend's.

(There are still reasons why these, including routing, are interesting features)

from headway.

ellenhp commented on June 11, 2024

I got carmen to ingest some sample data, which is good. Next I'm going to try and ingest a small OSM extract.

ellenhp commented on June 11, 2024

Carmen seems to be doing geocoding in my test setup, but I'm really unhappy with the size of the index. The fuzzy phrase store and grid store combined are 1 MB compressed for the Seattle metro area.

ellenhp commented on June 11, 2024

I have fuzzy_phrase building for wasm, complete with dynamic loading of the search index, which is a very unexpected turn of events. https://github.com/ellenhp/fuzzy-phrase/tree/wasm

This was only possible because of the work done here: https://github.com/phiresky/tantivy-fst
I'm so thankful that I found this fork. :)

Currently working on converting carmen-core to use SQLite instead of RocksDB so we can leverage sql.js-httpvfs to dynamically load the gridstore too. After that I'll perform some analysis to determine how ping time to the server affects geocoding performance. I'm expecting tens of serial HTTP range requests, so ping time will probably be critical. If this does end up working well enough to deploy, it will probably make sense to use a CDN to serve the index. As much as I hate letting Cloudflare MITM my TLS connections, there might not be any other way to have a good user experience. And even though access patterns would leak information to the CDN operator, it beats the heck out of sending a free-text geocoding query to $MAPS_COMPANY.
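
A rough way to reason about the ping-time concern: when each range request's offset depends on bytes returned by the previous one, the requests can't be issued in parallel, so every lookup pays a full round trip. The sketch below is a first-order model only. The request count, payload size, and bandwidth are illustrative assumptions rather than measurements, and it ignores connection setup, TCP slow start, and caching.

```python
def serial_fetch_latency_ms(
    num_requests: int,
    rtt_ms: float,
    bytes_per_request: int,
    bandwidth_bytes_per_ms: float,
) -> float:
    """First-order latency of dependent (serial) HTTP range requests.

    Each request must wait for the previous response before it can be sent,
    so the round-trip time is paid once per request, plus the transfer time
    of that request's payload. Connection setup and slow start are ignored.
    """
    transfer_ms = bytes_per_request / bandwidth_bytes_per_ms
    return num_requests * (rtt_ms + transfer_ms)


if __name__ == "__main__":
    # Illustrative assumptions: 30 serial requests of 4000 bytes each
    # over a link delivering 1000 bytes/ms (8 Mbit/s).
    for label, rtt in [("nearby CDN edge", 20.0), ("distant origin", 150.0)]:
        total = serial_fetch_latency_ms(30, rtt, 4000, 1000.0)
        print(f"{label}: {total:.0f} ms")
```

With these made-up numbers the round trips dominate: at 20 ms RTT that's 600 ms of waiting versus 120 ms of transfer, and at 150 ms RTT the total grows to 4620 ms. That's why serving the index from a CDN close to the user would likely matter more than shrinking each block.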

ellenhp commented on June 11, 2024

carmen-core is working with a SQLite backend, so in theory lazy loading is now unblocked, but I'm not convinced SQLite is the correct path forward. I'd like to avoid it because simultaneous interop between JavaScript, C/C++ and Rust sounds kind of hard, and I don't understand the Emscripten virtual filesystem stuff. Also, sql.js is like a megabyte of wasm. I want to build my own key-value store that will eagerly download the index and then lazily download the data blocks. That would also give me much more control over latency than implementing a lazy filesystem for SQLite.
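
A minimal sketch of the eager-index, lazy-block idea, assuming a made-up format: sorted keys are grouped into blocks, each block is serialized as JSON, and a small index of `(first_key, offset, length)` tuples is downloaded up front. `build_store`, `LazyBlockStore`, and `fetch_range` are hypothetical names for illustration, not anything in carmen-core. In a browser, `fetch_range` would issue an HTTP request with a `Range: bytes=<offset>-<offset + length - 1>` header (HTTP byte ranges are inclusive at the end); here it just slices an in-memory buffer.

```python
import bisect
import json
from typing import Callable, Dict, List, Optional, Tuple

# (first_key, byte_offset, byte_length) for each block in the data file.
IndexEntry = Tuple[str, int, int]
# Hypothetical fetcher: returns `length` bytes starting at `offset`.
FetchRange = Callable[[int, int], bytes]


def build_store(items: Dict[str, str], block_size: int) -> Tuple[bytes, List[IndexEntry]]:
    """Pack sorted key/value pairs into JSON blocks of at most block_size entries."""
    keys = sorted(items)
    data = bytearray()
    index: List[IndexEntry] = []
    for start in range(0, len(keys), block_size):
        chunk = {k: items[k] for k in keys[start:start + block_size]}
        encoded = json.dumps(chunk, sort_keys=True).encode("utf-8")
        index.append((keys[start], len(data), len(encoded)))
        data.extend(encoded)
    return bytes(data), index


class LazyBlockStore:
    """Key-value store with an eagerly loaded index and lazily fetched blocks."""

    def __init__(self, index: List[IndexEntry], fetch_range: FetchRange) -> None:
        self.index = index  # small, downloaded up front
        self.first_keys = [entry[0] for entry in index]
        self.fetch_range = fetch_range
        self.cache: Dict[int, Dict[str, str]] = {}  # block number -> decoded block
        self.fetches = 0  # number of range requests issued so far

    def get(self, key: str) -> Optional[str]:
        # The candidate block is the last one whose first key is <= key.
        i = bisect.bisect_right(self.first_keys, key) - 1
        if i < 0:
            return None  # key sorts before every stored key
        if i not in self.cache:
            _, offset, length = self.index[i]
            self.cache[i] = json.loads(self.fetch_range(offset, length))
            self.fetches += 1
        return self.cache[i].get(key)


if __name__ == "__main__":
    items = {f"key{n:03d}": f"value{n}" for n in range(100)}
    data, index = build_store(items, block_size=10)
    store = LazyBlockStore(index, lambda offset, length: data[offset:offset + length])
    print(store.get("key042"), store.get("key045"), store.fetches)
```

Each lookup costs at most one range request per block not yet cached, so the block size trades the number of round trips against the bytes wasted per fetch. A real store would want a compact binary block format instead of JSON.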

After that, I think all that remains is building a new wasm_bindgen interface for carmen-core, building a lazy fst::FakeArr, then building vtquery with Emscripten and using that instead of the vtquery node package. Inevitably there will be issues, but this doesn't seem like more than another week of work. I have a 10-day vacation coming up, though, so my original estimate of 1 month might end up being accurate after all.

ellenhp commented on June 11, 2024

At this point I'm pretty convinced that Mapbox Carmen won't work as-is, which is a bummer. I've started exploring other options but I think it makes sense to get Headway into a working state as originally scoped. A lot of people were excited about it as originally scoped and I don't think I want to block its completion on me writing a geocoder from scratch.

ellenhp commented on June 11, 2024

I'm sure if I spent a few months on this I could get it to tech-demo levels of functionality, but I want more for this project than that, so I'm going to move forward with a traditional geocoder stack. Expectations for privacy can be managed in some other way. I think it may eventually be reasonable to build a privacy-preserving replacement for Nominatim, but it is not reasonable IMO to try to replicate the performance or usability characteristics of Photon. There's just like, so much work that's gone into making that fast, generalized and typo-tolerant.

I'm going to keep pursuing offline routing, though. There are a few user stories that could preserve privacy better if offline routing were to work (route me home, or to any other location I've cached the lat/lng for).
