
geobed's Introduction

Geobed


This Go package provides an embedded geocoder. It has no major external dependencies other than some downloaded data files. Once downloaded, those data files are held in memory, so after the initial load there are no outside dependencies at all. It geocodes and reverse geocodes to city-level detail, approximating and making educated guesses when not enough detail is provided. See the test cases for more examples.

Why?

In short, this package was built because geocoding services are really expensive. If city-level detail is enough and you don't need street addresses, then this should be completely fine. It's also nice that no HTTP requests are made to do this (after the initial load, and the data files can be copied to other places).

Performance is pretty good, and improving it is one of the goals. Over time it should get better, but for now it geocodes a string to lat/lng in about 0.0125 - 0.0135 seconds (on a MacBook Pro).

Usage

You should re-use the GeoBed struct, as it contains a LOT of data (2.7+ million items). The functions to geocode and reverse geocode are methods on this struct. Be aware that this also means your machine will need a good bit of RAM, since all of the data is held in memory (which is also what makes it fast).

g := NewGeobed()
c := g.Geocode("london")

In the above case, c should end up being:

{London london City of London,Gorad Londan,ILondon,LON,Lakana,Landen,Ljondan,Llundain,Londain,Londan,Londar,Londe,Londen,Londinium,Londino,Londn,London,London City,Londona,Londonas,Londoni,Londono,Londonu,Londra,Londres,Londrez,Londri,Londye,Londyn,Londýn,Lonn,Lontoo,Loundres,Luan GJon,Lunden,Lundra,Lundun,Lundunir,Lundúnir,Lung-dung,Lunnainn,Lunnin,Lunnon,Luân Đôn,Lùng-dŭng,Lākana,Lůndůn,Lọndọnu,Ranana,Rānana,The City,ilantan,landan,landana,leondeon,lndn,london,londoni,lun dui,lun dun,lwndwn,lxndxn,rondon,Łondra,Λονδίνο,Горад Лондан,Лондан,Лондон,Лондонъ,Лёндан,Լոնդոն,לאנדאן,לונדון,لندن,لوندون,لەندەن,ܠܘܢܕܘܢ,लंडन,लंदन,लण्डन,लन्डन्,লন্ডন,લંડન,ଲଣ୍ଡନ,இலண்டன்,లండన్,ಲಂಡನ್,ലണ്ടൻ,ලන්ඩන්,ลอนดอน,ລອນດອນ,ལོན་ཊོན།,လန်ဒန်မြို့,ლონდონი,ለንደን,ᎫᎴ ᏗᏍᎪᏂᎯᏱ,ロンドン,伦敦,倫敦,런던 GB ENG 51.50853 -0.12574 7556900 gcpvj0u6yjcm}

So you can easily get lat/lng from the GeobedCity struct with c.Latitude and c.Longitude.

You'll notice some records are larger and contain many alternate names for the city. The free data sets come from Geonames and MaxMind. MaxMind has more records but less detail per record. Geonames has more detail, but it only contains cities with populations of 1,000 people or greater (about 143,000 records).

If you look up a major city, you'll likely also get information such as population (c.Population).

You can reverse geocode as well.

c := g.ReverseGeocode(30.26715, -97.74306)

This would return Austin, TX, for example.
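To illustrate the idea behind reverse geocoding to city level, here is a minimal sketch that returns the nearest record by great-circle (haversine) distance. It does not use or reproduce geobed's own implementation: the City type and the nearestCity and haversineKm helpers are hypothetical names, and the Dallas coordinates are sample data added for the example.

```go
package main

import (
	"fmt"
	"math"
)

// City is a hypothetical, trimmed-down stand-in for a geocoder record.
type City struct {
	Name      string
	Latitude  float64
	Longitude float64
}

const earthRadiusKm = 6371.0

// haversineKm returns the great-circle distance between two points in kilometers.
func haversineKm(lat1, lng1, lat2, lng2 float64) float64 {
	toRad := func(deg float64) float64 { return deg * math.Pi / 180 }
	dLat := toRad(lat2 - lat1)
	dLng := toRad(lng2 - lng1)
	a := math.Sin(dLat/2)*math.Sin(dLat/2) +
		math.Cos(toRad(lat1))*math.Cos(toRad(lat2))*math.Sin(dLng/2)*math.Sin(dLng/2)
	return 2 * earthRadiusKm * math.Asin(math.Sqrt(a))
}

// nearestCity scans every record and returns the closest one.
// The boolean is false when there is no data to search.
func nearestCity(cities []City, lat, lng float64) (City, bool) {
	if len(cities) == 0 {
		return City{}, false
	}
	best := cities[0]
	bestDist := haversineKm(lat, lng, best.Latitude, best.Longitude)
	for _, c := range cities[1:] {
		if d := haversineKm(lat, lng, c.Latitude, c.Longitude); d < bestDist {
			best, bestDist = c, d
		}
	}
	return best, true
}

func main() {
	cities := []City{
		{"Austin", 30.26715, -97.74306},
		{"London", 51.50853, -0.12574},
		{"Dallas", 32.78306, -96.80667}, // sample data for the example
	}
	c, ok := nearestCity(cities, 30.3, -97.7)
	fmt.Println(c.Name, ok) // prints "Austin true"
}
```

A linear scan like this is simple but touches every record on each call, so with millions of records a spatial index would likely be worth considering.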

Data Sets

The data sets are provided by Geonames and MaxMind. These are open source data sets. See their web sites for additional information.

geobed's People

Contributors

tmaiaroto


geobed's Issues

New Delhi

First of all, thanks for this. It's exactly what I needed!

Oddly, I noticed that I can't find New Delhi or New Delhi, IN or any variation I can think of when I search by exact match. Thoughts?

Improve lookup speed

It takes a while to loop over the millions of data points when looking for the best match. This results in a forward lookup taking about 2-3 seconds. Quite slow. However, without the MaxMind data set (only using Geonames - cities with a population of 1,000 people or more) the lookups take about 0.0125 seconds. MUCH faster, at the cost of being unable to geocode certain cities and towns.

MaxMind's data set is very large. Dupes have already been removed, but there are still over 2 million records.

I'm going to look into concurrency to improve the speed, but initial naive attempts resulted in longer lookups. I do know setting GOMAXPROCS has helped a bit, but I don't want to just do that in the package because it may decrease performance for an application depending on what else it does.

I tried fuzzy string comparisons. I've tried embedded search engines like bleve. None of those attempts worked out, for various reasons.

Getting the lookup down to about a second with 2.8 million records or so would be great and would be my goal.
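One way the concurrency idea could be approached is to split the records into contiguous chunks, scan each chunk in its own goroutine, and merge the per-chunk winners. The sketch below is only an illustration under stated assumptions: the score function (common-prefix length) and the bestMatchParallel name are hypothetical and are not geobed's scoring. As noted above, goroutine overhead can make this slower than a plain loop, so it would need benchmarking.

```go
package main

import (
	"fmt"
	"sync"
)

// result records the index and score of the best candidate in one chunk.
type result struct {
	index int
	score int
}

// score is a toy scoring function: the length of the common prefix in bytes.
func score(query, name string) int {
	n := 0
	for n < len(query) && n < len(name) && query[n] == name[n] {
		n++
	}
	return n
}

// bestMatchParallel scans names in up to `workers` contiguous chunks and
// returns the index and score of the best match (-1 if names is empty).
// Ties are resolved in favor of the lowest index.
func bestMatchParallel(names []string, query string, workers int) (int, int) {
	if len(names) == 0 {
		return -1, 0
	}
	if workers < 1 {
		workers = 1
	}
	chunk := (len(names) + workers - 1) / workers
	numChunks := (len(names) + chunk - 1) / chunk
	results := make([]result, numChunks)

	var wg sync.WaitGroup
	for w := 0; w < numChunks; w++ {
		start := w * chunk
		end := start + chunk
		if end > len(names) {
			end = len(names)
		}
		wg.Add(1)
		go func(w, start, end int) {
			defer wg.Done()
			best := result{index: start, score: score(query, names[start])}
			for i := start + 1; i < end; i++ {
				if s := score(query, names[i]); s > best.score {
					best = result{index: i, score: s}
				}
			}
			results[w] = best // each goroutine writes only its own slot
		}(w, start, end)
	}
	wg.Wait()

	best := results[0]
	for _, r := range results[1:] {
		if r.score > best.score {
			best = r
		}
	}
	return best.index, best.score
}

func main() {
	names := []string{"paris", "london", "londonderry", "lima"}
	idx, s := bestMatchParallel(names, "london", 3)
	fmt.Println(names[idx], s) // prints "london 6"
}
```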

Set "score" threshold

Much of the scoring is to weed out same-name cities in multiple locations. However, if the passed query string truly isn't a location, it may still match something superficially. So a threshold should be set so that if the best score falls below a certain amount, an empty location value is returned.
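A minimal sketch of what such a threshold could look like follows. The scoring rules and the names score and geocodeWithThreshold are hypothetical and much simpler than whatever the package actually does; the point is only the final comparison against a minimum score.

```go
package main

import (
	"fmt"
	"strings"
)

// City is a hypothetical, trimmed-down record.
type City struct {
	Name    string
	Country string
}

// score is a toy scoring function: exact match beats prefix match,
// which beats a superficial substring match.
func score(query string, c City) int {
	q := strings.ToLower(strings.TrimSpace(query))
	name := strings.ToLower(c.Name)
	if q == "" || name == "" {
		return 0
	}
	switch {
	case q == name:
		return 10
	case strings.HasPrefix(q, name) || strings.HasPrefix(name, q):
		return 5
	case strings.Contains(q, name):
		return 2
	}
	return 0
}

// geocodeWithThreshold returns the best-scoring city, or an empty City and
// false when the best score is below minScore.
func geocodeWithThreshold(cities []City, query string, minScore int) (City, bool) {
	var best City
	bestScore := 0
	for _, c := range cities {
		if s := score(query, c); s > bestScore {
			best, bestScore = c, s
		}
	}
	if bestScore == 0 || bestScore < minScore {
		return City{}, false
	}
	return best, true
}

func main() {
	cities := []City{{"London", "GB"}, {"Paris", "FR"}}

	c, ok := geocodeWithThreshold(cities, "london", 5)
	fmt.Println(c.Name, ok) // prints "London true"

	// A superficial substring match scores 2 and is rejected at threshold 5.
	c, ok = geocodeWithThreshold(cities, "I love paris in the spring", 5)
	fmt.Println(c.Name == "", ok) // prints "true false"
}
```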

Make data sets optional

The NewGeobed() function will need the ability to have an option passed that only uses certain data sets. For example, dropping the use of MaxMind's set and only using Geonames increases performance at the cost of not being able to geocode certain cities.
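One common Go shape for this kind of optional configuration is the functional options pattern. The sketch below is an assumption about how it might look, not the package's API: DataSet, WithDataSets, newConfig, and uses are hypothetical names, and the default of loading both sets simply mirrors the current behavior described above.

```go
package main

import "fmt"

// DataSet identifies one of the source data sets.
type DataSet string

const (
	Geonames DataSet = "geonames"
	MaxMind  DataSet = "maxmind"
)

// config holds the settings a constructor would consult while loading data.
type config struct {
	dataSets []DataSet
}

// Option mutates a config; a constructor would accept any number of them.
type Option func(*config)

// WithDataSets restricts loading to the given data sets.
func WithDataSets(sets ...DataSet) Option {
	return func(c *config) {
		c.dataSets = append([]DataSet(nil), sets...)
	}
}

// newConfig applies the options on top of the default (both data sets).
func newConfig(opts ...Option) config {
	c := config{dataSets: []DataSet{Geonames, MaxMind}}
	for _, opt := range opts {
		opt(&c)
	}
	return c
}

// uses reports whether the given data set is enabled.
func (c config) uses(s DataSet) bool {
	for _, d := range c.dataSets {
		if d == s {
			return true
		}
	}
	return false
}

func main() {
	def := newConfig()
	fmt.Println(def.uses(Geonames), def.uses(MaxMind)) // prints "true true"

	small := newConfig(WithDataSets(Geonames))
	fmt.Println(small.uses(Geonames), small.uses(MaxMind)) // prints "true false"
}
```

With this shape, existing callers that pass no options keep the current behavior, while callers that care about speed or memory can opt into a smaller set.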

Optional "exact" match

It would be nice to have another function for exact matches. This would increase the lookup speed a little bit and could really help with ensuring accuracy. The data coming in would need to be pretty clean and reliable for this to work though.
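As a rough illustration of why an exact match could be faster, the sketch below builds a map keyed by normalized city name once, so each lookup is a single map access instead of a scan. The names exactIndex, buildExactIndex, and lookup are hypothetical, and the normalization (trim plus lowercase) is an assumption about what "clean" input would mean.

```go
package main

import (
	"fmt"
	"strings"
)

// City is a hypothetical, trimmed-down record.
type City struct {
	Name    string
	Country string
}

// exactIndex maps a normalized name to every city that carries it.
type exactIndex map[string][]City

// normalize trims surrounding whitespace and lowercases the name.
func normalize(s string) string {
	return strings.ToLower(strings.TrimSpace(s))
}

// buildExactIndex groups cities by normalized name. It is built once, up front.
func buildExactIndex(cities []City) exactIndex {
	idx := make(exactIndex, len(cities))
	for _, c := range cities {
		key := normalize(c.Name)
		idx[key] = append(idx[key], c)
	}
	return idx
}

// lookup returns all cities whose name matches exactly after normalization.
func (idx exactIndex) lookup(name string) []City {
	return idx[normalize(name)]
}

func main() {
	idx := buildExactIndex([]City{
		{"London", "GB"},
		{"London", "CA"},
		{"Paris", "FR"},
	})
	fmt.Println(len(idx.lookup("  LONDON "))) // prints "2"
	fmt.Println(len(idx.lookup("Londo")))     // prints "0"
}
```

Note that an exact lookup can still return several same-name cities, so some disambiguation (for example by country) would remain necessary.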

Can I use it for every country in the world?

Hi! This library is AWESOME thanks a lot!!
I don't even need the precision of getting cities, I am already okay with getting countries!
Does this library work with EVERY country?

thanks a lot!!! (because then I won't have to build stuff myself, thanks a lot!!!! https://www.reddit.com/r/webdev/comments/tlfq17/comment/i1vamqc/?utm_source=share&utm_medium=web2x&context=3 )

BTW: just out of interest: how good is the city coverage actually? Will cities in Asia or Africa with a population over 1,000 be included too?

Return multiple results

Add the confidence score, but also return multiple results. It need not return only one, especially when the query is vague, e.g. "New York".
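A sketch of what returning several scored candidates might look like is shown below. Everything here is hypothetical: the Match type, the topMatches function, the toy scoring (exact name 10, prefix 5), and the population tie-break are illustrative choices, and the population figures are made-up sample values.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// City is a hypothetical, trimmed-down record.
type City struct {
	Name       string
	Country    string
	Population int
}

// Match pairs a candidate city with its confidence score.
type Match struct {
	City  City
	Score int
}

// topMatches returns up to limit candidates, best score first.
// Equal scores are ordered by population, largest first.
func topMatches(cities []City, query string, limit int) []Match {
	q := strings.ToLower(strings.TrimSpace(query))
	var matches []Match
	if q == "" || limit <= 0 {
		return matches
	}
	for _, c := range cities {
		name := strings.ToLower(c.Name)
		s := 0
		switch {
		case name == q:
			s = 10
		case strings.HasPrefix(name, q):
			s = 5
		}
		if s > 0 {
			matches = append(matches, Match{City: c, Score: s})
		}
	}
	sort.SliceStable(matches, func(i, j int) bool {
		if matches[i].Score != matches[j].Score {
			return matches[i].Score > matches[j].Score
		}
		return matches[i].City.Population > matches[j].City.Population
	})
	if len(matches) > limit {
		matches = matches[:limit]
	}
	return matches
}

func main() {
	// Population values are made-up sample numbers.
	cities := []City{
		{"New York Mills", "US", 3000},
		{"New York City", "US", 8000000},
		{"New York", "GB", 150},
		{"Newark", "US", 280000},
	}
	for _, m := range topMatches(cities, "New York", 2) {
		fmt.Println(m.City.Name, m.City.Country, m.Score)
	}
}
```

With this toy scoring the tiny exact-name match outranks New York City, which shows why a real scorer would probably also need to weigh population or other signals.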

Make option to use different or limited or parts of data sets

The data files were put into a slice so that they could easily be configured and processed. Well, unfortunately they weren't consistent in format. So it kinda doesn't make sense to keep them like that, but it also kinda does. Especially if more will be added in the future.

This will need to be re-addressed. The more pressing issue is that both sets use quite a bit of memory as is. It would be nice to choose which sets are used, because that allows trading accuracy for speed.

The Geonames set is far smaller and great for larger cities. The MaxMind set contains a LOT of data, but it may not necessarily be required for certain apps. It would be nice to allow the application to decide.

It might also be nice to allow certain cities to be included from the MaxMind set. For example, any with a population. Or cities from particular countries. So an option to limit the amount of data stored in memory would be great.
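The kind of limiting described above could be expressed as predicates applied while records are loaded, so that rejected records never reach memory. This is a sketch under assumptions: Filter, MinPopulation, InCountries, and filterCities are hypothetical names, and the Smallville and Austin rows are made-up sample values.

```go
package main

import (
	"fmt"
	"strings"
)

// City is a hypothetical, trimmed-down record.
type City struct {
	Name       string
	Country    string
	Population int
}

// Filter reports whether a record should be kept in memory.
type Filter func(City) bool

// MinPopulation keeps cities with at least n inhabitants.
func MinPopulation(n int) Filter {
	return func(c City) bool { return c.Population >= n }
}

// InCountries keeps cities whose country code is in the given list.
func InCountries(codes ...string) Filter {
	allowed := make(map[string]bool, len(codes))
	for _, code := range codes {
		allowed[strings.ToUpper(code)] = true
	}
	return func(c City) bool { return allowed[strings.ToUpper(c.Country)] }
}

// filterCities keeps only the records that pass every filter.
func filterCities(cities []City, filters ...Filter) []City {
	var kept []City
	for _, c := range cities {
		ok := true
		for _, f := range filters {
			if !f(c) {
				ok = false
				break
			}
		}
		if ok {
			kept = append(kept, c)
		}
	}
	return kept
}

func main() {
	cities := []City{
		{"London", "GB", 7556900},
		{"Smallville", "US", 0}, // sample record with no population data
		{"Austin", "US", 900000}, // sample population value
	}
	withPop := filterCities(cities, MinPopulation(1))
	fmt.Println(len(withPop)) // prints "2"

	usOnly := filterCities(cities, MinPopulation(1), InCountries("us"))
	fmt.Println(usOnly[0].Name) // prints "Austin"
}
```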

Add country info lookup

It would be nice to be able to look up countries by searching as well. The CountryInfo struct has some additional details such as population too.
