geoip.jl's Introduction

GeoIP

IP geolocation using the GeoLite2 database

Installation

The package is registered in the General registry and so can be installed at the REPL with

julia> using Pkg
julia> Pkg.add("GeoIP")

Usage

Data files

You can use the MaxMind GeoLite2 CSV files downloaded from the MaxMind site. Due to MaxMind policy, GeoIP.jl does not distribute GeoLite2 files and does not provide download utilities. For automated downloads, the MaxMind GeoIP Update program is recommended. GeoIP.jl needs the GeoLite2 City data file, which usually has a name like GeoLite2-City-CSV_20191224.zip.

Files are processed and loaded with a load() call. The directory where the data is located should either be set in ENV["GEOIP_DATADIR"] or passed as an argument to the load function. The zip file location can likewise be passed as an argument or stored in ENV["GEOIP_ZIPFILE"]. For example:

using GeoIP

geodata = load(zipfile = "GeoLite2-City-CSV_20191224.zip", datadir = "/data")

If ENV["GEOIP_DATADIR"] is set to "/data" and ENV["GEOIP_ZIPFILE"] is set to "GeoLite2-City-CSV_20191224.zip" then it is equivalent to

using GeoIP

geodata = load()
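
The fallback from keyword arguments to environment variables can be pictured with a small stand-alone sketch. The helper `resolve_sources` is hypothetical and not part of GeoIP.jl. It also assumes that the zip file name is resolved relative to the data directory, which the text above does not state.

```julia
# Hypothetical helper, not part of GeoIP.jl: explicit keyword arguments win,
# otherwise the values are read from the environment at call time.
function resolve_sources(; zipfile = get(ENV, "GEOIP_ZIPFILE", ""),
                           datadir = get(ENV, "GEOIP_DATADIR", ""))
    isempty(zipfile) && throw(ArgumentError("no zip file given and GEOIP_ZIPFILE is not set"))
    # Assumption: the zip file lives inside the data directory when one is given.
    return isempty(datadir) ? zipfile : joinpath(datadir, zipfile)
end

# Explicit arguments
explicit = resolve_sources(zipfile = "GeoLite2-City-CSV_20191224.zip", datadir = "/data")

# Same values taken from the environment (set only for the duration of the block)
from_env = withenv("GEOIP_DATADIR" => "/data",
                   "GEOIP_ZIPFILE" => "GeoLite2-City-CSV_20191224.zip") do
    resolve_sources()
end
```

Both calls yield the same path, which mirrors the equivalence between the two load calls.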

Example

You can get the IP data with the geolocate function or by indexing with []:

using GeoIP

geodata = load(zipfile = "GeoLite2-City-CSV_20191224.zip")
geolocate(geodata, ip"1.2.3.4")        # returns dictionary with all relevant information

# Equivalent to
geodata[ip"1.2.3.4"]

# Equivalent, but slower version
geodata["1.2.3.4"]

The geolocate form is useful for broadcasting:

geolocate.(geodata, [ip"1.2.3.4", ip"8.8.8.8"])  # returns vector of geo data.

Localization

Localized versions of the geo files can also be used. To load localized data, pass the locales argument to the load function. To switch between locales, use the setlocale function.

using GeoIP

geodata = load(zipfile = "GeoLite2-City-CSV_20191224.zip", locales = [:en, :fr])

geodata[ip"201.186.185.1"]
# Dict{String, Any} with 21 entries:
#   "time_zone"                     => "America/Santiago"
#   "subdivision_2_name"            => missing
#   "accuracy_radius"               => 100
#   "geoname_id"                    => 3874960
#   "continent_code"                => "SA"
#   "postal_code"                   => missing
#   "continent_name"                => "South America"
#   "locale_code"                   => "en"
#   "subdivision_2_iso_code"        => missing
#   "location"                      => Location(-72.9436, -41.4709, 0.0, "WGS84")
#   "v4net"                         => IPv4Net("201.186.185.0/24")
#   "subdivision_1_name"            => "Los Lagos Region"
#   "subdivision_1_iso_code"        => "LL"
#   "city_name"                     => "Port Montt"
#   "metro_code"                    => missing
#   "registered_country_geoname_id" => 3895114
#   "is_in_european_union"          => 0
#   "is_satellite_provider"         => 0
#   "is_anonymous_proxy"            => 0
#   "country_name"                  => "Chile"
#   "country_iso_code"              => "CL"

geodata_fr = setlocale(geodata, :fr)
geodata_fr[ip"201.186.185.1"]
# Dict{String, Any} with 21 entries:
#   "time_zone"                     => "America/Santiago"
#   "subdivision_2_name"            => missing
#   "accuracy_radius"               => 100
#   "geoname_id"                    => 3874960
#   "continent_code"                => "SA"
#   "postal_code"                   => missing
#   "continent_name"                => "Amérique du Sud"
#   "locale_code"                   => "fr"
#   "subdivision_2_iso_code"        => missing
#   "location"                      => Location(-72.9436, -41.4709, 0.0, "WGS84")
#   "v4net"                         => IPv4Net("201.186.185.0/24")
#   "subdivision_1_name"            => missing
#   "subdivision_1_iso_code"        => "LL"
#   "city_name"                     => "Puerto Montt"
#   "metro_code"                    => missing
#   "registered_country_geoname_id" => 3895114
#   "is_in_european_union"          => 0
#   "is_satellite_provider"         => 0
#   "is_anonymous_proxy"            => 0
#   "country_name"                  => "Chili"
#   "country_iso_code"              => "CL"

During loading, locales can be given either in Symbol notation, i.e. locales = [:en, :fr], or as a Vector of Pairs, where the first element is the locale name and the second is a regular expression that matches the name of the CSV file containing the corresponding localization. For example, locales = [:en => r"Locations-en.csv$", :fr => r"Locations-fr.csv"]. By default, the following locales are supported: :en, :de, :ru, :ja, :es, :fr, :pt_br, :zh_cn.
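
As an illustration of how a locale specification could be matched against the CSV files in the archive, here is a minimal sketch. The functions `default_pattern`, `as_pair` and `locale_files` are hypothetical and only model the idea. The rule used to expand a bare Symbol into a pattern is an assumption, not the package's actual implementation.

```julia
# File names modelled on those found in a GeoLite2 City CSV archive.
files = ["GeoLite2-City-Blocks-IPv4.csv",
         "GeoLite2-City-Locations-en.csv",
         "GeoLite2-City-Locations-fr.csv"]

# Assumed rule: a bare Symbol expands to a case-insensitive pattern built from
# the locale name, with "_" mapped to "-" (so :pt_br would match "pt-BR").
default_pattern(locale::Symbol) =
    Regex("Locations-" * replace(String(locale), "_" => "-") * raw"\.csv$", "i")

# Normalize both accepted notations to a locale => regex pair.
as_pair(l::Symbol) = l => default_pattern(l)
as_pair(p::Pair{Symbol, Regex}) = p

function locale_files(locales, files)
    map(locales) do l
        name, rx = as_pair(l)
        idx = findfirst(f -> occursin(rx, f), files)
        idx === nothing && error("no CSV file matches locale :$name")
        name => files[idx]
    end
end

# Mixed notation: a Symbol and an explicit locale => regex pair.
matched = locale_files([:en, :fr => r"Locations-fr\.csv$"], files)
```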

The default locale, which is used in the getlocale response, can be set with the deflocale argument of the load function. For example, to get the :fr locale by default:

geodata = load(zipfile = "GeoLite2-City-CSV_20191224.zip", locales = [:en, :fr], deflocale = :fr)

Acknowledgements

This product uses, but does not include, GeoLite2 data created by MaxMind, available from http://www.maxmind.com.

geoip.jl's People

Contributors

arkoniak, github-actions[bot], iainnz, johnmyleswhite, randyzwitch, sbromberger, tkelman

geoip.jl's Issues

[PkgEval] GeoIP may have a testing issue on Julia 0.4 (2014-10-08)

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.

On Julia 0.4

  • On 2014-10-05 the testing status was Tests pass.
  • On 2014-10-08 the testing status changed to Package doesn't load.

Tests pass. means that PackageEvaluator found the tests for your package, executed them, and they all passed.

Package doesn't load. means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with using failed.

Special message from @IainNZ: This change may be due to breaking changes to Dict in JuliaLang/julia#8521, or the removal of deprecated syntax in JuliaLang/julia#8607.

This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.

Test log:

>>> 'Pkg.add("GeoIP")' log
INFO: Installing ArrayViews v0.4.6
INFO: Installing DataArrays v0.2.2
INFO: Installing DataFrames v0.5.9
INFO: Installing GZip v0.2.13
INFO: Installing GeoIP v0.1.0
INFO: Installing Reexport v0.0.1
INFO: Installing SortingAlgorithms v0.0.2
INFO: Installing StatsBase v0.6.6
INFO: Package database updated
INFO: METADATA is out-of-date a you may not have the latest version of GeoIP
INFO: Use `Pkg.update()` to get the latest versions of your packages

>>> 'using GeoIP' log

WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:98.
Use "Dict{T,Int}()" instead.

WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:122.
Use "Dict{T,Int}()" instead.

WARNING: deprecated syntax "(T=>Float64)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:162.
Use "Dict{T,Float64}()" instead.

WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:192.
Use "Dict{T,Int}()" instead.

WARNING: deprecated syntax "(T=>W)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:193.
Use "Dict{T,W}()" instead.

WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:66.
Use "Dict{T,Int}()" instead.

WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:77.
Use "Dict{T,Int}()" instead.

WARNING: deprecated syntax "[a=>b, ...]" at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/RDA.jl:11.
Use "Dict(a=>b, ...)" instead.
ERROR: `Dict{Symbol,Union(Real,AbstractArray{Real,1})}` has no method matching Dict{Symbol,Union(Real,AbstractArray{Real,1})}(::(Symbol,Symbol,Symbol,Symbol,Symbol,Symbol), ::(Int64,Int64,Int64,Int64,Int64,Int64))
 in builddf at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:649
 in readtable! at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:783
 in readtable at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:868
 in readtable at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:935
 in loaddatacountry at /home/idunning/pkgtest/.julia/v0.4/GeoIP/src/GeoIP.jl:31
 in include at ./boot.jl:245
 in include_from_node1 at ./loading.jl:128
 in reload_path at ./loading.jl:152
 in _require at ./loading.jl:67
 in require at ./loading.jl:52
 in require_3B_3964 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
 in include at ./boot.jl:245
 in include_from_node1 at loading.jl:128
 in process_options at ./client.jl:293
 in _start at ./client.jl:362
 in _start_3B_3789 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/GeoIP/src/GeoIP.jl, in expression starting on line 55
while loading /home/idunning/pkgtest/.julia/v0.4/GeoIP/testusing.jl, in expression starting on line 2
Julia Version 0.4.0-dev+998
Commit e24fac0 (2014-10-07 22:02 UTC)
Platform Info:
  System: Linux (x86_64-unknown-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3


>>> test log
no tests to run
>>> end of log

Remove IPNets dependency

Since IPNets.jl is rather outdated, and used only internally, it is better for the time being to push its contents inside GeoIP.jl.

It can be factored out later.

BoundsError() when using city database functions

When I added in the code for all of the city database functions (i.e. all functions beyond original getcountryname and getcountrycode functions), I just copied over the searchsorted logic. But searchsorted requires the underlying array/df to be sorted, correct? If so, the current code for the city functions is incorrect, since when I create the full df on line 102, I do an inner join which doesn't guarantee returning a sorted df.

Luckily, the version that gets installed from METADATA.jl is still the old version, not the updated one, so unless someone clones this repo it shouldn't have affected anyone.

Given the state of flux with DataFrames sorting JuliaData/DataFrames.jl#389, how should I handle this? The reason why I chose to do the join inside the package was so that the MaxMind files could be swapped out as-is, rather than pre-processing them outside the package.

BoundsError()
while loading In[7], in expression starting on line 1
 in getindex at bitarray.jl:363
 in getindex at /Users/randyzwitch/.julia/DataArrays/src/dataarray.jl:311
 in getregionname at /Users/randyzwitch/.julia/GeoIP/src/GeoIP.jl:126
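
A minimal sketch of the ordering issue, using plain vectors instead of a DataFrame. The toy data and the `lookup` helper are hypothetical, and the sketch ignores range ends. It re-sorts the joined rows by range start and handles the case where binary search finds no candidate, which would otherwise index at 0.

```julia
# Toy rows in the order an unordered join might return them.
starts = UInt32[0x0a000000, 0x01020300, 0xc0a80000]   # 10.0.0.0, 1.2.3.0, 192.168.0.0
labels = ["private-10", "example", "private-192"]

# Restore the ordering that binary search relies on, keeping rows aligned.
perm   = sortperm(starts)
starts = starts[perm]
labels = labels[perm]

function lookup(starts, labels, addr::UInt32)
    i = searchsortedlast(starts, addr)   # index of the last start <= addr
    i == 0 && return nothing             # addr precedes every range: no match
    return labels[i]
end

lookup(starts, labels, 0x01020304)  # "example"
lookup(starts, labels, 0x00000001)  # nothing
```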

Create a small dataset for testing

Right now in order to test one needs to download the entire geoip datastore. There should be a way to specify an existing datastore so that we can create a minimal one for testing / proof of concept.

geolocate is slow

geolocate is extremely slow.

With the following setup

using BenchmarkTools
using GeoIP
using StableRNGs
import Sockets: IPv4

db = load(zipfile = "GeoLite2-City-CSV_20210427.zip")

rng = StableRNG(2021)
smp = rand(rng, db.db.v4net, 100)
ips = map(smp) do net
    IPv4(net.netaddr + 1)
end

I have the following results

julia> ip = ips[1]
ip"201.186.185.1"

julia> @btime geolocate($db, $ip)
  278.274 ms (12858984 allocations: 196.22 MiB)

julia> @time geolocate.(db, ips)
 16.092770 seconds (666.95 M allocations: 9.939 GiB, 8.59% gc time)

I use @time because the precise measurements of BenchmarkTools are not needed here; 16 seconds is way too long.

ERROR: `geolocate` has no method matching geolocate(::Type{IPv4})

Version 0.3.6 (2015-02-17 22:12 UTC)
Official http://julialang.org/ release
x86_64-w64-mingw32

julia> Pkg.add("GeoIP")
INFO: Cloning cache of ArrayViews from git://github.com/JuliaLang/ArrayViews.jl.git
INFO: Cloning cache of DataArrays from git://github.com/JuliaStats/DataArrays.jl.git
INFO: Cloning cache of DataFrames from git://github.com/JuliaStats/DataFrames.jl.git
INFO: Cloning cache of GZip from git://github.com/JuliaLang/GZip.jl.git
INFO: Cloning cache of GeoIP from git://github.com/JuliaWeb/GeoIP.jl.git
INFO: Cloning cache of IPNets from git://github.com/JuliaWeb/IPNets.jl.git
INFO: Cloning cache of Reexport from git://github.com/simonster/Reexport.jl.git
INFO: Cloning cache of SortingAlgorithms from git://github.com/JuliaLang/SortingAlgorithms.jl.git
INFO: Cloning cache of StatsBase from git://github.com/JuliaStats/StatsBase.jl.git
INFO: Cloning cache of ZipFile from git://github.com/fhs/ZipFile.jl.git
INFO: Installing ArrayViews v0.4.8
INFO: Installing DataArrays v0.2.11
INFO: Installing DataFrames v0.6.1
INFO: Installing GZip v0.2.13
INFO: Installing GeoIP v0.2.0
INFO: Installing IPNets v0.1.3
INFO: Installing Reexport v0.0.2
INFO: Installing SortingAlgorithms v0.0.3
INFO: Installing StatsBase v0.6.12
INFO: Installing ZipFile v0.2.3
INFO: Building HttpParser
INFO: Building LibCURL
INFO: Building WinRPM
INFO: Downloading http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_13.1//repodata/repom
ml
INFO: Downloading http://download.opensuse.org/repositories/windows:/mingw:/win64/openSUSE_13.1//repodata/repom
ml
INFO: Building Nettle
INFO: Building GnuTLS
INFO: Package database updated

julia> using GeoIP
Warning: using StatsBase.midpoints in module Main conflicts with an existing identifier.
Warning: using StatsBase.histrange in module Main conflicts with an existing identifier.

julia> GeoIP.geolocate(IPv4)
ERROR: geolocate has no method matching geolocate(::Type{IPv4})

julia> GeoIP.geolocate(IPv6)
ERROR: geolocate has no method matching geolocate(::Type{IPv6})

julia> a = ip"1.2.3.4"
ip"1.2.3.4"

julia> geolocate(a)
ERROR: type Response has no field data
in geolocate at C:\Users\SAMSUNG2.julia\v0.3\GeoIP\src\geoip-module.jl:126

julia>

Paul

High memory usage

It might be good to have an option to query the database without loading the entire thing into RAM. It seems like a waste of time and memory for smaller queries. Using something like SQLite as a data store would help to facilitate this.

Info about upcoming removal of packages in the General registry

As described in https://discourse.julialang.org/t/ann-plans-for-removing-packages-that-do-not-yet-support-1-0-from-the-general-registry/ we are planning on removing packages that do not support 1.0 from the General registry. This package has been detected to not support 1.0 and is thus slated to be removed. The removal of packages from the registry will happen approximately a month after this issue is open.

To transition to the new Pkg system using Project.toml, see https://github.com/JuliaRegistries/Registrator.jl#transitioning-from-require-to-projecttoml.
To then tag a new version of the package, see https://github.com/JuliaRegistries/Registrator.jl#via-the-github-app.

If you believe this package has erroneously been detected as not supporting 1.0 or have any other questions, don't hesitate to discuss it here or in the thread linked at the top of this post.

Test hangs?

cd ~/.julia/v0.3/GeoIP/test
julia GeoIP.jl 

README: database workflow status

The README should be updated to reflect the current state of the project.

As mentioned in #47 (comment), MaxMind no longer provides direct access to its database files, so update and other functions no longer work.

The README should contain the following information:

  1. MaxMind data is not downloaded anymore from the package and should be downloaded externally.
  2. MaxMind alternative is https://db-ip.com/db/
  3. GeoIP.jl should be used as a wrapper for downloaded data and nothing else.

Request to Maintain

This package seems to have fallen off the radar a bit. I'd like write permissions to perform some updates and maintenance if possible (add appveyor, increase performance, complete some of @sbromberger 's ideas, write documentation etc).

[PackageEvaluator.jl] Your package GeoIP may have a testing issue.

This issue is being filed by a script, but if you reply, I will see it.

PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their test (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3).

The results of this script are used to generate a package listing enhanced with testing results.

The status of this package, GeoIP, on...

  • Julia 0.2 is 'Package doesn't load.' PackageEvaluator.jl
  • Julia 0.3 is 'Package doesn't load.' PackageEvaluator.jl

'No tests, but package loads.' can be due to there being no tests (you should write some if you can!) but can also be due to PackageEvaluator not being able to find your tests. Consider adding a test/runtests.jl file.

'Package doesn't load.' is the worst-case scenario. Sometimes this arises because your package doesn't have BinDeps support, or needs something that can't be installed with BinDeps. If this is the case for your package, please file an issue and an exception can be made so your package will not be tested.

This automatically filed issue is a one-off message. Starting soon, issues will only be filed when the testing status of your package changes in a negative direction (gets worse). If you'd like to opt-out of these status-change messages, reply to this message.

Fix codecov statistics

For some reason, codecov shows 0% coverage and keeps reporting that it is unable to find the commit. I wonder how this can be fixed.

Package does not seem to work

Not sure what I am missing:

julia> using GeoIP, Sockets

julia> a = ip"1.2.3.4";

julia> geolocate(a)
[ Info: Geolocation data not in memory. Loading...
┌ Error: Geolocation data cannot be read. Data directory may be corrupt...
└ @ GeoIP ~/.julia/packages/GeoIP/ct0la/src/data.jl:96
ERROR: UndefVarError: blocks not defined

Browsing the source I found the update function:

julia> GeoIP.update()
┌ Error: Failed to download checksum file from MaxMind, check network connectivity
└ @ GeoIP ~/.julia/packages/GeoIP/ct0la/src/data.jl:37
ERROR: MethodError: no method matching dldata(::Nothing)

Add Maxminddb support

GeoLite2 datasets are also available in a custom binary format described here. It might be worth implementing a reader, as it is probably more efficient than using the CSV version of the database. There is a lot of prior art (much of it due to the authors of the specification itself) in other languages that we can use as a reference.

TagBot trigger issue

This issue is used to trigger TagBot; feel free to unsubscribe.

If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.

If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!

Package restructuring

I want to summarize here the problems that I see with the current implementation and some ideas on how to overcome them.

Behind the scenes data downloading

In the current implementation, data is loaded invisibly to the user. Moreover, it is not only loaded invisibly, it is also downloaded invisibly.

This leads to the following issues:

  1. Unpredictable duration of the first geolocate call: it can take from milliseconds (actual lookup) to seconds or even minutes (when data is loaded).
  2. Uncontrollable behaviour: users can't choose whether to load data from an existing file, whether to update the database, or even which database to use.
  3. It is hard to switch from IPv4 to IPv6: an application has to load both databases behind the scenes and then somehow choose which one to use. See #21
  4. It is hard to change the localization: once again, all necessary files have to be loaded behind the scenes, which creates additional memory pressure. See #22
  5. It is hard to switch from CSV to MaxMindDB, because it is not quite clear which database to use. See #26

The solution to all of these problems is to expose the following methods to the user:

  1. load: it should accept various parameters and modes. The user can choose between local and internet data loading, and between different database formats and localizations.
  2. update!: it should accept parameters similar to load, but it should validate the current state of the database and update it if a new version is available.
  3. geolocate should be changed to geolocate(::DB, ::IP). For convenience, a getindex method can be added so that db[IP] works as geolocate.
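
The proposed calling convention could look roughly like the following sketch. `ToyDB` and its fields are hypothetical stand-ins for the real database type, the city labels are placeholders, and the lookup ignores block ends. Reading the address through the `host` field of `Sockets.IPv4` is also an assumption about that type's layout.

```julia
using Sockets   # IPv4 and the ip"..." string macro

# Hypothetical stand-in for the proposed database type.
struct ToyDB
    starts::Vector{UInt32}    # sorted first address of each block
    cities::Vector{String}    # label for the block at the same index
end

# Explicit database argument: no hidden global state, no hidden download.
function geolocate(db::ToyDB, ip::IPv4)
    i = searchsortedlast(db.starts, ip.host)
    return i == 0 ? missing : db.cities[i]
end

# db[ip] and db["1.2.3.4"] forward to geolocate; the String form pays for parsing.
Base.getindex(db::ToyDB, ip::IPv4) = geolocate(db, ip)
Base.getindex(db::ToyDB, ip::AbstractString) = geolocate(db, parse(IPv4, ip))

# Treat the database as a scalar so that geolocate.(db, ips) broadcasts over ips only.
Base.broadcastable(db::ToyDB) = Ref(db)

db = ToyDB(UInt32[ip"1.2.3.0".host, ip"8.8.8.0".host], ["city-a", "city-b"])

geolocate(db, ip"1.2.3.4")                      # "city-a"
db["8.8.8.8"]                                   # "city-b"
geolocate.(db, [ip"1.2.3.4", ip"8.8.8.8"])      # ["city-a", "city-b"]
```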

Loaded Data structure and results

In the current implementation, a DataFrame is used as the storage format, and a Dict{String, Any} is used as the query return format.

This leads to the following issues:

  1. A DataFrame is type unstable by construction, so improper use can lead to unnecessary allocations and overall slowness.
  2. Row construction is rather slow.
  3. The output is type unstable, which can cause slowdowns in the final application.

Possible solution:

  1. Use a StructArray or a Vector of GeoResult structs.
  2. Return a GeoResult, which should be concretely typed and have a fixed number of fields. Use sentinel values instead of missing data.
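
A minimal sketch of what such a result type might look like. The field selection, the sentinel values and the helpers `hascity` and `haslocation` are all assumptions chosen for illustration, not a settled design.

```julia
# Hypothetical result type: every field has a concrete type and "no data"
# is encoded by a sentinel rather than `missing`.
struct GeoResult
    country_iso_code::String   # "" when unknown
    city_name::String          # "" when unknown
    latitude::Float64          # NaN when unknown
    longitude::Float64         # NaN when unknown
    accuracy_radius::Int       # -1 when unknown
end

# Keyword constructor that fills unspecified fields with their sentinels.
GeoResult(; country_iso_code = "", city_name = "", latitude = NaN,
            longitude = NaN, accuracy_radius = -1) =
    GeoResult(country_iso_code, city_name, latitude, longitude, accuracy_radius)

# Sentinel checks replace `ismissing` tests on the caller's side.
hascity(r::GeoResult) = !isempty(r.city_name)
haslocation(r::GeoResult) = !isnan(r.latitude) && !isnan(r.longitude)

# A Vector{GeoResult} has a concrete element type, unlike a Vector of Dict{String, Any}.
results = [GeoResult(country_iso_code = "CL", city_name = "Port Montt",
                     latitude = -41.4709, longitude = -72.9436, accuracy_radius = 100),
           GeoResult(country_iso_code = "CL")]
```

The trade-off is that callers must know the sentinel conventions, whereas missing makes absent data explicit at the cost of Union-typed fields.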

Localization

Geolite2 already has localization support for a handful of other languages. Might be nice to extend our support to these other languages, especially given the nice UTF8 support built into Julia.
