juliaweb / geoip.jl
A Julia package to estimate the geographic location of IP addresses
License: Other
geolocate is extremely slow.
With the following setup
using BenchmarkTools
using GeoIP
using StableRNGs
import Sockets: IPv4
db = load(zipfile = "GeoLite2-City-CSV_20210427.zip")
rng = StableRNG(2021)
smp = rand(rng, db.db.v4net, 100)
ips = map(smp) do net
    IPv4(net.netaddr + 1)
end
I have the following results
julia> ip = ips[1]
ip"201.186.185.1"
julia> @btime geolocate($db, $ip)
278.274 ms (12858984 allocations: 196.22 MiB)
julia> @time geolocate.(db, ips)
16.092770 seconds (666.95 M allocations: 9.939 GiB, 8.59% gc time)
I use @time here because the precise measurements of BenchmarkTools are not needed: 16 seconds is way too long.
GeoLite2 already has localization support for a handful of other languages. It might be nice to extend our support to these languages, especially given the good UTF-8 support built into Julia.
This issue is being filed by a script, but if you reply, I will see it.
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.2) and the nightly build of the unstable version (0.3).
The results of this script are used to generate a package listing enhanced with testing results.
The status of this package, GeoIP, on...
'No tests, but package loads.' can be due to there being no tests (you should write some if you can!) but can also be due to PackageEvaluator not being able to find your tests. Consider adding a test/runtests.jl file.
'Package doesn't load.' is the worst-case scenario. Sometimes this arises because your package doesn't have BinDeps support, or needs something that can't be installed with BinDeps. If this is the case for your package, please file an issue and an exception can be made so your package will not be tested.
This automatically filed issue is a one-off message. Starting soon, issues will only be filed when the testing status of your package changes in a negative direction (gets worse). If you'd like to opt-out of these status-change messages, reply to this message.
As described in https://discourse.julialang.org/t/ann-plans-for-removing-packages-that-do-not-yet-support-1-0-from-the-general-registry/ we are planning on removing packages that do not support 1.0 from the General registry. This package has been detected to not support 1.0 and is thus slated to be removed. The removal of packages from the registry will happen approximately a month after this issue is open.
To transition to the new Pkg system using Project.toml, see https://github.com/JuliaRegistries/Registrator.jl#transitioning-from-require-to-projecttoml.
To then tag a new version of the package, see https://github.com/JuliaRegistries/Registrator.jl#via-the-github-app.
If you believe this package has erroneously been detected as not supporting 1.0 or have any other questions, don't hesitate to discuss it here or in the thread linked at the top of this post.
This issue is used to trigger TagBot; feel free to unsubscribe.
If you haven't already, you should update your TagBot.yml to include issue comment triggers.
Please see this post on Discourse for instructions and more details.
If you'd like for me to do this for you, comment TagBot fix on this issue.
I'll open a PR within a few hours, please be patient!
Right now in order to test one needs to download the entire geoip datastore. There should be a way to specify an existing datastore so that we can create a minimal one for testing / proof of concept.
README should be updated to reflect current state of the project.
As mentioned in #47 (comment), because MaxMind no longer provides direct access to its database files, update and other functions are not working.
README should contain the following information:
cd ~/.julia/v0.3/GeoIP/test
julia GeoIP.jl
Geolite2 datasets are also available in a custom binary format described here. Might be worth implementing a reader as it is probably more efficient than using the csv version of the database. There is a lot of prior art (much due to the authors of the specification itself) in other languages we can use as reference:
PackageEvaluator.jl is a script that runs nightly. It attempts to load all Julia packages and run their tests (if available) on both the stable version of Julia (0.3) and the nightly build of the unstable version (0.4). The results of this script are used to generate a package listing enhanced with testing results.
Tests pass.
Package doesn't load.
'Tests pass.' means that PackageEvaluator found the tests for your package, executed them, and they all passed.
'Package doesn't load.' means that PackageEvaluator did not find tests for your package. Additionally, trying to load your package with using failed.
Special message from @IainNZ: This change may be due to breaking changes to Dict in JuliaLang/julia#8521, or the removal of deprecated syntax in JuliaLang/julia#8607.
This issue was filed because your testing status became worse. No additional issues will be filed if your package remains in this state, and no issue will be filed if it improves. If you'd like to opt-out of these status-change messages, reply to this message saying you'd like to and @IainNZ will add an exception. If you'd like to discuss PackageEvaluator.jl please file an issue at the repository. For example, your package may be untestable on the test machine due to a dependency - an exception can be added.
Test log:
>>> 'Pkg.add("GeoIP")' log
INFO: Installing ArrayViews v0.4.6
INFO: Installing DataArrays v0.2.2
INFO: Installing DataFrames v0.5.9
INFO: Installing GZip v0.2.13
INFO: Installing GeoIP v0.1.0
INFO: Installing Reexport v0.0.1
INFO: Installing SortingAlgorithms v0.0.2
INFO: Installing StatsBase v0.6.6
INFO: Package database updated
INFO: METADATA is out-of-date a you may not have the latest version of GeoIP
INFO: Use `Pkg.update()` to get the latest versions of your packages
>>> 'using GeoIP' log
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:98.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/scalarstats.jl:122.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Float64)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:162.
Use "Dict{T,Float64}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:192.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>W)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/counts.jl:193.
Use "Dict{T,W}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:66.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "(T=>Int)[]" at /home/idunning/pkgtest/.julia/v0.4/StatsBase/src/misc.jl:77.
Use "Dict{T,Int}()" instead.
WARNING: deprecated syntax "[a=>b, ...]" at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/RDA.jl:11.
Use "Dict(a=>b, ...)" instead.
ERROR: `Dict{Symbol,Union(Real,AbstractArray{Real,1})}` has no method matching Dict{Symbol,Union(Real,AbstractArray{Real,1})}(::(Symbol,Symbol,Symbol,Symbol,Symbol,Symbol), ::(Int64,Int64,Int64,Int64,Int64,Int64))
in builddf at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:649
in readtable! at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:783
in readtable at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:868
in readtable at /home/idunning/pkgtest/.julia/v0.4/DataFrames/src/dataframe/io.jl:935
in loaddatacountry at /home/idunning/pkgtest/.julia/v0.4/GeoIP/src/GeoIP.jl:31
in include at ./boot.jl:245
in include_from_node1 at ./loading.jl:128
in reload_path at ./loading.jl:152
in _require at ./loading.jl:67
in require at ./loading.jl:52
in require_3B_3964 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
in include at ./boot.jl:245
in include_from_node1 at loading.jl:128
in process_options at ./client.jl:293
in _start at ./client.jl:362
in _start_3B_3789 at /home/idunning/julia04/usr/bin/../lib/julia/sys.so
while loading /home/idunning/pkgtest/.julia/v0.4/GeoIP/src/GeoIP.jl, in expression starting on line 55
while loading /home/idunning/pkgtest/.julia/v0.4/GeoIP/testusing.jl, in expression starting on line 2
Julia Version 0.4.0-dev+998
Commit e24fac0 (2014-10-07 22:02 UTC)
Platform Info:
System: Linux (x86_64-unknown-linux-gnu)
CPU: Intel(R) Xeon(R) CPU E5-2650 0 @ 2.00GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
LAPACK: libopenblas
LIBM: libopenlibm
LLVM: libLLVM-3.3
>>> test log
no tests to run
>>> end of log
This package seems to have fallen off the radar a bit. I'd like write permissions to perform some updates and maintenance if possible (add appveyor, increase performance, complete some of @sbromberger 's ideas, write documentation etc).
There are artifacts left over from the pre-0.7 era in the IPNets module.
Base: start, next, done should be changed to a single iterate.
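For reference, a minimal sketch of what this migration looks like. It uses a hypothetical ToyNet type as a stand-in, not the actual IPNets types, and the old-style methods shown in the comments are only an assumption about how such code was typically written before Julia 0.7.

```julia
# Hypothetical stand-in for an IP network: a base address (as an integer)
# and the number of addresses it contains. Not the real IPNets type.
struct ToyNet
    base::UInt32
    count::Int
end

# Pre-0.7 style (no longer supported by Julia):
#   Base.start(n::ToyNet)   = 0
#   Base.next(n::ToyNet, i) = (n.base + UInt32(i), i + 1)
#   Base.done(n::ToyNet, i) = i >= n.count

# Julia >= 0.7: a single function, `iterate`, returns either
# `(item, next_state)` or `nothing` once the iteration is finished.
function Base.iterate(n::ToyNet, i::Int = 0)
    i >= n.count && return nothing
    return (n.base + UInt32(i), i + 1)
end

# Optional, but lets `collect` preallocate a correctly typed result.
Base.length(n::ToyNet) = n.count
Base.eltype(::Type{ToyNet}) = UInt32

net = ToyNet(0x01020300, 4)   # 1.2.3.0 and the three addresses after it
addrs = collect(net)
```

The state argument with a default value covers both the first call, iterate(n), and every later call, iterate(n, state), so one definition replaces all three old functions.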
I want to summarize here the problems that I see with the current implementation and some ideas for how to overcome them.
In the current implementation, data is loaded invisibly to the user. Moreover, it is not only loaded invisibly, it is also downloaded invisibly.
This leads to the following issues:
geolocate call, it can take from milliseconds (actual lookup) to seconds or even minutes (when data is loaded).
The solution to all of these problems is the following set of methods, which are accessible to the user:
load: it should accept various parameters and modes. The user can choose between local and internet data loading, and between different database formats and localizations.
update!: it should accept parameters similar to load, but it should validate the current state of the database and update the database if a new version is available.
geolocate should be changed to geolocate(::DB, ::IP). For convenience, a getindex method can be added, so that db[IP] works as geolocate.
In the current implementation, DataFrame is used as the storage format, and Dict{String, Any} is used as the return format for queries.
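Returning to the interface proposal above, here is a rough sketch of what an explicit load followed by geolocate(::DB, ::IP) and db[ip] could look like. ToyDB, ToyRecord and load_toy are hypothetical names invented for this illustration, and the in-memory records stand in for a real database file. This is not the package's actual implementation.

```julia
using Sockets   # provides IPv4 and the ip"..." string macro

# Hypothetical record: an address range and the data attached to it.
struct ToyRecord
    lo::UInt32          # first address of the range, as an integer
    hi::UInt32          # last address of the range
    country::String
end

# Hypothetical database object that the user holds explicitly.
struct ToyDB
    starts::Vector{UInt32}       # sorted range starts, used for binary search
    records::Vector{ToyRecord}   # records in the same order as `starts`
end

# Explicit loading: the caller decides when the (possibly slow) work happens.
function load_toy(records::Vector{ToyRecord})
    sorted = sort(records; by = r -> r.lo)
    return ToyDB([r.lo for r in sorted], sorted)
end

# Lookup takes the database as an argument instead of using hidden global state.
function geolocate(db::ToyDB, ip::IPv4)
    addr = ip.host                          # UInt32 value of the address
    i = searchsortedlast(db.starts, addr)   # last range starting at or before addr
    i == 0 && return nothing
    rec = db.records[i]
    return addr <= rec.hi ? rec : nothing
end

# Convenience syntax: db[ip] behaves like geolocate(db, ip).
Base.getindex(db::ToyDB, ip::IPv4) = geolocate(db, ip)

db = load_toy([
    ToyRecord(ip"10.0.1.0".host, ip"10.0.1.255".host, "BB"),
    ToyRecord(ip"10.0.0.0".host, ip"10.0.0.255".host, "AA"),
])

hit  = db[ip"10.0.1.7"]                  # record for the "BB" range
miss = geolocate(db, ip"192.168.0.1")    # nothing: no range covers this address
```

With this shape, the cost of loading is paid once at a point the user chooses, and every later lookup is a binary search over already loaded data.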
This choice of formats leads to the following issues:
DataFrame is type unstable by construction, so improper use can lead to unnecessary allocations and overall slowness.
Row construction is rather slow.
Possible solution:
StructArray or Vector of GeoResult structs.
GeoResult, which should be concretely typed and have a fixed number of fields. Use sentinel values instead of missing data.
Not sure what I am missing:
julia> using GeoIP, Sockets
julia> a = ip"1.2.3.4";
julia> geolocate(a)
[ Info: Geolocation data not in memory. Loading...
┌ Error: Geolocation data cannot be read. Data directory may be corrupt...
└ @ GeoIP ~/.julia/packages/GeoIP/ct0la/src/data.jl:96
ERROR: UndefVarError: blocks not defined
Browsing the source I found the update function:
julia> GeoIP.update()
┌ Error: Failed to download checksum file from MaxMind, check network connectivity
└ @ GeoIP ~/.julia/packages/GeoIP/ct0la/src/data.jl:37
ERROR: MethodError: no method matching dldata(::Nothing)
It might be good to have an option to query the database without loading the entire thing into RAM. It seems like a waste of time and memory for smaller queries. Using something like SQLite as a data store would help to facilitate this.
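As a lighter-weight illustration of the same idea (plain line-by-line streaming with the standard library, not SQLite), the sketch below answers a single query while holding only one row in memory at a time. The CSV layout, the integer addresses and the lookup_streaming name are all assumptions made for this example.

```julia
# Hypothetical CSV layout: first_address,last_address,country (integers for brevity).
# An IOBuffer stands in for a file on disk; with a real file, pass an open file handle.
csv = IOBuffer("""
first,last,country
100,199,AA
200,299,BB
300,399,CC
""")

# Scan one line at a time and stop at the first matching range,
# so the full table is never materialized in memory.
function lookup_streaming(io::IO, addr::Integer)
    readline(io)                       # skip the header line
    for line in eachline(io)
        isempty(line) && continue
        lo, hi, country = split(line, ',')
        if parse(Int, lo) <= addr <= parse(Int, hi)
            return String(country)
        end
    end
    return nothing                     # no range contains addr
end

result = lookup_streaming(csv, 250)
```

The trade-off is that each query is a linear scan of the file. An indexed on-disk store such as SQLite would keep memory use low while avoiding that scan.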
Version 0.3.6 (2015-02-17 22:12 UTC)
Official http://julialang.org/ release
x86_64-w64-mingw32
julia> Pkg.add("GeoIP")
INFO: Cloning cache of ArrayViews from git://github.com/JuliaLang/ArrayViews.jl.git
INFO: Cloning cache of DataArrays from git://github.com/JuliaStats/DataArrays.jl.git
INFO: Cloning cache of DataFrames from git://github.com/JuliaStats/DataFrames.jl.git
INFO: Cloning cache of GZip from git://github.com/JuliaLang/GZip.jl.git
INFO: Cloning cache of GeoIP from git://github.com/JuliaWeb/GeoIP.jl.git
INFO: Cloning cache of IPNets from git://github.com/JuliaWeb/IPNets.jl.git
INFO: Cloning cache of Reexport from git://github.com/simonster/Reexport.jl.git
INFO: Cloning cache of SortingAlgorithms from git://github.com/JuliaLang/SortingAlgorithms.jl.git
INFO: Cloning cache of StatsBase from git://github.com/JuliaStats/StatsBase.jl.git
INFO: Cloning cache of ZipFile from git://github.com/fhs/ZipFile.jl.git
INFO: Installing ArrayViews v0.4.8
INFO: Installing DataArrays v0.2.11
INFO: Installing DataFrames v0.6.1
INFO: Installing GZip v0.2.13
INFO: Installing GeoIP v0.2.0
INFO: Installing IPNets v0.1.3
INFO: Installing Reexport v0.0.2
INFO: Installing SortingAlgorithms v0.0.3
INFO: Installing StatsBase v0.6.12
INFO: Installing ZipFile v0.2.3
INFO: Building HttpParser
INFO: Building LibCURL
INFO: Building WinRPM
INFO: Downloading http://download.opensuse.org/repositories/windows:/mingw:/win32/openSUSE_13.1//repodata/repom
ml
INFO: Downloading http://download.opensuse.org/repositories/windows:/mingw:/win64/openSUSE_13.1//repodata/repom
ml
INFO: Building Nettle
INFO: Building GnuTLS
INFO: Package database updated
julia> using GeoIP
Warning: using StatsBase.midpoints in module Main conflicts with an existing identifier.
Warning: using StatsBase.histrange in module Main conflicts with an existing identifier.
julia> GeoIP.geolocate(IPv4)
ERROR: geolocate has no method matching geolocate(::Type{IPv4})
julia> GeoIP.geolocate(IPv6)
ERROR: geolocate has no method matching geolocate(::Type{IPv6})
julia> a = ip"1.2.3.4"
ip"1.2.3.4"
julia> geolocate(a)
ERROR: type Response has no field data
in geolocate at C:\Users\SAMSUNG2.julia\v0.3\GeoIP\src\geoip-module.jl:126
julia>
Paul
Package stuck at 0.10, 0.11 with missings support needs implementation.
Do you support IP2Location LITE database?
For some reason, codecov shows 0% coverage and keeps reporting that it is unable to find the commit. I wonder how it can be fixed.
Since IPNets.jl is rather outdated and used only internally, it is better for the time being to move its contents into GeoIP.jl.
It can be factored out later.
When I added in the code for all of the city database functions (i.e. all functions beyond the original getcountryname and getcountrycode functions), I just copied over the searchsorted logic. But searchsorted requires the underlying array/df to be sorted, correct? If so, the current code for the city functions is incorrect, since when I create the full df on line 102, I do an inner join which doesn't guarantee returning a sorted df.
Luckily, the version that gets installed from METADATA.jl is still the old version, not the updated one, so unless someone clones this repo it shouldn't have affected anyone.
Given the state of flux with DataFrames sorting JuliaData/DataFrames.jl#389, how should I handle this? The reason why I chose to do the join inside the package was so that the MaxMind files could be swapped out as-is, rather than pre-processing them outside the package.
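The concern about searchsorted is well founded: the searchsorted family performs a binary search and assumes its input is already sorted. A minimal sketch of the check and the fix, using plain vectors with made-up values in place of the joined DataFrame columns:

```julia
# Hypothetical data: start addresses of IP blocks and a label for each block,
# in the shuffled order that a join might return them.
starts = UInt32[300, 100, 400, 200]
labels = ["C", "A", "D", "B"]

ip = UInt32(250)   # belongs to the block starting at 200, i.e. label "B"

# Binary search on unsorted input gives no guaranteed result, so check first.
was_sorted = issorted(starts)

# Restore the order once, applying the same permutation to every column.
p = sortperm(starts)
sorted_starts = starts[p]
sorted_labels = labels[p]

# Index of the last block whose start is <= ip.
i = searchsortedlast(sorted_starts, ip)
block = sorted_labels[i]
```

Sorting once right after the join keeps the MaxMind files usable as-is, at the cost of one extra pass when the data is loaded.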
BoundsError()
while loading In[7], in expression starting on line 1
in getindex at bitarray.jl:363
in getindex at /Users/randyzwitch/.julia/DataArrays/src/dataarray.jl:311
in getregionname at /Users/randyzwitch/.julia/GeoIP/src/GeoIP.jl:126
We should use Geodesy.jl for point location coordinates, instead of the custom structures.