
geolite2citydb_summary's Introduction

How to run the code:

On a standard Ubuntu system (Ubuntu 20.04 or earlier), first check whether Python is already installed:

Ubuntu version 20.04:
python3 --version

Prior Ubuntu versions:
python --version

If python3 is not installed, please run the following commands:
sudo apt-get update
sudo apt-get install python3
Once python3 is installed, install the "geoip2" module with the following command:
pip install geoip2 [1]

If pip (Pip Installs Packages) is not installed, please execute the following command:
sudo apt install python3-pip

We are now ready to run the code with the following command (Ubuntu 20.04):
python3 parseGeoLiteCityDB.py access.log

For prior Ubuntu versions:
python parseGeoLiteCityDB.py access.log

Files needed to run the program:

  1. access.log
  2. GeoLite2-City.mmdb

Both files must be in the same directory as parseGeoLiteCityDB.py.

The run command (python3 parseGeoLiteCityDB.py access.log) accepts any log file in place of access.log, so different log files can be used to get different outputs.

If you want to try a different database, please change the database variable on line 15 of the code.

The code could be written to take both the access.log file and the database file as inputs. However, since the homework question says "Include a command-line program to run your code against an arbitrary file", I limited the input argument to the access.log file only.
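For illustration, accepting both files could look roughly like the sketch below. This is not the code in parseGeoLiteCityDB.py: the `--db` option and the function names are hypothetical, and only the default file name comes from this README.

```python
import argparse

DEFAULT_DB = "GeoLite2-City.mmdb"  # database file name used in this README


def build_parser():
    """Build a parser with a required log file and an optional database path."""
    parser = argparse.ArgumentParser(
        description="Summarise page views per country and US state."
    )
    parser.add_argument("logfile", help="path to the access log to analyse")
    parser.add_argument(
        "--db",
        default=DEFAULT_DB,
        help="path to the GeoLite2 City database (hypothetical option)",
    )
    return parser


def parse_args(argv):
    """Parse a list of command-line arguments (without the script name)."""
    return build_parser().parse_args(argv)


# Only the log file is given, so the default database is used.
args = parse_args(["access.log"])
print(args.logfile, args.db)

# Both files are given explicitly.
args = parse_args(["access_half.log", "--db", "GeoLite2-City.mmdb"])
print(args.logfile, args.db)
```

With this layout the log file stays the only required argument, so the existing run command would keep working unchanged.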

If needed, please download the "GeoLite2-City.mmdb" from here: https://drive.google.com/drive/folders/1Squ0xtr2QCDPoGq6TyIkS-_0HjA2yMib?usp=sharing [2]

Sample Outputs:

Output 1:

ubuntu@ip-172-31-45-47:~/maxmind$ python3 parseGeoLiteCityDB.py access.log

Most Viewed Country:
Country :: #Most View :: "The most viewed page" (#viewCount)
============================================================
United States :: 13905 :: "/region/1" (61)
Netherlands :: 3216 :: "/search/by-lat-long/9.250043,-83.859123/filter/category_id=1;category_id=2;category_id=3;category_id=4;category_id=5;category_id=6;category_id=7;category_id=8;category_id=9?limit=10;unit=km;distance=10" (11)
China :: 1466 :: "/entry/" (9)
Germany :: 1244 :: "/entry/20252" (26)
France :: 702 :: "/entry/2299" (4)
Russia :: 658 :: "/region/659" (3)
United Kingdom :: 304 :: "/region/52" (7)
Canada :: 221 :: "/entry/6843" (3)
Mexico :: 120 :: "/region/1" (2)
Israel :: 66 :: "/site/recent.atom?entries_only=1" (11)

Most Viewed US States:
States :: #Most View :: "The most viewed page" (#viewCount)
============================================================
Washington :: 2400 :: "/region/1" (17)
Virginia :: 2278 :: "/entry/4628" (8)
California :: 410 :: "/location/most_recent_vendors.rss?location_id=5" (22)
New York :: 174 :: "/region/1" (5)
Delaware :: 171 :: "/region/1503" (2)
Michigan :: 153 :: "/region/447" (2)
Texas :: 152 :: "/region/218" (10)
Minnesota :: 131 :: "/region/13" (21)
Illinois :: 116 :: "/region/1766" (4)
New Jersey :: 96 :: "/entry/near/40.7458%2C-74.0321/filter/category_id=1;veg_level=2;allow_closed=0?limit=10;order_by=distance;address=Your+location" (4)

Summary:
Total valid IP processed: 22454

Unknown Country list: (total 3)
['193.202.255.201', '66.249.93.72', '66.249.81.72']

Unknown states found: 790

Total execution time: 13.11 seconds.

I also experimented with altered versions of the access.log file, first reducing the total number of lines by half and then keeping only the first 500 lines. The code handles the required case correctly: "where there are less than 10 states or countries with visitors, only show those which have at least one visitor". The corresponding outputs follow:

Output 2:

Using 25037 lines of access.log file:
ubuntu@ip-172-31-45-47:~/maxmind$ python3 parseGeoLiteCityDB.py access_half.log
Most Viewed Country:
Country :: #Most View :: "The most viewed page" (#viewCount)
============================================================
United States :: 5844 :: "/region/1" (36)
Netherlands :: 1272 :: "/search/by-lat-long/53.214297,-1.738481/filter/category_id=1;category_id=2;category_id=3;category_id=4;category_id=5;category_id=6;category_id=7;category_id=8;category_id=9?limit=10;unit=km;distance=10" (4)
China :: 815 :: "/site/help" (8)
Germany :: 531 :: "/entry/20252" (16)
United Kingdom :: 245 :: "/region/659" (6)
Russia :: 189 :: "/site/help" (2)
France :: 186 :: "/entry/2299" (3)
Mexico :: 104 :: "/region/1" (2)
Canada :: 34 :: "/entry/3613" (2)
Israel :: 25 :: "/site/recent.atom?entries_only=1" (4)

Most Viewed US States:
States :: #Most View :: "The most viewed page" (#viewCount)
============================================================
Washington :: 1070 :: "/region/1" (12)
Virginia :: 439 :: "/entry/20253/reviews" (5)
California :: 191 :: "/location/most_recent_vendors.rss?location_id=5" (10)
New York :: 84 :: "/region/1" (3)
Illinois :: 82 :: "/region/1766" (4)
Texas :: 72 :: "/region/218" (6)
Ohio :: 70 :: "/entry/5023" (3)
Pennsylvania :: 70 :: "/region/599" (3)
Minnesota :: 60 :: "/region/13" (18)
Florida :: 51 :: "/api-explorer/" (4)

Summary:
Total valid IP processed: 9506

Unknown Country list: (total 2)
['193.202.255.201', '66.249.93.72']

Unknown states found: 487

Total execution time: 4.86 seconds.

Output 3:

Using only first 500 lines of access.log file:
ubuntu@ip-172-31-45-47:~/maxmind$ python3 parseGeoLiteCityDB.py access_500linesOnly.log
Most Viewed Country:
Country :: #Most View :: "The most viewed page" (#viewCount)
============================================================
United States :: 162 :: "/entry/5023" (3)
Netherlands :: 42 :: "/entry/near/0%2C0/filter?unit=mile;distance=25;sort_order=ASC;page=;order_by=distance;address=34034;limit=" (1)
China :: 21 :: "/entry/15205" (1)
Switzerland :: 6 :: "/entry/15603" (2)
Germany :: 4 :: "/region/60" (1)
France :: 3 :: "/entry/656" (1)
Canada :: 2 :: "/entry/2708" (1)
Israel :: 1 :: "/site/recent.atom?entries_only=1" (1)

Most Viewed US States:
States :: #Most View :: "The most viewed page" (#viewCount)
============================================================
Washington :: 41 :: "/entry/1817" (1)
Ohio :: 5 :: "/entry/5023" (3)
California :: 3 :: "/location/view.html?location_id=174&new_query=1" (1)
Texas :: 3 :: "/region/2" (2)
Arizona :: 2 :: "/site/help" (1)
Virginia :: 1 :: "/entry/18992" (1)

Summary:
Total valid IP processed: 242

Unknown Country list: (total 1)
['193.202.255.201']

Unknown states found: 33

Total execution time: 0.14 seconds.
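The "fewer than 10" behaviour seen in Output 3 falls out naturally if the counts are kept in a collections.Counter. The sketch below is one possible way to do it, not necessarily how parseGeoLiteCityDB.py does it. The function name and the sample data are made up for illustration.

```python
from collections import Counter, defaultdict


def top_locations(views, limit=10):
    """Return up to `limit` rows of (location, total views, top page, top page views).

    `views` is an iterable of (location, page) pairs, one pair per request.
    """
    totals = Counter()
    pages = defaultdict(Counter)
    for location, page in views:
        totals[location] += 1
        pages[location][page] += 1

    rows = []
    # most_common(limit) returns at most `limit` entries and only locations
    # that were actually seen, so a short log simply yields fewer rows.
    for location, total in totals.most_common(limit):
        page, page_views = pages[location].most_common(1)[0]
        rows.append((location, total, page, page_views))
    return rows


# Made-up sample data: two locations, so only two rows are printed.
sample = [
    ("Washington", "/region/1"),
    ("Washington", "/region/1"),
    ("Washington", "/entry/1817"),
    ("Ohio", "/entry/5023"),
]
for location, total, page, page_views in top_locations(sample):
    print(f'{location} :: {total} :: "{page}" ({page_views})')
```

Because locations with no visitors never enter the counter, no extra filtering step is needed to hide them.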

A user may mistakenly give the wrong inputs when running the program; the code handles these situations. The cases are as follows:

  1. No input file given:
    ubuntu@ip-172-31-45-47:~/maxmind$ python3 parseGeoLiteCityDB.py
    Please provide ONLY the 'access.log' file as the first argument.

  2. More than 1 input file given:
    ubuntu@ip-172-31-45-47:~/maxmind$ python3 parseGeoLiteCityDB.py access_500linesOnly.log GeoLite2-City.mmdb
    Please provide ONLY the 'access.log' file as the first argument.

  3. Typo while executing the command (spelling mistake of the access.log file):
    ubuntu@ip-172-31-45-47:~/maxmind$ python3 parseGeoLiteCityDB.py access_500linesOnly.log GeoLite2-City.mmdbasdfasdf
    Please provide ONLY the 'access.log' file as the first argument.
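The three cases above can be covered by a small check like the following sketch. It assumes the script validates the argument count and then checks that the file exists. The function names are hypothetical; only the message text is taken from the output above.

```python
import os.path

USAGE = "Please provide ONLY the 'access.log' file as the first argument."


def validate_args(argv):
    """Return the log file path if argv is valid, otherwise None.

    `argv` follows the sys.argv convention: argv[0] is the script name.
    """
    if len(argv) != 2:
        # Cases 1 and 2: no file given, or more than one argument.
        return None
    path = argv[1]
    if not os.path.isfile(path):
        # Case 3: the file name is misspelled or the file is missing.
        return None
    return path


def main(argv):
    """Print the usage message and return 1 on bad input, else return 0."""
    path = validate_args(argv)
    if path is None:
        print(USAGE)
        return 1
    print(f"Processing {path}")
    return 0
```

In a script this would typically be called as `sys.exit(main(sys.argv))`, so that a bad invocation also returns a non-zero exit status.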

Dependencies

I imported the following modules:

import re
import sys
import time
import os.path
import webbrowser
import geoip2.database

All of the above modules except geoip2 are part of the Python 3 standard library; geoip2 is installed separately with pip, as described above.
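As a rough illustration of how re and geoip2.database might fit together, the sketch below parses one log line and resolves its IP address. It assumes access.log uses the common/combined web server log format, which this README does not confirm. The function names and the sample line are made up, and the real script may be organised differently.

```python
import re

# Assumption: common/combined log format, e.g.
# 203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "GET /region/1 HTTP/1.1" 200 2326
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>[A-Z]+) (?P<path>\S+)[^"]*"'
)


def parse_line(line):
    """Return (ip, path) for a log line, or None if the line does not match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    return match.group("ip"), match.group("path")


def make_geoip_lookup(db_path="GeoLite2-City.mmdb"):
    """Return a function mapping an IP string to (country, state) names."""
    # Imported here so the parsing code above runs without geoip2 installed.
    import geoip2.database
    import geoip2.errors

    reader = geoip2.database.Reader(db_path)

    def lookup(ip):
        try:
            response = reader.city(ip)
        except (geoip2.errors.AddressNotFoundError, ValueError):
            # Address not in the database, or not a valid IP address.
            return None, None
        return response.country.name, response.subdivisions.most_specific.name

    return lookup


sample = '203.0.113.7 - - [10/Oct/2020:13:55:36 +0000] "GET /region/1 HTTP/1.1" 200 2326'
print(parse_line(sample))
```

Calling `make_geoip_lookup()` requires GeoLite2-City.mmdb to be present. Either name it returns may be None when the database has no country or state for an address, which is one plausible source of the "Unknown" counts in the summaries above.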

References:
[1] https://www.makeuseof.com/install-python-ubuntu/
[2] https://dev.maxmind.com/geoip/geolite2-free-geolocation-data?lang=en
