
domain_stats's Introduction

#domain_stats.py
#Version 1.0
#Written by Mark Baggett @markbaggett
#Under direction of Justin Henderson @securitymapper
#Thanks to Justin Henderson for being the complete creative force behind the program. He said what he wanted. I wrote it.

domain_stats.py is a web API that delivers domain information from whois and Alexa. Some security enterprise management systems are capable of querying web APIs for additional information. This API gives those systems easy access to that information.

The API is simple. Once the server is started, you can query the Alexa ranking of a domain:

http://<IP address>:<port>/alexa/<domain>

Example:

student@SEC573:~$ curl http://127.0.0.1:8000/alexa/sans.org
25646

This tells us that SANS.ORG is the 25646th most popular domain on the internet, so it probably isn't a phishing site. NOTE: The alexa query is only available when the Alexa database is provided as a command line option. You can download a copy of the data at this URL: http://s3.amazonaws.com/alexa-static/top-1m.csv.zip
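To illustrate what a rank lookup against that file involves, here is a minimal sketch. It is not the code used in domain_stats.py. It assumes each line of the CSV is a `rank,domain` pair (for example `1,google.com`), and the helper names `load_alexa_ranks` and `alexa_rank` are hypothetical.

```python
import csv
import io


def load_alexa_ranks(lines):
    """Build a {domain: rank} dict from rows shaped like 'rank,domain'."""
    ranks = {}
    for row in csv.reader(lines):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = row
        ranks[domain.strip().lower()] = int(rank)
    return ranks


def alexa_rank(ranks, domain, default=None):
    """Return the rank for a domain, or `default` if it is not in the list."""
    return ranks.get(domain.strip().lower(), default)


# Tiny in-memory stand-in for top-1m.csv; these ranks are made up for illustration.
sample = io.StringIO("1,example.com\n2,example.org\n")
ranks = load_alexa_ranks(sample)
print(alexa_rank(ranks, "EXAMPLE.ORG"))      # 2
print(alexa_rank(ranks, "not-listed.test"))  # None
```

With the real file you would pass an open file object instead of the `io.StringIO` sample.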

You can also query whois information for a domain. This query will return the entire whois record for sans.org:

student@SEC573:~$ curl http://127.0.0.1:8000/domain/sans.org

Alternatively you can query individual entries in the whois record by including field names in the path. For example:

student@SEC573:~$ curl http://127.0.0.1:8000/domain/creation_date/sans.org
1995-08-04 04:00:00;

The fields that can be queried include: updated_date, status, name, dnssec, city, expiration_date, zipcode, domain_name, country, whois_server, state, registrar, referral_url, address, name_servers, org, creation_date, emails and alexa

You can query more than one field by simply listing the additional fields in the path. The domain is always the last entry in the path. For example:

student@SEC573:~$ curl http://127.0.0.1:8000/domain/creation_date/state/zipcode/city/sans.org
1995-08-04 04:00:00; MD; 20814; Bethesda;
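For systems that call the API from code rather than curl, the path rule above (fields first, domain last) can be sketched in Python as follows. This is an illustrative client, not part of domain_stats. The function names are hypothetical, and the parser assumes that values come back in the order requested, each terminated by a semicolon, and that no value itself contains a semicolon.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_domain_url(base_url, fields, domain):
    """Join field names and the domain into a /domain/ path; the domain goes last."""
    parts = ["domain"] + list(fields) + [domain]
    return base_url.rstrip("/") + "/" + "/".join(quote(p, safe="*") for p in parts)


def parse_fields(body):
    """Split a semicolon-terminated response into a list of stripped values."""
    body = body.strip()
    if body.endswith(";"):
        body = body[:-1]  # drop the terminator after the final value
    return [value.strip() for value in body.split(";")]


def query_domain(base_url, fields, domain, timeout=10):
    """Fetch the requested fields and pair each one with its returned value."""
    with urlopen(build_domain_url(base_url, fields, domain), timeout=timeout) as resp:
        values = parse_fields(resp.read().decode("utf-8", errors="replace"))
    return dict(zip(fields, values))


url = build_domain_url("http://127.0.0.1:8000", ["creation_date", "state"], "sans.org")
print(url)  # http://127.0.0.1:8000/domain/creation_date/state/sans.org
print(parse_fields("1995-08-04 04:00:00; MD; 20814; Bethesda;"))
# ['1995-08-04 04:00:00', 'MD', '20814', 'Bethesda']
```

`query_domain` is only defined here, not called, because it needs a running server.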

Some fields, such as name_servers or updated_date, may contain multiple values. By default the server returns only one of those values (the last one in the list). If you would like all of the values, place an asterisk after the field name. Consider these two examples:

student@SEC573:~$ curl http://127.0.0.1:8000/domain/name_servers/google.com
ns4.google.com;

student@SEC573:~$ curl http://127.0.0.1:8000/domain/name_servers*/google.com
[u'NS1.GOOGLE.COM', u'NS2.GOOGLE.COM', u'NS3.GOOGLE.COM', u'NS4.GOOGLE.COM', u'ns3.google.com', u'ns1.google.com', u'ns2.google.com', u'ns4.google.com'];
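The asterisk output above looks like the printed form of a Python list of strings, and it repeats the same hosts in upper and lower case. A client could turn it into a clean list along these lines. This is a sketch under that assumption: the helper names are hypothetical, and it only handles lists of plain strings, so multi-valued date fields may need different handling.

```python
import ast


def parse_value_list(body):
    """Parse a response such as "[u'A', u'b'];" into a Python list of strings."""
    text = body.strip().rstrip(";").strip()
    values = ast.literal_eval(text)  # evaluates literals only, never arbitrary code
    if not isinstance(values, list):
        raise ValueError("expected a list literal")
    return [str(v) for v in values]


def unique_hosts(values):
    """Lower-case the values and drop duplicates, keeping first-seen order."""
    seen = []
    for value in values:
        host = value.lower()
        if host not in seen:
            seen.append(host)
    return seen


body = "[u'NS1.GOOGLE.COM', u'NS2.GOOGLE.COM', u'ns2.google.com', u'ns1.google.com'];"
print(unique_hosts(parse_value_list(body)))
# ['ns1.google.com', 'ns2.google.com']
```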

The first query returns a single name server, whereas the second returns all of the name servers. This works the same way for every field with multiple values. If the --all command line option is specified when the server is started, it will always return all of the values. Now let's look at the command line options for the server.

student@SEC573:~$ python domain_stats.py --help
usage: domain_stats.py [-h] [-ip ADDRESS] [-c CACHE_TIME] [-v] [-a ALEXA] [--all] [--preload PRELOAD] [--delay DELAY] [--garbage-cycle GARBAGE_CYCLE] port

positional arguments:
  port                  You must provide a TCP Port to bind to

optional arguments:
  -h, --help            show this help message and exit
  -ip ADDRESS, --address ADDRESS
                        IP Address for the server to listen on. Default is 127.0.0.1
  -c CACHE_TIME, --cache-time CACHE_TIME
                        Number of seconds to hold a whois record in the cache. Default is 3600 (1 hour). Set to 0 to save forever.
  -v, --verbose         Print verbose output to the server screen. -vv is more verbose.
  -a ALEXA, --alexa ALEXA
                        Provide a local file path to an Alexa top-1m.csv
  --all                 Return all of the values in a field if multiples exist. By default it only returns the last value.
  --preload PRELOAD     preload cache with this number of the top Alexa domain entries. Default 1000
  --delay DELAY         Delay between whois lookups while staging the initial cache. Default is 0.1
  --garbage-cycle GARBAGE_CYCLE
                        Delete entries in cache older than --cache-time at this iterval (seconds). Default is 86400

Most of these arguments are optional. The only thing you MUST specify is the port you want the server to listen on. The other options you will probably want to use are --alexa (or -a), --delay and --preload. --alexa is followed by the path to the Alexa top 1 million CSV file discussed earlier. --preload is followed by an integer that says how many of those Alexa domains the server should look up in whois automatically and store in its cache. --delay controls the delay between whois queries while the server's cache is being preloaded.

Most of the other options control that cache. By default, when a domain is queried from a whois server, the record is cached for 1 hour. You can change this with --cache-time, which is followed by the number of seconds to store an entry in the cache (86400 seconds is one day). Setting --cache-time to 0 keeps entries in memory forever (or until all memory is consumed and the server crashes). The --garbage-cycle option specifies how long to wait between cycles of cleaning up cached entries older than --cache-time.
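The interaction between --cache-time and --garbage-cycle can be pictured with a toy cache like the one below. It is a sketch of the general idea only, not the server's actual implementation. The class name `TTLCache` and its methods are hypothetical, and treating an expired entry as a miss before it has been collected is an assumption of this sketch.

```python
import time


class TTLCache:
    """Toy cache: entries expire after `cache_time` seconds; 0 means never."""

    def __init__(self, cache_time=3600, clock=time.time):
        self.cache_time = cache_time
        self.clock = clock      # injectable so the example can fake time
        self._entries = {}      # domain -> (stored_at, record)

    def put(self, domain, record):
        self._entries[domain] = (self.clock(), record)

    def get(self, domain):
        entry = self._entries.get(domain)
        if entry is None or self._expired(entry[0]):
            return None
        return entry[1]

    def _expired(self, stored_at):
        return self.cache_time > 0 and self.clock() - stored_at > self.cache_time

    def collect_garbage(self):
        """Delete expired entries; a server would call this once per garbage cycle."""
        stale = [d for d, (ts, _) in self._entries.items() if self._expired(ts)]
        for domain in stale:
            del self._entries[domain]
        return len(stale)

    def __len__(self):
        return len(self._entries)


now = [0.0]                                 # fake clock for the demonstration
cache = TTLCache(cache_time=3600, clock=lambda: now[0])
cache.put("sans.org", {"state": "MD"})
now[0] = 3601                               # just over an hour later
print(cache.get("sans.org"))                # None: the entry has expired
print(cache.collect_garbage(), len(cache))  # 1 0
```

With `cache_time=0` nothing ever expires, which is why memory use can grow without bound in that mode.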

Here is an example of starting the server on port 8000 and preloading only the top 100 Alexa entries:

student@SEC573:~$ python domain_stats.py --preload 100 -a ~/Downloads/top-1m.csv 8000
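Conceptually, preloading with a delay amounts to a loop like the following. This is a hedged sketch, not the code in domain_stats.py: `preload_cache` and `fake_lookup` are hypothetical names, and the real whois query is replaced by a stand-in so the example runs offline.

```python
import itertools
import time


def preload_cache(domains, lookup, count=1000, delay=0.1, sleep=time.sleep):
    """Look up the first `count` domains, pausing `delay` seconds between them."""
    cache = {}
    for index, domain in enumerate(itertools.islice(domains, count)):
        if index:
            sleep(delay)          # pause before every lookup except the first
        try:
            cache[domain] = lookup(domain)
        except Exception as exc:  # one failed lookup should not stop staging
            print("skipping %s: %s" % (domain, exc))
    return cache


def fake_lookup(domain):
    """Stand-in for a real whois query; returns a dummy record."""
    return {"domain_name": domain}


top = ["example.com", "example.org", "example.net"]
staged = preload_cache(top, fake_lookup, count=2, delay=0)
print(sorted(staged))  # ['example.com', 'example.org']
```

A larger delay spreads the lookups out over time, at the cost of a slower start-up.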
