CRYSTAL BALL

This tool attempts to automate the enumeration of an organization's subsidiaries using freely available information. It is NOT meant to provide a comprehensive analysis of subsidiaries, but rather to augment a manual analysis and improve operational efficiency.

For more information about the methods and theoretical foundation behind this tool, see my blog post on subsidiary enumeration, "The Art of Footprinting - Part I - Finding subsidiaries".

Setup and Configuration

Install the Python dependencies: pip3 install -r requirements.txt

You will also need to configure the tool before using it. You must obtain several free API keys; the following links provide sufficient information to perform the setup:

  1. Clearbit API
  2. Crunchbase Open Data Map (ODM)
  3. Google Knowledge Graph API

Populate the config.sample.json file with the keys, then run mv config.sample.json config.json to complete the setup.
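Before running the tool, it can help to confirm that no key was left blank. The following is a minimal sketch, not part of this repo: it assumes config.json is a JSON object whose values are strings or nested objects, and the helper names find_empty_keys and check_config are hypothetical.

```python
import json
from pathlib import Path


def find_empty_keys(config: dict, prefix: str = "") -> list:
    """Return dotted paths of every setting whose value is None or blank."""
    empty = []
    for name, value in config.items():
        path = f"{prefix}{name}"
        if isinstance(value, dict):
            # Recurse into nested sections (assumed layout, may not apply).
            empty.extend(find_empty_keys(value, prefix=f"{path}."))
        elif value is None or (isinstance(value, str) and not value.strip()):
            empty.append(path)
    return empty


def check_config(path: str = "config.json") -> list:
    """Load the JSON config file and list the settings that still need a value."""
    with Path(path).open(encoding="utf-8") as handle:
        return find_empty_keys(json.load(handle))
```

Calling check_config() from the repo root returns an empty list once every key has been filled in; otherwise it lists the settings that still need a value.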

Usage

There are two different tools provided in this repo:

DNB Extractor

  1. The first is an extraction script scripts/dnb_extractor.py which processes sitemap files from the Dun & Bradstreet website. This information can be used for offline analysis as shown in my blog post linked above.

Example usage: python3 scripts/dnb_extractor.py. The process takes quite some time to run and produces a data dump of roughly 6 GB. The results are stored in dnb_indexes/, from where you can perform offline analysis as needed.
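Because the extraction runs for a long time, one simple way to gauge progress is to measure how much data has landed in dnb_indexes/ so far. This is a rough sketch that assumes nothing about the format of the files inside the directory; the helper names dump_size_bytes and format_gb are hypothetical and not part of this repo.

```python
from pathlib import Path


def dump_size_bytes(root: str = "dnb_indexes") -> int:
    """Sum the sizes of all regular files under the dump directory."""
    base = Path(root)
    if not base.is_dir():
        # Nothing has been written yet (or the path is wrong).
        return 0
    return sum(p.stat().st_size for p in base.rglob("*") if p.is_file())


def format_gb(size_bytes: int) -> str:
    """Render a byte count in decimal gigabytes."""
    return f"{size_bytes / 1e9:.2f} GB"
```

For example, print(format_gb(dump_size_bytes())) run from the repo root shows how close the dump is to the expected ~6 GB.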

CRYSTAL BALL

  2. The second is the crystal_ball.py script, which performs the automated enumeration of subsidiaries given some basic information about the company you are targeting.

You can read the command line help of the tool to see all of the available options. The most important are shown here:

  • -c provides the tool with a canonical name of a company. This name is used in SEC filings data and can typically be somewhat loose. It does not have to be a perfect match.

  • -w provides the tool with a WikiData identifier which allows for the enumeration of subsidiaries using the WikiData knowledge graph. I'd strongly recommend using this option in addition to -c.

  • -i provides a supplemental input list of canonical subsidiary names which were acquired outside of this tool. The tool will attempt to resolve these canonical names into domain names for the subsidiary organizations.

An example usage on the Walt Disney company would look like:

(Screenshot: CB Usage)
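The options described above can also be assembled programmatically. The sketch below only builds the argument list from the three documented flags; it is not part of the repo. The helper names are hypothetical, treating -c as the required option and -i as taking a file path are assumptions, and the Wikidata identifier in the usage note should be verified before use.

```python
import subprocess


def build_command(company, wikidata_id=None, names_file=None):
    """Build the crystal_ball.py argument list from the documented flags."""
    command = ["python3", "crystal_ball.py", "-c", company]
    if wikidata_id:
        # -w: Wikidata identifier used for knowledge graph enumeration.
        command += ["-w", wikidata_id]
    if names_file:
        # -i: supplemental list of subsidiary names (assumed to be a file path).
        command += ["-i", names_file]
    return command


def run_crystal_ball(company, wikidata_id=None, names_file=None):
    """Run the tool from the repo root and raise if it exits with an error."""
    return subprocess.run(build_command(company, wikidata_id, names_file), check=True)
```

For instance, build_command("Walt Disney", wikidata_id="Q7414") yields the arguments for a run that combines -c and -w, assuming Q7414 is the Wikidata item for The Walt Disney Company.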
