
fpdetective

A framework for conducting large scale web privacy studies.

Installation

git clone https://github.com/fpdetective/fpdetective.git
cd fpdetective

Then follow the instructions for setting up the VM to run FPDetective in a virtual machine.

Get Started

Command line parameters

Below we give a description of the parameters that are passed to the agents.py module.

  • --url_file: path to the file containing the list of URLs to crawl
  • --stop: index in the URL file where the crawl will stop
  • --start (optional): index in the URL file where the crawl will start
  • --type: the agent type, one of:
    • lazy: uses PhantomJS and visits homepages
    • clicker: uses PhantomJS and clicks a number of links
    • chrome_lazy: uses Chromium and visits homepages
    • chrome_clicker: uses Chromium and clicks a number of links
    • dnt: visits homepages with the DNT header set to 1
    • screenshot: visits homepages and takes a screenshot
  • --max_proc: maximum number of processes that will run in parallel
  • --fc_debug: boolean that sets the system environment variable used to log the font requests made to the OS
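To make the options above concrete, here is a minimal, hypothetical sketch of how such a command line could be parsed with Python's argparse. It is not the actual agents.py code: the option name --url_file follows the example command in the next section, and the defaults (start at index 0, one process) and treating --fc_debug as an on/off switch are assumptions.

```python
import argparse

# Agent types as listed in this README.
AGENT_TYPES = ["lazy", "clicker", "chrome_lazy", "chrome_clicker", "dnt", "screenshot"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented agents.py options (sketch only)."""
    parser = argparse.ArgumentParser(description="FPDetective crawl agents (sketch)")
    parser.add_argument("--url_file", required=True,
                        help="path to the file containing the list of URLs to crawl")
    parser.add_argument("--stop", type=int, required=True,
                        help="index in the URL file where the crawl stops")
    # Assumed default: start from the first entry when --start is omitted.
    parser.add_argument("--start", type=int, default=0,
                        help="index in the URL file where the crawl starts (optional)")
    # dest avoids shadowing Python's built-in type().
    parser.add_argument("--type", dest="agent_type", choices=AGENT_TYPES, required=True,
                        help="crawl agent to use")
    # Assumed default: a single process.
    parser.add_argument("--max_proc", type=int, default=1,
                        help="maximum number of processes running in parallel")
    # Assumed to be a simple on/off switch.
    parser.add_argument("--fc_debug", action="store_true",
                        help="log the font requests made to the OS")
    return parser
```

For example, build_parser().parse_args(["--url_file", "top-1m.csv", "--stop", "100", "--type", "lazy", "--max_proc", "10"]) yields an object with agent_type == "lazy", start == 0 and max_proc == 10. Restricting --type with choices makes argparse reject unknown agent names before any browser is started.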

How to launch a simple crawl

You can use the following command to crawl the homepages of the Alexa top 100 sites with 10 browsers running in parallel:

  • Change to the FPDetective crawler source directory (~/fpbase/src/crawler) and run:
python agents.py --url_file ~/fpbase/run/top-1m.csv --stop 100 --type lazy --max_proc 10
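The --start and --stop indices select a slice of the URL file. A minimal sketch of that selection, assuming the file follows the Alexa top-1m.csv layout of one "rank,domain" pair per line, a 0-based --start and an exclusive --stop (FPDetective's exact indexing is not documented here; select_sites and read_sites are hypothetical helpers):

```python
import csv
import os
from itertools import islice
from typing import Iterable, List, Tuple


def select_sites(rows: Iterable[List[str]], start: int, stop: int) -> List[Tuple[int, str]]:
    """Return (rank, domain) pairs for rows[start:stop].

    Assumes "rank,domain" rows, a 0-based start and an exclusive stop.
    """
    return [(int(rank), domain) for rank, domain in islice(rows, start, stop)]


def read_sites(path: str, start: int, stop: int) -> List[Tuple[int, str]]:
    """Read the CSV at path (expanding ~) and return the selected slice."""
    with open(os.path.expanduser(path), newline="") as f:
        return select_sites(csv.reader(f), start, stop)
```

Under these assumptions, read_sites("~/fpbase/run/top-1m.csv", 0, 100) returns the 100 sites that the command above would crawl. Using islice stops reading after the last requested row instead of loading all one million lines.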

Once the crawl has finished, you can check the log in run/logs/latest or connect to the database using phpMyAdmin (the password for the root user is fpdetective).

Patches for Chromium & PhantomJS browsers

You can use the following patches to build modified Chromium and PhantomJS browsers from source. Please consult the instructions for further details.

Contributors

gunesacar, mjuarezm