DApps-Scraping

The main objective of this project is to study and analyze the quality of decentralised applications (DApps) listed in public repositories. The project scrapes DApp listing websites and repositories such as State of the DApps and DappRadar.

The extracted datasets are available on Zenodo: https://zenodo.org/record/3382127. A long-term observation dataset is tracked at https://github.com/serviceprototypinglab/dapps-dataset.

Installation

Use the package manager pip to install the required packages. To acquire the DApps metrics from the websites, the project uses Selenium, which must be set up correctly together with a matching driver and web browser.

pip install -r requirements.txt

The Python package Selenium requires ChromeDriver to be downloaded. Please download it from the following URL:

(https://chromedriver.chromium.org/downloads)

For Linux and macOS users, run the following script to download ChromeDriver:

./download.sh 

Please make sure that the version of your Chrome browser matches the driver; otherwise, update Chrome or install the ChromeDriver release for your version; see downloadchrome.doc.

For Mac users, make sure that wget is installed on your system; if not, install it with the following command:

brew install wget 

Once the driver is downloaded, check the ChromeDriver path in the common script (common.py). If you downloaded ChromeDriver manually, change the path specified in the code to your own path. If you used the download script, no change is needed.

Note: Google Chrome must also be installed in the corresponding version; see the text file downloadchrome.doc for details. Chrome is also useful for debugging: in case the web pages change their structure, press Ctrl+Shift+I to open the developer tools and use the element picker (Ctrl+Shift+C) to find the new XPath expressions.
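As a rough illustration (not the project's actual common.py), the following sketch shows how a Selenium Chrome driver can be created from an explicit ChromeDriver path and queried with an XPath expression; the path constant, URL and XPath below are placeholders and assume Selenium 4:

# Minimal sketch: Selenium driver setup with an explicit ChromeDriver path.
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

CHROMEDRIVER_PATH = "./chromedriver"  # adjust if you downloaded the driver manually

driver = webdriver.Chrome(service=Service(CHROMEDRIVER_PATH))
driver.get("https://dappradar.com/rankings")  # placeholder URL

# XPath found via the Chrome developer tools (Ctrl+Shift+I / Ctrl+Shift+C);
# the expression below is only an example and will differ per website.
links = driver.find_elements(By.XPATH, "//a[contains(@href, '/dapp/')]")
print([link.text for link in links[:5]])

driver.quit()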

Usage

There is one Python script per DApps website, and a common script for typical crawling and pagination actions.

  • DappRadar.py: for dappradar.com
  • stateDapps.py: for www.stateofthedapps.com
  • dappcom.py: for dapp.com (dappcomall.py exists as a workaround)

To run the script (shown with the example of DappRadar), use the following command:

python DappRadar.py

For testing purposes, you can specify the number of pages you want to scrape. The command below crawls only three pages:

python DappRadar.py 3 

You may also specify fractional pages; for example, as there are typically 50 entries per page, specifying 0.1 fetches metrics for 5 DApps (see the sketch below).
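The following hypothetical snippet illustrates how such a fractional page argument could be interpreted; the actual argument handling in DappRadar.py may differ:

# Hypothetical interpretation of a fractional page-count argument.
import sys

ENTRIES_PER_PAGE = 50  # assumption: roughly 50 DApps per listing page

pages = float(sys.argv[1]) if len(sys.argv) > 1 else None  # None means: crawl all pages
if pages is not None:
    full_pages = int(pages)                                 # pages crawled completely
    extra = round((pages - full_pages) * ENTRIES_PER_PAGE)  # e.g. 0.1 -> 5 DApps
    print(f"Crawling {full_pages} full page(s) plus {extra} additional entries")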

The scraping time depends on the number of pages; fully running a script may take 1 to 2 hours. Once the extraction is done, the scripts generate plots from the extracted data and automatically save them in a folder named after the website and the date of the run.

You can customise the scraping by adding the additional parameters 'nosocial' and 'noplot'. This is again primarily of interest for testing; a possible handling of these flags is sketched below.
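The snippet below is a hypothetical sketch of how the optional flags and the dated output folder could be handled; the flag semantics and folder naming in the real scripts may differ:

# Hypothetical handling of optional flags and the dated output folder.
import os
import sys
from datetime import date

args = sys.argv[1:]
no_social = "nosocial" in args   # assumption: skip social-media related metrics
no_plot = "noplot" in args       # assumption: skip plot generation

# Folder named after the website and the date of the run, e.g. dappradar_2019-08-30
out_dir = f"dappradar_{date.today().isoformat()}"
os.makedirs(out_dir, exist_ok=True)

if not no_plot:
    print(f"Plots would be saved to {out_dir}/")

An invocation could then look like python DappRadar.py 3 noplot (again, an assumption about the argument order).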

Disclaimer

Be aware that web scraping can be considered bad practice. This project was created for research and education purposes only. Please ask us to share our crawled data rather than crawling on your own without a strong reason.
