Site Lab

Site Lab aims to be an open-source replacement for website analysis tools such as BuiltWith, NerdyData, and DataNyze.

Site Lab is a Ruby on Rails application. It uses PostgreSQL as its database and Redis + Sidekiq for background processing.

How Does it Work?

Right now, it's fairly simple:

  • The MetaInspector gem retrieves basic information about the site/URL
  • There is a "Technology" model which stores regular expressions
  • Technologies are matched against the source of the sites/URLs
  • Much of the processing now happens in the background (via Sidekiq)

More complex analysis is in the works.
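
The matching step described above can be sketched in plain Ruby. This is a minimal illustration, not the app's actual code: the `Technology` struct, the `detect_technologies` helper, and the sample patterns are all hypothetical stand-ins, and the HTML is passed in as a string rather than fetched (in the app, the page source would presumably come from MetaInspector).

```ruby
# Hypothetical stand-in for the app's Technology model: a name plus a
# regular expression stored as a string (as it might be in a DB column).
Technology = Struct.new(:name, :pattern) do
  # Compile the stored pattern, ignoring case.
  def regexp
    Regexp.new(pattern, Regexp::IGNORECASE)
  end
end

# Return the names of all technologies whose pattern matches the page source.
def detect_technologies(html, technologies)
  technologies.select { |tech| tech.regexp.match?(html) }.map(&:name)
end

# Illustrative patterns only; the real seed data may look different.
TECHNOLOGIES = [
  Technology.new("jQuery", 'jquery[.-]?[\d.]*(\.min)?\.js'),
  Technology.new("Google Analytics", 'google-analytics\.com/(ga|analytics)\.js'),
  Technology.new("WordPress", 'wp-content/'),
]

SAMPLE_HTML = <<~HTML
  <html><head>
    <script src="/assets/jquery-1.11.1.min.js"></script>
    <script src="//www.google-analytics.com/analytics.js"></script>
  </head><body>Hello</body></html>
HTML

puts detect_technologies(SAMPLE_HTML, TECHNOLOGIES).inspect
# => ["jQuery", "Google Analytics"]
```

Storing patterns as strings and compiling them on demand keeps the sketch close to a database-backed model. A real implementation would likely cache the compiled expressions and run the matching inside a background job.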

Installation

It's a Rails 4.1 app, so you'll need a dev environment that supports that (probably via RVM). You'll also need Redis installed and running (probably via Homebrew).

  • Clone the repo
  • Edit the database.yml file with your info
  • Run bundle install to install gems
  • Run bundle exec rake db:create to create the DB(s)
  • Run bundle exec rake db:seed to load the seed data
  • Run foreman start -p 3000 to start the Rails server and Sidekiq locally on port 3000

Importing Data

While you can surely add sites/URLs one-by-one in the app, most use-cases will involve importing large sets of URLs from files or external sites. With that in mind, I've started a set of Rake tasks for importing URLs. Currently, it includes:

  • Importing all startups from AngelList for a given market
  • Importing all startup/product URLs listed on Product Hunt
  • Importing URLs from a text file (placed in app/import)
  • Importing all startup URLs from VCDelta

Run rake -T to see the tasks and their required parameters. There is also a sample text file in app/import.
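
To illustrate the text-file import, here is a rough sketch of how a list of URLs might be cleaned up before being handed to the app. The `parse_url_list` and `valid_http_url?` helpers are hypothetical, as are the one-URL-per-line format and the `#` comment convention; the actual Rake task may work differently.

```ruby
require "uri"

# Parse newline-separated text into a list of unique, absolute HTTP(S) URLs.
# Blank lines and lines starting with "#" are skipped (an assumed convention).
def parse_url_list(text)
  text.each_line
      .map(&:strip)
      .reject { |line| line.empty? || line.start_with?("#") }
      # Bare domains such as "example.com" get an http:// prefix.
      .map { |line| line =~ %r{\Ahttps?://}i ? line : "http://#{line}" }
      .select { |url| valid_http_url?(url) }
      .uniq
end

# True when the string parses as an http or https URI with a host.
def valid_http_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.nil? && !uri.host.empty?
rescue URI::InvalidURIError
  false
end

sample = <<~TEXT
  example.com
  # a comment line
  https://rubyonrails.org
  example.com
  not a url
TEXT

puts parse_url_list(sample).inspect
# => ["http://example.com", "https://rubyonrails.org"]
```

Normalizing and de-duplicating up front avoids enqueueing the same site twice when the input file contains repeats or bare domains.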

