gitsecure's Introduction

GitSecure

GitSecure is a web application that scrapes repositories from GitHub to detect API keys that repository owners have accidentally uploaded. When GitSecure detects a private API key in a public repository, it alerts the repository owner by email about the potential security vulnerability.

Service Runner

app.js coordinates the execution of all the services, and basic-server.js serves the web pages. For the application to be fully functional, both files must run simultaneously as two distinct node processes: one running the web server (basic-server.js) and one running the GitHub API key detection services (app.js).

Services

  1. Scrape Git repository metadata
  2. Download Git repositories based on that metadata
  3. Parse Git repository content to detect security flaws

Scraping

The scraping service uses the GitHub API to download metadata for the most recently updated GitHub repositories and persists that metadata to a MongoDB data store. Once the metadata has been saved to MongoDB, the downloading service takes over and downloads the actual repositories associated with it.

The current query for this API call is:

'https://api.github.com/search/repositories?q=pushed:>=' + dateString + '&order=desc&per_page=100'
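
As an illustration of how the query string above might be assembled, here is a minimal sketch. The helper name buildSearchUrl and the YYYY-MM-DD format of dateString are assumptions, not taken from the project source.

```javascript
// Hypothetical helper: builds the search URL quoted above from a Date.
// Assumption: dateString is a UTC calendar date in YYYY-MM-DD form.
function buildSearchUrl(since) {
  const dateString = since.toISOString().slice(0, 10);
  return 'https://api.github.com/search/repositories?q=pushed:>=' + dateString +
    '&order=desc&per_page=100';
}

// Example: repositories pushed on or after 1 March 2015 (UTC).
console.log(buildSearchUrl(new Date(Date.UTC(2015, 2, 1))));
```

Deriving the date from toISOString keeps the value in UTC, so the result does not depend on the local time zone of the machine running the scraper.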

Downloading

The downloading service downloads the GitHub repositories whose metadata was retrieved by the scraping service. It reads the metadata MongoDB collection, takes the git_url property from each metadata document, and uses the nodegit module to download the contents of that git_url from GitHub. Downloaded repositories are stored in the git_data directory. When the downloading service has finished, the parsing service is activated and API key detection begins.
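
The download loop described above could look roughly like the following sketch. The function names targetDir and downloadAll, the owner/repo directory layout under git_data, and the injected cloneFn are all assumptions made for illustration; cloneFn stands in for the nodegit call so the sketch runs without nodegit installed.

```javascript
const path = require('path');

// Hypothetical helper: derive a local directory from a git_url,
// e.g. git://github.com/owner/repo.git -> <baseDir>/owner/repo
function targetDir(gitUrl, baseDir) {
  const match = /[/:]([^/:]+)\/([^/]+?)(?:\.git)?$/.exec(gitUrl);
  if (!match) throw new Error('Unrecognized git_url: ' + gitUrl);
  return path.join(baseDir, match[1], match[2]);
}

// Clone every repository named in the metadata documents, one at a time.
// cloneFn(url, dir) is injected and must return a promise (assumption).
async function downloadAll(metadataDocs, cloneFn, baseDir) {
  const results = [];
  for (const doc of metadataDocs) {
    const dir = targetDir(doc.git_url, baseDir);
    try {
      await cloneFn(doc.git_url, dir);
      results.push({ git_url: doc.git_url, dir, ok: true });
    } catch (err) {
      // One failed clone should not abort the whole batch.
      results.push({ git_url: doc.git_url, dir, ok: false, error: err.message });
    }
  }
  return results;
}
```

In the real service, cloneFn would presumably wrap nodegit's clone call. Cloning sequentially is the simplest choice here; whether the project clones in parallel is not stated in this README.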

Parsing

The parsing service starts after the downloading service finishes acquiring repositories. It then pulls repositories from the database and uses bash and regex to scan them for API keys.
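
To give a feel for regex-based key detection, here is a minimal sketch in plain JavaScript rather than the bash tooling the service actually uses. The pattern list, the pattern names, and the scanText function are illustrative assumptions; the project's real patterns are not shown in this README.

```javascript
// Illustrative patterns only (assumption): an AWS-style access key ID and a
// quoted value assigned to something named like "api_key" or "apiKey".
const KEY_PATTERNS = [
  { name: 'aws-access-key-id', regex: /\bAKIA[0-9A-Z]{16}\b/g },
  { name: 'generic-api-key', regex: /api[_-]?key\s*[:=]\s*['"][A-Za-z0-9_\-]{16,}['"]/gi },
];

// Return one hit per (pattern, line) pair, with 1-based line numbers.
function scanText(text) {
  const hits = [];
  text.split('\n').forEach((line, index) => {
    for (const { name, regex } of KEY_PATTERNS) {
      regex.lastIndex = 0; // global regexes keep state between test() calls
      if (regex.test(line)) hits.push({ pattern: name, line: index + 1 });
    }
  });
  return hits;
}

// Usage with AWS's documented example key, which is not a live credential.
const sample = [
  'var port = 8080;',
  "var id = 'AKIAIOSFODNN7EXAMPLE';",
  'apiKey = "0123456789abcdef0123";',
].join('\n');
console.log(scanText(sample));
```

Patterns like these trade precision for recall: a loose generic pattern catches more keys but also flags placeholders, so a real scanner would likely need per-provider patterns and some filtering.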

When a repository is recorded in the database, it is given a processed property initialized to false. As soon as the parsing service pulls a repository from the database, it sets processed to true, so the parsing service never processes the same document twice.
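
One way to implement this claim-and-mark step is a single find-and-update call, sketched below. The function name claimNextRepo is hypothetical, and the sketch assumes a collection object exposing the MongoDB driver's findOneAndUpdate and returning the matched document (or null) directly; the return shape differs between driver versions, so treat that as an assumption.

```javascript
// Claim the next unprocessed repository document and mark it processed
// in one operation. Resolves to the claimed document, or null if none remain.
// Assumption: findOneAndUpdate resolves to the matched document itself.
async function claimNextRepo(collection) {
  return collection.findOneAndUpdate(
    { processed: false },
    { $set: { processed: true } }
  );
}

// Drain the queue: keep claiming until no unprocessed document is left.
async function processAll(collection, handle) {
  let count = 0;
  for (let doc = await claimNextRepo(collection); doc; doc = await claimNextRepo(collection)) {
    await handle(doc);
    count += 1;
  }
  return count;
}
```

Doing the lookup and the update in one call means two parser instances cannot both pick up the same document between a separate read and write.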

If the parsing service detects an API key, it records the violation in the hitdata MongoDB collection. Once all repositories have been scanned, the fileSystem subservice performs garbage collection, and then the entire cycle of scraping, downloading, and parsing is recursively restarted.

Database

Rather than each service creating its own database connection, all services share a single connection. This connection is established in app.js and is accessible to all services via GLOBAL.db.

Currently, the services are architected so that MongoDB is the first thing to be initialized, and all services are passed into MongoDB's connection callback function. Basing the service architecture on callbacks is not ideal, but it is stable for the current data load and necessary to reach MVP.

Ideally, services would use EventEmitters instead of callbacks to communicate with each other asynchronously. This would decouple the services and allow for much greater code flexibility. Until MVP is reached, however, the current service architecture will be used.
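
A rough sketch of what the EventEmitter approach could look like follows. The event names, the wire function, and the shape of the services object are hypothetical; this shows the general pattern, not the project's planned design.

```javascript
const { EventEmitter } = require('events');

// Connect three services through events instead of nested callbacks.
// Each service receives the previous stage's result and may return a
// value or a promise; its result is emitted under the next event name.
function wire(bus, services) {
  const step = (event, fn, next) =>
    bus.on(event, (payload) => {
      Promise.resolve()
        .then(() => fn(payload))
        .then((result) => bus.emit(next, result))
        // Callers should register an 'error' listener; emitting 'error'
        // with no listener makes EventEmitter throw.
        .catch((err) => bus.emit('error', err));
    });

  step('cycle:start', services.scrape, 'scrape:done');
  step('scrape:done', services.download, 'download:done');
  step('download:done', services.parse, 'parse:done');
  return bus;
}

// Usage with placeholder services.
const bus = wire(new EventEmitter(), {
  scrape: () => ['metadata'],
  download: (metadata) => metadata.map(() => 'repo'),
  parse: (repos) => repos.length,
});
bus.on('error', (err) => console.error(err));
bus.once('parse:done', (scanned) => console.log('scanned', scanned));
bus.emit('cycle:start');
```

With this wiring, no service needs to know which service runs next, and restarting the cycle is a matter of adding a 'parse:done' listener that emits 'cycle:start' again.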

Technology Stack

gitsecure's People

Contributors

amzotti, joshuanewman10, marcbalaban, mrblueblue, wettowelreactor
