

Swiss Open Source Software Benchmark

The Swiss Open Source Software Benchmark (ossbenchmark.com) continuously crawls repositories from Swiss GitHub organizations, aggregates statistics and provides an updated ranking of institutions, repositories and contributors. GitHub organizations are grouped into institutions from government, research, private sector, and NGOs. Additional GitHub organizations can be added by following the contributing instructions.

OSS Benchmark is an initiative by the Institute for Public Sector Transformation at Bern University of Applied Sciences. The initiative is managed by Matthias Stürmer and implemented by Anina Hold and Alexandre Thomas from the Digital Sustainability Lab.

Now some technical details about OSS Benchmark:

The Crawler: How does it work?

We have two different services:

  • The DataService
  • The CrawlerService

The CrawlerService starts up every hour and makes 5000 calls to the GitHub API, saving everything in timestamped files.
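The hourly call budget can be pictured as a counter that ends the run once it is used up. The following is a minimal sketch, not the project's code: `fetch` is a hypothetical stand-in for one GitHub API request, and the file-name scheme in `timestamped_name` is an assumption.

```python
from datetime import datetime, timezone
from typing import Callable, Iterable

CALL_BUDGET = 5000  # calls per hourly run, as stated above


def timestamped_name(now: datetime) -> str:
    """Build a file name that encodes the crawl time (hypothetical scheme)."""
    return f"crawl-{now.strftime('%Y%m%dT%H%M%SZ')}.json"


def run_crawl(urls: Iterable[str], fetch: Callable[[str], dict],
              budget: int = CALL_BUDGET) -> list[dict]:
    """Call `fetch` once per URL and stop as soon as the budget is spent."""
    results = []
    for calls_made, url in enumerate(urls):
        if calls_made >= budget:
            break  # budget spent; remaining URLs are left for a later run
        results.append(fetch(url))
    return results


if __name__ == "__main__":
    # Fake fetcher so the sketch runs without network access.
    data = run_crawl(["a", "b", "c"], lambda url: {"url": url}, budget=2)
    print(timestamped_name(datetime.now(timezone.utc)), len(data))
```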

The DataService also starts every hour. It loads all files crawled during the last hour and saves their data to the database, so the data is about one hour old by the time it is stored.
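Picking up "the files from the last hour" could look roughly like the sketch below. It assumes the crawl time is encoded in the file name using a hypothetical `NAME_FORMAT`; how the project actually names and locates its files is not described here.

```python
from datetime import datetime, timedelta, timezone

NAME_FORMAT = "crawl-%Y%m%dT%H%M%SZ.json"  # hypothetical naming scheme


def files_from_last_hour(names: list[str], now: datetime) -> list[str]:
    """Return the file names whose embedded timestamp lies within the last hour."""
    cutoff = now - timedelta(hours=1)
    recent = []
    for name in names:
        try:
            stamp = datetime.strptime(name, NAME_FORMAT).replace(tzinfo=timezone.utc)
        except ValueError:
            continue  # name does not match the scheme; not a crawl file
        if cutoff <= stamp <= now:
            recent.append(name)
    return sorted(recent)
```

Here `now` is expected to be a timezone-aware UTC datetime, so that it compares cleanly with the parsed timestamps.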

The CrawlerService can be in one of three states: no data, partial data, or full data.
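One way to think about the three states is as a function of how many entries already carry a crawl timestamp. This is an illustrative sketch only; the function name and the list-of-timestamps input are assumptions, not the project's data model.

```python
from datetime import datetime
from typing import Optional


def crawler_state(last_crawled: list[Optional[datetime]]) -> str:
    """Classify the crawl state from per-entry timestamps (None = never crawled)."""
    crawled = sum(stamp is not None for stamp in last_crawled)
    if crawled == 0:
        return "no data"
    if crawled < len(last_crawled):
        return "partial data"
    return "full data"
```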

No data state

Start: There is no data in the database besides the todoInstitutions.

The crawler simply starts with the first institution it receives. When an organisation has been crawled, it gets a timestamp in the todoInstitution collection. When a whole institution is finished, it also gets a timestamp in the collection. The crawler skips all organisations and institutions whose timestamps are less than 7 days old.
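The 7-day skip rule amounts to a simple age check on the stored timestamp. A minimal sketch, with `should_skip` and `MAX_AGE` as hypothetical names:

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_AGE = timedelta(days=7)  # entries younger than this are skipped


def should_skip(last_crawled: Optional[datetime], now: datetime) -> bool:
    """Skip an organisation or institution crawled less than 7 days ago."""
    if last_crawled is None:
        return False  # never crawled, so it must not be skipped
    return now - last_crawled < MAX_AGE


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(should_skip(now - timedelta(days=2), now))  # crawled recently
    print(should_skip(None, now))                     # never crawled
```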

Partial data state

Start: Some institutions and organisations have already been crawled.

The crawler gets the next institution and/or organisation that was never crawled or whose timestamp is older than 7 days. It may happen that 7 days are not enough to crawl all the data, so some already crawled repos may be re-crawled before new ones are crawled.

Full data state

Start: All institutions and organisations were already crawled at least once.

The crawler will just update the data, starting with the oldest timestamp.
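When every entry has a timestamp, "starting with the oldest timestamp" is a sort by crawl time. The sketch below also handles entries without a timestamp by placing them first; that priority is an assumption made for illustration, and `crawl_order` is a hypothetical name.

```python
from datetime import datetime
from typing import Optional


def crawl_order(entries: dict[str, Optional[datetime]]) -> list[str]:
    """Order names for crawling: never-crawled first, then oldest timestamp first."""
    never_crawled = [name for name, stamp in entries.items() if stamp is None]
    crawled = [name for name, stamp in entries.items() if stamp is not None]
    crawled.sort(key=lambda name: entries[name])  # oldest timestamp first
    return never_crawled + crawled
```

In the full data state the first list is empty, so the result is ordered purely by age.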

System Diagram

(Image: system diagram)

Database Structure

(Image: database structure diagram)

Update Institutions (read also Contributing.md)

Once the github_repos.md file has been updated and the pull request merged, the new or updated institution must be added to the database using the input mask for new institutions. This input mask is protected.

