
mateys-ahoy's People

Contributors

missaugustina


mateys-ahoy's Issues

Research how open source foundations are defining community participation metrics

TL;DR: How do others currently define individual participation metrics?

This is part of a larger effort to understand how individuals are contributing to open source projects. Ultimately we need to quantify this activity. Open source projects are often run by non-profit open source foundations that need to justify their existence in order to procure funding. One thing many open source foundations care about is tracking contributor activity. This might include looking at the rate of people joining the community (new contributors) and the rate of people leaving the community, as well as how long someone has been involved in the community. They might also provide some way of ranking or rating the contributors. The goal of this issue is to discover how open source communities are currently quantifying contributor activity.
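As a rough illustration of the metrics mentioned above (join rate, departures, and length of involvement), here is a minimal sketch. It assumes activity is available as `(login, date)` pairs; the function names and the 180-day inactivity threshold used to decide that someone has "left" are hypothetical choices, not something any foundation is known to use.

```python
from collections import Counter
from datetime import date


def contributor_spans(activity):
    """Map each login to its (first, last) activity dates.

    `activity` is an iterable of (login, date) pairs (assumed input shape).
    """
    spans = {}
    for login, day in activity:
        first, last = spans.get(login, (day, day))
        spans[login] = (min(first, day), max(last, day))
    return spans


def joins_per_month(spans):
    """Count new contributors by the month of their first activity."""
    return Counter(first.strftime("%Y-%m") for first, _ in spans.values())


def departed(spans, as_of, inactive_days=180):
    """Logins with no activity in the last `inactive_days` (assumed cutoff)."""
    return sorted(
        login
        for login, (_, last) in spans.items()
        if (as_of - last).days > inactive_days
    )


def tenure_days(spans):
    """Days between each contributor's first and last activity."""
    return {login: (last - first).days for login, (first, last) in spans.items()}
```

How "leaving" is defined is the main judgment call here, and it is one of the things worth comparing across the articles.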

For each article you look at

  • Create a wiki page to capture research summaries, ideas, etc. for this ticket (https://github.com/BonnyCI/mateys-ahoy/wiki -> New Page)
  • For each article you read, add a bullet point with a link to the wiki, and note under it:
      • a summary of what you thought the main point was, as relevant to us (if it's worth it)
      • any ideas it gave you
      • ways you might improve on what they suggested (if you have any)
      • questions, such as "what are they even talking about?", "why is this important to them?", or "what question are they really trying to answer here?", with links to any relevant information that inspired them

Some suggestions:

Propose contributor participation categories

Rather than just considering "top" or "most active" contributors, develop categories of contributors based on their activity profiles.

One idea: take random samples of a project's contributors, small enough for manual analysis (n=10). Manually identify which ones appear to be the most "valuable" to the project, then come up with a list of parameters that defines their "value". We can then do further analysis using whichever of those parameters are most readily available in our data, to see if any obvious correlations emerge that could help us build event data queries.

Which types of event activity are most strongly correlated with higher-valued contributors (if any)? We could iterate on this to incorporate the event payload field (which has to be parsed). The goal is to maximize the probability of identifying key contributors while minimizing the amount of "crud" we have to analyze to find them.
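The sampling and correlation steps described above could be sketched roughly as follows. This assumes events have been flattened to dicts with `"actor"` (a login string) and `"type"` keys, and that the manual review produces a numeric value score per sampled contributor; those shapes, and all function names, are assumptions for illustration. Pearson correlation is used only as a simple first pass.

```python
import random
from collections import Counter
from statistics import mean


def sample_contributors(contributors, n=10, seed=0):
    """Draw a reproducible random sample small enough for manual review."""
    rng = random.Random(seed)
    pool = sorted(contributors)
    return rng.sample(pool, min(n, len(pool)))


def event_profile(events, login):
    """Count events per type for one contributor (assumed event shape)."""
    return Counter(e["type"] for e in events if e["actor"] == login)


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either series is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_event_types(events, value_by_login):
    """Rank event types by correlation with manually assigned value scores."""
    logins = sorted(value_by_login)
    profiles = {login: event_profile(events, login) for login in logins}
    types = sorted({t for p in profiles.values() for t in p})
    values = [value_by_login[login] for login in logins]
    scores = {
        t: pearson([profiles[login][t] for login in logins], values)
        for t in types
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

With n=10 any correlation is only suggestive, so the ranking is best treated as a hint about which event types to query next, not as a result.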

Manually build a social networking profile for a small sample of contributors

Start with a known set: contributors who have already been identified. See where they have profiles and what identifying information is available. Once that's done, take a small sample of unknowns and try to identify them as well. Ideally this should be automated, even if only in one-off scripts, to document the paths taken.

Twitter Bot or Not

More interesting than Issue #18 would be a way to determine if Twitter accounts are bots and to perform demographic analysis on the Twitter bots themselves to see how they are evolving.

Find Existing "Innovation Trackers"

This issue is to collect information about existing innovation trackers. What metrics are they tracking and how are they collecting data?

Areas of Expertise demographics

In addition to considering what companies are represented, look for areas of expertise (contributors can have more than one). How does this correlate with other "innovation" factors?

Questions:

  • What fields of expertise are represented for each project among top contributors?
  • What is the typical diversity of expertise per top contributor?
  • Is there any correlation between projects identified as highly innovative/active and expertise diversity?
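A minimal sketch of how the first two questions might be answered for a single project, assuming expertise areas have already been assigned to each top contributor by hand. The input shape and the use of "number of distinct areas" as the diversity measure are assumptions for illustration.

```python
from statistics import median


def expertise_summary(expertise_by_contributor):
    """Summarize expertise for one project's top contributors.

    `expertise_by_contributor` maps a login to a list of expertise areas
    (assumed to come from manual labeling).
    """
    areas = set()
    per_person = []
    for labels in expertise_by_contributor.values():
        distinct = set(labels)
        areas |= distinct
        per_person.append(len(distinct))
    return {
        "areas": sorted(areas),
        "median_areas_per_contributor": median(per_person) if per_person else 0,
    }
```

Comparing `median_areas_per_contributor` across projects would be one simple starting point for the third question.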

Automate web searching by name and email address

Explore options for automatically building profiles.

  • DuckDuckGo API search (try different combinations of terms, need a way to rank match likelihood)
  • LinkedIn API (apply to do email searches)
  • Explore Social Network services
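For the "rank match likelihood" need noted above, one very simple starting point is a score based on how much of the contributor's name and email appears in a search result's text. The weights (0.5 each) and function names below are arbitrary assumptions, not a validated method.

```python
import re


def tokens(text):
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def match_score(name, email, result_text):
    """Score in [0, 1]: half for name-token overlap, half for an exact email hit."""
    found = tokens(result_text)
    name_tokens = tokens(name)
    score = 0.0
    if name_tokens:
        score += 0.5 * len(name_tokens & found) / len(name_tokens)
    if email and email.lower() in result_text.lower():
        score += 0.5
    return score


def rank_results(name, email, results):
    """Order result texts from most to least likely match."""
    return sorted(results, key=lambda r: match_score(name, email, r), reverse=True)
```

Common names will score highly on many unrelated results, so the email hit is the stronger signal in this sketch.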

What companies are engaged the most in R code contributions?

  • R core language source code analysis will not yield results (uses SVN, small group of contributors)
  • Commit history on packages will yield better results, see other issues in this milestone for CRAN analysis
  • Also look at rOpenSci and Bioconductor

Commit Authors Lookup

Develop a better method for identifying who a commit author is. Keep track of their email addresses for given time frames and determine a way to "rank" them. If there is a choice between a hosted email address and a company one for the same time period, the company one gets priority.

author -> min_date,max_date -> authoritative email_address

Given an email address + date, find the author
email_address + date -> author
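The two lookups above could be sketched with a list of dated email records per author. The record type, the function names, and the list of "hosted" domains are assumptions for illustration; the only rule taken from the description is that a company address beats a hosted one for the same time period. The tie-break on most recent start date is an extra assumption.

```python
from dataclasses import dataclass
from datetime import date

# Assumed, incomplete list of hosted (non-company) email domains.
HOSTED_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com", "outlook.com"}


@dataclass(frozen=True)
class EmailPeriod:
    author: str
    email: str
    min_date: date
    max_date: date


def is_company(email):
    """Treat any domain not in HOSTED_DOMAINS as a company address."""
    return email.rsplit("@", 1)[-1].lower() not in HOSTED_DOMAINS


def authoritative_email(periods, author, when):
    """author + date -> authoritative email_address (company beats hosted)."""
    candidates = [
        p for p in periods
        if p.author == author and p.min_date <= when <= p.max_date
    ]
    if not candidates:
        return None
    # Prefer company addresses; break ties by the most recent start date.
    best = max(candidates, key=lambda p: (is_company(p.email), p.min_date))
    return best.email


def find_author(periods, email, when):
    """email_address + date -> author, or None if no period covers the date."""
    for p in periods:
        if p.email.lower() == email.lower() and p.min_date <= when <= p.max_date:
            return p.author
    return None
```

Keeping the periods as plain records makes it easy to load them from commit history later, where `min_date` and `max_date` would be the first and last commit dates seen for each address.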

Use DuckDuckGo for Exploration

This came from Issue #9, which was originally going to use only DuckDuckGo. On further consideration, it makes more sense to write a script that takes a list of inputs and a query argument (or an argument indicating some predetermined query pattern).

This would just provide some insight on a) what combinations of search terms yield the best results and b) what our match rate is.

These results would need to be manually analyzed until a better method is determined. Right now, other than manually searching one at a time, it's hard to get a sense of what proportion of the contributors can be matched.
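A minimal sketch of the core of such a script, assuming contributors are dicts with `name` and `email` keys. The pattern names, the `search` callable (which would wrap whatever search backend is chosen), and the `is_match` callable (standing in for the manual check) are all hypothetical placeholders.

```python
# Assumed predetermined query patterns; a raw template string also works.
PATTERNS = {
    "name_email": '"{name}" "{email}"',
    "name_github": '"{name}" github',
    "email_only": '"{email}"',
}


def build_queries(contributors, pattern):
    """Return (contributor, query) pairs for a named pattern or raw template."""
    template = PATTERNS.get(pattern, pattern)
    return [(c, template.format(**c)) for c in contributors]


def match_rate(contributors, pattern, search, is_match):
    """Fraction of contributors with at least one matching search result.

    `search(query)` returns a list of result texts and `is_match(contributor,
    result)` decides whether a result identifies the contributor; both are
    supplied by the caller.
    """
    queries = build_queries(contributors, pattern)
    if not queries:
        return 0.0
    hits = sum(
        1 for c, q in queries if any(is_match(c, r) for r in search(q))
    )
    return hits / len(queries)
```

Running `match_rate` once per pattern over the same contributor list would give a side-by-side view of which term combinations work best.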
