programming-language-subreddits-and-their-choice-of-words's Introduction

Programming language subreddits and their choice of words

While reading about various programming languages, I developed a hunch about how often different languages are mentioned by other communities and about the average conversational tone of their respective members.

To examine whether this was just selective perception on my part, an unconscious confirmation of stereotypes, or a valid observation, I collected and analysed some data: all comments (about 300k) written on submissions (about 40k) in the respective programming language subreddits from 2013-08 to 2014-07, using PRAW and SQLite.

In this article I will present some selected results. (If you want, you can also download the code I wrote/used as well as the raw data generated by it.)

Mutual mentions

The following chord graph (click it for an interactive version) shows how often a programming language is mentioned in communities (subreddits) other than its own:

(mutual mentions)

(The size of a language is set by how often the others talk about it in total. One connection represents the mutual mentions of two communities. The width at each end is determined by the relative frequency with which the mentionee is referenced by the respective other community. So PHP talks more about SQL than SQL talks about PHP. The labels of some smaller communities might be missing in the graph due to some opaque d3.js behavior ¯\_(ツ)_/¯.)
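As a rough illustration of the "relative frequency" idea, here is a minimal Python sketch. It is not the code behind the graph: the function name `mention_rates`, the `ALIASES` table, and the choice to normalize by each community's comment count are all assumptions, and the sample comments are made up.

```python
import re

# Hypothetical mapping from a community to the words that count as a mention.
ALIASES = {
    "php": ["php"],
    "sql": ["sql"],
    "haskell": ["haskell"],
}


def mention_rates(comments_by_subreddit, aliases=ALIASES):
    """Return rates[a][b]: share of comments in community a that mention b."""
    patterns = {
        lang: re.compile(
            r"\b(?:" + "|".join(map(re.escape, words)) + r")\b",
            re.IGNORECASE,
        )
        for lang, words in aliases.items()
    }
    rates = {}
    for sub, comments in comments_by_subreddit.items():
        rates[sub] = {}
        for lang, pattern in patterns.items():
            if lang == sub or not comments:
                continue  # skip self-mentions and empty communities
            hits = sum(1 for text in comments if pattern.search(text))
            rates[sub][lang] = hits / len(comments)
    return rates


# Made-up sample data, only to show the asymmetry between two communities.
sample = {
    "php": ["I store it with SQL.", "SQL injection again.",
            "Nice framework.", "Try Haskell?"],
    "sql": ["Called from PHP.", "Use an index.",
            "Joins are fine.", "CTEs help."],
}
rates = mention_rates(sample)
print(rates["php"]["sql"], rates["sql"]["php"])
```

In this toy data the PHP side mentions SQL in half of its comments, while the SQL side mentions PHP in a quarter of its comments, so the two ends of that connection would get different widths.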

The "big" languages are the ones most talked about, yawn.

Sure, measuring programming language popularity accurately is nearly impossible, but if we simply take some values from TIOBE anyway, it gets interesting, because one can see how much a language is talked about relative to how much it is supposedly used.

mentions relative to tiobe

This was the first time I said "Ha! I knew it!".

haskell tweet

(No Haskell bashing intended. I love it and its little web cousin Elm, use them for projects, and also write articles about them.)

Word usage

If we now divide the number of comments in a subreddit containing a chosen word by the overall subreddit comment count (and multiply by 10000 to get a nice integer value), we get more ... well, diagrams. But most results, like the obsession with abstract concepts among the Haskell people and the attention to hardware issues among people using C and C++, are not that surprising.
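The rate described above can be sketched in a few lines of Python. This is only a minimal illustration, not the script used for the diagrams: the `comments` table, its `subreddit` and `body` columns, and the function name `word_rate` are assumptions, and matching the word as a whole lowercase token is just one possible choice (so "monads" would not count as "monad" here).

```python
import re
import sqlite3


def word_rate(conn, subreddit, word):
    """Comments containing `word` per 10000 comments in `subreddit`."""
    pattern = re.compile(r"\b" + re.escape(word.lower()) + r"\b")
    rows = conn.execute(
        "SELECT body FROM comments WHERE subreddit = ?", (subreddit,)
    ).fetchall()
    if not rows:
        return 0  # unknown or empty subreddit
    hits = sum(1 for (body,) in rows if pattern.search(body.lower()))
    return round(10000 * hits / len(rows))


# Tiny in-memory database with made-up comments (hypothetical schema and data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE comments (subreddit TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO comments VALUES (?, ?)",
    [
        ("haskell", "A monad is just a monoid."),
        ("haskell", "Use a functor here."),
        ("haskell", "Thanks, that works."),
        ("haskell", "Monad transformers again?"),
        ("cpp", "Mind the cache line size."),
    ],
)
print(word_rate(conn, "haskell", "monad"))
```

Each comment counts at most once, no matter how often the word appears in it, which matches "number of comments containing a chosen word" rather than a raw word count.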

abstract concepts

hardware

Cursing

This part here is quite comforting, because a conjecture many of us probably have is confirmed.

cursing

Happiness

To finish with something positive: The lispy guys seem to be the most cheerful people.

happy

But what is up with the Visual Basic community? They are neither angry nor happy. They just ... are? :)

Other subjects

On editgym.com/subreddits-and-their-choice-of-words you can find more analyses of this kind applied to different topics/subreddits like gaming, music, sports, operating systems, etc.

Disclaimer

As you probably already noticed, this is not hard science. It was just a small fun project and leaves several possibilities for errors. I tried to choose only big communities and frequent words so that there is at least a bit of statistical significance. (btw If you remove this constraint, Elm is the happiest and coolest language. ^_-) But potential errors in my parser and interpretation (e.g. not taking negations into account) cannot be fully ruled out either. ;)

Also, positive correlation (e.g. cursing <-> PHP) does not imply one causing the other. But if somebody wants to repeat this experiment to confirm/refute the results with fancier tools like nltk or something, I would be happy if you could drop me an email.

programming-language-subreddits-and-their-choice-of-words's People

Contributors

dobiasd, kevinji, trevors

programming-language-subreddits-and-their-choice-of-words's Issues

A request for more details...

Could you create a new graph that shows who is liking/hating on whom? I'd be interested in which community is the most tolerant.

Conjecture is not actually confirmed.

This part here is quite comforting, because a conjecture many of us probably have is confirmed.

There's a false assumption in this statement, in that usage of swear words is somehow indicative of the quality of the language or professionalism of its users.

  1. Programmers tend to swear more in public spaces than the general population, particularly in environments where discussion of real world problems and code are prevalent. See the Linux mailing list for a prominent example.
  2. /r/php is heavily weighted towards discussions of real world usage and troubleshooting despite the constant reminders that it is not a support subreddit, as well as discussions regarding future directions of the language itself and an innumerable number of frameworks and design patterns that are constantly being compared.
  3. The conjecture linked to is itself extensively flawed, where the author misunderstands a number of implementation details, constantly sets up straw man arguments, and overlooks or downplays a number of language strengths in favor of a clearly biased viewpoint misconstrued as being fair-minded.

Normalize keyword representation

Hey, the graphs look great, but a suggestion to improve readability and interpretation is to normalize the population graphs by % rather than count. This will show representation of the keyword relative to the community size, since the graph currently over-visualizes the population size. Basically, divide the count of a single term by how many total terms there are: the bars will then be proportional to each other and we will be able to see whether one community, in general, uses one word more than another. This can make the graph easier to digest since not all communities are equal in size on reddit.

Another representation graph can show the "versus" of positive words and negative words, to demonstrate which community uses this more than the other.

Thanks!
