R and python usage
This repo is an attempt to use data to explore the claims in "Python Displacing R As The Programming Language For Data Science" and "The homogenization of scientific computing, or why Python is steadily eating other languages’ lunch".
The individual files contain the R code that I used to gather data from each source, and the results are summarised below. I've made no attempt to separate python for data analysis from other uses of python, but hopefully the signals are still indicative. If you think my methodology is wrong, or you have other ideas for data sets, please send a pull request and I'll merge it in.
Stackoverflow questions
Using the stackexchange data explorer, I calculated the number of questions asked by month for both python and R. Overall, both R and python questions are growing explosively over time:
A little further exploration (not shown) indicates that this is very close to being exponential growth.
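One way to check a claim like this is to fit a straight line to the logarithm of the monthly counts: if growth is exponential, the log counts are close to linear in time. The sketch below is a Python illustration only, not the R code used in this repo. The function name `fit_exponential` and the synthetic counts are assumptions made up for the example; they are not the Stack Overflow data.

```python
import math


def fit_exponential(counts):
    """Least-squares fit of log(count) against month index.

    Returns (monthly_growth_rate, r_squared). An r_squared close to 1
    means the log counts are nearly linear, i.e. growth is nearly
    exponential. All counts must be positive.
    """
    xs = list(range(len(counts)))
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    r_squared = sxy ** 2 / (sxx * syy)
    # A slope of b on the log scale is a growth factor of exp(b) per month.
    return math.exp(slope) - 1, r_squared


# Synthetic counts growing 10% per month (illustrative, not real data).
synthetic = [100 * 1.1 ** month for month in range(24)]
rate, r2 = fit_exponential(synthetic)
print(f"monthly growth: {rate:.1%}, R^2: {r2:.3f}")
```

With real counts, a visibly curved residual pattern on the log scale would suggest that growth is faster or slower than exponential.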
If we standardise the number of R questions by the number of python questions, we see that the number of R questions is increasing more rapidly than the number of python questions. Currently, about one question about R is asked for every three questions about python.
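Standardising here just means dividing one monthly series by the other. The following is a minimal Python sketch of that step, assuming the counts are held in dictionaries keyed by month. The helper name `standardise` and all the numbers are hypothetical, chosen only to show the arithmetic.

```python
def standardise(r_counts, python_counts):
    """Return {month: R count / python count} for months found in both series.

    Months with a missing or zero python count are skipped to avoid
    dividing by zero.
    """
    return {
        month: r_counts[month] / python_counts[month]
        for month in sorted(r_counts)
        if python_counts.get(month)
    }


# Hypothetical monthly question counts (not the real data).
r_questions = {"2015-01": 1000, "2015-02": 1100, "2015-03": 1250}
python_questions = {"2015-01": 4000, "2015-02": 4000, "2015-03": 4000}

ratios = standardise(r_questions, python_questions)
for month, ratio in ratios.items():
    print(month, round(ratio, 3))
```

A ratio that rises over time means R questions are growing faster than python questions, even if python still has more questions in absolute terms.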
Github repos
Again we see exponential growth in both repos containing R code and repos containing python code (these numbers don't include forks), but R repos are relatively less common than R questions. The big jump in repo creation in 2014 is probably due to the JHU coursera course.
If we standardise the number of R repos by the number of python repos, we see that the ratio of R repos to python repos has been decreasing since the big jump in 2015.
Google trends
Looking at google trends data for people searching for language tutorials, both languages are relatively flat. Searches for R tutorials have been fairly stable, perhaps with a slight increase, while searches for python tutorials have been considerably more variable over time.
Some Python Data (but not much)
This is the monthly download data made available by the Python Package Index (PyPI). The plot shows the growth in downloads of several data analysis packages for Python. Something happens in March 2013, when the growth explodes.
Other ideas
- Look at use of mailing lists. Is there a pydata specific mailing list?
- Compare twitter hashtags: rstats, python, pydata?
- Compare package downloads?
- Number of Kaggle solution scripts written in R versus Python.
- Number of Machine Learning courses on MOOC sites that use R versus Python.
- Compare attendees at big R versus big Python data conferences year-over-year.