Predicting the age of Americans based on first names (as of the end of 2014).
I'm pretty sure I originally heard about this technique a few years back at a data conference (either Strata or PyData). I couldn't find the specific talk; however, I was able to find this paper, which mentions similar techniques, and this article from fivethirtyeight, which itself references this blog article. Anyway, my point is that I didn't invent this idea; I just wanted to see if I could reproduce the results.
I decided to try the exact approach (and dataset) from the fivethirtyeight article.
I grabbed the baby name data from the Social Security Administration, which lists the first name of every baby born since 1880, limited to names "with at least 5 occurrences."
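For anyone following along, here is a minimal sketch of reading that data. It assumes the national files come as one text file per birth year, with rows of the form `name,sex,count`. The function name `parse_yob` and the sample rows are made up for illustration and are not real SSA counts.

```python
import csv
import io


def parse_yob(text, year):
    """Parse one year's 'name,sex,count' rows into (year, name, sex, count) tuples."""
    rows = []
    for row in csv.reader(io.StringIO(text)):
        if not row:  # skip blank lines
            continue
        name, sex, count = row
        rows.append((year, name, sex, int(count)))
    return rows


# Made-up sample rows in the assumed format (not real SSA data).
sample = "Mary,F,100\nJohn,M,90\nMary,M,5\n"
records = parse_yob(sample, 1950)
```

In practice the text would come from reading each year's file from disk, and the per-year lists would be concatenated into one table.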
To properly predict how many people are alive today with any given name, we also need actuarial tables (more pleasantly referred to as "Life Tables" by the SSA) to predict how many deaths have occurred.
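The core calculation can be sketched roughly as follows. This is a simplified illustration, not necessarily the exact code behind these results. It assumes the life tables have already been reduced to a single survival fraction per birth year (the share of that cohort still alive on the reference date), and it ignores sex and immigration. The names `estimate_alive` and `median_age` and all of the numbers are hypothetical.

```python
def estimate_alive(births_by_year, survival_to_ref):
    """Scale each birth cohort by its assumed survival fraction.

    Cohorts with no survival figure are treated as having no survivors.
    """
    return {
        year: births * survival_to_ref.get(year, 0.0)
        for year, births in births_by_year.items()
    }


def median_age(alive_by_year, ref_year):
    """Age at which half of the estimated living name-holders are that age or younger."""
    total = sum(alive_by_year.values())
    if total == 0:
        return None
    running = 0.0
    # Walk from the youngest cohort to the oldest, accumulating survivors.
    for year in sorted(alive_by_year, reverse=True):
        running += alive_by_year[year]
        if running >= total / 2:
            return ref_year - year
    return None


# Made-up inputs for one name: births per year and survival fractions.
births = {1940: 1000, 1970: 2000, 2000: 1000}
survival = {1940: 0.5, 1970: 0.9, 2000: 1.0}

alive = estimate_alive(births, survival)
print(median_age(alive, 2014))
```

The same cumulative walk can be stopped at the 25% and 75% marks instead of the halfway point to get an interquartile age range for a name.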
My results are almost identical to the results from fivethirtyeight. I attribute the slight differences to the fact that I'm predicting out to January 1st, 2014, whereas fivethirtyeight predicted out to January 1st, 2013. It's possible our techniques varied, but I can't really tell since fivethirtyeight does not show their work.
Note that the order of the X axis is reversed in my graphs.