This report covers the modern Summer Olympics from 1956 to 2016. The aim is to investigate whether medals are equally distributed and to provide insight into the following questions:
- Does competing at home as the host nation increase the chances of a medal?
- Does a country's population or GDP affect its chances of a medal?
- Is there a difference in physical characteristics for medallists?
- What are the differences between men and women?
- Are the Olympics a fair representation of talent?
- How have athletes changed over time?
In addition to the report, a presentation was given providing an overview.
The analysis uses the following data files:
- athlete_events.csv - 120 years of Olympic history - https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results
- noc_regions.csv - 120 years of Olympic history - https://www.kaggle.com/heesoo37/120-years-of-olympic-history-athletes-and-results
- gdp.csv - World Bank - https://data.worldbank.org/indicator/NY.GDP.MKTP.CD
- population.csv - World Bank - https://data.worldbank.org/indicator/SP.POP.TOTL
- host_cities.csv - Manually made from https://architectureofthegames.net/olympic-host-cities/
Start with athlete_events.csv:
- Remove art competitions from sports
- Remove NAME and TEAM
- Make NOC consistent (SIN, RUS, TPE, CHN, GER, CZE, SRB)
- Add column COUNTRY by matching with NOC in noc_regions.csv
- Update host CITY to match the common names used in host_cities.csv (Athens, Rome, Antwerp, Moscow, Turin, St Moritz)
- Add column HOST_NOC by matching NOC from host_cities.csv
- Add column BMI, calculated as weight (kg) / height (m)^2
- Add a Boolean column indicating whether the athlete is a medallist
- Add column GDP from gdp.csv
- Add column POPULATION from population.csv
- Remove years 1896-1952
- Remove winter season
- Remove SEASON and GAMES columns
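
Some of the steps above (the row filters, the dropped columns, BMI and the medallist flag) can be sketched in plain Python as shown below. This is only an illustration, not the code used for the report. It assumes the column names of the Kaggle file (`Name`, `Team`, `Season`, `Games`, `Year`, `Sport`, `Height`, `Weight`, `Medal`), that height is stored in centimetres, and that missing values appear as the string `NA`. The function name `clean_row`, the output column name `Medallist` and the sample rows are hypothetical. The NOC, host, GDP and population steps are left out.

```python
def clean_row(row):
    """Return a cleaned copy of one athlete row, or None if the row is dropped."""
    year = int(row["Year"])
    # Keep only Summer Games from 1956 onwards and drop art competitions.
    if row["Season"] != "Summer" or year < 1956 or row["Sport"] == "Art Competitions":
        return None
    # Drop the columns that the analysis does not use.
    dropped = {"Name", "Team", "Season", "Games"}
    out = {key: value for key, value in row.items() if key not in dropped}
    # BMI = weight (kg) / height (m)^2; height is assumed to be in cm.
    try:
        height_m = float(row["Height"]) / 100
        out["BMI"] = round(float(row["Weight"]) / height_m ** 2, 1)
    except (ValueError, ZeroDivisionError):
        out["BMI"] = None  # height or weight is missing ("NA") or invalid
    # Boolean flag: True when the athlete won any medal (assumed "NA" otherwise).
    out["Medallist"] = row["Medal"] != "NA"
    return out


# Hypothetical sample rows, shaped like the output of csv.DictReader.
sample_rows = [
    {"Name": "A", "Team": "X", "NOC": "AAA", "Games": "2016 Summer", "Year": "2016",
     "Season": "Summer", "Sport": "Athletics", "Height": "180", "Weight": "81", "Medal": "Gold"},
    {"Name": "B", "Team": "Y", "NOC": "BBB", "Games": "1952 Summer", "Year": "1952",
     "Season": "Summer", "Sport": "Athletics", "Height": "170", "Weight": "60", "Medal": "NA"},
    {"Name": "C", "Team": "Z", "NOC": "CCC", "Games": "2014 Winter", "Year": "2014",
     "Season": "Winter", "Sport": "Biathlon", "Height": "175", "Weight": "70", "Medal": "NA"},
    {"Name": "D", "Team": "W", "NOC": "DDD", "Games": "1960 Summer", "Year": "1960",
     "Season": "Summer", "Sport": "Swimming", "Height": "NA", "Weight": "65", "Medal": "NA"},
]

cleaned = [r for r in (clean_row(row) for row in sample_rows) if r is not None]
```

Only the first and last sample rows survive the filters. The last one keeps a BMI of `None` because its height is missing, so rows with incomplete measurements stay available for the questions that do not rely on BMI.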