rumgroup / brexit_data_challenge
R and Python code used in the analysis for the shortlisted R Uni of Manchester & MMU R User Group submission to the CDRC GISRUK 2018 Brexit Data Challenge
The Economist seem to use Local Authority Districts as their unit of analysis, which raises the Modifiable Areal Unit Problem: results can change depending on which geographic units the data are aggregated to. There are smaller geographic units available (nested within Districts) that are likely to have between-unit variance which is masked when aggregated to the District level.
But to demonstrate this, we need data at a geographic scale which is (1) smaller and (2) nested within districts. The ward-level referendum data provided is great but it's missing a lot, so we'd have to use a subset? Lots of other interesting data (including the ethnicity/citizenship data provided) is at LSOA, which is geographically nested within other units for comparison, including districts, I think.
Maybe this could be a pre-analysis demonstration to justify what we do (or a discussion point) rather than the main topic.
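A minimal sketch of the masking effect described above, using only the Python standard library. The district names and ward-level Remain percentages are invented for illustration and are not taken from the referendum data. Both toy districts have the same mean, so a district-level analysis would treat them as identical, while the ward-level spread is very different.

```python
from statistics import mean, pvariance

# Hypothetical ward-level Remain percentages, keyed by an invented district name.
wards_by_district = {
    "District A": [30.0, 50.0, 70.0],  # very mixed wards
    "District B": [48.0, 50.0, 52.0],  # homogeneous wards
}


def summarise(wards):
    """Return {district: (mean, within-district variance)} for ward-level values."""
    return {
        district: (mean(values), pvariance(values))
        for district, values in wards.items()
    }


summary = summarise(wards_by_district)
for district, (district_mean, within_var) in summary.items():
    print(f"{district}: mean={district_mean:.1f}, "
          f"within-district variance={within_var:.1f}")
```

Run on the real ward subset, the same summary would show which districts hide the most variation, which could serve as the pre-analysis justification suggested above.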
Apologies to anyone who has had issues with Covariates.rda; we had a problem with null-value rows. Please download the most recent version from the Dropbox (a CSV version has also now been added), and note that some variables are only available for London.
File 2 at https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015 gives a breakdown of the 2015 indices of multiple deprivation for each LSOA, including healthcare provision, employment, crime and access to social housing. These will be good variables to include in our models. Have a go at adding this onto your table using the merge() command described at https://www.statmethods.net/management/merging.html.
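For anyone working on the Python side, the same join can be sketched with pandas. This is an illustration only: the column names (lsoa_code, pct_white_british, imd_score) and all values are assumptions, not the real headers or figures from File 2 or from our table. The left join plays the role of all.x = TRUE in R's merge(), and since the deprivation file covers England only, checking for unmatched rows is worthwhile.

```python
import pandas as pd

# Hypothetical LSOA-level table; column names and values are invented.
lsoa_table = pd.DataFrame({
    "lsoa_code": ["E01000001", "E01000002", "E01000003"],
    "pct_white_british": [0.61, 0.45, 0.80],
})

# Hypothetical extract of the deprivation file, also invented.
imd = pd.DataFrame({
    "lsoa_code": ["E01000001", "E01000002"],
    "imd_score": [12.3, 27.8],
})

# how="left" keeps every LSOA in our table even when no deprivation row
# matches; indicator=True adds a "_merge" column recording the row's origin.
merged = lsoa_table.merge(imd, on="lsoa_code", how="left", indicator=True)

# List the LSOAs that picked up no deprivation data in the join.
unmatched = merged.loc[merged["_merge"] == "left_only", "lsoa_code"].tolist()
print(merged)
print("LSOAs without deprivation data:", unmatched)
```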
Kathryn Simpson from the University of Manchester recently proposed some great free data resources for analysing Brexit data:
http://blog.ukdataservice.ac.uk/making-sense-of-brexit-the-data-you-need-to-analyse/
These include:
The British Election Study
The British Social Attitudes Survey
The Annual Population Survey
Census data
Using additional resources should help our application to stand out from the rest.
It could be interesting to explore not only immigration but also some measure of civic engagement. mySociety.org have quite a bit of data on this: http://data.mysociety.org/datasets/?category=united-kingdom
For example, there is a data set on EU-related FOI requests (some summary analysis of this here: https://www.mysociety.org/2016/10/19/what-do-we-know-about-the-eu-referendum/)
Or we could consider MPs' declarations of conflicts of interest/sources of income and see if there is anything interesting there, but I am only fishing now.
Best regressions I've found so far for England and for the whole of the UK. Years picked for change_since are chosen by a for loop. Details in source, but it's pretty simple.
England benefits from IMD data slightly, but also just from excluding the other areas:
# ipython -i brellenge_prep.py
# bestEng = smf.ols('Pct_Remain ~ Q("White British") + Q("White Other") + Asian + Black + Other + y2015_WBR + IMD', data=change_since(2016, 2001).join(metadata))
# bestUK = smf.ols('Pct_Remain ~ Q("White British") + Q("White Other") + Asian + Black + Other + y2015_WBR', data=change_since(2017, 2003).join(metadata))
In [1]: bestEng.fit().summary2()
Out[1]:
<class 'statsmodels.iolib.summary2.Summary'>
"""
                   Results: Ordinary least squares
======================================================================
Model:              OLS              Adj. R-squared:     0.624
Dependent Variable: Pct_Remain       AIC:                2109.8669
Date:               2018-03-04 15:19 BIC:                2140.1375
No. Observations:   325              Log-Likelihood:     -1046.9
Df Model:           7                F-statistic:        77.81
Df Residuals:       317              Prob (F-statistic): 4.59e-65
R-squared:          0.632            Scale:              37.702
----------------------------------------------------------------------
                      Coef. Std.Err.       t  P>|t|    [0.025   0.975]
----------------------------------------------------------------------
Intercept          123.1056  14.3254  8.5935 0.0000   94.9207 151.2905
Q("White British") -65.0829  19.6060 -3.3195 0.0010 -103.6572 -26.5086
Q("White Other")     9.7304   4.2564  2.2861 0.0229    1.3561  18.1046
Asian               -9.4880   2.8771 -3.2977 0.0011  -15.1487  -3.8273
Black               46.3033   4.4566 10.3899 0.0000   37.5351  55.0715
Other               -0.8730   4.9839 -0.1752 0.8611  -10.6786   8.9326
y2015_WBR          -48.5744  18.5010 -2.6255 0.0091  -84.9746 -12.1742
IMD                  1.4646   0.2204  6.6449 0.0000    1.0310   1.8983
----------------------------------------------------------------------
Omnibus:            12.016           Durbin-Watson:      1.845
Prob(Omnibus):      0.002            Jarque-Bera (JB):   12.578
Skew:               0.418            Prob(JB):           0.002
Kurtosis:           3.478            Condition No.:      572
======================================================================
"""
In [2]: bestUK.fit().summary2()
Out[2]:
<class 'statsmodels.iolib.summary2.Summary'>
"""
                   Results: Ordinary least squares
======================================================================
Model:              OLS              Adj. R-squared:     0.547
Dependent Variable: Pct_Remain       AIC:                2300.3152
Date:               2018-03-04 15:19 BIC:                2327.2605
No. Observations:   347              Log-Likelihood:     -1143.2
Df Model:           6                F-statistic:        70.53
Df Residuals:       340              Prob (F-statistic): 9.16e-57
R-squared:          0.555            Scale:              43.437
----------------------------------------------------------------------
                      Coef. Std.Err.       t  P>|t|    [0.025   0.975]
----------------------------------------------------------------------
Intercept          119.1428  13.0029  9.1628 0.0000   93.5666 144.7191
Q("White British") -55.5533  20.2003 -2.7501 0.0063  -95.2866 -15.8199
Q("White Other")    30.5484   3.9682  7.6984 0.0000   22.7432  38.3537
Asian               -9.0141   2.6544 -3.3959 0.0008  -14.2351  -3.7930
Black               24.8085   3.5454  6.9974 0.0000   17.8348  31.7822
Other                4.5733   4.9236  0.9289 0.3536   -5.1112  14.2578
y2015_WBR          -35.2234  15.8739 -2.2190 0.0271  -66.4467  -4.0000
----------------------------------------------------------------------
Omnibus:            3.247            Durbin-Watson:      1.823
Prob(Omnibus):      0.197            Jarque-Bera (JB):   2.962
Skew:               0.207            Prob(JB):           0.227
Kurtosis:           3.181            Condition No.:      148
======================================================================
"""
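The year search mentioned above ("chosen by a for loop") might look roughly like the sketch below. This is not the code in brellenge_prep.py: toy_change_since is a hypothetical stand-in for change_since, the data are invented, and a one-predictor fit replaces the full statsmodels formula so that the example runs on the standard library alone. It tries every (start, end) pair and keeps the one with the highest adjusted R-squared.

```python
from itertools import combinations
from statistics import mean


def r_squared(x, y):
    """R-squared of a one-predictor least-squares fit (squared correlation)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return (sxy * sxy) / (sxx * syy)


def adjusted_r_squared(r2, n, k):
    """Adjust R-squared for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Invented shares for four areas in three years, plus an invented outcome.
share_by_year = {
    2001: [0.90, 0.80, 0.70, 0.60],
    2011: [0.88, 0.70, 0.65, 0.40],
    2016: [0.85, 0.65, 0.50, 0.35],
}
pct_remain = [40.0, 48.0, 55.0, 62.0]


def toy_change_since(end, start):
    """Hypothetical stand-in: per-area change in share between two years."""
    return [e - s for e, s in zip(share_by_year[end], share_by_year[start])]


# Score every (end, start) year pair by the adjusted R-squared of its fit.
results = {}
for start, end in combinations(sorted(share_by_year), 2):
    change = toy_change_since(end, start)
    r2 = r_squared(change, pct_remain)
    results[(end, start)] = adjusted_r_squared(r2, n=len(pct_remain), k=1)

best = max(results, key=results.get)
print("best (end, start):", best, "adjusted R-squared:", round(results[best], 3))
```

As a sanity check on the helper, adjusted_r_squared(0.632, 325, 7) and adjusted_r_squared(0.555, 347, 6) round to 0.624 and 0.547, matching the two summaries above. Picking years purely by fit can flatter the model, so it is worth reporting how much the result moves across neighbouring year pairs.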
So I started writing some bits about the rate-of-change variables (see Google Doc), and how choosing the range is important and might have an effect on conclusions. Just to clarify, we have the 3 outcomes (right?):