rumgroup / brexit_data_challenge

R and Python code used in the analysis for the shortlisted R Uni of Manchester & MMU R User Group submission to the CDRC GISRUK 2018 Brexit Data Challenge

R 60.35% Python 39.65%

gisruk brexit mmu manchester r


brexit_data_challenge's Issues

Unit of analysis

The Economist seem to use Local Authority Districts as their unit of analysis, which raises the Modifiable Areal Unit Problem (MAUP). Smaller geographic units nested within districts are available, and they are likely to show between-unit variance that is masked when the data are aggregated to district level.

But to demonstrate this, we need data at a geographic scale that is (1) smaller than districts and (2) nested within them. The ward-level referendum data provided is great, but a lot of it is missing, so we would have to use a subset. Lots of other interesting data (including the ethnicity/citizenship data provided) is at LSOA level, which I think is geographically nested within other units, including districts, for comparison.

Maybe this could be a pre-analysis demonstration to justify what we do (or a discussion point) rather than the main topic.
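
As a rough illustration of the masking effect, the sketch below splits the total variance of an area-level variable into a between-district part and a within-district part. The data are synthetic, and every name and number in it (20 districts, 30 small areas each, the means and spreads) is an assumption made for the example, not taken from the challenge data. Only the within-district part disappears when values are averaged up to district level.

```python
import random
import statistics

random.seed(42)

# Synthetic, hypothetical data: 20 districts, each holding 30 small areas
# whose Remain share (in %) varies around a district-specific mean.
districts = {}
for d in range(20):
    district_mean = random.gauss(48, 5)
    districts[f"D{d:02d}"] = [random.gauss(district_mean, 10) for _ in range(30)]

all_areas = [x for values in districts.values() for x in values]
grand_mean = statistics.fmean(all_areas)

# Between-district sum of squares: what a district-level analysis still sees.
ss_between = sum(
    len(values) * (statistics.fmean(values) - grand_mean) ** 2
    for values in districts.values()
)

# Within-district sum of squares: what aggregation to districts throws away.
ss_within = 0.0
for values in districts.values():
    district_mean = statistics.fmean(values)
    ss_within += sum((x - district_mean) ** 2 for x in values)

ss_total = sum((x - grand_mean) ** 2 for x in all_areas)
share_hidden = ss_within / ss_total

print(f"share of variance hidden by district-level aggregation: {share_hidden:.2f}")
```

With real data, the same decomposition could be run on any LSOA- or ward-level variable grouped by its parent district.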

Budget of the European Union 2017

https://data.europa.eu/euodp/data/dataset/final-budget-2017

Updates to Covariates.rda

Apologies to anyone who has had issues with Covariates.rda; we had a problem with rows containing null values. Please download the most recent version from the Dropbox (a CSV version has now been added too), and note that some variables are only available for London.

Linking to additional area by area deprivation data

File 2 at https://www.gov.uk/government/statistics/english-indices-of-deprivation-2015 gives a breakdown of the 2015 indices of multiple deprivation for each LSOA, including healthcare provision, employment, crime and access to social housing. These will be good variables to include in our models. Have a go at adding this onto your table using the merge() command described at https://www.statmethods.net/management/merging.html.
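
For anyone working in Python rather than R, a minimal sketch of the same join using pandas is below. The column names (lsoa_code, pct_white_british, imd_score) and all values are made up for illustration; the real files will use different headers. A left join keeps every row of the existing table, which is roughly what merge(x, y, by = ..., all.x = TRUE) does in R.

```python
import pandas as pd

# Hypothetical existing table of covariates, one row per LSOA.
covariates = pd.DataFrame({
    "lsoa_code": ["E01000001", "E01000002", "E01000003"],
    "pct_white_british": [0.62, 0.71, 0.55],
})

# Hypothetical extract of the deprivation file, also one row per LSOA.
deprivation = pd.DataFrame({
    "lsoa_code": ["E01000001", "E01000002", "E01000004"],
    "imd_score": [6.2, 5.1, 20.4],
})

# Left join on the LSOA code. validate raises if either side has duplicate
# codes, and indicator adds a _merge column showing which rows found a match.
merged = covariates.merge(
    deprivation,
    on="lsoa_code",
    how="left",
    validate="one_to_one",
    indicator=True,
)

# Rows from the existing table with no deprivation record.
unmatched = merged[merged["_merge"] == "left_only"]
print(merged)
print("unmatched LSOAs:", list(unmatched["lsoa_code"]))
```

Because the indices cover England only, any non-English areas in the left table would show up as unmatched, so it's worth checking the _merge column before modelling.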

Migration table

https://www.ons.gov.uk/peoplepopulationandcommunity/populationandmigration/migrationwithintheuk/datalist?filter=datasets

Citizen engagement/ civic tech

It could be interesting to explore not only immigration but also some measure of civic engagement. Mysociety.org have quite a bit of data on this: http://data.mysociety.org/datasets/?category=united-kingdom

For example, there is a dataset on EU-related FOI requests (some summary analysis of it here: https://www.mysociety.org/2016/10/19/what-do-we-know-about-the-eu-referendum/)

Or we could consider MPs' declarations of conflicts of interest/sources of income and see if there is anything interesting there, but I am only fishing now 🐟 😆

Best regressions?

Best regressions I've found so far for England and for the whole of the UK. The years passed to change_since are chosen by a for loop. Details are in the source, but it's pretty simple.

The England model benefits slightly from the IMD data, but also simply from excluding the other areas:

# ipython -i brellenge_prep.py

# bestEng = smf.ols('Pct_Remain ~ Q("White British") + Q("White Other") + Asian + Black + Other + y2015_WBR + IMD', data=change_since(2016, 2001).join(metadata))
# bestUK = smf.ols('Pct_Remain ~ Q("White British") + Q("White Other") + Asian + Black + Other + y2015_WBR', data=change_since(2017, 2003).join(metadata))

In [1]: bestEng.fit().summary2()
Out[1]: 
<class 'statsmodels.iolib.summary2.Summary'>
"""
                   Results: Ordinary least squares
======================================================================
Model:                OLS               Adj. R-squared:      0.624    
Dependent Variable:   Pct_Remain        AIC:                 2109.8669
Date:                 2018-03-04 15:19  BIC:                 2140.1375
No. Observations:     325               Log-Likelihood:      -1046.9  
Df Model:             7                 F-statistic:         77.81    
Df Residuals:         317               Prob (F-statistic):  4.59e-65 
R-squared:            0.632             Scale:               37.702   
----------------------------------------------------------------------
                    Coef.   Std.Err.    t    P>|t|    [0.025   0.975] 
----------------------------------------------------------------------
Intercept          123.1056  14.3254  8.5935 0.0000   94.9207 151.2905
Q("White British") -65.0829  19.6060 -3.3195 0.0010 -103.6572 -26.5086
Q("White Other")     9.7304   4.2564  2.2861 0.0229    1.3561  18.1046
Asian               -9.4880   2.8771 -3.2977 0.0011  -15.1487  -3.8273
Black               46.3033   4.4566 10.3899 0.0000   37.5351  55.0715
Other               -0.8730   4.9839 -0.1752 0.8611  -10.6786   8.9326
y2015_WBR          -48.5744  18.5010 -2.6255 0.0091  -84.9746 -12.1742
IMD                  1.4646   0.2204  6.6449 0.0000    1.0310   1.8983
----------------------------------------------------------------------
Omnibus:               12.016         Durbin-Watson:            1.845 
Prob(Omnibus):         0.002          Jarque-Bera (JB):         12.578
Skew:                  0.418          Prob(JB):                 0.002 
Kurtosis:              3.478          Condition No.:            572   
======================================================================

"""

In [2]: bestUK.fit().summary2()
Out[2]: 
<class 'statsmodels.iolib.summary2.Summary'>
"""
                   Results: Ordinary least squares
=====================================================================
Model:               OLS               Adj. R-squared:      0.547    
Dependent Variable:  Pct_Remain        AIC:                 2300.3152
Date:                2018-03-04 15:19  BIC:                 2327.2605
No. Observations:    347               Log-Likelihood:      -1143.2  
Df Model:            6                 F-statistic:         70.53    
Df Residuals:        340               Prob (F-statistic):  9.16e-57 
R-squared:           0.555             Scale:               43.437   
---------------------------------------------------------------------
                    Coef.   Std.Err.    t    P>|t|   [0.025   0.975] 
---------------------------------------------------------------------
Intercept          119.1428  13.0029  9.1628 0.0000  93.5666 144.7191
Q("White British") -55.5533  20.2003 -2.7501 0.0063 -95.2866 -15.8199
Q("White Other")    30.5484   3.9682  7.6984 0.0000  22.7432  38.3537
Asian               -9.0141   2.6544 -3.3959 0.0008 -14.2351  -3.7930
Black               24.8085   3.5454  6.9974 0.0000  17.8348  31.7822
Other                4.5733   4.9236  0.9289 0.3536  -5.1112  14.2578
y2015_WBR          -35.2234  15.8739 -2.2190 0.0271 -66.4467  -4.0000
---------------------------------------------------------------------
Omnibus:                3.247         Durbin-Watson:            1.823
Prob(Omnibus):          0.197         Jarque-Bera (JB):         2.962
Skew:                   0.207         Prob(JB):                 0.227
Kurtosis:               3.181         Condition No.:            148  
=====================================================================

"""

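The year search mentioned above could look something like the sketch below. It is not the code from brellenge_prep.py: the data are synthetic, best_window and adjusted_r2 are hypothetical helpers, and a single change variable stands in for the full set of predictors. It tries every (start, end) pair of years, regresses the outcome on the change over that window, and keeps the pair with the highest adjusted R-squared.

```python
import itertools

import numpy as np


def adjusted_r2(X, y):
    """Fit OLS with an intercept by least squares; return adjusted R-squared."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    centred = y - y.mean()
    r2 = 1.0 - (resid @ resid) / (centred @ centred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


def best_window(shares, y, years):
    """Return (score, start, end) for the best-fitting change window."""
    best = None
    for start, end in itertools.combinations(years, 2):
        change = (shares[end] - shares[start]).reshape(-1, 1)
        score = adjusted_r2(change, y)
        if best is None or score > best[0]:
            best = (score, start, end)
    return best


# Synthetic data (assumption): a yearly population share for 300 areas that
# drifts as a random walk, and an outcome driven by the 2004-2016 change.
rng = np.random.default_rng(0)
years = list(range(2001, 2018))
n_areas = 300
level = rng.uniform(0.6, 0.95, n_areas)
shares = {}
for year in years:
    shares[year] = level.copy()
    level = level + rng.normal(0.0, 0.01, n_areas)

pct_remain = (
    50.0
    - 200.0 * (shares[2016] - shares[2004])
    + rng.normal(0.0, 0.05, n_areas)
)

score, start, end = best_window(shares, pct_remain, years)
print(f"best window: {start}-{end}, adjusted R-squared = {score:.3f}")
```

Since the window is picked to maximise fit on the same sample, the resulting R-squared will be somewhat optimistic.
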
LSOA atlas containing "working age as a percentage of population" column over various years.

https://data.gov.uk/dataset/lsoa-atlas

lsoa.zip

Note: this only includes London data

rate of change variables - line gradient?

So I started writing some bits about the rate of change variables (see Google Doc), and how the choice of range is important and might affect our conclusions. Just to clarify, we have these 3 outcomes (right?):

change over 10 years
change over 5 years
change over 3 years

But do we also have a variable for the line gradients from Liz's plot? I think it would be neat to include that too.

I'm assigning everyone to this so you all get notifications :)
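
To make the difference concrete, here is a small sketch comparing the end-point changes with a fitted line gradient for one area. The series is invented, and ols_slope and endpoint_change are hypothetical helpers (the end-point change is annualised here, which may not match how the existing variables are defined). The gradient uses every year in the range, while each end-point change only uses two of them.

```python
def ols_slope(xs, ys):
    """Least-squares slope of ys regressed on xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return sxy / sxx


def endpoint_change(series, end, span):
    """Average change per year between year (end - span) and year end."""
    return (series[end] - series[end - span]) / span


# Invented yearly share for a single area, 2006 to 2016.
share_by_year = dict(zip(
    range(2006, 2017),
    [0.900, 0.895, 0.893, 0.885, 0.880, 0.871, 0.866, 0.850, 0.842, 0.835, 0.820],
))

years = sorted(share_by_year)
gradient = ols_slope(years, [share_by_year[y] for y in years])

for span in (3, 5, 10):
    change = endpoint_change(share_by_year, 2016, span)
    print(f"change per year over last {span} years: {change:.4f}")
print(f"line gradient over all years: {gradient:.4f}")
```

If the series is noisy, the gradient should be less sensitive to an unusual first or last year than any of the end-point changes.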
