mrc-cso-sphsu / effect_estimates

License: GNU General Public License v3.0

R 100.00%

effect_estimates's Introduction

Description

A collection of R scripts to calculate the magnitude of the intervention effect.

effect_estimates's People

andrewbaxter439

effect_estimates's Issues

ENH: update the script

Statistics are needed for the whole population and for groups (by time and run) that are important to the study of inequalities (education, household type, gender, age group, income quintile).

mean: ghqcase, ghqlike, employment, hours worked, income, poverty
median: ghqlike, hours worked, income
count: ghq cases, employed, hours worked categories, poverty
iqr: ghqlike, hours worked, income
25th percentile: ghqlike, hours worked, income
75th percentile: ghqlike, hours worked, income
10th percentile: income
90th percentile: income
90/10 ratio: income
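
A minimal sketch of how the statistics listed above could be computed for a single numeric variable. The helper name summarise_vec and the toy income vector are made up for illustration, and dropping missing values with na.rm = TRUE is an assumption that still needs agreeing (see the data validation issue).

```r
# Hypothetical helper: summary statistics for one numeric vector.
# Assumption: missing values are dropped (na.rm = TRUE).
summarise_vec <- function(x) {
  q <- quantile(x, probs = c(0.10, 0.25, 0.50, 0.75, 0.90),
                na.rm = TRUE, names = FALSE)
  c(
    mean        = mean(x, na.rm = TRUE),
    median      = q[3],
    p25         = q[2],
    p75         = q[4],
    iqr         = q[4] - q[2],
    p10         = q[1],
    p90         = q[5],
    ratio_90_10 = q[5] / q[1]
  )
}

income <- c(10, 20, 30, 40, 50)  # toy values, not real data
stats <- summarise_vec(income)
print(stats)
```

The same helper could be applied per group, time and run; counts (for example GHQ cases or employed) are simply sums of the corresponding dummies.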

We have 6 outcomes, some of which must be constructed from the LABSim output:

mean of continuous GHQ12 score (variable dhm)
prevalence (mean of a dummy) of GHQ12 caseness (dummy=1 if dhm <= 24)
prevalence (mean of a dummy) of employment (dummy=1 if les_c4 == "EmployedOrSelfEmployed")
mean hours worked (convert laboursupplyweekly to numerical)
mean household income (variable equivaliseddisposableincomeyearl)
prevalence (mean of a dummy) of poverty (dummy=1 if atriskofpoverty == 1 || atriskofpoverty == null)
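
The constructions above might look roughly like this in R. This is a sketch on a toy data frame, not the real LABSim output. The out_* column names follow the table structure further down; the assumption that laboursupplyweekly is stored as text that parses directly as a number is mine, and if LABSim writes category labels instead a lookup table would be needed.

```r
# Toy stand-in for LABSim output (values invented for illustration).
sim <- data.frame(
  dhm                = c(20, 30, 24),
  les_c4             = c("EmployedOrSelfEmployed", "NotEmployed",
                         "EmployedOrSelfEmployed"),
  laboursupplyweekly = c("40", "0", "20"),  # assumed to be numeric text
  atriskofpoverty    = c(1, 0, NA)
)

# GHQ12 caseness: 1 if dhm <= 24
sim$out_ghqcase <- as.integer(sim$dhm <= 24)

# Employment: 1 if employed or self-employed
sim$out_emp <- as.integer(sim$les_c4 == "EmployedOrSelfEmployed")

# Hours worked: convert to numeric (assumption about the storage format)
sim$out_emphrs <- as.numeric(sim$laboursupplyweekly)

# Poverty: 1 if atriskofpoverty is 1 or missing, as specified above
sim$out_poverty <- as.integer(is.na(sim$atriskofpoverty) |
                                sim$atriskofpoverty == 1)

print(sim)
```

Note that the missing-value test comes first in the poverty rule, so a missing atriskofpoverty gives 1 rather than NA.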

We construct 13 groups, and also include results for the whole population:

male (dgn == "Male")
female (dgn == "Female")
age 25-44 (dag >= 25 && dag < 45)
age 45-64 (dag >= 45 && dag < 65)
household with children (one of n_children_1-17 != 0)
household without children (all of n_children_1-17 == 0 or missing)
employed (les_c4 == "EmployedOrSelfEmployed")
unemployed (les_c4 == "NotEmployed")
in-work poverty (grp_emp == 1 && out_poverty == 1)
out-of-work poverty (grp_emp == 0 && out_poverty == 1)
low education (deh_c3 == "Low")
medium education (deh_c3 == "Medium")
high education (deh_c3 == "High")
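
A sketch of how some of these group dummies could be built, on a toy data frame with invented values. The two children groups are left out because the exact names of the n_children columns are not spelled out here. The conditions above are written with && and ||; in R those operate on single values, so the vectorised & and | are used instead.

```r
# Toy person-level data (values invented for illustration).
people <- data.frame(
  dgn         = c("Male", "Female", "Female"),
  dag         = c(30, 50, 64),
  les_c4      = c("EmployedOrSelfEmployed", "NotEmployed",
                  "EmployedOrSelfEmployed"),
  deh_c3      = c("Low", "High", "Medium"),
  out_poverty = c(1, 1, 0)
)

people$grp_all    <- 1L
people$grp_male   <- as.integer(people$dgn == "Male")
people$grp_female <- as.integer(people$dgn == "Female")
people$grp_age25  <- as.integer(people$dag >= 25 & people$dag < 45)
people$grp_age45  <- as.integer(people$dag >= 45 & people$dag < 65)
people$grp_emp    <- as.integer(people$les_c4 == "EmployedOrSelfEmployed")
people$grp_unemp  <- as.integer(people$les_c4 == "NotEmployed")
people$grp_povin  <- as.integer(people$grp_emp == 1 & people$out_poverty == 1)
people$grp_povout <- as.integer(people$grp_emp == 0 & people$out_poverty == 1)
people$grp_edlow  <- as.integer(people$deh_c3 == "Low")
people$grp_edmed  <- as.integer(people$deh_c3 == "Medium")
people$grp_edhi   <- as.integer(people$deh_c3 == "High")

print(people)
```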

We want:

Aggregate mean outcomes for the baseline and reform, in each year, and year-run pairs.
Effect estimates (difference in aggregate outcomes between the baseline and reform), again for each year, and each year-run pair.
Ranking of outcomes and effect estimates, again for each year, and each year-run pair.
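
To make the three requests above concrete, here is a minimal base R sketch for one outcome (continuous GHQ12) on invented data. The scenario labels "base" and "ref", the ascending rank direction and the choice to average run-level means for the per-year result are all assumptions, not agreed behaviour.

```r
# Toy data: 2 scenarios x 2 runs x 2 years x 2 people (values invented).
sim <- data.frame(
  scenario = rep(c("base", "ref"), each = 8),
  run      = rep(rep(1:2, each = 4), times = 2),
  time     = rep(rep(c(2020, 2021), each = 2), times = 4),
  dhm      = c(10, 12, 11, 13, 12, 14, 9, 11,   # baseline
               9, 11, 12, 12, 10, 12, 8, 8)     # reform
)

# 1. Aggregate mean outcome for each scenario, run and year
means <- aggregate(dhm ~ scenario + run + time, data = sim, FUN = mean)

# 2. Put baseline and reform side by side and take the difference
base <- means[means$scenario == "base", c("run", "time", "dhm")]
ref  <- means[means$scenario == "ref",  c("run", "time", "dhm")]
names(base)[3] <- "out_ghq_base"
names(ref)[3]  <- "out_ghq_ref"
res <- merge(base, ref, by = c("run", "time"))
res$eff_ghq <- res$out_ghq_base - res$out_ghq_ref

# 3. Rank the effect across runs within each year (ascending: assumption)
res$rank_eff_ghq <- ave(res$eff_ghq, res$time, FUN = rank)
res <- res[order(res$run, res$time), ]

# Per-year results combining all runs (mean of the run-level values)
by_year <- aggregate(cbind(out_ghq_base, out_ghq_ref, eff_ghq) ~ time,
                     data = res, FUN = mean)

print(res)
print(by_year)
```

The other outcomes and the rank_*_base and rank_*_ref columns would follow the same pattern.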

Table structure:

Column	Description
scenario	scenario number or description
run	the run number, missing for overall results combining all runs
time	the year
grp_all	dummy variable (1 if the results relate to the whole population, zero otherwise)
grp_male	dummy variable (1 if the results relate to the male population, zero otherwise)
grp_female	dummy variable (1 if the results relate to the female population, zero otherwise)
grp_age25	dummy variable (1 if the results relate to the age 25-44 population, zero otherwise)
grp_age45	dummy variable (1 if the results relate to the age 45-64 population, zero otherwise)
grp_hchild	dummy variable (1 if the results relate to households with children in population, zero otherwise)
grp_nchild	dummy variable (1 if the results relate to households without children in population, zero otherwise)
grp_emp	dummy variable (1 if the results relate to the employed population, zero otherwise)
grp_unemp	dummy variable (1 if the results relate to the unemployed population, zero otherwise)
grp_povin	dummy variable (1 if the results relate to the in-work poverty population, zero otherwise)
grp_povout	dummy variable (1 if the results relate to the out-of-work poverty population, zero otherwise)
grp_edlow	dummy variable (1 if the results relate to the low education population, zero otherwise)
grp_edmed	dummy variable (1 if the results relate to the medium education population, zero otherwise)
grp_edhi	dummy variable (1 if the results relate to the high education population, zero otherwise)
out_ghq_base	mean of continuous GHQ12 score for the baseline
out_ghq_ref	mean of continuous GHQ12 score for the reform
eff_ghq	effect of reform on continuous GHQ12 score for this run and year (out_ghq_base minus out_ghq_ref)
rank_ghq_base	rank of out_ghq_base for group and year
rank_ghq_ref	rank of out_ghq_ref for group and year
rank_eff_ghq	rank of eff_ghq for group and year
out_ghqcase_base	mean of dummy GHQ12 caseness for the baseline
out_ghqcase_ref	mean of dummy GHQ12 caseness for the reform
eff_ghqcase	effect of reform on dummy GHQ12 caseness for this run and year (out_ghqcase_base minus out_ghqcase_ref)
rank_ghqcase_base	rank of out_ghqcase_base for group and year
rank_ghqcase_ref	rank of out_ghqcase_ref for group and year
rank_eff_ghqcase	rank of eff_ghqcase for group and year
out_emp_base	mean of employment dummy for the baseline
out_emp_ref	mean of employment dummy for the reform
eff_emp	effect of reform on employment dummy for this run and year (out_emp_base minus out_emp_ref)
rank_emp_base	rank of out_emp_base for group and year
rank_emp_ref	rank of out_emp_ref for group and year
rank_eff_emp	rank of eff_emp for group and year
out_emphrs_base	mean of employment hours for the baseline
out_emphrs_ref	mean of employment hours for the reform
eff_emphrs	effect of reform on employment hours for this run and year (out_emphrs_base minus out_emphrs_ref)
rank_emphrs_base	rank of out_emphrs_base for group and year
rank_emphrs_ref	rank of out_emphrs_ref for group and year
rank_eff_emphrs	rank of eff_emphrs for group and year
out_income_base	mean of income for the baseline
out_income_ref	mean of income for the reform
eff_income	effect of reform on income for this run and year (out_income_base minus out_income_ref)
rank_income_base	rank of out_income_base for group and year
rank_income_ref	rank of out_income_ref for group and year
rank_eff_income	rank of eff_income for group and year
out_poverty_base	mean of poverty dummy for the baseline
out_poverty_ref	mean of poverty dummy for the reform
eff_poverty	effect of reform on poverty dummy for this run and year (out_poverty_base minus out_poverty_ref)
rank_poverty_base	rank of out_poverty_base for group and year
rank_poverty_ref	rank of out_poverty_ref for group and year
rank_eff_poverty	rank of eff_poverty for group and year

Contribution guide

Write a document outlining general rules for contributions.

This should make any code reviews much easier.

Creating graphs with confidence intervals - initial ideas

As a quick first pass at creating graphs, here is the result of running the code currently in R/outputting graphs.R:

library(readr)
library(tidyverse)
library(SPHSUgraphs)

out_data <-
  read_csv("C:/Programming/covid19_effect_estimates/data/new_data.csv",
           show_col_types = FALSE)


# tidying dataset ---------------------------------------------------------


compare_results <- out_data |>
  filter(grp_all == TRUE, !is.na(run)) |>
  select(-contains("eff"), -starts_with("grp")) |>
  pivot_longer(
    -c(scenario, run, time),
    names_to = c("metric", "outcome", "policy"),
    values_to = "val",
    names_pattern = "(.*)_(.*)_(baseline|reform)"
  ) |>
  pivot_wider(
    # name id_cols explicitly rather than relying on argument position
    id_cols = c(scenario, run, time, outcome, policy),
    names_from = metric,
    values_from = val
  )


# faceted graph -----------------------------------------------------------

compare_results |>
  ggplot(aes(time, out, colour = policy, fill = policy)) +
  geom_vline(xintercept = 2019, colour = "red") +
  stat_summary(
    fun.data = mean_se,
    geom = "ribbon",
    alpha = 0.5,
    colour = NA
  ) +
  stat_summary(fun.data = mean_se, geom = "line") +
  stat_summary(fun.data = mean_se, geom = "point") +
  facet_wrap(~ outcome, scales = "free_y") +
  scale_fill_manual(
    "Policy",
    aesthetics = c("fill", "colour"),
    labels = c("Baseline", "Covid policy"),
    values = sphsu_cols("University Blue", "Thistle", names = FALSE)
  ) +
  theme(legend.position = "bottom")

As a small initial point: these ribbons currently span a very narrow range, because they were accidentally drawn as just the standard error of the means across the 50 runs in the file. Should the intervals combine the SDs of the means within each run, rather than use only the variance between the run means?

Suggested edits so far:

Output separate graphs
Update ribbons to 95% CIs (rather than default se)
Correct variable names and titles
Standardise colour and display
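
On the second suggested edit, a rough base R sketch of two candidate intervals for one outcome in one year. The run means are invented, and which interval is the right one to plot is exactly the open question above, so this only contrasts the two.

```r
# Made-up per-run means for one outcome in one year.
run_means <- c(11.8, 12.1, 12.4, 11.9, 12.3)
n_runs <- length(run_means)

centre <- mean(run_means)
se     <- sd(run_means) / sqrt(n_runs)

# (a) 95% confidence interval for the mean across runs
#     (uncertainty about the average, shrinks as runs are added)
ci_mean <- centre + c(-1, 1) * qt(0.975, df = n_runs - 1) * se

# (b) central 95% range of the run means themselves
#     (run-to-run spread, does not shrink as runs are added)
range_runs <- quantile(run_means, probs = c(0.025, 0.975), names = FALSE)

print(ci_mean)
print(range_runs)
```

In the plotting code above, ggplot2's mean_se() takes a mult argument, so passing fun.args = list(mult = 1.96) to the ribbon's stat_summary() would give an approximate version of (a) without restructuring the data.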

Created on 2022-07-15 by the reprex package (v2.0.1)

Data validation

This code needs some data validation.

@dkopasker I'd like you to describe here what you expect from every variable in the raw data files, that includes their range, possible NA or NaN, etc. In addition, we need to clearly state how the code processes such values. Common options include dropping such entries, asking the aggregate functions to ignore them, or replacing with some imputed values (mean of some sort, median).
This approach should make the data analysis much more reproducible.

We should also treat LABSim output as potentially corrupted, because the simulation code itself is not properly tested, and constant changes to that code do not help either. This script must therefore notify the user whenever any input value falls outside its expected range.
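
As a starting point for discussion, a minimal sketch of such a check. The function name check_ranges, the expected list and the ranges in it are placeholders I made up; the real expectations for each variable still need to be written down.

```r
# Placeholder expectations: variable name -> c(min, max). Not agreed values.
expected <- list(dag = c(0, 120), dhm = c(0, 36))

# Hypothetical helper: report missing columns, NA/NaN values and
# out-of-range values, and raise a single warning if anything is found.
check_ranges <- function(data, expected) {
  problems <- character(0)
  for (v in names(expected)) {
    if (!v %in% names(data)) {
      problems <- c(problems, sprintf("%s: column missing", v))
      next
    }
    x <- data[[v]]
    lo <- expected[[v]][1]
    hi <- expected[[v]][2]
    n_na  <- sum(is.na(x))  # is.na() is also TRUE for NaN
    n_out <- sum(x < lo | x > hi, na.rm = TRUE)
    if (n_na > 0) {
      problems <- c(problems, sprintf("%s: %d missing values", v, n_na))
    }
    if (n_out > 0) {
      problems <- c(problems,
                    sprintf("%s: %d values outside [%g, %g]", v, n_out, lo, hi))
    }
  }
  if (length(problems) > 0) {
    warning(paste(problems, collapse = "; "), call. = FALSE)
  }
  invisible(problems)
}

raw <- data.frame(dag = c(30, 150, NA), dhm = c(12, 40, 20))  # toy input
problems <- check_ranges(raw, expected)  # emits one warning
print(problems)
```

Whether offending rows are then dropped, ignored by the aggregate functions or imputed is a separate decision; this sketch only reports them.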

Unusual number of ties

Hi @vkhodygo,

The R code used to aggregate the results from 1,000 runs of the simulation produces an unusual number of ties up to the eighth decimal place. For example, two observations for out_ghqcase_baseline in grp_age25 in 2020 have a value of 0.54180604. This happens multiple times across various outcomes, groups, and years. Could you please review the code to ensure there is not an error?
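
To quantify the problem before digging into the aggregation code, something like the following could count ties for one outcome, group and year. Apart from 0.54180604, the values are invented.

```r
# Toy vector of run-level results for one outcome, group and year.
x <- c(0.54180604, 0.54180604, 0.5512, 0.5301, 0.5512, 0.5477)

# Round to eight decimal places, then flag every value that occurs
# more than once (both the first and the later occurrences).
rounded <- round(x, 8)
is_tied <- duplicated(rounded) | duplicated(rounded, fromLast = TRUE)

n_tied_obs    <- sum(is_tied)
n_tied_values <- length(unique(rounded[is_tied]))

print(n_tied_obs)
print(n_tied_values)
```

One possible benign explanation, not confirmed: a prevalence is a count divided by a group size, so runs with the same group size can only take a limited set of values and exact ties are then expected. Comparing the tie counts for dummy outcomes with those for continuous ones such as income might help tell this apart from a genuine bug.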

Code refactoring/docs/tests

Reduce the size
Introduce more general functions, no hardcoded names
Add docstrings
Add tests
