
effect_estimates's Introduction

Description

A collection of R scripts to calculate the magnitude of the intervention effect.

effect_estimates's People

Contributors

andrewbaxter439, vkhodygo

Forkers

andrewbaxter439

effect_estimates's Issues

ENH: update the script

Statistics are needed for the whole population and for the groups important to the study of inequalities (education, household type, gender, age group, income quintile), by time and run:

  • mean: ghqcase, ghqlike, employment, hours worked, income, poverty
  • median: ghqlike, hours worked, income
  • count: ghq cases, employed, hours worked categories, poverty
  • iqr: ghqlike, hours worked, income
  • 25th percentile: ghqlike, hours worked, income
  • 75th percentile: ghqlike, hours worked, income
  • 10th percentile: income
  • 90th percentile: income
  • 90/10 ratio: income
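The numeric statistics listed above could be computed along these lines. This is a minimal base R sketch on made-up data, not the repository's code: the helper name `summarise_income()` and the toy columns `grp` and `income` are hypothetical, and percentiles use R's default `quantile()` method.

```r
# Summary statistics for one numeric variable, ignoring missing values.
# `summarise_income()` is a hypothetical helper name, not part of the repository.
summarise_income <- function(x) {
  q <- quantile(x, probs = c(0.10, 0.25, 0.75, 0.90), na.rm = TRUE, names = FALSE)
  c(
    mean        = mean(x, na.rm = TRUE),
    median      = median(x, na.rm = TRUE),
    p10         = q[1],
    p25         = q[2],
    p75         = q[3],
    p90         = q[4],
    iqr         = q[3] - q[2],   # interquartile range
    ratio_90_10 = q[4] / q[1]    # 90/10 ratio
  )
}

# Toy data: incomes for two groups within one run and year (made-up values).
toy <- data.frame(
  grp    = c("a", "a", "a", "a", "b", "b", "b", "b"),
  income = c(100, 200, 300, 400, 150, 250, NA, 350)
)

# One row of statistics per group, one column per statistic.
stats_by_group <- t(sapply(split(toy$income, toy$grp), summarise_income))
print(stats_by_group)
```

Counts (GHQ cases, employed, poverty) would be the `sum()` of the corresponding dummy rather than its mean.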

Main tasks

We have 6 outcomes, some of which must be constructed from the LABSim output:

  1. mean of continuous GHQ12 score (variable dhm)
  2. prevalence (mean of a dummy) of GHQ12 caseness (dummy=1 if dhm <= 24)
  3. prevalence (mean of a dummy) of employment (dummy=1 if les_c4 == "EmployedOrSelfEmployed")
  4. mean hours worked (convert laboursupplyweekly to numerical)
  5. mean household income (variable equivaliseddisposableincomeyearl)
  6. prevalence (mean of a dummy) of poverty (dummy=1 if atriskofpoverty == 1 || atriskofpoverty == null)
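As an illustration of how the dummy outcomes above might be constructed, here is a minimal base R sketch on made-up rows. The thresholds and variable names (`dhm`, `les_c4`, `atriskofpoverty`) follow the list above. The individual-level names `out_ghqcase` and `out_emp` are assumptions modelled on `out_poverty`, and the hours and income outcomes are left out because their raw formats are not described here.

```r
# Toy rows standing in for LABSim output (values are made up for illustration).
toy <- data.frame(
  dhm             = c(30, 24, 12, NA),
  les_c4          = c("EmployedOrSelfEmployed", "NotEmployed",
                      "EmployedOrSelfEmployed", "NotEmployed"),
  atriskofpoverty = c(0, 1, NA, 0)
)

# Outcome 2: GHQ12 caseness, 1 if dhm <= 24 (a missing dhm stays NA).
toy$out_ghqcase <- as.integer(toy$dhm <= 24)

# Outcome 3: employment dummy.
toy$out_emp <- as.integer(toy$les_c4 == "EmployedOrSelfEmployed")

# Outcome 6: poverty dummy, 1 if atriskofpoverty is 1 or missing.
toy$out_poverty <- as.integer(is.na(toy$atriskofpoverty) | toy$atriskofpoverty == 1)

# Prevalence is the mean of each dummy; NA handling here is an assumption.
prevalence <- colMeans(toy[c("out_ghqcase", "out_emp", "out_poverty")], na.rm = TRUE)
print(prevalence)
```

Note that `is.na(x) | x == 1` evaluates to `TRUE` for a missing value, which matches the "1 or null" rule without a separate recoding step.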

We construct 13 groups, and also include results for the whole population:

  1. male (dgn == "Male")
  2. female (dgn == "Female")
  3. age 25-44 (dag >= 25 && dag < 45)
  4. age 45-64 (dag >= 45 && dag < 65)
  5. household with children (one of n_children_1-17 != 0)
  6. household without children (all of n_children_1-17 == 0 or missing)
  7. employed (les_c4 == "EmployedOrSelfEmployed")
  8. unemployed (les_c4 == "NotEmployed")
  9. in-work poverty (grp_emp == 1 && out_poverty == 1)
  10. out-of-work poverty (grp_emp == 0 && out_poverty == 1)
  11. low education (deh_c3 == "Low")
  12. medium education (deh_c3 == "Medium")
  13. high education (deh_c3 == "High")
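The group definitions above translate fairly directly into indicator columns. The following base R sketch uses made-up rows and covers most of the groups. The two household-with-children groups are omitted because the exact child-count column names are not spelled out here, and `out_poverty` is assumed to be the individual-level poverty dummy.

```r
# Toy individual-level rows (values are made up for illustration).
toy <- data.frame(
  dgn         = c("Male", "Female", "Female", "Male"),
  dag         = c(25, 44, 45, 65),
  les_c4      = c("EmployedOrSelfEmployed", "NotEmployed",
                  "EmployedOrSelfEmployed", "NotEmployed"),
  deh_c3      = c("Low", "Medium", "High", "Low"),
  out_poverty = c(1, 1, 0, 0)
)

# R needs the vectorised `&` here, not the scalar `&&` used in the list above.
toy$grp_male   <- as.integer(toy$dgn == "Male")
toy$grp_female <- as.integer(toy$dgn == "Female")
toy$grp_age25  <- as.integer(toy$dag >= 25 & toy$dag < 45)
toy$grp_age45  <- as.integer(toy$dag >= 45 & toy$dag < 65)
toy$grp_emp    <- as.integer(toy$les_c4 == "EmployedOrSelfEmployed")
toy$grp_unemp  <- as.integer(toy$les_c4 == "NotEmployed")
toy$grp_povin  <- as.integer(toy$grp_emp == 1 & toy$out_poverty == 1)
toy$grp_povout <- as.integer(toy$grp_emp == 0 & toy$out_poverty == 1)
toy$grp_edlow  <- as.integer(toy$deh_c3 == "Low")
toy$grp_edmed  <- as.integer(toy$deh_c3 == "Medium")
toy$grp_edhi   <- as.integer(toy$deh_c3 == "High")

print(toy[, grep("^grp_", names(toy))])
```

The last toy row (age 65) falls outside both age bands, which shows that the groups do not partition the population.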

We want:

  1. Aggregate mean outcomes for the baseline and reform, in each year, and year-run pairs.
  2. Effect estimates (difference in aggregate outcomes between the baseline and reform), again for each year, and each year-run pair.
  3. Ranking of outcomes and effect estimates, again for each year, and each year-run pair.
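A minimal sketch of steps 2 and 3 for a single outcome and group, on made-up aggregate means. The sketch makes two assumptions: the effect is computed as baseline minus reform, and ranks are ascending within each year with ties averaged (R's default). It also treats the per-year result as the mean over runs, which may not be how the overall rows are meant to be built.

```r
# Toy aggregate means for one group: 3 runs x 2 years (made-up values).
agg <- data.frame(
  run          = rep(1:3, times = 2),
  time         = rep(c(2020, 2021), each = 3),
  out_ghq_base = c(25.0, 24.0, 26.0, 25.5, 25.5, 24.5),
  out_ghq_ref  = c(24.0, 24.5, 24.0, 25.0, 24.0, 24.5)
)

# Effect estimate for each year-run pair (baseline minus reform).
agg$eff_ghq <- agg$out_ghq_base - agg$out_ghq_ref

# Rank each quantity across runs within a year.
agg$rank_ghq_base <- ave(agg$out_ghq_base, agg$time, FUN = rank)
agg$rank_ghq_ref  <- ave(agg$out_ghq_ref,  agg$time, FUN = rank)
agg$rank_eff_ghq  <- ave(agg$eff_ghq,      agg$time, FUN = rank)

# Per-year results: here simply the mean over runs (an assumption).
by_year <- aggregate(cbind(out_ghq_base, out_ghq_ref, eff_ghq) ~ time,
                     data = agg, FUN = mean)

print(agg)
print(by_year)
```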

Table structure:

  • scenario: scenario number or description
  • run: the run number, missing for overall results combining all runs
  • time: the year
  • grp_all: dummy variable (1 if the results relate to the whole population, zero otherwise)
  • grp_male: dummy variable (1 if the results relate to the male population, zero otherwise)
  • grp_female: dummy variable (1 if the results relate to the female population, zero otherwise)
  • grp_age25: dummy variable (1 if the results relate to the age 25-44 population, zero otherwise)
  • grp_age45: dummy variable (1 if the results relate to the age 45-64 population, zero otherwise)
  • grp_hchild: dummy variable (1 if the results relate to households with children in population, zero otherwise)
  • grp_nchild: dummy variable (1 if the results relate to households without children in population, zero otherwise)
  • grp_emp: dummy variable (1 if the results relate to the employed population, zero otherwise)
  • grp_unemp: dummy variable (1 if the results relate to the unemployed population, zero otherwise)
  • grp_povin: dummy variable (1 if the results relate to the in-work poverty population, zero otherwise)
  • grp_povout: dummy variable (1 if the results relate to the out-of-work poverty population, zero otherwise)
  • grp_edlow: dummy variable (1 if the results relate to the low education population, zero otherwise)
  • grp_edmed: dummy variable (1 if the results relate to the medium education population, zero otherwise)
  • grp_edhi: dummy variable (1 if the results relate to the high education population, zero otherwise)
  • out_ghq_base: mean of continuous GHQ12 score for baseline
  • out_ghq_ref: mean of continuous GHQ12 score for the reform
  • eff_ghq: effect of reform on continuous GHQ12 score for this run and year (out_ghq_base minus out_ghq_ref)
  • rank_ghq_base: rank of out_ghq_base for group and year
  • rank_ghq_ref: rank of out_ghq_ref for group and year
  • rank_eff_ghq: rank of eff_ghq for group and year
  • out_ghqcase_base: mean of dummy GHQ12 caseness for baseline
  • out_ghqcase_ref: mean of dummy GHQ12 caseness for the reform
  • eff_ghqcase: effect of reform on dummy GHQ12 caseness for this run and year (out_ghqcase_base minus out_ghqcase_ref)
  • rank_ghqcase_base: rank of out_ghqcase_base for group and year
  • rank_ghqcase_ref: rank of out_ghqcase_ref for group and year
  • rank_eff_ghqcase: rank of eff_ghqcase for group and year
  • out_emp_base: mean of employment dummy for baseline
  • out_emp_ref: mean of employment dummy for the reform
  • eff_emp: effect of reform on employment dummy for this run and year (out_emp_base minus out_emp_ref)
  • rank_emp_base: rank of out_emp_base for group and year
  • rank_emp_ref: rank of out_emp_ref for group and year
  • rank_eff_emp: rank of eff_emp for group and year
  • out_emphrs_base: mean of employment hours for baseline
  • out_emphrs_ref: mean of employment hours for the reform
  • eff_emphrs: effect of reform on employment hours for this run and year (out_emphrs_base minus out_emphrs_ref)
  • rank_emphrs_base: rank of out_emphrs_base for group and year
  • rank_emphrs_ref: rank of out_emphrs_ref for group and year
  • rank_eff_emphrs: rank of eff_emphrs for group and year
  • out_income_base: mean of income for baseline
  • out_income_ref: mean of income for the reform
  • eff_income: effect of reform on income for this run and year (out_income_base minus out_income_ref)
  • rank_income_base: rank of out_income_base for group and year
  • rank_income_ref: rank of out_income_ref for group and year
  • rank_eff_income: rank of eff_income for group and year
  • out_poverty_base: mean of poverty dummy for baseline
  • out_poverty_ref: mean of poverty dummy for the reform
  • eff_poverty: effect of reform on poverty dummy for this run and year (out_poverty_base minus out_poverty_ref)
  • rank_poverty_base: rank of out_poverty_base for group and year
  • rank_poverty_ref: rank of out_poverty_ref for group and year
  • rank_eff_poverty: rank of eff_poverty for group and year

Contribution guide

  • Write a document outlining general rules for contributions.

This should make any code reviews much easier.

Creating graphs with confidence intervals - initial ideas

As a quick first pass at creating graphs, this is a run of the code currently in R/outputting graphs.R:

library(readr)
library(tidyverse)
library(SPHSUgraphs)

out_data <-
  read_csv("C:/Programming/covid19_effect_estimates/data/new_data.csv",
           show_col_types = FALSE)


# tidying dataset ---------------------------------------------------------


compare_results <- out_data |>
  filter(grp_all == TRUE, !is.na(run)) |>
  select(-contains("eff"), -starts_with("grp")) |>
  pivot_longer(
    -c(scenario, run, time),
    names_to = c("metric", "outcome", "policy"),
    values_to = "val",
    names_pattern = "(.*)_(.*)_(baseline|reform)"
  ) |>
  # name id_cols explicitly; recent tidyr versions do not accept it by position
  pivot_wider(
    id_cols = c(scenario, run, time, outcome, policy),
    names_from = metric,
    values_from = val
  )


# faceted graph -----------------------------------------------------------

compare_results |>
  ggplot(aes(time, out, colour = policy, fill = policy)) +
  geom_vline(xintercept = 2019, colour = "red") +
  stat_summary(
    fun.data = mean_se,
    geom = "ribbon",
    alpha = 0.5,
    colour = NA
  ) +
  stat_summary(fun.data = mean_se, geom = "line") +
  stat_summary(fun.data = mean_se, geom = "point") +
  facet_wrap(~ outcome, scales = "free_y") +
  scale_fill_manual(
    "Policy",
    aesthetics = c("fill", "colour"),
    labels = c("Baseline", "Covid policy"),
    values = sphsu_cols("University Blue", "Thistle", names = FALSE)
  ) +
  theme(legend.position = "bottom")

As a small initial point - these outputs currently have a very narrow range (the intervals were accidentally computed as just the standard error of the means across the 50 runs in the file). Should the intervals combine the SDs of the means of each run, rather than taking the variance between the mean outputs?
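To make the difference in width concrete, the sketch below contrasts two candidate intervals on made-up per-run means: (a) a 95% interval for the mean across runs, based on the standard error, and (b) the 2.5th to 97.5th percentile range of the run means themselves. It is only an illustration of the two options under those assumptions, not a recommendation on which one the graphs should use.

```r
# Toy per-run means of one outcome in one year (made-up values).
run_means <- c(10.2, 9.8, 10.5, 10.1, 9.9, 10.0, 10.4, 9.7, 10.3, 10.1)
n_runs <- length(run_means)

centre <- mean(run_means)
se     <- sd(run_means) / sqrt(n_runs)   # standard error of the mean across runs

# (a) 95% interval for the mean across runs (t quantile times SE).
ci_mean <- centre + c(-1, 1) * qt(0.975, df = n_runs - 1) * se

# (b) 95% range of the run-level means (empirical percentiles).
ci_runs <- quantile(run_means, probs = c(0.025, 0.975), names = FALSE)

print(ci_mean)
print(ci_runs)
```

Interval (a) shrinks as the number of runs grows, whereas (b) describes the spread between runs and does not, which is why SE-based ribbons look very narrow.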

Suggested edits so far:

  • Output separate graphs
  • Update ribbons to 95% CIs (rather than default se)
  • Correct variable names and titles
  • Standardise colour and display

Created on 2022-07-15 by the reprex package (v2.0.1)

Data validation

This code needs some data validation.

@dkopasker I'd like you to describe here what you expect from every variable in the raw data files, including its range, whether NA or NaN values are possible, and so on. In addition, we need to state clearly how the code processes such values. Common options include dropping such entries, asking the aggregate functions to ignore them, or replacing them with imputed values (a mean of some sort, or the median).
This approach should make the data analysis much more reproducible.

We should also treat LABSim output as potentially corrupted, as the code itself is not tested properly. Constant changes in the code do not help here either. That means this script must notify every user whenever any input value is outside its expected range.
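One possible shape for such a check is sketched below in base R. The function name `check_ranges()` is hypothetical and the bounds in `expected_ranges` are placeholders rather than agreed values; the real limits would come from the variable descriptions requested above.

```r
# Placeholder bounds for illustration only; not the agreed specification.
expected_ranges <- list(
  dag = c(0, 120),
  dhm = c(0, 36)
)

# Hypothetical helper: reports missing and out-of-range values per variable.
check_ranges <- function(data, ranges) {
  problems <- character(0)
  for (var in names(ranges)) {
    x  <- data[[var]]
    lo <- ranges[[var]][1]
    hi <- ranges[[var]][2]
    n_missing <- sum(is.na(x))                        # is.na() is TRUE for NA and NaN
    n_outside <- sum(x < lo | x > hi, na.rm = TRUE)
    if (n_missing > 0) {
      problems <- c(problems, sprintf("%s: %d missing value(s)", var, n_missing))
    }
    if (n_outside > 0) {
      problems <- c(problems,
                    sprintf("%s: %d value(s) outside [%g, %g]", var, n_outside, lo, hi))
    }
  }
  # Notify the user, but let the caller decide whether to stop.
  if (length(problems) > 0) warning(paste(problems, collapse = "; "), call. = FALSE)
  invisible(problems)
}

# Toy input with one out-of-range age and one NaN score (made-up values).
toy <- data.frame(dag = c(30, 45, 150), dhm = c(12, NaN, 20))

# The warning is suppressed here only to keep the example quiet.
problems <- suppressWarnings(check_ranges(toy, expected_ranges))
print(problems)
```

Returning the list of problems as well as warning makes it possible to log them or to fail a run explicitly, depending on which policy is chosen.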

Unusual number of ties

Hi @vkhodygo,

The R code used to aggregate the results from 1,000 runs of the simulation produces an unusual number of ties, matching up to the eighth decimal place. For example, two observations for out_ghqcase_baseline in grp_age25 in 2020 have a value of 0.54180604. This happens multiple times across various outcomes, groups, and years. Could you please review the code to ensure there is not an error?
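A quick way to quantify the ties before digging into the aggregation code is to count repeated values after rounding to eight decimal places. The sketch below uses made-up values for a single outcome, group and year; in practice the same check would be run within each outcome, group and year combination.

```r
# Toy aggregated values across runs for one outcome, group and year (made up).
vals <- c(0.54180604, 0.54180604, 0.55012345, 0.53999999, 0.55012345, 0.56)

# Count how often each value occurs at 8 decimal places.
tie_counts  <- table(round(vals, 8))
tied_values <- tie_counts[tie_counts > 1]

n_tied_values <- length(tied_values)   # distinct values that occur more than once
n_tied_obs    <- sum(tied_values)      # observations involved in a tie

print(tied_values)
```

For a prevalence, exact ties are not necessarily an error: the mean of a dummy is a count divided by the group size, so two runs with the same count and the same group size give identical values. Whether that explains the number seen here depends on the group sizes, which would need checking.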
