replication-files-ei24's Introduction

Overview

The code in this replication package installs all necessary packages and runs all the analyses for the paper "From High School to Higher Education: Is recreational marijuana a consumption amenity for US college students?" by Ahmed El Fatmaoui. All analysis is done in R. A parent file (master.R) runs all files at once, calling other scripts to install and load the required packages and to create the tables and figures. master.R saves tables in the tables folder and figures in the figures folder. To replicate a single table or figure, run only the script named after that table or figure (e.g., run source("table_A7.R") to replicate Table A7). Replicators can expect the full code to take about 1 hour to run. Figure 6 and Table A9 take the longest to replicate because they compute the distance between each college location and the closest treated state border.
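
As a minimal sketch of the single-script workflow described above, the helper below sources one replication script after checking that it exists. The function name run_one and the default folder programs/fig_tab (relative to the working directory) are illustrative assumptions, not part of the package.

```r
# Hypothetical helper (not part of the replication package): source one
# table or figure script from a given folder, failing early if it is missing.
run_one <- function(script, dir = file.path("programs", "fig_tab")) {
  path <- file.path(dir, script)
  if (!file.exists(path)) {
    stop("Script not found: ", path)
  }
  # Evaluate the script in its own environment so that the objects it
  # creates do not overwrite objects in the caller's workspace.
  env <- new.env()
  source(path, local = env)
  invisible(env)
}

# Example (assumes the working directory is the repository root):
# run_one("table_A7.R")
```

Individual scripts may rely on packages or objects that master.R sets up first (for example through install_load_packages.R), so running those setup steps beforehand is likely to be necessary.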

Figure 1: Layout of Replication Files

replication-files-rml data

Figure 1 illustrates the layout of the replication files. The data folder encompasses two primary subfolders. One folder (source_data) houses the raw data obtained from National Center for Education Statistics (2022a) or other sources (Bureau of Labor Statistics, 2021, Bureau of Economic Analysis, 2021, U.S. Census Bureau, 2021). The other folder contains the processed data, specifically the merged IPEDS data.

Similarly, the programs folder, which comprises all the R scripts, includes a subfolder (data_cleaning) responsible for downloading and cleaning the data. Another subfolder (fig_tab) within the programs folder executes all analyses and saves the figures and tables in their respective folders, which are the last two major folders illustrated in Figure 1.

Data Availability and Provenance Statements

The paper mainly uses IPEDS data from National Center for Education Statistics (2022a), along with data from Bureau of Economic Analysis (2021; see footnote 1), Bureau of Labor Statistics (2021), and U.S. Census Bureau (2021). Other data used in the appendix are from Google Trends and priceofweed.com. See section 3 and appendix B of the paper for a detailed description of these data. I certify that the author(s) of the manuscript have legitimate access to and permission to use, redistribute, and publish the data used in this manuscript. All data are publicly available and have been deposited in the ICPSR repository of this paper.

Dataset List

Table 1: Datasets Used in Paper

Dataset: IPEDS: Fall Enrollment
Data files: enroll_main.csv, enroll_all.csv, enroll_vocational.csv
Description and processing:
Data source (Link):
- IPEDs Survey: Fall Enrollment
- IPEDs Title: Race/ethnicity, gender, attendance status, and level of student: Fall (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
- data_cleaning/enrollment_cleaning/enroll_cleaning.R merges the data together and saves it for vocational (enroll_vocational), academic (enroll_main), and all institutions (enroll_all).
Data location: data/clean_data
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: Graduation rates
Data files: grad_rates.csv
Description and processing:
Data source (Link):
- The link will navigate you to the dataset detailing graduation rates over 4, 5, and 6-year periods. Click ‘Continue’ as required until the data is successfully downloaded to your local machine (data/source_data/ipeds/gradrateraw.csv).
Data Processing
- data_cleaning/grad_rates_panel/gradratespanel.R merges the data together.
Data location: data/clean_data
Citation: National Center for Education Statistics (2022b)
Provided: TRUE

Dataset: IPEDS: Completion
Data files: completion.csv
Description and processing:
Data source (Link):
- IPEDs Survey: Completions
- IPEDs Title: Awards/degrees conferred by program (6-digit CIP code), award level, race/ethnicity, and gender: (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
- data_cleaning/completion_cleaning/compl_cleaning.R merges the data together.
Data location: data/clean_data
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: Tuition revenue and retention rates
Data files: welfare.csv
Description and processing:
Data source (Link):
1) Tuition revenue
- IPEDs Survey: Finance
- IPEDs Title: all of the finance surveys (2009-2019)
2) Retention rates
- IPEDs Survey: Fall Enrollment
- IPEDs Title: Race/ethnicity, gender, attendance status, and level of student: Fall (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
- data_cleaning/enrollment_cleaning/welfare_cleaning.R merges the data together.
Data location: data/clean_data
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: Admission and test scores
Data files: adm.csv
Description and processing:
Data source (Link):
- IPEDs Survey: Admissions and Test Scores
- IPEDs Title: Admission considerations, applications, admissions, enrollees and test scores, fall (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
Data location: data/clean_data
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: Fall Enrollment, residency
Data files: resid_first_enrol.csv
Description and processing:
Data source (Link):
- IPEDs Survey: Fall Enrollment
- IPEDs Title: Residence and migration of first-time freshman: Fall (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
Data location: data/source_data/ipeds
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: Finance
Data files: finance_all.csv, finance_fasb.csv, finance_gasp.csv, finance_private.csv, finance_public.csv
Description and processing:
Data source (Link):
- All Finance surveys except "Response status for all survey components" from 2009 to 2019
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
Data location: data/source_data/ipeds
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: Institutional Characteristics
Data files: df_inst_char.csv, df_inst_char2.csv, df_inst_char3.csv
Description and processing:
Data source (Link):
- IPEDs Survey: Institutional Characteristics
- IPEDS Titles:
  Directory information (2009-2019)
  Educational offerings, organization, services and athletic associations (2009-2019)
  Student charges for academic year programs (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
Data location: data/source_data/ipeds
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: 12-Month Enrollment
Data files: headcounts.csv
Description and processing:
Data source (Link):
- IPEDs Survey: 12-Month Enrollment
- IPEDS Titles: 12-month unduplicated headcount by race/ethnicity, gender and level of student: (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
Data location: data/source_data/ipeds
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: IPEDS: Fall Enrollment
Data files: quality_measures.csv
Description and processing:
Data source (Link):
- IPEDs Survey: Fall Enrollment
- IPEDS Titles: Total entering class, retention rates, and student-to-faculty ratio: (2009-2019)
Data Processing
- data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.
Data location: data/source_data/ipeds
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Dataset: BEA, BLS, and Census Data
Data files: bls_bea_data.csv, census_agesex.csv, census_migration.csv
Description and processing:
County level data for population, unemployment rate and per capita income from 2009 to 2019:
- BLS data source (Link)
- BEA data source (Link)
- Census data source (Link)
Data Processing
- data_cleaning/census_pop/census_data.R downloads the census data.
- data_cleaning/bls_bea/scraping_functions.R downloads the BLS and BEA data.
Data location: data/source_data/controls
Citation: (Bureau of Economic Analysis, 2021, Bureau of Labor Statistics, 2021, U.S. Census Bureau, 2021)
Provided: TRUE

Dataset: Other Data
Data files: WeedPrice.csv, cpi_data.csv, cleaned_price_df.csv, google_trend_df.csv, trends_raw.csv, medical_marij, medical.csv
Description and processing:
Google trends for marijuana related keywords, marijuana prices, and medical marijuana legalization timeline from 2009 to 2019:
- Google Trends (Link)
- Price of Weed data (Link)
- Legalization timeline (Marijuana Policy Project, 2022, Carnevale Associates, 2022)
- CPI data (Link)
Data Processing
- /data_cleaning/google_trend/google_trend_data.R processes Google Trends data.
- /data_cleaning/marijuana_price/marijuana_price_scraping.R processes priceofweed.com data and retrieves the old data using archive.org/web.
Data location: data/source_data/controls
Citation: National Center for Education Statistics (2022a)
Provided: TRUE

Note: To locate the data at the linked source, users should visit the National Center for Education Statistics (NCES) website and navigate to the section containing data from the Integrated Postsecondary Education Data System (IPEDS) surveys. These surveys are identified by two key variables: the IPEDS Survey (Survey column) and the IPEDS Survey Title (Title column). Users can find these identifiers in Table 1 under ‘Data source’ for each IPEDS dataset. data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the raw IPEDS data from National Center for Education Statistics (2022a) and saves it in data/source_data/ipeds. That script calls functions that construct panel data from each survey. For instance, the fall_enroll_race() function extracts the yearly surveys for the first dataset listed in the table, thereby forming fall enrollment panels. All the data was accessed as of June 2023.

The additional raw data not listed in the table includes: df_completion.csv (refer to the Completion dataset in the table for data sources), df_enroll_fall_race.csv (refer to the first Fall Enrollment dataset in the table for data sources), df_adm_act.csv (refer to the Admission and test scores dataset in the table for data sources), grad-rate-raw.csv (refer to the Graduation rates dataset in the table for data sources), and resid_first_enrol.csv (refer to the Residence dataset in the table for data sources). Further documentation is provided in the programs/data_cleaning/ subfolders.
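
The idea of forming a panel from yearly surveys can be sketched as follows. This is only an illustration in base R, not the code in IPEDS_scraping.R: the function build_panel, the toy data, and the column names unitid and enrollment are assumptions made for the example.

```r
# Hypothetical sketch: stack one data frame per survey year into a long panel.
# `yearly` is a named list whose names are the survey years.
build_panel <- function(yearly) {
  stopifnot(length(yearly) > 0, !is.null(names(yearly)))
  pieces <- Map(function(df, yr) {
    df$year <- as.integer(yr)  # tag each row with its survey year
    df
  }, yearly, names(yearly))
  panel <- do.call(rbind, pieces)
  rownames(panel) <- NULL      # drop the "2009.1"-style row names from rbind
  panel
}

# Toy input; the columns and values are made up for illustration.
yearly <- list(
  "2009" = data.frame(unitid = c(1, 2), enrollment = c(100, 200)),
  "2010" = data.frame(unitid = c(1, 2), enrollment = c(110, 190))
)
panel <- build_panel(yearly)
```

Stacking with an explicit year column keeps one row per institution and year, which is the layout that panel estimators generally expect.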

Computational Requirements

Software Requirements

The code is not computationally expensive and should run on most computers. It was run on a MacBook Pro with the following specifications:

  • System: macOS Monterey (version 12)
  • Processor: 2.3 GHz 8-Core Intel Core i9
  • Memory: 16 GB 2667 MHz DDR4

The only software used is R. The code has been run with R version 4.2.2 (2022-10-31). All packages used can be installed by running install_load_packages.R, which is called in master.R. They include the following:

  • tidyverse • haven • rvest

  • fixest • ggplot2 • lubridate

  • modelsummary • dplyr • kableExtra

  • bacondecomp • magrittr • geosphere

  • fwildclusterboot • did2s • rnaturalearthhires

  • rmapshaper • readxl • magick

  • tigris • scales • ggpattern

  • sf • urbnmapr • grid

  • ggspatial • urbnthemes • groupdata2

  • rnaturalearth • devtools • tikzDevice

  • matrixStats • maps • gtrendsR

Several packages (bacondecomp, gtrendsR, rnaturalearthhires, urbnthemes, urbnmapr) are installed from GitHub using either the devtools or remotes package. The script install_load_packages.R handles installing and loading packages from both CRAN and GitHub. For GitHub packages, users should expect to answer some prompts before all the required packages are loaded.
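
For readers who want to check their setup before running master.R, the following is a minimal sketch of an install-if-missing pattern. It is not the content of install_load_packages.R: the helper names are hypothetical, and "owner/repo" is a placeholder rather than a real repository location.

```r
# Hypothetical helper: return the packages in `pkgs` that are not yet installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Install whatever is missing from CRAN, then attach everything.
install_and_load <- function(cran_pkgs) {
  to_install <- missing_packages(cran_pkgs)
  if (length(to_install) > 0) install.packages(to_install)
  invisible(lapply(cran_pkgs, library, character.only = TRUE))
}

# GitHub-only packages need a repository slug; "owner/repo" is a placeholder,
# not a real location:
# remotes::install_github("owner/repo")

# Example with a subset of the CRAN packages listed above:
# install_and_load(c("fixest", "modelsummary", "geosphere"))
```

Calling missing_packages() on the full list first shows what would be installed without changing the local library.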

Memory and Runtime Requirements

Running master.R takes no more than one hour, but replicators can replicate each table and figure separately by running only the script for the table or figure of interest. With the exception of the spillover-related figures, which require computing the distance between institution locations and treated state borders, each figure or table should run in no more than 10 minutes.
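
To illustrate why the spillover distance computation dominates the runtime, here is a minimal base R sketch of a nearest-border distance. It approximates a border by a set of points and uses the haversine formula; the package's scripts may use geosphere or sf and a different method. The function names, the data frame layout, and the coordinates are assumptions made for the example.

```r
# Great-circle (haversine) distance in kilometres between points in degrees.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Distance from one college to the nearest of a set of border points.
# `border` is a data frame with columns lon and lat (hypothetical layout).
nearest_border_km <- function(college_lon, college_lat, border) {
  min(haversine_km(college_lon, college_lat, border$lon, border$lat))
}

# Toy coordinates, for illustration only (not real border data).
border <- data.frame(lon = c(-105, -104, -103), lat = c(41, 41, 41))
d <- nearest_border_km(-104, 40, border)
```

Each college is compared with every border point, so the cost grows with the number of institutions times the number of border vertices, which is consistent with these figures taking longer than the rest.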

Description of Programs and Instructions to Replicators

The repository has four main folders: programs, figures, tables, and data. The programs/fig_tab folder contains all the scripts needed to replicate all the tables and figures. In the programs/fig_tab folder, master.R (the orchestrator script) replicates all the tables and figures. The tables are saved in the tables folder; figures are saved in the figures folder.

programs/data_cleaning contains code for downloading and saving raw data from the source. For instance, programs/data_cleaning/ipeds_downloading contains code that downloads IPEDS surveys from the NCES source (see the ReadMe.Docx file in the same folder). The raw data is then saved in data/source_data/ipeds, and the cleaned data, which is merged with control variables, is then saved in data/clean_data (see the scripts in the programs/data_cleaning/enrollment_cleaning folder).

Lines 16 to 39 in the master.R file execute all data cleaning processes in the specified order. These lines are commented out, so the user needs to uncomment them before executing. Note that executing them will update all data files. All data was last accessed as of December 2023. As mentioned in footnote 1, the user must obtain an API key before using the bea.R package to extract BEA data (see https://github.com/us-bea/bea.R). Starting from line 46, master.R runs all the analyses that generate the tables and figures in the paper. The main tables presented in the paper adhere to a 5 percent significance level. Therefore, any indication of significance at the 10 percent level, which is denoted by a plus sign in some main tables, is omitted. For instance, any plus signs in tab_2_med_completion_Associate’s degree.tex are omitted from Table 2 in the paper.
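
One common way to supply an API key without writing it into a script is to read it from an environment variable. The sketch below shows that pattern only; the variable name BEA_API_KEY and the helper get_bea_key are assumptions, and the package's own scripts may expect the key in a different place.

```r
# Hypothetical helper: read the BEA API key from an environment variable
# instead of hard-coding it. The variable name BEA_API_KEY is an assumption.
get_bea_key <- function(var = "BEA_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("Set ", var, " (for example in ~/.Renviron) before running the BEA download.")
  }
  key
}

# Example: set the variable for the current session, then retrieve it.
# Sys.setenv(BEA_API_KEY = "your-key-here")
# key <- get_bea_key()
```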

Any minor discrepancies in the wild bootstrap p-values in the tables are solely attributed to the use of different seeds in older versions. Changing the seed in data-sources.R will lead to slightly different wild bootstrap p-values. Parallel computing and random number generation algorithms could also introduce slight variations in the wild bootstrap p-values. However, it is important to note that the results remain qualitatively the same despite these differences.
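
The role of the seed can be seen in a small base R sketch. It draws Rademacher weights (random sign flips of the kind a wild bootstrap relies on) under fixed seeds. It does not call fwildclusterboot or reproduce the paper's bootstrap, and the seed values are arbitrary.

```r
# Draw Rademacher weights (+1 or -1 with equal probability) under a fixed seed.
rademacher_draws <- function(n, seed) {
  set.seed(seed)
  sample(c(-1, 1), size = n, replace = TRUE)
}

same_seed_a <- rademacher_draws(1000, seed = 123)
same_seed_b <- rademacher_draws(1000, seed = 123)
other_seed  <- rademacher_draws(1000, seed = 456)

identical(same_seed_a, same_seed_b)  # same seed reproduces the draws exactly
mean(same_seed_a != other_seed)      # share of draws that differ under another seed
```

Because the bootstrap p-value is computed from such draws, a different seed changes the draws and can shift the p-value slightly while leaving the point estimates unchanged.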

List of Tables, Figures and Programs

To facilitate the replication and comprehension of the code, each script in the "programs/fig_tab" directory is named after the tables or figures it generates. Figures and tables are saved with numerical initials corresponding to their respective numbers in the paper. Figures are stored in either the "figures/main" or "figures/appendix" directory, while tables are saved in the "tables/main" or "tables/appendix" directory.

For example, executing source("figure_2.R") in master.R generates Figure 2, which is then saved as fig_2_a_main_did.eps and fig_2_b_main_did.eps in the figures/main directory. All source code files included in master.R adhere to this convention for creating tables and figures.
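
The naming convention can be summarized with a small sketch. The helper output_path is hypothetical and only mirrors the pattern described above (a fig_ or tab_ prefix, the number, a label, and the main or appendix subfolder); the scripts themselves may build file names differently.

```r
# Hypothetical helper mirroring the naming convention described above:
# <prefix>_<number>_<label>.<ext> inside figures/ or tables/, main/ or appendix/.
output_path <- function(kind = c("fig", "tab"), number, label, ext,
                        appendix = FALSE) {
  kind <- match.arg(kind)
  top <- if (kind == "fig") "figures" else "tables"
  sub <- if (appendix) "appendix" else "main"
  file.path(top, sub, paste0(kind, "_", number, "_", label, ".", ext))
}

# Path for the first panel of Figure 2, using the file name given above.
output_path("fig", 2, "a_main_did", "eps")
```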

References

Bureau of Economic Analysis (2021), ‘County level per capita income’, https://www.bea.gov/data. Accessed: 2023-03-19.

Bureau of Labor Statistics (2021), ‘Labor force data by county’, https://www.bls.gov/lau/tables.htm#cntyaa. Accessed: 2023-03-19.

Carnevale Associates (2022), ‘Status of state Marijuana Legalization - Carnevale Info Brief’, https://www.carnevaleassociates.com/our-work/status-of-state-marijuana-legalization.html. Online; accessed March 20, 2022.

Marijuana Policy Project (2022), ‘State policy’, https://www.mpp.org/states/. Online; accessed March 20, 2022.

National Center for Education Statistics (2022a), ‘Integrated postsecondary education data system (ipeds)’, https://nces.ed.gov/ipeds/datacenter/DataFiles.aspx?gotoReportId=7&fromIpeds=true&sid=ac68b949-876c-439b-abf0-e431b89449a2&rtid=1. Accessed: 2022-4-9.

National Center for Education Statistics (2022b), ‘Integrated postsecondary education data system (ipeds)’, https://nces.ed.gov/ipeds/datacenter/MasterVariableList.aspx?cFrom=ADDVARIABLE&sid=ac68b949-876c-439b-abf0-e431b89449a2&rtid=1. Accessed: 2022-4-9.

U.S. Census Bureau (2021), ‘County level population by age groups’, https://www2.census.gov/programs-surveys/popest/datasets/. Accessed: 2023-03-19.

Footnotes

  1. This data is extracted using the bea.R package in R. Prior to accessing the data, the user must obtain an API key. Further details can be found at https://github.com/us-bea/bea.R.
