The replication-files-ei24 from ahmedelfatmaoui

Overview

The code in this replication package installs all necessary packages and runs all the analyses for the paper "From High School to Higher Education: Is recreational marijuana a consumption amenity for US college students?" by Ahmed El Fatmaoui. All analysis is done in R. A parent file (master.R) can be used to run all files at once; it calls other scripts to install and load the required packages and to create the tables and figures. master.R saves tables in the tables folder and figures in the figures folder. To replicate a single table or figure, run only the script named after that table or figure (e.g., run source("table_A7.R") to replicate Table A7). Replicators can expect the code to take about 1 hour to run. Most of that time is spent replicating Figure 6 and Table A9, which compute the distance between each college location and the closest treated state border.
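As a minimal sketch of the single-script workflow, the helper below sources one script and reports how long it took. The function name run_replication_script is hypothetical and not part of the package, and the example assumes the working directory is the folder that holds the table and figure scripts.

```r
# Hypothetical helper (not part of the replication package): source one
# script in the global environment and report the elapsed time.
run_replication_script <- function(script) {
  if (!file.exists(script)) {
    stop("Script not found: ", script, call. = FALSE)
  }
  start <- Sys.time()
  source(script)
  elapsed <- difftime(Sys.time(), start, units = "mins")
  message(sprintf("%s finished in %.1f minutes",
                  basename(script), as.numeric(elapsed)))
  invisible(elapsed)
}

# Example: replicate only Table A7 (run from the scripts folder).
# run_replication_script("table_A7.R")
```

Checking that the file exists first gives a clearer error than source() does when the working directory is wrong.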

Figure 1: Layout of Replication Files

[Directory tree: replication-files-rml, with top-level folders including data]

Figure 1 illustrates the layout of the replication files. The data folder encompasses two primary subfolders. One folder (source_data) houses the raw data obtained from National Center for Education Statistics (2022a) or other sources (Bureau of Labor Statistics, 2021, Bureau of Economic Analysis, 2021, U.S. Census Bureau, 2021). The other folder contains the processed data, specifically the merged IPEDS data.

Similarly, the programs folder, which comprises all the R scripts, includes a subfolder (data_cleaning) responsible for downloading and cleaning the data. Another subfolder (fig_tab) within the programs folder executes all analyses and saves figures and tables in their respective folders, the last two major folders illustrated in Figure 1.

Data Availability and Provenance Statements

The paper mainly uses IPEDS data (National Center for Education Statistics, 2022a), along with data from the Bureau of Economic Analysis (2021)¹, the Bureau of Labor Statistics (2021), and the U.S. Census Bureau (2021). Other data used in the appendix are from Google Trends and priceofweed.com. See section 3 and appendix B for a detailed description of these data. I certify that the author(s) of the manuscript have legitimate access to and permission to use, redistribute, and publish the data used in this manuscript. All data are publicly available and have been deposited in the ICPSR repository of this paper.

Dataset List

Table 1: Datasets Used in Paper

Dataset	Data files	Description and processing	Data location	Citation	Provided
IPEDS: Fall Enrollment	enroll_main.csv, enroll_all.csv, enroll_vocational.csv	Data source (Link): - IPEDS Survey: Fall Enrollment - IPEDS Title: Race/ethnicity, gender, attendance status, and level of student: Fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. - data_cleaning/enrollment_cleaning/enroll_cleaning.R merges the data together and saves it for vocational (enroll_vocational), academic (enroll_main), and all institutions (enroll_all).	data/clean_data	National Center for Education Statistics (2022a)	TRUE
IPEDS: Graduation rates	grad_rates.csv	Data source (Link): - The link will navigate you to the dataset detailing graduation rates over 4, 5, and 6-year periods. Click ‘Continue’ as required until the data is successfully downloaded to your local machine (data/source_data/ipeds/gradrateraw.csv). Data Processing: - data_cleaning/grad_rates_panel/gradratespanel.R merges the data together.	data/clean_data	National Center for Education Statistics (2022b)	TRUE
IPEDS: Completion	completion.csv	Data source (Link): - IPEDS Survey: Completions - IPEDS Title: Awards/degrees conferred by program (6-digit CIP code), award level, race/ethnicity, and gender: (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. - data_cleaning/completion_cleaning/compl_cleaning.R merges the data together.	data/clean_data	National Center for Education Statistics (2022a)	TRUE
IPEDS: Tuition revenue and retention rates	welfare.csv	Data source (Link): 1) Tuition revenue - IPEDS Survey: Finance - IPEDS Title: all of the finance surveys (2009-2019) 2) Retention rates - IPEDS Survey: Fall Enrollment - IPEDS Title: Race/ethnicity, gender, attendance status, and level of student: Fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. - data_cleaning/enrollment_cleaning/welfare_cleaning.R merges the data together.	data/clean_data	National Center for Education Statistics (2022a)	TRUE
IPEDS: Admission and test scores	adm.csv	Data source (Link): - IPEDS Survey: Admissions and Test Scores - IPEDS Title: Admission considerations, applications, admissions, enrollees and test scores, fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.	data/clean_data	National Center for Education Statistics (2022a)	TRUE
IPEDS: Fall Enrollment, residency	resid_first_enrol.csv	Data source (Link): - IPEDS Survey: Fall Enrollment - IPEDS Title: Residence and migration of first-time freshman: Fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.	data/source_data/ipeds	National Center for Education Statistics (2022a)	TRUE
IPEDS: Finance	finance_all.csv, finance_fasb.csv, finance_gasp.csv, finance_private.csv, finance_public.csv	Data source (Link): - All Finance surveys except "Response status for all survey components" from 2009 to 2019. Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.	data/source_data/ipeds	National Center for Education Statistics (2022a)	TRUE
IPEDS: Institutional Characteristics	df_inst_char.csv, df_inst_char2.csv, df_inst_char3.csv	Data source (Link): - IPEDS Survey: Institutional Characteristics - IPEDS Titles: Directory information (2009-2019); Educational offerings, organization, services and athletic associations (2009-2019); Student charges for academic year programs (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.	data/source_data/ipeds	National Center for Education Statistics (2022a)	TRUE
IPEDS: 12-Month Enrollment	headcounts.csv	Data source (Link): - IPEDS Survey: 12-Month Enrollment - IPEDS Titles: 12-month unduplicated headcount by race/ethnicity, gender and level of student: (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.	data/source_data/ipeds	National Center for Education Statistics (2022a)	TRUE
IPEDS: Fall Enrollment	quality_measures.csv	Data source (Link): - IPEDS Survey: Fall Enrollment - IPEDS Titles: Total entering class, retention rates, and student-to-faculty ratio: (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data.	data/source_data/ipeds	National Center for Education Statistics (2022a)	TRUE
BEA, BLS, and Census Data	bls_bea_data.csv, census_agesex.csv, census_migration.csv	County level data for population, unemployment rate and per capita income from 2009 to 2019: BLS data source (Link); BEA data source (Link); Census data source (Link). Data Processing: - data_cleaning/census_pop/census_data.R downloads the census data. - data_cleaning/bls_bea/scraping_functions.R downloads the BLS and BEA data.	data/source_data/controls	Bureau of Economic Analysis (2021), Bureau of Labor Statistics (2021), U.S. Census Bureau (2021)	TRUE
Other Data	WeedPrice.csv, cpi_data.csv, cleaned_price_df.csv, google_trend_df.csv, trends_raw.csv, medical_marij, medical.csv	Google Trends for marijuana-related keywords, marijuana prices, and medical marijuana legalization timeline from 2009 to 2019: Google Trends (Link); Price of Weed data (Link); Legalization timeline (Marijuana Policy Project, 2022, Carnevale Associates, 2022); CPI data (Link). Data Processing: - /data_cleaning/google_trend/google_trend_data.R processes Google Trends data. - /data_cleaning/marijuana_price/marijuana_price_scraping.R processes priceofweed.com data and retrieves the older data using archive.org/web.	data/source_data/controls	National Center for Education Statistics (2022a)	TRUE

Note: To locate the data at the linked source, users should visit the National Center for Education Statistics (NCES) website and navigate to the section containing data from the Integrated Postsecondary Education Data System (IPEDS) surveys. These surveys are identified by two key variables: the IPEDS Survey (Survey column) and the IPEDS Survey Title (Title column). Users can find these identifiers listed in Table 1 under the ‘Data Source’ column for each IPEDS dataset. Note that data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the raw IPEDS data from National Center for Education Statistics (2022a) and saves the data in data/source_data/ipeds. In data_cleaning/ipeds_downloading/IPEDS_scraping.R, functions are called to construct panel data from each survey. For instance, the fall_enroll_race() function is used to extract yearly surveys for the first dataset listed in the table, thereby forming fall enrollment panels. All the data is accessed as of June, 2023. The additional raw data, not listed in the provided table, includes: df_completion.csv (refer to the Completion Dataset in the table for data sources), df_enroll_fall_race.csv (refer to the first Fall Enrollment Dataset in the table for data sources), df_adm_act.csv (refer to the Admission and test scores Dataset in the table for data sources), grad-rate-raw.csv (refer to the Residence Dataset in the table for data sources), and resid_first_enrol.csv (refer to the Residence Dataset in the table for data sources). Further documentation is provided in the programs/data_cleaning/ subfolders.
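The idea of forming a panel from yearly survey files can be sketched as follows. This is an illustration under stated assumptions, not the code in IPEDS_scraping.R: build_panel and toy_reader are hypothetical names, the column names are placeholders, and the values are made up.

```r
# Hypothetical helper: stack yearly survey extracts into one long panel.
# `read_year` is any function that returns a data frame for a given year.
build_panel <- function(years, read_year) {
  yearly <- lapply(years, function(y) {
    df <- read_year(y)
    df$year <- y  # tag every row with its survey year
    df
  })
  do.call(rbind, yearly)
}

# Toy reader standing in for a real download; the values are made up.
toy_reader <- function(y) {
  data.frame(unitid = c(1001, 1002),
             enrollment = c(500, 750) + (y - 2009))
}

panel <- build_panel(2009:2019, toy_reader)
head(panel)
```

Passing the reader as an argument keeps the stacking logic separate from the download step, so the same helper could be reused for any survey.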

Computational Requirements

Software Requirements

While the code should run on most computers, as it is not computationally expensive, it was run on a MacBook Pro with the following specifications:

System: macOS Monterey (version 12)
Processor: 2.3 GHz 8-Core Intel Core i9
Memory: 16 GB 2667 MHz DDR4

The only software used is R. The code has been run with R version 4.2.2 (2022-10-31). All packages used can be installed by running install_load_packages.R, which is called in master.R. They include the following:

tidyverse • haven • rvest
fixest • ggplot2 • lubridate
modelsummary • dplyr • kableExtra
bacondecomp • magrittr • geosphere
fwildclusterboot • did2s • rnaturalearthhires
rmapshaper • readxl • magick
tigris • scales • ggpattern
sf • urbnmapr • grid
ggspatial • urbnthemes • groupdata2
rnaturalearth • devtools • tikzDevice
matrixStats • maps • gtrendsR

Several packages (bacondecomp, gtrendsR, rnaturalearthhires, urbnthemes, urbnmapr) are installed from GitHub using either the devtools or remotes package. The script install_load_packages.R handles package installation and loading from both CRAN and GitHub. For GitHub packages, users should expect to answer some prompts before all the required packages are loaded.
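Before running install_load_packages.R, it can help to see which packages are still missing. The sketch below is an assumption about how such a check could look, not the contents of that script; missing_packages is a hypothetical helper and the GitHub repository path is left as a placeholder.

```r
# Hypothetical helper: return the packages from `pkgs` not yet installed.
missing_packages <- function(pkgs) {
  pkgs[!pkgs %in% rownames(installed.packages())]
}

# A few of the CRAN packages named above; extend the vector as needed.
cran_pkgs <- c("tidyverse", "fixest", "modelsummary", "geosphere", "sf")

to_install <- missing_packages(cran_pkgs)
if (length(to_install) > 0) {
  message("Missing CRAN packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install from CRAN
}

# GitHub-only packages would be installed with remotes, for example:
# remotes::install_github("<owner>/<repo>")  # placeholder repository path
```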

Memory and Runtime Requirements

Running master.R takes no more than one hour, but replicators can replicate each table and figure separately by running only the script for the table or figure of interest. With the exception of the spillover-related figures, which require computing the distance between institution locations and treated state borders, each figure or table should run in no more than 10 minutes.
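To see why the distance step dominates the runtime, consider a minimal base-R sketch that measures, for every college, the great-circle distance to every border vertex and keeps the minimum. This is an illustration only: the README does not state how the scripts compute the distance, the function names are hypothetical, the coordinates are made up, and measuring to vertices rather than to border segments is a simplification.

```r
# Great-circle (haversine) distance in kilometres between lon/lat points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# For each college, the distance to the closest border vertex.
# `colleges` and `border` are data frames with columns lon and lat.
nearest_border_km <- function(colleges, border) {
  vapply(seq_len(nrow(colleges)), function(i) {
    min(haversine_km(colleges$lon[i], colleges$lat[i],
                     border$lon, border$lat))
  }, numeric(1))
}

# Made-up coordinates for illustration only.
colleges <- data.frame(lon = c(-105.0, -100.0), lat = c(40.0, 40.0))
border   <- data.frame(lon = c(-102.05, -102.05, -102.05),
                       lat = c(37.0, 39.0, 41.0))
nearest_border_km(colleges, border)
```

The work grows with the number of colleges times the number of border vertices, which is why detailed state boundaries make this step slow.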

Description of Programs and Instructions to Replicators

The repository has four main folders: programs, figures, tables, and data. The programs/fig_tab folder contains all the scripts needed to replicate the tables and figures. In that folder, master.R (the orchestrator script) replicates all the tables and figures. The tables are saved in the tables folder; figures are saved in the figures folder.

programs/data_cleaning contains code for downloading and saving raw data from the source. For instance, programs/data_cleaning/ipeds_downloading contains code that downloads IPEDS surveys from the NCES source (see the ReadMe.Docx file in the same folder). The raw data is then saved in data/source_data/ipeds, and the cleaned data, which is merged with control variables, is then saved in data/clean_data (see the scripts in the programs/data_cleaning/enrollment_cleaning folder).

Lines 16 to 39 in the master.R file execute all data cleaning processes in the specified order. It is important to note that these processes are commented out, so the user needs to uncomment these lines before executing. Additionally, executing this will update all data files. It is worth mentioning that all data was last accessed as of December 2023. As mentioned in footnote 1, prior to using the bea.R package to extract BEA data, the user must obtain an API key (see https://github.com/us-bea/bea.R). Starting from line 46, master.R runs all the analyses to generate the tables and figures in this paper. Please note that the main tables presented in the paper's main findings adhere to a 5 percent significance level. Therefore, any indication of significance at the 10 percent level, which is denoted by a plus sign in some main tables, is omitted. For instance, any plus signs in tab_2_med_completion_Associate’s degree.tex are presented in Table 2 without the "+" sign.

Any minor discrepancies in the wild bootstrap p-values in the tables are solely attributed to the use of different seeds in older versions. Changing the seed in data-sources.R will lead to slightly different wild bootstrap p-values. Parallel computing and random number generation algorithms could also introduce slight variations in the wild bootstrap p-values. However, it is important to note that the results remain qualitatively the same despite these differences.
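The seed dependence can be illustrated with a simplified wild bootstrap in base R. This is a sketch under stated assumptions, not the clustered routine used for the paper's tables: it uses observation-level Rademacher weights with the null imposed, wild_boot_p is a hypothetical helper, and the data are simulated.

```r
# Simplified wild bootstrap p-value for H0: slope = 0 in y ~ x.
# Observation-level Rademacher weights, null hypothesis imposed.
wild_boot_p <- function(y, x, reps = 999, seed = 1) {
  set.seed(seed)
  t_obs <- summary(lm(y ~ x))$coefficients["x", "t value"]
  restricted <- lm(y ~ 1)  # model under the null
  fit0 <- fitted(restricted)
  res0 <- residuals(restricted)
  t_star <- replicate(reps, {
    w <- sample(c(-1, 1), length(y), replace = TRUE)  # Rademacher draws
    y_star <- fit0 + res0 * w
    summary(lm(y_star ~ x))$coefficients["x", "t value"]
  })
  mean(abs(t_star) >= abs(t_obs))
}

# Simulated data for illustration only.
set.seed(42)
x <- rnorm(60)
y <- 0.2 * x + rnorm(60)

p_a <- wild_boot_p(y, x, seed = 1)
p_b <- wild_boot_p(y, x, seed = 2)
p_a_again <- wild_boot_p(y, x, seed = 1)
c(seed_1 = p_a, seed_2 = p_b, seed_1_again = p_a_again)
```

Re-running with the same seed reproduces the p-value exactly, while a different seed draws different weights and can move it slightly.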

List of Tables, Figures and Programs

To facilitate the replication and comprehension of the code, each script in the "programs/fig_tab" directory is named after the tables or figures it generates. Figures and tables are saved with numerical initials corresponding to their respective numbers in the paper. Figures are stored in either the "figures/main" or "figures/appendix" directory, while tables are saved in the "tables/main" or "tables/appendix" directory.

For example, executing source("figure_2.R") in master.R generates Figure 2, which is then saved as fig_2_a_main_did.eps and fig_2_b_main_did.eps in the figures/main directory. All source code files included in master.R adhere to this convention for creating tables and figures.
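Given this naming convention, the saved outputs for one figure or table can be located by prefix. The helper below is a hypothetical convenience, not part of the package, and it assumes output names start with a prefix such as fig or tab followed by the number and an underscore, as in the file names quoted above.

```r
# Hypothetical helper: list saved outputs for a given figure or table number.
# The trailing underscore in the pattern keeps "fig_2_" from matching "fig_20_".
outputs_for <- function(prefix, number, dir) {
  pattern <- sprintf("^%s_%s_", prefix, number)
  sort(list.files(dir, pattern = pattern))
}

# Example, assuming Figure 2 has already been generated:
# outputs_for("fig", 2, "figures/main")
```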

References

Bureau of Economic Analysis (2021), ‘County level per capita income’, https://www.bea.gov/data. Accessed: 2023-03-19.

Bureau of Labor Statistics (2021), ‘Labor force data by county’, https://www.bls.gov/lau/tables.htm#cntyaa. Accessed: 2023-03-19.

Carnevale Associates (2022), ‘Status of state Marijuana Legalization - Carnevale Info Brief’, https://www.carnevaleassociates.com/our-work/status-of-state-marijuana-legalization.html. Online; accessed March 20, 2022.

Marijuana Policy Project (2022), ‘State policy’, https://www.mpp.org/states/. Online; accessed March 20, 2022.

National Center for Education Statistics (2022a), ‘Integrated postsecondary education data system (ipeds)’, https://nces.ed.gov/ipeds/datacenter/DataFiles.aspx?gotoReportId=7&fromIpeds=true&sid=ac68b949-876c-439b-abf0-e431b89449a2&rtid=1. Accessed: 2022-4-9.

National Center for Education Statistics (2022b), ‘Integrated postsecondary education data system (ipeds)’, https://nces.ed.gov/ipeds/datacenter/MasterVariableList.aspx?cFrom=ADDVARIABLE&sid=ac68b949-876c-439b-abf0-e431b89449a2&rtid=1. Accessed: 2022-4-9.

U.S. Census Bureau (2021), ‘County level population by age groups’, https://www2.census.gov/programs-surveys/popest/datasets/. Accessed: 2023-03-19.

¹ This data is extracted using the bea.R package in R. Prior to accessing the data, the user must obtain an API key. Further details can be found at https://github.com/us-bea/bea.R.
