The code in this replication package installs all required packages and runs all the analyses for the paper "From High School to Higher Education: Is recreational marijuana a consumption amenity for US college students?" by Ahmed El Fatmaoui. All analysis is done in R. A parent file (master.R) runs everything at once, calling other scripts to install and load the required packages and to create the tables and figures. master.R saves tables in the tables folder and figures in the figures folder. To replicate a single table or figure, run only the script named after that table or figure (e.g., run source("table_A7.R") to replicate Table A7). Replicators can expect the code to take about 1 hour to run. Most of that time is spent replicating Figure 6 and Table A9, which require computing the distance between each college location and the closest treated state border.
Figure 1: Layout of Replication Files
Figure 1 illustrates the layout of the replication files. The data folder encompasses two primary subfolders. One folder (source_data) houses the raw data obtained from National Center for Education Statistics (2022a) or other sources (Bureau of Labor Statistics, 2021, Bureau of Economic Analysis, 2021, U.S. Census Bureau, 2021). The other folder contains the processed data, specifically the merged IPEDS data.
Similarly, the programs folder, which comprises all the R scripts, includes a subfolder (data_cleaning) responsible for downloading and cleaning the data. Another subfolder (fig_tab) within the programs folder executes all analyses. The latter saves figures and tables in their respective folders located within the last two major folders illustrated in Figure 1.
The paper mainly uses IPEDS data (National Center for Education Statistics, 2022a), along with data from the Bureau of Economic Analysis (2021) (see footnote 1), the Bureau of Labor Statistics (2021), and the U.S. Census Bureau (2021). Other data used in the appendix are from Google Trends and priceofweed.com. See Section 3 and Appendix B for a detailed description of these data. I certify that the author(s) of the manuscript have legitimate access to and permission to use, redistribute, and publish the data used in this manuscript. All data are publicly available and have been deposited in the ICPSR repository of this paper.
Dataset List
Table 1: Datasets Used in Paper
Dataset | Data files | Description and processing | Data location | Citation | Provided |
---|---|---|---|---|---|
IPEDS: Fall Enrollment | enroll_main.csv, enroll_all.csv, enroll_vocational.csv | Data source (Link): - IPEDS Survey: Fall Enrollment - IPEDS Title: Race/ethnicity, gender, attendance status, and level of student: Fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. - data_cleaning/enrollment_cleaning/enroll_cleaning.R merges the data together and saves for vocational (enroll_vocational), academic (enroll_main) and all institutions (enroll_all). | data/clean_data | National Center for Education Statistics (2022a) | TRUE |
IPEDS: Graduation rates | grad_rates.csv | Data source (Link): - The link will navigate you to the dataset detailing graduation rates over 4, 5, and 6-year periods. Click ‘Continue’ as required until the data is successfully downloaded to your local machine (data/source_data/ipeds/gradrateraw.csv). Data Processing: - data_cleaning/grad_rates_panel/gradratespanel.R merges the data together. | data/clean_data | National Center for Education Statistics (2022b) | TRUE |
IPEDS: Completion | completion.csv | Data source (Link): - IPEDS Survey: Completions - IPEDS Title: Awards/degrees conferred by program (6-digit CIP code), award level, race/ethnicity, and gender: (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. - data_cleaning/completion_cleaning/compl_cleaning.R merges the data together. | data/clean_data | National Center for Education Statistics (2022a) | TRUE |
IPEDS: Tuition revenue and retention rates | welfare.csv | Data source (Link): 1) Tuition revenue - IPEDS Survey: Finance - IPEDS Title: all of the finance surveys (2009-2019). 2) Retention rates - IPEDS Survey: Fall Enrollment - IPEDS Title: Race/ethnicity, gender, attendance status, and level of student: Fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. - data_cleaning/enrollment_cleaning/welfare_cleaning.R merges the data together. | data/clean_data | National Center for Education Statistics (2022a) | TRUE |
IPEDS: Admission and test scores | adm.csv | Data source (Link): - IPEDS Survey: Admissions and Test Scores - IPEDS Title: Admission considerations, applications, admissions, enrollees and test scores, fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. | data/clean_data | National Center for Education Statistics (2022a) | TRUE |
IPEDS: Fall Enrollment, residency | resid_first_enrol.csv | Data source (Link): - IPEDS Survey: Fall Enrollment - IPEDS Title: Residence and migration of first-time freshman: Fall (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. | data/source_data/ipeds | National Center for Education Statistics (2022a) | TRUE |
IPEDS: Finance | finance_all.csv, finance_fasb.csv, finance_gasp.csv, finance_private.csv, finance_public.csv | Data source (Link): - All Finance surveys except "Response status for all survey components" from 2009 to 2019. Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. | data/source_data/ipeds | National Center for Education Statistics (2022a) | TRUE |
IPEDS: Institutional Characteristics | df_inst_char.csv, df_inst_char2.csv, df_inst_char3.csv | Data source (Link): - IPEDS Survey: Institutional Characteristics - IPEDS Titles: Directory information (2009-2019); Educational offerings, organization, services and athletic associations (2009-2019); Student charges for academic year programs (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. | data/source_data/ipeds | National Center for Education Statistics (2022a) | TRUE |
IPEDS: 12-Month Enrollment | headcounts.csv | Data source (Link): - IPEDS Survey: 12-Month Enrollment - IPEDS Titles: 12-month unduplicated headcount by race/ethnicity, gender and level of student: (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. | data/source_data/ipeds | National Center for Education Statistics (2022a) | TRUE |
IPEDS: Fall Enrollment | quality_measures.csv | Data source (Link): - IPEDS Survey: Fall Enrollment - IPEDS Titles: Total entering class, retention rates, and student-to-faculty ratio: (2009-2019). Data Processing: - data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the data. | data/source_data/ipeds | National Center for Education Statistics (2022a) | TRUE |
BEA, BLS, and Census Data | bls_bea_data.csv, census_agesex.csv, census_migration.csv | County level data for population, unemployment rate and per capita income from 2009 to 2019: - BLS data source (Link) - BEA data source (Link) - Census data source (Link). Data Processing: - data_cleaning/census_pop/census_data.R downloads the census data. - data_cleaning/bls_bea/scraping_functions.R downloads the BLS and BEA data. | data/source_data/controls | (Bureau of Economic Analysis, 2021, Bureau of Labor Statistics, 2021, U.S. Census Bureau, 2021) | TRUE |
Other Data | WeedPrice.csv, cpi_data.csv, cleaned_price_df.csv, google_trend_df.csv, trends_raw.csv, medical_marij medical.csv | Google trends for marijuana related keywords, marijuana prices, and medical marijuana legalization timeline from 2009 to 2019: - Google Trends (Link) - Price of Weed data (Link) - Legalization timeline (Marijuana Policy Project, 2022, Carnevale Associates, 2022) - CPI data (Link). Data Processing: - /data_cleaning/google_trend/google_trend_data.R processes Google Trends data. - /data_cleaning/marijuana_price/marijuana_price_scraping.R processes priceofweed.com data and retrieves the old data using archive.org/web. | data/source_data/controls | National Center for Education Statistics (2022a) | TRUE |
Note: To locate the data at the linked source, users should visit the National Center for Education Statistics (NCES) website and navigate to the section containing data from the Integrated Postsecondary Education Data System (IPEDS) surveys. These surveys are identified by two key variables: the IPEDS Survey (Survey column) and the IPEDS Survey Title (Title column). Users can find these identifiers listed in Table 1 under ‘Data source’ for each IPEDS dataset. Note that data_cleaning/ipeds_downloading/IPEDS_scraping.R downloads the raw IPEDS data from National Center for Education Statistics (2022a) and saves the data in data/source_data/ipeds. That script calls functions that construct panel data from each survey. For instance, the fall_enroll_race() function extracts yearly surveys for the first dataset listed in the table, thereby forming fall enrollment panels. All the data is accessed as of June, 2023. The additional raw data not listed in the table includes: df_completion.csv (refer to the Completion Dataset in the table for data sources), df_enroll_fall_race.csv (refer to the first Fall Enrollment Dataset in the table for data sources), df_adm_act.csv (refer to the Admission and test scores Dataset in the table for data sources), grad-rate-raw.csv (refer to the Residence Dataset in the table for data sources), and resid_first_enrol.csv (refer to the Residence Dataset in the table for data sources). Further documentation is provided in the programs/data_cleaning/ subfolders.
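The panel-construction step described in the note can be illustrated with a minimal sketch. This is not the code in IPEDS_scraping.R: the helper stack_yearly_surveys() and the toy columns unitid and enrollment are hypothetical, chosen only to show how yearly survey files could be stacked into one long panel with a year identifier.

```r
# Hypothetical helper (not part of the package): stack one data frame per
# survey year into a long panel with a `year` column.
stack_yearly_surveys <- function(surveys) {
  # `surveys` is a named list; each name is the survey year.
  years <- as.integer(names(surveys))
  stopifnot(!anyNA(years))
  pieces <- Map(function(df, yr) cbind(year = yr, df), surveys, years)
  panel <- do.call(rbind, pieces)
  # Sort by institution, then year, and reset the row names.
  panel <- panel[order(panel$unitid, panel$year), ]
  rownames(panel) <- NULL
  panel
}

# Toy data standing in for two yearly survey files (values are made up).
toy_surveys <- list(
  "2009" = data.frame(unitid = c(1001, 1002), enrollment = c(500, 750)),
  "2010" = data.frame(unitid = c(1001, 1002), enrollment = c(520, 740))
)
panel <- stack_yearly_surveys(toy_surveys)
print(panel)
```

The result has one row per institution and year, which is the shape that the cleaning scripts would then merge with control variables.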
While the code should run on most computers, as it is not computationally expensive, it was run on a MacBook Pro with the following specifications:
- System: macOS Monterey (version 12)
- Processor: 2.3 GHz 8-Core Intel Core i9
- Memory: 16 GB 2667 MHz DDR4
The only software used is R. The code has been run with R version 4.2.2 (2022-10-31). All packages used can be installed by running install_load_packages.R, which is called in master.R. They include the following:
- tidyverse
- fixest
- modelsummary
- bacondecomp
- fwildclusterboot
- rmapshaper
- tigris
- sf
- ggspatial
- rnaturalearth
- matrixStats
- haven
- ggplot2
- dplyr
- magrittr
- did2s
- readxl
- scales
- urbnmapr
- urbnthemes
- devtools
- maps
- rvest
- lubridate
- kableExtra
- geosphere
- rnaturalearthhires
- magick
- ggpattern
- grid
- groupdata2
- tikzDevice
- gtrendsR
Several packages (bacondecomp, gtrendsR, rnaturalearthhires, urbnthemes, urbnmapr) are installed from GitHub using either the devtools or remotes package. The script install_load_packages.R installs and loads packages from both CRAN and GitHub. For GitHub packages, users should expect to answer a few prompts during installation.
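Before running the full installation, it can help to see which packages are still missing. The sketch below is an illustration only and does not reproduce install_load_packages.R; the helper missing_packages() is hypothetical, and the installation call is left commented out so that nothing is downloaded unintentionally.

```r
# Hypothetical helper: return the packages in `pkgs` that are not installed.
# Note that requireNamespace() loads the namespace of installed packages.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# A few of the CRAN packages listed above.
cran_pkgs <- c("tidyverse", "fixest", "modelsummary")
to_install <- missing_packages(cran_pkgs)

if (length(to_install) > 0) {
  message("Not installed: ", paste(to_install, collapse = ", "))
  # Uncomment to install from CRAN:
  # install.packages(to_install)
} else {
  message("All listed CRAN packages are installed.")
}
# GitHub packages are not handled here because this README does not give
# their repository paths.
```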
Running master.R takes no more than one hour, but replicators can replicate each table and figure separately by running only the script for the table or figure of interest. With the exception of the spillover-related figures, which require computing the distance between institution locations and treated state borders, each figure or table should run in no more than 10 minutes.
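To give a sense of why the distance step is slower than the rest, here is a minimal sketch of a nearest-border computation. It makes several assumptions that are not taken from the package: the border is represented as a set of points, the distance is a base R haversine formula rather than a call to the geosphere package listed above, and all coordinates and function names are made up for illustration.

```r
# Great-circle (haversine) distance in km between one point and many points.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: distance from each college to its nearest border point.
nearest_border_km <- function(colleges, border) {
  vapply(seq_len(nrow(colleges)), function(i) {
    min(haversine_km(colleges$lon[i], colleges$lat[i], border$lon, border$lat))
  }, numeric(1))
}

# Made-up coordinates: two colleges and three points along a border.
colleges <- data.frame(
  name = c("A", "B"),
  lon = c(-104.0, -100.0),
  lat = c(40.0, 40.0)
)
border <- data.frame(
  lon = c(-102.05, -102.05, -102.05),
  lat = c(37.0, 40.0, 41.0)
)
colleges$dist_km <- nearest_border_km(colleges, border)
print(colleges)
```

Every college is compared against every border point, so the cost grows with the product of the two counts. That may be part of the reason the spillover figures take longer than the others.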
The repository has four main folders: programs, figures, tables, and data. The programs/fig_tab folder contains all the scripts needed to replicate all the tables and figures. In programs/fig_tab, master.R (the orchestrator script) replicates all the tables and figures. The tables are saved in the tables folder; figures are saved in the figures folder.
programs/data_cleaning contains code for downloading and saving raw data from the source. For instance, programs/data_cleaning/ipeds_downloading contains code that downloads IPEDS surveys from the NCES source (see the ReadMe.Docx file in the same folder). The raw data is then saved in data/source_data/ipeds, and the cleaned data, which is merged with control variables, is then saved in data/clean_data (see the scripts in the programs/data_cleaning/enrollment_cleaning folder).
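A quick way to confirm that the package was unpacked with the expected layout is to check for the folders named in this README. The sketch below is only a convenience and is not part of the package; check_layout() is a hypothetical helper, and it assumes the working directory (or the root argument) is the top of the repository.

```r
# Folders mentioned in this README, relative to the repository root.
expected_dirs <- c(
  "programs/data_cleaning", "programs/fig_tab",
  "data/source_data", "data/clean_data",
  "figures/main", "figures/appendix",
  "tables/main", "tables/appendix"
)

# Hypothetical helper: report which expected folders exist under `root`.
check_layout <- function(root = ".") {
  present <- dir.exists(file.path(root, expected_dirs))
  data.frame(directory = expected_dirs, present = present)
}

layout <- check_layout(".")
print(layout)
```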
Lines 16 to 39 in the master.R file execute all data cleaning processes in the specified order. These lines are commented out, so the user needs to uncomment them before executing. Note that executing them will update all data files. All data was last accessed as of December 2023. As mentioned in footnote 1, prior to using the bea.R package to extract BEA data, the user must obtain an API key (see https://github.com/us-bea/bea.R). Starting from line 46, master.R runs all the analyses to generate the tables and figures in this paper. Please note that the main tables presented in the paper’s main findings adhere to a 5 percent significance level. Therefore, any indication of significance at the 10 percent level, which is denoted by a plus sign in some main tables, is omitted. For instance, coefficients marked with a plus sign in tab_2_med_completion_Associate’s degree.tex appear in Table 2 without the "+" sign.
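Because the BEA download needs an API key, it may be convenient to fail early with a clear message when no key is available. The sketch below shows one way to do that; the environment variable name BEA_API_KEY and the helper get_bea_key() are assumptions for illustration and are not used by the package scripts.

```r
# Hypothetical helper: read the BEA API key from an environment variable.
# The variable name "BEA_API_KEY" is an assumption, not a package convention.
get_bea_key <- function(var = "BEA_API_KEY") {
  key <- Sys.getenv(var, unset = "")
  if (!nzchar(key)) {
    stop("No BEA API key found in environment variable '", var, "'.")
  }
  key
}

# Example: report the problem instead of stopping when the key is not set.
bea_key <- tryCatch(get_bea_key(), error = function(e) {
  message(conditionMessage(e))
  NA_character_
})
```

Keeping the key in an environment variable also avoids writing it into a script that might later be shared.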
Any minor discrepancies in the wild bootstrap p-values in the tables are solely attributed to the use of different seeds in older versions. Changing the seed in data-sources.R will lead to slightly different wild bootstrap p-values. Parallel computing and random number generation algorithms could also introduce slight variations in the wild bootstrap p-values. However, it is important to note that the results remain qualitatively the same despite these differences.
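The dependence of bootstrap p-values on the seed can be seen in a small, simplified example. The sketch below is a plain (unclustered) wild bootstrap with Rademacher weights written in base R on simulated data. It is not the fwildclusterboot procedure used in the package, and wild_boot_p() is a hypothetical function name.

```r
# Simplified wild bootstrap p-value for H0: slope = 0 in y ~ x.
# Assumption: no clustering, ordinary t statistics, Rademacher weights.
wild_boot_p <- function(x, y, reps = 499, seed = 1) {
  set.seed(seed)
  t_stat <- function(y_vec) {
    coef(summary(lm(y_vec ~ x)))["x", "t value"]
  }
  t_obs <- t_stat(y)
  restricted <- lm(y ~ 1)  # model with the null hypothesis imposed
  fit0 <- fitted(restricted)
  res0 <- residuals(restricted)
  t_boot <- replicate(reps, {
    w <- sample(c(-1, 1), length(y), replace = TRUE)  # Rademacher weights
    t_stat(fit0 + res0 * w)
  })
  mean(abs(t_boot) >= abs(t_obs))
}

# Simulated data (made up for illustration).
set.seed(42)
x <- rnorm(60)
y <- 0.3 * x + rnorm(60)

p_a <- wild_boot_p(x, y, seed = 1)
p_b <- wild_boot_p(x, y, seed = 1)  # same seed: identical p-value
p_c <- wild_boot_p(x, y, seed = 2)  # different seed: may differ slightly
print(c(p_a, p_b, p_c))
```

Running with the same seed reproduces the p-value exactly, while a different seed draws different weights and can shift it slightly. This is the kind of variation described above.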
To facilitate the replication and comprehension of the code, each script in the "programs/fig_tab" directory is named after the tables or figures it generates. Figures and tables are saved with numerical initials corresponding to their respective numbers in the paper. Figures are stored in either the "figures/main" or "figures/appendix" directory, while tables are saved in the "tables/main" or "tables/appendix" directory.
For example, executing source("figure_2.R") in master.R generates Figure 2, which is then saved as fig_2_a_main_did.eps and fig_2_b_main_did.eps in the figures/main directory. All source code files included in master.R adhere to this convention for creating tables and figures.
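When replicating a single table or figure, a small wrapper around source() can report how long the script took. The sketch below is a convenience and not part of the package; run_and_time() is a hypothetical helper, and the example uses a throwaway script so that it runs anywhere.

```r
# Hypothetical helper: source one script and report its run time in seconds.
run_and_time <- function(script) {
  if (!file.exists(script)) {
    stop("Script not found: ", script)
  }
  # Evaluate in a fresh environment so the script's objects do not
  # overwrite objects in the calling session.
  elapsed <- system.time(source(script, local = new.env()))[["elapsed"]]
  message(basename(script), " finished in ", round(elapsed, 1), " seconds")
  invisible(elapsed)
}

# Example with a throwaway script. In the package one might instead pass
# a script such as "figure_2.R" (the working directory is an assumption).
demo_script <- tempfile(fileext = ".R")
writeLines("demo_result <- sum(1:10)", demo_script)
elapsed <- run_and_time(demo_script)
```

If a script relies on objects created earlier in master.R, those steps would need to be run first.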
Bureau of Economic Analysis (2021), ‘County level per capita income’, https://www.bea.gov/data. Accessed:
Bureau of Labor Statistics (2021), ‘Labor force data by county’, https://www.bls.gov/lau/tables.htm#cntyaa. Accessed:
Carnevale Associates (2022), ‘Status of state Marijuana Legalization - Carnevale Info Brief’, https://www.carnevaleassociates.com/our-work/status-of-state-marijuana-legalization.html. Online; accessed March 20, 2022.
Marijuana Policy Project (2022), ‘State policy’, https://www.mpp.org/states/. Online; accessed March 20, 2022.
National Center for Education Statistics (2022a), ‘Integrated postsecondary education data system (ipeds)’, https://nces.ed.gov/ipeds/datacenter/DataFiles.aspx?gotoReportId=7&fromIpeds=true&sid=ac68b949-876c-439b-abf0-e431b89449a2&rtid=1. Accessed: 2022-4-9.
National Center for Education Statistics (2022b), ‘Integrated postsecondary education data system (ipeds)’, https://nces.ed.gov/ipeds/datacenter/MasterVariableList.aspx?cFrom=ADDVARIABLE&sid=ac68b949-876c-439b-abf0-e431b89449a2&rtid=1. Accessed: 2022-4-9.
U.S. Census Bureau (2021), ‘County level population by age groups’, https://www2.census.gov/programs-surveys/popest/datasets/. Accessed: 2023-03-19.
Footnotes
1. This data is extracted using the bea.R package in R. Prior to accessing the data, the user must obtain an API key. Further details can be found at https://github.com/us-bea/bea.R.