Replication Package for "Impacts of the Jones Act on U.S. Petroleum Markets" by Ryan Kellogg and Richard L. Sweeney
The code in this replication package takes as inputs a mixture of publicly available data and proprietary data, and outputs the figures, tables, and LaTeX input files used in the paper. A replicator in possession of all of the raw data (including proprietary data from Bloomberg and Argus Media) can run all of the code by executing the `jones_bash_file.sh` shell script from the root level of the replication package. This script will execute a series of Stata and R scripts that clean the raw data, execute all analyses, and produce all of the figure, numeric, and tabular results in the paper. The script will also generate a PDF file of the paper, using LaTeX and the files generated by the aforementioned Stata and R programs.
This paper uses several publicly accessible data sources, exact copies of which are included in the replication package, and two commercially accessible data sources, from Bloomberg and Argus Media, that are not included.
To access the publicly accessible data, replicators should download this publicly accessible (zipped) folder to their machines. This folder contains three subfolders:
- `rawdata` holds the raw data used in the project.
  - The `orig` subfolder holds directories for all data that we downloaded from publicly available sources or purchased from commercial providers (Bloomberg and Argus Media). Each subfolder within `rawdata/orig` contains a README file that describes the data source. The `Bloomberg` and `Argus` folders do not contain the proprietary data but do include README files describing the Bloomberg and Argus data that we used.
    - Replicators must obtain the Bloomberg data through a Bloomberg terminal, and must acquire the Argus Media data by executing a data use agreement with Argus Media.
  - The `data` subfolder holds data obtained through the EIA's API. The R script that executes the API call is `JonesAct/code/build/EIA_API_Output/run_eia_api_v2.R`. Note that this script is not called by `jones_bash_file.sh`, so that replicators use the same raw data that were used to create the paper.
- `intdata` holds cleaned versions of the raw data files as well as intermediate data files that facilitate the paper's analysis. We have included all intermediate files that do not contain any proprietary data (thus, some subfolders within `intdata` are empty).
- `images` includes two figures that appear in the paper but are not produced by the Stata and R scripts.
To replicate the paper, you should first clone this repository to your machine, e.g., to a directory such as `/Users/kelloggr/JonesAct` or `C:/Work/JonesAct`. You should next download the data folder to your machine, e.g., to a directory such as `/Users/kelloggr/public_ks_jonesact_data` or `C:/Users/kelloggr/Dropbox/public_ks_jonesact_data`.
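After unzipping, you might confirm that the data folder has the three top-level subfolders described above before going further. The following is a minimal bash sketch; the function name `check_data_layout` is hypothetical and not part of the replication package. One possible cause of a failure is that the zip unpacked into an extra nested folder, in which case you would point the check (and your paths below) at the inner folder.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the replication package. Confirms that the
# unzipped data folder contains the rawdata, intdata, and images subfolders;
# prints each missing one and returns 1 if any is absent.
check_data_layout() {
  local datadir="$1"
  local sub
  local status=0
  for sub in rawdata intdata images; do
    if [ ! -d "$datadir/$sub" ]; then
      echo "Missing subfolder: $datadir/$sub" >&2
      status=1
    fi
  done
  return "$status"
}

# Example (substitute your own data path):
# check_data_layout "/Users/kelloggr/public_ks_jonesact_data"
```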
- If you obtain the proprietary data from Bloomberg and Argus Media, these files should be saved to `rawdata/orig/Bloomberg` and `rawdata/orig/Argus`, respectively. It is possible that the data formats have changed since we obtained our datasets; please contact us with any questions about data formatting.
- If you do not obtain the proprietary data, you cannot execute the full replication. However, you will be able to execute the following scripts:
  - `JonesAct/code/build/ArmyCorps/LoadArmyCorpsData.do`
  - `JonesAct/code/build/EIACompanyLevelImports/CleanCompanyImports.do`
  - `JonesAct/code/build/EIACompanyLevelImports/CleanCompanyImports_rename.do`
  - `JonesAct/code/build/EIATerritories/CleanEITerritories.do`
  - `JonesAct/code/build/EIARefineryInputs/CleanRefineryInputs.R`
  - `JonesAct/code/analysis/padd1c_portshares.do`
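To decide which of the two cases above applies, you could check whether the `Bloomberg` and `Argus` folders hold anything besides the README files that ship with the package. This is a minimal bash sketch; the function name `has_proprietary_data` is hypothetical, and it assumes that the shipped documentation files have names beginning with `README` (case-insensitive) and that hidden files such as `.DS_Store` should be ignored.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the replication package.
# Returns 0 if both rawdata/orig/Bloomberg and rawdata/orig/Argus under the
# given data directory contain at least one regular file that is neither a
# README file nor a hidden file; otherwise reports the first empty folder
# and returns 1.
has_proprietary_data() {
  local datadir="$1"
  local vendor dir first
  for vendor in Bloomberg Argus; do
    dir="$datadir/rawdata/orig/$vendor"
    # First non-README, non-hidden file found (empty if none or dir missing).
    first=$(find "$dir" -type f ! -iname 'readme*' ! -name '.*' 2>/dev/null | head -n 1)
    if [ -z "$first" ]; then
      echo "No proprietary files found in $dir" >&2
      return 1
    fi
  done
  return 0
}

# Example (substitute your own data path):
# has_proprietary_data "/Users/kelloggr/public_ks_jonesact_data"
```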
- So that the Stata scripts locate your local files, create a file in your root `JonesAct` directory called `globals.do`. The contents of `globals.do` should look like the following, substituting in your directory paths to the root and data directories:
global repodir = "C:/Work/JonesAct"
global dropbox = "C:/Users/kelloggr/Dropbox/public_ks_jonesact_data"
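If you prefer to create `globals.do` from the bash shell rather than a text editor, a minimal sketch follows. The function name `write_globals_do` is hypothetical and not part of the replication package; it simply writes the two `global` lines shown above with your paths substituted in, and it overwrites any existing `globals.do`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the replication package: writes
# globals.do in the repository root, defining the Stata globals
# repodir and dropbox in the format shown above.
write_globals_do() {
  local repodir="$1"   # root of your local clone, e.g. C:/Work/JonesAct
  local datadir="$2"   # root of the downloaded data folder
  printf 'global repodir = "%s"\nglobal dropbox = "%s"\n' \
    "$repodir" "$datadir" > "$repodir/globals.do"
}

# Example (substitute your own paths):
# write_globals_do "C:/Work/JonesAct" "C:/Users/kelloggr/Dropbox/public_ks_jonesact_data"
```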
- So that the R scripts locate your local files, create a file in your `JonesAct/code` directory called `paths.R`. The contents of `paths.R` should look like the following, substituting in your directory paths to the root and data directories:
repo <- file.path("C:/Work/JonesAct")
dropbox <- file.path("C:/Users/kelloggr/Dropbox/public_ks_jonesact_data")
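The same approach works for `paths.R`. In this minimal bash sketch, the function name `write_paths_r` is hypothetical and not part of the replication package; it writes the `repo` and `dropbox` definitions shown above into `code/paths.R` under your repository root, creating `code/` if needed and overwriting any existing `paths.R`.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the replication package: writes
# code/paths.R under the repository root, defining the R objects
# repo and dropbox in the format shown above.
write_paths_r() {
  local repodir="$1"   # root of your local clone
  local datadir="$2"   # root of the downloaded data folder
  mkdir -p "$repodir/code"
  printf 'repo <- file.path("%s")\ndropbox <- file.path("%s")\n' \
    "$repodir" "$datadir" > "$repodir/code/paths.R"
}

# Example (substitute your own paths):
# write_paths_r "C:/Work/JonesAct" "C:/Users/kelloggr/Dropbox/public_ks_jonesact_data"
```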
- So that the shell scripts locate your local files, you must specify the `REPODIR`, `DBDIR`, `OS`, and `STATA` variables in `jones_bash_file.sh`. These point to your local root repo directory, data (Dropbox) directory, operating system, and Stata version, respectively:
  - `REPODIR` should look something like `REPODIR=C:/Work/JonesAct`
  - `DBDIR` should look something like `DBDIR="C:/Users/kelloggr/Dropbox/public_ks_jonesact_data"`
  - `OS` should be `OS="Windows"` or `OS="Unix"` (macOS users should use the Unix version)
  - `STATA` should be `STATA="MP"` or `STATA="SE"`
  - Note: do NOT include whitespace on either side of the equal sign in any of these expressions.
  - To identify your `$HOME` variable, you can type `echo $HOME` into your bash shell command line.
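The whitespace rule matters because bash reads `REPODIR = C:/Work/JonesAct` (with spaces) as an attempt to run a command named `REPODIR`, so the variable is never set. A minimal sketch of a sanity check you could run after setting the four variables follows; the function name `check_jones_vars` is hypothetical and not part of the replication package, and the allowed values are taken from the list above.

```bash
#!/usr/bin/env bash
# Hypothetical check, not part of the replication package. Validates the four
# variables described above; prints a message for each problem and returns 1
# if any check fails.
check_jones_vars() {
  local status=0
  [ -d "$REPODIR" ] || { echo "REPODIR is not a directory: '$REPODIR'" >&2; status=1; }
  [ -d "$DBDIR" ]   || { echo "DBDIR is not a directory: '$DBDIR'" >&2; status=1; }
  case "$OS" in
    Windows|Unix) ;;
    *) echo "OS must be Windows or Unix, got '$OS'" >&2; status=1 ;;
  esac
  case "$STATA" in
    MP|SE) ;;
    *) echo "STATA must be MP or SE, got '$STATA'" >&2; status=1 ;;
  esac
  return "$status"
}

# Example: set the variables (no spaces around "="), then check them.
# REPODIR=C:/Work/JonesAct
# DBDIR="C:/Users/kelloggr/Dropbox/public_ks_jonesact_data"
# OS="Windows"
# STATA="MP"
# check_jones_vars
```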
- If you haven't already installed R, install it.
- If you haven't already installed Stata, install it.
- If you haven't already installed LaTeX, install it.
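To confirm that these installs are visible from the bash shell that will run `jones_bash_file.sh`, you might check that the relevant executables are on your `PATH`. The sketch below uses bash's built-in `command -v`; the function name `check_commands` is hypothetical, the executable names in the example (`Rscript`, `pdflatex`) are assumptions about a typical R and LaTeX install, and the Stata executable name varies by platform and edition, so it is left for you to add.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the replication package. Reports each
# named executable that cannot be found on PATH; returns 1 if any is missing.
check_commands() {
  local missing=0
  local cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "Not found on PATH: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Example (executable names are assumptions; add your Stata executable):
# check_commands Rscript pdflatex
```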
- Packages:
  - To run the scripts, you must first have installed one Stata package (`pathutil`) and one R package (`here`). These packages help the Stata and R scripts find the local path in which they are located. All other R package installations are handled, if necessary, by the included `code/basic_setup.R` script.
  - These packages can be installed automatically as part of the `jones_bash_file.sh` shell script by uncommenting the line `bash -x $REPODIR/jones_stata_r_installs.sh |& tee jones_stata_r_installs_out.txt`. Alternatively, you can install the packages manually within your Stata and R interfaces (see the commands included within `jones_stata_r_installs.sh`).
- The `jones_bash_file.sh` shell script is located in the root repo directory. For users with access to the full set of public and proprietary data, executing this script will: (1) delete all intermediate data and results, leaving only the raw data files; (2) copy the files in `images` into the root's `output/figures` subfolder; (3) conduct all of the data work and analysis starting from the raw data; and (4) compile the paper and appendix.
  - The deletion of the intermediate data and results files ensures that the paper's results are fully replicated from the raw data and that there are no hidden, improper file dependencies. Users with access to the proprietary data who wish to replicate the entire data cleaning and analysis should proceed with this deletion (the raw data will not be deleted).
  - Users who only wish to run part of the code, and users who only have access to the public data, should NOT run the script that deletes the intermediate data and results files.
- To execute `jones_bash_file.sh`, we recommend opening your bash shell, changing the directory to your local repository, and then using the following command:
  `bash -x jones_bash_file.sh |& tee jones_bash_file_out.txt`
  - This command will log output and any error messages to `jones_bash_file_out.txt`.
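Once the run finishes, you might scan the log for obvious failures. The following minimal bash sketch flags lines that look like Stata return codes (e.g., `r(601);`) or R errors (lines beginning with `Error`); the function name `scan_log` and both patterns are assumptions, not part of the replication package. Depending on how `jones_bash_file.sh` invokes Stata, Stata's own messages may land in separate `.log` files rather than in `jones_bash_file_out.txt`, so a clean result here is not proof of a clean run. (In bash versions before 4.0, which lack `|&`, the equivalent of the command above is `bash -x jones_bash_file.sh 2>&1 | tee jones_bash_file_out.txt`.)

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of the replication package. Prints (with line
# numbers) any log lines that look like Stata return codes such as "r(601);"
# or R errors beginning with "Error". Returns 1 if any are found, 2 if the
# log file does not exist, and 0 otherwise.
scan_log() {
  local log="${1:-jones_bash_file_out.txt}"
  if [ ! -f "$log" ]; then
    echo "Log not found: $log" >&2
    return 2
  fi
  if grep -nE '^r\([0-9]+\);|^Error' "$log"; then
    return 1
  fi
  return 0
}

# Example, run from the repository root after the main script finishes:
# scan_log jones_bash_file_out.txt
```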