Pierre Bachas, Paul Gertler, Sean Higgins, Enrique Seira
This document explains how to replicate the results in Bachas, Pierre, Paul Gertler, Sean Higgins, and Enrique Seira, "How Debit Cards Enable the Poor to Save More," Journal of Finance.
The paper's main data sources are confidential, and thus replicating most results in the paper requires obtaining access to these confidential data.
The data sets used by this replication package are either confidential or publicly available. For confidential data sets, contact information to request access is provided below. For publicly available data sets, information to download the data are provided below. All raw data included in the analysis falls into one of the folders included below, and a comprehensive list of all of the raw and processed data sets and the files that use them as inputs and outputs can be found in scripts/00_run.do
.
The data/
folder contains the following subfolders; the subsections below describe the source for the data that go into each subfolder.
Bansefi
BDU
CNBV
comparison
CONEVAL
CPI
DENUE
ENCASDU
ENCELURB
ENOE
INE
INEGI
ITER
MCC
Medios_de_Pago
Prospera
SEPOMEX
shapefiles
Administrative data from Bansefi in the data/Bansefi/
folder are confidential. The contact person to request access is Ana Lilia Urquieta, [email protected].
Administrative data from Banco de México in the data/BDU/
folder are confidential. These data were accessed on-site on a Banco de México server, and hence the corresponding scripts can only be run on Banco de México's server. The contact person to request access is Othón Moreno, [email protected].
Administrative data from CNBV in the data/CNBV/
folder are publicly available from http://portafolioinfo.cnbv.gob.mx/PortafolioInformacion/. The contact person to obtain further information about these publicly available data sets is Diana Radilla, [email protected].
We use microdata from the replication files from other studies to compare the effect size in our study to those of other studies. Specifically, we use the following replication data sets from the following studies:
data/comparison/Drexler_et_al_2014_AEJApplied/kisDataFinal.dta
is from the replication package for Drexler, Fisher, and Schoar (2014) available here: https://www.openicpsr.org/openicpsr/project/113888/version/V1/view.data/comparison/Dupas_Robinson_2013_AER/HARP_ROSCA_final.dta
is from the replication package for Dupas and Robinson (2013) available here: https://www.openicpsr.org/openicpsr/project/116115/version/V1/view.data/comparison/Karlan_etal_2016_MgmtSci/analysis_dataallcountries.dta
is from the replication package for Karlan, McConnell, Mullainathan, Zinman (2016) available here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/UJD5OP. (Note: downloadanalysis_dataallcountries.tab
and select "Original File Format (Stata 14 Binary)").data/comparison/Karlan_Zinman_2017/proc/dreamfinal_for_analysis.dta
is a file created by running the replication package for Karlan and Zinman (2018) available here: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/STTNUF.data/comparison/Prina_2015_JDE/Nepal_JDE_R1.dta
is from the replication package for Prina (2015) available here: https://ars.els-cdn.com/content/image/1-s2.0-S0304387815000061-mmc1.zip (or click the "Download zip file" link at the article page https://www.sciencedirect.com/science/article/pii/S0304387815000061#s0075).
For other studies included in our comparison, we used the point estimates and standard errors from the paper; see scripts/96_comparison_figure.do
for more detail.
The file data/comparison/savings_rates_metadata.xlsx
was created by the authors and includes metadata about the studies included in our comparison folder. It is included with the replication data.
Locality-level data from CONEVAL, data/CONEVAL/rezago_social_localidad.dta
, are publicly available from https://www.coneval.org.mx/rw/resource/Rezago_social_-_localidad_-_stata.rar. Inside the .rar file downloaded from that link is a file Rezago social - localidad - stata.dta
which should be renamed rezago_social_localidad.dta
and placed inside the data/CONEVAL/
folder to run the relevant replication scripts.
Microdata on Mexico's consumer price index in the data/CPI/
folder are confidential. The contact person to request access is Natalia Volkow, [email protected].
Administrative data from INEGI's DENUE in the data/DENUE/
folder, used to create a locality to postal code mapping, are publicly available from https://www.inegi.org.mx/app/descarga/. The contact to obtain further information about these publicly available data sets is [email protected].
Survey data from ENCASDU in the data/ENCASDU
is confidential. The contact person to request access is Rogelio Grados, [email protected].
Survey data from ENCELURB in the data/ENCELURB/
folder is confidential. The contact person to request access is Rogelio Grados, [email protected].
The ENOE is used as an auxiliary data set; the raw ENOE data are available from https://www.inegi.org.mx/programas/enoe/15ymas/default.html#Microdatos. The version of the data used in the replication files is a cleaned version of the data provided to the authors by Laura Chioda, [email protected].
Local elections data were prepared by a research assistant of Enrique Seira's based on confidential data from INE. The contact person to request access is Enrique Seira, [email protected].
Auxiliary administrative data from INEGI in the data/INEGI/
folder are publicly available from https://www.inegi.org.mx/app/ageeml/#.
Auxiliary survey data from INEGI's ITER, based on the Population Census, in the data/ITER/
folder are confidential. The contact person to request access is Natalia Volkow, [email protected].
Auxiliary data on merchant category codes in data/MCC/mcc_codes.csv
are publicly available from https://github.com/greggles/mcc-codes.
Survey data from the Medios de Pago survey in data/Medios_de_Pago/medios_pago_titular_beneficiarios.dta
are publicly available from https://evaluacion.prospera.gob.mx/es/eval_cuant/p_bases_cuanti.php. The contact person to obtain further information about these publicly available data is Rogelio Grados, [email protected].
Administrative data from Oportunidades (now Prospera) in the data/Prospera/
folder is confidential. The contact person to request access is Rogelio Grados, [email protected].
Auxiliary administrative data from SEPOMEX, Mexico's postal service, in data/SEPOMEX/CPdescarga.txt
are publicly available from https://www.correosdemexico.gob.mx/SSLServicios/ConsultaCP/CodigoPostal_Exportar.aspx. Select TXT to obtain the data in the same format as used in the replication files.
Data on the geocoordinates of Bansefi branches in the data/shapefiles/bansefi_geocoordinates/
folder, compiled by Oportunidades (now Prospera), are confidential. The contact person to request access is José Solis, [email protected].
Shapefiles from INEGI in the data/shapefiles/INEGI/
folder are publicly available; we use the version that has been compiled by Diego Valle Jones. To download them, go to https://blog.diegovalle.net/2013/06/shapefiles-of-mexico-agebs-manzanas-etc.html and enter your email to receive the files in your email (note: check your spam folder for the email). Once you receive the email, you must click a link to confirm that you want to receive the shapefiles. In the subsequent email you receive, click the link to download "2010 Census AGEBs, Manzanas, Municipios, States, etc". The file you download after clicking this link should be called agebsymas.zip
and should have a subfolder called scince_2010/shps/
. Within this folder are subfolders corresponding to each state in Mexico, ags
, bc
, etc. Place the folders corresponding to each state in Mexico directly inside the folder data/shapefiles/INEGI/
to run the relevant parts of the replication package.
Most of the code was run on a server node with 28 CPU cores and 1.5 TB of RAM.
The following software programs and packages are required.
Earlier versions of Stata (13 or newer) should work as well but have not been tested.
The Stata code was run using 4-core Stata-MP 15.1.
All user-written .ado files required for replication are included in the adofiles/
folder, so they do not need to be separately installed to run the replication package. They include:
- .ado files we've written
time
graph_options
bimestrify
uniquevals
mydi
exampleobs
stringify
spanishaccents
winsify
, including modified code fromwinsor
by Nicholas J. Coxlower
dupcheck
cluster_permute
putmatrix
getmatrix
dim
mtab
myzscore
latexify
- .ado files written by others
extremes
(Nicholas J. Cox)_gbom
(Nicholas J. Cox)_geom
(Nicholas J. Cox)fre
(Ben Jann)randcmd
(Alwyn Young)carryforward
(David Kantor)reghdfe
(Sergio Correia). All files:r/reghdfe*.*
ande/estfe.ado
ftools
(Sergio Correia). All files:f/f*.*
,l/local_inlist.*
,m/ms_*.*, j/join.*
svmat2
(Nicholas J. Cox)
The following list includes the R packages used and the version used; all of these packages and their dependencies are listed in the renv.lock
file and can be installed using the renv
package (installation instructions below).
sf
(0.8.0)tidyverse
(1.3.0)data.table
(1.13.0)dtplyr
(0.0.3). Note: requires version 0.0.3 or earlier (1.0.0 breaks the code). To install, if not usingrenv
to install the full set of packages needed for this replication package:remotes::install_version("dtplyr", version = "0.0.3", repos = "http://cran.us.r-project.org")
magrittr
(1.5)haven
(2.2.0)assertthat
(0.2.1)here
(0.1)foreign
(0.8.75)lubridate
(1.7.8)readxl
(1.3.1)plm
(2.2.0)zoo
(1.8.6)pbapply
(1.4.2)wrapr
(2.0.2)metafor
(2.4.0)lfe
(2.8.5.1)colorblindr
(0.1.0) Note: not available on CRAN. To install, if not usingrenv
to install the full set of packages needed for this replication package:remotes::install_github("clauswilke/colorblindr")
colorspace
(1.4.1)renv
(0.12.0)
The only Python packages needed are os
and re
which are included in the Python Standard Library.
After downloading the replication code and data, unzip it into a folder on your computer. After unzipping, the project root directory containing the folders described below should be thought of as the main
folder when editing the 00_run.do
script to run on another machine. The folders inside of the zipped replication file should be placed in a folder that you will denote as global main
in 00_run.do
. These folders are:
adofiles
contains the.ado
files required to run our Stata scripts.data
and its subfolders are for raw data.graphs
(initially empty) is the folder to which graphs produced by the scripts will be written.logs
(initially empty) is the folder in which log files will be written.proc
(initially empty) is the folder in which processed data sets will be saved.renv
contains the information needed to install all needed R packages and dependencies.scripts
contains the replication code.tables
(initially empty) is the folder to which tables produced by the scripts will be written.waste
(initially empty) saves temporary files produced as part of the data preparation.
Initially empty folders include a 0-byte blank.txt
text file so that the folder structure is maintained on the GitHub repository for the replication package.
The project root directory contains the following files:
.here
is included to enable R'shere::here()
function to work with relative file paths..Rprofile
contains information to install all needed R packages and dependencies.LICENSE.md
describes that this repository is licensed under a CC BY 4.0 License, which allows reuse with attribution. The license applies to the entirety of this repository with the exception of the files in theadofiles/
folder listed under ".ado files written by others" above.README.md
is a markdown README file for the replication package.README.txt
is identical toREADME.md
but included for those unsure how to open an.md
file.renv.lock
contains information to install all needed R packages and dependencies.
All the needed Stata packages (.ado files) are included in the adofiles/
folder and do not need to be installed. The list of needed R packages, including their versions and dependencies, are included in renv.lock
, .Rprofile
, and the renv/
folder, and can be installed by opening R from the project's root directory (or, equivalently, opening R and setting the working directory as the project's root directory), then running:
renv::restore()
- The entire replication package can be run to go from raw data to final figures and tables by running
scripts/00_run.do
. - Individual scripts are numbered 01 through 120 and should be run in order; if you run scripts by running
00_run.do
they will automatically be run in order. (To determine exceptions when they can be run out of order, each script listed00_run.do
describes its input data sets and output data sets.) - To only run part of the replication package, there are local macros corresponding to each script on lines 113-272 that can be edited to only run certain scripts. Set the local macro to 1 for that script to run when running
00_run.do
or set it to 0 for that script to not run when running00_run.do
. - Note that all of the do files should be run by running
00_run.do
since that file assigns the global macros for file paths (with a few exceptions described below). For example, if running just one .do file, set the local macro for that script equal to 1 in00_run.do
and set the local macros for all of the other scripts equal to 0, then run00_run.do
. - Due to computational reasons, in practice it is unlikey you would run the entire replication package at once. Some specifics to note about certain scripts:
- Scripts 37 to 41 use confidential data from Banco de México accessed on-site; those scripts were thus run on a Banco de México server.
- Scripts 13 and 99 use the
sf
package in R, which were run in a separate conda environment due to issues installing the dependencies of thesf
package on the Kellogg Linux Cluster. - Scripts 46 and 70 process data that contains accent marks. Thus, they were run in RStudio on a laptop rather than with
00_run.do
. - Scripts 77, 78, and 80 are particularly computationally intensive because they conduct randomization inference on a large data set. These files are designed to be manually parallelized; see the instructions in lines 1105-1111 of
00_run.do
.
scripts/00_run.do
also contains:- Details on all of the raw and processed data sets used in the replication package, including which data sets are inputs and outputs of which scripts.
- Details on which tables and figures are produced by which scripts.
- In addition to the scripts
00_run.do
and the scripts numbered 01 through 120, there are a few additional files in thescripts/
folder:encelurb_dataprep_preliminary.doh
contains functions used by the scripts that clean the ENCELURB data.myfunctions.R
contains functions used by the R scripts.tabulator/
folder contains functions that make up the tabulator package by Sean Higgins. Alternatively, the package can be installed through R with:remotes::install_github("skhiggins/tabulator")
- Download and unzip the replication file.
- Obtain confidential replication data and place it in the corresponding
data/
subfolders described above. Download publicly available data from the links above and place it in the correspondingdata/
subfolders described above. - Uncomment line 30 in
scripts/00_run.do
by deleting the**
, and changing"/path/to/replication/folder"
to the parent directory for the replication on your local machine (i.e. the folder that immediately contains the foldersadofiles
,data
, etc. that were zipped in the replication file). - If the scripts you are running with
00_run.do
include R scripts, uncomment line 31 inscripts/00_run.do
by deleting the**
and replacing"/path/to/R"
with the path to your R program. On Windows, it should include the.exe
extension. - If the scripts you are running with
00_run.do
include Python scripts, uncomment line 32 inscripts/00_run.do
by deleting the**
and replacing"/path/to/python"
with the path to your Python program. On Windows, it should include the.exe
extension. - If you are running scripts 77, 78, and 80 through
00_run.do
, those files also specify the main file path since they are designed to be able to be manually parallelized as described above. Hence, uncomment line 33 in each of these scripts by deleting the**
, and changing"/path/to/replication/folder"
to the parent directory for the replication (i.e. the folder that immediately contains the foldersadofiles
,data
, etc.) - (Optional) To run only some of the scripts, edit the local macros on lines 113-272 of
00_run.do
to control which scripts run. - Run
00_run.do
, for example on a Linux server with Stata-MP it can be run in the command line with:nohup stata-mp -b do scripts/00_run.do &
Bachas, Pierre, Paul Gertler, Sean Higgins, Enrique Seira, "How debit cards enable the poor to save more," Journal of Finance, forthcoming.
Drexler, Alejandro, Greg Fischer, and Antoinette Schoar, "Keeping it simple: Financial literacy and rules of thumb," American Economic Journal: Applied Economics, 6 (2014), 1–31.
Dupas, Pascaline and Jonathan Robinson, "Why don’t the poor save more? Evidence from health savings experiments," American Economic Review, 103 (2013), 1138–1171.
Karlan, Dean, Margaret McConnell, Sendhil Mullainathan, and Jonathan Zinman, "Getting to the top of mind: How reminders increase saving," Management Science, 62 (2016), 3393– 3411.
Karlan, Dean and Jonathan Zinman, "Price and control elasticities of demand for savings," Journal of Development Economics, 130 (2018), 145–149.
Prina, Silvia, "Banking the poor via savings accounts: Evidence from a field experiment," Journal of Development Economics, 115 (2015), 16–31.