Git Product home page Git Product logo

nwxtregress's Introduction

nwxtregress

Network Regressions in Stata with unbalanced panel data and time varying network structures or spatial weight matrices.

version release

Table of Contents

  1. Syntax
  2. Description
  3. Options
  4. Postestimation (Predict, Direct, indirect and total effects)
  5. Saved Values
  6. Examples
  7. References
  8. How to install
  9. Questions?
  10. About

1. Syntax

SAR

nwxtregress depvar  indepvars [if], 
        ivarlag(W1[, sparse timesparse mata id(string)])
        [mcmcoptions absorb(varlist, keepsingeltons) transform(transfrom_varlist, transform_options)]

SDM

nwxtregress depvar  indepvars [if], 
        ivarlag(W1[, sparse timesparse mata id(string)])
        dvarlag(Ws:varlist[, sparse timesparse mata id(string)]
        [mcmcoptions absorb(varlist, keepsingeltons) transform(transfrom_varlist, transform_options)]

Data has to be xtset before use. W1 and Ws define the spatial weight matrix, default is Sp object. dvarlag() and ivarlag() define the spatial lag of the dependent and independent variables. ivarlag() is repeatable and multiple spatial weight matrices are supported.

nwxtregress requires Stata 14.2 or higher. python and frame can only be used with Stata 16 or higher.

Options for ivarlag() and dvarlag()

option Description
mata declares weight matrix is mata matrix.
sparse if weight matrix is sparse.
timesparse weight matrix is sparse and varying over time.
id(string) vector of IDs if W is a non sparse mata matrix
normalize(string) which normalization to use.
zero(real) how to treat zeros in spatial weight matrix.

General Options

options Description
nosparse not convert weight matrix internally to a sparse matrix
asarray(name) change name of array estimation results and info
standardize standardizes all variables, short for transform(_all, by(idvar))
impact caculate total, direct and indirect effects and add them to e(b) and e(V). See Postestimation (Predict, Direct, indirect and total effects).
impactseed(seed) set seed for impact.

MCMC Options

mcmcoptions Description
draws() number of griddy gibs draws, default 2000
gridlength() grid length, default 1000
nomit() number of omitted draws, default 500
barrypace(numlist) settings for BarryPace Trick. Order is iterations, maxorder. Default is 50 and 100
usebp use BarryPace trick instead of LUD for inverse of (I−ρW).
python use Python to calculate LUD or Barry Pace trick.
seed(#) sets the seed

Transform Options

transformoptions Description
transform_varlist variables to be transformed. _all transformes all dependent and independent variables. If not specified, cmd:_all assumed.
by(varname) variable defining level of transformation.
after transform variables after spatial lags are calcuted
wy transform spatial lag of dependent variable
wx transform spatial lag of independent variables as defined by varlist
nom:ean do not demean data
nosd do not standardize data (standard deviation of 1)

Maintenance:

nwxtregress , [update version]

nwxtregress, version displays the current version. nwxtregress, update updates nwxtregress from GitHub.

2. Description

nwxtregress estimates Spatial Autoregressive (SAR) or Spatial Durbin (SDM) models. The spatial weight matrices are allowed to be time varying and the dataset can be unbalanced.

The SAR is:

y = rho W1 y + beta X + eps

The SDM is:

Y = rho W1 Y + beta X + gamma W2 X + eps

where W1 and W2 are spatial weight matrices, Y the dependent and X the independent variables.

nwxtregress can handle spatial weights in three formats: 1. square matrix, 2. sparse and 3. time sparse. Sparse matrices have the advantage that they save space and thus computational time and allow for time varying weights. The Sp environment only supports the square matrix format. nwxtregress can read square, sparse and time sparse formats if the data for the weights is in mata or saved in a frame.{p_end}

1. Square matrix format

The spatial weights are a matrix with dimension N_g x N_g. It is time constant. An Example with a 5 x 5 matrix is:

    0    0.1  0.2  0
    0    0    0.1  0.2
    0.3  0.1  0    0
    0.2  0    0.2  0

2. Sparse matrix format

The sparse matrix format is a v x 3 matrix, where v is the number of non-zero elements in the spatial weight matrix. The weight matrix is time constant. The first column indicates the destination, the second the origin of the flow. A sparse matrix of the matrix from above is:

       Destination  Origin    Flow
       1            2         0.1
       1            3         0.2
       2            3         0.1
       2            4         0.2
       3            1         0.3
       3            2         0.1
       4            1         0.2
       4            3         0.2

3. Time-Sparse format

The time sparse format can handle time varying spatial weights. The first column indicates the time period, the remaining are the same as for the sparse matrix. For example, if there are two time periods and we have the matrix from above for the first and the square for the second period:

       Time    Destination    Origin    Flow
       1       1              2         0.1
       1       1              3         0.2
       1       2              3         0.1
       1       2              4         0.2
       1       3              1         0.3
       1       3              2         0.1
       1       4              1         0.2
       1       4              3         0.2
          (next time period)
       2       1              2         0.1
       2       1              3         0.4
       2       2              3         0.1
       2       2              4         0.4
       2       3              1         0.9
       2       3              2         0.1
       2       4              1         0.4
       2       4              3         0.4

Internally, nextregress will always use the time sparse format. This ensures that unbalanced panels do not pose a problem. nextregress comes with functions for creating sparse matrices, coplying a sparse matrix into a squared format, and functions for mathematical operations (transpose and multiplication).

3. Options

Options

Option Description
frame(name) declares weight matrix is saved in a frame. Default is to use a spatial weight matrix from the Sp environment. If a frame is used, data can be in sparse, timesparse or square matrix format.
mata declares weight matrix is mata matrix. Default is to use a spatial weight matrix from the Sp environment. If a mata matrix is used, data can be in sparse, time sparse or square matrix format.
sparse if weight matrix is in sparse format. Sparse format implies that the first two column define the origin and the destination of the flow, the third column the value of the flow.
timesparse weight matrix is sparse and varying over time. As sparse but first column includes the time period.
id(string) vector of IDs if W is a non sparse mata matrix. If a frame is used, then id() contains the varible names of the time indicator (if applicable), the origin and destination of the flows.
normalize(string) which normalization to use for spatial weight matrix. Default is row normalisation. Can be none, row (default), column, spectral or minmax, see normalisation option of spmat creat. The normalisation is done for each time period individually.
zero(real) defines how to treat zeros in spatial weight matrics. Default is to remove zero entries for non-sparse matrices and to set zeros to 0.0001 if weight matrix is (time)sparse.
nosparse not convert weight matrix internally to a sparse matrix. Option is not recommended to use.
asarray(name) nwxtregress saves intermediate results such as the spatial weight matrix in an internal time sparse format, residuals and results from the MCMC in an array, see stored values. It is not recommended to change contents of the array and the option to change the name should only be rarely used. The default name is NWXTREG_OBJECT#, where # is a counter if the array already existed.
draws() number of griddy gibs draws, default 2000.
gridlength() grid length, default 1000.
nomit() number of omitted draws, default 500.
barrypace(numlist) settings for BarryPace Trick. Order is iterations, maxorder. Default is 50 and 100.
usebp use BarryPace trick instead of LUD for inverse of (I−ρW).
python use Python to calculate the LU Decomposition or BarryPace trick. Requires installation of Python, scipy, sfi and numpy. Using Python to calculate the LUD is faster by a factor 4-10.
impact caculate total, direct and indirect effects and add them to e(b) and e(V). See Postestimation (Predict, Direct, indirect and total effects).
impactseed(seed) set seed for impact.
seed(#) sets the seed.
version display version.
update update from Github.

3.1 High Dimensional Fixed Effects

nwxtregress can remove high dimensional fixed effects using reghdfe. The fixed effects are partialled out before spatial lags are cacluated. Constant is automatically removed when cmd:absorb() is used. The syntax is:

asorb(varlist, keepsingeltons)

Option Description
varlist categorical variables that identify the fixed effects to be absorbed.
veepsingelton keep singelton units.

3.2 Transform Data

nwxtregress can demean and standardize dependent and independent variables, before or after the calculation of the spatial lags. Spatial lags can be transformed as well. The syntax is:

transform([varlist] [, by(varname)) after nomean nosd wy wx])

Option Description
varlist variables to be transformed. _all implies all dependent and independent variabkes. If left empty, _all assumed.
by(varname) variable defining transformation. Default is by(ID), where ID identifies the cross-sections. by(_all) transforms data across all cross-sections.
after transform data after caculation of spatial lags. Default is to transform data first.
nomean do not demean data.
nosd do not standardize data.
wy transform spatial lag of dependent variable. Implies after.
wx transform spatial lags of independent variables as defined in it:varlist. Implies cmd:after.
transform short for transform(_all).
standardize short for transform(_all).

4. Postestimation

4.1 Direct, indirect and total effects.

Direct, indirect and total effects. can be calculated using estat impact. The syntax is

 estat impact [varlist] [, options]
Option Description
seed(#) set seed for Barry Pace matrix inversion.
array(name) name of array with saved contents from nwxtregress, see stored results.

varlist defines the variables for which the direct, indirect and total effects are displayed. If not specified, then estat impact will calculate the effects for all explanatory variables (indepvars).

estat impact saves the following in r():

Matrix Description
r(b_direct) Coefficient Matrix of direct effects
r(V_direct) Variance covariance matrix of direct effects
r(b_indirect) Coefficient Matrix of indirect effects
r(V_indirect) Variance covariance matrix of indirect effects
r(b_total) Coefficient Matrix of total effects
r(V_total) Variance covariance matrix of total effects

4.2 Predict

predict can be used after nwxtregress. The syntax for predict is:

predict [type] varname [, options]
Option Description
xb calculate linear prediction.
res calculate residuals.
replace replace if varname exists.
array(name) name of array with saved contents from nwxtregress, see stored results.

5. Saved Values

nwxtregress saves the following in e()

Matrices

Matrices Description
b Coefficient Matrix
V Variance-Covariance Matrix

Scalars

Scalars Description
N Number of observations
N_g Number of groups
T Number of time periods
Tmin Minimum number of time periods
Tavg Average number of time periods
Tmax Maximum number of time periods
K Number of regressors excluding spatial lags
Kfull Number of regressors including spatial lags
r2 R-squared
r2_a adjusted R-squared
MCdraws Number of MCMC draws

Macros

Macro Description
sample sample

mata arrays

In addition to e() and r() nwxtregress saves informations about the estimation in a mata array. The contents are the weight matrix in time sparse format, residuals and results from the MCMC. Storing those saves time for estat impact and predict. The name default name of the array is _NWXTREG_OBJECT#, but can be set with the option asarray(). In general it is not recommended to change this setting.

6. Examples

An example dataset with USE/MAKE table data from the BEA’s website and links between industries is available GitHub. The dataset IO.dta contains the linkages (spatial weights) and the dataset VA.dta the firm data. We want to estimate capital consumption by using compensation and net surplus as explanatory variables.

First we load the data from the W dataset and convert into a SP object for the year 1998.

use https://janditzen.github.io/nwxtregress/examples/IO.dta
keep if Year == 1998
replace sam = 0 if sam < 0
replace sam = 0 if ID1==ID2
keep ID1 ID2 sam
reshape wide sam, i(ID1) j(ID2)
spset ID1
spmatrix fromdata WSpmat = sam* , replace

Next, we load the dataset with the firm data and estimate a SAR with a time constant spatial weight matrix. We also obtain the total, direct and indirect effects using estat impact. For reproducibility we set a seed.

use https://janditzen.github.io/nwxtregress/examples/VA.dta
nwxtregress cap_cons compensation net_surplus , dvarlag(WSpmat) seed(1234)
estat impact

The disadvantage is that the spatial weight are constant across time and we had to get rid of all negative numbers. To allow for time varying spatial weights, we load the W dataset again and but load it into the frame IO:

frame create IO
frame IO: use https://janditzen.github.io/nwxtregress/examples/IO.dta

Using the VA dataset again, we can estimate the SAR model with time varying spatial weights. To do so we use the options frame(name), where name indicates the frame and the weight matrix name corresponds to the variable names. The data is in timesparse format so we need to use the option timesparse. Finally it is nessary to define the year identifier and the origin and destination of the flows using the id() option:

nwxtregress cap_cons compensation net_surplus ,
dvarlag(sam, frame(IO) id(Year ID1 ID2)  timesparse) 
seed(1234) 

Alternatively we can load the spatial weight matrix into mata:

frame IO: putmata Wt = (Year ID1 ID2 sam), replace
nwxtregress cap_cons compensation net_surplus , 
dvarlag(Wt, mata timesparse) seed(1234)

If we want to estimate an SDM by adding the option ivarlag():

nwxtregress cap_cons compensation net_surplus , 
dvarlag(Wt,mata timesparse) ivarlag(Wt: compensation,mata timesparse )  
seed(1234)

Use Python (requires Stata 16 or later) to improve speed of calculating the LUD:

nwxtregress cap_cons compensation net_surplus , dvarlag(Wt,mata timesparse) ivarlag(Wt: compensation,mata timesparse )  seed(1234) python

Transform data by demeaning and standardising it:

nwxtregress cap_cons compensation net_surplus , dvarlag(Wt,mata timesparse) ivarlag(Wt: compensation,mata timesparse )  seed(1234) transform(_all, by(ID)

or

nwxtregress cap_cons compensation net_surplus , dvarlag(Wt,mata timesparse) ivarlag(Wt: compensation,mata timesparse )  seed(1234) standardize

Partial out firm and year fixed effects (requires reghdfe):

nwxtregress cap_cons compensation net_surplus , dvarlag(Wt,mata timesparse) ivarlag(Wt: compensation,mata timesparse )  seed(1234) absorb(ID Year)

We can also define two different spatial weight matrices:

mata: Wt2 = Wt[selectindex(Wt[.,4]:>2601.996),.]
nwxtregress cap_cons compensation net_surplus , 
dvarlag(Wt, mata timesparse) 
ivarlag(Wt: net_surplus, mata timesparse) 
ivarlag(Wt2: compensation, mata timesparse) seed(1234)

Total, direct and indirect effects can be calculated using estat impact:

estat impact

To predict fitted values and residuals predict can be used:

predict xb
predict residuals, residual

7. References

Please cite as:

Ditzen, Grieser, Zekhnini. (2023). nwxtregress - network regression in Stata.

8. How to install

The latest version of the nwxtregress package can be obtained by typing in Stata:

net from https://janditzen.github.io/nwxtregress/

or

net install nwxtregress , from(https://janditzen.github.io/nwxtregress/)

9. Questions?

Questions? Feel free to write us an email, open an issue or start a discussion.

10. Authors

Jan Ditzen (Free University of Bozen-Bolzano)

Email: [email protected]

Web: www.jan.ditzen.net

William Grieser (Texas Christian University)

Email: [email protected]

Web: https://www.williamgrieser.com/

Morad Zekhnini (Michigan State University)

Email: [email protected]

Web: https://sites.google.com/view/moradzekhnini/home

About:

This version 0.3 as of 13.11.2023

Changelog:

Version 0.3

  • support for reghdfe v. 6.12

Version 0.2

  • bug when absorb() option used fixed
  • bug when non sparse matrix used with estat fixed

Version 0.132

  • bug when absorb() used fixed, sometimes variables were not updated correctly.

Version 0.131

  • moved python code to nwxtregress.py
  • added option impact

Version 0.13

  • added options absorb() and transform()
  • bugfixes when using fixed effects
  • Python support for BarryPace trick

Version 0.12

  • Support for Python to calculated LUD

Version 0.03 (alpha)

  • Bugs in sparse matrix multiplication and return if non sparse matrix is used fixed.

nwxtregress's People

Contributors

janditzen avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar

nwxtregress's Issues

Error if non sparse weight matrix used

Dear Jan,
I run the demo code in your github , eg VA data set to estimate a SAR mode, the estimation process goes smoothly but while I try to get the imact, estat impact, it returns the following errors:
estat impact
----+--- 10 --+--- 20 --+--- 30 --+--- 40 --+--- 50 %
*: 3200 conformability error
SparseMultiply(): - function returned error
BarryPace(): - function returned error
calc_DIEffects(): - function returned error
: - function returned error

As you methioned in one of the discussion, I need to install the new version of moremata library in Stata. The problem is that even I install it, the errors are still there, I sincerely hope that you can help me.

the coding is just like this :

use IO
keep if Year == 1998
drop buyer_industry seller_industry
replace sam = 0 if sam < 0
replace sam = 0 if ID1==ID2
reshape wide sam, i(ID1) j(ID2)
spset ID1
spmatrix fromdata WSpmat = sam* , replace
clear
use VA
nwxtregress cap_cons compensation net_surplus , dvarlag(WSpmat) seed(1234)
estat impact

Thank you very much!

Originally posted by @zhangzhqnk in #5

an issue of estat imapct after nwxtregress estimation

Hello,
while trying to use estat imapct after successfully run nwxtregress, Stata gives the following error message, wondering if it is a bug.

. estat impact
----+--- 10 --+--- 20 --+--- 30 --+--- 40 --+--- 50 %
J(): 3900 unable to allocate real [2147379216,50]
SparseMultiply(): - function returned error
BarryPace(): - function returned error
calc_DIEffects(): - function returned error
: - function returned error
r(3900);

Stata 16.1

How to add multiple matrices in dvarlag()?

Dear Jan,

It seems that multiple matrices could not be added to dvarlag(), but only multiple matrices in ivarlag(). See if you could post the solution to it.
Thank you very much!

Best,
Wenna

Bug report in running -nwxtregress-

Hi,
When I run nwxtregress cap_cons compensation net_surplus , dvarlag(WSpmat) seed(1234) following the codes in help page, the error information is as follows:
'''
Initialise Grid (200)
----+--- 10 --+--- 20 --+--- 30 --+--- 40 --+--- 50 %
.................................................. 50
.................................................. 100
Griddy Gibbs (2000)
----+--- 10 --+--- 20 --+--- 30 --+--- 40 --+--- 50 %
mm_colrunsum(): 3001 expected 1 arguments but received 3
GriddyGibbsi(): - function returned error
GriddyGibbs(): - function returned error
estimatesar(): - function returned error
: - function returned error
r(3001);
'''
My Stata is SE 17.0 for Windows.

Thanks and best wishes.

3301 error in the 0.13 version of nwxtregress

Hi, Jan.

Happy new year!
When running the code, the following errors return in the 0.13 version of nwxtregress.
Is it anything suspicious about the data that cause the issue?

Screenshot 2023-01-23 at 11 33 08

Thank you,
Wenna

Error return with 3301

Dear Jan,

I came across the following errors. The error code is 3301.
I checked the id identifiers in the spatial network, which match across.
Wdistance is the spatial distance of cities.
Wdistancetry is the truncated version of Wdistance.
Screenshot 2023-01-01 at 11 03 46

The same code worked perfectly for another dataset.
Screenshot 2023-01-01 at 11 05 34

See if you have any solutions or insights for this issue.
Thank you very much!

Best,
Wenna

Questions when run nwxtregress

nwxtregress cap_cons compensation net_surplus , dvarlag(WSpmat) seed(1234)
error:
Traceback (most recent call last):
File "", line 1, in
ModuleNotFoundError: No module named 'scipy'
(43 lines skipped)

when I run this code, I met this error.

r(199); reghdfe Mata library not found or error in reghdfe.

Dear Jan,
I run the demo code in your GitHub:

Partial out firm and year fixed effects (requires reghdfe):
nwxtregress cap_cons compensation net_surplus , dvarlag(Wt,mata timesparse) ivarlag(Wt: compensation,mata timesparse ) seed(1234) absorb(ID Year)

I encounter the following error message:
reghdfe Mata library not found or error in reghdfe.
r(199);

I use stata 17MP and update the reghdfe.
Thank you!
Peng

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.