sergiocorreia / ivreghdfe

Run IV/2SLS with many levels of fixed effects (i.e. ivreg2+reghdfe)

License: MIT License

Stata 99.98% TeX 0.02%

ivreghdfe's Issues

[BUG] Incorrect fixed effects with `cluster()`

clear
sysuse auto, clear
ivreghdfe price (mpg = turn), absorb(a1=rep78) cluster(rep78)
ivreghdfe price (mpg = turn), absorb(a2=rep78) cluster(rep78)
gen diff = reldif(a1, a2)
which ivreghdfe
sum diff, d

gives

/home/mauricio/ado/plus/i/ivreghdfe.ado
*! ivreghdfe 1.1.3  04Jan2023 (bugfix for github issue #48)
*! ivreghdfe 1.1.2  29Sep2022 (bugfix for github issue #44)
*! ivreghdfe 1.1.1  14Dec2021 (experimental -margins- support)
*! ivreghdfe 1.1.0  25Feb2021
*! ivreg2 4.1.11  22Nov2019
*! authors cfb & mes
*! see end of file for version comments

                            diff
-------------------------------------------------------------
      Percentiles      Smallest
 1%            0              0
 5%            0              0
10%     .0145922              0       Obs                  74
25%     .0145922              0       Sum of Wgt.          74

50%     .0188667                      Mean           .0251712
                        Largest       Std. Dev.      .0174165
75%     .0292981       .0621538
90%     .0621538       .0621538       Variance       .0003033
95%     .0621538       .0621538       Skewness       1.148167
99%     .0621538       .0621538       Kurtosis       3.469954

error occurred while loading ivreghdfe.ado

When I run the code in the help file of ivreghdfe

sysuse auto
ivreghdfe price weight (length=gear), absorb(rep78, tol(1e-6))
I can't get any results; the following message always appears:

struct ms_vcvorthog undefined
(817 lines skipped)
(error occurred while loading ivreghdfe.ado)
r(3000);

I followed the online guide to install three packages: ftools, reghdfe, and ivreg2. My Stata version is MP 14.2.

R(3499) with IV/2sls using reghdfe, old and ivreghdfe

Dear Sergio,

Many thanks for providing the commands reghdfe and ivreghdfe. I use them extensively; because of my old syntax, I still prefer reghdfe, old for running 2SLS.
Since today I have been getting a new error message that I have never seen before, and I do not know what to do:

. reghdfe kse_ef60 ( kse_ef61 = kse_ef53), absorb (jahr) old
(running historical version of reghdfe)
assert_msg(): 3499 _assert_abort() not found
map_init_keepsingletons(): - function returned error
: - function returned error
r(3499);

. ivreghdfe kse_ef60 ( kse_ef61 = kse_ef53), absorb (jahr)
assert_msg(): 3499 _assert_abort() not found
fixed_effects(): - function returned error
: - function returned error
r(3499);

I ran the exact same code on the exact same data multiple times already and I never faced this error message. The data is also rather small and simple. Since I work at an institute where our computers share the same ado-folder, it could very well be the case that someone has updated reghdfe and that this introduced this problem.

Do you know of this issue? It would be of great help to me if you could point out what I might have done wrong (or is this a new bug?).

Many thanks for all your work,
bests,
matthias

ivreghdfe weak instrument

Hi Sergio

I am using ivreghdfe in the context of trade models because the command can handle numerous country fixed effects and I have an endogenous regressor. However, when I use it I get a very low F statistic in the first stage (which doesn't happen when I use a simple ivreg2 with the small option).

I don't understand how absorbing the FEs affects my first-stage F value. I am completely new to this and would like to understand why this is happening. Also, for my application, when should I use clustering?

Thank you

Test for equality of coefficients of 2 different regressions

Hi Sergio,
I'm using ivreghdfe to run 2 different regressions for 2 different samples. E.g.
ivreghdfe y d (x=z) if sample==1, absorb(a) cluster(clustervar)
ivreghdfe y d (x=z) if sample==2, absorb(a) cluster(clustervar)
After obtaining the results, I want to test whether the coefficients on the endogenous variable (e.g. x) in the two regressions are equal. I proceed as described here: https://www.stata.com/statalist/archive/2009-11/msg01485.html, i.e.
g d1=d*(sample==1)
g d2=d*(sample==2)
g x1=x*(sample==1)
g x2=x*(sample==2)
g z1=z*(sample==1)
g z2=z*(sample==2)
ivreghdfe y d? (x?=z?), absorb(a) cluster(clustervar)

But three problems occur:
i) Cluster: The cluster variable in the "stacked" regression should be sample. But in my original regression, the standard error is clustered at the clustervar level. How should I deal with the cluster in the "stacked" regression in this case? Can I do like the following:
egen clid=group(sample clustervar)
ivreghdfe y d? (x?=z?), absorb(a) cluster(clid)
ii) Fixed effect in the absorb option: do I need to generate a1=a*(sample==1) and a2=a*(sample==2), and use absorb(a?) instead of absorb(a) in the "stacked" regression?
iii) If I run the above "stacked" regression (i.e. ivreghdfe y d? (x?=z?), absorb(a) cluster(clustervar)), the coefficients on the endogenous variable for the 2 samples are the same as in the two original regressions, but the coefficients on the exogenous variable for the 2 samples differ from those in the two original regressions. Did I make a mistake here?

I would highly appreciate it if you could reply to my questions.
Thank you!

FE in first stage of ivreghdfe?

Hi Sergio, I wanted to double-check whether the first stage of ivreghdfe includes the same absorb() variables as the second stage, i.e., the same included instruments. I couldn't find this in the Stata output or in the command's help file. Thanks!

F statistics for multiple endogenous variables

Thank you for this great command!

I am wondering if there is a way to get the first stage Sanderson-Windmeijer (SW) partial F statistics for multiple endogenous variables that is produced by ivreg2 (saved in e(first) with the first or ffirst options).

How to report the coefficient of the constant in ivreghdfe?

I note that ivreghdfe automatically partials out the constant. How can I report the coefficient of the constant in ivreghdfe?

issues with clustering - transmorphic found where struct expected

@sergiocorreia

my code is as follows:
ivreghdfe y ( s = z) x1 x2, absorb(InstructorFE CourseFE) vce(cluster StudentID)

Then, it showed an error code:
type mismatch: exp.exp: transmorphic found where struct expected

Error with saving residuals

I'm running ivreghdfe with 3 fixed effects and trying to save the residuals.

The residuals are saved but I get an error that says "command reghdfe_store_alphas is unrecognized"

I've tried uninstalling everything (ftools, reghdfe, ivreghdfe) and reinstalling from github to no avail.

I can save residuals from reghdfe without a problem. This seems to be related to line 2285 in the ivreghdfe.ado file where it references that command.

Standard errors different for ivreghdfe and ivreg2 (even more so when using "small" option for ivreg2)

Hi Sergio,

Would you be able to offer insight as to how the standard errors in ivreghdfe differ from those computed using ivreg2? My understanding from a previous thread (#21) was that "ivreghdfe" and "ivreg2, small" are supposed to yield the same standard errors. I can see that this is the case for the simple automobile example that you provided.

However, in a regression model that I'm running I find that adding the "small" option to ivreg2 makes the SEs diverge even more from ivreghdfe. Whereas "ivreg2" produces SEs that are far smaller than those produced by ivreghdfe, "ivreg2, small" yields much larger SEs (screenshot below):

The dataset is not publicly accessible but the code looks something like:

ivreghdfe outcome (treatment =instrument) black white asian sat_score, robust a(i.pscore_group)
or
ivreg2 outcome (treatment =instrument) black white asian sat_score i.pscore_group, robust partial(i.pscore_group) small

where the pscore_group variable is a pretty finely-distributed variable (there are 9,619 observations and 4,375 distinct values of the pscore_group variable).

If I replace pscore_group with a set of indicators for a variable that is much less finely-distributed, the "ivreg2, small" standard error is indeed very close to that of "ivreghdfe" (.025526 vs .025523), and much closer than that of "ivreg2".

So why is it that when including a large number of fixed effects the standard errors of "ivreghdfe" and "ivreg2, small" differ so much? Does this come down to how the degrees of freedom are computed?

Thanks!

ivreghdfe with keep singletons option

Hi,

I was previously using reghdfe for an IV regression together with the keepsingletons option. IV regressions now have to be run with the ivreghdfe command, but I can no longer use the keepsingletons option there.

Regards,
Prachi

issues with clustering - insufficient observations

@sergiocorreia , I needed some advice on clustering while using ivreghdfe.

My code is -
ivreghdfe yvar (ever_win = true_treat), absorb(strata gender#batch city#batch yrs_school#batch age#batch mult_apply_hh#batch) vce(cluster unique_id)

I have 78,670 observations with about 42000 distinct unique_ids. The FEs in absorb() amount to about 3000 dummies.

The error I get is insufficient observations.
When I remove the vce(cluster) option, the code runs properly. Could you help out with what I must be doing wrong here?

R-squared from IVREGHDFE

IVREGHDFE only posts the within r2, rather than the full set of r2 statistics provided by REGHDFE

Using partial option with ivreghdfe

Perhaps more of a question than an issue. I'm running regressions of the form

ivreghdfe Y (X = Z) , absorb(i.F) cluster(C)

The number of fixed effects is very large (thus the use of reghdfe) and I get a warning because the number of clusters is smaller than the number of fixed effects. F-statistics also do not get reported, which is clearly problematic for IV. It does say that the use of the "partial" option may help fix this problem. However, the way partial() works in ivreg2 is that the user has to generate the variables to be partialed out, i.e., I cannot write "partial(i.F)" (I tried, ivreghdfe breaks down). But if I generate the indicators (let's call the set _F*), and run

ivreghdfe Y (X = Z) , absorb(i.F) partial(_F*) cluster(C)

it doesn't seem like there's any way reghdfe will recognize that i.F and _F* are the same variable. Thus, it seems like if I want F-statistics in this case, I just need to use ivreg2 and suffer through the slowness?

The older version of reghdfe on ssc used to report F-stats for IV regressions in these cases, so perhaps it wouldn't be difficult to bring them back?

ivreghdfe with interactionterms

Hey there!

I'd like to run the ivreghdfe command (as I need to use different fixed effects) in Stata/SE 15.0, where the endogenous variable is interacted with another variable.
I tried something like this:
ivreghdfe y ex*(en = z) x, where 'ex' is an exogenous variable interacted with 'en', which is instrumented by 'z'.
However, this does not work. Stata reports error 132, parentheses unbalanced.
Can someone help me with this problem? Or is it not possible to use interactions with an endogenous variable?

Thanks very much in advance,
Patricia
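
A common workaround for this kind of specification is to build the interaction by hand and treat it as a second endogenous regressor, instrumented by the interaction of the instrument with the exogenous variable. The sketch below only illustrates the syntax on the auto dataset; the mapping of variables (mpg as 'en', foreign as 'ex', turn as 'z') and the choice of absorb(rep78) are assumptions made for the example, not something taken from the post above.

```stata
* Sketch only: the auto dataset stands in for the poster's data.
* Assumes ivreghdfe and its dependencies (ftools, reghdfe, ivreg2) are installed.
sysuse auto, clear

* Assumed roles: en = mpg (endogenous), ex = foreign (exogenous), z = turn (instrument)
generate mpg_x_foreign  = mpg  * foreign    // interaction of endogenous and exogenous variable
generate turn_x_foreign = turn * foreign    // interaction used as an additional instrument

* Both mpg and its interaction go inside the parentheses as endogenous regressors
ivreghdfe price foreign (mpg mpg_x_foreign = turn turn_x_foreign), absorb(rep78)
```

Whether the interacted instrument is strong enough has to be judged from the first-stage statistics in each application.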

ivreghdfe throws off ivreg2 instrument collinearity detection?

These two commands return basically the same results:

ivreg2 price i.turn (headroom = foreign#c.turn)
ivreghdfe price (headroom = foreign#c.turn), absorb(turn)

However, in the first one, ivreg2 detects that the instruments foreign#c.turn are multicollinear with i.turn, whereas in the second it does not detect collinearity as far as I can see, looking at return macros such as e(exexog). This matters to me because I would like to make boottest return the same results after both. However, boottest relies on ivreg2 to detect and mark collinear instruments. Since none are marked in the second case, boottest is returning wrong results after that command line, at least when doing the Anderson-Rubin test.

Do you think there's anything to be done about this?
Thanks.

F-statistics and Clustering

Hi Sergio,

Thanks for your reghdfe and ivreghdfe packages. I've found them immensely useful over the past few years.

I'm running an IV regression with a large number of instruments (300-400). I'm also using a large number of fixed effects representing counties and state-by-week (>1000 FEs). In some specifications, the instruments are collinear with the FEs. This creates the problem that F statistics are not computed. The specific error message is:

"warning: -ranktest- error in calculating weak identification test statistics;
may be caused by collinearities"

I solved this by manually generating the FEs, checking which are collinear, and dropping them from the regression. However, when I then clustered by county and state, I ran into a new problem that prevents the F statistics from computing. I'm unsure of the nature of this issue, although it appears similar to one previously raised by Tatyana Deryugina in 2018. The new error message reads:

"Warning: estimated covariance matrix of moment conditions not of full rank.
overidentification statistic not reported, and standard errors and
model tests should be interpreted with caution.
Possible causes:
number of clusters insufficient to calculate robust covariance matrix
singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
partial option may address problem."

From the code for her associated paper, it seems like Deryugina dealt with this issue by simply running reghdfe with the old option. The old option, however, no longer appears to work with reghdfe. Is there something else I can try to include clusters? If you have any work-arounds for the collinear instruments, that would also help immensely.

Thanks again.

Question with the ivreghdfe command

I am using the ivreghdfe command, which does not work and says "option requirements not allowed".
I tried to update the command manually with the following code:
cap ado uninstall ftools
cap ado uninstall reghdfe
cap ado uninstall ivreghdfe
net install ftools, from(c:\git\ftools)
net install reghdfe, from(c:\git\reghdfe)
net install ivreghdfe, from(c:\git\ivreghdfe)
However, the error continues.

Fixed effects not saved

sysuse auto
ivreg2hdfe price weight, a(FE=turn)
conf var FE

todo: warn if there are more than two cluster variables

Currently ivreghdfe (and ivreg2) ignore the third and subsequent cluster vars. Instead, maybe give an error
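
A minimal sketch of such a check is shown below. The program name check_cluster_count is hypothetical and is not part of ivreghdfe or ivreg2; it only illustrates counting the cluster variables and stopping with an error when there are more than two.

```stata
* Hypothetical helper, not part of ivreghdfe: rejects more than two cluster variables
capture program drop check_cluster_count
program define check_cluster_count
    syntax varlist
    local n : word count `varlist'
    if `n' > 2 {
        display as error "at most two cluster variables are supported; `n' were given"
        exit 198
    }
end

* Example usage on the auto dataset
sysuse auto, clear
check_cluster_count rep78 foreign              // passes silently
capture noisily check_cluster_count rep78 foreign turn   // prints the error, _rc is 198
```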

Check for dependencies

EG: If ivreg2 is not installed, give an understandable error message
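
One possible way to do this is sketched below, using the built-in which command to test for each dependency before estimation starts. The list of packages and the wording of the message are assumptions for illustration, not the package's actual code.

```stata
* Sketch of a dependency check; package list and message text are assumptions
foreach pkg in ftools reghdfe ivreg2 {
    capture which `pkg'
    if _rc {
        display as error "ivreghdfe requires the package `pkg', which was not found"
        display as error "it can usually be installed with: ssc install `pkg'"
        exit 199
    }
}
```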

Cannot Download Package

Fixed Effects in Second Stage

Hello,

I was trying to estimate an instrumental variables regression using individual fixed effects in the second stage. However, I do not want to use these fixed effects in the first stage. I know that the absorb() option includes the fixed effects in both the first and second stage. Is there an additional option to use the fixed effects only in the second stage (but not the first stage)?

My current alternative is to use categorical variables (the "i." command), which is quite slow as it calculates the standard errors for these fixed effects. Another alternative is to demean the dependent variable within the fixed effect groups, but the corresponding standard error adjustment for degrees of freedom becomes quite complicated, especially when clustering standard errors.

Any suggestions would be greatly appreciated.

Thank you!

driscoll-kraay standard errors: ivreg2 vs ivreghdfe

Consider the following models:

ivreg2 y (x=z) i.cross i.time, dkraay(3) small
ivreg2 y (x=z) i.cross i.time, partial(i.cross i.time) dkraay(3) small
reghdfe y (x=z), absorb(i.cross i.time) vce(cluster time,dkraay(3)) old
ivreghdfe y (x=z), absorb(i.cross i.time) dkraay(3)

I get identical point estimates. Standard errors are however different. In particular, I get m(1) = m(2) ≠ m(3) = m(4)

What am I missing?

`margins` postestimation command not supported

Code:

sysuse auto, clear
reghdfe price weight i.foreign (length=gear), absorb(turn trunk) old
margins i.foreign

Underlying issue: ivreg2 does not support margins if used with partial()

ivreg2 price weight i.foreign, partial(weight)
margins i.foreign

Possible solution: study when it is valid to use margins together with partial()

standard errors different for ivreg2 and ivreghdfe

Hello,

Would you be able to explain the source of the difference between the standard errors in ivreghdfe and ivreg2? Thanks.

Running the same regression with ivreghdfe and ivreg2 yields standard errors that are larger with ivreghdfe:

ivreghdfe outcome (tr = iv), absorb(i.year i.country_num) cluster(country_num)
(MWFE estimator converged in 2 iterations)

IV (2SLS) estimation
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on country_num

Number of clusters (country_num) =     52         Number of obs =     1300
                                                  F(  1,    51) =     5.39
                                                  Prob > F      =   0.0243
Total (centered) SS     =  43.31870105            Centered R2   =  -0.0120
Total (uncentered) SS   =  43.31870105            Uncentered R2 =  -0.0120
Residual SS             =  43.83852081            Root MSE      =    .1855

------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          tr |   .2087836   .0899259     2.32   0.024     .0282499    .3893173
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 16.181
Chi-sq(1) P-val = 0.0001
Weak identification test (Cragg-Donald Wald F statistic): 1449.754
(Kleibergen-Paap rk Wald F statistic): 108.133
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
Instrumented: tr
Excluded instruments: iv
Partialled-out: _cons
nb: total SS, model F and R2s are after partialling-out;
any small-sample adjustments include partialled-out
variables in regressor count K
Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
        year |        25           0          25     |
 country_num |        52          52           0    *|
-----------------------------------------------------+
* = FE nested within cluster; treated as redundant for DoF computation

ivreg2 outcome (tr = iv) i.year i.country_num, partial(i.year i.country_num) cluster(country_num)

IV (2SLS) estimation
Estimates efficient for homoskedasticity only
Statistics robust to heteroskedasticity and clustering on country_num

Number of clusters (country_num) =     52         Number of obs =     1300
                                                  F(  1,    51) =     5.17
                                                  Prob > F      =   0.0272
Total (centered) SS     =  43.31870105            Centered R2   =  -0.0120
Total (uncentered) SS   =  43.31870105            Uncentered R2 =  -0.0120
Residual SS             =  43.83852081            Root MSE      =    .1836

------------------------------------------------------------------------------
             |               Robust
     outcome |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          tr |   .2087836   .0881959     2.37   0.018     .0359229    .3816443
------------------------------------------------------------------------------
Underidentification test (Kleibergen-Paap rk LM statistic): 16.181
Chi-sq(1) P-val = 0.0001
Weak identification test (Cragg-Donald Wald F statistic): 1391.718
(Kleibergen-Paap rk Wald F statistic): 103.804
Stock-Yogo weak ID test critical values: 10% maximal IV size 16.38
15% maximal IV size 8.96
20% maximal IV size 6.66
25% maximal IV size 5.53
Source: Stock-Yogo (2005). Reproduced by permission.
NB: Critical values are for Cragg-Donald F statistic and i.i.d. errors.
Hansen J statistic (overidentification test of all instruments): 0.000
(equation exactly identified)
Instrumented: tr
Excluded instruments: iv
Partialled-out: 1991.year 1992.year 1993.year 1994.year 1995.year
1996.year 1997.year 1998.year 1999.year 2000.year
2001.year 2002.year 2003.year 2004.year 2005.year
2006.year 2007.year 2008.year 2009.year 2010.year
2011.year 2012.year 2013.year 2014.year 2.country_num
5.country_num 7.country_num 8.country_num 10.country_num
11.country_num 13.country_num 14.country_num
16.country_num 17.country_num 18.country_num
19.country_num 20.country_num 21.country_num
22.country_num 23.country_num 24.country_num
25.country_num 26.country_num 27.country_num
29.country_num 30.country_num 31.country_num
32.country_num 34.country_num 35.country_num
37.country_num 38.country_num 39.country_num
41.country_num 42.country_num 43.country_num
45.country_num 46.country_num 47.country_num
48.country_num 50.country_num 51.country_num
52.country_num 53.country_num 54.country_num
55.country_num 57.country_num 58.country_num
59.country_num 60.country_num 61.country_num
62.country_num 63.country_num 64.country_num
65.country_num _cons
nb: total SS, model F and R2s are after partialling-out;
any small-sample adjustments include partialled-out
variables in regressor count K
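
The size of the gap can be checked with a little arithmetic. The sketch below compares the ratio of the two reported standard errors with the usual small-sample factor for clustered errors, sqrt((N-1)/(N-K) * M/(M-1)). Two things here are assumptions rather than facts taken from the outputs: that ivreghdfe applies this factor while ivreg2 without the small option applies none, and that K = 26 (one regressor plus the 25 year coefficients, with the country effects treated as nested within the clusters).

```stata
* Values copied from the two outputs above
scalar se_hdfe   = .0899259    // ivreghdfe
scalar se_ivreg2 = .0881959    // ivreg2 without the small option
scalar N = 1300                // observations
scalar M = 52                  // clusters
scalar K = 1 + 25              // assumption: 1 regressor + 25 year coefficients

display "ratio of reported SEs:       " %9.6f se_hdfe / se_ivreg2
display "sqrt((N-1)/(N-K) * M/(M-1)): " %9.6f sqrt((N-1)/(N-K) * M/(M-1))
```

Both numbers come out at roughly 1.0196, which is consistent with the difference being a degrees-of-freedom adjustment rather than a difference in the underlying estimator.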

Rename

To avoid confusion with Dany Bahar's code

estimates post: matrix has missing values r(504);

IVREGHDFE returns an "estimates post: matrix has missing values r(504);" error when a perfectly collinear variable is dropped from a model.

REGHDFE, OLD works fine when this happens.

Thanks for producing such a valuable public good!

assert_msg(): 3498 cols(vars)!=cols(y)

Hi Sergio,
Thank you very much for creating ivreghdfe.

To give you some context: I have a dataset containing firms, their establishments and their location. I have more or less 5000 firms and 12000 establishments, during more than 10 years and more than 50 municipalities. Currently, I am trying to run a regression at the headquarter (hq) level; something like

ivreghdfe y (x=z) i.year i.mun_hq if hq==1, absorb(idfirm) robust first

where mun_hq is a fixed effect indicating the municipality in which the firm's headquarter is located.

After running it I am getting the following message

(dropped 553 singleton observations)
assert_msg(): 3498 cols(var)!=cols(y)
FixedEffects::_partial_out(): - function returned error
FixedEffects::partial_out(): - function returned error
: - function returned error
r(3498);

Do you have an idea of what could be wrong?

Thanks a lot

Camilo

command reghdfe_store_alphas is unrecognized

Hi,

I am using ivreghdfe and would like to save fixed effects or the residuals with

ivreghdfe price weight, absorb(fe=trunk)
ivreghdfe price weight, absorb(trunk, resid(myresidname))

In either case, I obtain the error message:

"command reghdfe_store_alphas is unrecognized".
Doing the same with reghdfe works fine.

I tried to follow all steps of an earlier post (sergiocorreia/reghdfe#148) and should have the latest versions installed (see below).

Do you know what could cause the problem?
I would be really grateful for any suggestions on how to fix it.

Best
M

which ivreghdfe
...Library/Application Support/Stata/ado/plus/i/ivreghdfe.ado
*! ivreghdfe 1.0.1 05may2020
*! this just adds absorb() to this code:
*! ivreg2 4.1.10 9Feb2016
*! authors cfb & mes
*! see end of file for version comments

. which reghdfe
...Library/Application Support/Stata/ado/plus/r/reghdfe.ado
*! version 5.8.0 27dec2019

vce(cluster var) accepted, but ignored.

I recently realized that ivreghdfe accepts the vce(cluster clustvar) syntax, but seems to ignore it in my case. I am on Stata/BE 17, with ivreghdfe, reghdfe, ftools and ranktest just updated to the most recent versions as of today. The MWE below reproduces a simple example of three stacked IV regressions, where vce(cluster id) is accepted but returns what seem to be conventional standard errors, while cluster(id) works fine. I struggled with this for days!

Thanks a ton for all the great work on coding these up, by the way, and sorry if the mistake is between the chair and the keyboard.

clear all
set obs 100
gen id=_n
gen z=rnormal()
gen D=runiform()<0.3+0.4*z
expand 3
bys id: gen t=_n
gen y=t*D+2+rnormal()+10*rnormal()*(t>2)

ivregress 2sls y (c.D#t=t#c.z) i.t //conventional s.e.'s 
ivreghdfe y (c.D#t=t#c.z), absorb(t) vce(cluster id) //also conventional s.e.'s - vce(cluster id) accepted, but ignored.
ivregress 2sls y (c.D#t=t#c.z) i.t, cluster(id) //clustered s.e.'s 
ivreghdfe y (c.D#t=t#c.z), absorb(t) cluster(id) //also clustered s.e.'s

ivreghdfe reporting of F-statistics when instruments are indicator variables

Hi Sergio,

My first stage involves instruments that are interaction terms in the form i.Z1#i.Z2, where Z1 is collinear with some of the absorbed fixed effects. When I run the ivreghdfe regressions, I get the following warnings:

warning: -ranktest- error in calculating underidentification test statistics;
may be caused by collinearities

Warning: estimated covariance matrix of moment conditions not of full rank.
overidentification statistic not reported, and standard errors and
model tests should be interpreted with caution.
Possible causes:
singleton dummy variable (dummy with one 1 and N-1 0s or vice versa)
partial option may address problem.

The result of these warnings is that ivreghdfe does not report any F-statistics. I dug deeper into the cause of this problem, and I think ivreghdfe is not correctly detecting collinearities between the instruments and the fixed effects (possibly just when calculating the F-statistic?). Specifically, I do NOT have this problem when I:

1. Use the "old" option
2. Generate the interaction terms manually AND omit the correct number of terms from the first stage (but this is obviously not practical for large numbers of instruments/large datasets)
3. Use ivreg2

3 and 1 make me think that the problem is not ivreg2 itself but how it's implemented within ivreghdfe. I'm attaching a minimal working example of the problem as well as a log file of the results. I've verified that this is still a problem using the latest version of ivreghdfe from GitHub (downloaded today) as well as what's on SSC.

reghdfe_problem.txt
ivreghdfe_problem_log.txt

reghdfe_store_alphas is unrecognized

When I run the sample code in the README with the option to save fixed effects, I get an error message saying that the command reghdfe_store_alphas is unrecognized.

I've downloaded the latest version from GitHub; the code is below. Thank you!

. sysuse auto, clear
(1978 Automobile Data)

. ivreghdfe price weight, absorb(trunk, save)
(dropped 5 singleton observations)
(MWFE estimator converged in 1 iterations)
command reghdfe_store_alphas is unrecognized
r(199)

. which reghdfe
*! version 5.7.1 20mar2019

. which ivreghdfe
*! ivreghdfe 1.0.0  07jul2018
*! this just adds absorb() to this code:
*! ivreg2 4.1.10  9Feb2016
*! authors cfb & mes
*! see end of file for version comments

Coefficients differ when using absorb with analytic weights

Coefficients differ when using absorb with analytic weights. The issue is replicated in the code below. The change in coefficients is minor here, but it is nontrivial in the dataset I am using. Thanks!

sysuse auto
reghdfe price weight [aweight = mpg], a(turn) // coef = 4.320956
ivreg2 price weight i.turn [aweight = mpg], partial(i.turn) small // coef = 4.320956
ivreghdfe price weight i.turn [aweight = mpg] // coef = 4.320956
ivreghdfe price weight [aweight = mpg], a(turn) // coef = 4.367747

Fixed effects by combination of variables and absorbed degrees of freedom

Thank you for creating such nice code, ivreghdfe!
I encountered an issue with fixed effects that combine a subregion variable (137 distinct values) and decades (three distinct values). I created a variable for subregion#decade. When I use reghdfe, the absorbed degrees of freedom are 137 × 3 = 411, but with ivreghdfe, the absorbed degrees of freedom are 137 × 2 = 274. Also, the empirical results are the same as when I use only the two later (consecutive) decades. When using only the two later decades, the absorbed degrees of freedom look normal, 274. I am confused. Could you help me address this issue?

Incorrect coefficient with weights

It seems that ivreghdfe does not use weights when applying the HDFE transformation? Here is an example:

which ivreghdfe
which reghdfe
which ftools
set seed 42

qui {
    clear
    set obs 10000
    gen g   = ceil(exp(rnormal() * 3))
    gen p   = ceil(runiform() * 10)
    gen e   = rnormal()
    gen z   = rnormal() * 0.5 * (p > 8)
    egen mz = mean(z), by(g p)
    gen x   = 0.2 * mz + e / 4
    gen y   = (2 * rnormal() + 5 * ((g / 50) - 0.5) + 3 * x + e) > 0
}

qui ivreghdfe y (x = mz), absorb(g)
disp %-24s "full sample", _b[x]
gen obs = 1
collapse (mean) y x mz (sum) obs, by(g p)
qui ivreghdfe y (x = mz) [fw = obs], absorb(g)
disp %-24s "collapsed", _b[x]

qui {
    egen dy  = mean(y),  by(g)
    egen dx  = mean(x),  by(g)
    egen dmz = mean(mz), by(g)
    replace dy  = y  - dy
    replace dx  = x  - dx
    replace dmz = mz - dmz
    gstats transform (demean) y x mz [fw = obs], by(g) replace
}

qui ivregress 2sls dy (dx = dmz) [fw = obs], noconstant
disp %-24s "de-mean no weights", _b[dx]
qui ivregress 2sls y  (x  =  mz) [fw = obs], noconstant
disp %-24s "de-mean with weights", _b[x]

The output:


/homes/nber/caceres-dua54762/ado/plus/i/ivreghdfe.ado
*! ivreghdfe 1.1.0  25Feb2021
*! ivreg2 4.1.11  22Nov2019
*! authors cfb & mes
*! see end of file for version comments

/homes/nber/caceres-dua54762/ado/plus/r/reghdfe.ado
*! version 6.12.1 27June2021

/homes/nber/caceres-dua54762/ado/plus/f/ftools.ado
*! version 2.48.0 29mar2021

full sample              -.12088464
collapsed                -.08857211
de-mean no weights       -.0885721
de-mean with weights     -.12088463

ivreghdfe interaction terms

Hi, could you please tell me how to run ivreghdfe with an interaction with a categorical variable? My regression without the interaction is:
ivreghdfe y (x=w), r cl(district) a(district)
I want to interact x with the categorical variable state that has categories from 1 to 50. But the following command doesn't seem to work:
ivreghdfe y (c.x##i.state=c.w##i.state), r cl(district) a(district)

Thank you!

Incorrect results with `absorb`?

I'd assume this is an issue stemming from ftools or reghdfe, since this repo hasn't changed in some time. But I am not sure what part of it is causing this issue. In any case, the results from an IV with absorb are wrong:

local sergio https://raw.githubusercontent.com/sergiocorreia

cap ado uninstall reghdfe
cap ado uninstall ftools
cap ado uninstall ivreghdfe

net install reghdfe, from(`sergio'/reghdfe/master/src/) replace
net install ftools,  from(`sergio'/ftools/master/src/)  replace

reghdfe, compile
ftools,  compile

ssc install ivreg2
net install ivreghdfe, from(`sergio'/ivreghdfe/master/src/) replace

sysuse auto, clear
xi: ivreghdfe price (mpg = foreign) i.rep78
xi: ivreg     price (mpg = foreign) i.rep78
xi: ivreghdfe price (mpg = foreign), absorb(rep78)

The first two give the exact same coefficient, 10.335, but the third gives 0.0208.

Failure to install ivreghdfe

Dear Sergio,

When I reach the final step of installation: net install ivreghdfe, from(https://raw.githubusercontent.com/sergiocorreia/ivreghdfe/master/src/), Stata reports the errors as follows:

https://raw.githubusercontent.com/sergiocorreia/ivreghdfe/master/src/ either

is not a valid URL, or
could not be contacted, or
is not a Stata download site (has no stata.toc file).

Could you please check whether there is anything wrong with the link? (the "net install" command should be fine as I install reghdfe from GitHub successfully).

Best
Wei

Cannot access r(table)

I suspect that the use of
mata: _coef_table()
on line 2541 isn't working as intended since I cannot access r(table) after estimating my model

ivreghdfe/src/ivreghdfe.ado

Line 2541 in 9681fb8

 return add // adds r(level), r(table), etc. to ereturn (before the footnote deletes them) 
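A minimal way to check this report, sketched under the assumption that the help-file example on the auto dataset runs on the installed version: try to list `r(table)` right after estimation, then fall back on `_b[]` and `_se[]`, which only need `e(b)` and `e(V)`.

```stata
sysuse auto, clear
ivreghdfe price weight (length = gear), absorb(rep78)

* Nonzero return code here means r(table) is not available
capture matrix list r(table)
display "return code: " _rc

* Coefficient and standard error are still reachable through e(b) and e(V)
display "b = " _b[length] "  se = " _se[length]
```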

"Warning: variance matrix is nonsymmetric or highly singular" with reghdfe

I would like to absorb time FE in my regression as well as include a categorical variable among my explanatory variables (for age brackets).

When including age bracket FE as a regressor and absorbing time FE, I get the following message: "Warning: variance matrix is nonsymmetric or highly singular" and the standard errors are not estimated.

However, when running the exact same model while absorbing both time FE and age bracket FE, I get no warning and all standard errors are estimated. Is it safe to use these results?

PS: I am specifying robust standard errors in both estimations mentioned above. The same happens when I specify clustered standard errors for both estimations. The issue disappears only when my standard errors are neither clustered nor robust.

. ********************* I get the warning message when I estimate coefficients for age bracket and absorb time FE
. reghdfe avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age i.age_int, absorb(ym) vce(robust)
(MWFE estimator converged in 1 iterations)
Warning:  variance matrix is nonsymmetric or highly singular

HDFE Linear regression                            Number of obs   =  7,148,998
Absorbing 1 HDFE group                            F(  21,7148887) =   10378.63
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.0330
                                                  Adj R-squared   =     0.0330
                                                  Within R-sq.    =     0.0292
                                                  Root MSE        =  2096.0537

----------------------------------------------------------------------------------
                 |               Robust
   avg_peer_cost |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-----------------+----------------------------------------------------------------
          iv_age |  -31.59436          .        .       .            .           .
          iv_fem |    411.207          .        .       .            .           .
          iv_uni |   738.8013          .        .       .            .           .
         pat_fem |  -386.0751          .        .       .            .           .
         pat_age |   10.79922          .        .       .            .           .
                 |
         age_int |
 20 to 25 years  |  -1180.342          .        .       .            .           .
 25 to 30 years  |  -1009.705          .        .       .            .           .
 30 to 35 years  |  -795.4957          .        .       .            .           .
 35 to 40 years  |  -708.4765          .        .       .            .           .
 40 to 45 years  |  -698.3601          .        .       .            .           .
 45 to 50 years  |   -755.683          .        .       .            .           .
 50 to 55 years  |  -831.5416          .        .       .            .           .
 55 to 60 years  |  -909.2266          .        .       .            .           .
 60 to 65 years  |  -979.1657          .        .       .            .           .
 65 to 70 years  |  -935.7501          .        .       .            .           .
 70 to 75 years  |  -788.7044          .        .       .            .           .
 75 to 80 years  |  -607.2163          .        .       .            .           .
 80 to 85 years  |  -902.8926          .        .       .            .           .
 85 to 90 years  |  -975.3231          .        .       .            .           .
 90 to 95 years  |  -864.3475          .        .       .            .           .
95 to 100 years  |  -177.4297          .        .       .            .           .
                 |
           _cons |   3718.518          .        .       .            .           .
----------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ym |        90           0          90     |
-----------------------------------------------------+
. 
.********************* I don't get the warning any longer when I absorb the coefficients of age brackets together with time FE
. reghdfe avg_peer_cost iv_age iv_fem iv_uni pat_fem pat_age, absorb(ym age_int) vce(robust)
(MWFE estimator converged in 4 iterations)

HDFE Linear regression                            Number of obs   =  7,148,998
Absorbing 2 HDFE groups                           F(   5,7148887) =   39389.47
                                                  Prob > F        =     0.0000
                                                  R-squared       =     0.0330
                                                  Adj R-squared   =     0.0330
                                                  Within R-sq.    =     0.0260
                                                  Root MSE        =  2096.0537

------------------------------------------------------------------------------
             |               Robust
avg_peer_c~t |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      iv_age |  -31.59436   .2484534  -127.16   0.000    -32.08132    -31.1074
      iv_fem |    411.207   8.263385    49.76   0.000      395.011    427.4029
      iv_uni |   738.8013   4.091539   180.57   0.000      730.782    746.8205
     pat_fem |  -386.0751   2.196373  -175.78   0.000    -390.3799   -381.7703
     pat_age |   10.79922   .0335185   322.19   0.000     10.73353    10.86492
       _cons |   2902.447   12.48775   232.42   0.000     2877.971    2926.922
------------------------------------------------------------------------------

Absorbed degrees of freedom:
-----------------------------------------------------+
 Absorbed FE | Categories  - Redundant  = Num. Coefs |
-------------+---------------------------------------|
          ym |        90           0          90     |
     age_int |        17           1          16     |
-----------------------------------------------------+

struct ms_vcvorthog undefined

I'm getting the error struct ms_vcvorthog undefined when I use ivreghdfe.

I get the error from the MWE

clear
set obs 100
generate x = _n
generate y = x + rnormal()
generate z = x + rnormal()
ivreghdfe y (x=z)

I'm using Stata 16.1 and versions of ftools, reghdfe and ivreghdfe downloaded earlier today from https://github.com/sergiocorreia/ftools/archive/master.zip, https://github.com/sergiocorreia/reghdfe/archive/master.zip, https://github.com/sergiocorreia/ivreghdfe/archive/master.zip
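When reporting this kind of load error, it can help to show exactly which copy of each package Stata is picking up. The sketch below only prints versions and rebuilds the index of Mata libraries; it is a diagnostic, not a confirmed fix for the error above.

```stata
* Show the path and version header of each installed package
which ftools
which reghdfe
which ivreg2
which ivreghdfe

* Rebuild the index of Mata libraries found along the ado-path
mata: mata mlib index
```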

small correction for e(cmdline)

Hi, thanks for the plugin. e(cmdline) spits out that the command is ivreg2. Is it possible to introduce the correction that the command line is ivreghdfe instead of ivreg2?

Fix predict

sysuse auto
ivreg2hdfe price weight, a(trunk)
predict e, resid

Issue with ivreghdfe Command in Stata: "option requirements not allowed"

I'm currently working with the ivreghdfe command in Stata 18. Unexpectedly, I've started to face the following error, even though the same code and database didn't produce this error just a week ago:

option requirements not allowed
r(198);

To troubleshoot, I've updated the ftools, reghdfe, ivreg2, and ivreghdfe packages, but the issue persists.

Has anyone encountered this issue previously or have any insights into potential causes?

Thanks in advance!

Only include the absorbed regressors in DoF computation unless -partial- is set

todo: add `noabsorb` option

Note that absorb() is optional, so the functionality of noabsorb is already supported. This would only be for convenience with respect to reghdfe.
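The difference in syntax can be sketched on the auto dataset. The variable choices are illustrative only; reghdfe asks for either absorb() or noabsorb, while ivreghdfe accepts the command with absorb() simply left out.

```stata
sysuse auto, clear

* ivreghdfe: no fixed effects, so absorb() is omitted
ivreghdfe price weight (mpg = turn)

* reghdfe: the same "no fixed effects" case needs an explicit noabsorb
reghdfe price weight mpg, noabsorb
```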

Use ivreghdfe followed by predictnl

Hello Sergio,

I am using ivreghdfe 1.1.1 with ivreg2 4.1.11 on Stata 17 (Windows 10).
I want to estimate the predicted probability after having run an IV regression of the log odds ratio on covariates and fixed effects.
Here is what I run:

ivreghdfe log_odds_ratio (X = Z ) C [pw=weights], absorb(year county_fe) cluster(state)
predictnl pred_prob=exp(predict(xbd))/(1+exp(predict(xbd))) , se(pred_prob_se)
which returns:
you must add the resid option to reghdfe before running this prediction
predict(xbd) invalid
r(198);

then adding the resid option returns:
ivreghdfe log_odds_ratio (X = Z ) C [pw=weights], absorb(year county_fe) cluster(state) resid
predictnl pred_prob=exp(predict(xbd))/(1+exp(predict(xbd))) , se(pred_prob_se)

expression is a function of possibly stochastic quantities other than e(b)
r(498);

(Note: using margins instead of predictnl returns the exact same errors).

Any suggestions on how to estimate non-linear predictions with many fixed effects?

Thanks a lot!
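A possible workaround for the point prediction alone, sketched with the variable names from the post: it assumes `predict, xbd` is available after `ivreghdfe` once `resid` is specified, as the first error message suggests, and it does not produce the standard errors that `predictnl` would.

```stata
* Assumption: predict, xbd works after ivreghdfe when resid is specified
ivreghdfe log_odds_ratio (X = Z) C [pw=weights], absorb(year county_fe) cluster(state) resid
predict double xbd_hat, xbd

* invlogit(z) = exp(z)/(1+exp(z)), the same transformation as in the predictnl call
generate double pred_prob = invlogit(xbd_hat)
summarize pred_prob
```

Because the absorbed fixed effects are treated as estimated quantities outside `e(b)`, this gives fitted probabilities only; uncertainty would have to come from another method such as a bootstrap.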
