
Comments (4)

adamoppenheimer commented on August 30, 2024

I'll preface this by saying this is a relatively long response, so please let me know if you have any questions.

The code in fe.py will not work for your needs. It requires both a worker id and a firm id. Additionally, the code as it is currently written does not accommodate any additional covariates. The reason for this is that the code takes advantage of the sparsity of dummy variables.

In theory, this could be extended to allow for any covariates that are dummies, as is true in your case - but the code does not currently allow this. As a possible workaround, you could treat one of your characteristics as a "worker id" which would allow you to estimate the model. But it sounds like you are trying to estimate more than one characteristic, so this wouldn't be all that helpful. It also wouldn't help with estimating the firm effects alone.

My advice is to build off the code in fe.py to work for your needs. You can write out the least-squares estimator using block-matrix notation (use the formula here). This allows you to a) take advantage of sparsity, and b) only need to invert a much smaller matrix (you can use quadratic programming solvers to make this faster, we use qpsolvers in PyTwoWay).
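The approach above can be sketched with scipy's sparse matrices. This is a minimal toy example, not the fe.py implementation: the worker/firm ids and wages are made up, and the point is just that one-hot dummy designs keep the normal equations sparse.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import spsolve

# Toy panel (hypothetical data): worker ids i, firm ids j, log wages y.
i = np.array([0, 0, 1, 1, 2, 2])
j = np.array([0, 1, 1, 2, 0, 2])
y = np.array([1.0, 1.5, 2.0, 2.5, 1.2, 2.2])
n = len(y)

# Sparse one-hot dummy matrices: J for firm effects, W for worker effects.
W = sparse.csr_matrix((np.ones(n), (np.arange(n), i)), shape=(n, i.max() + 1))
J = sparse.csr_matrix((np.ones(n), (np.arange(n), j)), shape=(n, j.max() + 1))
J = J[:, :-1]  # drop one firm dummy: its effect is normalized to zero

# Stacked design A = [J | W]; the normal equations A'A gamma = A'y stay
# sparse, so only a small sparse system ever needs to be solved.
A = sparse.hstack([J, W]).tocsc()
gamma = spsolve((A.T @ A).tocsc(), A.T @ y)
psi, alpha = gamma[:2], gamma[2:]  # firm effects, then worker effects
```

In practice the same system can also be handed to a quadratic-programming solver, which is what the advice above refers to.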

Once you have the OLS estimates, computing the homoskedastic bias correction is relatively straightforward. You just compute the biased estimator, then estimate the correction term. The correction term requires estimating sigma_hat^2 and a trace term. You can use a numerical trace approximation to easily estimate this term (the code for this starts at line 548 in fe.py). Then you just take fe_biased - sigma_hat^2 * trace_term and this gives you the unbiased estimator.
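The numerical trace approximation referred to here is, in essence, a Hutchinson-style estimator with Rademacher draws. Below is a generic sketch of that idea, not the exact fe.py code, applied to a projection matrix whose trace is known:

```python
import numpy as np

def hutchinson_trace(mat_vec, dim, n_draws=2000, seed=0):
    """Approximate tr(M) via E[z' M z] = tr(M) with Rademacher draws z."""
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(n_draws):
        z = rng.choice([-1.0, 1.0], size=dim)
        total += z @ mat_vec(z)
    return total / n_draws

# Example: trace of the projection P = A (A'A)^{-1} A'. Only matrix-vector
# products with P are ever needed, never P itself.
A = np.random.default_rng(1).normal(size=(50, 5))
AtA_inv = np.linalg.inv(A.T @ A)
mat_vec = lambda z: A @ (AtA_inv @ (A.T @ z))

approx = hutchinson_trace(mat_vec, dim=50)
exact = np.trace(A @ AtA_inv @ A.T)  # equals 5, the rank of A
```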

The heteroskedastic correction is more work. I recommend you check out KSS (Appendix B: Computation, starting at page 56). Please note that what they call the "leave-two-out connected set" is really called "biconnected components" and you can easily find preexisting code to compute this (we use biconnected_components() in networkx).
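A minimal sketch of the networkx call, on a toy firm-level graph where an edge means at least one worker moved between the two firms (the firm ids are illustrative):

```python
import networkx as nx

# Toy mover graph: nodes are firms, edges are worker moves between firms.
G = nx.Graph()
G.add_edges_from([(1, 2), (2, 3), (3, 1),   # a biconnected triangle
                  (3, 4)])                  # firm 4 hangs off firm 3

# Biconnected components: maximal subgraphs that remain connected after
# removing any single node (here, any single firm).
components = list(nx.biconnected_components(G))
largest = max(components, key=len)
# largest is {1, 2, 3}; the bridge (3, 4) forms its own component {3, 4}
```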

Finally, please note that we wrote the code for our estimators so that they can (but don't have to) be run on data collapsed at the spell level (any worker-firm match is collapsed into a single observation, where the wage is the mean over the spell). Collapsing requires re-deriving all the estimators using a "collapse" matrix, so if you want to collapse your data you will have to work out that derivation - if you ask, I may have time to put some of the math together and send it to you. If you don't want to deal with this, the estimators all work on uncollapsed data (although you then can't get around the issue of correlated errors within spells, which collapsing resolves). In that case, just make sure your data has not been collapsed before you run the estimators.
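Collapsing at the match level can be sketched with pandas (toy data; the column names are illustrative, not the package's required schema):

```python
import pandas as pd

# Hypothetical long-format data: one row per worker-firm-period.
df = pd.DataFrame({
    "i": [0, 0, 0, 1, 1],            # worker id
    "j": [5, 5, 6, 5, 5],            # firm id
    "y": [1.0, 1.2, 2.0, 1.4, 1.6],  # log wage
})

# Collapse each worker-firm match to one observation: the mean wage plus
# a weight equal to the number of underlying observations.
collapsed = (df.groupby(["i", "j"], as_index=False)
               .agg(y=("y", "mean"), w=("y", "size")))
```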

I hope this helps!

from pytwoway.

Alalalalaki commented on August 30, 2024

@adamoppenheimer Thanks so much for your detailed response. I have several related questions. Apologies in advance for any ignorance on my part.

As a possible workaround, you could treat one of your characteristics as a "worker id" which would allow you to estimate the model. But it sounds like you are trying to estimate more than one characteristic, so this wouldn't be all that helpful. It also wouldn't help with estimating the firm effects alone.

Can I combine several characteristics into one category, say [Edu=B&Exp=1&Sex=F, ...], and then estimate the model? Does this make sense for the code here? One thing I am worried about with the KSS estimator is that, because the worker categories are less granular than the firm ids, the code might not find a leave-one-out connected subset within the largest connected set. Actually, given that there are no movers in my data at all, I am not clear how the connected set and the leave-one/two-out connected subsets should be constructed.

My advice is to build off the code in fe.py to work for your needs. You can write out the least-squares estimator using block-matrix notation (use the formula here).

This is one of the problems I have encountered. I am not familiar with the methods used to solve least-squares estimators with high-dimensional fixed effects. I have tried some packages that do this, e.g. AbsorbingLS in linearmodels, which uses "LSMR which avoids inverting or even constructing the inner product of the regressors. This is combined with Frisch-Waugh-Lovell to orthogonalize x and y from z. (y=xβ+zγ+ε)". I wonder what the difference is between the block-matrix method and the LSMR method?

Once you have the OLS estimates, computing the homoskedastic bias correction is relatively straightforward. You just compute the biased estimator, then estimate the correction term. The correction term requires estimating sigma_hat^2 and a trace term. You can use a numerical trace approximation to easily estimate this term (the code for this starts at line 548 in fe.py)

I guess this trace term is the B_ii in the KSS paper. I have tried to read the code you pointed to, but I cannot understand what it does, even though it does not look complicated. Is this approximation the JLA described in the KSS paper?

The heteroskedastic correction is more work. I recommend you check out KSS (Appendix B: Computation, starting at page 56). Please note that what they call the "leave-two-out connected set" is really called "biconnected components" and you can easily find preexisting code to compute this (we use biconnected_components() in networkx).

Again, I am not clear on how and why I should construct the connected set when my data has no movers at all.

Finally, please note that we wrote the code for our estimators so that they can (but don't have to) be run on data that is collapsed at the spell level (so any worker-firm match is collapsed into a single observation, where the wage is the mean over the spell). ... But then you have to make sure not to collapse your data before running the estimators.

Do you mean that the code in fe.py is meant for data where every worker-firm match is summarized by its mean wage and its weight in the total sample? What is the benefit of doing this? And if I don't collapse the data, does the algorithm here still work, just with issues of correlated errors within spells (don't these within-spell errors simply go into the error term)?


Alalalalaki commented on August 30, 2024

After studying your code for a while, I find that what I really struggle to understand is M and __mult_AAinv. I find it quite hard to follow the code with A split into J and W, because in the KSS paper they are written together. It would be greatly appreciated if you could point me to some sources for these formulas that I can learn from. And can I simply replace W with a matrix X of worker characteristics?

Another part I don't understand: if we can get Sxx^-1 through __mult_AAinv, why do we still need the trace approximation rather than calculating the Bii directly? And, by the way, there is a non-linearity bias introduced by approximating the Pii, which is removed from the HE-estimated σ in the KSS paper, but I don't find this in the code. Is it not important?


adamoppenheimer commented on August 30, 2024

Hi, I really apologize for the long response time.

I think I have a better understanding of the theory and code now, so I should be able to give a better response. In addition, the code has seen many updates - it is much more reliable now than it was when you originally asked your questions.

If you are still interested in using the package, I highly recommend checking out the documentation to see what has changed.

Please let me know if any of what I wrote below isn't clear, or if you have any other questions!

-----No workers-----
There is actually a very simple way to run the estimator if you don't have workers: simply set all observations to have the same worker id. When the model is estimated, it normalizes the firm with the maximum firm id to have a fixed effect of 0. Because of this, the single "worker effect" will just be the mean from the regression, accounting for the fact that one of the firms is normalized to have effect 0.

If you run this estimator, you can compute the bias-corrected variance of the firm effects. You can also compute the bias-corrected variance of the worker effect and bias-corrected covariance of the worker and firm effects, but these will just be 0.

In addition, you can extract the estimated effects for all the workers by setting 'attach_fe_estimates'=True in your FE estimation parameters (this is described in the documentation), or you can extract just the psi_hat and alpha_hat using the class attributes .psi_hat and .alpha_hat (note that .psi_hat will not contain the normalized firm, you can add this by appending a 0 at the end of the array).
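Both steps can be sketched as follows. Note that the column names and the psi_hat values below are hypothetical stand-ins for illustration, not the package's actual output:

```python
import numpy as np
import pandas as pd

# Hypothetical data with firm ids and log wages but no worker ids.
df = pd.DataFrame({"j": [0, 0, 1, 1, 2], "y": [1.0, 1.1, 2.0, 2.1, 1.5]})

# Give every observation the same worker id; the lone "worker effect"
# then absorbs the regression mean.
df["i"] = 0

# Stand-in for the fitted estimator's .psi_hat, which omits the firm
# normalized to a zero effect; re-attach that firm by appending a 0.
psi_hat = np.array([0.4, -0.1])
psi_full = np.append(psi_hat, 0.0)
```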

-----M-----
The code for M and __mult_AAinv takes advantage of the structure of the linear regression. Since the regression involves two categorical variables, we can write the problem as follows:

Y = A * gamma + epsilon

Then, we can split A into two components, J for firm effects, and W for worker effects. Similarly, we can split gamma into psi for firm effects and alpha for worker effects.

Then we can write

Y = [J | W] * [psi' | alpha']' + epsilon

If you go through the matrix algebra for the OLS estimator of gamma, you can write it in block-matrix notation and take advantage of the easy-to-compute block-matrix inverse (which is shown here). This allows for some big speedups. The one downside of this approach is that it prevents computationally feasible estimation of AKM and its bias corrections while using control variables.
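For concreteness, here is a hedged sketch of that block elimination on toy data (not the fe.py implementation). Because W stacks worker dummies, W'W is diagonal and trivially inverted; partialling W out of the regression (Frisch-Waugh-Lovell) then leaves only a small system in the firm effects psi:

```python
import numpy as np
from scipy import sparse

# Toy panel (hypothetical data): worker ids i, firm ids j, log wages y.
i = np.array([0, 0, 1, 1, 2, 2])
j = np.array([0, 1, 1, 2, 0, 2])
y = np.array([1.0, 1.5, 2.0, 2.5, 1.2, 2.2])
n = len(y)
W = sparse.csr_matrix((np.ones(n), (np.arange(n), i)), shape=(n, i.max() + 1))
J = sparse.csr_matrix((np.ones(n), (np.arange(n), j)), shape=(n, j.max() + 1))
J = J[:, :-1]  # normalize the last firm's effect to zero

# W'W is diagonal (each row has exactly one worker dummy), so its
# inverse is just one over the per-worker observation counts.
Dinv = sparse.diags(1.0 / np.asarray(W.sum(axis=0)).ravel())

def M_W(x):
    """Annihilator: M_W x = x - W (W'W)^{-1} W' x (demean within worker)."""
    return x - W @ (Dinv @ (W.T @ x))

# Block elimination: solve the small system (J' M_W J) psi = J' M_W y for
# the firm effects, then back out the worker effects alpha.
psi = np.linalg.solve(J.T @ M_W(J.toarray()), J.T @ M_W(y))
alpha = Dinv @ (W.T @ (y - J @ psi))
```

The system solved for psi has dimension equal to the number of firms minus one, rather than workers plus firms, which is where the speedup comes from.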

-----LSMR-----
I apologize, but I don't know the LSMR method. I hope my explanation above makes it clear what our code is doing.

-----Combining Characteristics to Create Categories-----
It would definitely be possible to combine characteristics to create categories, and treat those as "worker ids". However, you would need to look into whatever literature you are studying to see if that is an acceptable step to take.

In addition, you would then need the data to be "connected" (which, as you pointed out, needs some justification to make sense). If the data isn't connected, the estimators won't be identified, so either the code will crash or it will give nonsensical results.

-----Replacing W with X-----
Unfortunately, as I wrote above, the approach we use in this code does not allow for the feasible use of control variables. The only option you would have would be to substitute W with another categorical variable - but as I said above, the data would then need to be "connected".

-----Bii-----
You are correct that the trace approximation is used to compute the Bii. There is now an option to compute the Bii analytically (which I believe is what you are proposing). This is very computationally costly - I believe you need to estimate the Bii separately for every observation, and there could be millions of observations or even more (the implementation I wrote uses the M matrix, so it's somewhat efficient, but it's still slow). On the other hand, the trace approximation is relatively accurate with as few as 50 or so trace draws.
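For reference, here is a generic sketch (again, not the pytwoway implementation) contrasting the exact per-observation computation with a JLA-style random-projection approximation, using the leverages Pii of a small dense stand-in design:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(200, 10))       # generic stand-in design matrix
AtA_inv = np.linalg.inv(A.T @ A)

# Exact route: one quadratic form Pii = a_i' (A'A)^{-1} a_i per
# observation. With millions of observations this is the expensive part.
Pii_exact = np.einsum('ij,jk,ik->i', A, AtA_inv, A)

# JLA-style sketch: P = A (A'A)^{-1} A' is a projection, so
# Pii = ||P e_i||^2, and a Johnson-Lindenstrauss matrix Omega
# (p x n, Rademacher entries scaled by 1/sqrt(p)) gives
# Pii ~ ||Omega P e_i||^2 using only p normal-equation solves.
p = 2000
Omega = rng.choice([-1.0, 1.0], size=(p, 200)) / np.sqrt(p)
OmegaP = (Omega @ A) @ AtA_inv @ A.T  # dense here for clarity
Pii_approx = (OmegaP ** 2).sum(axis=0)
```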

However, if I am incorrect about this, please let me know - I would greatly appreciate it if there is a way to speed up the code while also removing an unnecessary approximation!

-----Non-linearity bias-----
Thank you for pointing this out! The correction is now included in our code. However, since I added the code, I haven't noticed any significant changes - as you suggested, I don't think it has much of an effect.

