
Comments (10)

adamoppenheimer commented on August 30, 2024

Since you figured it out, I'll mark this as closed.

Also, to simplify your code a bit, if you just want the vector of firm fixed effects, you can extract them by running np.append(0, fe_estimator.psi_hat) (the first firm is normalized to 0, so it needs to be added back).
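
For example, something like this (a minimal sketch, assuming numpy is imported as np and fe_estimator has already been fit):

psi_full = np.append(0, fe_estimator.psi_hat)  # firm effects for every firm; the first entry is the normalized reference firm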

Best,
Adam

adamoppenheimer commented on August 30, 2024

Hi Samuel,

Everything will be linked to the new j ids.

Try out this code to simulate data (this is code I eventually plan to add to the package, but I haven't had time to check that it works, so please let me know if it doesn't):

import numpy as np
import pandas as pd
import bipartitepandas as bpd
import pytwoway as tw

n_workers = 750
n_firms = 5
n_periods = 2
## Firm effects ##
psi = np.random.normal(size=n_firms)
exp_psi = np.exp(psi)
exp_psi /= exp_psi.sum()
psi = np.log(exp_psi)
## Simulate data ##
i_sim = np.repeat(np.arange(n_workers), n_periods)
t_sim = np.tile(np.arange(n_periods), n_workers)
j_sim = np.zeros((n_workers, n_periods), dtype=int)
y_sim = np.zeros(n_workers * n_periods)

## Simulate firms ##
for i in range(n_workers):
    # Simulate first firm randomly
    j1 = np.random.choice(range(n_firms))
    j_sim[i, 0] = j1
    j_prev = j1
    prev_match_draw = psi[j1] + np.random.gumbel()
    for t in range(1, n_periods):
        # Draw a candidate firm jt != j_prev; the worker moves with probability exp(psi[jt]) / (exp(psi[jt]) + exp(psi[j_prev]))
        jt = np.random.choice(list(range(j_prev)) + list(range(j_prev + 1, n_firms)))
        match_draw = psi[jt] + np.random.gumbel()
        if np.random.uniform() < exp_psi[jt] / (exp_psi[jt] + exp_psi[j_prev]):
#         if match_draw >  prev_match_draw:
            # Move!
            j_sim[i, t] = jt
            j_prev = jt
            prev_match_draw = match_draw
        else:
            j_sim[i, t] = j_prev

j_sim = j_sim.flatten()

## Construct BipartitePandas DataFrame ##
default = True
if default:
    clean_params = bpd.clean_params({'connectedness': 'strongly_connected', 'verbose': False})
    sim_data = bpd.BipartiteDataFrame(i=i_sim, j=j_sim, y=y_sim, t=t_sim, track_id_changes=True).clean(clean_params)
    sim_data = sim_data.collapse(is_sorted=True, copy=False)
    sim_es = sim_data.to_eventstudy(is_sorted=True, copy=False)
else:
    ## Construct BipartitePandas DataFrame ##
    clean_params = bpd.clean_params({'connectedness': 'strongly_connected', 'drop_returns': 'returns', 'verbose': False})
    sim_data = bpd.BipartiteDataFrame(i=i_sim, j=j_sim, y=y_sim, t=t_sim, track_id_changes=True).clean(clean_params)
    sim_data = sim_data.collapse(is_sorted=True, copy=False)
    sim_es = sim_data.to_permutedeventstudy(is_sorted=True, copy=False)
sim_es['w'] = sim_es['w2'] / sim_es['w1']

sorkin_sim = tw.SorkinEstimator()
sorkin_sim.fit(sim_es)

M0 = sim_es.groupby(['j1', 'j2'])['w'].sum().unstack(fill_value=0).to_numpy().T
S0inv = np.diag(1 / M0.sum(axis=0))
print(np.linalg.eig(S0inv @ M0)[0][0])
evec = np.real(np.linalg.eig(S0inv @ M0)[1][:, 0])
evec /= evec.sum()
np.log(evec)

print(np.corrcoef(sorkin_sim.V_EE, psi)[0, 1])
print(np.corrcoef(np.log(evec), psi)[0, 1])

Best,
Adam

samuelskoda commented on August 30, 2024

Thanks, the code works, but unfortunately I am still confused. Ultimately, I want to correlate AKM firm fixed effects with the Sorkin estimates. The way I am doing it is that, after estimating Sorkin, I attach the original IDs to the event-study format dataframe (dataframe = bdf.original_ids()), save j1 and original_j1, and match the V_EE output to j1. Then I use original_j1 to match to the similarly saved AKM estimates, roughly as in the sketch below. Does that make sense?
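
Concretely, this is roughly what I mean (a sketch rather than my exact script; it assumes the V_EE vector is ordered by the cleaned j1 ids, which is exactly what I want to confirm):

es_ids = bdf.original_ids()  # event-study dataframe with the original ids (original_j1) attached
crosswalk = es_ids[['j1', 'original_j1']].drop_duplicates()
sorkin_df = pd.DataFrame({'j1': np.arange(len(sorkin_estimator.V_EE)),
                          'v_ee': sorkin_estimator.V_EE})
sorkin_df = sorkin_df.merge(crosswalk, on='j1')
# then I merge sorkin_df to the saved AKM estimates on original_j1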

adamoppenheimer commented on August 30, 2024

You can use the attach_fe_estimates option to store the FE results in your dataframe automatically. If you're using FEEstimator, this will attach the estimates; if you're using FEControlEstimator, you can set it to True to store psi and alpha, or to 'all' to store all of the estimated parameters.
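
For example, a minimal sketch with FEEstimator (using the parameter and column names that appear elsewhere in this thread):

fe_params = tw.fe_params({'attach_fe_estimates': True})
fe_estimator = tw.FEEstimator(bdf, fe_params)
fe_estimator.fit()
# bdf now includes a psi_hat column with the estimated firm effects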

Please let me know if this helps.

Best,
Adam

samuelskoda commented on August 30, 2024

Right, I am using attach_fe_estimates to obtain AKM estimates linked to the original IDs. My question was about making sure that I am correctly matching the Sorkin estimates to the same original IDs as in the AKM estimation. I am sorry if I am not being super clear. I was doing it the way described above because the attached FE estimates were dropped when I transformed the data to the event study format needed for the Sorkin estimator.

adamoppenheimer commented on August 30, 2024

The attached FE estimates should not be dropped when you transform the data, as long as you convert it to a BipartitePandas dataframe with all the columns before converting. Can you send the code you're using before you convert to event study format?

samuelskoda commented on August 30, 2024

I think I figured it out: I first had to use the .add_column() method to avoid dropping the psi_hat estimates.

# Pull the data (Data.get here is Stata's sfi interface); clean_params and fe_params are defined earlier in my script
dataraw = Data.get('i j year  wages ', valuelabel=True, missingval=np.nan)
df = pd.DataFrame(dataraw, columns=['i', 'j', 't', 'y'])

# Construct, clean, and collapse the BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(df, track_id_changes=True)
bdf = bdf.clean(clean_params)
bdf = bdf.collapse(is_sorted=True, copy=False)

# Estimate AKM fixed effects (attach_fe_estimates adds psi_hat to bdf)
fe_estimator = tw.FEEstimator(bdf, fe_params)
fe_estimator.fit()
fe_estimator.summary

# Declare the attached estimates as a custom column so they survive the conversion
bdf['akm'] = bdf['psi_hat'].astype('float')
bdf = bdf.add_column(col_name='akm', dtype='float', how_collapse='mean', long_es_split=True)

# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)

akm_series = bdf.groupby('j1').first()['akm1']

# Estimate Sorkin and correlate with the AKM firm effects
sorkin_estimator = tw.SorkinEstimator()
sorkin_estimator.fit(bdf)

np.corrcoef(akm_series, sorkin_estimator.V_EE)[0, 1]

MartinFriedrich93 commented on August 30, 2024

Hello,

I had the same question because I want to export the estimated v_ee values along with the firm IDs. I am not sure whether I have done it right. Can you please have a quick look at the bottom part of my code and confirm that I paired the estimated v_ee values to the correct firm IDs (j)?

My question, in other words, is whether the index of Series(sorkin_estimator.V_EE) is equal to the index of bdf['j1'].drop_duplicates().to_frame().sort_values('j1').reset_index(drop=True).

Thank you!
Best
Martin

import pandas as pd
from pandas import Series
import pytwoway as tw
import bipartitepandas as bpd
 
 
# Cleaning
clean_params = bpd.clean_params(
    {
        'connectedness': 'strongly_connected',
        'drop_single_stayers': True,
        'drop_returns': 'returners',
        'copy': False
    }
)
# Simulating
sim_params = bpd.sim_params(
    {
        'n_workers': 1000,
        'firm_size': 5,
        'alpha_sig': 2, 'w_sig': 2,
        'c_sort': 1.5, 'c_netw': 1.5,
        'p_move': 0.1
    }
)
 
 
sim_data = bpd.SimBipartite(sim_params).simulate()
 
 
# Convert into BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(sim_data)
# Clean
bdf = bdf.clean(clean_params)
# Collapse
bdf = bdf.collapse(is_sorted=True, copy=False)
# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)
 
# Initialize Sorkin estimator
sorkin_estimator = tw.SorkinEstimator()
# Fit Sorkin estimator
sorkin_estimator.fit(bdf)
 
# export Sorkin's v_ee values along with firm IDs
v_ee_values = Series(sorkin_estimator.V_EE).to_frame(name='v_ee_values').reset_index()
v_ee_values['j1'] = v_ee_values['index']
del v_ee_values['index']
bdf_js = bdf['j1'].drop_duplicates().to_frame().sort_values('j1').reset_index(drop=True)
#print(bdf_js.head(100))
df_export = pd.merge(bdf_js, v_ee_values, on='j1').reset_index(drop=True)
print(df_export.head(10))
#df_export.to_csv('v_ee_values.csv')

adamoppenheimer commented on August 30, 2024

Hi Martin,

Your code looks correct. However, I would recommend not merging on j1, since it's possible a firm only shows up in the second period (in which case it appears only in j2 and a merge on j1 would drop it).

I would also recommend taking advantage of NumPy indexing since it's simpler than merging.

Here is the code I would propose:

bdf2 = bdf.to_long()
bdf2['v_ee_values'] = sorkin_estimator.V_EE[bdf2['j']]
df_export = bdf2.groupby('j')['v_ee_values'].first().reset_index()

Best,
Adam

MartinFriedrich93 commented on August 30, 2024

This is great! Thank you so much, Adam!

Best
Martin
