
Comments (10)

adamoppenheimer commented on August 30, 2024

Since you figured it out, I'll mark this as closed.

Also, to simplify your code a bit, if you just want the vector of firm fixed effects, you can extract them by running np.append(0, fe_estimator.psi_hat) (the first firm is normalized to 0, so it needs to be added back).
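
For example, something like this (a minimal sketch, assuming numpy is imported as np and fe_estimator has already been fit):

psi_full = np.append(0, fe_estimator.psi_hat)  # firm effects for every firm; the first entry is the normalized reference firm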

Best,
Adam

adamoppenheimer commented on August 30, 2024

Hi Samuel,

Everything will be linked to the new j ids.

Try out this code to simulate data (this is code I eventually plan to add to the package, but I haven't had time to check that it works, so please let me know if it doesn't):

import numpy as np
import pandas as pd
import bipartitepandas as bpd
import pytwoway as tw

n_workers = 750
n_firms = 5
n_periods = 2
## Firm effects ##
psi = np.random.normal(size=n_firms)
exp_psi = np.exp(psi)
exp_psi /= exp_psi.sum()
psi = np.log(exp_psi)
## Simulate data ##
i_sim = np.repeat(np.arange(n_workers), n_periods)
t_sim = np.tile(np.arange(n_periods), n_workers)
j_sim = np.zeros((n_workers, n_periods), dtype=int)
y_sim = np.zeros(n_workers * n_periods)

## Simulate firms ##
for i in range(n_workers):
    # Simulate first firm randomly
    j1 = np.random.choice(range(n_firms))
    j_sim[i, 0] = j1
    j_prev = j1
    prev_match_draw = psi[j1] + np.random.gumbel()
    for t in range(1, n_periods):
        # Draw a candidate firm jt != j_prev; the worker moves with probability exp(psi[jt]) / (exp(psi[jt]) + exp(psi[j_prev]))
        jt = np.random.choice(list(range(j_prev)) + list(range(j_prev + 1, n_firms)))
        match_draw = psi[jt] + np.random.gumbel()
        if np.random.uniform() < exp_psi[jt] / (exp_psi[jt] + exp_psi[j_prev]):
#         if match_draw >  prev_match_draw:
            # Move!
            j_sim[i, t] = jt
            j_prev = jt
            prev_match_draw = match_draw
        else:
            j_sim[i, t] = j_prev

j_sim = j_sim.flatten()

## Construct BipartitePandas DataFrame ##
default = True
if default:
    clean_params = bpd.clean_params({'connectedness': 'strongly_connected', 'verbose': False})
    sim_data = bpd.BipartiteDataFrame(i=i_sim, j=j_sim, y=y_sim, t=t_sim, track_id_changes=True).clean(clean_params)
    sim_data = sim_data.collapse(is_sorted=True, copy=False)
    sim_es = sim_data.to_eventstudy(is_sorted=True, copy=False)
else:
    ## Construct BipartitePandas DataFrame ##
    clean_params = bpd.clean_params({'connectedness': 'strongly_connected', 'drop_returns': 'returns', 'verbose': False})
    sim_data = bpd.BipartiteDataFrame(i=i_sim, j=j_sim, y=y_sim, t=t_sim, track_id_changes=True).clean(clean_params)
    sim_data = sim_data.collapse(is_sorted=True, copy=False)
    sim_es = sim_data.to_permutedeventstudy(is_sorted=True, copy=False)
sim_es['w'] = sim_es['w2'] / sim_es['w1']

sorkin_sim = tw.SorkinEstimator()
sorkin_sim.fit(sim_es)

M0 = sim_es.groupby(['j1', 'j2'])['w'].sum().unstack(fill_value=0).to_numpy().T
S0inv = np.diag(1 / M0.sum(axis=0))
print(np.linalg.eig(S0inv @ M0)[0][0])
evec = np.real(np.linalg.eig(S0inv @ M0)[1][:, 0])
evec /= evec.sum()
np.log(evec)

print(np.corrcoef(sorkin_sim.V_EE, psi)[0, 1])
print(np.corrcoef(np.log(evec), psi)[0, 1])

Best,
Adam

samuelskoda commented on August 30, 2024

Thanks, the code works, but unfortunately I am still confused. Ultimately, I want to correlate AKM firm fixed effects with the Sorkin estimates. The way I am doing it is that, after estimating Sorkin, I attach the original IDs to the event-study format dataframe (dataframe = bdf.original_ids()), save j1 and original_j1, and match the V_EE output to j1. Then I use original_j1 to match to the similarly saved AKM estimates, roughly as in the sketch below. Does that make sense?
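
Concretely, this is roughly what I mean (a sketch rather than my exact script; it assumes the V_EE vector is ordered by the cleaned j1 ids, which is exactly what I want to confirm):

es_ids = bdf.original_ids()  # event-study dataframe with the original ids (original_j1) attached
crosswalk = es_ids[['j1', 'original_j1']].drop_duplicates()
sorkin_df = pd.DataFrame({'j1': np.arange(len(sorkin_estimator.V_EE)),
                          'v_ee': sorkin_estimator.V_EE})
sorkin_df = sorkin_df.merge(crosswalk, on='j1')
# then I merge sorkin_df to the saved AKM estimates on original_j1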

adamoppenheimer commented on August 30, 2024

You can use the attach_fe_estimates option to store the FE results in your dataframe automatically. If you're using FEEstimator, this will attach the estimates; if you're using FEControlEstimator, you can set it to True to store psi and alpha, or to 'all' to store all of the estimated parameters.
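
For example, a minimal sketch with FEEstimator (using the parameter and column names that appear elsewhere in this thread):

fe_params = tw.fe_params({'attach_fe_estimates': True})
fe_estimator = tw.FEEstimator(bdf, fe_params)
fe_estimator.fit()
# bdf now includes a psi_hat column with the estimated firm effects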

Please let me know if this helps.

Best,
Adam

samuelskoda commented on August 30, 2024

Right, I am using attach_fe_estimates to obtain AKM estimates linked to the original IDs. My question was about making sure that I am correctly matching the Sorkin estimates to the same original IDs as in the AKM estimation. I am sorry if I am not being super clear. I was doing it the way described above because the attached FE estimates were dropped when I transformed the data to the event study format needed for the Sorkin estimator.

adamoppenheimer commented on August 30, 2024

The attached FE estimates should not be dropped when you transform the data, as long as you convert it to a BipartitePandas dataframe with all the columns before converting. Can you send the code you're using before you convert to event study format?

samuelskoda commented on August 30, 2024

I think I figured it out: I first had to use the .add_column() method to avoid dropping the psi_hat estimates.

# Pull the data (Data.get here is Stata's sfi interface); clean_params and fe_params are defined earlier in my script
dataraw = Data.get('i j year  wages ', valuelabel=True, missingval=np.nan)
df = pd.DataFrame(dataraw, columns=['i', 'j', 't', 'y'])

# Construct, clean, and collapse the BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(df, track_id_changes=True)
bdf = bdf.clean(clean_params)
bdf = bdf.collapse(is_sorted=True, copy=False)

# Estimate AKM fixed effects (attach_fe_estimates adds psi_hat to bdf)
fe_estimator = tw.FEEstimator(bdf, fe_params)
fe_estimator.fit()
fe_estimator.summary

# Declare the attached estimates as a custom column so they survive the conversion
bdf['akm'] = bdf['psi_hat'].astype('float')
bdf = bdf.add_column(col_name='akm', dtype='float', how_collapse='mean', long_es_split=True)

# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)

akm_series = bdf.groupby('j1').first()['akm1']

# Estimate Sorkin and correlate with the AKM firm effects
sorkin_estimator = tw.SorkinEstimator()
sorkin_estimator.fit(bdf)

np.corrcoef(akm_series, sorkin_estimator.V_EE)[0, 1]

MartinFriedrich93 commented on August 30, 2024

Hello,

I had the same question because I want to export the estimated v_ee values along with the firm IDs. I am not sure whether I have done it right. Can you please have a quick look at the bottom part of my code and confirm that I paired the estimated v_ee values to the correct firm IDs (j)?

My question, in other words, is whether the index of Series(sorkin_estimator.V_EE) is equal to the index of bdf['j1'].drop_duplicates().to_frame().sort_values('j1').reset_index(drop=True).

Thank you!
Best
Martin

import pandas as pd
from pandas import Series
import pytwoway as tw
import bipartitepandas as bpd
 
 
# Cleaning
clean_params = bpd.clean_params(
    {
        'connectedness': 'strongly_connected',
        'drop_single_stayers': True,
        'drop_returns': 'returners',
        'copy': False
    }
)
# Simulating
sim_params = bpd.sim_params(
    {
        'n_workers': 1000,
        'firm_size': 5,
        'alpha_sig': 2, 'w_sig': 2,
        'c_sort': 1.5, 'c_netw': 1.5,
        'p_move': 0.1
    }
)
 
 
sim_data = bpd.SimBipartite(sim_params).simulate()
 
 
# Convert into BipartitePandas DataFrame
bdf = bpd.BipartiteDataFrame(sim_data)
# Clean
bdf = bdf.clean(clean_params)
# Collapse
bdf = bdf.collapse(is_sorted=True, copy=False)
# Convert to event study format
bdf = bdf.to_eventstudy(is_sorted=True, copy=False)
 
# Initialize Sorkin estimator
sorkin_estimator = tw.SorkinEstimator()
# Fit Sorkin estimator
sorkin_estimator.fit(bdf)
 
# export Sorkin's v_ee values along with firm IDs
v_ee_values = Series(sorkin_estimator.V_EE).to_frame(name='v_ee_values').reset_index()
v_ee_values['j1'] = v_ee_values['index']
del v_ee_values['index']
bdf_js = bdf['j1'].drop_duplicates().to_frame().sort_values('j1').reset_index(drop=True)
#print(bdf_js.head(100))
df_export = pd.merge(bdf_js, v_ee_values, on='j1').reset_index(drop=True)
print(df_export.head(10))
#df_export.to_csv('v_ee_values.csv')

adamoppenheimer commented on August 30, 2024

Hi Martin,

Your code looks correct. However, I would recommend not merging on j1, since it's possible a firm only shows up in the second period (in which case it appears only in j2 and a merge on j1 would drop it).

I would also recommend taking advantage of NumPy indexing since it's simpler than merging.

Here is the code I would propose:

bdf2 = bdf.to_long()
bdf2['v_ee_values'] = sorkin_estimator.V_EE[bdf2['j']]
df_export = bdf2.groupby('j')['v_ee_values'].first().reset_index()

Best,
Adam

MartinFriedrich93 commented on August 30, 2024

This is great! Thank you so much, Adam!

Best
Martin
