
pytwoway's Introduction

PyTwoWay


PyTwoWay is the Python package associated with the following paper:

"How Much Should we Trust Estimates of Firm Effects and Worker Sorting?" by Stéphane Bonhomme, Kerstin Holzheu, Thibaut Lamadon, Elena Manresa, Magne Mogstad, and Bradley Setzler. No. w27368. National Bureau of Economic Research, 2020.

The package provides implementations for a series of estimators for models with two sided heterogeneity:

  1. two way fixed effect estimator as proposed by Abowd, Kramarz, and Margolis
  2. homoskedastic bias correction as in Andrews, et al.
  3. heteroskedastic bias correction as in Kline, Saggio, and Sølvsten
  4. group fixed estimator as in Bonhomme, Lamadon, and Manresa
  5. group correlated random effect as presented in the main paper
  6. fixed-point revealed preference estimator as in Sorkin
  7. estimator as in Borovičková and Shimer for a modified definition of sorting
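For orientation, the first three items in the list build on the two-way fixed effects model, which is usually written as below. The notation is a conventional one chosen to match the var(alpha), var(psi), and cov(psi, alpha) terms used elsewhere on this page; it is not copied from the package documentation, and the decomposition assumes the residual is uncorrelated with both effects.

```latex
y_{it} = \alpha_i + \psi_{j(i,t)} + \varepsilon_{it},
\qquad
\operatorname{Var}(y) = \operatorname{Var}(\alpha) + \operatorname{Var}(\psi)
  + 2\operatorname{Cov}(\alpha, \psi) + \operatorname{Var}(\varepsilon)
```

Here i indexes workers, t indexes periods, and j(i,t) is the firm employing worker i in period t.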

If you want to give it a try, you can start an example notebook for the FE estimator here: binder_fe, for the CRE estimator here: binder_cre, for the BLM estimator here: binder_blm, for the Sorkin estimator here: binder_sorkin, and for the Borovičková-Shimer estimator here: binder_bs. These start fully interactive notebooks with simple examples that simulate data and run the estimators.

The package provides a Python interface. Installation is handled by pip (Conda support is TBD). The source of the package is available on GitHub at PyTwoWay. The online documentation is hosted here.

The code is relatively efficient. A benchmark below compares PyTwoWay's speed with that of LeaveOutTwoWay, a MATLAB package for estimating AKM and its bias corrections.

Quick Start

To install via pip, from the command line run:

pip install pytwoway

To make sure you are running the most up-to-date version of PyTwoWay, from the command line run:

pip install --upgrade pytwoway

Please DO NOT download the Conda version of the package, as it is outdated!
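To confirm which versions pip actually installed, the following is a minimal sketch using only the standard library's importlib.metadata. The helper name installed_version is illustrative and is not part of PyTwoWay.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Report the versions of PyTwoWay and its data-cleaning dependency
for name in ("pytwoway", "bipartitepandas"):
    print(name, installed_version(name))
```

Including this output when opening an Issue makes version mismatches between the two packages easier to spot.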

Help with Running the Package

Please check out the documentation for detailed examples of how to use PyTwoWay. If you have a question that the documentation doesn't answer, please also check the past Issues to see if someone else has already asked this question and an answer has been provided. If you still can't find an answer, please open a new Issue and we will try to answer as quickly as possible.

Benchmarking

Data is simulated from BipartitePandas using the following code:

import numpy as np
import bipartitepandas as bpd

sim_params = bpd.sim_params({'n_workers': 500000, 'firm_size': 10, 'p_move': 0.05})
rng = np.random.default_rng(1234)

sim_data = bpd.SimBipartite(sim_params).simulate(rng)

This data is then estimated using the PyTwoWay class FEEstimator and using the MATLAB package LeaveOutTwoWay. For estimation using PyTwoWay, all estimators other than AMG use the incomplete Cholesky decomposition as a preconditioner.
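To give a feel for what a preconditioned iterative solve looks like, here is a small self-contained sketch that fits a two-way fixed effects model on toy data with SciPy's conjugate gradient solver. It is not PyTwoWay's implementation: the data-generating process and all variable names are assumptions, and a simple Jacobi (diagonal) preconditioner stands in for the incomplete Cholesky preconditioner mentioned above.

```python
import numpy as np
from scipy.sparse import csr_matrix, diags, hstack
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n_workers, n_firms, n_periods = 200, 20, 4

# Toy data: each worker is observed n_periods times at randomly drawn firms
i = np.repeat(np.arange(n_workers), n_periods)
j = rng.integers(0, n_firms, size=i.size)
alpha = rng.normal(size=n_workers)
psi = rng.normal(size=n_firms)
y = alpha[i] + psi[j]  # noiseless, so the effects are identified up to a constant

# Sparse indicator matrices; firm 0 is dropped as the normalization
rows = np.arange(i.size)
ones = np.ones(i.size)
W = csr_matrix((ones, (rows, i)), shape=(i.size, n_workers))
F = csr_matrix((ones, (rows, j)), shape=(i.size, n_firms))[:, 1:]
A = hstack([W, F]).tocsr()

# Normal equations A'A x = A'y, solved by CG with a Jacobi preconditioner
AtA = (A.T @ A).tocsr()
Aty = A.T @ y
M = diags(1.0 / AtA.diagonal())
sol, info = cg(AtA, Aty, M=M)

# Recovered firm effects, relative to firm 0
psi_hat = np.concatenate([[0.0], sol[n_workers:]])
print("converged:", info == 0)
```

The solver only needs matrix-vector products with the sparse normal-equations matrix, which is why iterative methods scale to problems of the size benchmarked here.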

Results are estimated on a 2021 MacBook Pro 14" with 16 GB RAM and an Apple M1 Pro processor with 8 cores.

Some summary statistics about the largest leave-one-match-out set:

Package    #obs       #firms  #movers
KSS        2,255,370  44,510  88,542
PyTwoWay   2,269,665  44,601  89,098

Run time:

Solver         Cleaning  Estimation  Total
KSS            N/A       N/A         55.2s
PYTW-AMG       4.0s      3m2s        3m6s
PYTW-BICG      4.0s      20.4s       24.4s
PYTW-BICGSTAB  4.0s      21.9s       25.9s
PYTW-CG        4.0s      19.6s       23.6s
PYTW-CGS       4.0s      20.6s       24.6s
PYTW-GMRES     4.0s      32.9s       36.9s
PYTW-MINRES    4.0s      10.7s       14.7s
PYTW-QMR       4.0s      3m53s       3m57s

Contributing to the Package

If you want to contribute to the package, the easiest way is to test that it's working properly! If you notice a part of the package is giving incorrect results, please add a new post in Issues and we will do our best to fix it as soon as possible.

We are also happy to consider any suggestions to improve the package and documentation, whether to add a new feature, make a feature more user-friendly, or make the documentation clearer. Please also post suggestions in Issues.

Finally, if you would like to help with developing the package, please make a fork of the repository and submit pull requests with any changes you make! These will be promptly reviewed, and hopefully accepted!

We are extremely grateful for all contributions made by the community!

Dependencies

Solving large sparse linear models relies on a combination of PyAMG (this is the package we use to estimate the different decompositions on US data) and SciPy's iterative sparse linear solvers.

Many tools for handling sparse matrices come from SciPy.

Additional preconditioners for linear solvers come from PyMatting (installing the package is not required, as the necessary files have been copied into the submodule preconditioners). The incomplete Cholesky preconditioner in turn relies on Numba.

Constrained optimization is handled by QPSolvers.

Progress bars are generated with tqdm.

Parameter dictionaries are constructed using ParamsDict.

Data cleaning is handled by BipartitePandas.

We also rely on a number of standard libraries, such as NumPy, Pandas, matplotlib, etc.

Optionally, the code is compatible with:

- multiprocess. Installing this may help if multiprocessing is raising errors related to pickling objects.
- PyTorch. This may speed up BLM estimation, and adds the option to compute some operations using the GPU.

Citation

Please use the following citation to cite PyTwoWay in academic publications:

Bibtex entry:

@techreport{bhlmms2020,
  title={How Much Should We Trust Estimates of Firm Effects and Worker Sorting?},
  author={Bonhomme, St{\'e}phane and Holzheu, Kerstin and Lamadon, Thibaut and Manresa, Elena and Mogstad, Magne and Setzler, Bradley},
  year={2020},
  institution={National Bureau of Economic Research}
}

Authors

Thibaut Lamadon, Assistant Professor in Economics, University of Chicago

Adam A. Oppenheimer, Research Professional, University of Chicago

pytwoway's People

Contributors

adamoppenheimer, economoser, tlamadon

pytwoway's Issues

Issues importing pytwoway

Hi,
I'm having issues using pytwoway. I use Spyder as my IDE, although I get the same results in Jupyter/Xcode, all running on the latest macOS. More specifically, when I try running "import pytwoway as tw" I get the following error:

runfile('/Users/PhD/Papers/AKM/Code/blm/code/untitled0.py',
wdir='/Users/PhD/Papers/AKM/Code/blm/code')
Traceback (most recent call last):

  File ~/PhD/Papers/AKM/Code/blm/code/untitled0.py:9 in <module>
    import pytwoway as tw

  File ~/opt/anaconda3/envs/blm/lib/python3.9/site-packages/pytwoway/__init__.py:3 in <module>
    from .twoway import TwoWay

  File ~/opt/anaconda3/envs/blm/lib/python3.9/site-packages/pytwoway/twoway.py:8 in <module>
    class TwoWay():

  File ~/opt/anaconda3/envs/blm/lib/python3.9/site-packages/pytwoway/twoway.py:104 in TwoWay
    def cluster(self, measures=bpd.measures.cdfs(), grouping=bpd.grouping.kmeans(), stayers_movers=None, t=None, weighted=True, dropna=False):

AttributeError: module 'bipartitepandas.measures.measures' has no attribute 'cdfs'

I appreciate any help! Thanks!

leedtwoway in Stata 16 fails to find the command pytw and gives an error saying 'unable to open URL'

Hi,

I am trying to use leedtwoway in Stata 16 to implement pytwoway using Stata.
Using the sample data and the configuration file provided in https://tlamadon.github.io/pytwoway/doc-stata.html, I ran leedtwoway and got an error that reads as:

zsh:1: command not found: pytw
Error -601, Unable to open URL: file not found
Warning: No response from source?!?

I am using a Mac, but when I use my Windows laptop, it shows the same last two lines of the error above.
Just in case, python version is 3.9.13 and the version of pytwoway installed is 0.3.21.

Could you please let me know how I could resolve this error?

Thank you for your help and consideration!
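A quick way to narrow down a "command not found: pytw" message is to check whether the executable is visible on PATH from Python. The sketch below uses only the standard library and assumes that pytw is a console script that pip places in the interpreter's script directory; the helper name find_cli is hypothetical.

```python
import shutil
import sysconfig


def find_cli(name="pytw"):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


path = find_cli()
if path is None:
    # Default script directory for this interpreter (user-level installs may differ)
    print("pytw is not on PATH; pip's script directory is", sysconfig.get_path("scripts"))
else:
    print("pytw found at", path)
```

If the script directory exists but is missing from the PATH that Stata's shell sees, adding it there is one thing to try.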

Fitting the FE and CRE estimators fail on the simulated data example

Hi,

In a fresh virtualenv with Python 3.9.6 installed, I tried to run the code for the simulated example provided in the online documentation.

import pytwoway as tw
from bipartitepandas import SimBipartite
# Create SimBipartite object
sbp_net = SimBipartite()
# Generate data
sim_data = sbp_net.sim_network()
# Below is identical to first example, except we are now using the simulated data
# Create TwoWay object
tw_net = tw.TwoWay(sim_data)
# Fit the FE estimators:
tw_net.fit_fe()

Expected behaviour

On simulated data, I can fit the FE estimator without exceptions being raised.

Observed behaviour

The function tw_net.fit_fe() fails with the following error

Traceback (most recent call last):

 File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'm'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/pytwoway/twoway.py", line 147, in fit_fe
    fe_solver.fit_1()
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/pytwoway/fe.py", line 193, in fit_1
    self.__prep_vars() # Prepare data
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/pytwoway/fe.py", line 237, in __prep_vars
    self.res['n_movers'] = len(np.unique(self.adata[self.adata['m'] == 1]['i']))
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/pandas/core/frame.py", line 3455, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
    raise KeyError(key) from err
KeyError: 'm'

A similar error is raised when trying to fit the CRE estimator with tw_net.fit_cre().

Traceback (most recent call last):

  File "<stdin>", line 1, in <module>
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/pytwoway/twoway.py", line 177, in fit_cre
    cre_solver = tw.CREEstimator(self.data.get_es().get_cs(), user_cre)
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/bipartitepandas/bipartitelongbase.py", line 46, in get_es
    self.gen_m()
  File "/Users/nicolasreigl/Documents/twoways/.venv/lib/python3.9/site-packages/bipartitepandas/bipartitebase.py", line 875, in gen_m
    sorted_cols = sorted(frame.columns, key=col_order)
ValueError: 'k' is not in list

Requirements

attrs==21.2.0
bipartitepandas==0.1.10
ConfigArgParse==1.5.2
cycler==0.10.0
Cython==0.29.24
iniconfig==1.1.1
joblib==1.0.1
kiwisolver==1.3.1
matplotlib==3.4.3
networkx==2.6.2
numpy==1.21.2
packaging==21.0
pandas==1.3.2
patsy==0.5.1
Pillow==8.3.1
pluggy==0.13.1
py==1.10.0
pyamg==4.1.0
pyparsing==2.4.7
pytest==6.2.4
python-dateutil==2.8.2
pytwoway==0.1.9
pytz==2021.1
qpsolvers==1.6.1
quadprog==0.1.8
scikit-learn==0.24.2
scipy==1.7.1
six==1.16.0
statsmodels==0.12.2
threadpoolctl==2.2.0
toml==0.10.2
tqdm==4.62.1

Monte Carlo example leads to value error (float but should be int)

Hi there,

once again, thanks a lot for this great package!

I just came across some trouble when replicating the Monte Carlo notebook ( https://tlamadon.github.io/pytwoway/notebooks/monte_carlo_example.html ). When running the code, I run into
ValueError: m has the wrong dtype, it is currently float64 but should be one of the following: int

This originates from

  File "/usr/local/lib/python3.9/site-packages/pytwoway/montecarlo.py", line 167, in _monte_carlo_interior
    sim_data = sim_data.clean(clean_params_1).min_joint_obs_frame(is_sorted=True, copy=False).clean(clean_params_2)
  File "/usr/local/lib/python3.9/site-packages/bipartitepandas/bipartitelongbase.py", line 106, in clean
    frame._check_cols()
  File "/usr/local/lib/python3.9/site-packages/bipartitepandas/bipartitebase.py", line 1168, in _check_cols
    raise ValueError(error_msg)

I had a closer look at this, and the issue appears to be with line 64 of bipartitelongbase.py:
frame.loc[:, 'm'] = ((i_col == i_prev) & (j_col != j_prev)).astype(int, copy=False) + ((i_col == i_next) & (j_col != j_next)).astype(int, copy=False)

It seems that under some circumstances, the m column created becomes a float. Maybe this is the case if some values are missing or some dependency version issues at my end?

As a temporary fix, I added the line
frame["m"] = pd.to_numeric(frame["m"], downcast="integer")
below line 64.
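For readers who want to see what that one-line fix does in isolation, here is a minimal sketch on a made-up frame. The data is hypothetical; only the pd.to_numeric call mirrors the workaround described above.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: 'm' holds whole numbers but is stored as float64
frame = pd.DataFrame({"i": [0, 0, 1], "m": np.array([1.0, 1.0, 0.0])})
print(frame["m"].dtype)  # float64

# Downcast to the smallest integer dtype that can hold the values
frame["m"] = pd.to_numeric(frame["m"], downcast="integer")
print(frame["m"].dtype)  # a signed integer dtype
```

Note that the downcast only takes effect when every value is a whole number; a column containing NaN stays float, so the underlying cause may still need to be found.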

Things are running smoothly now. I hope this is useful for anyone else facing this error!

Best,
Martin

CRE variance decomposition

Is it possible to get var(eps), var(alpha), and var(psi) before and after the correction using the CRE estimator, as with the FE estimator?

BLM - worker effects var(alpha)

Hi there,

I have a quick question regarding the BLM estimator. The example notebook ( https://tlamadon.github.io/pytwoway/notebooks/blm_example.html ) notes that

By default, the variance is var(psi) and the covariance is cov(psi, alpha). The default estimates don’t include var(alpha), but if you don’t include controls, var(alpha) can be computed as the residual from var(y) = var(psi) + var(alpha) + 2 * cov(psi, alpha) + var(eps).

I might be missing something here, but I guess this only holds if there are no interaction effects (see the BLM paper p.704)? I have no additional controls and just checked if I add tw.Q.VarAlpha() to Q_var when calling the fit function, the var(alpha) differs from the residual - which makes sense unless some additional restrictions are met?

But I might be missing something here!

Thanks,
Martin
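The residual formula quoted in the post above can be checked numerically for a purely additive model. The sketch below simulates made-up data (all parameter values are assumptions) and does not model the interaction terms raised in the question; it only shows that the identity is exact once cross terms with eps are included, and approximate when they are ignored.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical additive model y = alpha + psi + eps with positive sorting
alpha = rng.normal(size=n)
psi = 0.5 * alpha + rng.normal(size=n)
eps = rng.normal(scale=0.3, size=n)
y = alpha + psi + eps


def var(x):
    """Sample variance with the same normalization as np.cov."""
    return np.var(x, ddof=1)


def cov(a, b):
    """Sample covariance between two 1-D arrays."""
    return np.cov(a, b)[0, 1]


# Exact in-sample identity, including the cross terms with eps
total = (var(alpha) + var(psi) + var(eps)
         + 2 * cov(alpha, psi) + 2 * cov(alpha, eps) + 2 * cov(psi, eps))

# var(alpha) recovered as a residual, ignoring the cross terms with eps
var_alpha_residual = var(y) - var(psi) - 2 * cov(psi, alpha) - var(eps)

print(var(y), total)
print(var(alpha), var_alpha_residual)
```

In this additive setup the two estimates of var(alpha) differ only by sampling noise in the cross terms, so a larger gap in practice points to something beyond the additive model.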

df.iloc[:,i] = newvals in-place

From #48 :

Look into this warning: "future version 'df.iloc[:,i] = newvals' will attempt to set the values in place instead of always setting in a new array".
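One way to make the intent explicit is to replace the column by label rather than by position, which sets a new array regardless of the pandas version. The frame and values below are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
newvals = np.array([0.5, 1.5, 2.5])
i = 1

# Replace column i with a new array; the dtype follows newvals
df[df.columns[i]] = newvals
print(df.dtypes)
```

When column labels are not unique, pandas 1.5 and later also offer df.isetitem(i, newvals) for the same purpose.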

Saving and displaying of individual and firm fixed effects

Hi,

I was wondering if there is a way to obtain individual fixed effects. My understanding is that the function tw.FEEstimator.get_fe_estimates() documented here provides such functionality. Could you provide a working example of how to obtain the individual fixed effects with that function? I have tried to use the user documentation example to supply the necessary arguments to the function, but I do not understand what the positional argument in that function refers to.

Minimal working example

import pytwoway as tw
import pandas as pd

df = pd.read_csv("twoway_sample_data.csv")
# Create TwoWay object
tw_net = tw.TwoWay(df)
# Clean data
tw_net.prep_data()


fe_params = {
    'ncore': 1, # Number of cores to use
    'batch': 1, # Batch size to send in parallel
    'ndraw_pii': 50, # Number of draws to use in approximation for leverages
    'levfile': '', # File to load precomputed leverages
    'ndraw_tr': 5, # Number of draws to use in approximation for traces
    'he': True, # If True, compute heteroskedastic correction
    'out': 'res_fe.json', # Outputfile where results are saved
    'statsonly': False, # If True, return only basic statistics
    'Q': 'cov(alpha, psi)' # Which Q matrix to consider. Options include 'cov(alpha, psi)' and 'cov(psi_t, psi_{t+1})'
}

fe_est = tw.FEEstimator(df, params=fe_params)

fe_est is a pytwoway.fe.FEEstimator object. How do I apply the function tw.FEEstimator.get_fe_estimates() on that object?

Matching V_EE estimates from Sorkin estimator back to firm IDs

Hi,

the example with the Sorkin estimator is a bit unclear to me: when I obtain the estimates stored in the V_EE object, does their order correspond to the new "j" variable or to the original "j" ID that I would get with the original_ids() method? I want to match them back to the original firm IDs.

Retrieving the firm and worker identifiers (Pytwoway 0.1.14.)

My goal is to estimate the firm and worker fixed effects (psi_hat and alpha_hat) and to merge them to the original population by the firm and worker identifiers (j and i). However, when running the prep_data() function of the TwoWay class, the identifiers i and j are changed and run from 0 to J for firm identifiers j (where J is the number of firms) and from 0 to N for worker identifiers i (where N is the number of workers).

How could I modify the code so that the original firm and worker identifiers are unchanged, which would enable me to merge the estimated psi_hat and alpha_hat to the original population by the firm and worker identifiers (j and i)?

Thank you for your help.
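Independently of what the package offers, the mapping between original and contiguous identifiers can be kept by hand with plain pandas. The sketch below is an assumption-laden illustration: the data, the column name j_contig, and the psi_hat values are all made up, and it does not use PyTwoWay's or BipartitePandas' own ID tracking.

```python
import pandas as pd

# Hypothetical raw data with non-contiguous firm identifiers
raw = pd.DataFrame({
    "i": [10, 10, 20, 30],
    "j": [501, 777, 501, 902],
    "y": [1.0, 1.2, 0.9, 1.5],
})

# Contiguous codes 0..J-1 plus the lookup table back to the original IDs
codes, originals = pd.factorize(raw["j"])
raw["j_contig"] = codes

# Stand-in for estimated firm effects, ordered by contiguous firm ID
psi_hat = [0.1, 0.4, -0.2]

# Attach the original ID to each estimate, then merge onto the raw data
firm_effects = pd.DataFrame({"j": originals.to_numpy(), "psi_hat": psi_hat})
merged = raw.merge(firm_effects, on="j", how="left")
print(merged)
```

The same pattern applies to worker identifiers by factorizing the i column.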

AKM: controls should use columns directly

AKM with controls should use column names directly, rather than using general column names and requiring that they have only 1 subcolumn.

This way people can more easily specify controls like time variables but for only certain periods, etc.

Pytwoway with ONLY continuous controls

I was wondering if it would be possible to estimate the corrected model with 3 continuous controls instead of some of the controls being categorical. Is this doable? If so, what would I need to change? I suspect the parameters below would need to be changed. Thanks!

FE

fecontrol_params = tw.fecontrol_params(
    {
        'he': True,
        'continuous_controls': 'cts_control',
        'Q_var': [
            tw.Q.VarCovariate('psi'),
            tw.Q.VarCovariate('alpha'),
            tw.Q.VarCovariate('cts_control'),
            tw.Q.VarCovariate(['psi', 'alpha'])
        ],
        'ncore': 8
    }
)

AttributeError: 'BipartiteLong' object has no attribute 'columns_req'

Hi all,

Thanks for the great package! I have updated some of my system today and it appears this has broken my pytwoway configuration.

I am running python 3.10.8, pytwoway 0.3.21, and bipartitepandas 1.1.9 on OSX (all installed via pip). When running the FE notebook ( https://tlamadon.github.io/pytwoway/notebooks/fe_example.html ), I get the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/var/folders/pk/jy78bg3s5p12wkbr0244zw3h0000gn/T/ipykernel_54928/4116612111.py in ?()
     35 sim_data = bpd.SimBipartite(sim_params).simulate()
     36 
     37 
     38 # Convert into BipartitePandas DataFrame
---> 39 bdf = bpd.BipartiteDataFrame(sim_data)
     40 # Clean and collapse
     41 bdf = bdf.clean(clean_params)

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitedataframe.py in ?(cls, i, j, j1, j2, y, y1, y2, t, t1, t2, t11, t12, t21, t22, g, g1, g2, w, w1, w2, m, custom_categorical_dict, custom_dtype_dict, custom_how_collapse_dict, custom_long_es_split_dict, **kwargs)
     53         Return dataframe (source: https://stackoverflow.com/a/2491881/17333120).
     54         '''
     55         if isinstance(i, DataFrame):
     56             # If user didn't split arguments, do it for them
---> 57             return BipartiteDataFrame(**i, custom_categorical_dict=custom_categorical_dict, custom_dtype_dict=custom_dtype_dict, custom_how_collapse_dict=custom_how_collapse_dict, custom_long_es_split_dict=custom_long_es_split_dict, **kwargs)
     58         # Update custom dictionaries to be dictionaries instead of None (source: https://stackoverflow.com/a/54781084/17333120)
     59         if custom_categorical_dict is None:
     60             custom_categorical_dict = {}

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitedataframe.py in ?(cls, i, j, j1, j2, y, y1, y2, t, t1, t2, t11, t12, t21, t22, g, g1, g2, w, w1, w2, m, custom_categorical_dict, custom_dtype_dict, custom_how_collapse_dict, custom_long_es_split_dict, **kwargs)
    346                 col_reference = bpd.util.to_list(new_cols_reference_dict[new_col_name])
    347                 if len(col_reference) == 1:
    348                     # Constructed col_references are forced to be lists, if it's length one then just extract the single value from the list
    349                     col_reference = col_reference[0]
--> 350                 df = df.add_column(new_col_name, new_col_data, col_reference=col_reference, is_categorical=custom_categorical_dict[new_col_name], dtype=custom_dtype_dict[new_col_name], how_collapse=custom_how_collapse_dict[new_col_name], long_es_split=custom_long_es_split_dict[new_col_name], copy=False)
    351 
    352         return df

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitebase.py in ?(self, col_name, col_data, col_reference, is_categorical, dtype, how_collapse, long_es_split, copy)
    493             for i, subcol in enumerate(to_list(frame.col_reference_dict[col_name])):
    494                 frame.loc[:, subcol] = col_data_lst[i]
    495 
    496         # Sort columns
--> 497         frame = frame.sort_cols(copy=False)
    498 
    499         return frame

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitebase.py in ?(self, copy)
   1355             frame = self
   1356 
   1357         # Sort columns
   1358         sorted_cols = bpd.util._sort_cols(frame.columns)
-> 1359         frame = frame.reindex(sorted_cols, axis=1, copy=False)
   1360 
   1361         return frame

/usr/local/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
   5140         fill_value: Scalar | None = np.nan,
   5141         limit: int | None = None,
   5142         tolerance=None,
   5143     ) -> DataFrame:
-> 5144         return super().reindex(
   5145             labels=labels,
   5146             index=index,
   5147             columns=columns,

/usr/local/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, labels, index, columns, axis, method, copy, level, fill_value, limit, tolerance)
   5517         if self._needs_reindex_multi(axes, method, level):
   5518             return self._reindex_multi(axes, copy, fill_value)
   5519 
   5520         # perform the reindex on the axes
-> 5521         return self._reindex_axes(
   5522             axes, level, limit, tolerance, method, fill_value, copy
   5523         ).__finalize__(self, method="reindex")

/usr/local/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, axes, level, limit, tolerance, method, fill_value, copy)
   5545                 labels, level=level, limit=limit, tolerance=tolerance, method=method
   5546             )
   5547 
   5548             axis = self._get_axis_number(a)
-> 5549             obj = obj._reindex_with_indexers(
   5550                 {axis: [new_index, indexer]},
   5551                 fill_value=fill_value,
   5552                 copy=copy,

/usr/local/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, reindexers, fill_value, copy, allow_dups)
   5613             new_data = new_data.copy(deep=copy)
   5614         elif using_copy_on_write() and new_data is self._mgr:
   5615             new_data = new_data.copy(deep=False)
   5616 
-> 5617         return self._constructor_from_mgr(new_data, axes=new_data.axes).__finalize__(
   5618             self
   5619         )

/usr/local/lib/python3.10/site-packages/pandas/core/frame.py in ?(self, mgr, axes)
    645             # fastpath avoiding constructor call
    646             return df
    647         else:
    648             assert axes is mgr.axes
--> 649             return self._constructor(df, copy=False)

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitelong.py in ?(self, col_reference_dict, col_collapse_dict, *args, **kwargs)
     25             col_collapse_dict = {}
     26         col_reference_dict = bpd.util.update_dict({'t': 't'}, col_reference_dict)
     27         col_collapse_dict = bpd.util.update_dict({'m': 'sum'}, col_collapse_dict)
     28         # Initialize DataFrame
---> 29         super().__init__(*args, col_reference_dict=col_reference_dict, col_collapse_dict=col_collapse_dict, **kwargs)
     30 
     31         # self.log('BipartiteLong object initialized', level='info')

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitelongbase.py in ?(self, col_reference_dict, *args, **kwargs)
     21         if col_reference_dict is None:
     22             col_reference_dict = {}
     23         col_reference_dict = bpd.util.update_dict({'j': 'j', 'y': 'y', 'g': 'g', 'w': 'w'}, col_reference_dict)
     24         # Initialize DataFrame
---> 25         super().__init__(*args, col_reference_dict=col_reference_dict, **kwargs)
     26 
     27         # self.log('BipartiteLongBase object initialized', level='info')

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitebase.py in ?(self, columns_req, columns_opt, columns_contig, col_reference_dict, col_dtype_dict, col_collapse_dict, col_long_es_dict, track_id_changes, log, *args, **kwargs)
    185             col_long_es_dict = {}
    186 
    187         if len(args) > 0 and isinstance(args[0], BipartiteBase):
    188             # Note that isinstance works for subclasses
--> 189             self._set_attributes(args[0], no_dict=False, track_id_changes=track_id_changes)
    190             # Update class attributes from the previous dataframe with parameter inputs
    191             self.columns_req = ['i', 'j', 'y'] + [column_req for column_req in self.columns_req if column_req not in ['i', 'j', 'y']]
    192             self.columns_opt = ['t', 'g', 'w', 'm'] + [column_opt for column_opt in self.columns_opt if column_opt not in ['t', 'g', 'w', 'm']]

/usr/local/lib/python3.10/site-packages/bipartitepandas/bipartitebase.py in ?(self, frame, no_dict, track_id_changes)
    714             track_id_changes (bool): if True, create dictionary of Pandas dataframes linking original categorical id values to updated contiguous id values
    715         '''
    716         # Dictionaries
    717         if not no_dict:
--> 718             self.columns_req = frame.columns_req.copy()
    719             self.columns_opt = frame.columns_opt.copy()
    720             self.col_reference_dict = frame.col_reference_dict.copy()
    721         # Required, even if no_dict

/usr/local/lib/python3.10/site-packages/pandas/core/generic.py in ?(self, name)
   6200             and name not in self._accessors
   6201             and self._info_axis._can_hold_identifiers_and_holds_name(name)
   6202         ):
   6203             return self[name]
-> 6204         return object.__getattribute__(self, name)

AttributeError: 'BipartiteLong' object has no attribute 'columns_req'

When using my own dataset, the same error appears as soon as I try to do something with the BipartiteDataFrame (such as printing it or cleaning it).

I might be missing something obvious here, but any help would be highly appreciated!

Thanks!

Best,
Martin

Multiprocess doesn't work on Linux

FEEstimator gets stuck in fitting when the sample size exceeds ~1 million observations. I am doing HE correction. After the "leverages batch" step, it stops at the "leverages" step. The progress bar is always 0%.

he-corrected FE-estimates of var(x_cont), cov(alpha, xcont) and cov(psi,xcont) are significantly off from the true values

Hello,

I am interested in estimating the contributions of var(xcont), cov(alpha, xcont) and cov(psi, xcont) to the total var(y), where xcont is a single continuous control variable. I simulated some worker-firm data with limited mobility bias and positive var(xcont), cov(alpha, xcont) and cov(psi, xcont). I then used the fecontrol estimator with the he-correction. The he-corrected estimates of var(alpha), var(psi) and cov(alpha, psi) look almost perfect. However, the estimates of var(xcont), cov(alpha, xcont) and cov(psi, xcont) are significantly off from the true values. Also, the estimated var(y) is significantly smaller than the true var(y). It looks like y is residualized in the process so any variation due to xcont is taken out, but I am not sure if this is what is happening. Lastly, the he-corrected estimates of the individual variance components do not add up to the total variance.

Can you please help me to understand these apparent inconsistencies? Did I specify something wrong in my code? Or was the package actually not built to estimate var(xcont), cov(alpha, xcont) and cov(psi,xcont) but merely to take out any variation in y that is due to xcont?

Thank you for the great package and any help in advance!

Best
Martin

Code:

###############################################################################
### simulate some data where you know the true variance of wage components                                  
###############################################################################

import numpy as np
import pandas as pd
import pytwoway as tw
import bipartitepandas as bpd
import multiprocess 
import openpyxl
import os
import matplotlib.pyplot as plt 
import time

start_time = time.time()

###############################################################################
### system req.                                 
###############################################################################

# The code was last run on a 10-core 2.5GHz Intel processor with 256 GB RAM;
# the minimum required amount of RAM is roughly 8-10 GB.
# The estimation finished in ~16 minutes.

###############################################################################
### parameters                                 
###############################################################################

np.random.seed(43532) 

##### build a data frame with w_obs workers and f_obs firms
w_obs = 500000
f_obs = 20000

# w_obs = 50000
# f_obs = 2000

# influence limited mobility bias via probability of moving firm in t=2 and t=3
move_prob = 0.05

### set amount of sorting
# this can range from 0 (perfect sorting) to 1 (no sorting at all)
sorting = 0.5
    
###############################################################################
### simulate data                                
###############################################################################

# gen worker IDs and 3-year panel
data1 = {'i': range(1,w_obs+1), 't':1}
data1 = pd.DataFrame(data1)

data2 = {'i': range(1,w_obs+1), 't':2}
data2 = pd.DataFrame(data2)

data3 = {'i': range(1,w_obs+1), 't':3}
data3 = pd.DataFrame(data3)

df = pd.concat([data1,data2]).sort_values('i')
df = pd.concat([df,data3]).sort_values('i')

# gen worker FE
data4 = {'i': range(1,w_obs+1), 'iFE': np.random.normal(1,np.sqrt(0.3),w_obs)}
data4 = pd.DataFrame(data4)
df = pd.merge(df, data4, on='i')

del data1
del data2 
del data3 
del data4

# gen firm IDs and introduce sorting
df['iFE_dec'] = pd.qcut(df['iFE'], q=f_obs, labels=False) + 1 
df['j_range_min'] = df['iFE_dec'] - int(sorting*f_obs) 
df['j_range_min'] = np.where(df['j_range_min']<1, 1, df['j_range_min']).astype('int')
df['j_range_max'] = df['iFE_dec'] + int(sorting*f_obs) 
df['j_range_max'] = np.where(df['j_range_max']>f_obs, f_obs, df['j_range_max']).astype('int')
df['j'] = np.random.randint(low=df['j_range_min'], high=df['j_range_max'] + 1, size=len(df))
del df['iFE_dec']

# look at firm size distribution
# fsize = df.groupby(['j','t']).size()
# print("firm size distribution:")
# print(fsize.describe())
# plt.hist(fsize, bins=100, edgecolor='black')
# plt.show()

# increase limited mobility bias artificially:
df = df.sort_values(by=['i','t'])
df['j_tmin1'] = df.groupby('i')['j'].shift(1)
# copy the slice so the column assignments below do not trigger SettingWithCopyWarning
df_help = df[df['t']>1].copy()
df_help['move_help'] = np.random.choice([0,1],size=len(df_help),p=[1-move_prob,move_prob])
df_help['move'] = np.where(df_help['move_help']==1,1,0)
df_help = df_help[['i','t','move']]
df=pd.merge(df,df_help,on=['i','t'],how='outer')
df.loc[df['move']==0, 'j'] = df['j_tmin1'] 
# copy the slice so that adding 'j_new' below writes to an independent frame
df_fail = df[(df['move']==1) & (df['j']==df['j_tmin1'])].copy()
def generate_j_new(row):
    new_value = row['j']
    while new_value == row['j']:
        new_value = np.random.randint(row['j_range_min'],row['j_range_max']+1)
    return new_value
df_fail['j_new'] = df_fail.apply(generate_j_new, axis=1)
df_fail = df_fail[['i','t','j_new']]
df=pd.merge(df,df_fail,on=['i','t'],how='outer')
df['j'] = np.where(~df['j_new'].isna(), df['j_new'], df['j'])
df['j_tmin1'] = df.groupby('i')['j'].shift(1)
df.loc[df['move']==0, 'j'] = df['j_tmin1'] 
del df['j_new']
del df['j_tmin1']
del df['move']
del df['j_range_min']
del df['j_range_max']
del df_fail
del df_help

# measure probability to move in any year (to check that it worked)
# df_help = df
# df_help['j_tmin1'] = df_help.groupby('i')['j'].shift(1)
# df_help = df[df['t'] > 1]
# df_help['move'] = np.where(df_help['j']==df_help['j_tmin1'],0,1)
# print(df_help.head(9))
# print(df_help['move'].describe())

# gen firm FE
data2 = {'j': range(1,f_obs+1)}
df2 = pd.DataFrame(data2)
df2['jFE'] = np.random.normal(1,np.sqrt(0.2),f_obs)
df2 = df2.sort_values('jFE').reset_index()
df2['j_new'] = df2.index + 1
del df2['j']
df2['j'] = df2['j_new'] 
df2 = df2[['j', 'jFE']]
df = pd.merge(df, df2, on='j')
del df2
df = df.sort_values(by=['i','t'])

# gen xcont and introduce positive covariance with firm FE (and hence also worker FE)
df_help = df[['i','jFE']].groupby('i').agg('mean').reset_index()
df_help['xcont'] = np.random.normal(1000,np.sqrt(600),w_obs) + 15*df_help['jFE'] 
del df_help['jFE']
df = pd.merge(df,df_help,on='i')
del df_help

# gen xcont coefficient 
df['xcont_coef'] = df['xcont'] * 0.01

# gen residual
df['res'] = np.random.normal(0,np.sqrt(0.1),len(df)) 

# gen wage
df['y'] = df['iFE'] + df['jFE'] + df['xcont_coef'] + df['res']

#aggregate true variances in full data set 
# var_y = df['y'].agg('var')
# var_iFE = df['iFE'].agg('var')
# var_jFE = df['jFE'].agg('var')
# var_xcont = df['xcont_coef'].agg('var')
# var_res = df['res'].agg('var')
# cov_iFE_jFE = np.cov([df['iFE'],df['jFE']])[0][1]
# cov_iFE_xcont = np.cov([df['iFE'],df['xcont_coef']])[0][1]
# cov_jFE_xcont = np.cov([df['jFE'],df['xcont_coef']])[0][1]

# print(var_iFE + var_jFE  + var_xcont + var_res + 2*cov_iFE_jFE + 2*cov_iFE_xcont+
#       2*cov_jFE_xcont)
# print(var_y)

###############################################################################
### isolate looc set                            
###############################################################################

df['worker_id'] = df['i']
df['firm_id'] = df['j']

clean_params = bpd.clean_params(
    {
        'connectedness': 'leave_out_spell',
        'collapse_at_connectedness_measure': True,
        'drop_single_stayers': False,
        'drop_returns': False,
        'copy': False
    }
)

# Convert into BipartitePandas DataFrame and clean
bdf = bpd.BipartiteDataFrame(df).clean(clean_params)

###############################################################################
### measure variances and covariances in the looc set                         
###############################################################################

# isolate looc set and make sure #obs lines up
looc_workers = len(bdf[['worker_id']].drop_duplicates())
looc = bdf[['firm_id']].drop_duplicates()
df_looc = pd.merge(df,looc,on=['firm_id'],how='inner')
df_looc_workers = len(df_looc[['worker_id']].drop_duplicates())
print(looc_workers, df_looc_workers)

# aggregate true variances in looc data set 
var_y = df_looc['y'].agg('var')
var_iFE = df_looc['iFE'].agg('var')
var_jFE = df_looc['jFE'].agg('var')
var_xcont = df_looc['xcont_coef'].agg('var')
var_res = df_looc['res'].agg('var')
cov_iFE_jFE = np.cov([df_looc['iFE'],df_looc['jFE']])[0][1]
cov_iFE_xcont = np.cov([df_looc['iFE'],df_looc['xcont_coef']])[0][1]
cov_jFE_xcont = np.cov([df_looc['jFE'],df_looc['xcont_coef']])[0][1]

# print(var_iFE + var_jFE  + var_xcont + var_res + 2*cov_iFE_jFE + 2*cov_iFE_xcont+
#       2*cov_jFE_xcont)
# print(var_y)

###############################################################################
### decompose wages variance                            
###############################################################################

cts_control = ['xcont']

fecontrol_params = tw.fecontrol_params(
    {
        'he': True,
        'ho': False,
        'continuous_controls': cts_control,
        'Q_var': [
            tw.Q.VarCovariate('psi'),
            tw.Q.VarCovariate('alpha'),
            tw.Q.VarCovariate(cts_control)
                  ],
        'Q_cov': [
            tw.Q.CovCovariate('psi', 'alpha'),
            tw.Q.CovCovariate('alpha', cts_control),
            tw.Q.CovCovariate('psi', cts_control),
        ],
        'ncore': 10
    }
)

# Initialize FE estimator
fe_estimator = tw.FEControlEstimator(bdf, fecontrol_params)
# Fit FE estimator
fe_estimator.fit()

print(fe_estimator.summary)

###############################################################################
### report the results in a table                          
###############################################################################

### save decomp results
results = pd.DataFrame([fe_estimator.summary])
results_plugin = results.copy(deep=True)
results_he = results.copy(deep=True)
#print(results.columns)

# plugin
results_plugin['var_total'] = results.iloc[0,0]
results_plugin['var_res'] = results.iloc[0,1]
results_plugin['var_alpha'] = results.iloc[0,3]
results_plugin['var_psi'] = results.iloc[0,5]
results_plugin['var_xcont'] = results.iloc[0,7]
results_plugin['2cov_alpha_xcont'] = 2*results.iloc[0,9]
results_plugin['2cov_psi_alpha'] = 2*results.iloc[0,11]
results_plugin['2cov_psi_xcont'] = 2*results.iloc[0,13] 

results_plugin = results_plugin[['var_total', 'var_alpha', 'var_psi',
                      'var_xcont', 'var_res', '2cov_psi_alpha',
                      '2cov_alpha_xcont', '2cov_psi_xcont']].transpose()

results_plugin = results_plugin.rename(columns={0:'var'})

results_plugin['total'] = results_plugin.iloc[0,0]
results_plugin['share'] = results_plugin['var'] / results_plugin['total']

print('results of plug-in (AKM) estimation:')
print(results_plugin)

print('sum of the variance shares (should be ~1) and is:')
# check that shares add up to 1
# rows 1..7 hold the components; row 0 is var_total itself
print(results_plugin['share'].iloc[1:].sum())

# he 
results_he['var_total'] = results.iloc[0,0]
results_he['var_res'] = results.iloc[0,2]
results_he['var_alpha'] = results.iloc[0,4]
results_he['var_psi'] = results.iloc[0,6]
results_he['var_xcont'] = results.iloc[0,8]
results_he['2cov_alpha_xcont'] = 2*results.iloc[0,10]
results_he['2cov_psi_alpha'] = 2*results.iloc[0,12]
results_he['2cov_psi_xcont'] = 2*results.iloc[0,14] 

results_he = results_he[['var_total', 'var_alpha', 'var_psi',
                      'var_xcont', 'var_res', '2cov_psi_alpha',
                      '2cov_alpha_xcont', '2cov_psi_xcont']].transpose()


results_he = results_he.rename(columns={0:'var'})

results_he['total'] = results_he.iloc[0,0]
results_he['share'] = results_he['var'] / results_he['total']

print('results of he-robust (KSS) estimation:')
print(results_he)

print('sum of the variance shares (should be ~1) and is:')
# check that shares add up to 1
# rows 1..7 hold the components; row 0 is var_total itself
print(results_he['share'].iloc[1:].sum())

### produce table of true and estimated variance components

data_table = {'var_total': var_y,'var_alpha':var_iFE, 'var_psi':var_jFE, 
              'var_xcont':var_xcont, 'var_res': var_res,
              '2cov_psi_alpha':2*cov_iFE_jFE,
              '2cov_alpha_xcont':2*cov_iFE_xcont,
              '2cov_psi_xcont':2*cov_jFE_xcont}
table_true_vs_estimate = pd.DataFrame(data_table, index=[0]).transpose()
table_true_vs_estimate['true']=table_true_vs_estimate[0]
del table_true_vs_estimate[0]

table_true_vs_estimate = pd.merge(table_true_vs_estimate,results_plugin,
                                  left_index=True, right_index=True)

table_true_vs_estimate['plugin'] = table_true_vs_estimate['var']
table_true_vs_estimate = table_true_vs_estimate[['true', 'plugin']]

table_true_vs_estimate = pd.merge(table_true_vs_estimate,results_he,
                                  left_index=True, right_index=True)

table_true_vs_estimate['he'] = table_true_vs_estimate['var']
table_true_vs_estimate = table_true_vs_estimate[['true', 'plugin', 'he']].reset_index()

print('compare true values to estimates:')
print(table_true_vs_estimate)

# table_true_vs_estimate.to_excel('True_vs_plugin_he_w{}_f{}_mprob{}_sort{}.xlsx'
#                                 .format(w_obs,f_obs,move_prob,sorting),
#                                 index=False)

end_time = time.time()
elapsed_time = (end_time - start_time)/60
print(f"Total Elapsed Time: {elapsed_time:.2f} minutes")

Output:

results of plug-in (AKM) estimation:
                       var     total     share
var_total         0.834411  0.834411  1.000000
var_alpha         0.372120  0.834411  0.445967
var_psi           0.191529  0.834411  0.229538
var_xcont         0.094334  0.834411  0.113054
var_res           0.002773  0.834411  0.003323
2cov_psi_alpha    0.118499  0.834411  0.142015
2cov_alpha_xcont  0.004128  0.834411  0.004947
2cov_psi_xcont    0.051694  0.834411  0.061952
sum of the variance shares (should be ~1) and is:
1.0007964596391798

results of he-robust (KSS) estimation:
                       var     total     share
var_total         0.834411  0.834411  1.000000
var_alpha         0.296477  0.834411  0.355313
var_psi           0.147513  0.834411  0.176787
var_xcont        -0.044161  0.834411 -0.052924
var_res           0.097978  0.834411  0.117422
2cov_psi_alpha    0.203481  0.834411  0.243862
2cov_alpha_xcont  0.004098  0.834411  0.004912
2cov_psi_xcont    0.051719  0.834411  0.061982
sum of the variance shares (should be ~1) and is:
0.9073540198317966

compare true values to estimates:
              index      true    plugin        he
0         var_total  0.897533  0.834411  0.834411
1         var_alpha  0.298951  0.372120  0.296477
2           var_psi  0.148539  0.191529  0.147513
3         var_xcont  0.063122  0.094334 -0.044161
4           var_res  0.099872  0.002773  0.097978
5    2cov_psi_alpha  0.214297  0.118499  0.203481
6  2cov_alpha_xcont  0.031172  0.004128  0.004098
7    2cov_psi_xcont  0.042653  0.051694  0.051719

Total Elapsed Time: 16.10 minutes
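
For reference when reading the two "sum of the variance shares" lines above: the identity var(y) = sum of variances + 2 * sum of covariances holds exactly for sample moments computed on the same observations, which is why the plug-in shares sum to about 1. The sketch below only checks that identity with NumPy on made-up data. It does not reproduce anything inside pytwoway, and the variable names (`alpha`, `psi`, `xb`, `eps`) are illustrative assumptions. Once each component is bias-corrected separately, or once some terms (such as covariances with the residual) are left out of the sum, nothing forces the corrected components to add up to the uncorrected total.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Made-up components (illustrative names, not pytwoway output)
alpha = rng.normal(1.0, np.sqrt(0.3), n)               # worker effect
psi = 0.5 * alpha + rng.normal(1.0, np.sqrt(0.2), n)   # firm effect, correlated with alpha
xb = 0.01 * (rng.normal(1000.0, np.sqrt(600.0), n) + 15.0 * psi)  # control times coefficient
eps = rng.normal(0.0, np.sqrt(0.1), n)                 # residual
y = alpha + psi + xb + eps


def decompose(components):
    """Return (sum of variances, sum of 2*covariances) for a dict of 1-D arrays."""
    cov = np.cov(np.vstack(list(components.values())))  # ddof=1 by default
    var_sum = np.trace(cov)
    cov_sum = cov.sum() - var_sum  # off-diagonal entries count each pair twice
    return var_sum, cov_sum


var_sum, cov_sum = decompose({"alpha": alpha, "psi": psi, "xb": xb, "eps": eps})
total = np.var(y, ddof=1)
print(f"var(y) = {total:.6f}, sum of components = {var_sum + cov_sum:.6f}")
```

If the "true" moments and the estimates are computed on different samples or with different weights, the totals will differ even though the identity holds within each sample.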

2 or more period-observations

I think the BLM, FE and CRE classes are meant to work on 2-period data, but they also run on data with more periods. Is that because the estimation uses only the first two periods' observations, or does it use every observation in the data?
Thanks!

Several issues

Hi,

First I wanted to thank you for the great package. It is super useful! I've been using it for two projects, so I have a bunch of questions/issues and I didn't want to open a separate issue for each, but if you prefer that I can also do that.

Small issues:

  • On my personal Windows computer I have not been able to install pytwoway, even after a couple of hours of trying, because of the quadprog dependency. This seems to be a known issue; I tried the suggested fixes but haven't been able to make it work. I think quadprog now publishes a version with a prebuilt wheel and recommends that packages depending on quadprog require that version.
  • Wanted to mention 2 warnings which seem to have no effect on the commands running:
  1. I get the warning: "future version 'df.iloc[:,i] = newvals' will attempt to set the values in place instead of always setting in a new array". Just wanted to give a heads-up in case that will change something.
  2. When I run a model with fit(), it works, but I do get the following warning: "sys:1: ResourceWarning: unclosed socket" (and then a bit more detail)

New feature maybe?:

  • Is it currently possible to add a third fixed effect (an origin-firm fixed effect), as in "It Ain't Where You're From, It's Where You're At: Hiring Origins, Firm Heterogeneity, and Wages" (Sabrina Di Addario, Patrick Kline, Raffaele Saggio, and Mikkel Sølvsten, Journal of Econometrics, 232 (April 2023), pp. 340-374)? In particular, does simply adding it as a control work, or would the connectedness measure then be incorrect?

(Apologies: I don't have reproducible examples for what follows, because I'm using confidential admin data.)
Bigger issues:

  • If I use summary on the cleaned data, and in the next line run FEEstimator, the variances of y are different (see examples below). This is also the case with the simulated datasets in the examples. This only happens with collapse_at_connectedness_measure: True. Is that because different weights are used?
  • Negative implied alphas or the terms don't add up to the variance. I am using a binary variable as a dependent variable (like in Employers and Unemployment Insurance Take-up (Lachowska, Sorkin and Woodbury)). Below are 3 different versions of the summary and estimation results:
  1. with collapse_at_connectedness_measure: True (two screenshots, not preserved)
  2. with collapse_at_connectedness_measure: False (two screenshots, not preserved)
  3. with collapse_at_connectedness_measure: False and controls (two screenshots, not preserved)

Sorry about the very long post and I hope that I didn't do something stupid!

Thanks a lot for your help!
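
On the question of whether weights explain the different var(y): the sketch below shows, with plain pandas on a toy panel, that collapsing observations to spell means changes var(y) and that the result depends on whether spells are weighted by their length. Whether bipartitepandas and pytwoway weight collapsed spells this way is an assumption, not something confirmed here; the column names (`i`, `j`, `y`, `w`) and the helper `weighted_var` are illustrative.

```python
import numpy as np
import pandas as pd

# Toy panel: worker i, firm j, outcome y (three periods per worker)
df = pd.DataFrame({
    "i": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "j": [1, 1, 2, 2, 2, 2, 1, 3, 3],
    "y": [1.0, 1.2, 2.0, 0.5, 0.7, 0.6, 1.5, 2.5, 2.7],
})

# A spell is a run of consecutive observations of one worker at one firm
df["spell"] = (df["j"] != df.groupby("i")["j"].shift()).cumsum()

# Collapse to one row per spell: mean outcome and spell length as weight
collapsed = df.groupby(["i", "j", "spell"], as_index=False).agg(
    y=("y", "mean"), w=("y", "size")
)


def weighted_var(x, w):
    """Population variance of x with observation weights w."""
    mean = np.average(x, weights=w)
    return np.average((x - mean) ** 2, weights=w)


var_obs = df["y"].var(ddof=0)                    # observation level
var_unweighted = collapsed["y"].var(ddof=0)      # spell level, equal weights
var_weighted = weighted_var(collapsed["y"], collapsed["w"])  # spell level, length weights

print(var_obs, var_unweighted, var_weighted)
```

The length-weighted spell-level variance is the between-spell part of the observation-level variance, so it can never exceed it; the unweighted version has no such relationship to either.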

AKM: statistical tests

Add statistical tests for AKM, e.g. from:

  • AMS - endogenous mobility

  • DJ - non-stationarity of parameters

  • BLM - identification of interacted BLM

  • SLM - endogenous mobility

CRE: clean up code

Clean up CRE code so it is structured more closely to the AKM estimator.

Extract effects from the tw.cre.CREEstimator() function

I have closed issue #2 as it concerned the Fixed Effects estimator. I had a side note in this issue regarding individual effects for the CRE estimator.

I am not 100% sure I have the econometrics right here, but is it possible to extract individual effects for the CRE estimator as well? I suppose that is what the tw.cre.CREEstimator() function is designed for.

Is it possible to do oneway estimation?

Thanks for creating this package and making it public.

My problem is that the data on which I want to try the various corrections has no worker identifiers, so it contains only firm identifiers and some worker characteristics (dummy variables). Is it still possible to do the estimation using this package (e.g. by using all worker variables to create a pseudo worker ID)?

If this is not directly available, is there any advice on how to learn the code (especially the fe.py) so that I can borrow some part of the code and make an ad-hoc solution?

Thanks for any suggestions in advance.
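
As a point of comparison for the ad-hoc route, a one-way firm fixed-effects regression with worker dummies can be written in a few lines using the within transformation. This is a minimal sketch on simulated data with NumPy and pandas only; it is not a pytwoway feature, it applies no bias correction, and all names (`firm`, `x1`, `x2`, `psi_hat`) are made up for the illustration.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n_firms, n_obs = 50, 5000

# Simulated data: firm effect plus two worker dummies (all values made up)
firm = rng.integers(0, n_firms, n_obs)
psi_true = rng.normal(0.0, 1.0, n_firms)
x = rng.integers(0, 2, size=(n_obs, 2)).astype(float)
beta_true = np.array([0.3, -0.2])
y = psi_true[firm] + x @ beta_true + rng.normal(0.0, 0.5, n_obs)

df = pd.DataFrame({"firm": firm, "y": y, "x1": x[:, 0], "x2": x[:, 1]})

# Within transformation: subtract firm means from y and the controls
cols = ["y", "x1", "x2"]
within = df[cols] - df.groupby("firm")[cols].transform("mean")

# OLS on the demeaned data gives the coefficients on the worker dummies
beta_hat, *_ = np.linalg.lstsq(
    within[["x1", "x2"]].to_numpy(), within["y"].to_numpy(), rcond=None
)

# Firm effects are the firm means of y net of the controls
df["y_net"] = df["y"] - df[["x1", "x2"]].to_numpy() @ beta_hat
psi_hat = df.groupby("firm")["y_net"].mean()

print(beta_hat)
print(psi_hat.var(ddof=1), psi_true.var(ddof=1))
```

The sample variance of `psi_hat` still contains estimation noise, so with small firms it overstates the variance of the true firm effects; correcting that is the part that would have to be borrowed from the package.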

Retrieving the firm and worker identifiers (version 0.2.6)

This is a similar question to #5, but for an updated version.

My goal is to extract FEs from pytwoway and link them back to the original IDs. Following #2 (comment), I managed to extract the estimated FEs using 'attach_fe_estimates':True in the parameters of FEEstimator. I am having some trouble linking them back to my original worker and firm IDs. The reason is that Bipartite resets the index when using the clean method. (As you can see in the example here after cell 5.)

Is there a way to keep the original ids in Bipartite object to do this? Thanks!
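
One generic workaround, independent of any feature the package may offer, is to build the ID lookup yourself before handing the data over and to merge the estimates back through it. The sketch below uses `pandas.factorize` on toy data. It assumes that the contiguous codes created here are the IDs the estimator reports and that no later step renumbers them, which may not hold if cleaning drops firms or workers. The `psi_hat` values are placeholders, not estimator output.

```python
import pandas as pd

# Toy data with non-contiguous original IDs
raw = pd.DataFrame({
    "worker_id": ["w17", "w17", "w03", "w03", "w42"],
    "firm_id": ["fB", "fA", "fA", "fC", "fC"],
    "y": [1.0, 1.4, 0.8, 1.1, 0.9],
})

# factorize assigns codes 0, 1, 2, ... in order of first appearance
worker_codes, worker_labels = pd.factorize(raw["worker_id"])
firm_codes, firm_labels = pd.factorize(raw["firm_id"])

# Data with contiguous IDs i and j, ready to be passed on
coded = raw.assign(i=worker_codes, j=firm_codes)

# Placeholder firm-effect estimates, indexed by the contiguous firm code j
psi_hat = pd.DataFrame({"j": [0, 1, 2], "psi_hat": [0.10, -0.05, 0.20]})

# Map each contiguous code back to its original firm ID
psi_hat["firm_id"] = firm_labels.to_numpy()[psi_hat["j"].to_numpy()]

print(psi_hat)
```

The same lookup works for workers through `worker_labels`.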
