paulgp/immigration_enclave's Issues
Regressions
Just to let you know: I don't have the regressions yet, because something strange happens in the regressions that produce the residuals.
If I do:
use data/1980/nm.dta, clear
sort eclass xclass nonmover
by eclass xclass: reg logwage2 exp exp2 exp3 educ eclass#xclass inschool advanced ft lowhrs hisp_ed hisp_coll black_ed black_coll asian_ed asian_coll nonmover#eclass rczone0 [fweight=wt]
then the display window says:
-> eclass = 2, xclass = 1
no observations
(the same "no observations" message repeats for eclass = 2, xclass = 2 through 9)
But if I tab those variables, I get:
| xclass
eclass | 1 2 3 4 5 6 7 8 9 | Total
-----------+---------------------------------------------------------------------------------------------------+----------
1 | 74,627 50,389 38,034 38,223 39,695 40,478 47,688 57,422 51,186 | 437,742
2 | 135,282 105,828 82,566 70,376 59,073 50,111 45,000 41,862 28,709 | 618,807
3 | 126,470 118,687 82,837 50,836 38,542 33,795 30,174 22,995 12,400 | 516,736
4 | 23,032 34,213 26,976 18,423 14,979 12,798 9,959 6,420 3,265 | 150,065
-----------+---------------------------------------------------------------------------------------------------+----------
Total | 359,411 309,117 230,413 177,858 152,289 137,182 132,821 128,699 95,560 | 1,723,350
So I don't know what's going on, and the residual variances come out very low (around 0.1 instead of roughly 0.3).
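A common cause of a by-group "no observations" message is that some variable in the regression is missing for every row of the cell, so the cell is non-empty in `tab` but empty once Stata drops incomplete rows. A minimal pandas sketch of that check (toy data; the real check would run over every variable in the `reg` line above):

```python
import pandas as pd

# Toy check: for each (eclass, xclass) cell, count rows where every regression
# variable is non-missing. "no observations" usually means this count is zero
# even though the raw cell count shown by -tab- is not.
df = pd.DataFrame({
    "eclass":   [1, 1, 2, 2],
    "xclass":   [1, 1, 1, 1],
    "logwage2": [1.0, 2.0, None, None],   # toy data: missing for eclass == 2
    "exp":      [5, 6, 7, 8],
})
reg_vars = ["logwage2", "exp"]
complete = df.dropna(subset=reg_vars).groupby(["eclass", "xclass"]).size()
```

In Stata itself, something along the lines of `egen nmiss = rowmiss(...)` over the regression variables followed by `tab eclass xclass if nmiss == 0` should show which variable empties the eclass = 2 cells.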
Tables using ICPSR in Stata
Table 2
Looks exactly the same as Card's.
Country of origin | Working age population (thousands) | Share of all Immigrants (percent) | After 1980 | After 1990 | Mean years completed | Dropouts | 12-15 years | College or more |
---|---|---|---|---|---|---|---|---|
Natives | 141,475 | | | | 13.3 | 14.2 | 60.6 | 25.2 |
Immigrants | 23,627 | 100 | 70.5 | 39.9 | 11.6 | 37.4 | 38.8 | 23.8 |
Mexico | 7,267 | 30.8 | 75.1 | 43.8 | 8.6 | 69.8 | 26.5 | 3.7 |
Philippines | 1,078 | 4.6 | 66.1 | 31.5 | 14.1 | 9.2 | 43.7 | 47 |
India | 838 | 3.5 | 78.4 | 51.4 | 15.6 | 9.6 | 20.2 | 70.2 |
Vietnam | 806 | 3.4 | 75.3 | 39.7 | 11.7 | 34.6 | 45.8 | 19.6 |
China | 715 | 3 | 82 | 50.1 | 13.6 | 24.2 | 29.2 | 46.7 |
El Salvador | 698 | 3 | 85.1 | 37 | 8.9 | 65 | 30.6 | 4.4 |
Korea | 664 | 2.8 | 66.4 | 33.1 | 14 | 10.6 | 45.8 | 43.6 |
Cuba | 586 | 2.5 | 52.3 | 29.1 | 12.5 | 30 | 48.3 | 21.7 |
Dominican Republic | 536 | 2.3 | 74.2 | 38.1 | 10.8 | 48.8 | 41.9 | 9.3 |
Canada | 517 | 2.2 | 47.6 | 31.9 | 14.3 | 8.9 | 49.8 | 41.3 |
Germany | 455 | 1.9 | 32.6 | 21 | 13.9 | 8.3 | 59.3 | 32.4 |
Jamaica | 429 | 1.8 | 66.7 | 27.3 | 12.6 | 23.8 | 57.8 | 18.4 |
Colombia | 400 | 1.7 | 71.9 | 40.5 | 12.5 | 24.7 | 53.3 | 21.9 |
Guatemala | 400 | 1.7 | 84 | 45.9 | 8.8 | 64.5 | 30.4 | 5.1 |
Haiti | 333 | 1.4 | 75.1 | 34.5 | 11.8 | 35.2 | 51.3 | 13.5 |
Poland | 310 | 1.3 | 74.5 | 42.3 | 13.3 | 16.3 | 58.2 | 25.6 |
Table 3
Imm status/gender | Year | Education | Experience | Employment rate (%) | Mean wage | Overall variance (log wage) | Residual |
---|---|---|---|---|---|---|---|
Native men | 1980 | 12.5 | 18.8 | 90.2 | 25.07 | 0.379 | 0.283 |
 | 1990 | 13.0 | 18.9 | 89.3 | 23.90 | 0.452 | 0.319 |
 | 2000 | 13.2 | 20.4 | 86.8 | 25.84 | 0.486 | 0.358 |
Native women | 1980 | 12.2 | 19.7 | 65.4 | 16.73 | 0.315 | 0.267 |
 | 1990 | 12.8 | 19.4 | 74.9 | 17.07 | 0.381 | 0.294 |
 | 2000 | 13.3 | 20.7 | 77.1 | 19.52 | 0.408 | 0.320 |
Immigrant men | 1980 | 11.6 | 19.1 | 87.5 | 24.49 | 0.435 | 0.327 |
 | 1990 | 11.4 | 18.1 | 87.1 | 21.83 | 0.513 | 0.370 |
 | 2000 | 11.6 | 18.8 | 86.5 | 23.21 | 0.557 | 0.409 |
Immigrant women | 1980 | 11.0 | 20.6 | 60.0 | 17.15 | 0.342 | 0.295 |
 | 1990 | 11.2 | 19.9 | 65.1 | 16.96 | 0.413 | 0.331 |
 | 2000 | 11.7 | 20.0 | 64.8 | 19.27 | 0.484 | 0.381 |
Table 6
Columns (1)-(4) are still a bit strange, especially because the R2 is lower and the sign for the lagged dependent variable is negative (instead of positive as in Card's paper). Columns (5)-(8) are pretty close.
------------------------------------------------------------------------------------
1 2 3 4
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.020*** -0.020*** -0.025*** -0.026***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.076*** -0.077*** -0.083*** -0.086***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.077*** 0.078*** 0.086*** 0.089***
(0.00) (0.00) (0.00) (0.00)
College share 1980 0.078*** 0.083*** 0.091*** 0.102***
(0.02) (0.02) (0.02) (0.02)
College share 1990 -0.044*** -0.050*** -0.048*** -0.062***
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.139*** 0.134*** 0.137*** 0.125***
(0.00) (0.00) (0.00) (0.01)
Wage res imm 1980 -0.166*** -0.160*** -0.166*** -0.152***
(0.00) (0.00) (0.00) (0.00)
Mfg share in 1980 -0.049*** -0.038*** -0.080*** -0.057***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 -0.055*** -0.070*** -0.023 -0.056***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var -0.012*** -0.026***
(0.00) (0.00)
constant -0.054*** -0.053*** -0.078*** -0.079***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.145 0.145 0.140 0.139
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
5 6 7 8
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.055*** -0.048*** -0.076*** -0.066***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.021*** -0.023*** -0.039*** -0.036***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.011*** 0.020*** 0.035*** 0.035***
(0.00) (0.00) (0.00) (0.00)
College share 1980 -0.177*** -0.176*** -0.129*** -0.141***
(0.02) (0.02) (0.02) (0.02)
College share 1990 0.047*** 0.085*** 0.025 0.059***
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.214*** 0.323*** 0.214*** 0.295***
(0.01) (0.01) (0.01) (0.01)
Wage res imm 1980 -0.132*** -0.252*** -0.125*** -0.217***
(0.00) (0.00) (0.00) (0.01)
Mfg share in 1980 -0.367*** -0.410*** -0.388*** -0.415***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 0.495*** 0.540*** 0.493*** 0.527***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var 0.183*** 0.137***
(0.00) (0.00)
constant -0.048*** -0.081*** -0.132*** -0.135***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.344 0.373 0.313 0.355
------------------------------------------------------------------------------------
Over ID Tests
Hi Isaac,
In the meeting we talked about the Over ID test.
Could you please confirm that our Over ID test would look like the below?
ivregress 2sls `y' `controls' (`x'= shric*) [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"
where `y` is the difference in wages, the `controls` are the same as Card's controls, `x` is the relative share of natives and immigrants, and `shric*` are the shares of immigrants fixed in 1980 (for all countries by city).
If so, the Over ID test does not reject the null (p = 0.2204).
This is for High School workers.
Since you did not write the code, I am also copying below the equivalent piece of code for the canonical Bartik and the ADH example.
Canonical:
ivregress 2sls `y' `controls' czone_* year_* (`x'= t1990_init_sh_ind_* t2000_init_sh_ind_* ) [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"
ADH:
ivregress 2sls `y' `controls' i.t2 (`x'= t1990_sh_ind_2011-t1990_sh_ind_3931 t2000_sh_ind_2011-t2000_sh_ind_3931) [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"
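For intuition on what `estat overid` is testing: with more instruments than endogenous regressors, the 2SLS residuals should be (approximately) uncorrelated with the instruments. Below is a minimal numpy sketch of the classic Sargan version on synthetic data; note that with `vce(robust)` Stata reports a score test rather than this statistic, so the sketch is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
Z = rng.standard_normal((n, 3))                 # 3 instruments, 1 regressor
x = Z @ np.array([1.0, 0.5, 0.2]) + rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)            # instruments valid here

X = np.column_stack([np.ones(n), x])
Zf = np.column_stack([np.ones(n), Z])
Pz = Zf @ np.linalg.solve(Zf.T @ Zf, Zf.T)      # projection onto instruments
beta = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)   # 2SLS estimate
u = y - X @ beta
J = n * (u @ Pz @ u) / (u @ u)                  # Sargan J, ~ chi2(3 - 1)
```

With valid instruments, as here, J should be small relative to a chi-squared with (number of overidentifying restrictions) degrees of freedom, which is why a large p-value means "does not reject".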
Bartik weights
Hi @econisaac,
Could you please take a look at this and make sure it makes sense/agrees with what we talked about in the meeting on Tuesday?
Thanks,
Victoria
Dataset
To calculate the Bartik weights, I created a dataset of the form:
- 124 observations (one for each MSA).
- 130 variables:
  - MSA identifier (1 variable): `rmsa`.
  - Dependent variables (2 variables):
    - `resgap2`: for high school-equivalent workers, the difference between the mean wage residuals of HS-equivalent immigrant and native workers in each MSA in 2000.
    - `resgap4`: for college-equivalent workers, the difference between the mean wage residuals of college-equivalent immigrant and native workers in each MSA in 2000.
  - Instruments (38 variables): `shric1`-`shric38`, one variable for each of the 38 countries. Each is the fraction of earlier immigrants from country k who lived in location l in 1980. For example, the variable `shric1` takes on 124 distinct values, giving us the share of immigrants from Mexico that lived in each of the top 124 MSAs in 1980. To agree with Card's instrument, we divide the variables by the city population in 2000, so the values of `shric1`-`shric38` in each row are divided by the corresponding MSA's population in 2000.
  - Regression controls (12 variables):
    - `logsize80`: log city size in 1980
    - `logsize90`: log city size in 1990
    - `coll80`: share of the MSA population with college in 1980
    - `coll90`: share of the MSA population with college in 1990
    - `nres80`: mean wage residuals for natives living in the MSA in 1980
    - `ires80`: mean wage residuals for immigrants living in the MSA in 1980
    - `nres90`: mean wage residuals for natives living in the MSA in 1990
    - `ires90`: mean wage residuals for immigrants living in the MSA in 1990
    - `mfg80`: share of MSA workers in manufacturing in 1980
    - `mfg90`: share of MSA workers in manufacturing in 1990
    - `resgap902`: lagged dependent variable for high school-equivalent workers (currently not included in the list of controls)
    - `resgap904`: lagged dependent variable for college-equivalent workers (currently not included in the list of controls)
  - "Growth rates" (76 variables):
    - `hs_imm_ic1`-`hs_imm_ic38` (38 variables): number of HS-equivalent immigrant workers who arrived in the US between 1990 and 2000, one variable for each country ic1-ic38. These variables are constant across MSAs, so their values are repeated for all 124 observations.
    - `coll_imm_ic1`-`coll_imm_ic38` (38 variables): number of college-equivalent immigrant workers who arrived in the US between 1990 and 2000, one variable for each country ic1-ic38. These variables are constant across MSAs, so their values are repeated for all 124 observations.
  - Regression weight (1 variable): `count90`, the MSA population in 1990.
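A sketch of the `shric` construction described above, with hypothetical input names (`imm80` and `pop2000` are stand-ins, not the actual intermediate files): each country's 1980 count in an MSA is divided by that country's national total, then by the MSA's 2000 population, as in Card's instrument.

```python
import pandas as pd

imm80 = pd.DataFrame({              # 1980 immigrant counts by country x MSA (toy)
    "rmsa": [1, 1, 2, 2],
    "country": ["mexico", "india", "mexico", "india"],
    "n_imm_1980": [100, 50, 300, 50],
})
pop2000 = {1: 1000.0, 2: 2000.0}    # MSA population in 2000 (toy)

# fraction of each country's 1980 immigrants living in each MSA...
imm80["sh"] = imm80["n_imm_1980"] / imm80.groupby("country")["n_imm_1980"].transform("sum")
# ...divided by the MSA's 2000 population
imm80["shric"] = imm80["sh"] / imm80["rmsa"].map(pop2000)
```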
Code for weights
I am going to have two scripts, one for the high school-equivalent workers and one for the college-equivalent workers. I am copying below the code for HS-equivalent workers.
set seed 12345
use data/prepared_bartik.dta, clear
local controls logsize80 logsize90 coll80 coll90 ires80 nres80 mfg80 mfg90
local weight count90
local y resgap2
local x relshs
*local z shric*
local ind_stub shric*
local growth_stub hs_imm_ic*
* local time_var year
local cluster_var rmsa
* `ind_stub' already contains the wildcard (shric*), so no extra * is appended
foreach ind_var of varlist `ind_stub' {
    replace `ind_var' = `ind_var' * 100
}
forvalues k = 1(1)38 {
    egen agg_sh_ind_`k' = rowtotal(shric`k')
}
bartik_weight, z(`ind_stub') weightstub(`growth_stub') x(`x') y(`y') controls(`controls') weight_var(`weight')
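For reference, my understanding of the decomposition that `bartik_weight` reports (the Goldsmith-Pinkham, Sorkin, and Swift result) is that the Bartik IV estimate equals a weighted combination of the just-identified estimates from each country share, with Rotemberg weights that sum to one. A numpy sketch on toy data (no controls; in the real exercise the variables would first be residualized on the controls):

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 124, 5                        # MSAs and a toy number of countries
Z = rng.random((L, K))               # 1980 country shares by MSA
g = rng.random(K)                    # national growth rates
x = Z @ g + 0.1 * rng.standard_normal(L)     # endogenous regressor
y = 0.5 * x + 0.1 * rng.standard_normal(L)

# demean to mimic regressions with a constant (controls omitted in this toy)
Zp = Z - Z.mean(axis=0)
xp, yp = x - x.mean(), y - y.mean()

B = Zp @ g                                   # Bartik instrument (demeaned)
beta_bartik = (B @ yp) / (B @ xp)            # just-identified IV using B

beta_k = (Zp.T @ yp) / (Zp.T @ xp)           # IV using each share alone
alpha = g * (Zp.T @ xp)                      # Rotemberg weights...
alpha = alpha / alpha.sum()                  # ...normalized to sum to one
```

The identity `sum_k alpha_k * beta_k = beta_bartik` holds exactly, which is what makes the weights useful for diagnosing which countries drive the estimate.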
Data construction for Card
1980
- `read80.do` - reads the state-specific files of the 1980 5% extracts (available from ICPSR), does minimal data cleaning, and merges all state-specific files. The output is `all80.dta`. Takes as input:
  i. Census of Population and Housing, 1980 [United States]: Public Use Microdata Sample (A Sample): 5-Percent Sample (ICPSR 8101). Download it here.
- `read_all80.sas`, which creates `all80.sas7bdat`. Takes as input `all80.dta`.
- Run the scripts provided by Card.
  i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np80.sas7bdat`. These data are used to build wage outcomes. Takes as input `all80.sas7bdat`. Reads the code in `smsarecode80.sas` to re-code MSAs.
  ii. `allnp2.sas` - creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp80.sas7bdat`. These data are used to build supply variables. Takes as input `all80.sas7bdat`. Reads the code in `smsarecode80.sas` to re-code MSAs.
  iii. `cell1.sas` - creates a big summary of data by cell ==> `bigcells.sas7bdat`. Takes as input `np80.sas7bdat`.
  iv. `t1.sas` - creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp80.sas7bdat`.
  v. `supply1.sas` - gets supply measures ==> `cellsupply.sas7bdat`. Takes as input `np80.sas7bdat`.
  vi. `imm1.sas` - gets counts of immigrants by sending country in each city ==> `ic_city.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp80.sas7bdat`.
  vii. `indist.sas` - gets the fraction of workers in MFG by city. Takes as input `np80.sas7bdat`.
- Export some datasets to Stata:
  i. `cell1_to_stata.sas` - creates datasets on wages of immigrants and natives by education class and exports them to Stata (`1980_bigcells_new1.dta`, `1980_bigcells_new2.dta`, `nw80.dta`, `iw80.dta`, `nw801.dta`, `nw802.dta`, `nw803.dta`, `nw804.dta`, `iw801.dta`, `iw802.dta`, `iw803.dta`, `iw804.dta`). Takes as input `bigcells.sas7bdat`.
  ii. `t1_to_stata.sas` - creates `1980_allcells_new2.dta`. Takes as input `allcells.sas7bdat`.
  iii. `indist_to_stata.sas` - creates `1980_mfg.dta`. Takes as input `mfg.sas7bdat`.
1990
- `read90.do` - reads the state-specific files of the 1990 5% extracts (available from ICPSR), does minimal data cleaning, and merges all state-specific files. The output is `all90.dta`. Takes as input:
  i. Census of Population and Housing, 1990 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 9952). Download it here.
- `read_all90.sas`, which creates `all90.sas7bdat`. Takes as input `all90.dta`.
- Run the scripts provided by Card.
  i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np90.sas7bdat`. These data are used to build wage outcomes. Takes as input `all90.sas7bdat`. Reads the code in `smsarecode90.sas` to re-code MSAs.
  ii. `allnp2.sas` - creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp90.sas7bdat`. These data are used to build supply variables. Takes as input `all90.sas7bdat`. Reads the code in `smsarecode90.sas` to re-code MSAs.
  iii. `cell1.sas` - creates a big summary of data by cell ==> `bigcells.sas7bdat`. Takes as input `np90.sas7bdat`.
  iv. `t1.sas` - creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp90.sas7bdat`.
  v. `supply1.sas` - gets supply measures ==> `cellsupply.sas7bdat`. Takes as input `np90.sas7bdat`.
  vi. `imm1.sas` - gets counts of immigrants by sending country in each city ==> `ic_city.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp90.sas7bdat`.
  vii. `indist.sas` - gets the fraction of workers in MFG by city. Takes as input `np90.sas7bdat`.
- Export some datasets to Stata:
  i. `cell1_to_stata.sas` - creates datasets on wages of immigrants and natives by education class and exports them to Stata (`1990_bigcells_new1.dta`, `1990_bigcells_new2.dta`, `nw90.dta`, `iw90.dta`, `nw901.dta`, `nw902.dta`, `nw903.dta`, `nw904.dta`, `iw901.dta`, `iw902.dta`, `iw903.dta`, `iw904.dta`). Takes as input `bigcells.sas7bdat`.
  ii. `t1_to_stata.sas` - creates `1990_allcells_new2.dta`. Takes as input `allcells.sas7bdat`.
  iii. `indist_to_stata.sas` - creates `1990_mfg.dta`. Takes as input `mfg.sas7bdat`.
2000
- `read2000.do` - reads the state-specific files of the 2000 5% extracts (available from ICPSR), does minimal data cleaning, and merges all state-specific files. The output is `all2000.dta`. Takes as input:
  i. Census of Population and Housing, 2000 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 13568). Download it here.
- `read_all2000.sas`, which creates `all2000.sas7bdat`. Takes as input `all2000.dta`.
- Run the scripts provided by Card.
  i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np2000.sas7bdat`. These data are used to build wage outcomes. Takes as input `all2000.sas7bdat`.
  ii. `allnp2.sas` - creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp2000.sas7bdat`. These data are used to build supply variables. Takes as input `all2000.sas7bdat`.
  iii. `cell1.sas` - creates a big summary of data by cell ==> `bigcells.sas7bdat`. Takes as input `np2000.sas7bdat`.
  iv. `t1.sas` - creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp2000.sas7bdat`.
  v. `supply1.sas` - gets supply measures ==> `cellsupply.sas7bdat`. Takes as input `np2000.sas7bdat`.
  vi. `imm3.sas` - gets counts of immigrants by sending country in each city ==> `ic_citynew.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp2000.sas7bdat`.
  vii. `imm2.sas` - gets a count of immigrants present in 2000 by IC; this is used to construct the instrumental variable ==> `byicnew.sas7bdat`. Takes as input `supp2000.sas7bdat`.
  viii. `inflow3.sas` - constructs the supply-push instrument by "education and experience cell" and city. This is `newflows.sas7bdat`. Takes as input `ic_city.sas7bdat` (output of `imm1.sas` in 1980) and `byicnew.sas7bdat` (output of `imm2.sas` in 2000).
- Export some datasets to Stata:
  i. `cell1_to_stata` - creates datasets on wages of immigrants and natives by education class and exports them to Stata (`2000_bigcells_new1.dta`, `2000_bigcells_new2.dta`, `nw.dta`, `iw.dta`, `nw.dta`, `nw.dta`, `nw.dta`, `nw.dta`, `iw.dta`, `iw.dta`, `iw.dta`, `iw.dta`). Takes as input `bigcells.sas7bdat`.
  ii. `t1_to_stata` - creates `2000_allcells_new1.dta` and `2000_allcells_new2.dta`. Takes as input `allcells.sas7bdat`.
  iii. `inflow3_to_stata` - exports `newflows.sas7bdat` to dta.
Replicate Table 6 of Card (2009)
`table6.do` - replicates Table 6 of Card (2009) and constructs the dataset `input_card.dta`. Takes as input the Stata datasets exported from SAS (cited above) for 1980, 1990, and 2000.
More comparisons
The main directories for this exercise are:
- Card's original code and lst files: here
- Our SAS code and lst files: here. Our code is a very slight modification of Card's code to adjust paths and things like that. We run the code using our dataset downloaded from ICPSR instead of Card's original dataset (which we don't have).
- Our Stata code: here
For replicating table 6, we need data from 1980-2000, but not from 2005/06. The complete list of scripts needed to replicate Table 6 is below.
- For 1980: the list of scripts can be found in Card's README.
- For 1990: the list of scripts can be found in Card's README
- For 2000: the list of scripts can be found in Card's README
What we already know, which I won't repeat at length:
- Table 2 (Characteristics of Immigrants in 2000): We already discussed this in Issues #9 and #11. Except for the fact that we have two hundred thousand more natives in our sample than Card does, the summary statistics look exactly the same across the 3 exercises (Card's original results, Stata with our dataset, SAS with our dataset).
- Table 3 (Summary statistics for samples from 1980, 1990, 2000, 2005/06): Also discussed in Issues #9 and #10. As we know, the summary statistics in the first 4 columns look (almost) exactly the same across exercises; we can see very small differences in 1990. What was worrying us more were the differences in the last two columns: the overall variance of log wage and the residual variance of log wage. I will talk about this below.
New developments
The last two columns of Table 3 look different if run in SAS vs. Stata. In SAS, `PROC GLM` is used to first regress log wage on a set of covariates and obtain the residual; then the variance of the log wage and of the residuals is calculated across MSAs. The original code for 1980 can be found here (starting at line 176). In Stata, instead of `PROC GLM`, I just use `reg`. This generates different results.
Note also that these Stata results for the variances look a bit different from the ones I reported in Issue #9. I fixed a couple of things since then, so the results are now closer to Card's.
Imm status/gender | Year | Overall (Card) | Overall (SAS) | Overall (Stata) | Residual (Card) | Residual (SAS) | Residual (Stata) |
---|---|---|---|---|---|---|---|
Native men | 1980 | 0.385 | 0.387 | 0.386 | 0.288 | 0.288 | 0.288 |
 | 1990 | 0.462 | 0.452 | 0.452 | 0.322 | 0.319 | 0.317 |
 | 2000 | 0.487 | 0.486 | 0.486 | 0.353 | 0.358 | 0.358 |
Native women | 1980 | 0.317 | 0.316 | 0.316 | 0.269 | 0.268 | 0.268 |
 | 1990 | 0.382 | 0.381 | 0.381 | 0.295 | 0.294 | 0.294 |
 | 2000 | 0.408 | 0.408 | 0.408 | 0.313 | 0.320 | 0.320 |
Immigrant men | 1980 | 0.444 | 0.444 | 0.444 | 0.321 | 0.321 | 0.334 |
 | 1990 | 0.517 | 0.513 | 0.513 | 0.347 | 0.342 | 0.364 |
 | 2000 | 0.557 | 0.557 | 0.557 | 0.390 | 0.391 | 0.409 |
Immigrant women | 1980 | 0.343 | 0.343 | 0.343 | 0.291 | 0.291 | 0.296 |
 | 1990 | 0.414 | 0.413 | 0.413 | 0.318 | 0.317 | 0.330 |
 | 2000 | 0.484 | 0.484 | 0.484 | 0.367 | 0.369 | 0.380 |
If I then export the SAS dataset that generated the variances above and use it to generate Table 6, I get a Table 6 that:
- is different from our previous table (in Issue #9)
- is even more different from Card's original table
- but the SAS and Stata tables now agree (indicating that most differences between our Stata and SAS results were coming from the script that generates the residuals and variances).
The Table 6 we get from both SAS and Stata can be found below. The tables were generated by Stata, but the equivalent results in SAS can be found in this link:
- OLS estimates for High School: the coefficient estimates and R2 agree exactly between SAS and Stata; the standard errors differ.
  - Column 1 regression results start at line 595 (R2 at line 505 and coefficient at line 516)
  - Column 2 regression results start at line 568 (R2 at line 578 and coefficient at line 589)
- IV estimates for High School:
  - Column 3 regression results start at line 1243.
    - Coefficient estimates don't agree exactly but are very close.
    - Stata R2 is 0.203, SAS R2 is 0.145.
    - The 1st stage of Column 3 starts at line 818. The Stata 1st-stage t-statistic is 5.53; the SAS 1st-stage t-statistic is 7.87.
------------------------------------------------------------------------------------
1 2 3 4
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.030*** -0.030*** -0.036*** -0.036***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.095*** -0.094*** -0.104*** -0.106***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.101*** 0.100*** 0.113*** 0.114***
(0.00) (0.00) (0.00) (0.00)
College share 1980 0.099*** 0.098*** 0.121*** 0.124***
(0.02) (0.02) (0.02) (0.02)
College share 1990 -0.007 -0.006 -0.014 -0.016
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.135*** 0.137*** 0.141*** 0.136***
(0.01) (0.01) (0.01) (0.01)
Wage res imm 1980 -0.160*** -0.162*** -0.169*** -0.164***
(0.00) (0.00) (0.00) (0.00)
Mfg share in 1980 -0.225*** -0.226*** -0.258*** -0.256***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 0.194*** 0.195*** 0.227*** 0.223***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var 0.004 -0.009***
(0.00) (0.00)
constant -0.128*** -0.127*** -0.159*** -0.160***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.210 0.210 0.203 0.203
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
5 6 7 8
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.058*** -0.054*** -0.078*** -0.072***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.040*** -0.039*** -0.058*** -0.054***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.031*** 0.034*** 0.053*** 0.052***
(0.00) (0.00) (0.00) (0.00)
College share 1980 -0.055*** -0.114*** -0.003 -0.052**
(0.02) (0.02) (0.02) (0.02)
College share 1990 -0.022 0.045*** -0.045*** 0.005
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.309*** 0.363*** 0.338*** 0.371***
(0.00) (0.01) (0.01) (0.01)
Wage res imm 1980 -0.224*** -0.287*** -0.248*** -0.288***
(0.00) (0.00) (0.00) (0.00)
Mfg share in 1980 -0.373*** -0.422*** -0.377*** -0.410***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 0.499*** 0.546*** 0.484*** 0.518***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var 0.137*** 0.095***
(0.00) (0.00)
constant -0.061*** -0.085*** -0.139*** -0.143***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.386 0.401 0.356 0.379
------------------------------------------------------------------------------------
Many data checks using the .lst files
When dealing with SAS, we have two important kinds of files: the .sas files and the .lst files. The .sas files are the scripts. The .lst files store any results that were printed while running a script; thus .lst saves the output of "PROC MEANS", "PROC PRINT", "PROC GLM", etc. Luckily, we have (almost) all of Card's .lst files, so we can compare our results after running each script by comparing the .lst files.
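A small helper along these lines can automate the .lst comparison (paths hypothetical; this is a sketch, not part of the replication code):

```python
import difflib
from pathlib import Path

def compare_lst(ours: Path, cards: Path) -> list[str]:
    """Return a unified diff of two SAS .lst files (empty list means they match)."""
    return list(difflib.unified_diff(
        ours.read_text().splitlines(),
        cards.read_text().splitlines(),
        fromfile=str(ours), tofile=str(cards), lineterm="",
    ))
```

Running it over np2.lst, allnp2.lst, etc., for each year and inspecting any non-empty output would flag exactly which scripts diverge from Card's runs.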
Takeaways
- 1990 is the most different year. I think something changed with this dataset since Card used it. In 1980 and 2000, we have exactly the same number of immigrants as Card does (but more natives). In 1990, both the number of immigrants and the number of natives differ from Card's, and the summary statistics are also a bit different, whereas the 1980 and 2000 summary statistics look the same as Card's.
Data checks by year using the .lst files
- 1980
Us | Card | |
---|---|---|
np2 | link | link |
allnp2 | link | link |
cell1 | link | link |
t1 | link | link |
supply1 | link | link |
imm1 | link | link |
indist | link | link |
- 1990
Us | Card | |
---|---|---|
np2 | link | link |
allnp2 | link | link |
cell1 | link | link |
t1 | link | link |
supply1 | link | link |
imm1 | link | link |
indist | link | link |
- 2000
Us | Card | |
---|---|---|
np2 | link | link |
allnp2 | link | link |
cell1 | link | link |
t1 | link | link |
supply1 | link | link |
imm3 | link | link |
imm2 | link | link |
inflow3 | link | link |
Tables using ICPSR in SAS
Table 2
Looks exactly the same as in Stata. For the Stata version, see #9.
Note that ic = -2 refers to everybody (natives + immigrants), ic = -1 refers to natives, and ic = 0 refers to immigrants. Below is the dictionary for the remaining ic codes:
1 "mexico"
2 "phillipines"
3 "india"
4 "vietnam"
5 "el salvador"
6 "china"
7 "cuba"
8 "dominican republic"
9 "korea"
10 "jamaica"
11 "canada"
12 "colombia"
13 "guatemala"
14 "germany"
15 "haiti"
16 "poland"
Table 3
First 4 columns of Table 3 (I am omitting 2005/2006 because it uses ACS data, and not ICPSR data).
The first 4 columns look exactly the same as the corresponding columns in the Stata table found in #9.
The relevant observations are in rows 10-21.
Pre-trends
For each of the plots below, I do the following:
- Run one regression for each year (1980, 1990, 2000) and store the coefficient on the instrument. For each regression:
  - the dependent variable is the appropriate one (either the difference in mean wage residuals for HS-equivalent workers or for college-equivalent workers)
  - the set of controls is the same across all regressions. Note: none of the regressions includes the lagged dependent variable in the set of controls
  - for the "country" regressions, the instrument is the share of immigrants from that country living in location l in 1980
  - for the "aggregate Bartik" regressions, the instrument is the appropriate one (either the predicted inflow of HS-equivalent workers or of college-equivalent workers)
- Plot the coefficients on the instrument.
Note: The x-axis is showing 5-year intervals, but we only have data for 1980, 1990, and 2000. I will fix that.
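The procedure above, as a minimal numpy sketch (toy data; one OLS per year, keeping the slope on the instrument):

```python
import numpy as np

rng = np.random.default_rng(2)
coefs = {}
for yr in (1980, 1990, 2000):
    z = rng.standard_normal(124)              # instrument across 124 MSAs (toy)
    dep = 0.1 * z + rng.standard_normal(124)  # wage-residual gap (toy)
    X = np.column_stack([np.ones(124), z])    # constant stands in for controls
    b, *_ = np.linalg.lstsq(X, dep, rcond=None)
    coefs[yr] = b[1]                          # coefficient on the instrument
```

Plotting `coefs` against the year then gives one point per census, which is why the x-axis should show only 1980, 1990, and 2000.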
High School equivalent workers
(2x3 grid of coefficient-plot images; the figures did not survive this export)
College equivalent workers
(2x3 grid of coefficient-plot images; the figures did not survive this export)
Data cleaning
1. Defining who is immigrant
Card:
Defines as immigrants people who are naturalized citizens or who are not citizens.
citizen='0=us born 1=nat 2=not cit 3=born abroad us parents'
imm=(citizen in (1,2))
Victoria:
Card's definition plus categories 4 and 5:
/* CITIZEN:
0 n/a
1 born abroad of american parents
2 naturalized citizen
3 not a citizen
4 not a citizen, but has received first papers
5 foreign born, citizenship status not reported
*/
gen imm = .
replace imm = 1 if citizen == 2 | citizen == 3 | citizen == 4 | citizen == 5
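One thing worth double-checking in the snippet above: `gen imm = .` followed only by `replace imm = 1 if ...` leaves natives coded as missing rather than 0, which would silently drop them from anything that conditions on `imm`. A Python sketch of the mapping I believe is intended (my reading, not the repo's code):

```python
def is_immigrant(citizen: int) -> int:
    """Card's definition (naturalized or not a citizen) plus codes 4 and 5;
    everyone else (US born, born abroad of American parents) is native."""
    return 1 if citizen in (2, 3, 4, 5) else 0
```

The Stata equivalent would add an explicit `replace imm = 0` branch (or use `gen imm = inlist(citizen, 2, 3, 4, 5)`).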
2. Hours worked last year
Card:
His data seem to contain the exact number of weeks people worked last year. His code is the following:
annhrs=weeks*hrswkly;
That is, total annual hours = weeks * weekly hours
Victoria:
The Bartik data has multiple bins for number of weeks worked last year.
WKSWORK2:
0 n/a
1 1-13 weeks
2 14-26 weeks
3 27-39 weeks
4 40-47 weeks
5 48-49 weeks
6 50-52 weeks
Thus, I am currently imputing the midpoint of each bin.
gen weeks = .
replace weeks = 0 if wkswork2 == 0
replace weeks = 7 if wkswork2 == 1
replace weeks = 20 if wkswork2 == 2
replace weeks = 33 if wkswork2 == 3
replace weeks = 43.5 if wkswork2 == 4
replace weeks = 48.5 if wkswork2 == 5
replace weeks = 51.5 if wkswork2 == 6
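The same imputation in one place (a sketch; note that the midpoint of the 50-52 bin is 51, so the 51.5 in the Stata code above may be a typo):

```python
# Interval midpoints for the WKSWORK2 bins (assumed imputation rule)
WKSWORK2_MIDPOINT = {0: 0.0, 1: 7.0, 2: 20.0, 3: 33.0, 4: 43.5, 5: 48.5, 6: 51.0}

def impute_weeks(wkswork2: int) -> float:
    """Imputed weeks worked last year from the WKSWORK2 bin code."""
    return WKSWORK2_MIDPOINT[wkswork2]
```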
3. Education labels
Card: Census data has exactly one category for each grade and it also has information on whether the person completed the grade.
GRADE: Highest Year of School Attended
00 Never attended school or N/A (under 3
years of age)
01 Nursery school
02 Kindergarten
Elementary:
03 First grade
04 Second grade
05 Third grade
06 Fourth grade
07 Fifth grade
08 Sixth grade
09 Seventh grade
10 Eighth grade
High school:
11 Ninth grade
12 Tenth grade
13 Eleventh grade
14 Twelfth grade
College:
15 First year
16 Second year
17 Third year
18 Fourth year
19 Fifth year
20 Sixth year
21 Seventh year
22 Eighth year or more
Victoria:
Bartik data has too many categories and the numbers don't really add up:
EDUCD:
0 n/a or no schooling
1 n/a
2 no schooling completed
10 nursery school to grade 4
11 nursery school, preschool
12 kindergarten
13 grade 1, 2, 3, or 4
14 grade 1
15 grade 2
16 grade 3
17 grade 4
20 grade 5, 6, 7, or 8
21 grade 5 or 6
22 grade 5
23 grade 6
24 grade 7 or 8
25 grade 7
26 grade 8
30 grade 9
40 grade 10
50 grade 11
60 grade 12
61 12th grade, no diploma
62 high school graduate or ged
63 regular high school diploma
64 ged or alternative credential
65 some college, but less than 1 year
70 1 year of college
71 1 or more years of college credit, no degree
80 2 years of college
81 associate's degree, type not specified
82 associate's degree, occupational program
83 associate's degree, academic program
90 3 years of college
100 4 years of college
101 bachelor's degree
110 5+ years of college
111 6 years of college (6+ in 1960-1970)
112 7 years of college
113 8+ years of college
114 master's degree
115 professional degree beyond a bachelor's degree
116 doctoral degree
999 missing
To see what I mean by "they don't really add up", consider:
| educational attainment [detailed version] | Freq. | Percent | Cum. |
|---|---|---|---|
| grade 1, 2, 3, or 4 | 59,078 | 35.54 | 35.54 |
| grade 1 | 9,674 | 5.82 | 41.36 |
| grade 2 | 21,079 | 12.68 | 54.04 |
| grade 3 | 37,034 | 22.28 | 76.32 |
| grade 4 | 39,371 | 23.68 | 100.00 |
| Total | 166,236 | 100.00 | |
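For reference, here is a tentative partial crosswalk from EDUCD to a Card-style years-of-schooling variable in Stata, using only the unambiguous detailed codes. The aggregate codes (10, 13, 20, 21, 24) and how to score the degree codes would still need a decision — this is a sketch, not a final mapping:

```stata
* tentative EDUCD -> years-of-schooling crosswalk (unambiguous codes only)
gen educ = .
replace educ = 0  if inlist(educd, 0, 1, 2, 11, 12)   // none / preschool / kindergarten
replace educ = educd - 13 if inrange(educd, 14, 17)    // grades 1-4
replace educ = educd - 17 if inrange(educd, 22, 23)    // grades 5-6
replace educ = educd - 18 if inrange(educd, 25, 26)    // grades 7-8
replace educ = 9  if educd == 30
replace educ = 10 if educd == 40
replace educ = 11 if educd == 50
replace educ = 12 if inrange(educd, 60, 64)            // 12th grade / HS grad / GED
replace educ = 13 if inlist(educd, 65, 70)             // up to 1 year of college
replace educ = 14 if inrange(educd, 71, 83)            // 2 years / associate's
replace educ = 15 if educd == 90
replace educ = 16 if inlist(educd, 100, 101)           // 4 years / bachelor's
replace educ = 17 if educd >= 110 & educd < 999        // 5+ years / graduate degrees
```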
4. Income measures
Card:
- wagesal: Wage or Salary Income (INCOME1 in the 1980 Census variable dictionary)
- selfinc: Nonfarm Self-Employment Income (INCOME2 in the 1980 Census variable dictionary)
- farminc: Farm Self-Employment Income (INCOME3 in the 1980 Census variable dictionary)
- income: Income From All Sources (INCOME8 in the 1980 Census variable dictionary)
Then he defines self-employed as anyone who has a positive (selfinc + farminc)
Victoria:
- inctot: total personal income
- ftotinc: total family income
- incwage: wage and salary income
- incbus00: business and farm income, 2000
- incearn: total personal earned income
| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| inctot | 21,864,217 | 23201.98 | 34290.62 | -20000 | 1471000 |
| ftotinc | 21,864,217 | 249646.6 | 1404037 | -30000 | 9999999 |
| incwage | 21,864,217 | 19101.37 | 19101.37 | 0 | 641000 |
| incbus00 | 10,986,023 | 2195.484 | 15365.94 | -10000 | 573000 |
| incearn | 16,810,374 | 24123.92 | 35430.46 | -19996 | 1146000 |
The Bartik dataset has no measure of self-employment earnings, so I will use this other variable to define self-employment:
CLASSWKRD:
0 n/a
10 self-employed
11 employer
12 working on own account
13 self-employed, not incorporated
14 self-employed, incorporated
20 works for wages
21 works on salary (1920)
22 wage/salary, private
23 wage/salary at non-profit
24 wage/salary, government
25 federal govt employee
26 armed forces
27 state govt employee
28 local govt employee
29 unpaid family worker
https://www.dropbox.com/s/8jiij8ntdq1lcau/Screenshot%202019-01-16%2014.56.49.png?dl=0
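A Stata sketch of the replacement definition: since CLASSWKRD codes 10-14 cover the self-employed subcategories, Card's indicator could be approximated as

```stata
* approximate Card's self-employed flag from class of worker
* (Card defines it from positive selfinc + farminc, which is unavailable here)
gen selfemp = inrange(classwkrd, 10, 14) if classwkrd != 0
```

Note this counts the incorporated self-employed (code 14) as self-employed, which may not match how their income would show up under Card's earnings-based definition.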
5. Country codes - grouping into 38 groups
Card: The country codes used by Card can be found in Appendix F of the Codebook for the 1980 5% extracts, available from ICPSR.
He groups countries into 38 groups:
mexico
phillip
india
vietnam
el salvador
china
cuba
dominican rep.
korea
jamaica
canada
columbia
guatemala
germany
haiti
poland
taiwan
england
italy
ecuador
japan
iran
honduras
peru
russia
nicaragua
guyana
pakistan
hong kong
trinidad-tobago
west europe + israel + cyprus + australia + nz
east europe incl romania ukraine yugoslav
middle east turkey bulgaria and the stans
asia and oceania
s america + north am nec
africa
caribbean + central am
else
Somewhat unrelated note: Later on, Card creates even broader categories of countries (e.g., european, high asia, mid asia, mexico), and he includes Canada in the european group and Pakistan and Iran in the high asia group.
Victoria:
Issue: The Bartik dataset doesn't have 15 of the 38 groups used by Card:
el salvador
dominican rep.
jamaica
colombia
guatemala
haiti
taiwan
ecuador
honduras
peru
nicaragua
guyana
pakistan
hong kong
trinidad-tobago
For these groups, instead of using a person's place of birth I use whether the person is an immigrant combined with her primary ancestry (using the variable ancestr1). So if a person is an immigrant and her first ancestry response is "salvadoran", I count her as having been born in El Salvador. This is of course imperfect, since some immigrants were born in a country different from their reported ancestry.
6. Years in the US
Card:
Census data has the immigration year. So the 1980 Census, for example, has a variable that looks like
IMMIGR 1 26
Year of Immigration
0 N/A (born in the United States or outlying areas, or born abroad of American parents)
1 1975 to 1980
2 1970 to 1974
3 1965 to 1969
4 1960 to 1964
5 1950 to 1959
6 Before 1950
So he approximates how many years the person has been in the U.S. using that variable. This allows him to distinguish between people who have been in the U.S. for 20+ years vs. 40+ years.
if immyr=1 then yrsinus=2.5;
else if immyr=2 then yrsinus=7.5;
else if immyr=3 then yrsinus=12.5;
else if immyr=4 then yrsinus=17.5;
else if immyr=5 then yrsinus=25.5;
else if immyr=6 then yrsinus=40;
else yrsinus=.;
Victoria:
The Bartik data, on the other hand, only reports whether a person has been in the US for 21+ years, so I don't have the same level of granularity and I am not sure how many years to assign. Note: for each obs we have the person's date of birth, so maybe we can use that to approximate how many years the person has been in the U.S.
Right now, I am using 30 years for anyone who has been in the U.S. for 21+ years.
YRSUSA2:
0 n/a
1 0-5 years
2 6-10 years
3 11-15 years
4 16-20 years
5 21+ years
9 missing
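Mirroring Card's SAS mapping with bin midpoints, and using the 30-year assumption for the open-ended top bin, the Stata version would be:

```stata
* years in the US from YRSUSA2 bin midpoints; 30 for the 21+ bin is an assumption
gen yrsinus = .
replace yrsinus = 2.5 if yrsusa2 == 1  // 0-5 years
replace yrsinus = 8   if yrsusa2 == 2  // 6-10 years
replace yrsinus = 13  if yrsusa2 == 3  // 11-15 years
replace yrsinus = 18  if yrsusa2 == 4  // 16-20 years
replace yrsinus = 30  if yrsusa2 == 5  // 21+ years (assumed)
```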
CZs not a unique identifier
Today in the meeting we saw that I should be merging files with only one observation per CZ. However, some files seem to have by construction more than one obs per CZ.
Just to document a simple example.
The cell1 script in the 2000 folder collapses the data by the variables rczone, native, male, eclass, and xclass2. The resulting dataset is called bigcells.dta.
In the table6 script, we load the bigcells dataset and, for each value that eclass takes, keep only the male and native observations. However, rczone is still not a unique identifier, since each observation that is male, native, and in a given eclass appears once for each of the 4 values of xclass2. (In SAS this step works because proc summary's class statement also outputs marginal cells with xclass2 set to missing — cells collapsed over experience — and table6's xclass2=. condition selects exactly those; a plain Stata collapse produces no such marginal rows.)
xclass2 is an experience variable and eclass is an education variable. Each of them has 4 categories.
Variable definitions in Card's code:
if educ<12 then eclass=1;
else if educ=12 then eclass=2;
else if educ<16 then eclass=3;
else eclass=4;
if exp<=10 then xclass2=1;
else if exp<=20 then xclass2=2;
else if exp<=30 then xclass2=3;
else xclass2=4;
c=1;
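A direct Stata translation of those definitions (one caveat: SAS sorts a missing educ below 12 and would assign it eclass=1, while the code below would leave a missing educ in the top category — educ is assumed nonmissing here):

```stata
* education class: dropout / high school / some college / college+
gen eclass = 4
replace eclass = 1 if educ < 12
replace eclass = 2 if educ == 12
replace eclass = 3 if educ > 12 & educ < 16
* experience class: 0-10 / 11-20 / 21-30 / 31+ years
gen xclass2 = 4
replace xclass2 = 1 if exp <= 10
replace xclass2 = 2 if exp > 10 & exp <= 20
replace xclass2 = 3 if exp > 20 & exp <= 30
gen c = 1  // count helper, summed into count during the collapse
```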
Below are snippets of the code that saves bigcells and also the ones that use bigcells
Script cell1 saves bigcells
SAS code
proc summary;
class rmsa native male eclass xclass2;
var logwage2 lw2sq res ressq pred predsq respred imm female educ exp c
dropout hs somecoll collplus college advanced ;
output out=here.bigcells
mean=
sum(c)=count;
weight wt;
Script table6 uses bigcells
SAS code
*this macro gets native wages by eclass;
%macro nwage(ed);
%let edg=&ed;
data nw&edg;
set here.bigcells;
if native=1 and male=1 and eclass=&edg and xclass2=. ;
nwage&edg=logwage2;
nres&edg=res;
npred&edg=pred;
ncountw&edg=count / 1000;
keep rmsa nwage&edg npred&edg nres&edg ncountw&edg;
proc sort; by rmsa;
%mend;
Stata code
use data/2000/bigcells.dta, clear
local education_groups 1 2 3 4
forv i = 1/4 {
local edg : word `i' of `education_groups'
preserve
keep if native == 1 & male == 1 & eclass == `edg'
gen nwage`edg' = logwage2
gen nres`edg' = res
gen npred`edg' = pred
gen ncountw`edg' = count/1000
keep rczone nwage`edg' npred`edg' nres`edg' ncountw`edg'
sort rczone
save data/2000/nw`edg'.dta, replace
restore
}
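Because SAS proc summary with a class statement also emits marginal cells — including rows where xclass2 is missing, i.e., collapsed over experience, which is what table6's xclass2=. condition selects — a Stata collapse by all five variables will never contain those rows. One possible fix is to rebuild the experience-marginal cells from bigcells itself. A sketch (the weighting and the re-aggregation of count would need to be checked against the original cell construction):

```stata
* rebuild the xclass2-marginal cells (the SAS rows with xclass2 = .)
* so that rczone is unique within native/male/eclass
use data/2000/bigcells.dta, clear
collapse (mean) logwage2 res pred [aw=count], by(rczone native male eclass)
```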
| eclass
xclass2 | 1 2 3 4 | Total
-----------+--------------------------------------------+----------
1 | 175,066 464,765 295,094 292,924 | 1,227,849
2 | 155,468 529,215 302,775 315,280 | 1,302,738
3 | 152,129 526,231 283,651 309,079 | 1,271,090
4 | 153,342 442,467 169,002 166,318 | 931,129
-----------+--------------------------------------------+----------
Total | 636,005 1,962,678 1,050,522 1,083,601 | 4,732,806
Data checks
First attempt at replicating Table 1
| | Working age population | Share of US population | Pct Immigrant | Pct Hispanic | Pct Minority | Pct Dropout | Pct High school | Pct Some college | Pct College or more | Mean wage2 |
|---|---|---|---|---|---|---|---|---|---|---|
| All US | 206238 | 100.1 | 12 | 11 | 25 | 15 | 42 | 22 | 22 | 66.73 |
| Larger czones (top 100) | 115544 | 56.1 | 17 | 14 | 32 | 14 | 38 | 22 | 26 | 66.355 |
| Rest of country | 90694 | 44 | 4.8 | 6.9 | 16 | 15 | 46 | 22 | 17 | 67.209 |
| 1st largest czone | 8820 | 4.3 | 41 | 39 | 58 | 22 | 33 | 22 | 23 | 78.18 |
| 2nd largest czone | 6544 | 3.2 | 37 | 22 | 51 | 16 | 36 | 19 | 30 | 85.555 |
| 3rd largest czone | 4510 | 2.2 | 23 | 17 | 41 | 14 | 35 | 21 | 30 | 66.851 |
| 4th largest czone | 3245 | 1.6 | 29 | 17 | 39 | 11 | 37 | 18 | 34 | 69.28 |
| 5th largest czone | 3021 | 1.5 | 8.8 | 5.9 | 29 | 11 | 42 | 19 | 28 | 67.813 |
| 6th largest czone | 2989 | 1.5 | 8.2 | 2.9 | 26 | 11 | 41 | 24 | 24 | 67.921 |
| 7th largest czone | 2906 | 1.4 | 16 | 6.4 | 16 | 9.1 | 34 | 20 | 37 | 62.88 |
| 8th largest czone | 2821 | 1.4 | 22 | 9.3 | 42 | 10 | 30 | 19 | 42 | 58.961 |
| 9th largest czone | 2657 | 1.3 | 33 | 17 | 46 | 11 | 28 | 24 | 37 | 69.416 |
| 10th largest czone | 2643 | 1.3 | 25 | 27 | 49 | 20 | 36 | 20 | 24 | 71.92 |
| 11th largest czone | 2527 | 1.2 | 13 | 6.7 | 39 | 13 | 36 | 21 | 30 | 61.567 |
| 12th largest czone | 2238 | 1.1 | 13 | 5 | 17 | 8.1 | 35 | 27 | 30 | 62.136 |