immigration_enclave's People

Contributors

vdquadros

Forkers

paulgp

immigration_enclave's Issues

Regressions

Just want to let you know that I don't have the regressions yet, because something strange happens in the regressions used to get the residuals:

If I do:

use data/1980/nm.dta, clear
sort eclass xclass nonmover
by eclass xclass: reg logwage2 exp exp2 exp3 educ eclass#xclass inschool advanced ft lowhrs hisp_ed hisp_coll black_ed black_coll asian_ed asian_coll nonmover#eclass rczone0 [fweight=wt]

then the display window says:


-> eclass = 2, xclass = 1
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 2
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 3
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 4
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 5
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 6
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 7
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 8
no observations

-------------------------------------------------------------------------------------------------------------------------------------------------------------
-> eclass = 2, xclass = 9
no observations

But if I tab those variables, I get:


           |                                               xclass
    eclass |         1          2          3          4          5          6          7          8          9 |     Total
-----------+---------------------------------------------------------------------------------------------------+----------
         1 |    74,627     50,389     38,034     38,223     39,695     40,478     47,688     57,422     51,186 |   437,742 
         2 |   135,282    105,828     82,566     70,376     59,073     50,111     45,000     41,862     28,709 |   618,807 
         3 |   126,470    118,687     82,837     50,836     38,542     33,795     30,174     22,995     12,400 |   516,736 
         4 |    23,032     34,213     26,976     18,423     14,979     12,798      9,959      6,420      3,265 |   150,065 
-----------+---------------------------------------------------------------------------------------------------+----------
     Total |   359,411    309,117    230,413    177,858    152,289    137,182    132,821    128,699     95,560 | 1,723,350 

So I don't know what's going on, and the residuals are very low (around .1 rather than .3).
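One thing worth ruling out, offered as an assumption rather than a diagnosis: Stata's `reg` drops every row that has a missing value in any model variable, while `tab` counts all rows. A by-group can therefore show thousands of rows in the tabulation and still report "no observations". The Python sketch below counts complete cases per (eclass, xclass) cell on made-up rows; only the variable names come from the regression above, the values are invented.

```python
from collections import Counter

# Hypothetical rows; None marks a missing value. Only the variable
# names are taken from the regression above, the values are invented.
rows = [
    {"eclass": 1, "xclass": 1, "logwage2": 2.1, "educ": 10, "rczone0": 0},
    {"eclass": 1, "xclass": 1, "logwage2": 2.4, "educ": 11, "rczone0": 1},
    {"eclass": 2, "xclass": 1, "logwage2": 2.7, "educ": 12, "rczone0": None},
    {"eclass": 2, "xclass": 1, "logwage2": 2.9, "educ": 12, "rczone0": None},
]
model_vars = ["logwage2", "educ", "rczone0"]


def cell_counts(rows, model_vars):
    """Return (all rows, complete cases) per (eclass, xclass) cell."""
    total, complete = Counter(), Counter()
    for row in rows:
        cell = (row["eclass"], row["xclass"])
        total[cell] += 1
        # A row is usable only if every model variable is non-missing.
        if all(row[v] is not None for v in model_vars):
            complete[cell] += 1
    return {cell: (total[cell], complete[cell]) for cell in total}


counts = cell_counts(rows, model_vars)
for cell, (n_all, n_complete) in sorted(counts.items()):
    print(cell, "rows:", n_all, "complete cases:", n_complete)
```

A cell with rows but zero complete cases would reproduce the pattern above. In Stata, `misstable summarize` on the model variables gives the same information.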

Tables using ICPSR in Stata

Table 2

Looks exactly the same as Card's.

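Country of origin  | Working age population (thousands) | Share of all immigrants (percent) | After 1980 | After 1990 | Mean years completed | Dropouts | 12-15 years | College or more
-------------------+------------------------------------+-----------------------------------+------------+------------+----------------------+----------+-------------+----------------
Natives            | 141,475                            |                                   |            |            | 13.3                 | 14.2     | 60.6        | 25.2
Immigrants         | 23,627                             | 100                               | 70.5       | 39.9       | 11.6                 | 37.4     | 38.8        | 23.8
Mexico             | 7,267                              | 30.8                              | 75.1       | 43.8       | 8.6                  | 69.8     | 26.5        | 3.7
Philippines        | 1,078                              | 4.6                               | 66.1       | 31.5       | 14.1                 | 9.2      | 43.7        | 47
India              | 838                                | 3.5                               | 78.4       | 51.4       | 15.6                 | 9.6      | 20.2        | 70.2
Vietnam            | 806                                | 3.4                               | 75.3       | 39.7       | 11.7                 | 34.6     | 45.8        | 19.6
China              | 715                                | 3                                 | 82         | 50.1       | 13.6                 | 24.2     | 29.2        | 46.7
El Salvador        | 698                                | 3                                 | 85.1       | 37         | 8.9                  | 65       | 30.6        | 4.4
Korea              | 664                                | 2.8                               | 66.4       | 33.1       | 14                   | 10.6     | 45.8        | 43.6
Cuba               | 586                                | 2.5                               | 52.3       | 29.1       | 12.5                 | 30       | 48.3        | 21.7
Dominican Republic | 536                                | 2.3                               | 74.2       | 38.1       | 10.8                 | 48.8     | 41.9        | 9.3
Canada             | 517                                | 2.2                               | 47.6       | 31.9       | 14.3                 | 8.9      | 49.8        | 41.3
Germany            | 455                                | 1.9                               | 32.6       | 21         | 13.9                 | 8.3      | 59.3        | 32.4
Jamaica            | 429                                | 1.8                               | 66.7       | 27.3       | 12.6                 | 23.8     | 57.8        | 18.4
Colombia           | 400                                | 1.7                               | 71.9       | 40.5       | 12.5                 | 24.7     | 53.3        | 21.9
Guatemala          | 400                                | 1.7                               | 84         | 45.9       | 8.8                  | 64.5     | 30.4        | 5.1
Haiti              | 333                                | 1.4                               | 75.1       | 34.5       | 11.8                 | 35.2     | 51.3        | 13.5
Poland             | 310                                | 1.3                               | 74.5       | 42.3       | 13.3                 | 16.3     | 58.2        | 25.6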

Table 3

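Imm status/gender | Year | Education | Experience | Employment rate (%) | Mean wage | Variance (log wage), overall | Variance (log wage), residual
------------------+------+-----------+------------+---------------------+-----------+------------------------------+------------------------------
Native men        | 1980 | 12.5      | 18.8       | 90.2                | 25.07     | 0.379                        | 0.283
                  | 1990 | 13.0      | 18.9       | 89.3                | 23.90     | 0.452                        | 0.319
                  | 2000 | 13.2      | 20.4       | 86.8                | 25.84     | 0.486                        | 0.358
Native women      | 1980 | 12.2      | 19.7       | 65.4                | 16.73     | 0.315                        | 0.267
                  | 1990 | 12.8      | 19.4       | 74.9                | 17.07     | 0.381                        | 0.294
                  | 2000 | 13.3      | 20.7       | 77.1                | 19.52     | 0.408                        | 0.320
Immigrant men     | 1980 | 11.6      | 19.1       | 87.5                | 24.49     | 0.435                        | 0.327
                  | 1990 | 11.4      | 18.1       | 87.1                | 21.83     | 0.513                        | 0.370
                  | 2000 | 11.6      | 18.8       | 86.5                | 23.21     | 0.557                        | 0.409
Immigrant women   | 1980 | 11.0      | 20.6       | 60.0                | 17.15     | 0.342                        | 0.295
                  | 1990 | 11.2      | 19.9       | 65.1                | 16.96     | 0.413                        | 0.331
                  | 2000 | 11.7      | 20.0       | 64.8                | 19.27     | 0.484                        | 0.381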

Table 6

Columns (1)-(4) are still a bit strange, especially because the R2 is lower and the sign for the lagged dependent variable is negative (instead of positive as in Card's paper). Columns (5)-(8) are pretty close.

------------------------------------------------------------------------------------
                                1               2               3               4   
                             b/se            b/se            b/se            b/se   
------------------------------------------------------------------------------------
Log rel supply imm~e       -0.020***       -0.020***       -0.025***       -0.026***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1980          -0.076***       -0.077***       -0.083***       -0.086***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1990           0.077***        0.078***        0.086***        0.089***
                           (0.00)          (0.00)          (0.00)          (0.00)   
College share 1980          0.078***        0.083***        0.091***        0.102***
                           (0.02)          (0.02)          (0.02)          (0.02)   
College share 1990         -0.044***       -0.050***       -0.048***       -0.062***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Wage res native 1980        0.139***        0.134***        0.137***        0.125***
                           (0.00)          (0.00)          (0.00)          (0.01)   
Wage res imm 1980          -0.166***       -0.160***       -0.166***       -0.152***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Mfg share in 1980          -0.049***       -0.038***       -0.080***       -0.057***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Mfg share in 1990          -0.055***       -0.070***       -0.023          -0.056***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Lagged dep var                             -0.012***                       -0.026***
                                           (0.00)                          (0.00)   
constant                   -0.054***       -0.053***       -0.078***       -0.079***
                           (0.00)          (0.00)          (0.00)          (0.00)   
------------------------------------------------------------------------------------
r2                          0.145           0.145           0.140           0.139   
------------------------------------------------------------------------------------


------------------------------------------------------------------------------------
                                5               6               7               8   
                             b/se            b/se            b/se            b/se   
------------------------------------------------------------------------------------
Log rel supply imm~e       -0.055***       -0.048***       -0.076***       -0.066***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1980          -0.021***       -0.023***       -0.039***       -0.036***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1990           0.011***        0.020***        0.035***        0.035***
                           (0.00)          (0.00)          (0.00)          (0.00)   
College share 1980         -0.177***       -0.176***       -0.129***       -0.141***
                           (0.02)          (0.02)          (0.02)          (0.02)   
College share 1990          0.047***        0.085***        0.025           0.059***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Wage res native 1980        0.214***        0.323***        0.214***        0.295***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Wage res imm 1980          -0.132***       -0.252***       -0.125***       -0.217***
                           (0.00)          (0.00)          (0.00)          (0.01)   
Mfg share in 1980          -0.367***       -0.410***       -0.388***       -0.415***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Mfg share in 1990           0.495***        0.540***        0.493***        0.527***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Lagged dep var                              0.183***                        0.137***
                                           (0.00)                          (0.00)   
constant                   -0.048***       -0.081***       -0.132***       -0.135***
                           (0.00)          (0.00)          (0.00)          (0.00)   
------------------------------------------------------------------------------------
r2                          0.344           0.373           0.313           0.355   
------------------------------------------------------------------------------------


Over ID Tests

Hi Isaac,

In the meeting we talked about the Over ID test.

Could you please confirm that our Over ID test would look like the below?

ivregress 2sls  `y'  `controls' (`x'= shric*)  [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"

Here y is the difference in wages, the controls are the same as Card's, x is the relative share of natives and immigrants, and shric* are the immigrant shares fixed in 1980 (one variable per country, by city).

If so, the Over ID test does not reject the null (p = 0.2204).

This is for High School workers.

Since you did not write the code, I am also copying below the equivalent piece of code for the canonical Bartik and the ADH example.

Canonical:

ivregress 2sls  `y'  `controls' czone_* year_*  (`x'= t1990_init_sh_ind_* t2000_init_sh_ind_*  )  [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"

ADH:

ivregress 2sls  `y'  `controls' i.t2  (`x'= t1990_sh_ind_2011-t1990_sh_ind_3931 t2000_sh_ind_2011-t2000_sh_ind_3931)  [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"

Bartik weights

Hi @econisaac,

Could you please take a look at this and make sure it makes sense/agrees with what we talked about in the meeting on Tuesday?

Thanks,
Victoria

Dataset

To calculate the Bartik weights, I created a dataset of the form:

  • 124 observations (one for each MSA).
  • 130 variables:
    • rmsa
    • Dependent variables (2 variables):
      • For high school-equivalent workers:
        • resgap2: Difference between the mean wage residuals of HS-equivalent immigrant and native workers in each MSA in 2000.
      • For college-equivalent workers:
        • resgap4: Difference between the mean wage residuals of college-equivalent immigrant and native workers in each MSA in 2000.
    • Instruments (38 variables):
      • shric1-shric38: This corresponds to one variable for each of the 38 countries. It is the fraction of earlier immigrants from country k who lived in location l in 1980. For example, the variable shric1 takes on 124 distinct values, giving us the share of immigrants from Mexico that lived in each of the top 124 MSAs in 1980.
        In order to agree with Card's instrument, we divide the variables by the city population in 2000. So the values of shric1-shric38 for each row are divided by the corresponding MSA's population in 2000.
    • Regression controls (12 variables):
      • logsize80: Log city size in 1980
      • logsize90: Log city size in 1990
      • coll80: Share of the MSA population with college in 1980
      • coll90: Share of the MSA population with college in 1990
      • nres80: Mean wage residuals for natives living in the MSA in 1980
      • ires80: Mean wage residuals for immigrants living in the MSA in 1980
      • nres90: Mean wage residuals for natives living in the MSA in 1990
      • ires90: Mean wage residuals for immigrants living in the MSA in 1990
      • mfg80: Share of MSA workers in manufacturing in 1980
      • mfg90: Share of MSA workers in manufacturing in 1990
      • resgap902: Lagged dependent variable for high school-equivalent workers. Currently not included in the list of controls.
      • resgap904: Lagged dependent variable for college-equivalent workers. Currently not included in the list of controls.
    • "Growth rates" (76 variables):
      • For high school equivalent workers (38 variables):
        • hs_imm_ic1-hs_imm_ic38: Number of HS-equivalent immigrant workers who arrived in the US between 1990 and 2000. One variable for each country ic1-ic38. These variables are constant across MSAs, so their values are repeated for all of the 124 observations.
      • For college equivalent workers (38 variables):
        • coll_imm_ic1-coll_imm_ic38: Number of college-equivalent immigrant workers who arrived in the US between 1990 and 2000. One variable for each country ic1-ic38. These variables are constant across MSAs, so their values are repeated for all of the 124 observations.
    • Regression weight (1 variable):
      • count90: MSA population in 1990
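
As a rough illustration of how the shares and the "growth rates" fit together, the sketch below forms an aggregate shift-share instrument for one MSA as the sum over countries of the 1980 share times the national inflow. It is a Python sketch on invented numbers with three countries instead of 38. The function name `bartik_instrument` is hypothetical, and any further scaling of the shares (for example by MSA population, as described above) is left out.

```python
def bartik_instrument(shares, inflows):
    """Sum over countries k of share_lk * inflow_k for a single MSA."""
    if len(shares) != len(inflows):
        raise ValueError("need one share and one inflow per country")
    return sum(s * g for s, g in zip(shares, inflows))


# Invented example: 1980 shares of three sending countries in two MSAs
# (the analogue of shric1-shric3), one list per MSA.
shares_by_msa = {
    "msa_a": [0.20, 0.05, 0.00],
    "msa_b": [0.10, 0.00, 0.50],
}
# National 1990-2000 inflows by country (the analogue of
# hs_imm_ic1-hs_imm_ic3); identical for every MSA.
national_inflows = [1000, 400, 200]

instrument = {
    msa: bartik_instrument(shares, national_inflows)
    for msa, shares in shares_by_msa.items()
}
print(instrument)
```

Because the inflows are the same for every MSA, all cross-city variation in the instrument comes from the 1980 shares, which is why the dataset stores the inflows as constants repeated across rows.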

Code for weights

I am going to have two scripts, one for the high school-equivalent workers and one for the college-equivalent workers. I am copying below the code for HS-equivalent workers.

set seed 12345
use data/prepared_bartik.dta, clear

* Controls exclude the lagged dependent variable (resgap902)
local controls logsize80 logsize90 coll80 coll90 ires80 nres80 mfg80 mfg90
local weight count90

local y resgap2
local x relshs
*local z shric*

* Stubs carry no trailing *; the wildcard is appended where they are used
local ind_stub shric
local growth_stub hs_imm_ic

* local time_var year
local cluster_var rmsa

* Multiply the 1980 shares by 100
foreach ind_var of varlist `ind_stub'* {
	replace `ind_var' = `ind_var' * 100
	}

forvalues k = 1(1)38 {
	egen agg_sh_ind_`k' = rowtotal(shric`k')
	}

bartik_weight, z(`ind_stub'*) weightstub(`growth_stub'*) x(`x') y(`y') controls(`controls') weight_var(`weight')

Data construction for Card

1980

  1. read80.do - reads the state-specific files of the 1980 5% extracts (available from ICPSR), does minimal data cleaning, merges all state-specific files. The output is all80.dta. Takes as input:

    i. Census of Population and Housing, 1980 [United States]: Public Use Microdata Sample (A Sample): 5-Percent Sample (ICPSR 8101). Download it here.

  2. read_all80.sas, which creates all80.sas7bdat. Takes as input all80.dta.

  3. Run the scripts provided by Card.
    i. np2.sas - creates a working data set of wage-earners age 18+, with recodes, etc. This is np80.sas7bdat. These data are used to build wage outcomes. Takes as input all80.sas7bdat. *reads the code in smsarecode80.sas to re-code msa's.

    ii. allnp2.sas - creates a working data set of EVERYONE age 18+, with recodes, etc. This is supp80.sas7bdat. These data are used to build supply variables. Takes as input all80.sas7bdat. *reads the code in smsarecode80.sas to re-code msa's.

    iii. cell1.sas - creates a big summary of data by cell ==> bigcells.sas7bdat. Takes as input np80.sas7bdat.

    iv. t1.sas - creates a big summary of data by cell ==> allcells.sas7bdat. Takes as input supp80.sas7bdat.

    v. supply1.sas - gets supply measures ==> cellsupply.sas7bdat. Takes as input np80.sas7bdat.

    vi. imm1.sas - gets counts of immigrants by sending country in each city ==> ic_city.sas7bdat (IC is Card's classification of sending countries). Takes as input supp80.sas7bdat.

    vii. indist.sas - gets fraction of workers in MFG by city. Takes as input np80.sas7bdat.

  4. Export some datasets to Stata:
    i. cell1_to_stata.sas - creates datasets on wages of immigrants and natives by education class. Exports them to Stata (1980_bigcells_new1.dta, 1980_bigcells_new2.dta, nw80.dta, iw80.dta, nw801.dta, nw802.dta, nw803.dta, nw804.dta, iw801.dta, iw802.dta, iw803.dta, iw804.dta). Takes as input bigcells.sas7bdat.

    ii. t1_to_stata.sas - creates 1980_allcells_new2.dta. Takes as input allcells.sas7bdat

    iii. indist_to_stata.sas - creates 1980_mfg.dta. Takes as input mfg.sas7bdat
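
The 1980 steps above form a dependency chain through their input and output files. As a hedged illustration, the Python sketch below (requires Python 3.9+ for `graphlib`) derives one valid run order from those files. The file names are transcribed from the list; the one assumption is that indist.sas writes mfg.sas7bdat, since the list names only its input. The function name `run_order` is hypothetical.

```python
from graphlib import TopologicalSorter

# (inputs, outputs) of each 1980 script, transcribed from the list above.
# Assumption: indist.sas writes mfg.sas7bdat (the list names only its input).
steps = {
    "read80.do": ([], ["all80.dta"]),
    "read_all80.sas": (["all80.dta"], ["all80.sas7bdat"]),
    "np2.sas": (["all80.sas7bdat"], ["np80.sas7bdat"]),
    "allnp2.sas": (["all80.sas7bdat"], ["supp80.sas7bdat"]),
    "cell1.sas": (["np80.sas7bdat"], ["bigcells.sas7bdat"]),
    "t1.sas": (["supp80.sas7bdat"], ["allcells.sas7bdat"]),
    "supply1.sas": (["np80.sas7bdat"], ["cellsupply.sas7bdat"]),
    "imm1.sas": (["supp80.sas7bdat"], ["ic_city.sas7bdat"]),
    "indist.sas": (["np80.sas7bdat"], ["mfg.sas7bdat"]),
    "cell1_to_stata.sas": (["bigcells.sas7bdat"], []),
    "t1_to_stata.sas": (["allcells.sas7bdat"], []),
    "indist_to_stata.sas": (["mfg.sas7bdat"], []),
}


def run_order(steps):
    """Order scripts so that each runs after the scripts producing its inputs."""
    producer = {out: name for name, (_, outs) in steps.items() for out in outs}
    # Map each script to the set of scripts it depends on.
    graph = {
        name: {producer[f] for f in inputs if f in producer}
        for name, (inputs, _) in steps.items()
    }
    return list(TopologicalSorter(graph).static_order())


order = run_order(steps)
print("\n".join(order))
```

Several orders are valid (for example, np2.sas and allnp2.sas can run in either order); the sketch returns just one of them.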

1990

  1. read90.do - reads the state-specific files of the 1990 5% extracts (available from ICPSR), does minimal data cleaning, merges all state-specific files. The output is all90.dta. Takes as input:

    i. Census of Population and Housing, 1990 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 9952). Download it here.

  2. read_all90.sas, which creates all90.sas7bdat. Takes as input all90.dta.

  3. Run the scripts provided by Card.
    i. np2.sas - creates a working data set of wage-earners age 18+, with recodes, etc. This is np90.sas7bdat. These data are used to build wage outcomes. Takes as input all90.sas7bdat. *reads the code in smsarecode90.sas to re-code msa's.

    ii. allnp2.sas - creates a working data set of EVERYONE age 18+, with recodes, etc. This is supp90.sas7bdat. These data are used to build supply variables. Takes as input all90.sas7bdat. *reads the code in smsarecode90.sas to re-code msa's.

    iii. cell1.sas - creates a big summary of data by cell ==> bigcells.sas7bdat. Takes as input np90.sas7bdat.

    iv. t1.sas - creates a big summary of data by cell ==> allcells.sas7bdat. Takes as input supp90.sas7bdat.

    v. supply1.sas - gets supply measures ==> cellsupply.sas7bdat. Takes as input np90.sas7bdat.

    vi. imm1.sas - gets counts of immigrants by sending country in each city ==> ic_city.sas7bdat (IC is Card's classification of sending countries). Takes as input supp90.sas7bdat.

    vii. indist.sas - gets fraction of workers in MFG by city. Takes as input np90.sas7bdat.

  4. Export some datasets to Stata:
    i. cell1_to_stata.sas - creates datasets on wages of immigrants and natives by education class. Exports them to Stata (1990_bigcells_new1.dta, 1990_bigcells_new2.dta, nw90.dta, iw90.dta, nw901.dta, nw902.dta, nw903.dta, nw904.dta, iw901.dta, iw902.dta, iw903.dta, iw904.dta). Takes as input bigcells.sas7bdat.

    ii. t1_to_stata.sas - creates 1990_allcells_new2.dta. Takes as input allcells.sas7bdat

    iii. indist_to_stata.sas - creates 1990_mfg.dta. Takes as input mfg.sas7bdat

2000

  1. read2000.do - reads the state-specific files of the 2000 5% extracts (available from ICPSR), does minimal data cleaning, merges all state-specific files. The output is all2000.dta. Takes as input:

    i. Census of Population and Housing, 2000 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 13568). Download it here.

  2. read_all2000.sas, which creates all2000.sas7bdat. Takes as input all2000.dta.

  3. Run the scripts provided by Card.
    i. np2.sas - creates a working data set of wage-earners age 18+, with recodes, etc. This is np2000.sas7bdat. These data are used to build wage outcomes. Takes as input all2000.sas7bdat.

    ii. allnp2.sas - creates a working data set of EVERYONE age 18+, with recodes, etc. This is supp2000.sas7bdat. These data are used to build supply variables. Takes as input all2000.sas7bdat.

    iii. cell1.sas - creates a big summary of data by cell ==> bigcells.sas7bdat. Takes as input np2000.sas7bdat.

    iv. t1.sas - creates a big summary of data by cell ==> allcells.sas7bdat. Takes as input supp2000.sas7bdat.

    v. supply1.sas - gets supply measures ==> cellsupply.sas7bdat. Takes as input np2000.sas7bdat.

    vi. imm3.sas - gets counts of immigrants by sending country in each city ==> ic_citynew.sas7bdat (IC is Card's classification of sending countries). Takes as input supp2000.sas7bdat.

    vii. imm2.sas - gets a count of immigrants present in 2000 by IC - this is used to construct the instrumental variable ==> byicnew.sas7bdat. Takes as input supp2000.

    viii. inflow3.sas - constructs the supply push instrument by "education and experience cell" and city. This is newflows.sas7bdat. Takes as input ic_city.sas7bdat (output of imm1.sas in 1980) and byicnew.sas7bdat (output of imm2.sas in 2000).

  4. Export some datasets to Stata:
    i. cell1_to_stata - creates datasets on wages of immigrants and natives by education class. Exports them to Stata (2000_bigcells_new1.dta, 2000_bigcells_new2.dta, nw.dta, iw.dta, nw.dta, nw.dta, nw.dta, nw.dta, iw.dta, iw.dta, iw.dta, iw.dta). Takes as input bigcells.sas7bdat.

    ii. t1_to_stata - creates 2000_allcells_new1.dta and 2000_allcells_new2.dta. Takes as input allcells.sas7bdat.

    iii. inflow3_to_stata - exports newflows.sas7bdat to dta.

Replicate Table 6 of Card (2009)

  1. table6.do - replicates Table 6 of Card (2009) and constructs the dataset input_card.dta. Takes as input the Stata datasets exported from SAS (cited above) for 1980, 1990, and 2000.

More comparisons

The main directories for this exercise are:

  1. Card's original code and lst files: here
  2. Our SAS code and lst files: here. Our code is a very slight modification of Card's code to adjust paths and things like that. We run the code using our dataset downloaded from ICPSR instead of Card's original dataset (which we don't have).
  3. Our Stata code: here

For replicating table 6, we need data from 1980-2000, but not from 2005/06. The complete list of scripts needed to replicate Table 6 is below.

What we already know, which I won't repeat at length:

  • Table 2 (Characteristics of Immigrants in 2000): We already discussed this in Issues #9 and #11. Except for the fact that we have two hundred thousand more natives in our sample than Card does, the summary statistics look exactly the same across the 3 exercises (Card's original results, Stata with our dataset, SAS with our dataset).
  • Table 3 (Summary statistics for samples from 1980, 1990, 2000, 2005/06): Also discussed in Issues #9 and #10. As we know, the summary statistics for the first 4 columns look (almost) exactly the same across exercises. We can see very small differences in 1990. What was worrying us more were the differences in the last two columns: the overall variance of log wage and the residual variance of log wage. I will talk about this below.

New developments

The last two columns of Table 3 look different when run in SAS vs. Stata. In SAS, "PROC GLM" is used to first regress log wage on a set of covariates and get the residuals. Then, the variances of the log wage and of the residuals are calculated across MSAs. The original code for 1980 can be found here (starting in line 176). In Stata, instead of using "PROC GLM", I just use "reg". This generates different results.

Note also that these Stata results for the variances look a bit different from the ones I reported in Issue #9. I fixed a couple of things since then, so the results are now closer to Card's.

                |      | Overall               | Residual
                | Year | Card  | SAS   | Stata | Card  | SAS   | Stata
----------------+------+-------+-------+-------+-------+-------+------
Native men      | 1980 | 0.385 | 0.387 | 0.386 | 0.288 | 0.288 | 0.288
                | 1990 | 0.462 | 0.452 | 0.452 | 0.322 | 0.319 | 0.317
                | 2000 | 0.487 | 0.486 | 0.486 | 0.353 | 0.358 | 0.358
Native women    | 1980 | 0.317 | 0.316 | 0.316 | 0.269 | 0.268 | 0.268
                | 1990 | 0.382 | 0.381 | 0.381 | 0.295 | 0.294 | 0.294
                | 2000 | 0.408 | 0.408 | 0.408 | 0.313 | 0.320 | 0.320
Immigrant men   | 1980 | 0.444 | 0.444 | 0.444 | 0.321 | 0.321 | 0.334
                | 1990 | 0.517 | 0.513 | 0.513 | 0.347 | 0.342 | 0.364
                | 2000 | 0.557 | 0.557 | 0.557 | 0.390 | 0.391 | 0.409
Immigrant women | 1980 | 0.343 | 0.343 | 0.343 | 0.291 | 0.291 | 0.296
                | 1990 | 0.414 | 0.413 | 0.413 | 0.318 | 0.317 | 0.330
                | 2000 | 0.484 | 0.484 | 0.484 | 0.367 | 0.369 | 0.380

If I then export the SAS dataset that generated the variances above and use it for generating Table 6, I get a Table 6 that:

  • is different from our previous table (in Issue #9)
  • is even more different from Card's original table
  • agrees between SAS and Stata (indicating that most differences between our results in Stata and SAS were coming from the script that generates the residuals and variances).

The table 6 we get from both SAS and Stata can be found below. The tables were generated by Stata, but the equivalent results in SAS can be found in this link:

  • OLS estimates for High School: The coefficient estimates and R2 agree exactly between SAS and Stata. The standard errors are different.
    • Column 1 regression results start at line 595 (R2 at line 505 and coefficient at line 516)
    • Column 2 regression results start at line 568 (R2 at line 578 and coefficient at line 589)
  • IV estimates for High School:
    • Column 3 regression results start at line 1243.
      • Coefficient estimates don't agree exactly but are very close.
      • Stata R2 is 0.203, SAS R2 is 0.145
      • The 1st stage of Column 3 starts at line 818. Stata 1st stage t-statistic is 5.53, SAS 1st stage t-statistic is 7.87.
------------------------------------------------------------------------------------
                                1               2               3               4   
                             b/se            b/se            b/se            b/se   
------------------------------------------------------------------------------------
Log rel supply imm~e       -0.030***       -0.030***       -0.036***       -0.036***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1980          -0.095***       -0.094***       -0.104***       -0.106***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1990           0.101***        0.100***        0.113***        0.114***
                           (0.00)          (0.00)          (0.00)          (0.00)   
College share 1980          0.099***        0.098***        0.121***        0.124***
                           (0.02)          (0.02)          (0.02)          (0.02)   
College share 1990         -0.007          -0.006          -0.014          -0.016   
                           (0.01)          (0.01)          (0.01)          (0.01)   
Wage res native 1980        0.135***        0.137***        0.141***        0.136***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Wage res imm 1980          -0.160***       -0.162***       -0.169***       -0.164***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Mfg share in 1980          -0.225***       -0.226***       -0.258***       -0.256***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Mfg share in 1990           0.194***        0.195***        0.227***        0.223***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Lagged dep var                              0.004                          -0.009***
                                           (0.00)                          (0.00)   
constant                   -0.128***       -0.127***       -0.159***       -0.160***
                           (0.00)          (0.00)          (0.00)          (0.00)   
------------------------------------------------------------------------------------
r2                          0.210           0.210           0.203           0.203   
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
                                5               6               7               8   
                             b/se            b/se            b/se            b/se   
------------------------------------------------------------------------------------
Log rel supply imm~e       -0.058***       -0.054***       -0.078***       -0.072***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1980          -0.040***       -0.039***       -0.058***       -0.054***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Log msa size 1990           0.031***        0.034***        0.053***        0.052***
                           (0.00)          (0.00)          (0.00)          (0.00)   
College share 1980         -0.055***       -0.114***       -0.003          -0.052** 
                           (0.02)          (0.02)          (0.02)          (0.02)   
College share 1990         -0.022           0.045***       -0.045***        0.005   
                           (0.01)          (0.01)          (0.01)          (0.01)   
Wage res native 1980        0.309***        0.363***        0.338***        0.371***
                           (0.00)          (0.01)          (0.01)          (0.01)   
Wage res imm 1980          -0.224***       -0.287***       -0.248***       -0.288***
                           (0.00)          (0.00)          (0.00)          (0.00)   
Mfg share in 1980          -0.373***       -0.422***       -0.377***       -0.410***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Mfg share in 1990           0.499***        0.546***        0.484***        0.518***
                           (0.01)          (0.01)          (0.01)          (0.01)   
Lagged dep var                              0.137***                        0.095***
                                           (0.00)                          (0.00)   
constant                   -0.061***       -0.085***       -0.139***       -0.143***
                           (0.00)          (0.00)          (0.00)          (0.00)   
------------------------------------------------------------------------------------
r2                          0.386           0.401           0.356           0.379   
------------------------------------------------------------------------------------

Many data checks using the .lst files

When dealing with SAS, we have two important files: the .sas and the .lst files. The .sas files are the scripts. The .lst files store any results that were printed while running the script. Thus, the .lst files save the results from "PROC MEANS", "PROC PRINT", "PROC GLM", etc. Luckily, we have (almost) all of Card's .lst files, so we can check our results after running each script by comparing the .lst files.

Takeaways

  • 1990 is the most different year. I think something changed in this dataset since Card used it. In 1980 and 2000, we have exactly the same number of immigrants as Card does (but more natives). In the 1990 dataset, the numbers of both immigrants and natives differ from Card's. The summary statistics are also a bit different, whereas those for 1980 and 2000 look the same as Card's.

Data checks by year using the .lst files

  • 1980
             Us    Card
    np2      link  link
    allnp2   link  link
    cell1    link  link
    t1       link  link
    supply1  link  link
    imm1     link  link
    indist   link  link
  • 1990
             Us    Card
    np2      link  link
    allnp2   link  link
    cell1    link  link
    t1       link  link
    supply1  link  link
    imm1     link  link
    indist   link  link
  • 2000
             Us    Card
    np2      link  link
    allnp2   link  link
    cell1    link  link
    t1       link  link
    supply1  link  link
    imm3     link  link
    imm2     link  link
    inflow3  link  link

Tables using ICPSR in SAS

Table 2

Looks exactly the same as in Stata. For the Stata version, see #9.

Note that ic = -2 refers to everybody (natives + immigrants), ic = -1 refers to natives, and ic = 0 refers to immigrants. The remaining values of ic map to countries as follows:

1 "mexico"
2 "phillipines"
3 "india"
4 "vietnam"
5 "el salvador"
6 "china"
7 "cuba"
8 "dominican republic"
9 "korea"
10 "jamaica"
11 "canada"
12 "colombia"
13 "guatemala"
14 "germany"
15 "haiti"
16 "poland"

image

Table 3

First 4 columns of Table 3 (I am omitting 2005/2006 because it uses ACS data, and not ICPSR data).
The first 4 columns look exactly the same as the corresponding columns in the Stata table found in #9.

The relevant observations are in rows 10-21.

image

Pre-trends

For each of the plots below, I do the following:

  1. Run one regression for each year (1980, 1990, 2000) and store the coefficient on the instrument. For each regression

    • the dependent variable is the appropriate one (either difference in mean wage residuals for HS-equivalent workers or for College-equivalent workers)
    • the set of controls is the same across all regressions. Note: None of the regressions includes the lagged dependent variable in the set of controls
    • for the "country" regressions, the instrument is the share of immigrants from that country living in location l in 1980
    • for the "aggregate bartik" regressions, the instrument is the appropriate one (either predicted inflow of HS-equivalent workers or of College-equivalent workers)
  2. Plot the coefficients on the instrument.
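The procedure above can be sketched as follows. This is only an illustration on made-up data with a single control: the helper names (`ols`, `instrument_coefficients`), the toy values, and the "true" slopes are all assumptions, and the actual regressions use the full control set.

```python
def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y, via Gauss-Jordan elimination."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry of this column onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

def instrument_coefficients(data_by_year):
    """Run one regression per year and keep only the coefficient on the instrument."""
    coefs = {}
    for year, rows in sorted(data_by_year.items()):
        X = [[1.0, z, x] for z, x, _ in rows]   # constant, instrument, one control
        y = [dy for _, _, dy in rows]           # dependent variable
        coefs[year] = ols(X, y)[1]
    return coefs

# Toy data: the outcome is built with a known slope on the instrument in each year.
z = [0.1, 0.4, 0.2, 0.8, 0.5]   # instrument (hypothetical values)
x = [1.0, 0.0, 2.0, 1.0, 3.0]   # control (hypothetical values)
TRUE_SLOPES = {1980: 0.0, 1990: 0.1, 2000: 0.3}
data_by_year = {
    year: [(zi, xi, 0.5 + slope * zi + 1.0 * xi) for zi, xi in zip(z, x)]
    for year, slope in TRUE_SLOPES.items()
}

coefs = instrument_coefficients(data_by_year)
for year, b in coefs.items():
    print(year, round(b, 3))
```

The resulting dictionary has exactly one point per census year (1980, 1990, 2000), which is what the plots should show on the x-axis.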

Note: The x-axis is showing 5-year intervals, but we only have data for 1980, 1990, and 2000. I will fix that.

High School equivalent workers

image image
image image
image image

College equivalent workers

image image
image image
image image

Data cleaning

1. Defining who is immigrant

Card:
Defines immigrants as people who are naturalized citizens or who are not citizens.

citizen='0=us born 1=nat 2=not cit 3=born abroad us parents'
imm=(citizen in (1,2))

Victoria:
Card's definition plus codes 4 and 5 (note that the codes are numbered differently in this data):

 /* CITIZEN:
           0 n/a
           1 born abroad of american parents
           2 naturalized citizen
           3 not a citizen
           4 not a citizen, but has received first papers
           5 foreign born, citizenship status not reported
*/
* imm = 1 for codes 2-5; all other observations are left missing
gen imm = 1 if inlist(citizen, 2, 3, 4, 5)

2. Hours worked last year

Card:
His data seem to have the exact number of weeks each person worked last year. His code is the following:

annhrs=weeks*hrswkly;

That is, total annual hours = weeks * weekly hours

Victoria:
The Bartik data has multiple bins for number of weeks worked last year.

WKSWORK2:
           0 n/a
           1 1-13 weeks
           2 14-26 weeks
           3 27-39 weeks
           4 40-47 weeks
           5 48-49 weeks
           6 50-52 weeks

Thus, I currently assign each bin (approximately) its midpoint.

gen weeks = .
replace weeks = 0 if wkswork2 == 0
replace weeks = 7 if wkswork2 == 1 
replace weeks = 20 if wkswork2 == 2
replace weeks = 33 if wkswork2 == 3
replace weeks = 43.5 if wkswork2 == 4
replace weeks = 48.5 if wkswork2 == 5
replace weeks = 51.5 if wkswork2 == 6
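As a cross-check on the values above, the sketch below computes the arithmetic midpoint of each WKSWORK2 bin and plugs it into Card's annual-hours formula. The names `bin_midpoint` and `annual_hours` are hypothetical. Note that the arithmetic midpoint of the 50-52 bin is 51, whereas the Stata code above assigns 51.5.

```python
# Lower and upper bounds of each WKSWORK2 bin, taken from the codebook listing above.
WKSWORK2_BINS = {1: (1, 13), 2: (14, 26), 3: (27, 39), 4: (40, 47), 5: (48, 49), 6: (50, 52)}

def bin_midpoint(code):
    """Arithmetic midpoint of a WKSWORK2 bin; code 0 (n/a) is treated as zero weeks."""
    if code == 0:
        return 0.0
    lo, hi = WKSWORK2_BINS[code]
    return (lo + hi) / 2

def annual_hours(wkswork2, hrswkly):
    """Card's formula annhrs = weeks * hrswkly, with weeks approximated by the bin midpoint."""
    return bin_midpoint(wkswork2) * hrswkly

for code in range(7):
    print(code, bin_midpoint(code))
```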

3. Education labels

Card: Census data has exactly one category for each grade and it also has information on whether the person completed the grade.

GRADE             2     40                                                 
                            Highest Year of School                              
                              Attended                                          
                  00        Never attended school or N/A (under 3               
                              years of age)                                     
                  01        Nursery school                                      
                  02        Kindergarten                                        
                            Elementary:                                         
                  03          First grade                                       
                  04          Second grade                                      
                  05          Third grade                                       
                  06          Fourth grade                                      
                  07          Fifth grade                                       
                  08          Sixth grade                                       
                  09          Seventh grade                                     
                  10          Eighth grade                                      
                            High school:                                        
                  11          Ninth grade                                       
                  12          Tenth grade                                       
                  13          Eleventh grade                                    
                  14          Twelfth grade                                     
                            College:                                            
                  15          First year                                        
                  16          Second year                                       
                  17          Third year                                        
                  18          Fourth year                                       
                  19          Fifth year                                        
                  20          Sixth year                                        
                  21          Seventh year                                      
                  22          Eighth year or more

Victoria:
Bartik data has too many categories and the numbers don't really add up:

EDUCD:
           0 n/a or no schooling
           1 n/a
           2 no schooling completed
          10 nursery school to grade 4
          11 nursery school, preschool
          12 kindergarten
          13 grade 1, 2, 3, or 4
          14 grade 1
          15 grade 2
          16 grade 3
          17 grade 4
          20 grade 5, 6, 7, or 8
          21 grade 5 or 6
          22 grade 5
          23 grade 6
          24 grade 7 or 8
          25 grade 7
          26 grade 8
          30 grade 9
          40 grade 10
          50 grade 11
          60 grade 12
          61 12th grade, no diploma
          62 high school graduate or ged
          63 regular high school diploma
          64 ged or alternative credential
          65 some college, but less than 1 year
          70 1 year of college
          71 1 or more years of college credit, no degree
          80 2 years of college
          81 associate's degree, type not specified
          82 associate's degree, occupational program
          83 associate's degree, academic program
          90 3 years of college
         100 4 years of college
         101 bachelor's degree
         110 5+ years of college
         111 6 years of college (6+ in 1960-1970)
         112 7 years of college
         113 8+ years of college
         114 master's degree
         115 professional degree beyond a bachelor's degree
         116 doctoral degree
         999 missing

To see what I mean by "they don't really add up", consider:

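    educational attainment |
        [detailed version] |      Freq.     Percent        Cum.
---------------------------+-----------------------------------
       grade 1, 2, 3, or 4 |     59,078       35.54       35.54
                   grade 1 |      9,674        5.82       41.36
                   grade 2 |     21,079       12.68       54.04
                   grade 3 |     37,034       22.28       76.32
                   grade 4 |     39,371       23.68      100.00
---------------------------+-----------------------------------
                     Total |    166,236      100.00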

4. Income measures

Card:

  • wagesal: Wage or Salary Income (INCOME1 in the 1980 Census variable dictionary)
  • selfinc: Nonfarm Self-Employment Income (INCOME2 in the 1980 Census variable dictionary)
  • farminc: Farm Self-Employment Income (INCOME3 in the 1980 Census variable dictionary)
  • income: Income From All Sources (INCOME8 in the 1980 Census variable dictionary)

Then he defines as self-employed anyone with a positive (selfinc + farminc).

Victoria:

  • inctot: total personal income
  • ftotinc: total family income
  • incwage: wage and salary income
  • incbus00: business and farm income, 2000
  • incearn: total personal earned income

    Variable |        Obs        Mean    Std. Dev.       Min        Max
-------------+---------------------------------------------------------
      inctot | 21,864,217    23201.98    34290.62     -20000    1471000
     ftotinc | 21,864,217    249646.6     1404037     -30000    9999999
     incwage | 21,864,217    19101.37    19101.37          0     641000
    incbus00 | 10,986,023    2195.484    15365.94     -10000     573000
     incearn | 16,810,374    24123.92    35430.46     -19996    1146000

The Bartik dataset has no measure of self-employment earnings, so I will use the class-of-worker variable below to define self-employment instead:

CLASSWKRD:
           0 n/a
          10 self-employed
          11 employer
          12 working on own account
          13 self-employed, not incorporated
          14 self-employed, incorporated
          20 works for wages
          21 works on salary (1920)
          22 wage/salary, private
          23 wage/salary at non-profit
          24 wage/salary, government
          25 federal govt employee
          26 armed forces
          27 state govt employee
          28 local govt employee
          29 unpaid family worker
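A sketch of the two self-employment definitions side by side. Treating CLASSWKRD codes 10-14 as self-employed is an assumption read off the listing above, not something confirmed in the code; the function names are hypothetical.

```python
# Assumption: the self-employment categories in the CLASSWKRD listing are codes 10 through 14.
SELF_EMPLOYED_CODES = {10, 11, 12, 13, 14}

def self_employed_card(selfinc, farminc):
    """Card's rule: self-employed if nonfarm plus farm self-employment income is positive."""
    return (selfinc + farminc) > 0

def self_employed_classwkrd(classwkrd):
    """Class-of-worker rule: self-employed if CLASSWKRD falls in the self-employed codes."""
    return classwkrd in SELF_EMPLOYED_CODES

print(self_employed_card(5000, 0))    # positive self-employment income
print(self_employed_card(-2000, 0))   # business loss: not flagged by the income rule
print(self_employed_classwkrd(13))    # self-employed, not incorporated
print(self_employed_classwkrd(22))    # wage/salary, private
```

The two rules need not agree: someone who reports being self-employed but has zero or negative self-employment income is flagged by the class-of-worker rule and not by the income rule.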

https://www.dropbox.com/s/8jiij8ntdq1lcau/Screenshot%202019-01-16%2014.56.49.png?dl=0

5. Country codes - grouping into 38 groups

Card: The country codes used by Card can be found in Appendix F of the Codebook for the 1980 5% extracts, available from ICPSR.

He groups countries into 38 groups:

mexico
phillip
india
vietnam
el salvador
china
cuba
dominican rep. 
korea
jamaica
canada
columbia
guatemala
germany
haiti 
poland
taiwan
england
italy
ecuador
japan
iran
honduras
peru
russia
nicaragua
guyana
pakistan
hong kong
trinidad-tobago
west europe+isreal+cyprus+auss+nz
east europe incl romania ukraine yugoslav
middle east turkey bulgaria and the stans
asia and oceana
s america + north am nec
africa
caribbean + central am
else 

Somewhat unrelated note: Later on, Card creates even broader categories of countries (e.g., european, high asia, mid asia, mexico). He includes Canada in the european group, and Pakistan and Iran in the high asia group.

Victoria:
Issue: The Bartik dataset doesn't have 15 of the 38 groups used by Card:

el salvador
dominican rep. 
jamaica 
colombia 
guatemala 
haiti
taiwan
ecuador
honduras 
peru
nicaragua
guyana
pakistan
hong kong
trinidad-tobago

For these groups, instead of using a person's place of birth I use whether the person is an immigrant combined with her primary ancestry (using the variable ancestr1). So if a person is an immigrant and her first response for ancestry is "salvadoran", I count her as having been born in El Salvador. This is of course not perfect, since some immigrants were born in a country other than the one their reported ancestry points to.
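A minimal sketch of this fallback rule. The function name, the ancestry labels other than "salvadoran", and the choice to let ancestry take precedence over the birthplace group are all assumptions made for illustration.

```python
# Illustrative subset of the ancestry-to-group mapping. Only "salvadoran" appears in the
# notes above; the other label spellings are hypothetical.
ANCESTRY_TO_GROUP = {
    "salvadoran": "el salvador",
    "haitian": "haiti",
    "jamaican": "jamaica",
}

def country_group(imm, birthplace_group, ancestr1):
    """Assign a Card country group to one person.

    Assumed rule: for immigrants whose primary ancestry maps to one of the groups that
    cannot be identified from birthplace, use the ancestry; otherwise keep the birthplace group.
    """
    if not imm:
        return None
    from_ancestry = ANCESTRY_TO_GROUP.get(ancestr1)
    if from_ancestry is not None:
        return from_ancestry
    return birthplace_group

print(country_group(True, "caribbean + central am", "salvadoran"))
print(country_group(True, "mexico", "mexican"))
print(country_group(True, "mexico", "salvadoran"))  # the imperfect case
print(country_group(False, None, "salvadoran"))
```

The third call shows the caveat: an immigrant born in Mexico who reports Salvadoran ancestry is counted as born in El Salvador under this rule.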

6. Years in the US

Card:
Census data has the immigration year. So the 1980 Census, for example, has a variable that looks like

     IMMIGR            1     26                                                 
                            Year of Immigration                                 
                   0        N/A (born in the United States or                   
                              outlying areas or born abroad of                  
                              American parents)                                 
                   1        1975 to 1980                                        
                   2        1970 to 1974                                        
                   3        1965 to 1969                                        
                   4        1960 to 1964                                        
                   5        1950 to 1959                                        
                   6        Before 1950  

So he approximates how many years the person has been in the U.S. using that variable. This allows him to distinguish between people who have been in the U.S. for 20+ years vs. 40+ years.

if immyr=1 then yrsinus=2.5;
else if immyr=2 then yrsinus=7.5;
else if immyr=3 then yrsinus=12.5;
else if immyr=4 then yrsinus=17.5;
else if immyr=5 then yrsinus=25.5;
else if immyr=6 then yrsinus=40;
else yrsinus=.;

Victoria:
The Bartik data, on the other hand, top-codes this variable at 21+ years, so we don't have the same level of granularity and I am not sure how many years to assign to that bin. Note: For each observation we have the person's date of birth, so maybe we can use that to approximate the number of years in the U.S.

Right now, I am using 30 years for anyone who has been in the U.S. for 21+ years.

YRSUSA2:
           0 n/a
           1 0-5 years
           2 6-10 years
           3 11-15 years
           4 16-20 years
           5 21+ years
           9 missing
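A sketch of the current rule and of one possible way to use age for the top bin. Only the value 30 for the 21+ bin comes from the notes above; the midpoints for bins 1-4 and the age-based alternative (midpoint between 21 and the person's age) are assumptions, and the function names are hypothetical.

```python
# Years assigned to each YRSUSA2 bin. Bins 1-4 use bin midpoints (an assumption);
# bin 5 uses the 30 years mentioned above.
YRSUSA2_YEARS = {1: 2.5, 2: 8.0, 3: 13.0, 4: 18.0, 5: 30.0}

def yrsinus(yrsusa2):
    """Approximate years in the U.S.; None for n/a (0) and missing (9)."""
    return YRSUSA2_YEARS.get(yrsusa2)

def yrsinus_age_bounded(yrsusa2, age):
    """Hypothetical alternative for the 21+ bin.

    A person in that bin has been in the U.S. between 21 years and their whole life,
    so this takes the midpoint of 21 and age.
    """
    if yrsusa2 == 5 and age is not None and age >= 21:
        return (21 + age) / 2
    return yrsinus(yrsusa2)

print(yrsinus(5))
print(yrsinus_age_bounded(5, 61))
print(yrsinus_age_bounded(5, 25))
```

Under the age-based variant a 25-year-old in the 21+ bin gets 23 years rather than 30, which avoids assigning more years in the U.S. than the person has been alive.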

CZs not a unique identifier

Today in the meeting we saw that I should only be merging files that have one observation per CZ. However, some files seem to have, by construction, more than one observation per CZ.

Here is a simple example.

The cell1 script in the 2000 folder collapses the data by the variables: rczone, native, male, eclass and xclass2. The resulting dataset is called bigcells.dta

In the table6 script, we load the bigcells dataset and for each value that eclass takes, we keep only the male and native observations. However, rczone is not yet a unique identifier since there are 4 values for xclass2 for each obs that is male, native, and in a specific eclass.

xclass2 is an experience variable and eclass is an education variable. Each of them has 4 categories.

Variable definitions in Card's code:

if educ<12 then eclass=1;
else if educ=12 then eclass=2;
else if educ<16 then eclass=3;
else eclass=4;
if exp<=10 then xclass2=1;
else if exp<=20 then xclass2=2;
else if exp<=30 then xclass2=3;
else xclass2=4;
c=1;

Below are snippets of the code that saves bigcells and also the ones that use bigcells

Script cell1 saves bigcells

SAS code

proc summary;
class rmsa native male eclass xclass2;
var logwage2 lw2sq res ressq pred predsq respred imm female educ exp c
         dropout hs somecoll collplus college advanced    ;
output out=here.bigcells
mean=
sum(c)=count;
weight wt;

Script table6 uses bigcells

SAS code

*this macro gets native wages by eclass;
%macro nwage(ed);
%let edg=&ed;
data nw&edg;
set here.bigcells;
if native=1 and male=1 and eclass=&edg and xclass2=. ;
nwage&edg=logwage2;
nres&edg=res;
npred&edg=pred;
ncountw&edg=count / 1000;
keep rmsa nwage&edg npred&edg nres&edg ncountw&edg;
proc sort; by rmsa;
%mend;

Stata code

use data/2000/bigcells.dta, clear
local education_groups 1 2 3 4 
forv i=1/4{
	local edg : word `i' of `education_groups'
	preserve
	keep if native == 1 & male == 1 & eclass == `edg' 
	gen nwage`edg' = logwage2
	gen nres`edg' = res
	gen npred`edg' = pred
	gen ncountw`edg' = count/1000
	keep rczone nwage`edg' npred`edg' nres`edg' ncountw`edg' 
	sort rczone
	save data/2000/nw`edg'.dta, replace
	restore
}
           |                   eclass
   xclass2 |         1          2          3          4 |     Total
-----------+--------------------------------------------+----------
         1 |   175,066    464,765    295,094    292,924 | 1,227,849 
         2 |   155,468    529,215    302,775    315,280 | 1,302,738 
         3 |   152,129    526,231    283,651    309,079 | 1,271,090 
         4 |   153,342    442,467    169,002    166,318 |   931,129 
-----------+--------------------------------------------+----------
     Total |   636,005  1,962,678  1,050,522  1,083,601 | 4,732,806 
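A possible explanation, to be checked against the SAS output: PROC SUMMARY with a CLASS statement and no NWAY option also writes marginal rows, in which the class variables that were collapsed over are set to missing. The condition `xclass2=.` in the macro would then pick the rows already aggregated over experience, which gives one row per rmsa, whereas the Stata loop above has no counterpart for that condition. The sketch below checks whether a key is unique and collapses the cells over xclass2 with count-weighted means. The function names and the toy cells are made up for illustration.

```python
from collections import Counter, defaultdict

def duplicate_keys(rows, key="rczone"):
    """Return {key value: number of rows} for key values that appear more than once."""
    counts = Counter(row[key] for row in rows)
    return {value: n for value, n in counts.items() if n > 1}

def collapse_over_xclass2(rows, value_vars=("logwage2",)):
    """Collapse cells to one row per rczone: counts are summed, means are count-weighted."""
    sums = defaultdict(lambda: defaultdict(float))
    for row in rows:
        cz = sums[row["rczone"]]
        cz["count"] += row["count"]
        for var in value_vars:
            cz[var] += row["count"] * row[var]
    collapsed = []
    for rczone in sorted(sums):
        cz = sums[rczone]
        out = {"rczone": rczone, "count": cz["count"]}
        for var in value_vars:
            out[var] = cz[var] / cz["count"]
        collapsed.append(out)
    return collapsed

# Toy cells (made-up numbers) for native == 1, male == 1 and a single eclass.
cells = [
    {"rczone": 100, "xclass2": 1, "count": 1.0, "logwage2": 2.0},
    {"rczone": 100, "xclass2": 2, "count": 1.0, "logwage2": 3.0},
    {"rczone": 100, "xclass2": 3, "count": 2.0, "logwage2": 4.0},
    {"rczone": 200, "xclass2": 1, "count": 3.0, "logwage2": 1.0},
]

print(duplicate_keys(cells))        # rczone is not unique before collapsing
one_per_cz = collapse_over_xclass2(cells)
print(duplicate_keys(one_per_cz))   # empty once there is one row per rczone
print(one_per_cz)
```

Because count is the weighted number of observations in each cell, the count-weighted mean of the cell means equals the weighted mean over all observations in the CZ, which is what a marginal row would contain.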

Data checks

First attempt at replicating Table 1

                        |  Working age  Share of US        Pct       Pct       Pct      Pct  Pct high  Pct some  Pct college    Mean
                        |   population   population  immigrant  Hispanic  minority  dropout    school   college      or more   wage2
------------------------+-----------------------------------------------------------------------------------------------------------
All US                  |       206238        100.1         12        11        25       15        42        22           22   66.73
Larger czones (top 100) |       115544         56.1         17        14        32       14        38        22           26  66.355
Rest of country         |        90694           44        4.8       6.9        16       15        46        22           17  67.209
1st largest czone       |         8820          4.3         41        39        58       22        33        22           23   78.18
2nd largest czone       |         6544          3.2         37        22        51       16        36        19           30  85.555
3rd largest czone       |         4510          2.2         23        17        41       14        35        21           30  66.851
4th largest czone       |         3245          1.6         29        17        39       11        37        18           34   69.28
5th largest czone       |         3021          1.5        8.8       5.9        29       11        42        19           28  67.813
6th largest czone       |         2989          1.5        8.2       2.9        26       11        41        24           24  67.921
7th largest czone       |         2906          1.4         16       6.4        16      9.1        34        20           37   62.88
8th largest czone       |         2821          1.4         22       9.3        42       10        30        19           42  58.961
9th largest czone       |         2657          1.3         33        17        46       11        28        24           37  69.416
10th largest czone      |         2643          1.3         25        27        49       20        36        20           24   71.92
11th largest czone      |         2527          1.2         13       6.7        39       13        36        21           30  61.567
12th largest czone      |         2238          1.1         13         5        17      8.1        35        27           30  62.136
