paulgp/immigration_enclave's Issues
Regressions
Just to let you know: I don't have the regressions yet, because something strange happens in the regressions that produce the residuals.
If I do:
use data/1980/nm.dta, clear
sort eclass xclass nonmover
by eclass xclass: reg logwage2 exp exp2 exp3 educ eclass#xclass inschool advanced ft lowhrs hisp_ed hisp_coll black_ed black_coll asian_ed asian_coll nonmover#eclass rczone0 [fweight=wt]
then the display window says:
-> eclass = 2, xclass = 1
no observations
(the same "no observations" message repeats for eclass = 2, xclass = 2 through 9)
But if I tab those variables, I get:
| xclass
eclass | 1 2 3 4 5 6 7 8 9 | Total
-----------+---------------------------------------------------------------------------------------------------+----------
1 | 74,627 50,389 38,034 38,223 39,695 40,478 47,688 57,422 51,186 | 437,742
2 | 135,282 105,828 82,566 70,376 59,073 50,111 45,000 41,862 28,709 | 618,807
3 | 126,470 118,687 82,837 50,836 38,542 33,795 30,174 22,995 12,400 | 516,736
4 | 23,032 34,213 26,976 18,423 14,979 12,798 9,959 6,420 3,265 | 150,065
-----------+---------------------------------------------------------------------------------------------------+----------
Total | 359,411 309,117 230,413 177,858 152,289 137,182 132,821 128,699 95,560 | 1,723,350
So I don't know what's going on, and the residual variances come out very low (around 0.1 instead of roughly 0.3).
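A common cause of a by-group "no observations" message is that some variable in the regression is missing for every row of the cell, so the cell is non-empty in `tab` but empty once Stata drops incomplete rows. A minimal pandas sketch of that check (toy data; the real check would run over every variable in the `reg` line above):

```python
import pandas as pd

# Toy check: for each (eclass, xclass) cell, count rows where every regression
# variable is non-missing. "no observations" usually means this count is zero
# even though the raw cell count shown by -tab- is not.
df = pd.DataFrame({
    "eclass":   [1, 1, 2, 2],
    "xclass":   [1, 1, 1, 1],
    "logwage2": [1.0, 2.0, None, None],   # toy data: missing for eclass == 2
    "exp":      [5, 6, 7, 8],
})
reg_vars = ["logwage2", "exp"]
complete = df.dropna(subset=reg_vars).groupby(["eclass", "xclass"]).size()
```

In Stata itself, something along the lines of `egen nmiss = rowmiss(...)` over the regression variables followed by `tab eclass xclass if nmiss == 0` should show which variable empties the eclass = 2 cells.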
Tables using ICPSR in Stata
Table 2
Looks exactly the same as Card's.
Country of origin | Working age population (thousands) | Share of all Immigrants (percent) | After 1980 | After 1990 | Mean years completed | Dropouts | 12-15 years | College or more |
---|---|---|---|---|---|---|---|---|
Natives | 141,475 | | | | 13.3 | 14.2 | 60.6 | 25.2 |
Immigrants | 23,627 | 100 | 70.5 | 39.9 | 11.6 | 37.4 | 38.8 | 23.8 |
Mexico | 7,267 | 30.8 | 75.1 | 43.8 | 8.6 | 69.8 | 26.5 | 3.7 |
Philippines | 1,078 | 4.6 | 66.1 | 31.5 | 14.1 | 9.2 | 43.7 | 47 |
India | 838 | 3.5 | 78.4 | 51.4 | 15.6 | 9.6 | 20.2 | 70.2 |
Vietnam | 806 | 3.4 | 75.3 | 39.7 | 11.7 | 34.6 | 45.8 | 19.6 |
China | 715 | 3 | 82 | 50.1 | 13.6 | 24.2 | 29.2 | 46.7 |
El Salvador | 698 | 3 | 85.1 | 37 | 8.9 | 65 | 30.6 | 4.4 |
Korea | 664 | 2.8 | 66.4 | 33.1 | 14 | 10.6 | 45.8 | 43.6 |
Cuba | 586 | 2.5 | 52.3 | 29.1 | 12.5 | 30 | 48.3 | 21.7 |
Dominican Republic | 536 | 2.3 | 74.2 | 38.1 | 10.8 | 48.8 | 41.9 | 9.3 |
Canada | 517 | 2.2 | 47.6 | 31.9 | 14.3 | 8.9 | 49.8 | 41.3 |
Germany | 455 | 1.9 | 32.6 | 21 | 13.9 | 8.3 | 59.3 | 32.4 |
Jamaica | 429 | 1.8 | 66.7 | 27.3 | 12.6 | 23.8 | 57.8 | 18.4 |
Colombia | 400 | 1.7 | 71.9 | 40.5 | 12.5 | 24.7 | 53.3 | 21.9 |
Guatemala | 400 | 1.7 | 84 | 45.9 | 8.8 | 64.5 | 30.4 | 5.1 |
Haiti | 333 | 1.4 | 75.1 | 34.5 | 11.8 | 35.2 | 51.3 | 13.5 |
Poland | 310 | 1.3 | 74.5 | 42.3 | 13.3 | 16.3 | 58.2 | 25.6 |
Table 3
Imm status/gender | Year | Education | Experience | Employment rate (%) | Mean wage | Overall variance (log wage) | Residual |
---|---|---|---|---|---|---|---|
Native men | 1980 | 12.5 | 18.8 | 90.2 | 25.07 | 0.379 | 0.283 |
 | 1990 | 13.0 | 18.9 | 89.3 | 23.90 | 0.452 | 0.319 |
 | 2000 | 13.2 | 20.4 | 86.8 | 25.84 | 0.486 | 0.358 |
Native women | 1980 | 12.2 | 19.7 | 65.4 | 16.73 | 0.315 | 0.267 |
 | 1990 | 12.8 | 19.4 | 74.9 | 17.07 | 0.381 | 0.294 |
 | 2000 | 13.3 | 20.7 | 77.1 | 19.52 | 0.408 | 0.320 |
Immigrant men | 1980 | 11.6 | 19.1 | 87.5 | 24.49 | 0.435 | 0.327 |
 | 1990 | 11.4 | 18.1 | 87.1 | 21.83 | 0.513 | 0.370 |
 | 2000 | 11.6 | 18.8 | 86.5 | 23.21 | 0.557 | 0.409 |
Immigrant women | 1980 | 11.0 | 20.6 | 60.0 | 17.15 | 0.342 | 0.295 |
 | 1990 | 11.2 | 19.9 | 65.1 | 16.96 | 0.413 | 0.331 |
 | 2000 | 11.7 | 20.0 | 64.8 | 19.27 | 0.484 | 0.381 |
Table 6
Columns (1)-(4) are still a bit strange, especially because the R2 is lower and the sign for the lagged dependent variable is negative (instead of positive as in Card's paper). Columns (5)-(8) are pretty close.
------------------------------------------------------------------------------------
1 2 3 4
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.020*** -0.020*** -0.025*** -0.026***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.076*** -0.077*** -0.083*** -0.086***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.077*** 0.078*** 0.086*** 0.089***
(0.00) (0.00) (0.00) (0.00)
College share 1980 0.078*** 0.083*** 0.091*** 0.102***
(0.02) (0.02) (0.02) (0.02)
College share 1990 -0.044*** -0.050*** -0.048*** -0.062***
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.139*** 0.134*** 0.137*** 0.125***
(0.00) (0.00) (0.00) (0.01)
Wage res imm 1980 -0.166*** -0.160*** -0.166*** -0.152***
(0.00) (0.00) (0.00) (0.00)
Mfg share in 1980 -0.049*** -0.038*** -0.080*** -0.057***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 -0.055*** -0.070*** -0.023 -0.056***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var -0.012*** -0.026***
(0.00) (0.00)
constant -0.054*** -0.053*** -0.078*** -0.079***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.145 0.145 0.140 0.139
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
5 6 7 8
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.055*** -0.048*** -0.076*** -0.066***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.021*** -0.023*** -0.039*** -0.036***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.011*** 0.020*** 0.035*** 0.035***
(0.00) (0.00) (0.00) (0.00)
College share 1980 -0.177*** -0.176*** -0.129*** -0.141***
(0.02) (0.02) (0.02) (0.02)
College share 1990 0.047*** 0.085*** 0.025 0.059***
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.214*** 0.323*** 0.214*** 0.295***
(0.01) (0.01) (0.01) (0.01)
Wage res imm 1980 -0.132*** -0.252*** -0.125*** -0.217***
(0.00) (0.00) (0.00) (0.01)
Mfg share in 1980 -0.367*** -0.410*** -0.388*** -0.415***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 0.495*** 0.540*** 0.493*** 0.527***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var 0.183*** 0.137***
(0.00) (0.00)
constant -0.048*** -0.081*** -0.132*** -0.135***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.344 0.373 0.313 0.355
------------------------------------------------------------------------------------
Over ID Tests
Hi Isaac,
In the meeting we talked about the Over ID test.
Could you please confirm that our Over ID test would look like the below?
ivregress 2sls `y' `controls' (`x'= shric*) [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"
where `y` is the difference in wages, the `controls` are the same as Card's controls, `x` is the relative share of natives and immigrants, and `shric*` are the shares of immigrants fixed in 1980 (for all countries by city).
If so, the Over ID test does not reject the null (p = 0.2204).
This is for High School workers.
Since you did not write the code, I am also copying below the equivalent piece of code for the canonical Bartik and the ADH example.
Canonical:
ivregress 2sls `y' `controls' czone_* year_* (`x'= t1990_init_sh_ind_* t2000_init_sh_ind_* ) [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"
ADH:
ivregress 2sls `y' `controls' i.t2 (`x'= t1990_sh_ind_2011-t1990_sh_ind_3931 t2000_sh_ind_2011-t2000_sh_ind_3931) [aweight=`weight'], vce(robust)
estat overid, forceweights
local J_2sls = string(r(score), "%12.2f")
local Jp_2sls = "[" + string(r(p_score), "%12.2f") + "]"
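For intuition on what `estat overid` is testing: with more instruments than endogenous regressors, the 2SLS residuals should be (approximately) uncorrelated with the instruments. Below is a minimal numpy sketch of the classic Sargan version on synthetic data; note that with `vce(robust)` Stata reports a score test rather than this statistic, so the sketch is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
Z = rng.standard_normal((n, 3))                 # 3 instruments, 1 regressor
x = Z @ np.array([1.0, 0.5, 0.2]) + rng.standard_normal(n)
y = 2.0 * x + rng.standard_normal(n)            # instruments valid here

X = np.column_stack([np.ones(n), x])
Zf = np.column_stack([np.ones(n), Z])
Pz = Zf @ np.linalg.solve(Zf.T @ Zf, Zf.T)      # projection onto instruments
beta = np.linalg.solve(X.T @ Pz @ X, X.T @ Pz @ y)   # 2SLS estimate
u = y - X @ beta
J = n * (u @ Pz @ u) / (u @ u)                  # Sargan J, ~ chi2(3 - 1)
```

With valid instruments, as here, J should be small relative to a chi-squared with (number of overidentifying restrictions) degrees of freedom, which is why a large p-value means "does not reject".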
Bartik weights
Hi @econisaac,
Could you please take a look at this and make sure it makes sense/agrees with what we talked about in the meeting on Tuesday?
Thanks,
Victoria
Dataset
To calculate the Bartik weights, I created a dataset of the form:
- 124 observations (one for each MSA).
- 130 variables:
  - MSA identifier (1 variable): `rmsa`.
  - Dependent variables (2 variables):
    - `resgap2`: for high school-equivalent workers, the difference between the mean wage residuals of HS-equivalent immigrant and native workers in each MSA in 2000.
    - `resgap4`: for college-equivalent workers, the difference between the mean wage residuals of college-equivalent immigrant and native workers in each MSA in 2000.
  - Instruments (38 variables): `shric1`-`shric38`, one variable for each of the 38 countries. Each is the fraction of earlier immigrants from country k who lived in location l in 1980. For example, the variable `shric1` takes on 124 distinct values, giving us the share of immigrants from Mexico that lived in each of the top 124 MSAs in 1980. To agree with Card's instrument, we divide the variables by the city population in 2000, so the values of `shric1`-`shric38` in each row are divided by the corresponding MSA's population in 2000.
  - Regression controls (12 variables):
    - `logsize80`: log city size in 1980
    - `logsize90`: log city size in 1990
    - `coll80`: share of the MSA population with college in 1980
    - `coll90`: share of the MSA population with college in 1990
    - `nres80`: mean wage residuals for natives living in the MSA in 1980
    - `ires80`: mean wage residuals for immigrants living in the MSA in 1980
    - `nres90`: mean wage residuals for natives living in the MSA in 1990
    - `ires90`: mean wage residuals for immigrants living in the MSA in 1990
    - `mfg80`: share of MSA workers in manufacturing in 1980
    - `mfg90`: share of MSA workers in manufacturing in 1990
    - `resgap902`: lagged dependent variable for high school-equivalent workers (currently not included in the list of controls)
    - `resgap904`: lagged dependent variable for college-equivalent workers (currently not included in the list of controls)
  - "Growth rates" (76 variables):
    - `hs_imm_ic1`-`hs_imm_ic38` (38 variables): number of HS-equivalent immigrant workers who arrived in the US between 1990 and 2000, one variable for each country ic1-ic38. These variables are constant across MSAs, so their values are repeated for all 124 observations.
    - `coll_imm_ic1`-`coll_imm_ic38` (38 variables): number of college-equivalent immigrant workers who arrived in the US between 1990 and 2000, one variable for each country ic1-ic38. These variables are constant across MSAs, so their values are repeated for all 124 observations.
  - Regression weight (1 variable): `count90`, the MSA population in 1990.
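A sketch of the `shric` construction described above, with hypothetical input names (`imm80` and `pop2000` are stand-ins, not the actual intermediate files): each country's 1980 count in an MSA is divided by that country's national total, then by the MSA's 2000 population, as in Card's instrument.

```python
import pandas as pd

imm80 = pd.DataFrame({              # 1980 immigrant counts by country x MSA (toy)
    "rmsa": [1, 1, 2, 2],
    "country": ["mexico", "india", "mexico", "india"],
    "n_imm_1980": [100, 50, 300, 50],
})
pop2000 = {1: 1000.0, 2: 2000.0}    # MSA population in 2000 (toy)

# fraction of each country's 1980 immigrants living in each MSA...
imm80["sh"] = imm80["n_imm_1980"] / imm80.groupby("country")["n_imm_1980"].transform("sum")
# ...divided by the MSA's 2000 population
imm80["shric"] = imm80["sh"] / imm80["rmsa"].map(pop2000)
```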
Code for weights
I am going to have two scripts, one for the high school-equivalent workers and one for the college-equivalent workers. I am copying below the code for HS-equivalent workers.
set seed 12345
use data/prepared_bartik.dta, clear
local controls logsize80 logsize90 coll80 coll90 ires80 nres80 mfg80 mfg90
local weight count90
local y resgap2
local x relshs
*local z shric*
local ind_stub shric*
local growth_stub hs_imm_ic*
* local time_var year
local cluster_var rmsa
* `ind_stub' already contains the wildcard (shric*), so no extra * is appended
foreach ind_var of varlist `ind_stub' {
    replace `ind_var' = `ind_var' * 100
}
forvalues k = 1(1)38 {
    egen agg_sh_ind_`k' = rowtotal(shric`k')
}
bartik_weight, z(`ind_stub') weightstub(`growth_stub') x(`x') y(`y') controls(`controls') weight_var(`weight')
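For reference, my understanding of the decomposition that `bartik_weight` reports (the Goldsmith-Pinkham, Sorkin, and Swift result) is that the Bartik IV estimate equals a weighted combination of the just-identified estimates from each country share, with Rotemberg weights that sum to one. A numpy sketch on toy data (no controls; in the real exercise the variables would first be residualized on the controls):

```python
import numpy as np

rng = np.random.default_rng(0)
L, K = 124, 5                        # MSAs and a toy number of countries
Z = rng.random((L, K))               # 1980 country shares by MSA
g = rng.random(K)                    # national growth rates
x = Z @ g + 0.1 * rng.standard_normal(L)     # endogenous regressor
y = 0.5 * x + 0.1 * rng.standard_normal(L)

# demean to mimic regressions with a constant (controls omitted in this toy)
Zp = Z - Z.mean(axis=0)
xp, yp = x - x.mean(), y - y.mean()

B = Zp @ g                                   # Bartik instrument (demeaned)
beta_bartik = (B @ yp) / (B @ xp)            # just-identified IV using B

beta_k = (Zp.T @ yp) / (Zp.T @ xp)           # IV using each share alone
alpha = g * (Zp.T @ xp)                      # Rotemberg weights...
alpha = alpha / alpha.sum()                  # ...normalized to sum to one
```

The identity `sum_k alpha_k * beta_k = beta_bartik` holds exactly, which is what makes the weights useful for diagnosing which countries drive the estimate.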
Data construction for Card
1980
- `read80.do` - reads the state-specific files of the 1980 5% extracts (available from ICPSR), does minimal data cleaning, and merges all state-specific files. The output is `all80.dta`. Takes as input:
  i. Census of Population and Housing, 1980 [United States]: Public Use Microdata Sample (A Sample): 5-Percent Sample (ICPSR 8101). Download it here.
- `read_all80.sas`, which creates `all80.sas7bdat`. Takes as input `all80.dta`.
- Run the scripts provided by Card.
  i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np80.sas7bdat`. These data are used to build wage outcomes. Takes as input `all80.sas7bdat`. Reads the code in `smsarecode80.sas` to re-code MSAs.
  ii. `allnp2.sas` - creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp80.sas7bdat`. These data are used to build supply variables. Takes as input `all80.sas7bdat`. Reads the code in `smsarecode80.sas` to re-code MSAs.
  iii. `cell1.sas` - creates a big summary of data by cell ==> `bigcells.sas7bdat`. Takes as input `np80.sas7bdat`.
  iv. `t1.sas` - creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp80.sas7bdat`.
  v. `supply1.sas` - gets supply measures ==> `cellsupply.sas7bdat`. Takes as input `np80.sas7bdat`.
  vi. `imm1.sas` - gets counts of immigrants by sending country in each city ==> `ic_city.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp80.sas7bdat`.
  vii. `indist.sas` - gets the fraction of workers in MFG by city. Takes as input `np80.sas7bdat`.
- Export some datasets to Stata:
  i. `cell1_to_stata.sas` - creates datasets on wages of immigrants and natives by education class and exports them to Stata (`1980_bigcells_new1.dta`, `1980_bigcells_new2.dta`, `nw80.dta`, `iw80.dta`, `nw801.dta`, `nw802.dta`, `nw803.dta`, `nw804.dta`, `iw801.dta`, `iw802.dta`, `iw803.dta`, `iw804.dta`). Takes as input `bigcells.sas7bdat`.
  ii. `t1_to_stata.sas` - creates `1980_allcells_new2.dta`. Takes as input `allcells.sas7bdat`.
  iii. `indist_to_stata.sas` - creates `1980_mfg.dta`. Takes as input `mfg.sas7bdat`.
1990
- `read90.do` - reads the state-specific files of the 1990 5% extracts (available from ICPSR), does minimal data cleaning, and merges all state-specific files. The output is `all90.dta`. Takes as input:
  i. Census of Population and Housing, 1990 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 9952). Download it here.
- `read_all90.sas`, which creates `all90.sas7bdat`. Takes as input `all90.dta`.
- Run the scripts provided by Card.
  i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np90.sas7bdat`. These data are used to build wage outcomes. Takes as input `all90.sas7bdat`. Reads the code in `smsarecode90.sas` to re-code MSAs.
  ii. `allnp2.sas` - creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp90.sas7bdat`. These data are used to build supply variables. Takes as input `all90.sas7bdat`. Reads the code in `smsarecode90.sas` to re-code MSAs.
  iii. `cell1.sas` - creates a big summary of data by cell ==> `bigcells.sas7bdat`. Takes as input `np90.sas7bdat`.
  iv. `t1.sas` - creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp90.sas7bdat`.
  v. `supply1.sas` - gets supply measures ==> `cellsupply.sas7bdat`. Takes as input `np90.sas7bdat`.
  vi. `imm1.sas` - gets counts of immigrants by sending country in each city ==> `ic_city.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp90.sas7bdat`.
  vii. `indist.sas` - gets the fraction of workers in MFG by city. Takes as input `np90.sas7bdat`.
- Export some datasets to Stata:
  i. `cell1_to_stata.sas` - creates datasets on wages of immigrants and natives by education class and exports them to Stata (`1990_bigcells_new1.dta`, `1990_bigcells_new2.dta`, `nw90.dta`, `iw90.dta`, `nw901.dta`, `nw902.dta`, `nw903.dta`, `nw904.dta`, `iw901.dta`, `iw902.dta`, `iw903.dta`, `iw904.dta`). Takes as input `bigcells.sas7bdat`.
  ii. `t1_to_stata.sas` - creates `1990_allcells_new2.dta`. Takes as input `allcells.sas7bdat`.
  iii. `indist_to_stata.sas` - creates `1990_mfg.dta`. Takes as input `mfg.sas7bdat`.
2000
- `read2000.do` - reads the state-specific files of the 2000 5% extracts (available from ICPSR), does minimal data cleaning, and merges all state-specific files. The output is `all2000.dta`. Takes as input:
  i. Census of Population and Housing, 2000 [United States]: Public Use Microdata Sample: 5-Percent Sample (ICPSR 13568). Download it here.
- `read_all2000.sas`, which creates `all2000.sas7bdat`. Takes as input `all2000.dta`.
- Run the scripts provided by Card.
  i. `np2.sas` - creates a working data set of wage-earners age 18+, with recodes, etc. This is `np2000.sas7bdat`. These data are used to build wage outcomes. Takes as input `all2000.sas7bdat`.
  ii. `allnp2.sas` - creates a working data set of EVERYONE age 18+, with recodes, etc. This is `supp2000.sas7bdat`. These data are used to build supply variables. Takes as input `all2000.sas7bdat`.
  iii. `cell1.sas` - creates a big summary of data by cell ==> `bigcells.sas7bdat`. Takes as input `np2000.sas7bdat`.
  iv. `t1.sas` - creates a big summary of data by cell ==> `allcells.sas7bdat`. Takes as input `supp2000.sas7bdat`.
  v. `supply1.sas` - gets supply measures ==> `cellsupply.sas7bdat`. Takes as input `np2000.sas7bdat`.
  vi. `imm3.sas` - gets counts of immigrants by sending country in each city ==> `ic_citynew.sas7bdat` (IC is Card's classification of sending countries). Takes as input `supp2000.sas7bdat`.
  vii. `imm2.sas` - gets a count of immigrants present in 2000 by IC; this is used to construct the instrumental variable ==> `byicnew.sas7bdat`. Takes as input `supp2000.sas7bdat`.
  viii. `inflow3.sas` - constructs the supply-push instrument by "education and experience cell" and city. This is `newflows.sas7bdat`. Takes as input `ic_city.sas7bdat` (output of `imm1.sas` in 1980) and `byicnew.sas7bdat` (output of `imm2.sas` in 2000).
- Export some datasets to Stata:
  i. `cell1_to_stata` - creates datasets on wages of immigrants and natives by education class and exports them to Stata (`2000_bigcells_new1.dta`, `2000_bigcells_new2.dta`, `nw.dta`, `iw.dta`, `nw.dta`, `nw.dta`, `nw.dta`, `nw.dta`, `iw.dta`, `iw.dta`, `iw.dta`, `iw.dta`). Takes as input `bigcells.sas7bdat`.
  ii. `t1_to_stata` - creates `2000_allcells_new1.dta` and `2000_allcells_new2.dta`. Takes as input `allcells.sas7bdat`.
  iii. `inflow3_to_stata` - exports `newflows.sas7bdat` to dta.
Replicate Table 6 of Card (2009)
`table6.do` - replicates Table 6 of Card (2009) and constructs the dataset `input_card.dta`. Takes as input the Stata datasets exported from SAS (cited above) for 1980, 1990, and 2000.
More comparisons
The main directories for this exercise are:
- Card's original code and lst files: here
- Our SAS code and lst files: here. Our code is a very slight modification of Card's code to adjust paths and things like that. We run the code using our dataset downloaded from ICPSR instead of Card's original dataset (which we don't have).
- Our Stata code: here
For replicating table 6, we need data from 1980-2000, but not from 2005/06. The complete list of scripts needed to replicate Table 6 is below.
- For 1980: the list of scripts can be found in Card's README.
- For 1990: the list of scripts can be found in Card's README
- For 2000: the list of scripts can be found in Card's README
What we already know, which I won't repeat at length:
- Table 2 (Characteristics of Immigrants in 2000): We already discussed this in Issues #9 and #11. Except for the fact that we have two hundred thousand more natives in our sample than Card does, the summary statistics look exactly the same across the 3 exercises (Card's original results, Stata with our dataset, SAS with our dataset).
- Table 3 (Summary statistics for samples from 1980, 1990, 2000, 2005/06): Also discussed in Issues #9 and #10. As we know, the summary statistics in the first 4 columns look (almost) exactly the same across exercises; we can see very small differences in 1990. What was worrying us more were the differences in the last two columns: the overall variance of log wage and the residual variance of log wage. I will talk about this below.
New developments
The last two columns of Table 3 look different if run in SAS vs. Stata. In SAS, `PROC GLM` is used to first regress log wage on a set of covariates and obtain the residual; then the variance of the log wage and of the residuals is calculated across MSAs. The original code for 1980 can be found here (starting at line 176). In Stata, instead of `PROC GLM`, I just use `reg`. This generates different results.
Note also that these Stata results for the variances look a bit different from the ones I reported in Issue #9. I fixed a couple of things since then, so the results are now closer to Card's.
Imm status/gender | Year | Overall (Card) | Overall (SAS) | Overall (Stata) | Residual (Card) | Residual (SAS) | Residual (Stata) |
---|---|---|---|---|---|---|---|
Native men | 1980 | 0.385 | 0.387 | 0.386 | 0.288 | 0.288 | 0.288 |
 | 1990 | 0.462 | 0.452 | 0.452 | 0.322 | 0.319 | 0.317 |
 | 2000 | 0.487 | 0.486 | 0.486 | 0.353 | 0.358 | 0.358 |
Native women | 1980 | 0.317 | 0.316 | 0.316 | 0.269 | 0.268 | 0.268 |
 | 1990 | 0.382 | 0.381 | 0.381 | 0.295 | 0.294 | 0.294 |
 | 2000 | 0.408 | 0.408 | 0.408 | 0.313 | 0.320 | 0.320 |
Immigrant men | 1980 | 0.444 | 0.444 | 0.444 | 0.321 | 0.321 | 0.334 |
 | 1990 | 0.517 | 0.513 | 0.513 | 0.347 | 0.342 | 0.364 |
 | 2000 | 0.557 | 0.557 | 0.557 | 0.390 | 0.391 | 0.409 |
Immigrant women | 1980 | 0.343 | 0.343 | 0.343 | 0.291 | 0.291 | 0.296 |
 | 1990 | 0.414 | 0.413 | 0.413 | 0.318 | 0.317 | 0.330 |
 | 2000 | 0.484 | 0.484 | 0.484 | 0.367 | 0.369 | 0.380 |
If I then export the SAS dataset that generated the variances above and use it to generate Table 6, I get a Table 6 that:
- is different from our previous table (in Issue #9)
- is even more different from Card's original table
- but the SAS and Stata tables now agree (indicating that most differences between our Stata and SAS results were coming from the script that generates the residuals and variances).
The Table 6 we get from both SAS and Stata can be found below. The tables were generated by Stata, but the equivalent results in SAS can be found in this link:
- OLS estimates for High School: the coefficient estimates and R2 agree exactly between SAS and Stata; the standard errors differ.
  - Column 1 regression results start at line 595 (R2 at line 505 and coefficient at line 516)
  - Column 2 regression results start at line 568 (R2 at line 578 and coefficient at line 589)
- IV estimates for High School:
  - Column 3 regression results start at line 1243.
    - Coefficient estimates don't agree exactly but are very close.
    - Stata R2 is 0.203, SAS R2 is 0.145.
    - The 1st stage of Column 3 starts at line 818. The Stata 1st-stage t-statistic is 5.53; the SAS 1st-stage t-statistic is 7.87.
------------------------------------------------------------------------------------
1 2 3 4
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.030*** -0.030*** -0.036*** -0.036***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.095*** -0.094*** -0.104*** -0.106***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.101*** 0.100*** 0.113*** 0.114***
(0.00) (0.00) (0.00) (0.00)
College share 1980 0.099*** 0.098*** 0.121*** 0.124***
(0.02) (0.02) (0.02) (0.02)
College share 1990 -0.007 -0.006 -0.014 -0.016
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.135*** 0.137*** 0.141*** 0.136***
(0.01) (0.01) (0.01) (0.01)
Wage res imm 1980 -0.160*** -0.162*** -0.169*** -0.164***
(0.00) (0.00) (0.00) (0.00)
Mfg share in 1980 -0.225*** -0.226*** -0.258*** -0.256***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 0.194*** 0.195*** 0.227*** 0.223***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var 0.004 -0.009***
(0.00) (0.00)
constant -0.128*** -0.127*** -0.159*** -0.160***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.210 0.210 0.203 0.203
------------------------------------------------------------------------------------
------------------------------------------------------------------------------------
5 6 7 8
b/se b/se b/se b/se
------------------------------------------------------------------------------------
Log rel supply imm~e -0.058*** -0.054*** -0.078*** -0.072***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1980 -0.040*** -0.039*** -0.058*** -0.054***
(0.00) (0.00) (0.00) (0.00)
Log msa size 1990 0.031*** 0.034*** 0.053*** 0.052***
(0.00) (0.00) (0.00) (0.00)
College share 1980 -0.055*** -0.114*** -0.003 -0.052**
(0.02) (0.02) (0.02) (0.02)
College share 1990 -0.022 0.045*** -0.045*** 0.005
(0.01) (0.01) (0.01) (0.01)
Wage res native 1980 0.309*** 0.363*** 0.338*** 0.371***
(0.00) (0.01) (0.01) (0.01)
Wage res imm 1980 -0.224*** -0.287*** -0.248*** -0.288***
(0.00) (0.00) (0.00) (0.00)
Mfg share in 1980 -0.373*** -0.422*** -0.377*** -0.410***
(0.01) (0.01) (0.01) (0.01)
Mfg share in 1990 0.499*** 0.546*** 0.484*** 0.518***
(0.01) (0.01) (0.01) (0.01)
Lagged dep var 0.137*** 0.095***
(0.00) (0.00)
constant -0.061*** -0.085*** -0.139*** -0.143***
(0.00) (0.00) (0.00) (0.00)
------------------------------------------------------------------------------------
r2 0.386 0.401 0.356 0.379
------------------------------------------------------------------------------------
Many data checks using the .lst files
When dealing with SAS, we have two important kinds of files: the .sas files and the .lst files. The .sas files are the scripts. The .lst files store any results that were printed while running a script; thus .lst saves the output of "PROC MEANS", "PROC PRINT", "PROC GLM", etc. Luckily, we have (almost) all of Card's .lst files, so we can compare our results after running each script by comparing the .lst files.
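A small helper along these lines can automate the .lst comparison (paths hypothetical; this is a sketch, not part of the replication code):

```python
import difflib
from pathlib import Path

def compare_lst(ours: Path, cards: Path) -> list[str]:
    """Return a unified diff of two SAS .lst files (empty list means they match)."""
    return list(difflib.unified_diff(
        ours.read_text().splitlines(),
        cards.read_text().splitlines(),
        fromfile=str(ours), tofile=str(cards), lineterm="",
    ))
```

Running it over np2.lst, allnp2.lst, etc., for each year and inspecting any non-empty output would flag exactly which scripts diverge from Card's runs.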
Takeaways
- 1990 is the most different year. I think something changed with this dataset since Card used it. In 1980 and 2000, we have exactly the same number of immigrants as Card does (but more natives). In 1990, both the number of immigrants and the number of natives differ from Card's, and the summary statistics are also a bit different, whereas the 1980 and 2000 summary statistics look the same as Card's.
Data checks by year using the .lst files
- 1980
Us | Card | |
---|---|---|
np2 | link | link |
allnp2 | link | link |
cell1 | link | link |
t1 | link | link |
supply1 | link | link |
imm1 | link | link |
indist | link | link |
- 1990
Us | Card | |
---|---|---|
np2 | link | link |
allnp2 | link | link |
cell1 | link | link |
t1 | link | link |
supply1 | link | link |
imm1 | link | link |
indist | link | link |
- 2000
Us | Card | |
---|---|---|
np2 | link | link |
allnp2 | link | link |
cell1 | link | link |
t1 | link | link |
supply1 | link | link |
imm3 | link | link |
imm2 | link | link |
inflow3 | link | link |
Tables using ICPSR in SAS
Table 2
Looks exactly the same as in Stata. For the Stata version, see #9.
Note that ic = -2 refers to everybody (natives + immigrants), ic = -1 refers to natives, and ic = 0 refers to immigrants. Below is the dictionary for the remaining ic codes:
1 "mexico"
2 "phillipines"
3 "india"
4 "vietnam"
5 "el salvador"
6 "china"
7 "cuba"
8 "dominican republic"
9 "korea"
10 "jamaica"
11 "canada"
12 "colombia"
13 "guatemala"
14 "germany"
15 "haiti"
16 "poland"
Table 3
First 4 columns of Table 3 (I am omitting 2005/2006 because it uses ACS data, and not ICPSR data).
The first 4 columns look exactly the same as the corresponding columns in the Stata table found in #9.
The relevant observations are in rows 10-21.
Pre-trends
For each of the plots below, I do the following:
- Run one regression for each year (1980, 1990, 2000) and store the coefficient on the instrument. For each regression:
  - the dependent variable is the appropriate one (either the difference in mean wage residuals for HS-equivalent workers or for college-equivalent workers)
  - the set of controls is the same across all regressions. Note: none of the regressions includes the lagged dependent variable in the set of controls
  - for the "country" regressions, the instrument is the share of immigrants from that country living in location l in 1980
  - for the "aggregate Bartik" regressions, the instrument is the appropriate one (either the predicted inflow of HS-equivalent workers or of college-equivalent workers)
- Plot the coefficients on the instrument.
Note: The x-axis is showing 5-year intervals, but we only have data for 1980, 1990, and 2000. I will fix that.
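The procedure above, as a minimal numpy sketch (toy data; one OLS per year, keeping the slope on the instrument):

```python
import numpy as np

rng = np.random.default_rng(2)
coefs = {}
for yr in (1980, 1990, 2000):
    z = rng.standard_normal(124)              # instrument across 124 MSAs (toy)
    dep = 0.1 * z + rng.standard_normal(124)  # wage-residual gap (toy)
    X = np.column_stack([np.ones(124), z])    # constant stands in for controls
    b, *_ = np.linalg.lstsq(X, dep, rcond=None)
    coefs[yr] = b[1]                          # coefficient on the instrument
```

Plotting `coefs` against the year then gives one point per census, which is why the x-axis should show only 1980, 1990, and 2000.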
High School equivalent workers
(2x3 grid of coefficient-plot images; the figures did not survive this export)
College equivalent workers
(2x3 grid of coefficient-plot images; the figures did not survive this export)
Data cleaning
1. Defining who is immigrant
Card:
Defines as immigrants people who are naturalized citizens or who are not citizens.
citizen='0=us born 1=nat 2=not cit 3=born abroad us parents'
imm=(citizen in (1,2))
Victoria:
Card's definition plus categories 4 and 5:
/* CITIZEN:
0 n/a
1 born abroad of american parents
2 naturalized citizen
3 not a citizen
4 not a citizen, but has received first papers
5 foreign born, citizenship status not reported
*/
gen imm = .
replace imm = 1 if citizen == 2 | citizen == 3 | citizen == 4 | citizen == 5
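One thing worth double-checking in the snippet above: `gen imm = .` followed only by `replace imm = 1 if ...` leaves natives coded as missing rather than 0, which would silently drop them from anything that conditions on `imm`. A Python sketch of the mapping I believe is intended (my reading, not the repo's code):

```python
def is_immigrant(citizen: int) -> int:
    """Card's definition (naturalized or not a citizen) plus codes 4 and 5;
    everyone else (US born, born abroad of American parents) is native."""
    return 1 if citizen in (2, 3, 4, 5) else 0
```

The Stata equivalent would add an explicit `replace imm = 0` branch (or use `gen imm = inlist(citizen, 2, 3, 4, 5)`).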
2. Hours worked last year
Card:
His data seem to contain the exact number of weeks people worked last year. His code is the following:
annhrs=weeks*hrswkly;
That is, total annual hours = weeks * weekly hours
Victoria:
The Bartik data has multiple bins for number of weeks worked last year.
WKSWORK2:
0 n/a
1 1-13 weeks
2 14-26 weeks
3 27-39 weeks
4 40-47 weeks
5 48-49 weeks
6 50-52 weeks
Thus, I am currently imputing the midpoint of each bin.
gen weeks = .
replace weeks = 0 if wkswork2 == 0
replace weeks = 7 if wkswork2 == 1
replace weeks = 20 if wkswork2 == 2
replace weeks = 33 if wkswork2 == 3
replace weeks = 43.5 if wkswork2 == 4
replace weeks = 48.5 if wkswork2 == 5
replace weeks = 51.5 if wkswork2 == 6
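The same imputation in one place (a sketch; note that the midpoint of the 50-52 bin is 51, so the 51.5 in the Stata code above may be a typo):

```python
# Interval midpoints for the WKSWORK2 bins (assumed imputation rule)
WKSWORK2_MIDPOINT = {0: 0.0, 1: 7.0, 2: 20.0, 3: 33.0, 4: 43.5, 5: 48.5, 6: 51.0}

def impute_weeks(wkswork2: int) -> float:
    """Imputed weeks worked last year from the WKSWORK2 bin code."""
    return WKSWORK2_MIDPOINT[wkswork2]
```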
3. Education labels
Card: Census data has exactly one category for each grade and it also has information on whether the person completed the grade.
GRADE: Highest Year of School Attended
00 Never attended school or N/A (under 3
years of age)
01 Nursery school
02 Kindergarten
Elementary:
03 First grade
04 Second grade
05 Third grade
06 Fourth grade
07 Fifth grade
08 Sixth grade
09 Seventh grade
10 Eighth grade
High school:
11 Ninth grade
12 Tenth grade
13 Eleventh grade
14 Twelfth grade
College:
15 First year
16 Second year
17 Third year
18 Fourth year
19 Fifth year
20 Sixth year
21 Seventh year
22 Eighth year or more
Victoria:
Bartik data has too many categories and the numbers don't really add up:
EDUCD:
0 n/a or no schooling
1 n/a
2 no schooling completed
10 nursery school to grade 4
11 nursery school, preschool
12 kindergarten
13 grade 1, 2, 3, or 4
14 grade 1
15 grade 2
16 grade 3
17 grade 4
20 grade 5, 6, 7, or 8
21 grade 5 or 6
22 grade 5
23 grade 6
24 grade 7 or 8
25 grade 7
26 grade 8
30 grade 9
40 grade 10
50 grade 11
60 grade 12
61 12th grade, no diploma
62 high school graduate or ged
63 regular high school diploma
64 ged or alternative credential
65 some college, but less than 1 year
70 1 year of college
71 1 or more years of college credit, no degree
80 2 years of college
81 associate's degree, type not specified
82 associate's degree, occupational program
83 associate's degree, academic program
90 3 years of college
100 4 years of college
101 bachelor's degree
110 5+ years of college
111 6 years of college (6+ in 1960-1970)
112 7 years of college
113 8+ years of college
114 master's degree
115 professional degree beyond a bachelor's degree
116 doctoral degree
999 missing
To see what I mean by "they don't really add up", consider:
| educational attainment [detailed version] | Freq. | Percent | Cum. |
|---|---|---|---|
| grade 1, 2, 3, or 4 | 59,078 | 35.54 | 35.54 |
| grade 1 | 9,674 | 5.82 | 41.36 |
| grade 2 | 21,079 | 12.68 | 54.04 |
| grade 3 | 37,034 | 22.28 | 76.32 |
| grade 4 | 39,371 | 23.68 | 100.00 |
| Total | 166,236 | 100.00 | |
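For reference, here is a tentative partial crosswalk from EDUCD to a Card-style years-of-schooling variable in Stata, using only the unambiguous detailed codes. The aggregate codes (10, 13, 20, 21, 24) and how to score the degree codes would still need a decision — this is a sketch, not a final mapping:

```stata
* tentative EDUCD -> years-of-schooling crosswalk (unambiguous codes only)
gen educ = .
replace educ = 0  if inlist(educd, 0, 1, 2, 11, 12)   // none / preschool / kindergarten
replace educ = educd - 13 if inrange(educd, 14, 17)    // grades 1-4
replace educ = educd - 17 if inrange(educd, 22, 23)    // grades 5-6
replace educ = educd - 18 if inrange(educd, 25, 26)    // grades 7-8
replace educ = 9  if educd == 30
replace educ = 10 if educd == 40
replace educ = 11 if educd == 50
replace educ = 12 if inrange(educd, 60, 64)            // 12th grade / HS grad / GED
replace educ = 13 if inlist(educd, 65, 70)             // up to 1 year of college
replace educ = 14 if inrange(educd, 71, 83)            // 2 years / associate's
replace educ = 15 if educd == 90
replace educ = 16 if inlist(educd, 100, 101)           // 4 years / bachelor's
replace educ = 17 if educd >= 110 & educd < 999        // 5+ years / graduate degrees
```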
4. Income measures
Card:
- wagesal: Wage or Salary Income (INCOME1 in the 1980 Census variable dictionary)
- selfinc: Nonfarm Self-Employment Income (INCOME2 in the 1980 Census variable dictionary)
- farminc: Farm Self-Employment Income (INCOME3 in the 1980 Census variable dictionary)
- income: Income From All Sources (INCOME8 in the 1980 Census variable dictionary)
Then he defines self-employed as anyone who has a positive (selfinc + farminc)
Victoria:
- inctot: total personal income
- ftotinc: total family income
- incwage: wage and salary income
- incbus00: business and farm income, 2000
- incearn: total personal earned income
| Variable | Obs | Mean | Std. Dev. | Min | Max |
|---|---|---|---|---|---|
| inctot | 21,864,217 | 23201.98 | 34290.62 | -20000 | 1471000 |
| ftotinc | 21,864,217 | 249646.6 | 1404037 | -30000 | 9999999 |
| incwage | 21,864,217 | 19101.37 | 19101.37 | 0 | 641000 |
| incbus00 | 10,986,023 | 2195.484 | 15365.94 | -10000 | 573000 |
| incearn | 16,810,374 | 24123.92 | 35430.46 | -19996 | 1146000 |
The Bartik dataset has no measure of self-employment earnings, so I will use this other variable to define self-employment:
CLASSWKRD:
0 n/a
10 self-employed
11 employer
12 working on own account
13 self-employed, not incorporated
14 self-employed, incorporated
20 works for wages
21 works on salary (1920)
22 wage/salary, private
23 wage/salary at non-profit
24 wage/salary, government
25 federal govt employee
26 armed forces
27 state govt employee
28 local govt employee
29 unpaid family worker
https://www.dropbox.com/s/8jiij8ntdq1lcau/Screenshot%202019-01-16%2014.56.49.png?dl=0
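A Stata sketch of the replacement definition: since CLASSWKRD codes 10-14 cover the self-employed subcategories, Card's indicator could be approximated as

```stata
* approximate Card's self-employed flag from class of worker
* (Card defines it from positive selfinc + farminc, which is unavailable here)
gen selfemp = inrange(classwkrd, 10, 14) if classwkrd != 0
```

Note this counts the incorporated self-employed (code 14) as self-employed, which may not match how their income would show up under Card's earnings-based definition.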
5. Country codes - grouping into 38 groups
Card: The country codes used by Card can be found in Appendix F of the Codebook for the 1980 5% extracts, available from ICPSR.
He groups countries into 38 groups:
mexico
phillip
india
vietnam
el salvador
china
cuba
dominican rep.
korea
jamaica
canada
columbia
guatemala
germany
haiti
poland
taiwan
england
italy
ecuador
japan
iran
honduras
peru
russia
nicaragua
guyana
pakistan
hong kong
trinidad-tobago
west europe + israel + cyprus + australia + nz
east europe incl romania ukraine yugoslav
middle east turkey bulgaria and the stans
asia and oceania
s america + north am nec
africa
caribbean + central am
else
Somewhat unrelated note: Later on, Card creates even broader categories of countries (e.g., european, high asia, mid asia, mexico), and he includes Canada in the european group and Pakistan and Iran in the high asia group.
Victoria:
Issue: The Bartik dataset doesn't have 15 of the 38 groups used by Card:
el salvador
dominican rep.
jamaica
colombia
guatemala
haiti
taiwan
ecuador
honduras
peru
nicaragua
guyana
pakistan
hong kong
trinidad-tobago
For these groups, instead of using a person's place of birth I use whether the person is an immigrant combined with her primary ancestry (using the variable ancestr1). So if a person is an immigrant and her first ancestry response is "salvadoran", I count her as having been born in El Salvador. This is of course imperfect, since some immigrants were born in a country different from their reported ancestry.
6. Years in the US
Card:
Census data has the immigration year. So the 1980 Census, for example, has a variable that looks like
IMMIGR 1 26
Year of Immigration
0 N/A (born in the United States or outlying areas, or born abroad of American parents)
1 1975 to 1980
2 1970 to 1974
3 1965 to 1969
4 1960 to 1964
5 1950 to 1959
6 Before 1950
So he approximates how many years the person has been in the U.S. using that variable. This allows him to distinguish between people who have been in the U.S. for 20+ years vs. 40+ years.
if immyr=1 then yrsinus=2.5;
else if immyr=2 then yrsinus=7.5;
else if immyr=3 then yrsinus=12.5;
else if immyr=4 then yrsinus=17.5;
else if immyr=5 then yrsinus=25.5;
else if immyr=6 then yrsinus=40;
else yrsinus=.;
Victoria:
The Bartik data, on the other hand, only reports whether a person has been in the US for 21+ years, so I don't have the same level of granularity and I am not sure how many years to assign. Note: for each obs we have the person's date of birth, so maybe we can use that to approximate how many years the person has been in the U.S.
Right now, I am using 30 years for anyone who has been in the U.S. for 21+ years.
YRSUSA2:
0 n/a
1 0-5 years
2 6-10 years
3 11-15 years
4 16-20 years
5 21+ years
9 missing
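Mirroring Card's SAS mapping with bin midpoints, and using the 30-year assumption for the open-ended top bin, the Stata version would be:

```stata
* years in the US from YRSUSA2 bin midpoints; 30 for the 21+ bin is an assumption
gen yrsinus = .
replace yrsinus = 2.5 if yrsusa2 == 1  // 0-5 years
replace yrsinus = 8   if yrsusa2 == 2  // 6-10 years
replace yrsinus = 13  if yrsusa2 == 3  // 11-15 years
replace yrsinus = 18  if yrsusa2 == 4  // 16-20 years
replace yrsinus = 30  if yrsusa2 == 5  // 21+ years (assumed)
```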
CZs not a unique identifier
Today in the meeting we saw that I should be merging files with only one observation per CZ. However, some files seem to have by construction more than one obs per CZ.
Just to document a simple example.
The cell1 script in the 2000 folder collapses the data by the variables rczone, native, male, eclass, and xclass2. The resulting dataset is called bigcells.dta.
In the table6 script, we load the bigcells dataset and, for each value that eclass takes, keep only the male and native observations. However, rczone is still not a unique identifier, since each observation that is male, native, and in a given eclass appears once for each of the 4 values of xclass2. (In SAS this step works because proc summary's class statement also outputs marginal cells with xclass2 set to missing — cells collapsed over experience — and table6's xclass2=. condition selects exactly those; a plain Stata collapse produces no such marginal rows.)
xclass2 is an experience variable and eclass is an education variable. Each of them has 4 categories.
Variable definitions in Card's code:
if educ<12 then eclass=1;
else if educ=12 then eclass=2;
else if educ<16 then eclass=3;
else eclass=4;
if exp<=10 then xclass2=1;
else if exp<=20 then xclass2=2;
else if exp<=30 then xclass2=3;
else xclass2=4;
c=1;
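A direct Stata translation of those definitions (one caveat: SAS sorts a missing educ below 12 and would assign it eclass=1, while the code below would leave a missing educ in the top category — educ is assumed nonmissing here):

```stata
* education class: dropout / high school / some college / college+
gen eclass = 4
replace eclass = 1 if educ < 12
replace eclass = 2 if educ == 12
replace eclass = 3 if educ > 12 & educ < 16
* experience class: 0-10 / 11-20 / 21-30 / 31+ years
gen xclass2 = 4
replace xclass2 = 1 if exp <= 10
replace xclass2 = 2 if exp > 10 & exp <= 20
replace xclass2 = 3 if exp > 20 & exp <= 30
gen c = 1  // count helper, summed into count during the collapse
```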
Below are snippets of the code that saves bigcells and also the ones that use bigcells
Script cell1 saves bigcells
SAS code
proc summary;
class rmsa native male eclass xclass2;
var logwage2 lw2sq res ressq pred predsq respred imm female educ exp c
dropout hs somecoll collplus college advanced ;
output out=here.bigcells
mean=
sum(c)=count;
weight wt;
Script table6 uses bigcells
SAS code
*this macro gets native wages by eclass;
%macro nwage(ed);
%let edg=&ed;
data nw&edg;
set here.bigcells;
if native=1 and male=1 and eclass=&edg and xclass2=. ;
nwage&edg=logwage2;
nres&edg=res;
npred&edg=pred;
ncountw&edg=count / 1000;
keep rmsa nwage&edg npred&edg nres&edg ncountw&edg;
proc sort; by rmsa;
%mend;
Stata code
use data/2000/bigcells.dta, clear
local education_groups 1 2 3 4
forv i = 1/4 {
local edg : word `i' of `education_groups'
preserve
keep if native == 1 & male == 1 & eclass == `edg'
gen nwage`edg' = logwage2
gen nres`edg' = res
gen npred`edg' = pred
gen ncountw`edg' = count/1000
keep rczone nwage`edg' npred`edg' nres`edg' ncountw`edg'
sort rczone
save data/2000/nw`edg'.dta, replace
restore
}
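Because SAS proc summary with a class statement also emits marginal cells — including rows where xclass2 is missing, i.e., collapsed over experience, which is what table6's xclass2=. condition selects — a Stata collapse by all five variables will never contain those rows. One possible fix is to rebuild the experience-marginal cells from bigcells itself. A sketch (the weighting and the re-aggregation of count would need to be checked against the original cell construction):

```stata
* rebuild the xclass2-marginal cells (the SAS rows with xclass2 = .)
* so that rczone is unique within native/male/eclass
use data/2000/bigcells.dta, clear
collapse (mean) logwage2 res pred [aw=count], by(rczone native male eclass)
```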
| eclass
xclass2 | 1 2 3 4 | Total
-----------+--------------------------------------------+----------
1 | 175,066 464,765 295,094 292,924 | 1,227,849
2 | 155,468 529,215 302,775 315,280 | 1,302,738
3 | 152,129 526,231 283,651 309,079 | 1,271,090
4 | 153,342 442,467 169,002 166,318 | 931,129
-----------+--------------------------------------------+----------
Total | 636,005 1,962,678 1,050,522 1,083,601 | 4,732,806
Data checks
First attempt at replicating Table 1
| | Working age population | Share of US population | Pct Immigrant | Pct Hispanic | Pct Minority | Pct Dropout | Pct High school | Pct Some college | Pct College or more | Mean wage2 |
|---|---|---|---|---|---|---|---|---|---|---|
| All US | 206238 | 100.1 | 12 | 11 | 25 | 15 | 42 | 22 | 22 | 66.73 |
| Larger czones (top 100) | 115544 | 56.1 | 17 | 14 | 32 | 14 | 38 | 22 | 26 | 66.355 |
| Rest of country | 90694 | 44 | 4.8 | 6.9 | 16 | 15 | 46 | 22 | 17 | 67.209 |
| 1st largest czone | 8820 | 4.3 | 41 | 39 | 58 | 22 | 33 | 22 | 23 | 78.18 |
| 2nd largest czone | 6544 | 3.2 | 37 | 22 | 51 | 16 | 36 | 19 | 30 | 85.555 |
| 3rd largest czone | 4510 | 2.2 | 23 | 17 | 41 | 14 | 35 | 21 | 30 | 66.851 |
| 4th largest czone | 3245 | 1.6 | 29 | 17 | 39 | 11 | 37 | 18 | 34 | 69.28 |
| 5th largest czone | 3021 | 1.5 | 8.8 | 5.9 | 29 | 11 | 42 | 19 | 28 | 67.813 |
| 6th largest czone | 2989 | 1.5 | 8.2 | 2.9 | 26 | 11 | 41 | 24 | 24 | 67.921 |
| 7th largest czone | 2906 | 1.4 | 16 | 6.4 | 16 | 9.1 | 34 | 20 | 37 | 62.88 |
| 8th largest czone | 2821 | 1.4 | 22 | 9.3 | 42 | 10 | 30 | 19 | 42 | 58.961 |
| 9th largest czone | 2657 | 1.3 | 33 | 17 | 46 | 11 | 28 | 24 | 37 | 69.416 |
| 10th largest czone | 2643 | 1.3 | 25 | 27 | 49 | 20 | 36 | 20 | 24 | 71.92 |
| 11th largest czone | 2527 | 1.2 | 13 | 6.7 | 39 | 13 | 36 | 21 | 30 | 61.567 |
| 12th largest czone | 2238 | 1.1 | 13 | 5 | 17 | 8.1 | 35 | 27 | 30 | 62.136 |