Chapter 8 Projects
This chapter contains several empirical projects.
8.1 Opportunity Atlas
In this empirical project, you will take a close look at what is commonly known as the American Dream: the idea that in a society with few barriers, everybody can achieve upward social mobility and be better off than their parents, if only they work hard enough.
The Opportunity Atlas is a website maintained by the US Census Bureau that draws on data from recent high-quality research on upward social mobility at the Census Tract level. Census Tracts are geographic areas that contain fewer than 4,000 residents on average and together cover the entire United States. The Atlas has been widely used in recent newspaper reporting, and you should start this project by reading the corresponding piece from the New York Times Upshot series.
This project focuses on developing your descriptive data analysis skills. In particular, you will
- use mapping tools to visualize geospatial data, and
- compute simple descriptive statistics and report regression results to shed light on the relationship between two or more random variables.
In order to achieve those goals, you will interact with the Atlas website at https://www.opportunityatlas.org, and you will use the raw data behind the website to compute additional statistics.
8.1.1 Instructions
- Your submission will be a statistical report addressing the questions below, produced from a single R Markdown file (ending in `.Rmd`). You can create a simple `.Rmd` template in RStudio by clicking the “new file” symbol at the top left, then selecting R Markdown. A quick but effective guide on using Markdown is available here.
- Please submit two files: the `.Rmd` source file (which creates the report), and the report itself, in either HTML or PDF format (via the Knit dropdown menu in RStudio).
- You can do this exercise in teams of up to 3 people: make sure to edit the `author` field in the `.Rmd` correspondingly.
- Submit by sending a private message with the `.Rmd` as an attachment on Slack.
8.1.2 Questions
Referring to the NYT article:
Why does the Seattle Housing Authority give away vouchers to pay for rent in the area between 100th and 115th Streets, east of Meridian, west of 35th Avenue?
The data here shows how easy it is for children of poor parents to escape poverty themselves. In explaining what makes a good neighborhood, how much of the variation we see is actually explained by things like school boundary lines and poverty levels alone? In other words, how important is the school you go to, in isolation from other factors, if you want to escape your parents' low rank in the income distribution?
Some select census tracts have received up to USD 500 million since 1990 through place-based neighborhood improvement policies. Do we know whether, and to what extent, those investments were effective?
Go to https://www.opportunityatlas.org and look at the census tracts around your home, if you grew up in the United States. If you grew up outside the US, randomly choose one of the State Capitals from this list and select a census tract at random by zooming in. Make sure to immediately post your choice of city in the corresponding Slack channel (e.g. “florian-atlas” for my group) to avoid having multiple reports on the same city. Let’s call the chosen census tract your home.
Create Figure 1 in your report, which should display the map shown by https://www.opportunityatlas.org for the census tracts around your home (you can download that map as an image using the download option at the bottom left). You can include a figure in an `.Rmd` file as follows. Remember that all R code chunks in your `.Rmd` are delimited by triple backticks.

````
```{r your-chunk-label, echo = FALSE, out.width = "75%"}
# your-chunk-label: name you give that code chunk (optional)
# echo = TRUE/FALSE: display the code in output?
# out.width = "75%": scale image to 75% of page width
# all options: https://yihui.name/knitr/options/
knitr::include_graphics("path_to_your_graphic.png")
```
````
Your accompanying text should describe what is shown in that map, in particular what data are being visualized. Next, examine the patterns for a number of different groups (e.g., lowest income children, high income children) and outcomes (e.g., earnings in adulthood, incarceration rates). Include only one or two of these in your report.
To answer the next question, read the Opportunity Atlas Manual: What period do the data you are analyzing come from? Are you concerned that the neighborhoods you are studying may have changed for kids now growing up there? What evidence do the authors of the manual provide suggesting that such changes are or are not important? What type of data could you use to test whether your neighborhood has changed in recent years?
Now turn to the `atlas.Rds` data set, which you can load into R with `readRDS()`. How does average upward mobility, pooling races and genders, for children with parents at the 25th percentile (`kfr_pooled_p25`) in your home Census tract compare to mean (population-weighted, using `count_pooled`) upward mobility in your state and in the U.S. overall? Do kids where you grew up have better or worse chances of climbing the income ladder than the average child in America? Hint: The Opportunity Atlas website will give you the tract, county, and state FIPS codes for your home address. For example, searching for “Lynwood Road, Verona, New Jersey” will display Tract 34013021000, Verona, NJ. The first two digits refer to the state code, the next three digits to the county code, and the last six digits to the tract code. In R, you can list this observation by selecting the row whose `state`, `county`, and `tract` values match these codes.

What is the standard deviation of upward mobility (population-weighted) in your home county, and what does this number tell you? Is it larger or smaller than the standard deviation across tracts in your state (i.e. compare your county to all other counties in your state)? Across tracts in the entire country? What do you learn from these comparisons? Notice that in R you can compute a weighted standard deviation of a vector `x` with weights `weight_variable` as the square root of its weighted variance.

Now let’s turn to downward mobility: repeat the two previous questions (comparing means and standard deviations) for children whose parents are at the 75th and 100th percentiles. How do the patterns differ?
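The mean and standard-deviation computations asked for above can be sketched in base R as follows. This is a minimal sketch under stated assumptions: the small data frame is made-up toy data standing in for `atlas.Rds` (with the real file you would start from `readRDS()` instead), it assumes the `state`, `county`, and `tract` columns are stored as numbers without leading zeros, and `weighted_sd()` is a hypothetical helper that divides by the sum of the weights (no small-sample correction).

```r
# Toy stand-in for atlas.Rds: made-up numbers, NOT real Opportunity Atlas data.
# With the real data you would start from: atlas <- readRDS("atlas.Rds")
atlas <- data.frame(
  state          = c(34, 34, 34, 34, 36, 36),
  county         = c(13, 13, 13, 3, 61, 61),
  tract          = c(21000, 21100, 21200, 10100, 100, 200),
  kfr_pooled_p25 = c(42000, 38000, 35000, 40000, 30000, 33000),
  count_pooled   = c(500, 800, 300, 600, 1000, 700)
)

# Split the 11-digit FIPS code from the website into state (2), county (3), tract (6).
# Assumption: codes are stored as numbers in the data, so leading zeros are dropped.
fips        <- "34013021000"
home_state  <- as.numeric(substr(fips, 1, 2))
home_county <- as.numeric(substr(fips, 3, 5))
home_tract  <- as.numeric(substr(fips, 6, 11))

# List the home tract
home <- subset(atlas, state == home_state & county == home_county & tract == home_tract)
print(home)

# Population-weighted means in the home state and in the whole (toy) country
in_state   <- atlas$state == home_state
state_mean <- weighted.mean(atlas$kfr_pooled_p25[in_state],
                            atlas$count_pooled[in_state], na.rm = TRUE)
us_mean    <- weighted.mean(atlas$kfr_pooled_p25, atlas$count_pooled, na.rm = TRUE)

# Hypothetical helper: weighted SD dividing by the sum of weights (no n - 1 correction)
weighted_sd <- function(x, w) {
  ok <- !is.na(x) & !is.na(w)   # drop tracts with a missing outcome or weight
  x <- x[ok]
  w <- w[ok]
  m <- sum(w * x) / sum(w)      # weighted mean
  sqrt(sum(w * (x - m)^2) / sum(w))
}

in_county <- in_state & atlas$county == home_county
county_sd <- weighted_sd(atlas$kfr_pooled_p25[in_county], atlas$count_pooled[in_county])
state_sd  <- weighted_sd(atlas$kfr_pooled_p25[in_state],  atlas$count_pooled[in_state])
us_sd     <- weighted_sd(atlas$kfr_pooled_p25, atlas$count_pooled)

print(c(home = home$kfr_pooled_p25, state_mean = state_mean, us_mean = us_mean))
print(c(county_sd = county_sd, state_sd = state_sd, us_sd = us_sd))
```

For the downward-mobility question, the same code applies after replacing `kfr_pooled_p25` with `kfr_pooled_p75` or `kfr_pooled_p100`.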
Using a linear regression, estimate the relationship between outcomes of children at the 25th and 75th percentile for the Census tracts in your home county. Generate a scatter plot to visualize this regression. Do areas where children from low-income families do well generally have better outcomes for those from high-income families, too?
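A minimal sketch of this regression in base R, using `lm()` and a base-graphics scatter plot with the fitted line. The county data below are simulated toy values (the intercept, the slope of 0.9, and the noise level are invented), not real Atlas estimates; with the real data you would use the rows of `atlas` for your home county.

```r
# Simulated toy data for one county: NOT real Opportunity Atlas estimates.
set.seed(1)
county_df <- data.frame(kfr_pooled_p25 = runif(40, 25000, 50000))
county_df$kfr_pooled_p75 <- 20000 + 0.9 * county_df$kfr_pooled_p25 + rnorm(40, sd = 3000)

# Regress outcomes at the 75th percentile on outcomes at the 25th percentile
fit <- lm(kfr_pooled_p75 ~ kfr_pooled_p25, data = county_df)
print(summary(fit))

# Scatter plot of tracts with the fitted regression line
plot(kfr_pooled_p75 ~ kfr_pooled_p25, data = county_df,
     xlab = "Child household income, parents at p25 ($)",
     ylab = "Child household income, parents at p75 ($)")
abline(fit, col = "red")
```

A positive slope would indicate that tracts where low-income children do well also tend to be tracts where high-income children do well; `coef(fit)` and `confint(fit)` return the estimate and its 95% confidence interval.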
Next, examine whether the patterns you have looked at above are similar by race. If there is not enough racial heterogeneity in the area of interest (i.e., data is missing for most racial groups), then choose a different area to examine.
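One way to check racial heterogeneity is to count, for each group, the tracts with a non-missing estimate and to compute a group-specific population-weighted mean, weighting by that group's own child count. A minimal sketch on made-up toy values, where `NA` stands in for missing estimates; the column names follow the naming pattern in subsection 8.1.3.

```r
# Toy tract data for one county: made-up values; NA mimics missing estimates.
county_df <- data.frame(
  kfr_white_p25 = c(40000, 42000, NA, 39000, 41000),
  kfr_black_p25 = c(30000, NA, NA, 31000, 29000),
  count_white   = c(400, 350, 20, 500, 450),
  count_black   = c(200, 10, 5, 150, 300)
)

races <- c("white", "black")
summary_by_race <- sapply(races, function(r) {
  y <- county_df[[paste0("kfr_", r, "_p25")]]   # outcome for this group
  w <- county_df[[paste0("count_", r)]]         # weights: children of this group
  c(n_tracts = sum(!is.na(y)),                  # tracts with a usable estimate
    wmean    = weighted.mean(y, w, na.rm = TRUE))
})
print(summary_by_race)
```

If `n_tracts` is very small for most groups, that is the signal to choose a different area, as the question suggests.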
Using the Census tracts in your home county, can you identify any covariates which help explain some of the patterns you have identified above? Some examples of covariates you might examine include housing prices, income inequality, fraction of children with single parents, job density, etc. For 2 or 3 of these, report estimated correlation coefficients along with their 95% confidence intervals.
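`cor.test()` from base R reports a Pearson correlation together with a 95% confidence interval (based on Fisher's z transformation). A minimal sketch on simulated toy data; the two covariates are just examples taken from subsection 8.1.3, the data-generating numbers are invented, and the correlations are unweighted across tracts.

```r
# Simulated toy data for one county: NOT real Opportunity Atlas data.
set.seed(2)
n <- 60
county_df <- data.frame(
  singleparent_share2010 = runif(n, 0.1, 0.6),
  rent_twobed2015        = runif(n, 800, 2000)
)
county_df$kfr_pooled_p25 <- 45000 - 30000 * county_df$singleparent_share2010 +
  rnorm(n, sd = 3000)

# Pearson correlation of each covariate with upward mobility, with a 95% CI.
# cor.test() drops pairs with missing values before computing the test.
covariates <- c("singleparent_share2010", "rent_twobed2015")
results <- do.call(rbind, lapply(covariates, function(v) {
  ct <- cor.test(county_df[[v]], county_df$kfr_pooled_p25, conf.level = 0.95)
  data.frame(covariate = v,
             r         = unname(ct$estimate),
             ci_low    = ct$conf.int[1],
             ci_high   = ct$conf.int[2])
}))
print(results)
```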
Open question: formulate a hypothesis for why you see the variation in upward mobility for children who grew up in the Census tracts near your home, and provide correlational evidence testing that hypothesis. For this question, you can draw on the many covariates provided in the `atlas.Rds` file, which are described in subsection 8.1.3.
8.1.3 Detailed Data Description
The data consist of \(n = 73,278\) U.S. Census tracts. For more details on the construction of the variables included in this data set, please see Chetty, Raj, John Friedman, Nathaniel Hendren, Maggie R. Jones, and Sonya R. Porter. 2018. “The Opportunity Atlas: Mapping the Childhood Roots of Social Mobility.” NBER Working Paper No. 25147.
Variable name | Label | Obs. |
---|---|---|
tract | Tract FIPS Code (6-digit) 2010 | 73,278 |
county | County FIPS Code (3-digit) | 73,278 |
state | State FIPS Code (2-digit) | 73,278 |
cz | Commuting Zone Identifier (1990 Definition) | 72,473 |
hhinc_mean2000 | Mean Household Income 2000 | 72,302 |
mean_commutetime2000 | Average Commute Time of Working Adults in 2000 | 72,313 |
frac_coll_plus2010 | Fraction of Residents with a College Degree or More in 2010 | 72,993 |
foreign_share2010 | Share of Population Born Outside the U.S. | 72,279 |
med_hhinc2016 | Median Household Income in 2016 | 72,763 |
med_hhinc1990 | Median Household Income in 1999 | 72,313 |
popdensity2000 | Population Density (per square mile) in 2000 | 72,469 |
poor_share2010 | Poverty Rate 2010 | 72,933 |
poor_share2000 | Poverty Rate 2000 | 72,315 |
poor_share1990 | Poverty Rate 1990 | 72,323 |
share_black2010 | Share black 2010 | 73,111 |
share_hisp2010 | Share Hispanic 2010 | 73,111 |
share_asian2010 | Share Asian 2010 | 71,945 |
share_black2000 | Share black 2000 | 72,368 |
share_white2000 | Share white 2000 | 72,368 |
share_hisp2000 | Share Hispanic 2000 | 72,368 |
share_asian2000 | Share Asian 2000 | 71,050 |
gsmn_math_g3_2013 | Average School District Level Standardized Test Scores in 3rd Grade in 2013 | 72,090 |
rent_twobed2015 | Average Rent for Two-Bedroom Apartment in 2015 | 56,607 |
singleparent_share2010 | Share of Single-Headed Households with Children 2010 | 72,564 |
singleparent_share1990 | Share of Single-Headed Households with Children 1990 | 72,196 |
singleparent_share2000 | Share of Single-Headed Households with Children 2000 | 72,285 |
traveltime15_2010 | Share of Working Adults w/ Commute Time of 15 Minutes Or Less in 2010 | 72,939 |
emp2000 | Employment Rate 2000 | 72,344 |
mail_return_rate2010 | Census Form Mail Return Rate 2010 | 72,547 |
ln_wage_growth_hs_grad | Log wage growth for HS Grad., 2005-2014 | 51,635 |
jobs_total_5mi_2015 | Number of Primary Jobs within 5 Miles in 2015 | 72,311 |
jobs_highpay_5mi_2015 | Number of High-Paying (>USD40,000 annually) Jobs within 5 Miles in 2015 | 72,311 |
nonwhite_share2010 | Share of People who are not white 2010 | 73,111 |
popdensity2010 | Population Density (per square mile) in 2010 | 73,194 |
ann_avg_job_growth_2004_2013 | Average Annual Job Growth Rate 2004-2013 | 70,664 |
job_density_2013 | Job Density (jobs per square mile) in 2013 | 72,463 |
kfr_pooled_p25 | Household income ($) at age 31-37 for children with parents at the 25th percentile of the national income distribution | 72,011 |
kfr_pooled_p75 | Household income ($) at age 31-37 for children with parents at the 75th percentile of the national income distribution | 72,012 |
kfr_pooled_p100 | Household income ($) at age 31-37 for children with parents at the 100th percentile of the national income distribution | 71,968 |
kfr_natam_p25 | Household income ($) at age 31-37 for Native American children with parents at the 25th percentile of the national income distribution | 1,733 |
kfr_natam_p75 | Household income ($) at age 31-37 for Native American children with parents at the 75th percentile of the national income distribution | 1,728 |
kfr_natam_p100 | Household income ($) at age 31-37 for Native American children with parents at the 100th percentile of the national income distribution | 1,594 |
kfr_asian_p25 | Household income ($) at age 31-37 for Asian children with parents at the 25th percentile of the national income distribution | 15,434 |
kfr_asian_p75 | Household income ($) at age 31-37 for Asian children with parents at the 75th percentile of the national income distribution | 15,360 |
kfr_asian_p100 | Household income ($) at age 31-37 for Asian children with parents at the 100th percentile of the national income distribution | 13,480 |
kfr_black_p25 | Household income ($) at age 31-37 for Black children with parents at the 25th percentile of the national income distribution | 34,086 |
kfr_black_p75 | Household income ($) at age 31-37 for Black children with parents at the 75th percentile of the national income distribution | 34,049 |
kfr_black_p100 | Household income ($) at age 31-37 for Black children with parents at the 100th percentile of the national income distribution | 32,536 |
kfr_hisp_p25 | Household income ($) at age 31-37 for Hispanic children with parents at the 25th percentile of the national income distribution | 37,611 |
kfr_hisp_p75 | Household income ($) at age 31-37 for Hispanic children with parents at the 75th percentile of the national income distribution | 37,579 |
kfr_hisp_p100 | Household income ($) at age 31-37 for Hispanic children with parents at the 100th percentile of the national income distribution | 35,987 |
kfr_white_p25 | Household income ($) at age 31-37 for white children with parents at the 25th percentile of the national income distribution | 67,978 |
kfr_white_p75 | Household income ($) at age 31-37 for white children with parents at the 75th percentile of the national income distribution | 67,968 |
kfr_white_p100 | Household income ($) at age 31-37 for white children with parents at the 100th percentile of the national income distribution | 67,627 |
count_pooled | Count of all children | 72,451 |
count_white | Count of White children | 72,451 |
count_black | Count of Black children | 72,451 |
count_asian | Count of Asian children | 72,451 |
count_hisp | Count of Hispanic children | 72,451 |
count_natam | Count of Native American children | 72,451 |