Iterating on
Data and Models

Modern Plain Text Social Science
Week 09

Kieran Healy

October 2025

Iterating on data with purrr

Load the packages, as always

library(here)      # manage file paths
here() starts at /Users/kjhealy/Documents/courses/mptc
library(socviz)    # data and some useful functions
library(tidyverse) # your friend and mine
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.1     ✔ stringr   1.5.2
✔ ggplot2   4.0.0     ✔ tibble    3.3.0
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.1.0     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

Split

Apply

Combine

The Core Idea

  • Split or group the data into pieces
  • Apply a function, i.e. act on the data
  • Combine the results back into a table

Sometimes we start out in a “split” state, as when we have multiple data files, and we need to read them in and then combine them.

More than one data file

  • Inside files/examples/ is a folder named congress/
# A little trick from the fs package:
fs::dir_tree(here("files", "examples", "congress"))
/Users/kjhealy/Documents/courses/mptc/files/examples/congress
├── 01_79_congress.csv
├── 02_80_congress.csv
├── 03_81_congress.csv
├── 04_82_congress.csv
├── 05_83_congress.csv
├── 06_84_congress.csv
├── 07_85_congress.csv
├── 08_86_congress.csv
├── 09_87_congress.csv
├── 10_88_congress.csv
├── 11_89_congress.csv
├── 12_90_congress.csv
├── 13_91_congress.csv
├── 14_92_congress.csv
├── 15_93_congress.csv
├── 16_94_congress.csv
├── 17_95_congress.csv
├── 18_96_congress.csv
├── 19_97_congress.csv
├── 20_98_congress.csv
├── 21_99_congress.csv
├── 22_100_congress.csv
├── 23_101_congress.csv
├── 24_102_congress.csv
├── 25_103_congress.csv
├── 26_104_congress.csv
├── 27_105_congress.csv
├── 28_106_congress.csv
├── 29_107_congress.csv
├── 30_108_congress.csv
├── 31_109_congress.csv
├── 32_110_congress.csv
├── 33_111_congress.csv
├── 34_112_congress.csv
├── 35_113_congress.csv
├── 36_114_congress.csv
├── 37_115_congress.csv
└── 38_116_congress.csv

More than one data file

Let’s look at one.

read_csv(here("files", "examples", "congress", "17_95_congress.csv")) |>
  janitor::clean_names() |>
  head()
# A tibble: 6 × 25
  last     first   middle suffix nickname born  death sex   position party state
  <chr>    <chr>   <chr>  <chr>  <chr>    <chr> <chr> <chr> <chr>    <chr> <chr>
1 Abdnor   James   <NA>   <NA>   <NA>     02/1… 11/0… M     U.S. Re… Repu… SD   
2 Abourezk James   George <NA>   <NA>     02/2… <NA>  M     U.S. Se… Demo… SD   
3 Adams    Brockm… <NA>   <NA>   Brock    01/1… 09/1… M     U.S. Re… Demo… WA   
4 Addabbo  Joseph  Patri… <NA>   <NA>     03/1… 04/1… M     U.S. Re… Demo… NY   
5 Aiken    George  David  <NA>   <NA>     08/2… 11/1… M     U.S. Se… Repu… VT   
6 Akaka    Daniel  Kahik… <NA>   <NA>     09/1… 04/0… M     U.S. Re… Demo… HI   
# ℹ 14 more variables: district <chr>, start <chr>, end <chr>, religion <chr>,
#   race <chr>, educational_attainment <chr>, job_type1 <chr>, job_type2 <chr>,
#   job_type3 <chr>, job_type4 <chr>, job_type5 <lgl>, mil1 <chr>, mil2 <chr>,
#   mil3 <chr>

We often find ourselves in this situation. We know each file has the same structure, and we would like to use them all at once.

Loops?

How to read them all in?

One traditional way, which we could do in R, is to write an explicit loop that iterates over a vector of filenames, reads each file, and then stacks the results into one tall rectangle.

## Pseudocode (i.e. will not really run)
## Also, if you do write loops, do not use them to grow dataframes in this way.
filenames <- c("01_79_congress.csv", "02_80_congress.csv", "03_81_congress.csv",
                "04_82_congress.csv" [etc etc])

collected_files <- NULL

for(i in seq_along(filenames)) {
      new_file <- read_file(filenames[i])
      collected_files <- append_to(collected_files, new_file)
}

A for loop initializes a counter and goes around and around, doing the thing and incrementing the counter until reaching a defined max value. A while loop keeps going until some condition is met.
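
Both loop types can be written as real, runnable R; the warning above about growing data frames is why the for version here pre-allocates a list and combines just once at the end. (Toy tibbles stand in for the congress files; this is a sketch, not the code we will actually use.)

```r
library(tibble)
library(dplyr)

dfs <- list(tibble(x = 1:2), tibble(x = 3:4))

# for loop: pre-allocate a list, fill each slot, combine once at the end
collected <- vector("list", length(dfs))
for (i in seq_along(dfs)) {
  collected[[i]] <- dfs[[i]]
}
result <- bind_rows(collected)

# while loop: keep going until a condition is met
n <- 0
while (n < 5) {
  n <- n + 1
}
```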

Loops?

  • You may have noticed we have not written any loops, however.
  • Loops are still lurking underneath the surface, but what we will do instead is take advantage of the combination of functions and vectors (or tables), applying or mapping one to the other to generate results.
  • Speaking loosely, think of map() as a way of iterating without explicitly writing loops. You start with a vector of things. You feed it one thing at a time to some function. The function does whatever it does. You get back output that is the same length as your input, and of a specific type.
  • The reason we do this is that it leads to code we can pipeline as a sequence of function calls.

Mapping is just a kind of iteration

  • The purrr package provides a big family of mapping functions. One reason there are a lot of them is that purrr, like the rest of the tidyverse, is picky about data types.
  • So in addition to the basic map(), which always returns a list, we also have map_chr(), map_int(), map_dbl(), map_lgl() and others.
  • There are also variants for mapping over multiple inputs at once, like map2(), for when we are passing two arguments to some function, and pmap(), for when we want to pass a list of any number of arguments.
  • The first part of the name tells you how many arguments you are passing (1, 2, many). The part after the underscore tells you what type of output to expect. If there’s no suffix, you get a list back. These functions return their promised type or die trying.
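
A quick sketch of the naming scheme in action, with the type check doing its job:

```r
library(purrr)

squares <- map(1:3, \(x) x^2)          # map(): always returns a list
squares_dbl <- map_dbl(1:3, \(x) x^2)  # map_dbl(): a double vector

# The typed variants really do die trying: this would throw an error,
# because the function returns a character, not a double.
# map_dbl(1:3, \(x) as.character(x))
```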

Vectorized arithmetic again

The simplest cases are not that different from the vectorized arithmetic we’re already familiar with.

a <- c(1:10)

b <- 1

# You know what R will do here
a + b
 [1]  2  3  4  5  6  7  8  9 10 11

R’s vectorized rules add b to every element of a. The + operation can be thought of as a function that takes each element of a and does something with it. In this case, “add b”. This is already a kind of iteration: because b is shorter than a, R recycles it down the length of the vector.

Vectorized arithmetic again

We can make this explicit by writing a function:

add_b <- function(x) {
  b <- 1 # fixed
  x + b  # for any valid x
}

Vectorized arithmetic again

We can make this explicit by writing a function:

add_b <- function(x) {
  b <- 1
  x + b # for any x
}

Now:

add_b(x = a)
 [1]  2  3  4  5  6  7  8  9 10 11

Vectorized arithmetic again

Again, R’s vectorized approach means it automatically adds b to every element of the x we give it.

add_b(x = 10)
[1] 11
add_b(x = c(1, 99, 1000))
[1]    2  100 1001

Iterating in a pipeline

Some operations can’t be vectorized directly in this way. In those cases we need to iterate manually, which is what tempts us to write loops.

library(gapminder)

# Not what we want in this case
gapminder |> n_distinct()
[1] 1704

When you find yourself saying “No, I meant for each …”

gapminder |>
  summarize(country_n = n_distinct(country),
            continent_n = n_distinct(continent),
            year_n = n_distinct(year),
            lifeExp_n = n_distinct(lifeExp),
            population_n = n_distinct(population))
# A tibble: 1 × 5
  country_n continent_n year_n lifeExp_n population_n
      <int>       <int>  <int>     <int>        <int>
1       142           5     12      1626         4060

That’s tedious to write! Computers are meant to let us avoid this sort of thing.

Iterating in a pipeline

So how would we iterate this? What we want is to apply the n_distinct() function to each column of gapminder, but in a way that still allows us to use pipelines and so on. Here we leave out the names so we can see that the result is literally the same function being applied to each column.

library(gapminder)
gapminder |>
  summarize(n_distinct(country),
            n_distinct(continent),
            n_distinct(year),
            n_distinct(lifeExp),
            n_distinct(population))
# A tibble: 1 × 5
  `n_distinct(country)` `n_distinct(continent)` `n_distinct(year)`
                  <int>                   <int>              <int>
1                   142                       5                 12
# ℹ 2 more variables: `n_distinct(lifeExp)` <int>,
#   `n_distinct(population)` <int>

Iterating in a pipeline

In reality you’d use across(), like this:

gapminder |>
  summarize(across(everything(), n_distinct))
# A tibble: 1 × 6
  country continent  year lifeExp   pop gdpPercap
    <int>     <int> <int>   <int> <int>     <int>
1     142         5    12    1626  1704      1704

In this context, the function is applied across the specified columns; you’re always getting a tibble back; and the names are taken care of automatically.
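
across() also accepts tidyselect helpers, so you can target columns by type rather than naming them. A small sketch:

```r
library(dplyr)
library(gapminder)

# Count distinct values for the numeric columns only
num_distinct <- gapminder |>
  summarize(across(where(is.numeric), n_distinct))

num_distinct
```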

Iterating in a pipeline

But you could also do this …

  map(gapminder, n_distinct)
$country
[1] 142

$continent
[1] 5

$year
[1] 12

$lifeExp
[1] 1626

$pop
[1] 1704

$gdpPercap
[1] 1704
  • Read it as “Feed each column of gapminder to the n_distinct() function.”
  • (This is pretty much what across() is doing, just more nicely.)

Iterating in a pipeline

Or, in pipeline form:

gapminder |>
  map(n_distinct)
$country
[1] 142

$continent
[1] 5

$year
[1] 12

$lifeExp
[1] 1626

$pop
[1] 1704

$gdpPercap
[1] 1704

You can see we are getting a list back.

Iterating in a pipeline

Or, in pipeline form:

result <- gapminder |>
  map(n_distinct)

class(result)
[1] "list"
result$continent
[1] 5
result[[2]]
[1] 5

Iterating in a pipeline

But we know n_distinct() should always return an integer. We just want a vector of integers back, not a list of them. So we use map_int() instead of the generic map().

gapminder |>
  map_int(n_distinct)
  country continent      year   lifeExp       pop gdpPercap 
      142         5        12      1626      1704      1704 

The map() family can deal with all kinds of input types and output types. We choose some specific variant of map based on two things: what (and how many) things we are feeding the function we’re mapping, and what type of thing it should return.
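
For instance, the multi-input variants look like this (toy vectors):

```r
library(purrr)

# map2(): iterate over two vectors in parallel, one pair at a time
sums <- map2_dbl(c(1, 2, 3), c(10, 20, 30), \(x, y) x + y)

# pmap(): iterate over any number of parallel inputs, given as a list
labels <- pmap_chr(list(c("a", "b"), c(1, 2), c(10, 20)),
                   \(ltr, lo, hi) paste0(ltr, ": ", lo, "-", hi))
```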

Get a vector of filenames

filenames <- dir(path = here("files", "examples", "congress"),
                 pattern = "*.csv",
                 full.names = TRUE)

filenames[1:15] # Just displaying the first 15, to save slide space
 [1] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/01_79_congress.csv"
 [2] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/02_80_congress.csv"
 [3] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/03_81_congress.csv"
 [4] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/04_82_congress.csv"
 [5] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/05_83_congress.csv"
 [6] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/06_84_congress.csv"
 [7] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/07_85_congress.csv"
 [8] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/08_86_congress.csv"
 [9] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/09_87_congress.csv"
[10] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/10_88_congress.csv"
[11] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/11_89_congress.csv"
[12] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/12_90_congress.csv"
[13] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/13_91_congress.csv"
[14] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/14_92_congress.csv"
[15] "/Users/kjhealy/Documents/courses/mptc/files/examples/congress/15_93_congress.csv"

And feed it to read_csv()

… using map() and binding the resulting list into a tibble.

df <- filenames |>
  map(read_csv) |> 
  list_rbind(names_to = "congress") |>
  janitor::clean_names()

df
# A tibble: 20,580 × 26
   congress last   first middle suffix nickname born  death sex   position party
      <int> <chr>  <chr> <chr>  <chr>  <chr>    <chr> <chr> <chr> <chr>    <chr>
 1        1 Abern… Thom… Gerst… <NA>   <NA>     05/1… 01/2… M     U.S. Re… Demo…
 2        1 Adams  Sher… <NA>   <NA>   <NA>     01/0… 10/2… M     U.S. Re… Repu…
 3        1 Aiken  Geor… David  <NA>   <NA>     08/2… 11/1… M     U.S. Se… Repu…
 4        1 Allen  Asa   Leona… <NA>   <NA>     01/0… 01/0… M     U.S. Re… Demo…
 5        1 Allen  Leo   Elwood <NA>   <NA>     10/0… 01/1… M     U.S. Re… Repu…
 6        1 Almond J.    Linds… Jr.    <NA>     06/1… 04/1… M     U.S. Re… Demo…
 7        1 Ander… Herm… Carl   <NA>   <NA>     01/2… 07/2… M     U.S. Re… Repu…
 8        1 Ander… Clin… Presba <NA>   <NA>     10/2… 11/1… M     U.S. Re… Demo…
 9        1 Ander… John  Zuing… <NA>   <NA>     03/2… 02/0… M     U.S. Re… Repu…
10        1 Andre… Augu… Herman <NA>   <NA>     10/1… 01/1… M     U.S. Re… Repu…
# ℹ 20,570 more rows
# ℹ 15 more variables: state <chr>, district <chr>, start <chr>, end <chr>,
#   religion <chr>, race <chr>, educational_attainment <chr>, job_type1 <chr>,
#   job_type2 <chr>, job_type3 <chr>, job_type4 <chr>, job_type5 <chr>,
#   mil1 <chr>, mil2 <chr>, mil3 <chr>
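
One variant worth knowing: if you name the list before binding, list_rbind() records those names rather than integer positions. A sketch with in-memory tibbles standing in for the files (names here are illustrative):

```r
library(purrr)
library(tibble)

pieces <- list(
  congress_79 = tibble(last = c("Abernethy", "Adams")),
  congress_80 = tibble(last = "Abbitt")
)

# names_to picks up the list names instead of 1, 2, 3, ...
combined <- pieces |>
  list_rbind(names_to = "congress")
```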

read_csv() can do this directly

In fact map() is not required for this particular use:

tmp <- read_csv(filenames, id = "path",
                name_repair = janitor::make_clean_names)

tmp |>
  mutate(congress = stringr::str_extract(path, "_\\d{2,3}_congress"),
         congress = stringr::str_extract(congress, "\\d{2,3}")) |>
  relocate(congress)
# A tibble: 20,580 × 27
   congress path   last  first middle suffix nickname born  death sex   position
   <chr>    <chr>  <chr> <chr> <chr>  <chr>  <chr>    <chr> <chr> <chr> <chr>   
 1 79       /User… Aber… Thom… Gerst… <NA>   <NA>     05/1… 01/2… M     U.S. Re…
 2 79       /User… Adams Sher… <NA>   <NA>   <NA>     01/0… 10/2… M     U.S. Re…
 3 79       /User… Aiken Geor… David  <NA>   <NA>     08/2… 11/1… M     U.S. Se…
 4 79       /User… Allen Asa   Leona… <NA>   <NA>     01/0… 01/0… M     U.S. Re…
 5 79       /User… Allen Leo   Elwood <NA>   <NA>     10/0… 01/1… M     U.S. Re…
 6 79       /User… Almo… J.    Linds… Jr.    <NA>     06/1… 04/1… M     U.S. Re…
 7 79       /User… Ande… Herm… Carl   <NA>   <NA>     01/2… 07/2… M     U.S. Re…
 8 79       /User… Ande… Clin… Presba <NA>   <NA>     10/2… 11/1… M     U.S. Re…
 9 79       /User… Ande… John  Zuing… <NA>   <NA>     03/2… 02/0… M     U.S. Re…
10 79       /User… Andr… Augu… Herman <NA>   <NA>     10/1… 01/1… M     U.S. Re…
# ℹ 20,570 more rows
# ℹ 16 more variables: party <chr>, state <chr>, district <chr>, start <chr>,
#   end <chr>, religion <chr>, race <chr>, educational_attainment <chr>,
#   job_type1 <chr>, job_type2 <chr>, job_type3 <chr>, job_type4 <chr>,
#   job_type5 <chr>, mil1 <chr>, mil2 <chr>, mil3 <chr>

Wrangling Models

This is not a statistics seminar!

  • I’ll just give you an example of the sort of thing that many other modeling packages implement for all kinds of modeling techniques.
  • Again, the principle is tidy incorporation of models and their output.
  • Which means, as always: tibbles. We want a data frame of our results. A nice rectangular table that we know what to do with regardless of what its contents mean.

Tidy regression output with broom

library(broom)
library(gapminder)
out <- lm(formula = lifeExp ~ gdpPercap + log(pop) + continent,
          data = gapminder)

Tidy regression output with broom

summary(out)

Call:
lm(formula = lifeExp ~ gdpPercap + log(pop) + continent, data = gapminder)

Residuals:
    Min      1Q  Median      3Q     Max 
-47.490  -4.614   0.250   5.293  26.094 

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)    
(Intercept)       3.816e+01  2.050e+00  18.618  < 2e-16 ***
gdpPercap         4.557e-04  2.345e-05  19.435  < 2e-16 ***
log(pop)          6.394e-01  1.329e-01   4.810 1.64e-06 ***
continentAmericas 1.308e+01  6.063e-01  21.579  < 2e-16 ***
continentAsia     7.784e+00  5.810e-01  13.398  < 2e-16 ***
continentEurope   1.695e+01  6.350e-01  26.691  < 2e-16 ***
continentOceania  1.764e+01  1.779e+00   9.916  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 8.336 on 1697 degrees of freedom
Multiple R-squared:  0.585, Adjusted R-squared:  0.5835 
F-statistic: 398.7 on 6 and 1697 DF,  p-value: < 2.2e-16

We can’t do anything with this output programmatically. A summary like this is essentially a printout, once again a legacy of the teletype.

Look inside the box

Poke around inside

What about the object created by summary()?

Use the Object Inspector to take a look

There’s a bunch of different stuff in here. It’s a little awkward to work with directly all the time.

Tidy regression output with broom

library(broom)
tidy(out)
# A tibble: 7 × 5
  term               estimate std.error statistic   p.value
  <chr>                 <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)       38.2      2.05          18.6  1.50e- 70
2 gdpPercap          0.000456 0.0000234     19.4  3.98e- 76
3 log(pop)           0.639    0.133          4.81 1.64e-  6
4 continentAmericas 13.1      0.606         21.6  1.85e- 91
5 continentAsia      7.78     0.581         13.4  5.52e- 39
6 continentEurope   16.9      0.635         26.7  2.43e-131
7 continentOceania  17.6      1.78           9.92 1.43e- 22

That’s a lot nicer. Now it’s just a tibble. We know those. This is only a small piece of what’s in the summary object, but we’ll get to the rest in a minute.

Tidy regression output with broom

tidy() has some optional arguments.

out_conf <- tidy(out, conf.int = TRUE)
out_conf
# A tibble: 7 × 7
  term               estimate std.error statistic   p.value  conf.low conf.high
  <chr>                 <dbl>     <dbl>     <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)       38.2      2.05          18.6  1.50e- 70 34.1      42.2     
2 gdpPercap          0.000456 0.0000234     19.4  3.98e- 76  0.000410  0.000502
3 log(pop)           0.639    0.133          4.81 1.64e-  6  0.379     0.900   
4 continentAmericas 13.1      0.606         21.6  1.85e- 91 11.9      14.3     
5 continentAsia      7.78     0.581         13.4  5.52e- 39  6.64      8.92    
6 continentEurope   16.9      0.635         26.7  2.43e-131 15.7      18.2     
7 continentOceania  17.6      1.78           9.92 1.43e- 22 14.2      21.1     

Tidy regression output with broom

out_conf |>
    filter(term %nin% "(Intercept)") |>
    mutate(nicelabs = prefix_strip(term, "continent")) |>
    select(nicelabs, everything())
# A tibble: 6 × 8
  nicelabs  term       estimate std.error statistic   p.value conf.low conf.high
  <chr>     <chr>         <dbl>     <dbl>     <dbl>     <dbl>    <dbl>     <dbl>
1 gdpPercap gdpPercap   4.56e-4 0.0000234     19.4  3.98e- 76  4.10e-4  0.000502
2 log(Pop)  log(pop)    6.39e-1 0.133          4.81 1.64e-  6  3.79e-1  0.900   
3 Americas  continent…  1.31e+1 0.606         21.6  1.85e- 91  1.19e+1 14.3     
4 Asia      continent…  7.78e+0 0.581         13.4  5.52e- 39  6.64e+0  8.92    
5 Europe    continent…  1.69e+1 0.635         26.7  2.43e-131  1.57e+1 18.2     
6 Oceania   continent…  1.76e+1 1.78           9.92 1.43e- 22  1.42e+1 21.1     
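
Because this is just a tibble, it drops straight into ggplot. A sketch of the usual dot-and-whisker coefficient plot, rebuilding out_conf so the block stands on its own:

```r
library(ggplot2)
library(broom)
library(gapminder)

out <- lm(lifeExp ~ gdpPercap + log(pop) + continent, data = gapminder)
out_conf <- tidy(out, conf.int = TRUE)

# Point estimates with confidence-interval whiskers, one row per term
p <- out_conf |>
  dplyr::filter(term != "(Intercept)") |>
  ggplot(mapping = aes(x = estimate, y = term,
                       xmin = conf.low, xmax = conf.high)) +
  geom_pointrange()
```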

Three ways to tidy

  • Component level: tidy()

Three ways to tidy

  • Component level: tidy()
  • Observation level: augment()

Three ways to tidy

  • Component level: tidy()
  • Observation level: augment()
  • Model level: glance()

Model output again

> summary(out)
Call:
lm(formula = lifeExp ~ gdpPercap + log(pop) + continent, data = gapminder)

Residuals:
    Min      1Q  Median      3Q     Max
-47.490  -4.614   0.250   5.293  26.094
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       3.816e+01  2.050e+00  18.618  < 2e-16 ***
gdpPercap         4.557e-04  2.345e-05  19.435  < 2e-16 ***
log(pop)          6.394e-01  1.329e-01   4.810 1.64e-06 ***
continentAmericas 1.308e+01  6.063e-01  21.579  < 2e-16 ***
continentAsia     7.784e+00  5.810e-01  13.398  < 2e-16 ***
continentEurope   1.695e+01  6.350e-01  26.691  < 2e-16 ***
continentOceania  1.764e+01  1.779e+00   9.916  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.336 on 1697 degrees of freedom
Multiple R-squared:  0.585, Adjusted R-squared:  0.5835
F-statistic: 398.7 on 6 and 1697 DF,  p-value: < 2.2e-16

Component level

> summary(out)
Call:
lm(formula = lifeExp ~ gdpPercap + log(pop) + continent, data = gapminder)

Residuals:
    Min      1Q  Median      3Q     Max
-47.490  -4.614   0.250   5.293  26.094
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       3.816e+01  2.050e+00  18.618  < 2e-16 ***
gdpPercap         4.557e-04  2.345e-05  19.435  < 2e-16 ***
log(pop)          6.394e-01  1.329e-01   4.810 1.64e-06 ***
continentAmericas 1.308e+01  6.063e-01  21.579  < 2e-16 ***
continentAsia     7.784e+00  5.810e-01  13.398  < 2e-16 ***
continentEurope   1.695e+01  6.350e-01  26.691  < 2e-16 ***
continentOceania  1.764e+01  1.779e+00   9.916  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.336 on 1697 degrees of freedom
Multiple R-squared:  0.585, Adjusted R-squared:  0.5835
F-statistic: 398.7 on 6 and 1697 DF,  p-value: < 2.2e-16

Component level

tidy(out)
# A tibble: 7 × 5
  term               estimate std.error statistic   p.value
  <chr>                 <dbl>     <dbl>     <dbl>     <dbl>
1 (Intercept)       38.2      2.05          18.6  1.50e- 70
2 gdpPercap          0.000456 0.0000234     19.4  3.98e- 76
3 log(pop)           0.639    0.133          4.81 1.64e-  6
4 continentAmericas 13.1      0.606         21.6  1.85e- 91
5 continentAsia      7.78     0.581         13.4  5.52e- 39
6 continentEurope   16.9      0.635         26.7  2.43e-131
7 continentOceania  17.6      1.78           9.92 1.43e- 22

Observation level

> summary(out)
Call:
lm(formula = lifeExp ~ gdpPercap + log(pop) + continent, data = gapminder)
Residuals:
    Min      1Q  Median      3Q     Max
-47.490  -4.614   0.250   5.293  26.094
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       3.816e+01  2.050e+00  18.618  < 2e-16 ***
gdpPercap         4.557e-04  2.345e-05  19.435  < 2e-16 ***
log(pop)          6.394e-01  1.329e-01   4.810 1.64e-06 ***
continentAmericas 1.308e+01  6.063e-01  21.579  < 2e-16 ***
continentAsia     7.784e+00  5.810e-01  13.398  < 2e-16 ***
continentEurope   1.695e+01  6.350e-01  26.691  < 2e-16 ***
continentOceania  1.764e+01  1.779e+00   9.916  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.336 on 1697 degrees of freedom
Multiple R-squared:  0.585, Adjusted R-squared:  0.5835
F-statistic: 398.7 on 6 and 1697 DF,  p-value: < 2.2e-16

Observation level

augment(out)
# A tibble: 1,704 × 10
   lifeExp gdpPercap `log(pop)` continent .fitted .resid    .hat .sigma .cooksd
     <dbl>     <dbl>      <dbl> <fct>       <dbl>  <dbl>   <dbl>  <dbl>   <dbl>
 1    28.8      779.       15.9 Asia         56.5  -27.7 0.00302   8.31 0.00479
 2    30.3      821.       16.0 Asia         56.6  -26.2 0.00299   8.31 0.00426
 3    32.0      853.       16.1 Asia         56.7  -24.7 0.00296   8.32 0.00372
 4    34.0      836.       16.3 Asia         56.7  -22.7 0.00294   8.32 0.00313
 5    36.1      740.       16.4 Asia         56.8  -20.7 0.00294   8.32 0.00259
 6    38.4      786.       16.5 Asia         56.9  -18.4 0.00292   8.33 0.00205
 7    39.9      978.       16.4 Asia         56.9  -17.0 0.00291   8.33 0.00174
 8    40.8      852.       16.4 Asia         56.9  -16.0 0.00292   8.33 0.00155
 9    41.7      649.       16.6 Asia         56.9  -15.2 0.00294   8.33 0.00140
10    41.8      635.       16.9 Asia         57.1  -15.3 0.00297   8.33 0.00144
# ℹ 1,694 more rows
# ℹ 1 more variable: .std.resid <dbl>

Observation level

  • For OLS models:

  • .fitted — The fitted values of the model.

  • .se.fit — The standard errors of the fitted values.

  • .resid — The residuals.

  • .hat — The diagonal of the hat matrix.

  • .sigma — An estimate of the residual standard deviation when the corresponding observation is dropped from the model.

  • .cooksd — Cook’s distance, a common regression diagnostic.

  • .std.resid — The standardized residuals.
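
These columns put diagnostics one filter() away. For example, pulling the most influential observations by Cook’s distance (a sketch):

```r
library(broom)
library(dplyr)
library(gapminder)

out <- lm(lifeExp ~ gdpPercap + log(pop) + continent, data = gapminder)

# The three observations with the largest Cook's distance
influential <- augment(out, data = gapminder) |>
  slice_max(.cooksd, n = 3) |>
  select(country, year, .fitted, .resid, .cooksd)
```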

Observation level

# Adding the data argument puts back any additional columns from the original
# tibble
out_aug <-  augment(out, data = gapminder)
head(out_aug)
# A tibble: 6 × 12
  country continent  year lifeExp    pop gdpPercap .fitted .resid    .hat .sigma
  <fct>   <fct>     <int>   <dbl>  <int>     <dbl>   <dbl>  <dbl>   <dbl>  <dbl>
1 Afghan… Asia       1952    28.8 8.43e6      779.    56.5  -27.7 0.00302   8.31
2 Afghan… Asia       1957    30.3 9.24e6      821.    56.6  -26.2 0.00299   8.31
3 Afghan… Asia       1962    32.0 1.03e7      853.    56.7  -24.7 0.00296   8.32
4 Afghan… Asia       1967    34.0 1.15e7      836.    56.7  -22.7 0.00294   8.32
5 Afghan… Asia       1972    36.1 1.31e7      740.    56.8  -20.7 0.00294   8.32
6 Afghan… Asia       1977    38.4 1.49e7      786.    56.9  -18.4 0.00292   8.33
# ℹ 2 more variables: .cooksd <dbl>, .std.resid <dbl>
## Residuals vs Fitted Values
p <- ggplot(data = out_aug,
            mapping = aes(x = .fitted, y = .resid))
p_out <- p + geom_point()
p_out

(I told you it was misspecified)

Model level

> summary(out)
Call:
lm(formula = lifeExp ~ gdpPercap + log(pop) + continent, data = gapminder)

Residuals:
    Min      1Q  Median      3Q     Max
-47.490  -4.614   0.250   5.293  26.094
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)
(Intercept)       3.816e+01  2.050e+00  18.618  < 2e-16 ***
gdpPercap         4.557e-04  2.345e-05  19.435  < 2e-16 ***
log(pop)          6.394e-01  1.329e-01   4.810 1.64e-06 ***
continentAmericas 1.308e+01  6.063e-01  21.579  < 2e-16 ***
continentAsia     7.784e+00  5.810e-01  13.398  < 2e-16 ***
continentEurope   1.695e+01  6.350e-01  26.691  < 2e-16 ***
continentOceania  1.764e+01  1.779e+00   9.916  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 8.336 on 1697 degrees of freedom
Multiple R-squared:  0.585, Adjusted R-squared:  0.5835
F-statistic: 398.7 on 6 and 1697 DF,  p-value: < 2.2e-16

Model level

glance(out)
# A tibble: 1 × 12
  r.squared adj.r.squared sigma statistic   p.value    df logLik    AIC    BIC
      <dbl>         <dbl> <dbl>     <dbl>     <dbl> <dbl>  <dbl>  <dbl>  <dbl>
1     0.585         0.584  8.34      399. 1.01e-319     6 -6028. 12072. 12115.
# ℹ 3 more variables: deviance <dbl>, df.residual <int>, nobs <int>

The usefulness of glance() becomes clearer when dealing with ensembles of models.
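
A sketch of that use: put several candidate models in a list, glance() each one, and bind the results into a single table of fit statistics.

```r
library(broom)
library(purrr)
library(gapminder)

# Two candidate specifications, one row of fit statistics each
models <- list(
  linear = lm(lifeExp ~ gdpPercap, data = gapminder),
  logged = lm(lifeExp ~ log(gdpPercap), data = gapminder)
)

fits <- models |>
  map(glance) |>
  list_rbind(names_to = "model")

fits |> dplyr::select(model, r.squared, AIC)
```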

Grouped Analysis

Grouped analysis and list columns

European countries in 1977:

gapminder |>
    filter(continent == "Europe", year == 1977)
# A tibble: 30 × 6
   country                continent  year lifeExp      pop gdpPercap
   <fct>                  <fct>     <int>   <dbl>    <int>     <dbl>
 1 Albania                Europe     1977    68.9  2509048     3533.
 2 Austria                Europe     1977    72.2  7568430    19749.
 3 Belgium                Europe     1977    72.8  9821800    19118.
 4 Bosnia and Herzegovina Europe     1977    69.9  4086000     3528.
 5 Bulgaria               Europe     1977    70.8  8797022     7612.
 6 Croatia                Europe     1977    70.6  4318673    11305.
 7 Czech Republic         Europe     1977    70.7 10161915    14800.
 8 Denmark                Europe     1977    74.7  5088419    20423.
 9 Finland                Europe     1977    72.5  4738902    15605.
10 France                 Europe     1977    73.8 53165019    18293.
# ℹ 20 more rows

Let’s fit a simple model predicting life expectancy from GDP per capita for just these countries.

Grouped analysis and list columns

eu77 <- gapminder |> filter(continent == "Europe", year == 1977)
fit <- lm(lifeExp ~ log(gdpPercap), data = eu77)
summary(fit)

Call:
lm(formula = lifeExp ~ log(gdpPercap), data = eu77)

Residuals:
    Min      1Q  Median      3Q     Max 
-7.4956 -1.0306  0.0935  1.1755  3.7125 

Coefficients:
               Estimate Std. Error t value Pr(>|t|)    
(Intercept)      29.489      7.161   4.118 0.000306 ***
log(gdpPercap)    4.488      0.756   5.936 2.17e-06 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.114 on 28 degrees of freedom
Multiple R-squared:  0.5572,    Adjusted R-squared:  0.5414 
F-statistic: 35.24 on 1 and 28 DF,  p-value: 2.173e-06

What if we want to do this for every continent in every year?

Grouped analysis and list columns

df_nest <- gapminder |>
    group_by(continent, year) |>
    nest()

df_nest
# A tibble: 60 × 3
# Groups:   continent, year [60]
   continent  year data             
   <fct>     <int> <list>           
 1 Asia       1952 <tibble [33 × 4]>
 2 Asia       1957 <tibble [33 × 4]>
 3 Asia       1962 <tibble [33 × 4]>
 4 Asia       1967 <tibble [33 × 4]>
 5 Asia       1972 <tibble [33 × 4]>
 6 Asia       1977 <tibble [33 × 4]>
 7 Asia       1982 <tibble [33 × 4]>
 8 Asia       1987 <tibble [33 × 4]>
 9 Asia       1992 <tibble [33 × 4]>
10 Asia       1997 <tibble [33 × 4]>
# ℹ 50 more rows

Think of nesting as a kind of “super-grouping”. Look in the object inspector.

Grouped analysis and list columns

It’s still in there.

df_nest |> filter(continent == "Europe" & year == 1977) |>
    unnest(cols = c(data))
# A tibble: 30 × 6
# Groups:   continent, year [1]
   continent  year country                lifeExp      pop gdpPercap
   <fct>     <int> <fct>                    <dbl>    <int>     <dbl>
 1 Europe     1977 Albania                   68.9  2509048     3533.
 2 Europe     1977 Austria                   72.2  7568430    19749.
 3 Europe     1977 Belgium                   72.8  9821800    19118.
 4 Europe     1977 Bosnia and Herzegovina    69.9  4086000     3528.
 5 Europe     1977 Bulgaria                  70.8  8797022     7612.
 6 Europe     1977 Croatia                   70.6  4318673    11305.
 7 Europe     1977 Czech Republic            70.7 10161915    14800.
 8 Europe     1977 Denmark                   74.7  5088419    20423.
 9 Europe     1977 Finland                   72.5  4738902    15605.
10 Europe     1977 France                    73.8 53165019    18293.
# ℹ 20 more rows

Grouped analysis and list columns

Here we map() a custom function over each element of the data column, i.e. over each group’s nested tibble.

fit_ols <- function(df) {
    lm(lifeExp ~ log(gdpPercap), data = df)
}

df_nest <- gapminder |>
    group_by(continent, year) |>
    nest() |>
    mutate(model = map(data, fit_ols)) 

Grouped analysis and list columns

df_nest
# A tibble: 60 × 4
# Groups:   continent, year [60]
   continent  year data              model 
   <fct>     <int> <list>            <list>
 1 Asia       1952 <tibble [33 × 4]> <lm>  
 2 Asia       1957 <tibble [33 × 4]> <lm>  
 3 Asia       1962 <tibble [33 × 4]> <lm>  
 4 Asia       1967 <tibble [33 × 4]> <lm>  
 5 Asia       1972 <tibble [33 × 4]> <lm>  
 6 Asia       1977 <tibble [33 × 4]> <lm>  
 7 Asia       1982 <tibble [33 × 4]> <lm>  
 8 Asia       1987 <tibble [33 × 4]> <lm>  
 9 Asia       1992 <tibble [33 × 4]> <lm>  
10 Asia       1997 <tibble [33 × 4]> <lm>  
# ℹ 50 more rows

Grouped analysis and list columns

We can tidy the nested models, too.

fit_ols <- function(df) {
    lm(lifeExp ~ log(gdpPercap), data = df)
}

out_tidy <- gapminder |>
    group_by(continent, year) |>
    nest() |>
    mutate(model = map(data, fit_ols),
           tidied = map(model, tidy)) |>
    unnest(cols = c(tidied)) |>
    filter(term %nin% "(Intercept)" &
           continent %nin% "Oceania")

Grouped analysis and list columns

out_tidy
# A tibble: 48 × 9
# Groups:   continent, year [48]
   continent  year data     model  term     estimate std.error statistic p.value
   <fct>     <int> <list>   <list> <chr>       <dbl>     <dbl>     <dbl>   <dbl>
 1 Asia       1952 <tibble> <lm>   log(gdp…     4.16     1.25       3.33 2.28e-3
 2 Asia       1957 <tibble> <lm>   log(gdp…     4.17     1.28       3.26 2.71e-3
 3 Asia       1962 <tibble> <lm>   log(gdp…     4.59     1.24       3.72 7.94e-4
 4 Asia       1967 <tibble> <lm>   log(gdp…     4.50     1.15       3.90 4.77e-4
 5 Asia       1972 <tibble> <lm>   log(gdp…     4.44     1.01       4.41 1.16e-4
 6 Asia       1977 <tibble> <lm>   log(gdp…     4.87     1.03       4.75 4.42e-5
 7 Asia       1982 <tibble> <lm>   log(gdp…     4.78     0.852      5.61 3.77e-6
 8 Asia       1987 <tibble> <lm>   log(gdp…     5.17     0.727      7.12 5.31e-8
 9 Asia       1992 <tibble> <lm>   log(gdp…     5.09     0.649      7.84 7.60e-9
10 Asia       1997 <tibble> <lm>   log(gdp…     5.11     0.628      8.15 3.35e-9
# ℹ 38 more rows

Grouped analysis and list columns

out_tidy |>
    ungroup() |>
    sample_n(5)
# A tibble: 5 × 9
  continent  year data     model  term      estimate std.error statistic p.value
  <fct>     <int> <list>   <list> <chr>        <dbl>     <dbl>     <dbl>   <dbl>
1 Americas   2007 <tibble> <lm>   log(gdpP…     4.49     0.752      5.98 4.26e-6
2 Africa     1997 <tibble> <lm>   log(gdpP…     6.89     1.04       6.64 2.16e-8
3 Americas   1987 <tibble> <lm>   log(gdpP…     7.10     1.14       6.25 2.22e-6
4 Europe     1992 <tibble> <lm>   log(gdpP…     3.48     0.545      6.39 6.44e-7
5 Americas   1957 <tibble> <lm>   log(gdpP…    10.3      2.40       4.31 2.61e-4
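
With the coefficients in a tidy table, plotting the slope estimates over time takes one ggplot call. A sketch that rebuilds out_tidy from scratch so it runs on its own; the plus-or-minus two standard error interval is an illustrative choice, not part of the original chunk:

```r
library(gapminder)  # assumed source of the gapminder data
library(tidyverse)
library(broom)
library(socviz)     # for %nin%

fit_ols <- function(df) {
    lm(lifeExp ~ log(gdpPercap), data = df)
}

out_tidy <- gapminder |>
    group_by(continent, year) |>
    nest() |>
    mutate(model = map(data, fit_ols),
           tidied = map(model, tidy)) |>
    unnest(cols = c(tidied)) |>
    filter(term %nin% "(Intercept)" &
           continent %nin% "Oceania")

# Point-range plot of the yearly slope estimates by continent
p <- out_tidy |>
    ggplot(aes(x = year, y = estimate,
               ymin = estimate - 2 * std.error,
               ymax = estimate + 2 * std.error,
               group = continent, color = continent)) +
    geom_pointrange(position = position_dodge(width = 2)) +
    labs(x = "Year", y = "Estimate", color = "Continent",
         title = "Estimated effect of log GDP per capita on life expectancy")

p
```

Each point-range is one of the 48 models fit above, so the plot summarizes the whole split-apply-combine run at a glance.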

Join us now for the
Crossover Event
of the Year

ATT estimates

library(WeightIt)
library(MatchIt)
data("lalonde", package = "MatchIt")
lalonde
        treat age educ   race married nodegree    re74    re75    re78
NSW1        1  37   11  black       1        1     0.0     0.0  9930.0
NSW2        1  22    9 hispan       0        1     0.0     0.0  3595.9
NSW3        1  30   12  black       0        0     0.0     0.0 24909.5
NSW4        1  27   11  black       0        1     0.0     0.0  7506.1
NSW5        1  33    8  black       0        1     0.0     0.0   289.8
NSW6        1  22    9  black       0        1     0.0     0.0  4056.5
NSW7        1  23   12  black       0        0     0.0     0.0     0.0
NSW8        1  32   11  black       0        1     0.0     0.0  8472.2
NSW9        1  22   16  black       0        0     0.0     0.0  2164.0
NSW10       1  33   12  white       1        0     0.0     0.0 12418.1
NSW11       1  19    9  black       0        1     0.0     0.0  8173.9
NSW12       1  21   13  black       0        0     0.0     0.0 17094.6
NSW13       1  18    8  black       0        1     0.0     0.0     0.0
NSW14       1  27   10  black       1        1     0.0     0.0 18739.9
NSW15       1  17    7  black       0        1     0.0     0.0  3023.9
NSW16       1  19   10  black       0        1     0.0     0.0  3228.5
NSW17       1  27   13  black       0        0     0.0     0.0 14581.9
NSW18       1  23   10  black       0        1     0.0     0.0  7693.4
NSW19       1  40   12  black       0        0     0.0     0.0 10804.3
NSW20       1  26   12  black       0        0     0.0     0.0 10747.4
NSW21       1  23   11  black       0        1     0.0     0.0     0.0
NSW22       1  41   14  white       0        0     0.0     0.0  5149.5
NSW23       1  38    9  white       0        1     0.0     0.0  6408.9
NSW24       1  24   11  black       0        1     0.0     0.0  1991.4
NSW25       1  18   10  black       0        1     0.0     0.0 11163.2
NSW26       1  29   11  black       1        1     0.0     0.0  9643.0
NSW27       1  25   11  black       0        1     0.0     0.0  9897.0
NSW28       1  27   10 hispan       0        1     0.0     0.0 11142.9
NSW29       1  17   10  black       0        1     0.0     0.0 16218.0
NSW30       1  24   11  black       0        1     0.0     0.0   995.7
NSW31       1  17   10  black       0        1     0.0     0.0     0.0
NSW32       1  48    4  black       0        1     0.0     0.0  6551.6
NSW33       1  25   11  black       1        1     0.0     0.0  1574.4
NSW34       1  20   12  black       0        0     0.0     0.0     0.0
NSW35       1  25   12  black       0        0     0.0     0.0  3191.8
NSW36       1  42   14  black       0        0     0.0     0.0 20505.9
NSW37       1  25    5  black       0        1     0.0     0.0  6181.9
NSW38       1  23   12  black       1        0     0.0     0.0  5911.6
NSW39       1  46    8  black       1        1     0.0     0.0  3094.2
NSW40       1  24   10  black       0        1     0.0     0.0     0.0
NSW41       1  21   12  black       0        0     0.0     0.0  1254.6
NSW42       1  19    9  white       0        1     0.0     0.0 13188.8
NSW43       1  17    8  black       0        1     0.0     0.0  8061.5
NSW44       1  18    8 hispan       1        1     0.0     0.0  2788.0
NSW45       1  20   11  black       0        1     0.0     0.0  3972.5
NSW46       1  25   11  black       1        1     0.0     0.0     0.0
NSW47       1  17    8  black       0        1     0.0     0.0     0.0
NSW48       1  17    9  black       0        1     0.0     0.0     0.0
NSW49       1  25    5  black       0        1     0.0     0.0 12187.4
NSW50       1  23   12  black       0        0     0.0     0.0  4843.2
NSW51       1  28    8  black       0        1     0.0     0.0     0.0
NSW52       1  31   11  black       1        1     0.0     0.0  8087.5
NSW53       1  18   11  black       0        1     0.0     0.0     0.0
NSW54       1  25   12  black       0        0     0.0     0.0  2349.0
NSW55       1  30   11  black       1        1     0.0     0.0   590.8
NSW56       1  17   10  black       0        1     0.0     0.0     0.0
NSW57       1  37    9  black       0        1     0.0     0.0  1067.5
NSW58       1  41    4  black       1        1     0.0     0.0  7285.0
NSW59       1  42   14  black       1        0     0.0     0.0 13167.5
NSW60       1  22   11  white       0        1     0.0     0.0  1048.4
NSW61       1  17    8  black       0        1     0.0     0.0     0.0
NSW62       1  29    8  black       0        1     0.0     0.0  1923.9
NSW63       1  35   10  black       0        1     0.0     0.0  4666.2
NSW64       1  27   11  black       0        1     0.0     0.0   549.3
NSW65       1  29    4  black       0        1     0.0     0.0   762.9
NSW66       1  28    9  black       0        1     0.0     0.0 10694.3
NSW67       1  27   11  black       0        1     0.0     0.0     0.0
NSW68       1  23    7  white       0        1     0.0     0.0     0.0
NSW69       1  45    5  black       1        1     0.0     0.0  8546.7
NSW70       1  29   13  black       0        0     0.0     0.0  7479.7
NSW71       1  27    9  black       0        1     0.0     0.0     0.0
NSW72       1  46   13  black       0        0     0.0     0.0   647.2
NSW73       1  18    6  black       0        1     0.0     0.0     0.0
NSW74       1  25   12  black       0        0     0.0     0.0 11965.8
NSW75       1  28   15  black       0        0     0.0     0.0  9598.5
NSW76       1  25   11  white       0        1     0.0     0.0 18783.3
NSW77       1  22   12  black       0        0     0.0     0.0 18678.1
NSW78       1  21    9  black       0        1     0.0     0.0     0.0
NSW79       1  40   11  black       0        1     0.0     0.0 23005.6
NSW80       1  22   11  black       0        1     0.0     0.0  6456.7
NSW81       1  25   12  black       0        0     0.0     0.0     0.0
NSW82       1  18   12  black       0        0     0.0     0.0  2321.1
NSW83       1  38   12  white       0        0     0.0     0.0  4941.8
NSW84       1  27   13  black       0        0     0.0     0.0     0.0
NSW85       1  27    8  black       0        1     0.0     0.0     0.0
NSW86       1  38   11  black       0        1     0.0     0.0     0.0
NSW87       1  23    8 hispan       0        1     0.0     0.0  3881.3
NSW88       1  26   11  black       0        1     0.0     0.0 17231.0
NSW89       1  21   12  white       0        0     0.0     0.0  8048.6
NSW90       1  25    8  black       0        1     0.0     0.0     0.0
NSW91       1  31   11  black       1        1     0.0     0.0 14509.9
NSW92       1  17   10  black       0        1     0.0     0.0     0.0
NSW93       1  25   11  black       0        1     0.0     0.0     0.0
NSW94       1  21   12  black       0        0     0.0     0.0  9983.8
NSW95       1  44   11  black       0        1     0.0     0.0     0.0
NSW96       1  25   12  white       0        0     0.0     0.0  5587.5
NSW97       1  18    9  black       0        1     0.0     0.0  4482.8
NSW98       1  42   12  black       0        0     0.0     0.0  2456.2
NSW99       1  25   10  black       0        1     0.0     0.0     0.0
NSW100      1  31    9 hispan       0        1     0.0     0.0 26817.6
NSW101      1  24   10  black       0        1     0.0     0.0     0.0
NSW102      1  26   10  black       0        1     0.0     0.0  9265.8
NSW103      1  25   11  black       0        1     0.0     0.0   485.2
NSW104      1  18   11  black       0        1     0.0     0.0  4814.6
NSW105      1  19   11  black       0        1     0.0     0.0  7458.1
NSW106      1  43    9  black       0        1     0.0     0.0     0.0
NSW107      1  27   13  black       0        0     0.0     0.0 34099.3
NSW108      1  17    9  black       0        1     0.0     0.0  1953.3
NSW109      1  30   11  black       0        1     0.0     0.0     0.0
NSW110      1  26   10  black       1        1  2028.0     0.0     0.0
NSW111      1  20    9  black       0        1  6084.0     0.0  8881.7
NSW112      1  17    9 hispan       0        1   445.2    74.3  6210.7
NSW113      1  20   12  black       0        0   989.3   165.2     0.0
NSW114      1  18   11  black       0        1   858.3   214.6   929.9
NSW115      1  27   12  black       1        0  3670.9   334.0     0.0
NSW116      1  21   12  white       0        0  3670.9   334.0 12558.0
NSW117      1  27   12  black       0        0  2143.4   357.9 22163.2
NSW118      1  20   12  black       0        0     0.0   377.6  1652.6
NSW119      1  19   10  black       0        1     0.0   385.3  8124.7
NSW120      1  23   12  black       0        0  5506.3   501.1   671.3
NSW121      1  29   14  black       0        0     0.0   679.7 17815.0
NSW122      1  18   10  black       0        1     0.0   798.9  9737.2
NSW123      1  19    9  black       0        1     0.0   798.9 17685.2
NSW124      1  27   13  white       1        0  9381.6   853.7     0.0
NSW125      1  18   11  white       0        1  3678.2   919.6  4321.7
NSW126      1  27    9  black       1        1     0.0   934.4  1773.4
NSW127      1  22   12  black       0        0  5605.9   936.2     0.0
NSW128      1  23   10  black       1        1     0.0   936.4 11233.3
NSW129      1  23   12 hispan       0        0  9385.7  1117.4   559.4
NSW130      1  20   11  black       0        1  3637.5  1220.8  1085.4
NSW131      1  17    9  black       0        1  1716.5  1253.4  5445.2
NSW132      1  28   11  black       0        1     0.0  1284.1 60307.9
NSW133      1  26   11  black       1        1     0.0  1392.9  1460.4
NSW134      1  20   11  black       0        1 16318.6  1485.0  6943.3
NSW135      1  24   11  black       1        1   824.4  1666.1  4032.7
NSW136      1  31    9  black       0        1     0.0  1698.6 10363.3
NSW137      1  23    8  white       1        1     0.0  1713.2  4232.3
NSW138      1  18   10  black       0        1  2143.4  1784.3 11141.4
NSW139      1  29   12  black       0        0 10881.9  1817.3     0.0
NSW140      1  26   11  white       0        1     0.0  2226.3 13385.9
NSW141      1  24    9  black       0        1  9154.7  2288.7  4849.6
NSW142      1  25   12  black       0        0 14426.8  2409.3     0.0
NSW143      1  24   10  black       0        1  4250.4  2421.9  1660.5
NSW144      1  46    8  black       0        1  3165.7  2594.7     0.0
NSW145      1  31   12  white       0        0     0.0  2611.2  2484.5
NSW146      1  19   11  black       0        1  2305.0  2615.3  4146.6
NSW147      1  19    8  black       0        1     0.0  2657.1  9970.7
NSW148      1  27   11  black       0        1  2206.9  2666.3     0.0
NSW149      1  26   11  black       1        1     0.0  2754.6 26372.3
NSW150      1  20   10  black       0        1  5005.7  2777.4  5615.2
NSW151      1  28   10  black       0        1     0.0  2836.5  3196.6
NSW152      1  24   12  black       0        0 13765.8  2842.8  6167.7
NSW153      1  19    8  black       0        1  2636.4  2937.3  7535.9
NSW154      1  23   12  black       0        0  6269.3  3040.0  8484.2
NSW155      1  42    9  black       1        1     0.0  3058.5  1294.4
NSW156      1  25   13  black       0        0 12362.9  3090.7     0.0
NSW157      1  18    9  black       0        1     0.0  3287.4  5010.3
NSW158      1  21   12  black       0        0  6473.7  3332.4  9371.0
NSW159      1  27   10  black       0        1  1001.1  3550.1     0.0
NSW160      1  21    8  black       0        1   989.3  3695.9  4279.6
NSW161      1  22    9  black       0        1  2192.9  3837.0  3462.6
NSW162      1  31    4  black       0        1  8517.6  4023.2  7382.5
NSW163      1  24   10  black       1        1 11703.2  4078.2     0.0
NSW164      1  29   10  black       0        1     0.0  4398.9     0.0
NSW165      1  29   12  black       0        0  9748.4  4878.9 10976.5
NSW166      1  19   10  white       0        1     0.0  5324.1 13829.6
NSW167      1  19   11 hispan       1        1  5424.5  5463.8  6788.5
NSW168      1  31    9  black       0        1 10717.0  5517.8  9558.5
NSW169      1  22   10  black       1        1  1468.3  5588.7 13228.3
NSW170      1  21    9  black       0        1  6416.5  5749.3   743.7
NSW171      1  17   10  black       0        1  1291.5  5793.9  5522.8
NSW172      1  26   12  black       1        0  8408.8  5794.8  1424.9
NSW173      1  20    9 hispan       0        1 12260.8  5875.0  1358.6
NSW174      1  19   10  black       0        1  4121.9  6056.8     0.0
NSW175      1  26   10  black       0        1 25929.7  6789.0   672.9
NSW176      1  28   11  black       0        1  1929.0  6871.9     0.0
NSW177      1  22   12 hispan       1        0   492.2  7055.7 10092.8
NSW178      1  33   11  black       0        1     0.0  7867.9  6281.4
NSW179      1  22   12  white       0        0  6760.0  8455.5 12590.7
NSW180      1  29   10 hispan       0        1     0.0  8853.7  5112.0
NSW181      1  33   12  black       1        0 20280.0 10941.4 15952.6
NSW182      1  25   14  black       1        0 35040.1 11536.6 36646.9
NSW183      1  35    9  black       1        1 13602.4 13830.6 12804.0
NSW184      1  35    8  black       1        1 13732.1 17976.2  3786.6
NSW185      1  33   11  black       1        1 14660.7 25142.2  4181.9
PSID1       0  30   12  white       1        0 20166.7 18347.2 25564.7
PSID2       0  26   12  white       1        0 25862.3 17806.5 25564.7
PSID3       0  25   16  white       1        0 25862.3 15316.2 25564.7
PSID4       0  42   11  white       1        1 21787.0 14265.3 15491.0
PSID5       0  25    9  black       1        1 14829.7 13776.5     0.0
PSID6       0  37    9  black       1        1 13685.5 12756.0 17833.2
PSID7       0  32   12  white       1        0 19067.6 12625.4 14146.3
PSID8       0  20   12  black       0        0  7392.3 12396.2 17765.2
PSID9       0  38    9 hispan       1        1 16826.2 12029.2     0.0
PSID10      0  39   10  white       1        1 16767.4 12022.0  4433.2
PSID11      0  41    5  white       1        1 10785.8 11991.6 19451.3
PSID12      0  31   14  white       1        0 17831.3 11563.7 22095.0
PSID13      0  34    8  white       1        1  8038.9 11404.4  5486.8
PSID14      0  29   12  white       1        0 14769.0 11146.5  6420.7
PSID15      0  22   14  black       1        0   748.4 11105.4 18208.5
PSID16      0  42    0 hispan       1        1  2797.8 10929.9  9922.9
PSID17      0  25    9 hispan       0        1  5460.5 10589.8  7539.4
PSID18      0  28    9  white       1        1 11091.4 10357.0 15406.8
PSID19      0  40   13  white       1        0  3577.6 10301.5 11912.0
PSID20      0  35    9  white       1        1 11475.4  9397.4 11087.4
PSID21      0  27   10 hispan       1        1 15711.4  9098.4 17023.4
PSID22      0  27    6 hispan       1        1  7831.2  9071.6  5661.2
PSID23      0  36   12  white       1        0 25535.1  8695.6 21905.8
PSID24      0  47    8  black       1        1  9275.2  8543.4     0.0
PSID25      0  40   11  white       1        1 20666.3  8502.2 25564.7
PSID26      0  27    7  white       1        1  3064.3  8461.1 11149.5
PSID27      0  36    9  black       1        1 13256.4  8457.5     0.0
PSID28      0  39    6 hispan       1        1 13279.9  8441.4 25048.9
PSID29      0  21    9  white       1        1 11156.1  8441.4  1213.2
PSID30      0  29   12  white       1        0 11199.2  8081.5     0.0
PSID31      0  22   13 hispan       0        0  6404.8  7882.8  9453.0
PSID32      0  25   10  white       1        1 13634.5  7793.3 11688.8
PSID33      0  27   12  white       1        0 12270.9  7709.1  7806.8
PSID34      0  45    8  white       1        1 22416.0  7635.7 15931.4
PSID35      0  26   12  white       1        0  2345.2  7565.9  2838.7
PSID36      0  27   12  white       1        0  9788.5  7496.1 14038.4
PSID37      0  33    8  white       1        1 12312.0  7474.6 25514.4
PSID38      0  25   12  white       1        0 11381.4  7467.4  4162.8
PSID39      0  49    8  white       1        1  6459.7  7431.6  7503.9
PSID40      0  40    3 hispan       1        1  7576.5  7426.3 12104.1
PSID41      0  22   12  black       1        0  9729.7  7372.5  2231.4
PSID42      0  25    5  white       1        1  7891.9  7293.8 14617.7
PSID43      0  25   12  white       1        0 11516.6  7263.3 19588.7
PSID44      0  21   12  white       1        0 13601.2  7202.5 10746.0
PSID45      0  33    9 hispan       1        1 11959.4  7087.9 25564.7
PSID46      0  20   12  black       1        0  9555.3  7055.7     0.0
PSID47      0  19   11  white       1        1  4306.5  6978.7   837.9
PSID48      0  25   12  black       1        0   295.8  6942.9   461.1
PSID49      0  29   12  white       1        0 15303.8  6932.1 24290.9
PSID50      0  20   12  white       1        0  3558.0  6797.9  6680.8
PSID51      0  29    6 hispan       1        1  8542.4  6701.2  7196.5
PSID52      0  25   13  white       1        0 19259.6  6652.8 13015.8
PSID53      0  41   15  white       1        0 25862.3  6563.3 24647.0
PSID54      0  39   10  white       1        1 22745.1  6493.5 25564.7
PSID55      0  33   12  white       1        0 10819.1  6370.0  2936.2
PSID56      0  29    8  white       1        1  9169.4  6352.1 20575.9
PSID57      0  21   11  white       1        1 10680.0  6276.9 10923.4
PSID58      0  31   12  white       1        0 23652.3  6228.5 22403.8
PSID59      0  36   12  black       1        0 11040.5  6221.4  7215.7
PSID60      0  25    7  white       1        1  5597.6  6099.6   122.7
PSID61      0  35    7  white       1        1 10715.2  6087.1 15177.7
PSID62      0  22    9  white       1        1  5683.8  6038.8  4742.0
PSID63      0  31    2 hispan       1        1  3262.2  5965.4  9732.3
PSID64      0  40   15  white       1        0 10907.2  5922.4  6239.0
PSID65      0  47    3  white       1        1  9047.9  5911.6  6145.9
PSID66      0  26    8 hispan       0        1  3168.1  5872.3 11136.1
PSID67      0  42    7  white       1        1 10971.9  5806.0  9241.7
PSID68      0  53   12  white       0        0 17104.4  5775.6 19965.6
PSID69      0  30   17  black       0        0 17827.4  5546.4 14421.1
PSID70      0  28   10  white       1        1 10415.5  5544.6 10289.4
PSID71      0  46   11  white       1        1 14753.3  5299.4     0.0
PSID72      0  28   12  white       0        0  8256.4  5279.7 21602.9
PSID73      0  27   12 hispan       1        0 17604.0  5222.4 25564.7
PSID74      0  25   10  white       1        1  4335.9  5181.2 12418.8
PSID75      0  38    8  white       1        1 11242.3  5174.0     0.0
PSID76      0  26   12 hispan       0        0  7968.3  5109.6  4182.0
PSID77      0  54   12  white       0        0  7165.0  5012.9     0.0
PSID78      0  38    8 hispan       1        1 22606.0  4978.9  8720.1
PSID79      0  23   17  white       0        0     0.0  4876.8 16747.1
PSID80      0  23    8  white       1        1  3595.3  4866.1  2782.6
PSID81      0  23   12  white       1        0 11691.0  4764.0 14065.0
PSID82      0  25   12 hispan       1        0  8746.2  4762.3   379.8
PSID83      0  25   15  white       1        0  7386.4  4739.0 12705.5
PSID84      0  37   11 hispan       0        1   615.2  4713.9     0.0
PSID85      0  40   12  white       1        0 18389.7  4688.9 21857.0
PSID86      0  19   10  white       0        1  5777.9  4672.7   136.0
PSID87      0  48    7  white       1        1 13326.9  4636.9     0.0
PSID88      0  19   12  white       0        0  8530.6  4620.8     0.0
PSID89      0  16    9  white       0        1  2539.2  4579.6     0.0
PSID90      0  29   10  white       1        1   713.2  4542.0  7781.7
PSID91      0  30   16  white       0        0  3093.7  4468.6 15538.3
PSID92      0  22   11  white       1        1  8761.8  4463.3 10642.6
PSID93      0  22   10  white       0        1 17269.0  4400.6  2453.0
PSID94      0  47   10  black       1        1 13311.3  4397.0 19330.1
PSID95      0  25   12 hispan       1        0  2266.9  4361.2  3020.5
PSID96      0  47   10  black       0        1 21918.3  4323.6 19438.0
PSID97      0  24   12  black       1        0  8573.8  4293.2     0.0
PSID98      0  20   12  black       1        0  2648.9  4273.5     0.0
PSID99      0  28   12  black       0        0 16722.3  4253.8  7314.7
PSID100     0  47   11  white       0        1  8060.4  4232.3  3358.9
PSID101     0  50    0  white       1        1 10162.7  4218.0   220.2
PSID102     0  18   12  white       0        0  2217.9  4191.1  8958.0
PSID103     0  21   12  white       0        0  9665.1  4110.6  1687.6
PSID104     0  47   11  white       1        1 23924.6  4096.3 17358.8
PSID105     0  21   12  white       0        0  2827.2  4056.9  5937.5
PSID106     0  34   11  white       1        1     0.0  4010.3 18133.2
PSID107     0  19   12  white       1        0  5817.1  3919.0  1066.9
PSID108     0  44   13  white       1        0  8033.0  3881.4  3104.7
PSID109     0  21   15  white       1        0  6951.5  3879.6     0.0
PSID110     0  20   12  black       0        0  5100.0  3842.0 12718.8
PSID111     0  51   11  white       0        1    49.0  3813.4  1525.0
PSID112     0  28   13  white       0        0  5260.6  3790.1  9253.5
PSID113     0  24   15  white       0        0 12747.0  3743.6     0.0
PSID114     0  28    8 hispan       1        1  8305.3  3718.5     0.0
PSID115     0  20   11  white       1        1  5822.9  3532.3 11075.6
PSID116     0  29   12  white       1        0 14288.9  3503.7  8133.4
PSID117     0  23   12  white       1        0 14347.7  3482.2  3818.4
PSID118     0  20   11  black       0        1     0.0  3480.4  5495.7
PSID119     0  42    7  white       1        1  4324.1  3457.1  9856.4
PSID120     0  43   12  white       1        0 14328.1  3453.5 18781.9
PSID121     0  27   13  white       0        0 16406.9  3426.7  5344.9
PSID122     0  27    4 hispan       1        1   627.0  3410.6  3367.7
PSID123     0  25   12  white       1        0 21469.7  3405.2  7981.2
PSID124     0  18   12  white       0        0  4729.7  3328.2 12602.0
PSID125     0  31   16  white       1        0 25862.3  3254.8 25564.7
PSID126     0  27   12  white       1        0  4043.9  3231.5  7240.9
PSID127     0  18   11  white       0        1     0.0  3226.2 15814.6
PSID128     0  24    7  white       1        1  7860.6  3213.6     0.0
PSID129     0  23   12  white       1        0  7856.7  3213.6  5535.6
PSID130     0  50   12  white       1        0 19929.7  3190.4 18597.2
PSID131     0  19   12  white       0        0    99.9  3172.5 15436.3
PSID132     0  23   10  white       1        1 15811.3  3145.6  6398.6
PSID133     0  51   12  white       1        0 21001.4  3140.2 16015.6
PSID134     0  19   11  black       0        1  5607.4  3054.3    94.6
PSID135     0  20   10  white       1        1  3099.6  2970.1 21141.8
PSID136     0  20   11 hispan       0        1  2868.4  2968.4  7403.4
PSID137     0  21   12  white       0        0  8129.0  2939.7     0.0
PSID138     0  39   10  white       1        1     0.0  2886.0 18761.2
PSID139     0  36    5  white       0        1  3814.7  2873.5  2751.5
PSID140     0  19    9  black       0        1  1079.6  2873.5 14344.3
PSID141     0  42    6 hispan       1        1  2425.6  2832.3  1907.7
PSID142     0  20    7  white       0        1  1902.4  2792.9  6098.6
PSID143     0  23   12  white       1        0  4955.0  2771.4     0.0
PSID144     0  35   12  white       1        0  1469.5  2719.5     0.0
PSID145     0  18   12  white       0        0   881.7  2696.2 12120.3
PSID146     0  43    8  white       1        1 18338.7  2674.7  6395.6
PSID147     0  37   14  white       1        0 18501.4  2638.9 13429.6
PSID148     0  24   10  white       1        1  4719.9  2565.5  2173.7
PSID149     0  51   12  white       0        0 20742.8  2538.7  1019.6
PSID150     0  22   11 hispan       0        1  7341.4  2535.1 14187.6
PSID151     0  19   12  white       0        0   337.0  2519.0  7118.2
PSID152     0  52    0 hispan       1        1   773.9  2506.5     0.0
PSID153     0  21   12  white       0        0  2903.6  2456.3  4787.8
PSID154     0  24   12  white       0        0  9784.6  2413.4     0.0
PSID155     0  35    8  white       1        1  2241.4  2399.0  9460.4
PSID156     0  20   13  white       0        0     0.0  2352.5     0.0
PSID157     0  17    7  black       0        1  1054.1  2286.2  1613.7
PSID158     0  18   10  black       0        1   311.5  2284.5  8154.1
PSID159     0  28   12  black       0        0  6285.3  2255.8  7310.3
PSID160     0  25   14 hispan       1        0  1622.3  2239.7  1893.0
PSID161     0  40   12 hispan       0        0 13616.9  2229.0   876.3
PSID162     0  50    3  white       1        1  3136.8  2203.9 13976.3
PSID163     0  48    8  white       1        1 16050.3  2116.2 11600.1
PSID164     0  17    7 hispan       0        1     0.0  2082.1  6460.6
PSID165     0  30   12  white       1        0  7347.3  2080.4 14475.8
PSID166     0  30    7  white       1        1   574.1  2010.5   366.5
PSID167     0  22   11  white       1        1  3031.0  1976.5     0.0
PSID168     0  27   12  white       1        0 11493.1  1906.7 13419.2
PSID169     0  25    9  white       1        1 23378.0  1901.3  1898.9
PSID170     0  21   14  white       0        0    80.3  1890.6  6389.7
PSID171     0  17   10  white       0        1     0.0  1888.8 19993.6
PSID172     0  39    7  white       0        1  7786.1  1844.0  9206.2
PSID173     0  18    9  black       0        1  1183.4  1822.5   803.9
PSID174     0  25   12  white       1        0  2721.4  1754.5  1037.4
PSID175     0  20    8  white       1        1  2360.9  1742.0     0.0
PSID176     0  19   13  white       0        0  2366.8  1709.8     0.0
PSID177     0  19   11  white       0        1     0.0  1693.6  9853.5
PSID178     0  22   12  white       0        0 10137.2  1679.3 25564.7
PSID179     0  18   11  black       0        1  2069.0  1623.8 20243.4
PSID180     0  21   10  white       0        1  1767.3  1555.8  7675.3
PSID181     0  24   12  white       1        0  7643.1  1546.8  3262.8
PSID182     0  18   11  white       0        1  1273.5  1532.5 12489.8
PSID183     0  17   10  white       0        1   568.2  1525.4  6231.6
PSID184     0  17   10  white       0        1     0.0  1503.9  7843.8
PSID185     0  18   10  white       0        1     0.0  1491.3   237.9
PSID186     0  53   10 hispan       0        1  7878.2  1489.5 13171.0
PSID187     0  18   11  black       0        1  1191.2  1478.8  3684.0
PSID188     0  17   10 hispan       0        1     0.0  1453.7  6918.7
PSID189     0  26   12  black       0        0     0.0  1448.4     0.0
PSID190     0  39    5  white       1        1 13082.0  1434.0 18323.8
PSID191     0  18   12  black       0        0  1579.2  1409.0  3057.4
PSID192     0  23   13  white       0        0   601.5  1394.7  4975.5
PSID193     0  18    8  white       0        1  5023.6  1391.1  6756.2
PSID194     0  28   10  white       1        1  7578.4  1383.9  2404.3
PSID195     0  32    4  white       1        1     0.0  1378.5     0.0
PSID196     0  18   11  black       0        1     0.0  1367.8    34.0
PSID197     0  40   10  white       1        1  1543.9  1342.7     0.0
PSID198     0  21   14  white       0        0  8456.2  1330.2 16967.3
PSID199     0  29   10 hispan       0        1  3732.4  1323.0  6694.1
PSID200     0  31    6  white       0        1  2666.6  1321.3     0.0
PSID201     0  46    7  white       1        1 19171.4  1317.7     0.0
PSID202     0  20    9 hispan       1        1     0.0  1283.7     0.0
PSID203     0  36   18  white       1        0  3273.9  1269.3 18227.8
PSID204     0  45   12  white       1        0 16559.7  1265.8  7987.1
PSID205     0  16   10  white       0        1  1026.7  1224.6  6847.8
PSID206     0  18   12  white       0        0   819.0  1208.5  2232.8
PSID207     0  40   12 hispan       0        0 11867.3  1195.9  3873.1
PSID208     0  16    9  white       0        1     0.0  1188.8  2451.5
PSID209     0  16   10  white       0        1   574.1  1181.6  5578.4
PSID210     0  28    5 hispan       1        1 10968.0  1178.0   239.4
(… remaining rows of the printed data omitted; the listing of PSID control observations continues in the same format through PSID429 …)

I heard you were working on these in Steve’s class.

ATT estimates

library(MatchIt) # matchit(), for the matching step
library(broom)   # tidy(), for model summaries

n_replicates <- 1e3
set.seed(22102026)

m_orig <- matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
    data = lalonde, method = "nearest",
    estimand = "ATT")

dr_orig <- lm(re78 ~ treat + age + educ + race + married + nodegree + re74 + re75,
    data = lalonde, weights = m_orig$weights)

dr_tidy <- tidy(dr_orig) |>
    filter(term == "treat")
dr_tidy
# A tibble: 1 × 5
  term  estimate std.error statistic p.value
  <chr>    <dbl>     <dbl>     <dbl>   <dbl>
1 treat    1345.      790.      1.70  0.0894
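
Because the tidied output is an ordinary tibble, the point estimate can be extracted like any other column. A minimal sketch, assuming dr_tidy from the chunk above:

```r
# Pull the ATT point estimate out as a plain number
dr_tidy |>
  dplyr::pull(estimate)
```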

Bootstrapped ATT estimates

Tidily bootstrap our estimates.

## Keep everything, do the matching and weighting inside the boot
boot_out <- tibble(boot_id = 1:n_replicates) |>
  mutate(
    boot_data = map(boot_id, \(x) slice_sample(lalonde, 
                                               n = nrow(lalonde),
                                               replace = TRUE)),
    match_obj = map(boot_data, \(bd) {  
      matchit(treat ~ age + educ + race + married + nodegree + re74 + re75,
              data = bd,
              method = "nearest",
              estimand = "ATT")
      }),
    model = map2(boot_data, match_obj, \(bd, mo) { 
      lm(re78 ~ treat + age + educ + race + married + nodegree + re74 + re75,
         data = bd,
         weights = mo$weights)
      }),
    coefs = map(model, tidy) 
  )

Four map() calls in all, including one to map2(). We could write separate named functions for each step inside the mapping calls, but instead we use lambdas and write them on the fly.
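
The lambda shorthand is just ordinary function-writing. A toy illustration of the equivalence, using simple lists rather than the bootstrap objects:

```r
library(purrr)

xs <- list(1:3, 4:6)

# A named helper ...
get_mean <- function(x) mean(x)
map(xs, get_mean)

# ... does the same job as a lambda written on the fly
map(xs, \(x) mean(x))

# map2() walks two lists in parallel, one pair of elements at a time
ys <- list(10, 100)
map2(xs, ys, \(x, y) mean(x) * y)
```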

Bootstrapped ATT estimates

What we get:

boot_out
# A tibble: 1,000 × 5
   boot_id boot_data      match_obj model  coefs            
     <int> <list>         <list>    <list> <list>           
 1       1 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 2       2 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 3       3 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 4       4 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 5       5 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 6       6 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 7       7 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 8       8 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
 9       9 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
10      10 <df [614 × 9]> <matchit> <lm>   <tibble [10 × 5]>
# ℹ 990 more rows

Bootstrapped ATT estimates

Unnest the coefs column:

boot_out |>
  unnest(coefs)
# A tibble: 10,000 × 9
   boot_id boot_data match_obj model  term  estimate std.error statistic p.value
     <int> <list>    <list>    <list> <chr>    <dbl>     <dbl>     <dbl>   <dbl>
 1       1 <df>      <matchit> <lm>   (Int… -8.28e+3  3318.       -2.50  1.30e-2
 2       1 <df>      <matchit> <lm>   treat  1.07e+3   811.        1.32  1.88e-1
 3       1 <df>      <matchit> <lm>   age    9.08e+1    45.1       2.01  4.50e-2
 4       1 <df>      <matchit> <lm>   educ   8.82e+2   221.        3.98  8.25e-5
 5       1 <df>      <matchit> <lm>   race…  1.54e+3  1384.        1.11  2.67e-1
 6       1 <df>      <matchit> <lm>   race…  6.24e+2   964.        0.648 5.17e-1
 7       1 <df>      <matchit> <lm>   marr…  7.09e+2  1065.        0.666 5.06e-1
 8       1 <df>      <matchit> <lm>   node…  2.34e+3  1225.        1.91  5.67e-2
 9       1 <df>      <matchit> <lm>   re74   3.10e-2     0.102     0.305 7.61e-1
10       1 <df>      <matchit> <lm>   re75   4.24e-1     0.169     2.52  1.23e-2
# ℹ 9,990 more rows

Bootstrapped ATT estimates

# The simplest way to calculate bootstrap CIs; we could get
# much fancier using boot() or any number of other functions for the
# bootstrap stage
boot_tbl <- boot_out |>
  unnest(coefs) |>
  filter(term == "treat") |>
  summarize(
    boot_att = mean(estimate),
    boot_se = sd(estimate),
    boot_ci_lower = quantile(estimate, 0.025),
    boot_ci_upper = quantile(estimate, 0.975)
  )

tinytable::tt(boot_tbl, digits = 3)
boot_att boot_se boot_ci_lower boot_ci_upper
1276 761 -152 2744
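
The percentile interval here is just the middle 95% of the bootstrap distribution. A toy sketch of the same calculation, with simulated stand-in estimates rather than real resamples:

```r
library(dplyr)

# Stand-in "bootstrap" estimates, simulated rather than resampled
set.seed(101)
fake <- tibble(estimate = rnorm(1000, mean = 1300, sd = 760))

fake |>
  summarize(
    boot_att      = mean(estimate),
    boot_se       = sd(estimate),
    boot_ci_lower = quantile(estimate, 0.025),
    boot_ci_upper = quantile(estimate, 0.975)
  )
```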

Bootstrapped ATT estimates

tmp_tbl <- boot_tbl |>
  mutate(y = 0.1)

boot_out |>
  unnest(coefs) |>
  select(boot_id, term, estimate) |>
  filter(term == "treat") |>
  ggplot(mapping = aes(x = estimate, y = after_stat(scaled),
        color = term, fill = term)) +
  geom_vline(xintercept = 0, color = "gray30") +
  geom_density(alpha = 0.5) +
  geom_pointrange(data = tmp_tbl,
    mapping = aes(x = boot_att, y = y, xmin = boot_ci_lower,
        xmax = boot_ci_upper),
    color = "gray20", size = rel(0.7), linewidth = rel(0.9),
    inherit.aes = FALSE) +
  labs(title = "Distribution of Bootstrapped Treatment Estimates",
       subtitle = "Pointrange shows point estimate and 95% quantile range.",
       x = "Bootstrap Estimate", y = "Scaled Density") +
  guides(color = "none", fill = "none")

Bootstrapped ATT estimates

Distribution of Treatment Estimates

Midwest PCA Example

Midwest data

midwest
# A tibble: 437 × 28
     PID county  state  area poptotal popdensity popwhite popblack popamerindian
   <int> <chr>   <chr> <dbl>    <int>      <dbl>    <int>    <int>         <int>
 1   561 ADAMS   IL    0.052    66090      1271.    63917     1702            98
 2   562 ALEXAN… IL    0.014    10626       759      7054     3496            19
 3   563 BOND    IL    0.022    14991       681.    14477      429            35
 4   564 BOONE   IL    0.017    30806      1812.    29344      127            46
 5   565 BROWN   IL    0.018     5836       324.     5264      547            14
 6   566 BUREAU  IL    0.05     35688       714.    35157       50            65
 7   567 CALHOUN IL    0.017     5322       313.     5298        1             8
 8   568 CARROLL IL    0.027    16805       622.    16519      111            30
 9   569 CASS    IL    0.024    13437       560.    13384       16             8
10   570 CHAMPA… IL    0.058   173025      2983.   146506    16559           331
# ℹ 427 more rows
# ℹ 19 more variables: popasian <int>, popother <int>, percwhite <dbl>,
#   percblack <dbl>, percamerindan <dbl>, percasian <dbl>, percother <dbl>,
#   popadults <int>, perchsd <dbl>, percollege <dbl>, percprof <dbl>,
#   poppovertyknown <int>, percpovertyknown <dbl>, percbelowpoverty <dbl>,
#   percchildbelowpovert <dbl>, percadultpoverty <dbl>,
#   percelderlypoverty <dbl>, inmetro <int>, category <chr>

Extract numeric vars

mw_pca <- midwest |>
    group_by(state) |>
    select(where(is.numeric)) |>
    select(-PID)

mw_pca
# A tibble: 437 × 25
# Groups:   state [5]
   state  area poptotal popdensity popwhite popblack popamerindian popasian
   <chr> <dbl>    <int>      <dbl>    <int>    <int>         <int>    <int>
 1 IL    0.052    66090      1271.    63917     1702            98      249
 2 IL    0.014    10626       759      7054     3496            19       48
 3 IL    0.022    14991       681.    14477      429            35       16
 4 IL    0.017    30806      1812.    29344      127            46      150
 5 IL    0.018     5836       324.     5264      547            14        5
 6 IL    0.05     35688       714.    35157       50            65      195
 7 IL    0.017     5322       313.     5298        1             8       15
 8 IL    0.027    16805       622.    16519      111            30       61
 9 IL    0.024    13437       560.    13384       16             8       23
10 IL    0.058   173025      2983.   146506    16559           331     8033
# ℹ 427 more rows
# ℹ 17 more variables: popother <int>, percwhite <dbl>, percblack <dbl>,
#   percamerindan <dbl>, percasian <dbl>, percother <dbl>, popadults <int>,
#   perchsd <dbl>, percollege <dbl>, percprof <dbl>, poppovertyknown <int>,
#   percpovertyknown <dbl>, percbelowpoverty <dbl>, percchildbelowpovert <dbl>,
#   percadultpoverty <dbl>, percelderlypoverty <dbl>, inmetro <int>

PCA Helper function

We could do this anonymously, with a lambda function, too.

do_pca <- function(df){
  prcomp(df,
         center = TRUE, scale = TRUE)
}
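
Centering and scaling matter because prcomp() otherwise works on raw covariances, so variables measured on large scales (like poptotal here) dominate the first component. A small illustration on built-in data:

```r
# Without scaling, the high-variance disp column dominates PC1;
# with scale = TRUE each variable contributes on a comparable footing.
pca_raw <- prcomp(mtcars[, c("mpg", "disp", "hp", "wt")])
pca_std <- prcomp(mtcars[, c("mpg", "disp", "hp", "wt")],
                  center = TRUE, scale = TRUE)

# Compare the share of variance captured by PC1 in each case
summary(pca_raw)$importance["Proportion of Variance", "PC1"]
summary(pca_std)$importance["Proportion of Variance", "PC1"]
```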

PCA on the whole dataset

out_pca <- mw_pca |>
    ungroup() |>
    select(-state) |>
    do_pca()

out_pca
Standard deviations (1, .., p=24):
 [1] 3.10e+00 2.21e+00 1.65e+00 1.19e+00 1.12e+00 8.98e-01 8.86e-01 8.19e-01
 [9] 6.92e-01 5.65e-01 5.44e-01 4.85e-01 3.80e-01 3.58e-01 3.09e-01 2.50e-01
[17] 2.09e-01 1.92e-01 9.65e-02 3.47e-02 1.33e-02 3.86e-03 2.89e-09 9.54e-16

Rotation (n x k) = (24 x 24):
                          PC1     PC2      PC3     PC4       PC5      PC6
area                 -0.02619  0.0150 -0.10362  0.2508 -0.626804  0.48689
poptotal             -0.31299  0.0515  0.11077  0.0544  0.002497  0.04058
popdensity           -0.29205  0.0236  0.04558 -0.1042  0.139053  0.15186
popwhite             -0.31410  0.0238  0.08117  0.0278  0.016318  0.07259
popblack             -0.28773  0.1074  0.14902  0.0598  0.006193  0.04880
popamerindian        -0.26919  0.0916 -0.01213 -0.1236 -0.196022  0.14136
popasian             -0.28406  0.0578  0.14522  0.2132 -0.083397 -0.16286
popother             -0.25831  0.0807  0.19651  0.2163 -0.110491 -0.26225
percwhite             0.18251 -0.1756  0.25486  0.4371  0.082341  0.07589
percblack            -0.19880  0.1119 -0.15319 -0.2542  0.343979  0.25296
percamerindan        -0.00153  0.1745 -0.18427 -0.4065 -0.522049 -0.28835
percasian            -0.19122 -0.1272 -0.33296  0.2078  0.057549 -0.06347
percother            -0.17367 -0.0506  0.03017 -0.0944 -0.021881 -0.58380
popadults            -0.31212  0.0529  0.11638  0.0545  0.000792  0.04190
perchsd              -0.09422 -0.3559 -0.18125 -0.0433 -0.185756  0.03657
percollege           -0.15307 -0.2554 -0.34484  0.0812 -0.024319  0.08217
percprof             -0.14308 -0.2065 -0.39278  0.1601  0.115989 -0.03980
poppovertyknown      -0.31259  0.0522  0.11422  0.0527  0.001560  0.04174
percpovertyknown      0.01603  0.0074  0.35975 -0.3495 -0.078744  0.25725
percbelowpoverty      0.02186  0.4029 -0.23735  0.0821  0.044412 -0.00326
percchildbelowpovert  0.00948  0.4121 -0.15012 -0.0412  0.051210  0.07182
percadultpoverty      0.01409  0.3590 -0.30902  0.1546  0.035637 -0.03790
percelderlypoverty    0.08536  0.3681  0.00766  0.1855  0.141928  0.08282
inmetro              -0.13767 -0.1924 -0.12507 -0.3310  0.218386  0.15729
                          PC7       PC8     PC9     PC10    PC11    PC12
area                 -0.45357  0.107932  0.1266 -0.09418 -0.1123  0.0431
poptotal              0.04886 -0.028501  0.0155  0.00878  0.0179  0.0150
popdensity            0.06936 -0.066500 -0.1390  0.34249 -0.1779 -0.0595
popwhite              0.05617 -0.000858  0.0127  0.05245 -0.0656  0.0344
popblack              0.01683 -0.122024 -0.0269 -0.00446  0.1794 -0.0630
popamerindian         0.05258 -0.134705 -0.0411  0.44923 -0.1095 -0.3797
popasian              0.12879  0.047243  0.1011 -0.23737  0.0263  0.1368
popother              0.03907  0.052661  0.1772 -0.30565  0.2244  0.0863
percwhite             0.13000  0.159783  0.1321  0.28485  0.0554 -0.0947
percblack            -0.38305 -0.177172 -0.2160 -0.30716  0.1082  0.1591
percamerindan         0.32453 -0.146125  0.0645 -0.12749 -0.1147 -0.0439
percasian             0.11659  0.180479 -0.0714 -0.01768 -0.5930  0.3612
percother            -0.59580  0.390170 -0.0894  0.17667 -0.0671 -0.2075
popadults             0.05210 -0.029834  0.0164  0.00544  0.0206  0.0206
perchsd               0.04100  0.039628 -0.1430  0.02198  0.5422 -0.0818
percollege            0.12256  0.181171 -0.1849 -0.08584  0.1542 -0.2181
percprof              0.19379  0.101079 -0.1270 -0.13267  0.0155 -0.1796
poppovertyknown       0.04962 -0.027370  0.0146  0.00836  0.0192  0.0147
percpovertyknown      0.23760  0.682652 -0.2952 -0.13789 -0.0294  0.0681
percbelowpoverty      0.05392  0.182119  0.0472  0.11013  0.1439  0.0262
percchildbelowpovert -0.00510  0.224482  0.0394  0.19609  0.2277  0.2252
percadultpoverty      0.07611  0.179307  0.0725  0.20097  0.1930  0.0739
percelderlypoverty    0.02486  0.074240 -0.0498 -0.39718 -0.2045 -0.6657
inmetro              -0.00301  0.220252  0.8179 -0.05357 -0.0222 -0.1318
                          PC13     PC14     PC15     PC16     PC17     PC18
area                  1.49e-01  0.04925 -0.04158  0.06918 -0.08005 -0.03174
poptotal              8.09e-02  0.09421 -0.04704 -0.01902  0.22285  0.08703
popdensity            1.51e-01  0.47017  0.07373  0.13787 -0.57401 -0.28223
popwhite              1.03e-01  0.22760  0.15543  0.12548  0.46636  0.07769
popblack              8.92e-02 -0.16452 -0.64727 -0.42151 -0.25573  0.19708
popamerindian        -3.34e-01 -0.55797  0.17430  0.08862  0.04284 -0.03870
popasian             -1.24e-01 -0.01828  0.23649  0.12741 -0.09705 -0.03676
popother             -1.07e-01 -0.20652  0.19229  0.06486 -0.31519 -0.25803
percwhite             1.61e-02  0.00537  0.00696 -0.00619 -0.00690  0.04870
percblack            -9.25e-02 -0.14098  0.06451  0.05433  0.02579 -0.12478
percamerindan         1.25e-01  0.14325 -0.05045 -0.03463 -0.01622  0.04164
percasian            -4.17e-01 -0.01410 -0.17323 -0.11467 -0.05198  0.10725
percother             6.38e-02  0.05381 -0.05133 -0.00728  0.02705  0.04714
popadults             8.47e-02  0.11195 -0.02766 -0.00765  0.22575  0.08734
perchsd              -5.16e-01  0.32791 -0.18139  0.23073 -0.00181  0.09494
percollege            1.65e-01  0.01746  0.41803 -0.64004 -0.00284 -0.07362
percprof              4.73e-01 -0.29347 -0.16077  0.51738 -0.08001  0.13539
poppovertyknown       7.98e-02  0.09550 -0.04780 -0.02371  0.22682  0.08897
percpovertyknown     -8.29e-05 -0.13054 -0.09925  0.04744  0.01969 -0.10975
percbelowpoverty     -2.61e-02  0.04201 -0.05359  0.00659  0.04648 -0.06958
percchildbelowpovert  5.67e-03  0.00225  0.28393  0.03067 -0.23656  0.62012
percadultpoverty      4.30e-03  0.01308 -0.22556 -0.01478  0.21660 -0.54943
percelderlypoverty   -2.38e-01  0.23711 -0.03068  0.03817 -0.02334  0.11150
inmetro              -3.86e-02  0.01112 -0.05590 -0.02538 -0.04174 -0.00287
                          PC19      PC20     PC21      PC22      PC23      PC24
area                  0.008671  0.004681  0.00061  8.37e-04  9.26e-11  1.31e-16
poptotal             -0.076193  0.023054  0.26237 -2.70e-01 -3.56e-09 -8.10e-01
popdensity           -0.005458  0.011704  0.00671  3.68e-03  2.87e-10  9.46e-17
popwhite             -0.140728  0.030297  0.34182 -3.43e-01 -4.99e-09  5.44e-01
popblack              0.104823  0.008645  0.11364 -1.40e-01 -2.79e-09  2.14e-01
popamerindian         0.021665  0.004648 -0.00781 -2.08e-03 -9.94e-11  2.36e-03
popasian              0.781087  0.034020  0.01827 -1.95e-02 -1.64e-10  2.58e-02
popother             -0.554636 -0.010890  0.03550 -2.63e-02  5.21e-10  5.03e-02
percwhite            -0.007309  0.015581 -0.00107 -1.13e-04  7.15e-01  2.90e-10
percblack             0.034614  0.004477 -0.00140  3.42e-04  5.18e-01  2.10e-10
percamerindan        -0.019199 -0.026849  0.00482 -2.22e-04  4.58e-01  1.86e-10
percasian            -0.127336 -0.015781 -0.00556  8.87e-04  6.33e-02  2.57e-11
percother             0.049001 -0.002061 -0.00432 -6.03e-04  8.45e-02  3.43e-11
popadults            -0.076871 -0.091715 -0.87903 -1.03e-01 -1.77e-08 -3.37e-19
perchsd              -0.015055  0.001811  0.00211 -1.11e-04  2.27e-10  1.49e-16
percollege            0.019522  0.007993 -0.00197 -1.66e-03 -7.27e-11 -1.01e-16
percprof             -0.005332 -0.006801 -0.00209  3.81e-03 -1.66e-10  2.85e-17
poppovertyknown      -0.069062  0.003553  0.13036  8.82e-01  2.86e-08  4.03e-15
percpovertyknown      0.000399  0.000865 -0.00122 -2.70e-03 -2.29e-10  3.54e-16
percbelowpoverty     -0.017411  0.825060 -0.08306  4.75e-03  4.43e-09  5.60e-16
percchildbelowpovert -0.051527 -0.281404  0.03353 -7.31e-04 -8.29e-10 -5.68e-16
percadultpoverty      0.090370 -0.461037  0.04164 -2.95e-03 -2.97e-09 -1.86e-16
percelderlypoverty   -0.036945 -0.121558  0.01264 -1.06e-03 -6.50e-10 -4.95e-17
inmetro               0.032697  0.000802 -0.00267 -5.15e-05 -9.24e-11 -3.55e-16

Tidier view, whole dataset

tidy_pca <- tidy(out_pca, matrix = "pcs")

tidy_pca
# A tibble: 24 × 4
      PC std.dev percent cumulative
   <dbl>   <dbl>   <dbl>      <dbl>
 1     1   3.10   0.400       0.400
 2     2   2.21   0.203       0.603
 3     3   1.65   0.113       0.717
 4     4   1.19   0.0593      0.776
 5     5   1.12   0.0524      0.829
 6     6   0.898  0.0336      0.862
 7     7   0.886  0.0327      0.895
 8     8   0.819  0.0280      0.923
 9     9   0.692  0.0200      0.943
10    10   0.565  0.0133      0.956
# ℹ 14 more rows
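
The percent column is what a scree plot shows. A quick sketch, assuming tidy_pca from above:

```r
tidy_pca |>
  ggplot(mapping = aes(x = PC, y = percent)) +
  geom_col() +
  labs(x = "Principal Component",
       y = "Proportion of Variance Explained")
```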

Using augment()

This puts the original data points back in.

aug_pca <- augment(out_pca, data = mw_pca[,-1])
aug_pca <-  aug_pca |>
  tibble::add_column(midwest$state, midwest$county, .before = TRUE) |>
  rename(state = `midwest$state`, county = `midwest$county`)

aug_pca
# A tibble: 437 × 51
   state county    .rownames  area poptotal popdensity popwhite popblack
   <chr> <chr>     <chr>     <dbl>    <int>      <dbl>    <int>    <int>
 1 IL    ADAMS     1         0.052    66090      1271.    63917     1702
 2 IL    ALEXANDER 2         0.014    10626       759      7054     3496
 3 IL    BOND      3         0.022    14991       681.    14477      429
 4 IL    BOONE     4         0.017    30806      1812.    29344      127
 5 IL    BROWN     5         0.018     5836       324.     5264      547
 6 IL    BUREAU    6         0.05     35688       714.    35157       50
 7 IL    CALHOUN   7         0.017     5322       313.     5298        1
 8 IL    CARROLL   8         0.027    16805       622.    16519      111
 9 IL    CASS      9         0.024    13437       560.    13384       16
10 IL    CHAMPAIGN 10        0.058   173025      2983.   146506    16559
# ℹ 427 more rows
# ℹ 43 more variables: popamerindian <int>, popasian <int>, popother <int>,
#   percwhite <dbl>, percblack <dbl>, percamerindan <dbl>, percasian <dbl>,
#   percother <dbl>, popadults <int>, perchsd <dbl>, percollege <dbl>,
#   percprof <dbl>, poppovertyknown <int>, percpovertyknown <dbl>,
#   percbelowpoverty <dbl>, percchildbelowpovert <dbl>, percadultpoverty <dbl>,
#   percelderlypoverty <dbl>, inmetro <int>, .fittedPC1 <dbl>, …

Grouped PCA

Let’s do the same thing, grouped by state:

mw_pca <- mw_pca |>
    group_by(state) |>
    nest()

mw_pca
# A tibble: 5 × 2
# Groups:   state [5]
  state data               
  <chr> <list>             
1 IL    <tibble [102 × 24]>
2 IN    <tibble [92 × 24]> 
3 MI    <tibble [83 × 24]> 
4 OH    <tibble [88 × 24]> 
5 WI    <tibble [72 × 24]> 
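
Each element of the data list-column is an ordinary tibble, which we can pull out and inspect directly:

```r
# The first element holds the Illinois counties
mw_pca$data[[1]]

# unnest() reverses the nesting
mw_pca |>
  tidyr::unnest(data)
```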

Grouped PCA

state_pca <- mw_pca |>
    mutate(pca = map(data, do_pca))

state_pca
# A tibble: 5 × 3
# Groups:   state [5]
  state data                pca     
  <chr> <list>              <list>  
1 IL    <tibble [102 × 24]> <prcomp>
2 IN    <tibble [92 × 24]>  <prcomp>
3 MI    <tibble [83 × 24]>  <prcomp>
4 OH    <tibble [88 × 24]>  <prcomp>
5 WI    <tibble [72 × 24]>  <prcomp>

Grouped PCA

Alternatively, write a lambda function:

mw_pca |>
    mutate(pca = map(data, \(x) prcomp(x, center = TRUE, scale = TRUE)))
# A tibble: 5 × 3
# Groups:   state [5]
  state data                pca     
  <chr> <list>              <list>  
1 IL    <tibble [102 × 24]> <prcomp>
2 IN    <tibble [92 × 24]>  <prcomp>
3 MI    <tibble [83 × 24]>  <prcomp>
4 OH    <tibble [88 × 24]>  <prcomp>
5 WI    <tibble [72 × 24]>  <prcomp>
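For comparison, the same split-apply-combine pattern can be sketched in base R with split() and lapply(). A minimal sketch, again using the built-in iris data as a stand-in for the midwest data:

```r
# Split: one data frame per group
by_group <- split(iris[, 1:4], iris$Species)

# Apply: fit a PCA to each piece
pcas <- lapply(by_group, prcomp, center = TRUE, scale. = TRUE)

# Combine: pull out the share of variance explained by PC1 in each group
pc1_share <- sapply(pcas, function(p) {
  summary(p)$importance["Proportion of Variance", 1]
})
pc1_share
```

The list-column workflow does the same work, but keeps the groups, the fitted models, and their tidied summaries together in one table instead of in loose lists.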

Tidy it

# broom has methods for different models, including prcomp, and they take
# arguments specific to those models. Here we want the principal components.
do_tidy <- function(pr){
    broom::tidy(pr, matrix = "pcs")
}

state_pca  <- mw_pca |>
    mutate(pca = map(data, do_pca),
           pcs = map(pca, do_tidy))

state_pca
# A tibble: 5 × 4
# Groups:   state [5]
  state data                pca      pcs              
  <chr> <list>              <list>   <list>           
1 IL    <tibble [102 × 24]> <prcomp> <tibble [24 × 4]>
2 IN    <tibble [92 × 24]>  <prcomp> <tibble [24 × 4]>
3 MI    <tibble [83 × 24]>  <prcomp> <tibble [24 × 4]>
4 OH    <tibble [88 × 24]>  <prcomp> <tibble [24 × 4]>
5 WI    <tibble [72 × 24]>  <prcomp> <tibble [24 × 4]>

Tidy it

Or again, write a lambda function:

mw_pca |>
    mutate(pca = map(data, \(x) prcomp(x, center = TRUE, scale = TRUE)),
           pcs = map(pca, \(x) tidy(x, matrix = "pcs")))
# A tibble: 5 × 4
# Groups:   state [5]
  state data                pca      pcs              
  <chr> <list>              <list>   <list>           
1 IL    <tibble [102 × 24]> <prcomp> <tibble [24 × 4]>
2 IN    <tibble [92 × 24]>  <prcomp> <tibble [24 × 4]>
3 MI    <tibble [83 × 24]>  <prcomp> <tibble [24 × 4]>
4 OH    <tibble [88 × 24]>  <prcomp> <tibble [24 × 4]>
5 WI    <tibble [72 × 24]>  <prcomp> <tibble [24 × 4]>

Unnest to inspect

mw_pca |>
  mutate(pca = map(data, \(x) prcomp(x, center = TRUE, scale = TRUE)),
           pcs = map(pca, \(x) tidy(x, matrix = "pcs"))) |>
  unnest(cols = c(pcs)) |>
  ggplot(aes(x = PC, y = percent)) +
  geom_line(linewidth = 1.1) +
  facet_wrap(~ state, nrow = 1) +
  labs(x = "Principal Component",
       y = "Variance Explained")

And finally, augment()

do_aug <- function(pr){
    broom::augment(pr)
}


state_pca  <- mw_pca |>
    mutate(pca = map(data, do_pca),
           pcs = map(pca, do_tidy),
           fitted = map(pca, do_aug))

state_pca
# A tibble: 5 × 5
# Groups:   state [5]
  state data                pca      pcs               fitted             
  <chr> <list>              <list>   <list>            <list>             
1 IL    <tibble [102 × 24]> <prcomp> <tibble [24 × 4]> <tibble [102 × 25]>
2 IN    <tibble [92 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [92 × 25]> 
3 MI    <tibble [83 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [83 × 25]> 
4 OH    <tibble [88 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [88 × 25]> 
5 WI    <tibble [72 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [72 × 25]> 

… or a lambda function

mw_pca |>
  mutate(pca = map(data, \(x) prcomp(x, center = TRUE, scale = TRUE)),
         pcs = map(pca, \(x) tidy(x, matrix = "pcs")),
         fitted = map(pca, \(x) augment(x)))
# A tibble: 5 × 5
# Groups:   state [5]
  state data                pca      pcs               fitted             
  <chr> <list>              <list>   <list>            <list>             
1 IL    <tibble [102 × 24]> <prcomp> <tibble [24 × 4]> <tibble [102 × 25]>
2 IN    <tibble [92 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [92 × 25]> 
3 MI    <tibble [83 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [83 × 25]> 
4 OH    <tibble [88 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [88 × 25]> 
5 WI    <tibble [72 × 24]>  <prcomp> <tibble [24 × 4]> <tibble [72 × 25]> 

In one breath

out <- midwest |>
    group_by(state) |>
    select_if(is.numeric) |>
    select(-PID) |>
    nest() |>
    mutate(pca = map(data, \(x) prcomp(x, center = TRUE, scale = TRUE)),
           pcs = map(pca, \(x) tidy(x, matrix = "pcs")),
           fitted = map(pca, \(x) augment(x))) |>
    unnest(cols = c(fitted)) |>
    add_column(county = midwest$county)

out
# A tibble: 437 × 30
# Groups:   state [5]
   state data     pca      pcs      .rownames .fittedPC1 .fittedPC2 .fittedPC3
   <chr> <list>   <list>   <list>   <chr>          <dbl>      <dbl>      <dbl>
 1 IL    <tibble> <prcomp> <tibble> 1              0.403     -0.233    0.00488
 2 IL    <tibble> <prcomp> <tibble> 2              0.413      8.14     3.33   
 3 IL    <tibble> <prcomp> <tibble> 3              0.908      0.326   -0.303  
 4 IL    <tibble> <prcomp> <tibble> 4             -0.344     -2.37    -0.907  
 5 IL    <tibble> <prcomp> <tibble> 5              0.940      1.61     1.06   
 6 IL    <tibble> <prcomp> <tibble> 6              0.482     -1.07    -0.744  
 7 IL    <tibble> <prcomp> <tibble> 7              1.61       1.83    -1.29   
 8 IL    <tibble> <prcomp> <tibble> 8              0.835     -0.428   -0.839  
 9 IL    <tibble> <prcomp> <tibble> 9              1.28       0.408   -1.08   
10 IL    <tibble> <prcomp> <tibble> 10            -3.49      -3.13     7.39   
# ℹ 427 more rows
# ℹ 22 more variables: .fittedPC4 <dbl>, .fittedPC5 <dbl>, .fittedPC6 <dbl>,
#   .fittedPC7 <dbl>, .fittedPC8 <dbl>, .fittedPC9 <dbl>, .fittedPC10 <dbl>,
#   .fittedPC11 <dbl>, .fittedPC12 <dbl>, .fittedPC13 <dbl>, .fittedPC14 <dbl>,
#   .fittedPC15 <dbl>, .fittedPC16 <dbl>, .fittedPC17 <dbl>, .fittedPC18 <dbl>,
#   .fittedPC19 <dbl>, .fittedPC20 <dbl>, .fittedPC21 <dbl>, .fittedPC22 <dbl>,
#   .fittedPC23 <dbl>, .fittedPC24 <dbl>, county <chr>