Tasks for Week 05

Published

September 19, 2024

UK Election Data

library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

1. Install the UK Election Data package

It’s not on CRAN; it’s on my GitHub.

# You only need to do this once
remotes::install_github("kjhealy/ukelection2019")
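Since the install only needs to happen once, one option is to guard it so that re-running the script does not reinstall the package. This is a minimal sketch, not part of the package itself: `needs_install()` and `install_if_missing()` are hypothetical helper names, and the sketch assumes the remotes package is already installed and that you have network access.

```r
# Hypothetical helper: TRUE when a package is not yet installed
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Hypothetical helper: install from GitHub only when the package is missing.
# Assumes the remotes package is available and the machine is online.
install_if_missing <- function() {
  if (needs_install("ukelection2019")) {
    remotes::install_github("kjhealy/ukelection2019")
  }
  invisible(NULL)
}
```

Calling `install_if_missing()` at the top of a script would then do nothing once the package is in place.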

2. Load the package

library(ukelection2019)

ukvote2019
# A tibble: 3,320 × 13
   cid     constituency electorate party_name candidate votes vote_share_percent
   <chr>   <chr>             <int> <chr>      <chr>     <int>              <dbl>
 1 W07000… Aberavon          50747 Labour     Stephen … 17008               53.8
 2 W07000… Aberavon          50747 Conservat… Charlott…  6518               20.6
 3 W07000… Aberavon          50747 The Brexi… Glenda D…  3108                9.8
 4 W07000… Aberavon          50747 Plaid Cym… Nigel Hu…  2711                8.6
 5 W07000… Aberavon          50747 Liberal D… Sheila K…  1072                3.4
 6 W07000… Aberavon          50747 Independe… Captain …   731                2.3
 7 W07000… Aberavon          50747 Green      Giorgia …   450                1.4
 8 W07000… Aberconwy         44699 Conservat… Robin Mi… 14687               46.1
 9 W07000… Aberconwy         44699 Labour     Emily Ow… 12653               39.7
10 W07000… Aberconwy         44699 Plaid Cym… Lisa Goo…  2704                8.5
# ℹ 3,310 more rows
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>

Each row is a candidate standing in a particular constituency (in US speak, a district) for a particular party or as an independent candidate.

3. Get familiar with the data

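Use sample_n() to draw a random sample of n rows from your tibble. Because the sample is random, the rows you get will differ from the ones shown here.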

ukvote2019 |> 
  sample_n(10)
# A tibble: 10 × 13
   cid     constituency electorate party_name candidate votes vote_share_percent
   <chr>   <chr>             <int> <chr>      <chr>     <int>              <dbl>
 1 E14000… Reading East      77465 Conservat… Craig Mo… 21178               37.9
 2 E14000… Birmingham …      72006 Green      Kefentse…   845                2  
 3 E14000… Old Bexley …      66104 Christian… Carol Va…   226                0.5
 4 E14001… Westminster…      65519 Labour     Karen Bu… 23240               54.2
 5 S14000… Lanark & Ha…      77659 Scottish … Angela C… 22243               41.9
 6 E14001… Twickenham        84901 Liberal D… Munira W… 36166               56.1
 7 W07000… Delyn             54552 Labour     David Ha… 15891               41.4
 8 E14000… Oldham East…      72173 Independe… Amoy Lin…   233                0.5
 9 E14000… Newbury           83414 Conservat… Laura Fa… 34431               57.4
10 S14000… Ochil & Sou…      78776 Scottish … John Nic… 26882               46.5
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>
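In current versions of dplyr, sample_n() is superseded by slice_sample(), which does the same job. Here is a minimal sketch on a small made-up tibble; `toy` and its values are invented for illustration and are not taken from ukvote2019.

```r
library(dplyr)

# Hypothetical toy data, not from ukvote2019
toy <- tibble(
  constituency = c("A", "A", "B", "B", "C"),
  votes        = c(100L, 50L, 80L, 20L, 60L)
)

set.seed(1)  # make the random draw reproducible

# Draw 3 rows at random, without replacement
sampled <- toy |>
  slice_sample(n = 3)

nrow(sampled)
```

Setting a seed first means you get the same "random" rows each time you run the chunk.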

A tibble of unique constituency names:

ukvote2019 |> 
  distinct(constituency)
# A tibble: 650 × 1
   constituency                   
   <chr>                          
 1 Aberavon                       
 2 Aberconwy                      
 3 Aberdeen North                 
 4 Aberdeen South                 
 5 Aberdeenshire West & Kincardine
 6 Airdrie & Shotts               
 7 Aldershot                      
 8 Aldridge-Brownhills            
 9 Altrincham & Sale West         
10 Alyn & Deeside                 
# ℹ 640 more rows

Tally them up:

ukvote2019 |> 
  distinct(constituency) |> 
  tally()
# A tibble: 1 × 1
      n
  <int>
1   650

That is, there are 650 electoral constituencies in Great Britain and Northern Ireland.

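A quicker way of establishing how many constituencies there are is to use count(). It returns one row per constituency, so the row count of the result (650) is the number of constituencies, and n is the number of candidates standing in each: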

ukvote2019 |> 
  count(constituency) 
# A tibble: 650 × 2
   constituency                        n
   <chr>                           <int>
 1 Aberavon                            7
 2 Aberconwy                           4
 3 Aberdeen North                      6
 4 Aberdeen South                      4
 5 Aberdeenshire West & Kincardine     4
 6 Airdrie & Shotts                    5
 7 Aldershot                           4
 8 Aldridge-Brownhills                 5
 9 Altrincham & Sale West              6
10 Alyn & Deeside                      5
# ℹ 640 more rows
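If you want the number of distinct values as a single number rather than reading it off the row count, n_distinct() inside summarize() is one option. This sketch uses a made-up tibble (`toy` is hypothetical, not the real data):

```r
library(dplyr)

# Hypothetical toy data: six candidate rows across three constituencies
toy <- tibble(
  constituency = c("A", "A", "B", "B", "B", "C"),
  candidate    = c("p", "q", "r", "s", "t", "u")
)

# One row per constituency, with the number of candidate rows in each
per_constituency <- toy |>
  count(constituency)

# A single number: how many distinct constituencies there are
n_constituencies <- toy |>
  summarize(n = n_distinct(constituency)) |>
  pull(n)

n_constituencies
```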

Which parties fielded the most candidates?

ukvote2019 |> 
  count(party_name) |> 
  arrange(desc(n))
# A tibble: 69 × 2
   party_name                     n
   <chr>                      <int>
 1 Conservative                 636
 2 Labour                       631
 3 Liberal Democrat             611
 4 Green                        497
 5 The Brexit Party             275
 6 Independent                  224
 7 Scottish National Party       59
 8 UKIP                          44
 9 Plaid Cymru                   36
10 Christian Peoples Alliance    29
# ℹ 59 more rows

What are the top 5 parties by number of candidates?

ukvote2019 |> 
  count(party_name) |> 
  slice_max(order_by = n, n = 5)
# A tibble: 5 × 2
  party_name           n
  <chr>            <int>
1 Conservative       636
2 Labour             631
3 Liberal Democrat   611
4 Green              497
5 The Brexit Party   275

Bottom 5? Does this make sense?

ukvote2019 |> 
  count(party_name) |> 
  slice_min(order_by = n, n = 5)
# A tibble: 25 × 2
   party_name                              n
   <chr>                               <int>
 1 Ashfield Independents                   1
 2 Best for Luton                          1
 3 Birkenhead Social Justice Party         1
 4 British National Party                  1
 5 Burnley & Padiham Independent Party     1
 6 Church of the Militant Elvis Party      1
 7 Citizens Movement Party UK              1
 8 CumbriaFirst                            1
 9 Heavy Woollen District Independents     1
10 Independent Network                     1
# ℹ 15 more rows
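Notice that we asked for 5 rows and got 25. By default, slice_min() and slice_max() keep tied rows, and here many parties are tied on a single candidate. A minimal sketch of the `with_ties` argument, using made-up counts (`party_counts` is hypothetical):

```r
library(dplyr)

# Hypothetical party counts with a three-way tie at the bottom
party_counts <- tibble(
  party_name = c("P1", "P2", "P3", "P4", "P5"),
  n          = c(10L, 1L, 1L, 1L, 7L)
)

# Default (with_ties = TRUE): asking for 2 rows returns all 3 tied rows
kept_ties <- party_counts |>
  slice_min(order_by = n, n = 2)

# with_ties = FALSE returns exactly the number of rows requested
dropped_ties <- party_counts |>
  slice_min(order_by = n, n = 2, with_ties = FALSE)

nrow(kept_ties)
nrow(dropped_ties)
```

Which of the tied rows survive with `with_ties = FALSE` is somewhat arbitrary, so the default is often the more honest answer.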

4. Filtering

Filtering is subsetting the rows according to a condition on one or more of the columns.

Show me all and only the Green party candidates.

ukvote2019 |> 
  filter(party_name == "Green")
# A tibble: 497 × 13
   cid     constituency electorate party_name candidate votes vote_share_percent
   <chr>   <chr>             <int> <chr>      <chr>     <int>              <dbl>
 1 W07000… Aberavon          50747 Green      Giorgia …   450                1.4
 2 S14000… Aberdeen No…      62489 Green      Guy Inge…   880                2.4
 3 S14000… Airdrie & S…      64008 Green      Rosemary…   685                1.7
 4 E14000… Aldershot         72617 Green      Donna Wa…  1750                3.7
 5 E14000… Aldridge-Br…      60138 Green      Bill McC…   771                2  
 6 E14000… Altrincham …      73096 Green      Geraldin…  1566                2.9
 7 E14000… Amber Valley      69976 Green      Lian Piz…  1388                3  
 8 E14000… Arundel & S…      81726 Green      Isabel T…  2519                4.1
 9 E14000… Ashfield          78204 Green      Rose Woo…   674                1.4
10 E14000… Ashford           89550 Green      Mandy Ro…  2638                4.4
# ℹ 487 more rows
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>

Show me all candidates named “Michael”.

ukvote2019 |> 
  filter(fname == "Michael")
# A tibble: 25 × 13
   cid     constituency electorate party_name candidate votes vote_share_percent
   <chr>   <chr>             <int> <chr>      <chr>     <int>              <dbl>
 1 E14000… Basildon So…      74441 Liberal D… Michael …  1957                4.3
 2 N06000… Belfast Sou…      69984 Ulster Un… Michael …  1259                2.7
 3 E14000… Blaydon           67853 The Brexi… Michael …  5833               12.8
 4 E14000… Bosworth          81537 Liberal D… Michael …  9096               16.1
 5 E14000… Bury South        75152 Independe… Michael …   277                0.6
 6 E14000… Canterbury        80203 Independe… Michael …   505                0.8
 7 W07000… Cardiff Nor…      68438 Green      Michael …   820                1.6
 8 E14000… Dorset Mid …      65426 Conservat… Michael … 29548               60.4
 9 S14000… Dundee East       66210 Liberal D… Michael …  3573                7.9
10 E14000… Durham Nort…      72166 Liberal D… Michael …  2831                5.9
# ℹ 15 more rows
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>

Show me all Green party candidates named “Michael”.

ukvote2019 |> 
  filter(party_name == "Green" & fname == "Michael")
# A tibble: 3 × 13
  cid      constituency electorate party_name candidate votes vote_share_percent
  <chr>    <chr>             <int> <chr>      <chr>     <int>              <dbl>
1 W070000… Cardiff Nor…      68438 Green      Michael …   820                1.6
2 E140007… Gloucester        81332 Green      Michael …  1385                2.6
3 E140008… Preston           59672 Green      Michael …   660                2  
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>
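There are a few equivalent or related ways to combine conditions in filter(). The sketch below uses a small made-up tibble (`toy` is hypothetical) to show AND via commas, OR via `|`, and set membership via `%in%`:

```r
library(dplyr)

# Hypothetical toy data, not from ukvote2019
toy <- tibble(
  party_name = c("Green", "Green", "Labour", "Labour", "Conservative"),
  fname      = c("Michael", "Anna", "Michael", "Sam", "Michael")
)

# Conditions separated by commas are combined with AND, the same as using &
green_michaels <- toy |>
  filter(party_name == "Green", fname == "Michael")

# | means OR: keep rows that satisfy either condition
green_or_michael <- toy |>
  filter(party_name == "Green" | fname == "Michael")

# %in% keeps rows whose value matches any element of a set
two_parties <- toy |>
  filter(party_name %in% c("Green", "Labour"))
```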

5. Grouping

Who won in each constituency?

ukvote2019 |> 
  group_by(constituency) |> 
  slice_max(votes)
# A tibble: 650 × 13
# Groups:   constituency [650]
   cid     constituency electorate party_name candidate votes vote_share_percent
   <chr>   <chr>             <int> <chr>      <chr>     <int>              <dbl>
 1 W07000… Aberavon          50747 Labour     Stephen … 17008               53.8
 2 W07000… Aberconwy         44699 Conservat… Robin Mi… 14687               46.1
 3 S14000… Aberdeen No…      62489 Scottish … Kirsty B… 20205               54  
 4 S14000… Aberdeen So…      65719 Scottish … Stephen … 20388               44.7
 5 S14000… Aberdeenshi…      72640 Conservat… Andrew B… 22752               42.7
 6 S14000… Airdrie & S…      64008 Scottish … Neil Gray 17929               45.1
 7 E14000… Aldershot         72617 Conservat… Leo Doch… 27980               58.4
 8 E14000… Aldridge-Br…      60138 Conservat… Wendy Mo… 27850               70.8
 9 E14000… Altrincham …      73096 Conservat… Graham B… 26311               48  
10 W07000… Alyn & Dees…      62783 Labour     Mark Tami 18271               42.5
# ℹ 640 more rows
# ℹ 6 more variables: vote_share_change <dbl>, total_votes_cast <int>,
#   vrank <int>, turnout <dbl>, fname <chr>, lname <chr>

What happens if you leave out group_by() in the chunk of code above?

How do I count the number of seats each party won?

ukvote2019 |> 
  group_by(constituency) |> 
  slice_max(votes) |> 
  group_by(party_name) |> 
  tally() |> 
  arrange(desc(n))
# A tibble: 10 × 2
   party_name                           n
   <chr>                            <int>
 1 Conservative                       366
 2 Labour                             202
 3 Scottish National Party             48
 4 Liberal Democrat                    11
 5 Democratic Unionist Party            8
 6 Sinn Féin                            7
 7 Plaid Cymru                          4
 8 Social Democratic & Labour Party     2
 9 Alliance Party                       1
10 Green                                1
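The group_by() |> tally() |> arrange() pattern can also be written with count(), which groups, tallies, and optionally sorts in one step. A sketch on made-up winners (`winners` is hypothetical data standing in for the result of the slice_max() step):

```r
library(dplyr)

# Hypothetical table of one winner per constituency
winners <- tibble(
  constituency = c("A", "B", "C", "D"),
  party_name   = c("X", "Y", "X", "X")
)

# Long form: group, tally, then sort
long_form <- winners |>
  group_by(party_name) |>
  tally() |>
  arrange(desc(n))

# Short form: count() does the grouping and tallying; sort = TRUE orders by n
short_form <- winners |>
  count(party_name, sort = TRUE)
```

Both give one row per party with the number of seats in n.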

Group and Summarize

ukvote2019 |> 
  group_by(constituency) |> 
  slice_max(votes) |> 
  ungroup() |> 
  summarize(mean_winner_share = mean(vote_share_percent))
# A tibble: 1 × 1
  mean_winner_share
              <dbl>
1              54.4

What happens if you leave out ungroup() in the chunk above?
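To see how grouping changes what summarize() returns, it can help to try both forms on a tiny table first. In this sketch `winners` is made-up data, not the real results:

```r
library(dplyr)

# Hypothetical winning vote shares for two parties
winners <- tibble(
  party_name         = c("X", "X", "Y", "Y"),
  vote_share_percent = c(50, 60, 40, 44)
)

# Grouped: summarize() returns one row per group
by_party <- winners |>
  group_by(party_name) |>
  summarize(mean_winner_share = mean(vote_share_percent))

# Ungrouped: summarize() collapses the whole table to a single row
overall <- winners |>
  summarize(mean_winner_share = mean(vote_share_percent))
```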

6. Have a go

Can you find …

  • The candidate who won the most votes in the country?
  • The candidate with the largest vote share in the country?
  • The median vote share of winning candidates?
  • The largest vote share swing from the previous election?
  • Overall turnout for the whole country?
  • Median turnout across constituencies?