Skip to content

Latest commit

 

History

History
80 lines (59 loc) · 5.15 KB

readme.md

File metadata and controls

80 lines (59 loc) · 5.15 KB

US House Election Results

It's election day in the United States! To celebrate, the data this week comes from the MIT Election Data and Science Lab (MEDSL). Hat tip this week to the RStudio GitHub Copilot integration, which suggested the MEDSL.

From the MEDSL's report New Report: How We Voted in 2022:

The Survey of the Performance of American Elections (SPAE) provides information about how Americans experienced voting in the most recent federal election. The survey has been conducted after federal elections since 2008, and is the only public opinion project in the country that is dedicated explicitly to understanding how voters themselves experience the election process.

We're specifically providing data on House elections from 1976-2022. Check out the MEDSL website for additional datasets and tools.

Be sure to cite the MEDSL in your work:

@data{DVN/IG0UN2_2017,
author = {MIT Election Data and Science Lab},
publisher = {Harvard Dataverse},
title = {{U.S. House 1976–2022}},
UNF = {UNF:6:A6RSZvlhh8eRZ4+mvT/HRQ==},
year = {2017},
version = {V12},
doi = {10.7910/DVN/IG0UN2},
url = {https://doi.org/10.7910/DVN/IG0UN2}
}

The Data

# Option 1: tidytuesdayR package 
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2023-11-07')
## OR
tuesdata <- tidytuesdayR::tt_load(2023, week = 45)

house <- tuesdata$house

# Option 2: Read directly from GitHub

house <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/main/data/2023/2023-11-07/house.csv')

How to Participate

  • Explore the data, watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about causation in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
  • Create a visualization, a model, a shiny app, or some other piece of data-science-related output, using R or another programming language.
  • Share your output and the code used to generate it on social media with the #TidyTuesday hashtag.

Data Dictionary

house.csv

variable class description
year double year in which election was held
state character state name
state_po character U.S. postal code state abbreviation
state_fips double State FIPS code
state_cen double U.S. Census state code
state_ic double ICPSR state code
office character U.S. House (constant)
district character district number. At-large districts are coded as 0 (zero)
stage character electoral stage (gen = general elections, pri = primary elections)
runoff logical runoff election
special logical special election
candidate character name of the candidate as it appears in the House Clerk report
party character party of the candidate (always entirely lowercase) (Parties are as they appear in the House Clerk report. In states that allow candidates to appear on multiple party lines, separate vote totals are indicated for each party. Therefore, for analysis that involves candidate totals, it will be necessary to aggregate across all party lines within a district. For analysis that focuses on two-party vote totals, it will be necessary to account for major party candidates who receive votes under multiple party labels. Minnesota party labels are given as they appear on the Minnesota ballots. Future versions of this file will include codes for candidates who are endorsed by major parties, regardless of the party label under which they receive votes.)
writein logical vote totals associated with write-in candidates
mode character mode of voting; states with data that doesn't break down returns by mode are marked as "total"
candidatevotes double votes received by this candidate for this particular party
totalvotes double total number of votes cast for this election
unofficial logical TRUE/FALSE indicator for unofficial result (to be updated later); this appears only for 2018 data in some cases
version double date when this dataset was finalized
fusion_ticket logical A TRUE/FALSE indicator as to whether the given candidate is running on a fusion party ticket, which will in turn mean that a candidate will appear multiple times, but by different parties, for a given election. States with fusion tickets include Connecticut, New Jersey, New York, and South Carolina.

Cleaning Script

Clean data and dictionary downloaded from the Harvard Dataverse