This project explores the likelihood of the Spoiler Effect in Ranked Choice Voting (RCV) systems.
In 2019, New York City (NYC) voted to adopt a ranked choice system of voting. Being eligible to vote in this election, one of this team's members began researching RCV systems' pros & cons. One 'pro' that stood out was the claim that RCV systems did away with something called the "Spoiler Effect."
This project determines if that claim is true.
(Note that this project is being built in the team's free time & is still a work in progress.)
Check out this video for a more in-depth explanation.
Below is a nice example to read through:
Even though the Spoiler Effect can happen in almost any electoral system, proponents of Ranked Choice Voting say that it is so uncommon a phenomenon that voters shouldn't have to worry about it. On the other hand, though, a good number of people also say the opposite. So, what's really going on?
tktk -- diff btwn exhausted ballot & spoiled effect
Ranked Choice Voting is "an electoral system in which voters rank candidates by preference on their ballots." A typical ballot in an RCV election looks like this:
As you can see, instead of voting for a single candidate, voters are asked to rank the candidates in order of preference.
Note: sites like Election Science also call RCV "Instant Runoff Voting."
Big thanks go to all the researchers & authors who helped us understand the intricacies of different voting systems throughout this project. Particular gratitude goes to:
- Jon Tingvold, the author of the `pyrankvote` package, without which this project would have been extremely difficult
- Paul Butler, a self-proclaimed quant out of NYC whose website https://ranked.vote helped us model our synthetic data for this project
- Aaron Hamlin, the Executive Director of The Center for Election Science, who put us in touch with Paul & generally helped us navigate the complicated world of voting systems.
Finding data with which to train our model was difficult. There are hardly any public datasets that include even one spoiled RCV election, let alone tens of thousands. For this reason, we chose one of the most famous spoiled RCV elections as a model for our synthetic data: Burlington's 2009 mayoral election. In this election, a candidate named Kurt Wright acted as a Spoiler. Without Wright in the contest, Andy Montroll would have won instead of the actual winner, Bob Kiss. After this election, Burlington voted to repeal RCV in 2010.
After inspecting the normalized data from this election, we concluded that it took the following features to make an "election":
- A certain amount of noise across the ballots
- A specific number of candidates
- A likely distribution of votes
- A specific number of ballots
Noise
To model the 'noise' that one would expect to see in real-world elections, we created "partial" ballots. To make these partial ballots, our script chooses a random number between 1 and 15, which represents the percentage of ballots in a given election that are missing a ranking for at least one candidate. For example, in an election with 4 candidates, a 'noisy' (i.e. partial) ballot might rank only 2 of the 4 candidates instead of all 4.
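A minimal sketch of how this kind of noise could be generated is shown here. It is not the project's actual script: the function name `make_partial_ballots`, the list-of-lists ballot representation, and the choice to truncate a ballot at a random position are all assumptions made for illustration.

```python
import random

def make_partial_ballots(ballots, rng, min_pct=1, max_pct=15):
    """Return a copy of `ballots` in which a random 1-15% are truncated.

    Assumption: a "partial" ballot is a full ranking cut off after a
    random number of positions (at least one ranking is always kept).
    """
    pct = rng.randint(min_pct, max_pct)            # percentage of noisy ballots
    n_partial = round(len(ballots) * pct / 100)    # how many ballots to truncate
    chosen = set(rng.sample(range(len(ballots)), n_partial))
    noisy = []
    for i, ranking in enumerate(ballots):
        if i in chosen and len(ranking) > 1:
            keep = rng.randint(1, len(ranking) - 1)  # keep 1..n-1 rankings
            noisy.append(list(ranking[:keep]))
        else:
            noisy.append(list(ranking))
    return noisy, pct

rng = random.Random(0)  # seeded only so the sketch is repeatable
full = [["A", "B", "C", "D"] for _ in range(200)]
noisy, pct = make_partial_ballots(full, rng)
n_short = sum(len(b) < 4 for b in noisy)
print(f"{pct}% noise -> {n_short} partial ballots out of {len(noisy)}")
```

Passing in a seeded `random.Random` instance keeps each synthetic election reproducible, which helps when a flagged election needs to be inspected later.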
Number of Candidates
After inspecting the number of candidates in various real-world RCV elections, we found that most elections had between 3 and 8 candidates. Since the simplest type of election to model the Spoiler Effect would be an election with 3 candidates, our script picks a number of candidates for each election between 3 and 8, with 3 candidates being the most likely to occur.
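One way to draw a candidate count that favors 3 is a weighted random choice. The sketch below uses `random.choices` from the standard library. The specific weights are an assumption picked for illustration, since the exact distribution used by the project is not stated here.

```python
import random

CANDIDATE_COUNTS = [3, 4, 5, 6, 7, 8]
# Hypothetical weights: 3 candidates is the most likely, 8 the least likely.
COUNT_WEIGHTS = [6, 5, 4, 3, 2, 1]

def pick_num_candidates(rng):
    """Draw the number of candidates for one synthetic election."""
    return rng.choices(CANDIDATE_COUNTS, weights=COUNT_WEIGHTS, k=1)[0]

rng = random.Random(1)
draws = [pick_num_candidates(rng) for _ in range(5000)]
for n in CANDIDATE_COUNTS:
    print(n, draws.count(n))
```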
Distribution of votes
Since it's unlikely in reality that each candidate would get an equal share of votes in an election, we decided to randomly generate weights & assign them to each candidate per election ("ballot-weight" below). The higher a candidate's weight, the more votes that candidate received.
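As a rough illustration, random weights can be drawn and then normalized so that they sum to 1. The helper name `random_ballot_weights` and the small floor of 0.01 (which keeps every candidate's weight strictly positive) are assumptions for this sketch, not details taken from the project.

```python
import random

def random_ballot_weights(n_candidates, rng):
    """Return one positive weight per candidate, normalized to sum to 1."""
    raw = [rng.random() + 0.01 for _ in range(n_candidates)]  # floor avoids zero weights
    total = sum(raw)
    return [w / total for w in raw]

rng = random.Random(2)
weights = random_ballot_weights(4, rng)
print([round(w, 3) for w in weights])
```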
A specific number of ballots
Taking into account the number of candidates in each election and each candidate's ballot-weight, we generated a certain number of ballots per election. We constrained the number of ballots per election to be between 50 and 50,000.
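The sketch below shows one plausible way to turn ballot-weights into ranked ballots: each ballot is built by repeatedly drawing a candidate without replacement, with heavier candidates more likely to land in earlier positions. This sampling scheme and the name `generate_ballots` are assumptions. The project's own generator may work differently.

```python
import random

def generate_ballots(candidates, weights, n_ballots, rng):
    """Build `n_ballots` full rankings via weighted draws without replacement."""
    ballots = []
    for _ in range(n_ballots):
        pool = list(candidates)
        pool_weights = list(weights)
        ranking = []
        while pool:
            # Pick the next-ranked candidate in proportion to remaining weights.
            i = rng.choices(range(len(pool)), weights=pool_weights, k=1)[0]
            ranking.append(pool.pop(i))
            pool_weights.pop(i)
        ballots.append(ranking)
    return ballots

rng = random.Random(42)
n_ballots = rng.randint(50, 50_000)  # ballot count constrained to 50..50,000
ballots = generate_ballots(["A", "B", "C"], [0.7, 0.2, 0.1], n_ballots, rng)
first_choices = [b[0] for b in ballots]
print(n_ballots, {c: first_choices.count(c) for c in "ABC"})
```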
What we chose not to model
We chose not to model subjective features that come with any election, such as:
- Party affiliation of voters & candidates
- Location & date of election
- Voter & candidate gender, sex, & socio-economic demographics
We chose not to include these features in our modeled data in order to mitigate bias. Since the Burlington '09 election was our only real source of a spoiled RCV election, we did not feel comfortable using it to model these more subjective features. We also did not want to allow for the possibility that our data could lead to spurious correlations, such as "elections with all-male candidates result in the Spoiler Effect."
A simple way to identify a spoiled election is to check whether the winning candidate is also the Condorcet winner: if not, the election was spoiled. The Condorcet winner is the candidate who wins the majority vote in every pairwise (head-to-head) contest, as visualized below:
You can see that a `1` indicates the winning candidate, and a `0` indicates the losing candidate. This matrix represents a single ballot. In order to calculate the Condorcet winner across an entire election, you simply add each ballot's Condorcet matrix together & take the candidate who won the most head-to-head/pairwise contests.
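The matrix approach described above can be sketched in plain Python. The function names are hypothetical, and two choices here are assumptions rather than the project's confirmed behavior: a ranked candidate is treated as beating every candidate left off a partial ballot, and `condorcet_winner` returns `None` when no candidate wins all of their head-to-head contests.

```python
def ballot_matrix(ballot, candidates):
    """Pairwise matrix for one ballot: m[i][j] == 1 if i is preferred to j."""
    idx = {c: i for i, c in enumerate(candidates)}
    n = len(candidates)
    m = [[0] * n for _ in range(n)]
    ranked = set(ballot)
    for pos, winner in enumerate(ballot):
        for loser in ballot[pos + 1:]:          # ranked below on this ballot
            m[idx[winner]][idx[loser]] = 1
        for loser in candidates:                # assumption: unranked lose to ranked
            if loser not in ranked:
                m[idx[winner]][idx[loser]] = 1
    return m

def election_matrix(ballots, candidates):
    """Element-wise sum of every ballot's pairwise matrix."""
    n = len(candidates)
    total = [[0] * n for _ in range(n)]
    for ballot in ballots:
        m = ballot_matrix(ballot, candidates)
        for i in range(n):
            for j in range(n):
                total[i][j] += m[i][j]
    return total

def condorcet_winner(total, candidates):
    """Candidate who beats every rival head-to-head, or None if there is none."""
    n = len(candidates)
    for i in range(n):
        if all(total[i][j] > total[j][i] for j in range(n) if j != i):
            return candidates[i]
    return None

candidates = ["A", "B", "C"]
ballots = [["A", "B", "C"]] * 3 + [["B", "C", "A"]] * 2
total = election_matrix(ballots, candidates)
print(total, condorcet_winner(total, candidates))
```

In this toy election A beats both B and C by 3 ballots to 2, so A is the Condorcet winner.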
Using these matrices, we calculate the Condorcet winner, then calculate the `pyrankvote` winner, and then compare the two. If the winners were the same, the election was not spoiled; if they differed, the election was spoiled.
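The comparison can be illustrated end to end with a small self-contained sketch. To keep it free of dependencies, it uses a simplified instant-runoff routine (`irv_winner`) as a stand-in for the `pyrankvote` winner. That stand-in, its alphabetical tie-break, the helper names, and the decision not to flag elections that have no Condorcet winner are all assumptions for illustration. The vote counts are made up and are not the Burlington data.

```python
from collections import Counter

def irv_winner(ballots, candidates):
    """Simplified instant-runoff count (stand-in for the pyrankvote result)."""
    remaining = set(candidates)
    while True:
        tally = Counter({c: 0 for c in remaining})
        for ballot in ballots:
            for c in ballot:                  # highest-ranked remaining candidate
                if c in remaining:
                    tally[c] += 1
                    break
        total = sum(tally.values())
        leader, votes = max(tally.items(), key=lambda kv: kv[1])
        if votes * 2 > total or len(remaining) == 1:
            return leader
        # Eliminate the candidate with the fewest votes (ties: alphabetical).
        remaining.remove(min(sorted(remaining), key=lambda c: tally[c]))

def prefers(ballots, a, b):
    """Number of ballots ranking a above b (unranked counts as last)."""
    return sum(
        1 for ballot in ballots
        if a in ballot and (b not in ballot or ballot.index(a) < ballot.index(b))
    )

def condorcet_winner(ballots, candidates):
    for a in candidates:
        if all(prefers(ballots, a, b) > prefers(ballots, b, a)
               for b in candidates if b != a):
            return a
    return None

def is_spoiled(ballots, candidates):
    """True when a Condorcet winner exists and differs from the runoff winner."""
    cw = condorcet_winner(ballots, candidates)
    return cw is not None and cw != irv_winner(ballots, candidates)

candidates = ["L", "C", "R"]
ballots = ([["R", "C", "L"]] * 37 + [["L", "C", "R"]] * 34
           + [["C", "L", "R"]] * 29)
print(irv_winner(ballots, candidates),        # L
      condorcet_winner(ballots, candidates),  # C
      is_spoiled(ballots, candidates))        # True
```

Here C has the fewest first choices and is eliminated first, handing the runoff to L, even though C would beat either rival head-to-head (66 to 34 against L, 63 to 37 against R).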
We validated this method on the Burlington '09 dataset.
Note: our algorithm does not identify which one of the candidates acted as the Spoiler.
- We have noticed that the more candidates there are in an election, the more spoiled elections there are. This is in line with intuition: since there are more candidates, there are more head-to-head contests (i.e. a larger pairwise matrix), which results in an increased likelihood that the Condorcet winner is not also the actual (i.e. `pyrankvote`) winner.
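To put numbers on that intuition: with n candidates there are n(n-1)/2 distinct head-to-head contests. The short snippet below only counts contests for the 3 to 8 candidate range used in this project. It says nothing about how often spoiling actually occurs.

```python
import math

def pairwise_contests(n_candidates):
    """Number of distinct head-to-head contests among n candidates."""
    return math.comb(n_candidates, 2)

for n in range(3, 9):
    print(n, pairwise_contests(n))
```

An election with 3 candidates has 3 pairwise contests, while one with 8 candidates has 28.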
The histogram below has been normalized for the number of ballots in each election.
- Streamlit app tktk
tktk
- http://math.hws.edu/eck/math110_f08/voting.html
- Note: Spoiler Effect can happen in any election, but it is apparently extremely unlikely in ranked choice voting systems
- https://www.washingtonpost.com/news/the-fix/wp/2014/10/08/how-often-do-third-party-candidates-actually-spoil-elections-not-very/
- https://mainecampus.com/2020/03/ranked-choice-voting-is-unconstitutional-and-undemocratic/
- Will confuse voters
- https://www.themainewire.com/2020/02/ranked-choice-voting-in-alaska-rcv-fails-to-deliver-on-its-promises-to-voters/
- Even in RCV, 2 parties will still dominate: https://www.rangevoting.org/TarrIrv.html
- https://minguo.info/election_methods/irv