What effects would Approval Voting or Score Voting have for San Francisco?

Contents

  1. Purpose
  2. What are they?
    2.1 Approval Voting
    2.2 Score Voting
  3. Pros and cons
    3.1 Complexity/Simplicity
      3.1.1 Ballot spoilage
      3.1.2 Voting machines
      3.1.3 Precinct summability (precinct subtotals)
      3.1.4 Risk of tie (or near-tie) recount nightmares
      3.1.5 Intuitive understanding
    3.2 Resistance to tactical voting
    3.3 Spoilers
    3.4 Paradoxes
      3.4.1 Independence of Irrelevant Alternatives
      3.4.2 Participation / No-show paradox
      3.4.3 Non-additive
      3.4.4 Non-monotonicity
      3.4.5 Reversal symmetry
    3.5 Better average voter satisfaction
  4. Criticisms
    4.1 Doesn’t Approval Voting violate “one person one vote”?
    4.2 Score/Approval Voting can fail to elect the favorite of a majority
      4.2.1 Burlington, Vermont – 2009 mayoral election
      4.2.2 Frome electoral district, South Australia – 2009 House of Assembly by-election
    4.3 Score/Approval Voting will degenerate into Plurality, due to “bullet voting”
      4.3.1 Empirical data
      4.3.2 Other uses of Approval Voting
  5. FairVote

Purpose

The purpose of this document is to discuss the properties of Score Voting (aka Range Voting), as well as its simplified variant known as Approval Voting. Specifically, we compare these systems to Instant Runoff Voting (aka IRV, Ranked Choice Voting, or RCV), as well as to the Plurality+Runoff (aka Top-Two Runoff or TTR) system previously used in San Francisco.

What are they?

Approval Voting

Approval Voting is virtually identical to Plurality Voting, except it removes the prohibition against voting for multiple candidates. So-called “over votes” are counted instead of being treated as “spoiled” ballots. Here is a picture of what an Approval Voting ballot might look like. Note that it is just a Plurality Voting ballot, with slightly different instructions indicating that the voter may vote for more than one candidate.
 

[Image: an Approval Voting ballot]

 
This voter has voted for (“approved”) three of the six candidates. Here is a more in-depth description of Approval Voting by NYU political science professor Steve Brams, who has been a proponent of Approval Voting since the 1970s.
 
 

Score Voting

With Score Voting, voters rate the candidates on a scale such as 0-9 or 1-5. This system is widely used to rate products, movies, and restaurants, particularly on the internet (e.g. Yelp, Netflix). Note that Approval Voting is effectively/mathematically just Score Voting on a 0-1 scale. Here is a picture of what a Score Voting ballot might look like, using a simplified 0-2 scale.
[Image: a Score Voting ballot]
 

The darkened circles represent the voter’s ratings for each candidate. Three candidates get the maximum score, one candidate gets the middle score, and two candidates get the minimum score.
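As a rough illustration of how such ballots could be tallied, here is a minimal Python sketch. The candidate names A through F and all of the ballots are made up for illustration, and counting a blank score as 0 is an assumption (other conventions are possible). The second tally shows the point made above: an Approval ballot is just a score ballot restricted to 0 and 1.

```python
def score_totals(ballots, candidates):
    """Add up every voter's score for every candidate."""
    totals = {c: 0 for c in candidates}
    for ballot in ballots:
        for c in candidates:
            # Assumption: a candidate left blank counts as the minimum score, 0.
            totals[c] += ballot.get(c, 0)
    return totals


def winner(totals):
    """Highest total wins (ties are not handled in this sketch)."""
    return max(totals, key=totals.get)


candidates = ["A", "B", "C", "D", "E", "F"]  # hypothetical names

# Score ballots on the simplified 0-2 scale.
score_ballots = [
    {"A": 2, "B": 2, "C": 2, "D": 1, "E": 0, "F": 0},  # three max, one middle, two min
    {"A": 0, "B": 2, "C": 1, "D": 2, "E": 0, "F": 1},
    {"A": 1, "B": 0, "C": 2, "D": 0, "E": 2, "F": 0},
]
score_result = score_totals(score_ballots, candidates)

# Approval ballots: the same tally on a 0-1 scale.
approval_ballots = [
    {"A": 1, "C": 1, "D": 1},
    {"C": 1},
    {"A": 1, "B": 1, "C": 1},
]
approval_result = score_totals(approval_ballots, candidates)

print(score_result, winner(score_result))
print(approval_result, winner(approval_result))
```

The same function serves both systems because the only difference between them is the set of allowed scores.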

Pros and cons

We begin by looking at the more objective/provable aspects of these systems, and then move into some more theoretical aspects.

Complexity/Simplicity

Score Voting and Approval Voting are simpler than IRV according to a number of objective metrics.

Ballot spoilage

Since its introduction in San Francisco, IRV has exhibited about seven times as many spoiled ballots as Plurality Voting, on average.
 
In contrast, Score Voting and Approval Voting have experimentally resulted in even lower ballot spoilage than Plurality Voting. Approval Voting exhibits about one fifth as many spoiled ballots as Plurality Voting (i.e. IRV exhibits around 35 times as many spoiled ballots as Approval Voting; 7*5=35). Score Voting is almost as good as Approval Voting, and still better than Plurality Voting in this regard.
 
This makes sense when you consider that Approval Voting is effectively nothing more than “counting over votes” rather than treating them as spoiled. Score Voting is a bit more complex, but it allows voters to assign the same score to multiple candidates, whereas you can’t assign the same rank to multiple candidates with IRV — that will be counted as a spoiled ballot, or a partially spoiled ballot, depending on which column is affected. And even if you assign two scores to the same candidate (which can’t be counted), we can still count all the scores for all the other candidates in that race.

Voting machines

Approval Voting and Score Voting can be handled by any kind of voting machine capable of handling multiple Plurality elections, i.e. every voting machine in the USA, without modification. This is not the case with IRV. FairVote executive director and prominent IRV advocate Rob Richie admitted in print in 2008 that no voting machine in the USA comes, out of the box, ready to handle IRV elections. Many computerized voting machines have been decertified by their states (e.g. California, Colorado, and Ohio).
 
One might argue that this is irrelevant, since San Francisco already has voting machines that can handle IRV elections. But over time, these machines wear out and have to be replaced, so more complex voting machines will incur a greater ongoing cost than simple mechanical “dumb totaling” Plurality machines. The cost of buying such machines for just one typical county far exceeds the entire nationwide budgets of all of the USA’s “third parties” combined. Based on this monetary comparison, there is a legitimate question of who has more power: voting machine manufacturers, or voting-method reformers.
 
Cost aside, a bigger concern is election integrity. The great advantage of mechanical lever machines is precisely that they do not contain computers, and therefore tampering is fairly hard to do and easy to detect. (Low-tech is better, many voting advocates say, and we think with some justification.)

Now for this reason, we think many punch card machines and optical scan machines are intentionally designed not to have computers, i.e. to have counters that are mechanical, electromechanical, or hardwired electrical. Or if they do have computers, those computers are designed to be ultrahard to reprogram, e.g. soldered-in single-chip computers with the program in built-in ROM.

If you want to know the answers to these questions for any particular machine type, good luck. Basically, the voting machine manufacturers lie constantly and make up different stories depending on who they think is listening. (See this account of how they created the myth of “independent testing authorities” for voting machines, and this one about a lawsuit against Diebold Inc. for telling lies.) If they think the listener wants security and incredible tamperproofness, they tell story number 1. So you generally cannot trust what they say. There are numerous examples of that.

Precinct summability (precinct subtotals)

For IRV, the situation is even worse, because IRV is non-additive, and therefore cannot be tabulated based on precinct subtotals. So even if your machines have computers, and even if those computers are totally programmable, those machines still cannot handle counting IRV votes unless they are all connected together in a giant network, or unless just one machine has all the votes, i.e. all the ballots in the whole city are first shipped to a single central counting agency (i.e. City Hall). That is because, in IRV, one machine needs to know the totals from all the others just in order to count its own personal stash of votes. For instance, this message appeared on the San Francisco city government website for several weeks after their 2008 IRV elections.

 

“Due to the requirement that all ballots must be centrally tallied in City Hall and not at the polling places, the Department of Elections has not set a date for releasing any preliminary results using the ranked-choice voting method.”

 
In contrast, with Score Voting or Approval Voting, a machine can total its own personal stash without knowing anything about the rest of the world. And the precinct subtotals can be summed to produce the final result.
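The additivity claim can be sketched in a few lines of Python. The two precincts and their ballots below are invented for illustration. Each precinct totals its own Approval ballots, and adding the subtotals gives the same result as counting every ballot centrally.

```python
from collections import Counter


def approval_subtotal(ballots):
    """Total one precinct's Approval ballots without looking at any other precinct."""
    subtotal = Counter()
    for approved in ballots:
        subtotal.update(set(approved))  # at most one vote per candidate per ballot
    return subtotal


# Hypothetical ballots: each ballot is the set of approved candidates.
precinct_1 = [{"X", "Y"}, {"X"}, {"Z"}]
precinct_2 = [{"Y"}, {"Y", "Z"}, {"X", "Y"}]

# Sum of precinct subtotals.
summed = approval_subtotal(precinct_1) + approval_subtotal(precinct_2)

# Central count of all ballots in one place.
central = approval_subtotal(precinct_1 + precinct_2)

print(summed == central)  # True
print(dict(summed))
```

The same holds for Score Voting, since each candidate's total is also a plain sum over ballots.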

Risk of tie (or near-tie) recount nightmares

IRV is most likely to lead to tie and near-tie chad-counting lawsuit nightmare scenarios. Score Voting is least likely. While these situations are rare, they can be quite costly and time-consuming when they do occur.

Intuitive understanding

One basic “sanity check” of voter comprehension is to have random individuals fill out a ballot, and then ask them how they intuitively think the result is tabulated. (Some have argued that it doesn’t matter whether voters understand the counting system, so long as they know how to correctly fill out a ballot. We’ll explore that at the end of this section.)
 
In my experience talking to voters, Score Voting does remarkably well in this regard. For instance, here’s a small Score Voting exit poll I conducted in the southeast Texas town of Beaumont, in 2006. The voters had no prior education on Score Voting, nor any warning that an exit poll would be conducted. Yet they all seemed to understand how the system worked, without any indication of confusion. (The only mistake was that one voter assigned two different scores to the same candidate. But we were still able to use the scores that voter assigned to all of the other candidates.) My impression was that Score Voting was familiar to these voters precisely because of its prevalence in modern society.
 
Approval Voting is admittedly less intuitive than Score Voting. Because of the prevalence of Plurality Voting, a voter who does not read the instructions to “vote for one or more candidates” will likely vote Plurality-style, for a single candidate (which is still valid and countable, not spoiled). However, because the actual operation of Approval Voting is so simple to describe, we believe it will be much easier to educate the public (and to have that education stick) than with IRV. Simply put, there’s not an “algorithm” to remember — just a single clear change to the rules of Plurality Voting. “Count all the votes.”
 
Instant Runoff Voting generally fails this sanity check, because its actual operation does not match most voters’ intuitive expectations. In December 2011, an IRV usability study was conducted by Dana Chisnell, a user interface and usability expert who served for five years as the mayor’s appointee to the Ballot Simplification Committee (it used an Alameda ballot which is very similar to San Francisco’s). Here are some of the highlights from her initial observations and insights.
  • Voters don’t read instructions (hence, voter education is probably not the only answer). It’s unclear that reading the instructions for ranked choice voting would help people vote as they intend, because the instructions are only about how to mark the ballot, not about how the votes are counted or what the consequences of ranking are. In addition, the instructions and explanation of ranked choice voting are separate from the ballot or too far away from where and how voters make their ranking decisions.
  • Some participants remarked that because they didn’t understand how the votes were counted that they didn’t trust the system.
  • Very few people accurately described how ranked choice votes are counted, even in zip codes with high education and socio-economic levels; people in poorer neighborhoods had even more difficulty describing how their ranked choices were counted. Most of these participants had voted in the most recent election. In Rockridge, most of the participants said they’d voted in the Oakland mayoral election in 2010. One participant in Oakland was a congressional aide who was tentative about his understanding of how ranked choice voting worked.
  • Many participants theorized that ranked choice operates on a weighted or point system. A few participants suggested that it was for breaking ties, but they could not describe how the second place votes were tallied. It was not uncommon for participants to talk themselves into corners as they tried to describe how the counting was done. Most ended with “I don’t know.”
This last observation matches my own polling of San Francisco residents. The vast majority cannot accurately describe how IRV works, and tend to assume it’s a “weighted or point system”. Here’s an example (online instant message) conversation I had with a very intelligent software engineer I used to work with, who has voted in IRV elections in San Francisco.
 
me
hey man.
can you do me a favor?
 
coworker
sure
what’s up
 
me
okay, i need to use you as a guinea pig for a voting op-ed i’m writing.
it will just take a minute.
 
coworker
k
 
me
don’t look up anything.
i want to see how much a typical sf voter knows about our voting system.
so you know how we rank our choices on the ballot?
so like, X>Y>Z means the voter likes X first, then Y, then Z.
make sense?
 
coworker
yes
 
me
okay, so look at this list and tell me who wins.
% of voters – their ranking
35% W > Y > Z > X
17% X > Y > Z > W
32% Y > Z > X > W
16% Z > X > Y > W
 
coworker
well… i could do math here right?
do you want me to or not?
 
me
yes.
tell me who wins.
without looking anything up.
 
coworker
looks like Y
 
me
why do you think it’s Y?
how did you arrive at that answer?
 
coworker
because in the first row it’s next to the winning W, and in between row#2, and #3, it’s pretty high up
basically according to weighted average it’d roughly be a leader
 
me
nope.
the winner is X.
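For readers who want to check the result, here is a minimal Python sketch of the IRV count applied to the four-candidate example from the conversation, with the percentages treated as 100 voters. The function name `irv` and the arbitrary handling of exact ties are choices made for this sketch, not part of any official counting rule.

```python
def irv(profile):
    """profile: list of (number_of_voters, ranking) pairs, ranking best-first.

    Returns (winner, list of per-round tallies).
    """
    remaining = {c for _, ranking in profile for c in ranking}
    rounds = []
    while True:
        # Each ballot counts for its highest-ranked candidate still in the race.
        tally = {c: 0 for c in remaining}
        for count, ranking in profile:
            for c in ranking:
                if c in remaining:
                    tally[c] += count
                    break
        rounds.append(tally)
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > sum(tally.values()) or len(remaining) == 1:
            return leader, rounds
        # Eliminate the candidate with the fewest votes (exact ties broken arbitrarily).
        remaining.remove(min(tally, key=tally.get))


profile = [
    (35, "WYZX"),
    (17, "XYZW"),
    (32, "YZXW"),
    (16, "ZXYW"),
]
irv_winner, rounds = irv(profile)
for number, tally in enumerate(rounds, start=1):
    print("round", number, tally)
print("winner:", irv_winner)  # X
```

Z is eliminated first and its ballots move to X. Then Y is eliminated, and because Z is already gone, those ballots also move to X, which finishes with 65 to W's 35.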
 
San Francisco has tried to remedy this problem with a large ongoing public education campaign. See this example, and another, and another. The problem is that this campaign focuses on how to correctly fill out a ballot, not how the counting is performed. This leads us back to the question…
 
Why does it matter whether voters understand how the counting is performed, as long as they know how to fill out a ballot?

Some people have suggested that voters don’t need to understand how IRV works, as long as they understand how to correctly cast a ballot. For instance, Supervisor Malia Cohen made the analogy that she doesn’t need to understand the laws of gravity or the tax code in order to use them.
 
But there are some important consequences that arise when a significant portion of the electorate does not understand the underlying voting system. One finding that sticks out from Chisnell’s study is the simple observation that many people do not feel a sense of trust in a system they don’t understand. An arguably more significant issue is the effect that arises when voters naively assume that IRV is a “weighted or point system”. This belief tends to lead voters into using what we have dubbed the “Naive Exaggeration Strategy”, in which voters “polarize” the presumed frontrunners. Here is a simplified example, using four candidates from the 2011 San Francisco mayoral race (the presumed frontrunners are Avalos and Lee).
 
Sincere preferences
1st     2nd     3rd   4th
Rees    Avalos  Lee   Hall

Tactical exaggeration
1st     2nd     3rd   4th
Avalos  Rees    Hall  Lee
 
The idea here is that the voter is trying to help ensure that Avalos (the “lesser evil” frontrunner) will defeat Lee (the “greater evil” frontrunner). The typical response from an IRV advocate will be that this “strategy” doesn’t make sense, because if the voter ranks Rees in first place, and Rees doesn’t win, then that vote will go to Avalos. Our response:
  1. That assumes that the voter understands the IRV counting mechanism, which as we explained above, a large segment of the population does not understand.
  2. Actually, that point about the vote transferring from Rees to Avalos isn’t necessarily correct, because it’s possible with IRV for a situation to exist in which the voter’s second choice is eliminated before the first choice. And in such situations a voter (or group of voters) can get a better result by insincerely ranking the second choice in first place, to avoid this pitfall.
This latter situation is precisely what occurred in the 2009 IRV mayoral race in Burlington, Vermont. A group of voters who preferred Republican over Democrat over Progressive could have caused the Democrat to win instead of the Progressive (their “second choice” instead of their “third choice”) if even a small fraction of them had insincerely ranked the Democrat ahead of the Republican. An IRV advocate may have told them, “it’s safe to vote for the Republican, because if he loses, your vote will go to the Democrat.” But that is wrong because the Democrat had already been eliminated before the Republican, even though the Democrat would have defeated the Progressive (and the Republican) if he had made it to the final round. We know this because we have the full ranked ballot data.
 
A common defense against this criticism is that voters would have to know that this was going to happen in order to take advantage of such a strategy. But statistical analysis reveals that this is simply not correct. Consider a Nader supporter who tactically votes for Gore under the Plurality system. That voter does not have to know that a vote for Nader will cause Bush to win in order to know that it’s tactically better to vote for Gore. He or she only has to know that a vote for Nader is more likely to cause Bush to defeat Gore than to cause Nader to win. The tactical voter uses strategy probabilistically, just in case it helps. The same goes for IRV.
 
In any case, this digression is mostly academic since, as we previously stated, the vast majority of voters who use this tactic do it not because they are incredibly informed about the voting system and the tactics required to game it successfully, but precisely because they do not understand the system. They will just do it “naively”, based on their intuition of how IRV works. And as the link in the preceding paragraph explains, the more people use that strategy, the more it causes IRV to degenerate toward Plurality Voting.
 
To really drive this point home, I note that I placed a call to the Australian Green Party on June 14, 2010 at 02 6140-3217. Australia has used IRV in its House of Representatives (which has 150 seats) since 1919. The man who took my call explained that one of the most common calls he gets is essentially, “why should I vote for the Green Party, when that’s just wasting my vote?” Why would people say such a thing if they have voted with IRV their whole lives? He explained that there is widespread voter misunderstanding of the preferential system, even though voters start learning about it in grade school.
 
Score Voting and Approval Voting don’t degenerate to Plurality Voting under this tactic

As we explained two links back, the “Naive Exaggeration Strategy” causes IRV (and almost any ranked voting system) to degenerate toward Plurality Voting, so long as voters have a reasonably good sense of who the frontrunners are. But this problem does not happen with Score Voting or Approval Voting. Let us slightly modify the previous example. Again, the perceived frontrunners are Avalos and Lee.
 
Sincere preferences
Rees  Avalos  Lee  Hall
10    7       4    0

Tactical exaggeration
Rees  Avalos  Lee  Hall
10    10      0    0
Perceiving Avalos and Lee to be the frontrunners, and not wanting to waste one iota of vote power, this tactical voter has “polarized” Avalos and Lee, giving Avalos the maximum score of “10” and Lee the minimum score of “0” (this example additionally bolsters our refutation of the “bullet voting” criticism toward the end of this document). But note that unlike with a ranked ballot, this in no way necessitates pushing Rees down or elevating Hall. In election theory parlance we say that Score and Approval Voting satisfy the Favorite Betrayal Criterion, meaning that they never give a voter an incentive not to maximally support his or her sincere favorite candidate. This powerfully mitigates the negative effects of tactical behavior. It means that Rees can win if enough voters favor her, even if 100% of them think she has no chance. The same basic principle applies to Approval Voting: if the voter was only going to sincerely approve Rees, then the tactical route would be to approve Rees and Avalos, which gives the voter no incentive to betray Rees. Likewise, if the voter was going to sincerely approve everyone but Hall, the tactical route would be to not approve Lee, so as to help Avalos defeat Lee. But once again, the voter has no reason not to support Rees (or anyone else preferred to the frontrunners).
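The exaggeration described above can be written as a small rule. The Python sketch below is one possible formalization and is an assumption of this example, not a definition taken from the literature: anyone scored at least as high as the preferred frontrunner is raised to the maximum, anyone scored no higher than the other frontrunner is dropped to the minimum, and anyone in between keeps the sincere score.

```python
def exaggerate(sincere, better_front, worse_front, lo=0, hi=10):
    """Polarize a sincere score ballot around two perceived frontrunners.

    Hypothetical rule for illustration:
      - score >= the preferred frontrunner's score -> maximum
      - score <= the other frontrunner's score     -> minimum
      - anything in between                        -> unchanged
    """
    high_bar = sincere[better_front]
    low_bar = sincere[worse_front]
    if high_bar <= low_bar:
        raise ValueError("better_front must be scored above worse_front")
    tactical = {}
    for candidate, score in sincere.items():
        if score >= high_bar:
            tactical[candidate] = hi
        elif score <= low_bar:
            tactical[candidate] = lo
        else:
            tactical[candidate] = score
    return tactical


sincere = {"Rees": 10, "Avalos": 7, "Lee": 4, "Hall": 0}
tactical = exaggerate(sincere, better_front="Avalos", worse_front="Lee")
print(tactical)  # {'Rees': 10, 'Avalos': 10, 'Lee': 0, 'Hall': 0}
```

Under this rule the sincere favorite can never end up below the maximum score, which is the Favorite Betrayal Criterion in miniature.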
 
This frees voters to focus more on whether a candidate should win rather than whether that candidate can win. We theorize that this mitigates the impact of “indicators of electability” such as wealth disparities.

Resistance to tactical voting

The preceding example illustrates one respect in which Score Voting and Approval Voting resist tactical voting better than IRV. There are also two theorems about the behavior of Score and Approval Voting in response to tactical voting.
  1. Approval Voting will, under plausible voter tactics, elect a Condorcet (“beats-all”) winner whenever one exists.
  2. Approval Voting will maximize the number of “pleasantly surprised” voters.
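To make the term “beats-all winner” concrete, the following Python sketch finds the Condorcet winner, if any, from ranked ballots. It does not model the tactical Approval behavior the first theorem is about. It only shows what the theorem's target is, using the four-candidate profile from the instant message conversation earlier in this document and assuming every ballot ranks every candidate.

```python
def condorcet_winner(profile):
    """Return the candidate who wins every head-to-head contest, or None.

    profile: list of (number_of_voters, ranking) pairs, ranking best-first.
    Assumption: every ballot ranks every candidate.
    """
    candidates = sorted({c for _, ranking in profile for c in ranking})

    def support(a, b):
        # Number of voters who rank a above b.
        return sum(n for n, ranking in profile if ranking.index(a) < ranking.index(b))

    for a in candidates:
        if all(support(a, b) > support(b, a) for b in candidates if b != a):
            return a
    return None


profile = [
    (35, "WYZX"),
    (17, "XYZW"),
    (32, "YZXW"),
    (16, "ZXYW"),
]
beats_all = condorcet_winner(profile)
print(beats_all)  # Y
```

In that profile Y beats W 65 to 35, beats X 67 to 33, and beats Z 84 to 16, yet IRV elects X.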

Spoilers

Score Voting and Approval Voting completely eliminate the “spoiler candidate” problem, whereas IRV merely mitigates it. (I.e. Score and Approval Voting satisfy the Independence of Irrelevant Alternatives criterion, whereas IRV, Plurality, and every other ranked voting method do not.)

Paradoxes

IRV has an extraordinary number of paradoxes. To be clear, by “paradoxes” we don’t simply mean “counter-intuitive behaviors”. We specifically mean logically contradictory behaviors.

Independence of Irrelevant Alternatives

This was mentioned above in the context of spoilers. To elaborate, failure of IIA means that a system can switch the winner from X to Y merely due to the entry of Z into the race, even if all the voters have exactly the same preferences for X and Y in both scenarios. It would be as if someone asked you what you preferred between chocolate and vanilla, and you said “chocolate”, then when asked what you preferred between chocolate, vanilla, and strawberry, you said “vanilla” — even if your preference for chocolate and vanilla was exactly the same in both cases.
 
To be fair to IRV here, a famous theorem known as “Arrow’s Theorem” shows that every ordinal (“ranked”) voting method fails this criterion. Only cardinal (“rated”) systems satisfy it. That means Score Voting and Approval Voting.

Participation / No-show paradox

A group of people who prefer X to Y can cause Y to win instead of X by voting. Or the converse of that, a group of people who prefer X to Y can cause X to win instead of Y by not voting.

Non-additive

As we stated previously, IRV is non-additive. For instance, candidate X could win at every single precinct, yet candidate Y could win when all the ballots are combined and counted. This necessitates centralized tabulation instead of simply summing precinct subtotals.
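A small constructed example shows how this can happen. The two precincts below are hypothetical, built for this sketch rather than taken from a real election, and the helper `irv_winner` is a bare-bones IRV count with arbitrary tie-breaking.

```python
def irv_winner(profile):
    """Minimal IRV count. profile: list of (voters, ranking) pairs, best-first."""
    remaining = {c for _, ranking in profile for c in ranking}
    while True:
        tally = {c: 0 for c in remaining}
        for count, ranking in profile:
            for c in ranking:
                if c in remaining:
                    tally[c] += count
                    break
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Drop the last-place candidate (exact ties broken arbitrarily).
        remaining.remove(min(tally, key=tally.get))


# Hypothetical precincts with 13 voters each.
precinct_1 = [(4, "ABC"), (6, "BAC"), (3, "CAB")]
precinct_2 = [(4, "ABC"), (3, "BAC"), (6, "CAB")]

winner_1 = irv_winner(precinct_1)                  # C is eliminated, A beats B 7-6
winner_2 = irv_winner(precinct_2)                  # B is eliminated, A beats C 7-6
winner_all = irv_winner(precinct_1 + precinct_2)   # A is eliminated first, B wins 17-9

print(winner_1, winner_2, winner_all)  # A A B
```

A wins each precinct counted on its own, but with all 26 ballots together A has the fewest first choices (8 against 9 and 9), is eliminated in the first round, and B wins.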

Non-monotonicity

A candidate can change from winner to loser by experiencing an increase in support, or change from loser to winner by experiencing a decrease in support. More here.
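The following Python sketch demonstrates the first case with a textbook-style three-candidate profile. The vote counts are hypothetical and `irv_winner` is a bare-bones IRV count written for this example. Ten voters move A from last place to first on their ballots, changing nothing else, and A goes from winning to losing.

```python
def irv_winner(profile):
    """Minimal IRV count. profile: list of (voters, ranking) pairs, best-first."""
    remaining = {c for _, ranking in profile for c in ranking}
    while True:
        tally = {c: 0 for c in remaining}
        for count, ranking in profile:
            for c in ranking:
                if c in remaining:
                    tally[c] += count
                    break
        leader = max(tally, key=tally.get)
        if 2 * tally[leader] > sum(tally.values()) or len(remaining) == 1:
            return leader
        # Drop the last-place candidate (exact ties broken arbitrarily).
        remaining.remove(min(tally, key=tally.get))


# Before: C is eliminated first and C's ballots transfer to A, who wins 65-35.
before = [(39, "ABC"), (35, "BCA"), (26, "CAB")]

# After: 10 of the B>C>A voters switch to A>B>C, raising A and nothing else.
# Now B is eliminated first, B's ballots transfer to C, and C wins 51-49.
after = [(49, "ABC"), (25, "BCA"), (26, "CAB")]

winner_before = irv_winner(before)
winner_after = irv_winner(after)
print(winner_before, winner_after)  # A C
```

The extra support changes which candidate is eliminated first, and that in turn changes where the transferred ballots go.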

Reversal symmetry

If we reverse every ballot, so that the election is effectively trying to determine the worst candidate, IRV can return the same winner as before. In other words, IRV can say that the best and worst candidate is the same person. More here.
 
You can probably guess what we’re going to say. Score Voting and Approval Voting have none of these problems.

Better average voter satisfaction

The bottom line metric of voting method performance is how well election outcomes represent the will of the voters. The graph below, from page 239 of William Poundstone’s book Gaming the Vote expresses average voter satisfaction for a variety of voting methods, using an objective economic metric called Bayesian Regret. These figures were calculated by a Princeton math Ph.D. named Warren Smith, who has been studying election theory for over a decade.
[Image: graph of Bayesian Regret values for several voting systems]
 
The further left the system lies on this graph, the better its performance. The two benchmarks are “magic best winner” (bottom-left) and “random winner” (top-right). The former represents a hypothetical system which reads the voters’ minds and “magically” elects the candidate who would make the most voters the most satisfied, whereas “random winner” means simply picking the winner at random, i.e. non-democracy. The width of each bar represents the range of performance for that system as a function of the rate of tactical voting. All of these systems do best with 100% sincere voters, and worst with 100% tactical voters. But the difference between best and worst performance varies considerably from system to system, which explains the differing widths of the bars.
 
One noteworthy observation is that Score Voting (referred to as “Range” in the graph) as well as Approval Voting perform better with 100% tactical voters than IRV performs with 100% sincere voters. This counters common criticisms (primarily from IRV advocates) that Score Voting and Approval Voting are “too vulnerable to tactical voting”.
 
How is Bayesian Regret calculated? It is actually a somewhat involved process, conducted via computer simulation, since voter satisfaction cannot be measured with real humans because there is no technology that can read voters’ minds and precisely measure their satisfaction with election outcomes. This almost invariably leads to skepticism from people newly introduced to the concept. Because, after all, we can’t even reliably predict weather, so how can we possibly simulate something as complex as human choice?
 
It actually turns out that these election simulations are vastly simpler than e.g. weather simulations, because we don’t have to simulate how a human brain decides upon preferences for a set of options. All we’re doing is saying, given a group of voters have some particular set of preferences (e.g. John Doe thinks Avalos=10, Chiu=8, Rees=5, etc.), and given that the voters vote based on those preferences (necessarily skewed by factors like ignorance and strategy), how close are election outcomes to the ideal?
 
There are a number of parameters (“knobs”) that can be tuned. We can adjust things like:
  • the number of candidates
  • the number of voters
  • the amount of “voter ignorance” (disparity between how well a particular candidate will ultimately satisfy a voter, and how satisfactory that voter presumes the candidate will be based on things like speeches and previous accomplishments)
  • the amount of tactical voting
  • the utility distribution algorithm (e.g. n-dimensional issue space, or simple random utilities)
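To give a feel for what such a simulation involves, here is a deliberately tiny Python sketch. It is not Smith's program and makes strong simplifying assumptions that should be read as such: utilities are independent uniform random numbers (the “simple random utilities” option), all voters are sincere and perfectly informed, only Plurality, Score, and a random winner are compared, and regret is reported per voter.

```python
import random


def simulate_regret(n_voters=50, n_candidates=5, n_elections=500, seed=0):
    """Average per-voter Bayesian Regret under simplifying assumptions."""
    rng = random.Random(seed)
    regret = {"plurality": 0.0, "score": 0.0, "random": 0.0}
    for _ in range(n_elections):
        # utils[v][c] = how satisfied voter v would be if candidate c won.
        utils = [[rng.random() for _ in range(n_candidates)]
                 for _ in range(n_voters)]
        social = [sum(u[c] for u in utils) for c in range(n_candidates)]
        best = max(social)  # the "magic best winner" benchmark

        # Plurality: each voter names only their favorite.
        plurality = [0] * n_candidates
        for u in utils:
            plurality[u.index(max(u))] += 1

        # Score: sincere scores rescaled so favorite = 1 and least favorite = 0.
        score = [0.0] * n_candidates
        for u in utils:
            low, high = min(u), max(u)
            for c in range(n_candidates):
                score[c] += (u[c] - low) / (high - low)

        winners = {
            "plurality": plurality.index(max(plurality)),
            "score": score.index(max(score)),
            "random": rng.randrange(n_candidates),
        }
        for method, w in winners.items():
            regret[method] += (best - social[w]) / n_voters
    return {method: total / n_elections for method, total in regret.items()}


regret = simulate_regret()
for method, value in regret.items():
    print(method, round(value, 4))
```

Lower numbers are better, and zero would match the “magic best winner”. Adding voter ignorance or tactical voting would mean perturbing the utilities or the ballots before the tallies, which this sketch leaves out.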
When Warren Smith first conducted his simulations, he assumed his results would be similar to those of his predecessor Samuel Merrill. Merrill had used a less sophisticated algorithm, and had discovered that different voting systems came out as “best” depending on how the aforementioned “knobs” were tuned. For instance, Condorcet might do best with lots of tactical voting and three candidates, whereas Borda might do best with lots of sincere voting and five candidates. In that case, the results would be open to interpretation based on which knob settings were assumed to be a better approximation of reality.
 
Smith’s results differed not only in that he used more knobs and more permutations of knob settings, but also in that, unlike Merrill, he included Score Voting, which, to his surprise, came out best in every permutation of knob settings, generally by a very wide margin that left a great deal of room for error.
 
We don’t expect this account of things to quash all doubts about the efficacy and value of these simulations. However, the source code has been available online for many years, and no critic has yet spotted any biases or errors whose fixing changes the result. (Smith has since discovered some exotic voting systems which perform slightly better than Score Voting, but they are all far too complicated to ever consider for public elections.)
 
Bottom line: despite any potential flaws, these Bayesian Regret calculations are the most extensive that have been performed (to our knowledge), and are at the very least the best indication we’ve got as to the relative democratic-ness of these different systems. (A pervasive fallacy is to rely on voting method criteria instead of Bayesian Regret figures.)

Criticisms

Doesn’t Approval Voting violate “one person one vote”?

No. The term “one person one vote” refers to the weight of votes (i.e. ratio of voters to representatives), not to how votes are expressed. And in Approval Voting, all ballots have the same weight.

The U.S. Supreme Court made the “one person one vote” rule explicit in Reynolds v. Sims (377 U.S. 533). The rule states that no vote should carry more weight than any other. Such unequal weight would violate the Equal Protection Clause of the Constitution. And it was Baker v. Carr (369 U.S. 186) that extended the Equal Protection Clause to districting issues. In Reynolds, the state of Alabama set up its districts so that they varied wildly in population. The districting was so bad that it gave some voters’ ballots as much as 41 times more weight than others. Because the weights of the ballots were different between districts, that violated the “one person one vote” rule.

Additionally, consider a three-candidate Approval Voting example where all three candidates are tied. You vote for X, while a voter with your exact opposite preferences votes for Y and Z. After that, all three candidates are still tied. Those two ballots have an equal but opposite effect. The key here is that no voter can vote more than once for the same candidate. Another way to think about it is that every voter casts an “up” or “down” vote for every candidate, so all voters are technically casting an opinion on every candidate.
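That three-candidate example can be checked directly. In the Python sketch below the starting totals of 10 apiece are an arbitrary assumption. Only the fact that the race starts tied matters.

```python
from collections import Counter


def cast(tally, approved):
    """Return a new tally with one more ballot approving the given candidates."""
    new_tally = Counter(tally)
    new_tally.update(set(approved))  # a set: at most one vote per candidate
    return new_tally


start = Counter({"X": 10, "Y": 10, "Z": 10})  # hypothetical three-way tie

after_yours = cast(start, {"X"})              # you approve X only
after_both = cast(after_yours, {"Y", "Z"})    # your opposite approves Y and Z

print(dict(after_yours))  # {'X': 11, 'Y': 10, 'Z': 10}
print(dict(after_both))   # {'X': 11, 'Y': 11, 'Z': 11}
```

After both ballots are counted the race is tied again, so the ballot that approved two candidates did not outweigh the ballot that approved one.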

 
Bottom line: because Approval Voting weights all ballots the same, it does not violate “one person one vote.”

Score/Approval Voting can fail to elect the favorite of a majority

"Approval voting would challenge our notions of majority rule: Adoption of approval voting could cause the defeat of a candidate who was the favorite candidate of 51% of voters. If this result were to happen the system would likely be repealed."
FairVote
 
There are numerous flaws with this argument. First, the concern about repeal seems ironic, since IRV has been repealed in four different USA municipalities in recent years (in all cases, shortly after its introduction), and is currently facing a credible threat of repeal in San Francisco. In all these cases, one of the primary arguments in favor of repeal has been general complexity and confusion, which we have noted are substantially alleviated with Score Voting and Approval Voting. (Indeed, this was cited as the primary reason why Dartmouth’s Student Assembly elections were switched from IRV to Approval Voting starting in the 2011 election.) And to take a practical (perhaps cynical) perspective, this problem would not even be detectable with Approval Voting (because there would be no way to know which of a voter’s approved candidates was the favorite on any given ballot); thus it could not realistically serve as a catalyst for repeal. The same would be true with Score Voting, if even a small fraction of the voters used tactical Approval-style voting (max and min scores only) for a significant portion of the candidate pool.
 
As for the criticism itself, it should be noted that IRV can do arguably much worse things. Here is an example IRV election with over one million voters, in which the winner is the favorite of only two voters, and would lose a head-to-head election against 19 of the 20 other candidates. An IRV advocate would rightfully criticize this example as being incredibly improbable. We agree! But so too is the Approval Voting scenario in which a majority-favored candidate would lose. The point is that it is not prudent to judge voting systems based on extremely unlikely hypothetical worst case scenarios. Instead we should judge them based on their typical/average performance over the course of a number of elections.
 
And it turns out that FairVote agrees with us! Consider this passage from page 268 of Gaming the Vote.
 
 
[FairVote] does not seem to dispute that nonmonotonicity is a bad thing. Its position is that such paradoxes are too rare to worry about. “We’ve had thousands of elections and its not an issue,” Richie says. Steven Hill, a senior analyst with [FairVote], dismisses “these mathematical ‘paradoxes’ that, while in theory are interesting for mathematicians to doodle around with on their sketch pads, in fact have no basis in reality … It’s also possible that a meteorite will strike the Earth and wipe out life as we know it—though not probably likely for a few more million years.”
 
 

Unfortunately for Steve Hill, at least two clear examples of non-monotonic IRV elections occurred in 2009, just one year after the publication of Gaming the Vote. From our page on monotonicity:

Burlington, Vermont – 2009 mayoral election

The 2009 mayoral election in Burlington, Vermont formed one half of a non-monotonic election pair. Bob Kiss won, but would have lost if some voters had ranked him higher. Or in other words, Kiss won because some voters didn’t prefer him strongly enough. FairVote’s response to this was to try to redefine monotonicity and insist (despite irrefutable empirical proof) that Burlington’s election did not experience non-monotonicity. (FairVote argues that non-monotonic elections aren’t a problem if they aren’t strategically exploited and don’t affect how candidates or voters behave, which shows that they don’t understand the actual problem with non-monotonicity.)

Frome electoral district, South Australia – 2009 House of Assembly by-election

This was the opposite type of non-monotonicity from the one seen in the Burlington race. In the 2009 Frome state by-election, the independent won. But if 31 to 321 of the voters who preferred Liberal over Labor over independent had less strongly supported the Liberal (causing Liberal to be ranked lower than Labor), it would have caused the Liberal to win the IRV election. I.e., the Liberal lost because some voters ranked him too high.
 
Contra Hill, we didn’t have to wait for a few more million years to see non-monotonicity in action. (And presumably we would have many more examples like these, were it not for the fact that the publication of full ballot sets has historically been rare.)
 
One other noteworthy thing about that Burlington election is that the Progressive won, even though the Democrat was preferred to the Progressive by a sizable 54% to 46% majority, whereas the Progressive defeated the Republican in the final round by a comparatively small 51.5% to 48.5% majority. FairVote contends this was justified because the Democrat didn’t have enough first-place votes (or “depth” of support, as they put it). But IRV can actually elect X instead of Y, even when Y is preferred to X by a majority of voters and got more first-place votes than X.
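
The last claim can be illustrated with a small constructed example. The sketch below uses a hypothetical four-candidate profile of 100 ballots (the candidates W, X, Y, Z and all vote counts are invented for illustration and are not drawn from Burlington or any real election), together with a minimal IRV tally that assumes complete rankings and no ties.

```python
from collections import Counter

# Hypothetical profile (invented for illustration): ranking -> number of ballots
BALLOTS = {
    ("Z", "Y", "X", "W"): 35,
    ("Y", "X", "Z", "W"): 25,
    ("X", "Y", "Z", "W"): 22,
    ("W", "X", "Y", "Z"): 18,
}


def first_choice_counts(ballots, remaining):
    """Count each ballot for its highest-ranked candidate still in the race."""
    counts = Counter({c: 0 for c in remaining})
    for ranking, n in ballots.items():
        for candidate in ranking:
            if candidate in remaining:
                counts[candidate] += n
                break
    return counts


def irv_winner(ballots):
    """Eliminate the last-place candidate until someone holds a majority.

    Assumes complete rankings and no ties (true for the profile above).
    """
    remaining = {c for ranking in ballots for c in ranking}
    while True:
        counts = first_choice_counts(ballots, remaining)
        leader, votes = counts.most_common(1)[0]
        if 2 * votes > sum(counts.values()):
            return leader
        remaining.remove(min(remaining, key=lambda c: counts[c]))


def pairwise(ballots, a, b):
    """Return (ballots ranking a above b, ballots ranking b above a)."""
    a_over_b = sum(n for r, n in ballots.items() if r.index(a) < r.index(b))
    return a_over_b, sum(ballots.values()) - a_over_b


print(first_choice_counts(BALLOTS, {"W", "X", "Y", "Z"}))
print(irv_winner(BALLOTS))          # X
print(pairwise(BALLOTS, "Y", "X"))  # (60, 40)
```

In this profile Y leads X by 25 to 22 on first choices and is ranked above X on 60 of the 100 ballots. Yet the elimination of W pushes X ahead of Y, Y is eliminated next, and X wins the final round 65 to 35.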
 
Lastly, Score Voting and Approval Voting can never prevent a majority from getting its way if its members simply “bullet vote” for their favorite candidate. But here’s why they may want to “hedge their bets” and support lesser-liked candidates. If a voter has preferences such as X=10, Y=8, Z=0, and fears that Z has a realistic shot of winning, it can be quite reasonable for that voter to strategically vote for both X and Y. The risk is that this could cause Y to win instead of X, causing the voter to lose two “points of happiness”. But the potential reward is that this could cause Y to win instead of Z, causing the voter to gain a much greater eight points of happiness. Should Y end up narrowly defeating X, this voter really has no reason to feel regret, because the vote for Y was effectively an “insurance policy” whose potential payoff was much bigger than the potential cost, i.e. “a wise investment”. If you experience slightly less comfort by wearing your seatbelt during a car ride in which you reach your destination without incident, do you complain about the slightly reduced enjoyment of your trip? Or do you consider it a wise choice whose potential upside far outweighs the downside?

All complex theoretical economics aside, let’s return to the practical reality: it is incredibly unlikely that any voter will ever discover that he was part of a majority that failed to get its way by hedging its bets. And if the voter is really worried about this, he has every right to bullet vote for his favorite. We acknowledge that some voters will do this, but the majority of voters will not, unless their favored candidate is a frontrunner, in which case it isn’t a problem (we elaborate on that in the subsequent section).
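
The insurance-policy reasoning above amounts to a simple expected-value comparison. The sketch below is only an illustration: the utilities are the X=10, Y=8, Z=0 figures from the text, while the two pivot probabilities (the chance that the extra approval for Y tips an X-versus-Y race, and the chance that it tips a Z-versus-Y race) are hypothetical inputs that the text does not quantify.

```python
# Utilities taken from the example in the text.
UTILITY = {"X": 10, "Y": 8, "Z": 0}


def net_gain_from_also_approving_y(p_tips_x_to_y, p_tips_z_to_y, utility=UTILITY):
    """Expected change in happiness from approving Y in addition to X.

    p_tips_x_to_y: assumed probability the extra approval makes Y beat X.
    p_tips_z_to_y: assumed probability the extra approval makes Y beat Z.
    """
    cost = p_tips_x_to_y * (utility["X"] - utility["Y"])     # lose 2 points
    benefit = p_tips_z_to_y * (utility["Y"] - utility["Z"])  # gain 8 points
    return benefit - cost


def break_even_ratio(utility=UTILITY):
    """Smallest ratio p_tips_z_to_y / p_tips_x_to_y at which hedging pays off."""
    return (utility["X"] - utility["Y"]) / (utility["Y"] - utility["Z"])


# Hypothetical pivot probabilities, chosen only to show the calculation.
print(net_gain_from_also_approving_y(0.001, 0.001))
print(break_even_ratio())  # 0.25
```

Under these assumptions, approving Y is worthwhile whenever the voter judges the Z-versus-Y pivot to be at least a quarter as likely as the X-versus-Y pivot.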
 
To summarize, any concerns about the “majoritarian-ness” of Approval Voting have equally or more concerning analogs which apply to IRV. And specifically with regard to the risk of such problems leading to repeal (back to Top-Two Runoff), we believe IRV has substantially more properties which would lead to a risk of repeal compared to Approval Voting or Score Voting.

Score/Approval Voting will degenerate into Plurality, due to “bullet voting”

A common (although extremely flawed) criticism of Score Voting and/or Approval Voting is that they will degrade into (sincere) Plurality Voting, because voters won’t want to hurt their favorite candidates by voting for anyone else. An example.
 
because "approving" a second choice may help defeat the voter's first choice, most experts agree that it [Approval Voting] is likely to devolve to typical vote-for-one pluarlity [sic] voting.
Terrill Bouricius
[Source]
 
FairVote activist Steve Hill echoes this sentiment.
 
 
But if range voting is used for public elections, once again smart candidates will urge their supporters to vote strategically by not rating other candidates—that is, to bullet vote. So range voting also would tend to regress to plurality voting.
 
In short, range and approval voting sound good in theory but have serious shortcomings that become apparent when one takes into account human psychology and the blood sport of politics, with their disincentives to honest voting.
Steve Hill
 
We have created this detailed response to the bullet voting criticism, but we shall go over some key elements here.
 
The most straightforward and devastating rebuttal to this criticism is the USA’s current widespread use of Plurality Voting, in which voters are forced to bullet vote and yet still often do not vote for their favorite candidate. An example that is seared into the collective American third party consciousness is the 2000 US presidential election, in which the vast majority of voters who preferred Green Party candidate Ralph Nader actually voted for someone other than Nader (most of them for Democrat Al Gore).
 
Steve Hill claims that “smart candidates will urge their supporters to vote strategically” for Nader, because obviously a vote for Gore would help Gore defeat Nader. But as we are well aware, Nader’s supporters ignored that order en masse, even when a vote for someone other than Nader required them to relinquish their right to vote for Nader (which doesn’t happen with Approval Voting or Score Voting). Why? Because they didn’t think Nader could win.
 
Astonishingly, FairVote themselves make this very same point in their criticism of Plurality Voting.
 
..many minor candidates genuinely seek to raise important issues. Their supporters must make a tough decision: to vote for their favorite candidate, knowing that the candidate won't win and might even throw the race to the supporters' least preferred candidate, or to settle on a less preferred candidate who has a chance to win. In other words, voters must accurately judge not only which candidate they prefer, but whether that candidate has a chance of winning.

Rob Richie, Caleb Kleppner, and Terrill Bouricius

Here FairVote is directly acknowledging the reality that voters often do not vote for their favorite candidate. (Additionally, the bullet voting argument directly contradicts their previously addressed criticism, that a majority-favored candidate may lose — they want to have it both ways.)

 
Based on the 2000 US presidential election, we now ask the reader to consider what would theoretically result if a Nader supporter who had cast a strategic vote for Gore had instead been given the option to vote for an unlimited number of candidates. Do you believe that most voters under such circumstances would most likely:
  1. Cast an additional vote for Nader (and any other candidates preferred to Gore), or
  2. Switch from Gore to Nader, still casting only a single vote (i.e. sincere Plurality Voting instead of strategic Plurality Voting)
If you chose the first option, then we think you agree with us that bullet voting will certainly not cause a regression to Plurality Voting. If you chose the second option (which we consider to be extremely implausible), then at the very least that is an upgrade from tactical Plurality Voting to sincere Plurality Voting, meaning that third party and independent candidates will get a true representative share of the vote, and that election outcomes will be substantially better (every election system I’m aware of results in lower Bayesian Regret the more sincere the voters are). So even that “worst case scenario” is quite an improvement over the ordinary tactical Plurality Voting that FairVote expressed concerns about.

Empirical data

The preceding was, to an extent, a theoretical argument: even though we cited a widespread trend within real US elections, those were still Plurality Voting elections, not Score Voting or Approval Voting elections. We now look at empirical data from various elections and exit polls. We would like to emphasize that our analysis of several comparable results actually showed that IRV elections had higher rates of bullet voting than did Score Voting or Approval Voting elections.
 
The German Pirate Party
In recent years there have been some significant contentious Approval Voting elections of political consequence, due to its adoption by the German branch of the Pirate Party (the Piratenpartei).
 
While it may have a controversial (or some might even say silly) name, the Piratenpartei has seen rapid growth and electoral success since its founding just a few years ago. For instance, they won 10% (15 of 152) of the seats in the Berlin parliament elections in September of 2011 (Berlin is 1 of the 16 German states, so this is not merely a city election), completely booting out Angela Merkel’s ruling coalition partner, the FDP. So again, their party elections are indeed contentious struggles for real political power. Because the Piratenpartei publishes their election results, we were able to track down and study a number of them, all of which exhibited more than one approval per ballot — most by a substantial amount.
 
In January 2011, their party chairman Sebastian Nerz was chosen from among eight candidates, with an average of 1.73 approvals per ballot. I emailed Sebastian to ask him his thoughts on Approval Voting. Aside from the greeting and salutation, here was my verbatim email to Sebastian.
 
My name is Clay Shentrup and I am a voting methods researcher in San Francisco.
 
I was hoping to get your feedback on the Approval Voting system used by the Piraten Partei. Do you feel that it worked well and was simple? Is there any other system you would prefer?
 
Here is a verbatim excerpt of his reply.
 
I think that the Approval Voting system is working quite well! Most party members have no problems understanding the system itself, voting is fast and intuitiv [sic].
 
I know that several other systems are used by different parts of the Pirate Party Germany (e.g some districts use a simple Plurality voting system because it is easier if there are only few candidates). But Approval Voting is by far the most common voting system in the Pirate Party.
 
Personally I like Approval Voting. A normal Plurality vote gives too much weight to the favorites – Pirates would need to elect the favorite best matching their ideas, not the one they really want. And instant-runneroff-systems tend to be more complex. I believe that one should be able to explain the voting mechanisms in two sentences.
 
Note that Sebastian’s comments echo many of the positive claims we made above.
 
The UN Secretary General
The UN secretary general election of 2006 (approval voting, 6 candidates) featured 39 approvals, 35 disapprovals, and 16 “no opinion” votes from 15 voters, an approval fraction of 260%. Since the ballots were secret I do not actually know the percentage of approve-1-disapprove-rest “bullet style” ballots, but it is possible to tell from the data they did publish that at most 3 of the 15 voters cast a bullet-style ballot. I.e., the percentage of bullet-voters was at most 20%. This is only an upper bound. The lower bound is 0. Overall there were more approvals than disapprovals, the exact opposite of what would have happened if there had been a lot of bullet voting. Also, if there really were 3 bullet-ballots (meeting the upper bound) then the remaining 12 ballots would each have had to have approved exactly 3 of the 6 candidates – or somebody must have approved at least 4 of the 6. The uniqueness of this (1,1,1,3,3,3,3,3,3,3,3,3,3,3,3) configuration and the fact it contains a “gap” at 2 both make it seem unlikely; and it also seems unlikely (especially to believers in the prevalence of “bullet voting”) that any voter approved 4. Therefore it is likely that the 20% upper bound can be decreased to 13.3%. Hence the true rate of bullet voting was either 0, 1/15=6.7%, or 2/15=13.3%.
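
The arithmetic in this paragraph can be checked with a short sketch. It uses only the totals quoted above; the cap of 3 bullet-style ballots is taken from the text as given (it depends on published data not reproduced here), and the function names are ours, chosen for illustration.

```python
VOTERS = 15
CANDIDATES = 6
APPROVALS, DISAPPROVALS, NO_OPINION = 39, 35, 16


def totals_are_consistent():
    """Every voter marks every candidate exactly once: 15 * 6 = 90 marks."""
    return APPROVALS + DISAPPROVALS + NO_OPINION == VOTERS * CANDIDATES


def approvals_per_voter():
    """Average approvals per ballot (the text's 'approval fraction')."""
    return APPROVALS / VOTERS


def mean_approvals_on_other_ballots(bullet_ballots):
    """If `bullet_ballots` voters approved exactly one candidate, return the
    average number of approvals the remaining ballots must carry."""
    return (APPROVALS - bullet_ballots) / (VOTERS - bullet_ballots)


print(totals_are_consistent(), approvals_per_voter())
for b in range(4):  # 0 to 3 bullet ballots, the range allowed by the text
    share = round(100 * b / VOTERS, 1)
    print(b, share, round(mean_approvals_on_other_ballots(b), 2))
```

With 3 bullet-style ballots the other 12 must average exactly 3 approvals each, which is the (1,1,1,3,...,3) configuration described above unless some voter approved 4 or more.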
 
Dartmouth College Alumni Association
Dartmouth College’s alumni association used Approval Voting during 1990-2007 to fill vacancies as they arose on its 18-member Board of Trustees. Each election involved 3 “nominated” candidates plus perhaps additional “petition” candidates (usually 3 or 4 in all). The final Approval Voting election, held in 2007, had 4 candidates. It was won by S.F. Smith with 9984 approvals on 18186 ballots (54.9% approval). There were 32941 approvals in all, i.e. 181% (1.81 approvals per ballot). This implies that at most 59.5% of the ballots were bullet-style, and the only way it would be possible to meet this upper bound would be if every ballot approved either 1 or 3 candidates (never 2). If instead every ballot approved either 1 or 2 then the fraction of approve-1 ballots would have had to be 19%. So the bullet fraction, we estimate, was between 19% and 59.5%. Robert Z. Norman, a Dartmouth math professor and vocal Approval Voting advocate, explains:
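
The bounds quoted above follow from a one-line mixture calculation, sketched here. The vote totals are the ones given in the text; the helper name and the assumption that every ballot approves either exactly 1 candidate or exactly k candidates (with k = 3 or k = 2, as in the text) are ours. In particular, the sketch follows the text in ignoring ballots that approve all 4 candidates.

```python
BALLOTS = 18186
TOTAL_APPROVALS = 32941
WINNER_APPROVALS = 9984

mean_approvals = TOTAL_APPROVALS / BALLOTS  # about 1.81 approvals per ballot


def bullet_fraction(mean, k):
    """Fraction x of approve-1 ballots if every other ballot approves exactly k.

    Solves x * 1 + (1 - x) * k = mean for x.
    """
    return (k - mean) / (k - 1)


print(round(100 * WINNER_APPROVALS / BALLOTS, 1))          # winner's approval %
print(round(100 * bullet_fraction(mean_approvals, 3), 1))  # upper bound, 1-or-3 ballots
print(round(100 * bullet_fraction(mean_approvals, 2), 1))  # lower estimate, 1-or-2 ballots
```

Using the unrounded average gives roughly 59.4% and 18.9%; the 59.5% and 19% figures in the text correspond to the rounded average of 1.81.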

the claims about bullet voting in the Dartmouth Alumni election [by Rob Richie and other IRV proponents] remind me that with a per voter average of voting for 1.8 candidates, the proportion of bullet votes has to be fairly small. The alternative..is that nearly everyone voted for one or three candidates but not two. Unlikely as that might be, it would suggest that most of those who voted followed a strategy of either voting for the petition candidate or voting for all [3 opposing] nominated candidates, in which case Richie’s claim that the opposition was disorganized falls apart, as does the claim by some of the Alumni Council people that in a 1 on 1 situation the petition candidate would [have] been defeated.

After this election, Dartmouth repealed Approval Voting in favor of Plurality Voting. FairVote predictably cited this decision as an indictment of Approval Voting, and confirmation of the bullet voting problem. However, as we just discussed, this election featured an average of 1.81 approvals per ballot — nowhere near a regression to Plurality Voting. Looking into Dartmouth’s published reasons for the repeal, essentially all of them are unfounded and contain serious mathematical/logical fallacies. So we think there is something deeper going on here.
 
We further reiterate that Dartmouth’s Student Assembly switched from IRV to Approval Voting starting in the spring of 2011, fully aware of the previous decision by the Alumni Association. This year (2012), a former FairVote intern named Will Hix tried to get Approval Voting repealed in favor of a ranked system. This was unanimously rejected.
 
French Approval Voting study
In the French approval voting study (thousands of voters, 16 candidates, presidential election of 2002; probably the largest approval voting study ever), the Plurality vote totalled 100% and the Approval votes totalled 315%, and the percentage of “bullet style” (approves exactly one) ballots was 11.1%.
 
German Approval Voting study
A subsequent similar study was conducted at three voting stations in Messel, Germany, with government approval/cooperation. 1,909 voters participated (72.6% of the voters at those election sites).

In the district elections, the average voter voted for 1.86 candidates out of 8.

 
In the state elections, the average voter voted for 2.25 parties out of 17.

Example result: In the state elections, the Greens went from having 18% as many votes as the first-ranking SPD (in the real election), to having 67% as many votes as the SPD in the Approval Voting election (which was held at the polling site, simultaneously with the real voting). This demonstrates the more accurate representation of support that presumably “unelectable” entities see with Approval Voting and Score Voting.

Other uses of Approval Voting

San Francisco State University uses Approval Voting “for Academic Senate, for University committees, and for College committees”.
In the early 2000s the Boston Tea Party became apparently the first US political party in modern times to employ Approval Voting. Approval Voting is also used by the state Libertarian Party in Colorado and Texas.
 
Several large organizations, with membership well in excess of the number of citizens in many US cities, use Approval Voting:
  • Mathematical Association of America (MAA), with about 32,000 members;
  • American Mathematical Society (AMS), with about 30,000 members;
  • Institute for Operations Research and the Management Sciences (INFORMS), with about 12,000 members;
  • American Statistical Association (ASA), with about 15,000 members.

Smaller societies that use Approval Voting include the Society for Judgment and Decision Making, the Social Choice and Welfare Society, the International Joint Conference on Artificial Intelligence, the Public Choice Society, and the European Association for Logic, Language and Information.

Approval and Score Voting were the foundation of government in Renaissance Venice and ancient Sparta, respectively. These were two of the longest lasting (perhaps the two longest lasting) democracies ever. Also, Cardinals used Approval Voting for centuries to elect the Catholic Pope (at the time the most powerful elected person on the planet).

FairVote

Many of our responses to criticisms from FairVote members consist of citations of concrete examples and figures which directly contradict FairVote claims, as opposed to subjective differences about e.g. the relative priorities of different facts on which we basically agree. As such, we understand that the reader may ask, “why do you think FairVote would be disingenuous?” This is actually a volume unto itself, but just a little history on the organization will shed some light on the subject.
 
From this page at FairVote’s web site.

The ignition: Four separate pro-PR organizations form in 1991-1992 with the name Citizens for Proportional Representation (CPR). Former congressional aide Matthew Cossolotto starts a national group in Washington, D.C., writes a Christian Science Monitor commentary and appears on C-SPAN.  Rob Richie and Cynthia Terrell start a regional group in Olympia, Washington and Steven Hill a third in Seattle. A new campaign in Cincinnati is called CPR. Richie and Terrell go to Cincinnati to assist the campaign, which falls just short. Richie, Cossolotto and Cincinnati’s Bill Collins join together to organize a founding conference.

1992: In June 1992 reformers from 17 states come to Cincinnati for the founding conference of Citizens for Proportional Representation. Ted Berry, Cincinnati’s first black mayor, welcomes reformers with an inspirational speech, and a sterling mix of activists and scholars make the case for change and launch the organization. Richie is named director, Cossolotto the president and former Congressman John Anderson soon becomes the head of the national advisory board that includes Jack Gargan, Manning Marable, Arend Lijphart, Eleanor Smeal and Sam Smith. CPR opens operations in Alexandria, Virginia.

As you can see, FairVote came into the world as an organization whose primary mission was to reintroduce proportional representation into American government, specifically via the Single Transferable Vote system (STV), which is used in Australia’s Senate. It should come as no surprise that IRV is closely related to STV; IRV is in fact the single-winner form of STV. As such, FairVote sees IRV as a useful “stepping stone” to STV, because it introduces the ranked ballot, as well as a simplified form of the STV algorithm.

 
We believe this explains FairVote’s zealous defense of IRV for single-winner elections, as well as their criticism (usually in blatant contradiction of the available evidence) of systems like Approval Voting and Score Voting. This explanation fits with some of the more egregious assertions by FairVote members. For instance, see this recent bullet voting slander against Approval Voting by FairVote head Rob Richie. The winner of the cited election was approved by 40.7% of the voters, and there was an average of 1.48 approvals per ballot. Yet Richie used this election to flog the bullet voting argument once again.
 
MrsB wins "poster of year" at UK blog w/just 28% in approval voting election. Approval often devolves to plurality vote .
 
This blatant mischaracterization of the election only makes sense if Richie has some ulterior motive. And the IRV/STV “stepping stone” strategy is a fitting explanation which is harmonious with FairVote’s statements and actions throughout the past 20 years. For instance, in 1996 Steve Hill was one of the lead advocates of Measure H, which would have instituted proportional representation in San Francisco via the STV system.
 
This should in no way be misconstrued as a criticism of proportional representation. We have looked into PR extensively, and we agree that PR systems can produce a number of benefits. However, this should not bias our judgment of single-winner election methods. And further, because San Francisco uses single-member districts, we are not likely to see PR in the near future. This is all the more reason to work for the best single-winner system we can realistically implement.
 
Clay Shentrup (with numerous contributions from others)
Noe Valley
206.801.0484