User:Adamw/Draft/Board Election analysis

Since 2013, the Wikimedia Foundation governance elections have relied on a flawed formula, which in one case unfairly ranked the winners. These elections nominate candidates for three seats on the Board of Trustees, and appoint the Funds Dissemination Committee and Ombuds.

This analysis is meant to illuminate the issues and to give us tools for evaluating alternative election systems.

How votes were tallied

We'll focus on the 2015 Board of Trustees election, a tight race in which the popular vote winners were not elected. This was entirely due to the novel "support percentage" vote tallying method.

For comparison, a fully elected Board of ten could have been selected from the field of 22 candidates. Candidates in the top ten did well enough to get the support of anywhere from 25% to 42% of voters, while getting less than 15% oppose votes each. 88% of voters would have seen at least one of their preferred candidates elected. However, the Wikimedia Foundation's Board only has three elected seats.

Choosing three winners was going to be difficult, and the task of the Election Commission was to somehow weed out a large number of qualified and popular candidates.

The rules^[1] the Commission decided on for the 2013 election seem to have no precedent. There had been some criticism of the Schulze method used in previous elections, so the Commission came up with something new on the fly.^{[citation needed]}

These elections were conducted using a system where each voter can choose to Support or Oppose each candidate, or vote Neutral if they don't care. The votes are tallied using a "support percentage", where the number of Support votes is divided by the number of Support plus the number of Oppose votes. This score is supposed to yield the percentage of voters who expressed a strong opinion and who preferred the candidate. Winners were chosen by taking those with the highest support percentage scores.
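As a rough illustration of the tally just described, here is a minimal Python sketch. The function names `support_percentage` and `pick_winners`, the handling of a candidate with no Support or Oppose votes, and the example tallies are all assumptions made for illustration; they are not taken from the election software or from real results.

```python
def support_percentage(support, oppose):
    """Return S / (S + O); Neutral votes are ignored entirely."""
    decided = support + oppose
    if decided == 0:
        # Assumption: the rules quoted here do not say how a candidate
        # with no Support or Oppose votes is scored; treat it as 0.
        return 0.0
    return support / decided


def pick_winners(tallies, seats):
    """Rank candidates by support percentage and take the top `seats`."""
    ranked = sorted(
        tallies,
        key=lambda name: support_percentage(*tallies[name]),
        reverse=True,
    )
    return ranked[:seats]


# Hypothetical (support, oppose) tallies, not real election data.
example_tallies = {
    "Candidate W": (900, 300),
    "Candidate X": (1500, 600),
    "Candidate Y": (400, 100),
    "Candidate Z": (1200, 800),
}
winners = pick_winners(example_tallies, seats=3)
print(winners)
```

Note that in this made-up field, Candidate Y ranks first despite having the fewest Support votes, because only the ratio of Support to Oppose matters.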

An anonymous election commission member^{[citation needed]}^[2] reported that the election commission chose this measure in order to weed out controversial candidates, because negative votes would presumably only be cast for bad candidates rather than tactically, to benefit other candidates.

Impact on the election

The actual choice of winners would have been substantially different, depending on the tally method used. See the graphs below for a comparison of how the results would have changed if we had used the next most reasonable alternative, a simple approval tally of votes in support of each candidate:

Winners by "support percentage"

Winners by popular vote

These graphs also show that the exotic tally method was unjustified—we did not weed out any controversial candidates. All of the top 5 candidates have a support percentage between 73% and 78%, and the "undesirable" candidates were already eliminated, far down in the list of support votes.

The problem, in brief

The vote was tallied using "percentage of support", or ${\dfrac {S}{S+O}}$, which is incorrect because it's insensitive to the number of people voting for a candidate. For example, see how it applies to these two candidates:

Candidate A gets 2 support votes and 1 oppose vote. ${\dfrac {2}{2+1}}=67\%$ "support percentage".

Candidate B gets 200 support votes and 100 oppose votes. ${\dfrac {200}{200+100}}=67\%$ "support percentage".

It's counterintuitive that the candidates are tied. As you can see, if one Candidate B voter flips to Candidate A, the scales are tipped and the obscure candidate will win with only 3 votes. Before this tiebreaker, both candidates had the same ratio of oppose votes, so it would be unreasonable to say that this was a desirable rejection of the unpopular candidate.
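The Candidate A and Candidate B example can be checked with exact fractions. This is only a sketch: the helper name `support_percentage` is an assumption, and the "flip" is modelled as one supporter of B moving their Support vote to A.

```python
from fractions import Fraction


def support_percentage(support, oppose):
    """Exact S / (S + O) as a fraction."""
    return Fraction(support, support + oppose)


# Tallies from the example: A has 2 support / 1 oppose, B has 200 / 100.
a_before = support_percentage(2, 1)
b_before = support_percentage(200, 100)
tied = a_before == b_before

# One of B's supporters switches their Support vote to A.
a_after = support_percentage(3, 1)
b_after = support_percentage(199, 100)
a_wins = a_after > b_after

print(tied, a_after, b_after, a_wins)
```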

We'll look at some formal evaluation criteria which are notably failed by this election. There are no rules saying that we must honor any of these criteria, but I'm presenting them so that we can consider which properties are desirable in future elections.

One person, one vote

The one person, one vote principle, that each person's vote is equal in weight to every other person's vote, is a law which has applied to all public elections in the United States since 1946, but has been recognized as a basic democratic requirement since at least ancient Rome and the Haudenosaunee Confederacy. This principle is often imperfectly implemented, for example with the lack of women's suffrage, gerrymandering, and so on.

We failed this test: some votes had more than five times the weight of others, depending on which candidate they were cast for and how other people voted.

Voters are equal

We satisfied this criterion. If voter $a$ and voter $b$ cast the same vote, their votes have the same weight and the same effect on the total score.

Let our tally method $M$ give a number for the total score over a set of votes $V$ for each candidate $i$.

$M(V_{i})\to \mathbb {R} ^{+}$

This metric is used to rank candidates and choose a winner.

To state that voters are equal, if $v_{a}=v_{b}$, then $M(V+v_{a})=M(V+v_{b})$.

No candidate is privileged

This was a big fail. Votes should have equal weight regardless of which candidate they support.

Given a voter $a$ who supports candidate $i$ and voter $b$ who supports candidate $j$, the contribution of each vote to the total score of each candidate should be equal.

$M(v_{a,i})=M(v_{b,j})$

Proof that we failed is a corollary of the next example.

Vote weight should be independent of other votes

The impact of a vote should be constant regardless of how other people voted. We failed this criterion as another bizarre effect of the tally method used.

$M(V_{i}+v_{a})-M(V_{i})$ is constant for all $V_{i}$ .
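For the support percentage specifically, a short calculation suggests why this cannot hold. Writing $S$ and $O$ for a candidate's current Support and Oppose counts (the same symbols as in the formula ${\dfrac {S}{S+O}}$), a vote that changes from Oppose to Support moves the score by

${\dfrac {S+1}{S+O}}-{\dfrac {S}{S+O}}={\dfrac {1}{S+O}}$

and a new Support vote from a previously neutral voter moves it by

${\dfrac {S+1}{S+O+1}}-{\dfrac {S}{S+O}}={\dfrac {O}{(S+O)(S+O+1)}}$

Both amounts shrink as $S+O$ grows, so under this formula the weight of a vote depends on how many other people have already expressed an opinion about the same candidate.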

In our election, votes for less popular candidates paradoxically had much more weight than votes for a popular candidate. Each additional vote changed from oppose to support for User:Peteforsyth would have added

${\dfrac {109}{109+438}}-{\dfrac {108}{108+439}}={\dfrac {1}{547}}=0.183\%$

weight to his total score, but the same vote swing for User:Raystorm would only have added

${\dfrac {2185}{2185+774}}-{\dfrac {2184}{2184+775}}={\dfrac {1}{2959}}=0.034\%$

to her tally. This is a more than fivefold difference in vote weight.
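The two vote weights can be reproduced from the tallies quoted above. In this sketch the helper names `support_percentage` and `flip_gain` are assumptions; the Support and Oppose counts are the ones given in the text.

```python
from fractions import Fraction


def support_percentage(support, oppose):
    """Exact S / (S + O) as a fraction."""
    return Fraction(support, support + oppose)


def flip_gain(support, oppose):
    """Score gained when one Oppose vote is changed to a Support vote."""
    before = support_percentage(support, oppose)
    after = support_percentage(support + 1, oppose - 1)
    return after - before


# (support, oppose) counts quoted in the text for the two candidates.
gain_low_turnout = flip_gain(108, 439)
gain_high_turnout = flip_gain(2184, 775)
weight_ratio = gain_low_turnout / gain_high_turnout

print(gain_low_turnout, gain_high_turnout, float(weight_ratio))
```

The ratio works out to 2959/547, or roughly 5.4, which is where the "more than fivefold" figure comes from.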

Monotonicity criterion

The "Support-Neutral-Oppose" voting system fails the monotonicity criterion, by providing a way to harm a candidate's ranking beyond simply not voting in support. For example, if a voter chooses to vote for Candidate A, but casts an oppose vote for all other candidates, it's roughly equivalent to voting twice for the candidate they support. Another way to phrase the idea behind this criterion is that voters should have exactly one vote with which to support their preferred candidate.

Later-no-harm criterion

Our tally method, the S-N-O system, and also a simple approval vote for multiple seats, all fail the later-no-harm criterion. Voters who supported candidates who were not their first choice actually harmed their preferred candidate's chances for election.^{[citation needed]} By changing these votes to "oppose", they would have instead improved their candidate's prospects.
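A small made-up scenario may help show the effect. In the sketch below, the tallies for candidates A and B are hypothetical (chosen so that the race is close), and the names `BEFORE` and `leader_after` are assumptions for illustration. One extra voter prefers A, and we compare what happens depending on how that voter treats B.

```python
def support_percentage(support, oppose):
    """S / (S + O), with Neutral votes ignored."""
    return support / (support + oppose)


# Hypothetical (support, oppose) tallies before one more ballot arrives.
BEFORE = {"A": (50, 20), "B": (73, 28)}


def leader_after(vote_on_b):
    """Return who leads once a voter supports A and casts `vote_on_b` for B.

    `vote_on_b` is one of "support", "neutral", or "oppose".
    """
    a_support, a_oppose = BEFORE["A"]
    b_support, b_oppose = BEFORE["B"]
    a_support += 1  # the voter always supports their favourite, A
    if vote_on_b == "support":
        b_support += 1
    elif vote_on_b == "oppose":
        b_oppose += 1
    a_score = support_percentage(a_support, a_oppose)
    b_score = support_percentage(b_support, b_oppose)
    return "A" if a_score > b_score else "B"


outcomes = {
    choice: leader_after(choice)
    for choice in ("support", "neutral", "oppose")
}
for choice, leader in outcomes.items():
    print(f"vote on B = {choice:8s} -> leader: {leader}")
```

With these numbers, A overtakes B only when the voter opposes B; a sincere Support vote for B as a second choice widens B's lead instead.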