Information Theoretic Positional Scoring
This would be a specific positional scoring scheme where the values of the positions are determined as the Shannon information content of the event that a candidate is in the top position of a ballot.
For example, if there are N candidates, a priori the event that a given candidate is in the very top position has probability 1/N, and therefore has a Shannon information content of log(N).
Given the top candidate, the event that a second candidate is in the second highest position has probability 1/(N-1), and has a Shannon information content of log(N-1), and so on.
So in total this scheme assigns the Kth position counted from the bottom the score log(K), so that the top position gets log(N) and the bottom position gets log(1) = 0. This way a candidate is scored based on the information content of them being in the highest available position given that the positions above are already filled.
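With this scheme the total Shannon information content of a ballot is always log(N) + log(N-1) + ... + log(1) = log(N!). This is equivalent to the product scoring scheme, since comparing sums of logarithms gives the same ordering as comparing products of the underlying position values. I just wanted to indicate some of its theoretical backing, which if accepted makes the cardinal score values somewhat less arbitrary.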
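As a rough check of the arithmetic above, here is a minimal Python sketch. The function names `positional_scores` and `ballot_information` are made up for illustration, and natural logarithms are an assumption (any other base only rescales every score by the same constant).

```python
import math

def positional_scores(n):
    """Scores for the positions of one ballot, listed from top to bottom.

    The top slot gets log(n), the next gets log(n - 1), and so on down
    to log(1) = 0 for the bottom slot. Natural log is an assumption.
    """
    return [math.log(n - i) for i in range(n)]

def ballot_information(n):
    """Total information content of one complete ranking of n candidates."""
    return sum(positional_scores(n))

n = 5  # hypothetical number of candidates
total = ballot_information(n)

# The sum of the positional scores should match log(n!).
print(total, math.log(math.factorial(n)))
print(math.isclose(total, math.log(math.factorial(n))))
```

The two printed numbers should agree up to floating-point error for any n, which is just the identity log(N) + log(N-1) + ... + log(1) = log(N!).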
Andy Dienes last edited by
"Conditioned on the candidate ranked 27th, the event that another candidate is in the 144th position has probability 1/(N-1) so we should assign it a score of log(N-1)"
I don't think this line of reasoning is sufficient to motivate a log(n) positional scoring rule.
@brozai But that isn’t the reasoning I used!
Being in the highest available position is a positive (the most positive possible) event for a candidate, and the Shannon information content models the amount of surprise of an event. So more or less this reasoning is that the candidate with the highest level of positive surprise should win the election.
Obviously it can be scrutinized, but ideally not with a straw man please. The fact that one different, flawed argument leads to the same conclusion does not mean that the conclusion or the original argument itself is flawed.
Andy Dienes last edited by
@cfrank A candidate being in the 27th position is also surprising. If a candidate is ranked exactly 27th on every single ballot that would be incredibly surprising, but obviously that wouldn't mean they should win!
I am not trying to straw man, and if you'd like I can try to evaluate a log(n) positional scoring rule on its own merits. However, I think if you're trying to go down the route of choosing the winner which 'best explains' the ballots in terms of information content, there's really no reason to use anything except Kemeny-Young.
@brozai That’s perfectly OK. I was not trying to rationalize the product scoring method, although it is not difficult to do. I was just trying to investigate this line of reasoning and was pleasantly surprised that the product scoring scheme came out naturally. Being in the 27th position is not even well-defined in every election, and even where it is, it is not necessarily a positive event.
I wasn’t trying to select a winner based on whether they best explain the ballots. The positional score value log(K) is literally just the Shannon information content of the event that a candidate is in the Kth slot, given that the candidate is not in any higher slot.
This means that the score given to a candidate under the product score scheme is essentially the amount of information that is lost in trying to determine the position of the candidate in a sequence of ballots when we assume that the candidate is uniformly distributed across the positions, rather than assuming that the candidate is always found in the highest available position. More to the point, if we divide by the size of the electorate, then as the number of voters increases, it also converges to the average number of slots in a random ballot that will not need to be checked in order to determine the position of the candidate, provided we always assume that they are in the highest unchecked slot, starting from the highest position and moving down by one slot whenever the candidate is not found.
I definitely think that a concave scoring scheme is important so that compromises are emphasized. The logarithm accomplishes this and has this information theoretic translation so that at least the scores are theoretically meaningful in one sense.
Just as an illustration of this effect, suppose we had just two ballots and three candidates a, b, c, and let the ballots be a > b > c and c > b > a.
Under a standard scoring scheme (3, 2, and 1 points from top to bottom), all of the candidates are tied with a total of 4. Under the product scoring scheme, in contrast, b beats a and c with a score of 4 (2 × 2) compared with scores of 3 (3 × 1), arriving at the only reasonable compromise available from these two ballots.
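The comparison can be reproduced with a short Python sketch. It assumes the two ballots are a > b > c and c > b > a (the only pair consistent with the totals quoted above, up to swapping a and c) and that positions are worth 3, 2, and 1 points from top to bottom. The helper names are hypothetical.

```python
import math

# Assumed ballots, each listed from most to least preferred.
ballots = [["a", "b", "c"], ["c", "b", "a"]]

def position_values(ballot):
    """Map each candidate to its position value: n for the top slot, 1 for the bottom."""
    n = len(ballot)
    return {cand: n - i for i, cand in enumerate(ballot)}

def sum_scores(ballots):
    """Standard positional scoring: add the position values across ballots."""
    totals = {}
    for ballot in ballots:
        for cand, value in position_values(ballot).items():
            totals[cand] = totals.get(cand, 0) + value
    return totals

def product_scores(ballots):
    """Product scoring: multiply the position values across ballots."""
    totals = {}
    for ballot in ballots:
        for cand, value in position_values(ballot).items():
            totals[cand] = totals.get(cand, 1) * value
    return totals

def log_scores(ballots):
    """Information-theoretic scoring: add log(position value) across ballots."""
    totals = {}
    for ballot in ballots:
        for cand, value in position_values(ballot).items():
            totals[cand] = totals.get(cand, 0.0) + math.log(value)
    return totals

print(sum_scores(ballots))      # every candidate totals 4
print(product_scores(ballots))  # a and c get 3, b gets 4
print(max(log_scores(ballots), key=log_scores(ballots).get))  # b
```

The log scores are just the logarithms of the product scores, so the two schemes always pick the same winner; the concavity of the logarithm is what lets the compromise candidate b pull ahead of the tie.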