Rule X extended to score ballots

@toby-pereira Is your issue that JR-like criteria are clumsy or that they are weak?

I find the definitions pretty appealing, although there may be other important parts of the story to consider (e.g. stability, Pareto).

There are many ways to strengthen ideas like PJR, although they usually are not satisfiable over the entire ballot domain, so you can also look at compliance of a method when it is possible.

I see JR / PJR / EJR as more of a litmus test; passing it does not tell you a lot about the method, but NOT passing it requires a very good explanation for the method to be considered further.

Toby Pereira

@brozai I suppose I find them both clumsy and weak. Take Justified Representation. It refers to the case where a Hare quota of voters all approve one candidate. Intuitively, you might think that as a solid Hare quota, they should all get this candidate elected. But further reflection suggests this isn't always possible.

For example, there might be two candidates to be elected in an election. Three candidates are approved by 50%, 51% and 51% of the electorate respectively with no correlation between voting for one or any of the others. In this case the two on 51% would be elected, so not the candidate on 50%.

Of the voters who approved the 50% candidate (it has a full Hare quota), slightly under a quarter of them would not have a candidate elected.

It would be impossible to guarantee that every voter in a Hare quota group gets a candidate elected. So how do we make a criterion around it? Well, they went for the minimum possible option - one of the voters must have a candidate elected. At that stage it just seems like a pointless nod towards something they were probably trying to achieve but realised they had to back out of.
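To make the numbers in this example concrete, here is a minimal sketch. It assumes an idealised electorate of 1,000,000 voters whose approvals are exactly independent at 50%, 51% and 51%. The candidate names X, Y, Z and all helper names are made up for illustration. The JR test uses the usual reformulation: a committee fails JR if at least a Hare quota of voters approve some common candidate while none of them approves any winner.

```python
from itertools import combinations, product

CANDIDATES = ("X", "Y", "Z")                   # hypothetical names
APPROVAL_PCT = {"X": 50, "Y": 51, "Z": 51}     # approval rates from the example
SEATS = 2

def build_profile():
    """Map each approval set to its voter count, assuming exact independence."""
    profile = {}
    for flags in product((True, False), repeat=len(CANDIDATES)):
        count = 1
        for cand, approved in zip(CANDIDATES, flags):
            count *= APPROVAL_PCT[cand] if approved else 100 - APPROVAL_PCT[cand]
        ballot = frozenset(c for c, a in zip(CANDIDATES, flags) if a)
        profile[ballot] = count
    return profile

def unrepresented_supporters(profile, committee, cand):
    """Voters who approve `cand` but approve nobody in `committee`."""
    winners = set(committee)
    return sum(n for ballot, n in profile.items()
               if cand in ballot and not (ballot & winners))

def satisfies_jr(profile, committee, seats):
    """JR holds if no candidate has a Hare quota of wholly unrepresented supporters."""
    total = sum(profile.values())
    return all(unrepresented_supporters(profile, committee, c) * seats < total
               for c in CANDIDATES)

profile = build_profile()
total_voters = sum(profile.values())
x_supporters = sum(n for ballot, n in profile.items() if "X" in ballot)
left_out = unrepresented_supporters(profile, ("Y", "Z"), "X")

print(left_out / x_supporters)   # share of X's supporters with no winner
print(left_out / total_voters)   # share of the whole electorate
for committee in combinations(CANDIDATES, SEATS):
    print(committee, satisfies_jr(profile, committee, SEATS))
```

Under these assumptions every pair of candidates passes the JR test, even though a sizeable block of the 50% candidate's supporters ends up with nobody elected.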

@toby-pereira I think this example actually shows my point. In this case, (with very high probability if the approval sets are truly uncorrelated) any two of the winners will satisfy EJR (and thus also PJR/JR), so it is not restrictive at all. I'm sure you have seen this paper thrown around before, but it gives some nice examples of how two different committees, both satisfying EJR, are not necessarily all of the same quality.

Unless I'm misunderstanding the setup, I'm not sure why you are saying the two 51% will necessarily be elected (although, in this case it does seem like the 'right' choice).

I agree that it is a somewhat weak criterion in the grand scheme of things, and I would like to simultaneously optimize a metric like maxPhragmen (or related). I don't see them as clumsy at all though. In fact, every quota-spending rule (that sequentially selects a candidate with >= quota of support when they exist) satisfies PJR and thus JR.

You can find the versions of JR I think you're referring to as "strong" and "semi-strong" JR, where a solid Hare quota gets exactly that consensus candidate elected. Unfortunately, as you said, it's not always possible.

Edit: I should probably mention there is another intuitive criterion, perfect representation. This is when the voters can be exactly divided into quotas such that each quota gets a unanimous winner. Obviously, this is also not always possible, but more importantly it is incompatible with EJR. This is one reason maybe it's reasonable to consider EJR 'clumsy.' However, it is compatible with PJR. It seems to me that PJR is weak enough such that any noncompliance is likely indicative of deeper problems. In particular, optimization of the maxPhragmen metric implies PJR. Your 'squared load' metric I believe is equivalent to the varPhragmen objective function, which implies JR.

Ted Stern

@keith-edmonds @brozai, is there a way to get the Wikipedia page author to also put a page on electowiki.org? It seems like it might be faster, since no approval would be required.

Marylander

@andy-dienes said in Rule X extended to score ballots:

I see JR / PJR / EJR as more of a litmus test; passing it does not tell you a lot about the method, but NOT passing it requires a very good explanation for the method to be considered further.

Although didn't the paper say that anything passing EJR was going to be difficult to compute efficiently? That suggests that for some applications, there is a good explanation to consider sacrificing EJR: you can't spare either the time or the complexity.

On the other hand, that's still no excuse to use a method that doesn't pass PJR, even a sequential one.

@marylander I suppose it depends on how you define 'efficient.' Rule X satisfies EJR, and I would not say it is much more complicated than any other quota-spending method! Computationally it is the same amount of work.

no excuse to use a method that doesn't pass PJR

Completely agree. This is why welfarist rules like RRV are basically a non-starter in my eyes.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

This is why welfarist rules like RRV are basically a non-starter in my eyes.

Agreed. So it comes down to SSS, MES and Allocated score.

Consider this 5-winner example with clones for each candidate:
Red: 61% vote A:5, B:3, C:0
Blue: 39% vote A:0, B:3, C:5

RRV Gives ['A1', 'C1', 'A2', 'B1', 'B2']
MES Gives ['A1', 'A2', 'A3', 'C1', 'B1']
SSS Gives ['A1', 'B1', 'B2', 'B3', 'B4']
Allocated score Gives ['A1', 'B1', 'A2', 'B2', 'A3']
STV Gives ['A1', 'A2', 'A3', 'C1', 'C2']

I could have made a calculation error, but I did it with code, which I can post if people want to look for bugs. If correct, this is super interesting: they all give different results.

Which sets are in the core? If any?

Keith Edmonds

OK, let's look at these results more deeply.

RRV Gives ['A1', 'C1', 'A2', 'B1', 'B2']
-- Total utilities for the Red and Blue groups: 16 and 11

MES Gives ['A1', 'A2', 'A3', 'C1', 'B1']
-- Total utilities for the Red and Blue groups: 18 and 8

SSS Gives ['A1', 'B1', 'B2', 'B3', 'B4']
-- Total utilities for the Red and Blue groups: 17 and 12

Allocated score Gives ['A1', 'B1', 'A2', 'B2', 'A3']
-- Total utilities for the Red and Blue groups: 21 and 6

STV Gives ['A1', 'A2', 'A3', 'C1', 'C2']
-- Total utilities for the Red and Blue groups: 15 and 10

The first thing to check is whether the winner sets are stable. The larger group is 61%, so it should control 3/5 of the seats. The best set of size 3 for them is [A,A,A], giving a utility of 15. Each set above gives the larger group at least 15, so [A,A,A] does not block. Similarly, for the smaller group controlling 1 seat, [C] does not block. A set of size 5 blocks only if it is preferred by every voter. The set given by SSS blocks both STV's and RRV's sets, meaning that those sets are not stable. I think the other 3 are.

I am not sure how to choose between the three remaining sets, but it is interesting to consider that the members of the 61% group would be expected to have about 61% of the total utility. For STV this is nearly exact, since 15/(15 + 10) = 60%; this is why people tend to think it does well.

RRV: 16 / (16 + 11) = 0.59
MES: 18 / (18 + 8) = 0.69
SSS: 17 / (17 + 12) = 0.59
Allocated score: 21 / (21 + 6) = 0.78
STV: 15 / (15 + 10) = 0.60

Interestingly RRV and STV both do very well even though they were eliminated by being unstable. SSS does well too but MES and Allocated Score do not.

Another way to think about it is total utility
RRV: 16 + 11 = 27
MES: 18 + 8 = 26
SSS: 17 + 12 = 29
Allocated score: 21 + 6 = 27
STV: 15 + 10 = 25

Again SSS is best. As the inventor of SSS I am trying not to be biased. Is there something I am missing? Is this just a special case? Perhaps we should simulate it like we did before to see what typical results are.
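For anyone who wants to re-run the two comparisons, here is a small sketch that takes the group utilities quoted above as given and recomputes Red's share of the total utility and the total itself. The 61% target is the Red group's share of the electorate; the variable names are illustrative only.

```python
# (Red utility, Blue utility) for each winner set, copied from the post.
GROUP_UTILITY = {
    "RRV": (16, 11),
    "MES": (18, 8),
    "SSS": (17, 12),
    "Allocated score": (21, 6),
    "STV": (15, 10),
}
RED_POPULATION_SHARE = 0.61

summary = {}
for name, (red, blue) in GROUP_UTILITY.items():
    total = red + blue
    red_share = red / total
    summary[name] = (red_share, total)
    # Gap between Red's utility share and Red's population share.
    gap = abs(red_share - RED_POPULATION_SHARE)
    print(f"{name}: Red share = {red_share:.3f} (gap {gap:.3f}), total = {total}")

best_total = max(summary, key=lambda n: summary[n][1])
print("Highest total utility:", best_total)
```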

@keith-edmonds Purely qualitatively, I think given this profile, Red needs to win more than Blue since it is equally cohesive and 50% larger. Again qualitatively, I think a cohesive group representing 40% of the population should get at least one of their top winners.

This leaves AAABC (from MES) and AAACC (from STV) as what I would identify as the 'best' performance on this particular ballot profile. However, this one is a bit of an edge case and the numbers work out very closely. I always feel uneasy about drawing broad conclusions from single examples, since any method can be made to look bad.

The results will also look extra weird because the ratios of supports do not play well with 5 winners for this specific scenario. With 4 winners I think AABC will look very reasonable, and with 6 winners AAABCC or AABBBC is right.

In particular, if the Red voters are at all strategic, it looks like they can squeeze out more winners in RRV and SSS by burying B. I am interested in using the definition of "balanced stable priceability" here https://www.cs.toronto.edu/~nisarg/papers/priceability.pdf to measure stability; I am not sure how or if it relates to the definition based on blocking sets.

Marylander

@keith-edmonds said in Rule X extended to score ballots:

Interestingly RRV and STV both do very well even though they were eliminated by being unstable. SSS does well too but MES and Allocated Score do not.

On the other hand, RRV and STV choose winner sets where all voters are strictly worse off than under the SSS winner set, so if we make the assumption* that the sum of the scores can be used to determine which overall committee the voter would approve, then this could be interpreted as quite a bad example for RRV and STV.

* I'm not calling it an unreasonable assumption, but it is an assumption and so I'm stating it. Perhaps we could test it with surveys, although in my opinion the meaning of scores depends on the voting system to some extent, so it might not be easy.

Edit:

@andy-dienes said in Rule X extended to score ballots:

@keith-edmonds Purely qualitatively, I think given this profile, Red needs to win more than Blue since it is equally cohesive and 50% larger. Again qualitatively, I think a cohesive group representing 40% of the population should get at least one of their top winners.

I started writing this post before you replied @Andy-Dienes (so I posted before I could comment on what you said), and I just want to observe that this objection, I think, relates to my discussion about whether we can assume that "greater sum score = better winner set" at the level of the individual.

@marylander said in Rule X extended to score ballots:

the meaning of scores depends on the voting system to some extent

1million% agree.

Imagine voting rule F, which is just pure sum-of-score
winner(F) = maximize sum of score(voter, candidate)

now imagine voting rule G, which is sum of squared score
winner(G) = maximize sum of score(voter, candidate)^2

Then almost all voters will change their ballots depending on the rule being used such that the outcome is the same, i.e. under G they will report the square root of the score they would have given under F. If we just measure linear utility from the ballots under both rules, we will get different results, even though the strategic dynamics are pretty much the same and the winner is the same.
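A tiny sketch of the F versus G point, using a made-up two-candidate profile on a hypothetical 0 to 9 scale (the scores are chosen as perfect squares so the arithmetic stays exact). If voters who would have cast a given ballot under F instead report square roots under G, G returns the same winner as F, while the ballots themselves look quite different.

```python
import math

def winner_f(ballots):
    """Rule F: the candidate with the largest sum of scores wins."""
    candidates = ballots[0].keys()
    return max(candidates, key=lambda c: sum(b[c] for b in ballots))

def winner_g(ballots):
    """Rule G: the candidate with the largest sum of squared scores wins."""
    candidates = ballots[0].keys()
    return max(candidates, key=lambda c: sum(b[c] ** 2 for b in ballots))

# Hypothetical ballots the voters would cast under rule F.
ballots_under_f = [
    {"X": 9, "Y": 4},
    {"X": 0, "Y": 4},
    {"X": 0, "Y": 4},
]

# The same voters adapting to rule G by reporting square roots.
ballots_under_g = [{c: math.isqrt(s) for c, s in b.items()} for b in ballots_under_f]

print(winner_f(ballots_under_f))   # F on the original ballots
print(winner_g(ballots_under_f))   # G if nobody adapts
print(winner_g(ballots_under_g))   # G once voters adapt
```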

Also, just to wax philosophical & speculative for a moment (and then I promise I will return to quantitative thoughts) I think we all agree that allowing for compromise & centrist candidates is a good thing. However, it seems that if a voting rule has too much of a centrist bias then voters will exaggerate their preferences to compensate. When voters exaggerate their preferences, it seems like they might start to believe those preferences, and then ironically a more centrist voting rule creates a more polarized electorate. This would be taken to the extreme in something like Block Score, which is the maximally-centrist method but would very plausibly lead to divisiveness in an electorate.

Of course, the above is just a hypothesis; I won't claim to be able to understand or predict long-term societal dynamics. I will say, I think all else held equal, a proportional representation method should attempt to simply replicate the distribution of voters. Any bias towards centrism must be minimal enough to not create an unstable outcome. In the case of a method on that profile choosing AAABB or ABBBB, I would certainly feel 'cheated' if I were a Blue voter, and in subsequent elections I would probably be less willing to compromise.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

In the case of a method on that profile choosing AAABB or ABBBB, I would certainly feel 'cheated' if I were a Blue voter, and in subsequent elections I would probably be less willing to compromise.

This contradicts your prior statement about how a voter will adjust their scoring to the system.
Suppose the voter is using a mental model to map utility, u, to score, s.

s = S(u)

What we want is for this function to obey Cauchy's functional equation, such that

S(u1 + u2) = S(u1) + S(u2)

This mental model is derived from how the system treats scores. In SSS and MES the scores are treated linearly, so such a model arises naturally. In RRV and Allocated Score this is not the case, so voters will have to adjust the S(u) function to compensate, even though it is not clear how.

Having this additive property is nice since it means that if you like a candidate half as much as another then you should score them half as much. Simplicity is important.

Consider the Blue group comparing ['A1', 'B1', 'B2', 'B3', 'B4'] to ['A1', 'A2', 'A3', 'C1', 'C2']

They expressed the scores B = 3 and C = 5

B + B + B + B = 12
C + C = 10

B + B + B + B > C + C
S(uB) + S(uB) + S(uB) + S(uB) > S(uC) + S(uC)
S(uB + uB + uB + uB) > S(uC + uC)
uB + uB + uB + uB > uC + uC

This shows they are happier with 4 Bs than 2 Cs (the last step also assumes S is increasing, so that a higher score means a higher utility). If they are not happier, then they did not use that mental model to map utility. In this sense, SSS and MES can punish strategic voting.
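As a rough illustration of why the additive mapping matters, the sketch below recovers the Blue voters' implied utilities from their scores under two hypothetical mental models: a linear one, S(u) = 5u, which satisfies the additive property, and a concave one, S(u) = 5·sqrt(u), which does not. Both mappings and the 0 to 1 utility scale are assumptions made purely for this example.

```python
BLUE_SCORES = {"B": 3, "C": 5}
MAX_SCORE = 5
FOUR_B = ["B"] * 4   # the four B winners in the SSS set
TWO_C = ["C"] * 2    # the two C winners in the STV set

def utility_if_linear(score):
    """Invert S(u) = 5u: utility is proportional to the score."""
    return score / MAX_SCORE

def utility_if_sqrt(score):
    """Invert S(u) = 5*sqrt(u): utility is the squared normalised score."""
    return (score / MAX_SCORE) ** 2

def total_utility(inverse_model, winners):
    """Sum the implied utilities of the listed winners."""
    return sum(inverse_model(BLUE_SCORES[w]) for w in winners)

for label, model in (("linear", utility_if_linear), ("sqrt", utility_if_sqrt)):
    prefers_four_b = total_utility(model, FOUR_B) > total_utility(model, TWO_C)
    print(label, "model: Blue prefers 4 Bs to 2 Cs?", prefers_four_b)
```

With the linear model the score sums and the utilities agree that four Bs beat two Cs. With the concave model the score sums still say four Bs, but the implied utilities say two Cs, which is the situation where the voter "did not use that mental model".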

@keith-edmonds said in Rule X extended to score ballots:

This contradicts your prior statement about how a voter will adjust their scoring to the system.

Hm, fair enough.

I think my mental model of voter preferences is something like:

1) Voters (generally) have a better sense of a preference ranking than they do actual utility distributions over candidates
2) Voters' utilities (whether they acknowledge it or not) tend to decay somewhat geometrically over their preferences
3) Even given the above two assumptions, voters nonetheless tend to report utilities in a more linear way over their preferences

This third point led me to tacitly apply some superlinear transformation to the utilities when I am thinking about who should win qualitatively, but I see what you are saying that if voters wanted this superlinear transformation interpretation they would have just voted that way. It still would feel weird to me for C not to get even 1 winner, but I will try to quantify that in a more robust way.

Toby Pereira

@andy-dienes said in Rule X extended to score ballots:

@toby-pereira I think this example actually shows my point. In this case, (with very high probability if the approval sets are truly uncorrelated) any two of the winners will satisfy EJR (and thus also PJR/JR), so it is not restrictive at all.

My point wasn't that it was restrictive, but that it seemed a bit weak, making requirements about just one voter (but you acknowledged that later in your post).

Unless I'm misunderstanding the setup, I'm not sure why you are saying the two 51% will necessarily be elected (although, in this case it does seem like the 'right' choice).

Well, the two 51% candidates should be elected under any reasonable method given the lack of any correlation. But in any case, the point is that it shows that approximately 1/8 of the electorate will be unrepresented despite being part of a Hare quota.

Edit: I should probably mention there is another intuitive criterion, perfect representation. This is when the voters can be exactly divided into quotas such that each quota gets a unanimous winner. Obviously, this is also not always possible, but more importantly it is incompatible with EJR. This is one reason maybe it's reasonable to consider EJR 'clumsy.' However, it is compatible with PJR. It seems to me that PJR is weak enough such that any noncompliance is likely indicative of deeper problems. In particular, optimization of the maxPhragmen metric implies PJR. Your 'squared load' metric I believe is equivalent to the varPhragmen objective function, which implies JR.

There are cases where it is arguably undesirable to have perfect representation. I added an example to the wiki page. But to copy and paste:

Consider the following election with two winners, where A, B, C and D are candidates, and the number of voters approving each candidate are as follows:

100 voters: A, B, C

100 voters: A, B, D

1 voter: C

1 voter: D

A method passing the perfect representation criterion must elect candidates C and D despite near universal support for candidates A and B. This could be seen as an argument against perfect representation as a useful criterion.
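A quick sketch to check this example by brute force. It assumes the two-winner case only: a pair of winners gives perfect representation if the 202 voters can be split into two groups of 101, with each group unanimously approving its own winner. For two seats that reduces to a simple count, which is what the hypothetical helper below does.

```python
from itertools import combinations

# (approval set, number of voters) from the example.
BALLOTS = [
    ({"A", "B", "C"}, 100),
    ({"A", "B", "D"}, 100),
    ({"C"}, 1),
    ({"D"}, 1),
]
CANDIDATES = "ABCD"
SEATS = 2

def approval_count(cand):
    """Total number of voters approving a candidate."""
    return sum(n for ballot, n in BALLOTS if cand in ballot)

def is_perfect_pair(x, y):
    """Two-seat perfect representation check for winners x and y."""
    total = sum(n for _, n in BALLOTS)
    if total % SEATS:
        return False
    quota = total // SEATS
    only_x = sum(n for b, n in BALLOTS if x in b and y not in b)
    only_y = sum(n for b, n in BALLOTS if y in b and x not in b)
    neither = sum(n for b, n in BALLOTS if x not in b and y not in b)
    # Voters approving both winners can fill either group, so the split works
    # exactly when nobody is left out and neither forced group exceeds a quota.
    return neither == 0 and only_x <= quota and only_y <= quota

print({c: approval_count(c) for c in CANDIDATES})
perfect_pairs = [pair for pair in combinations(CANDIDATES, SEATS) if is_perfect_pair(*pair)]
print(perfect_pairs)
```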

Also, on PJR, it's worth pointing out that Sainte-Laguë/Webster can in some circumstances fail the lower quota rule, so presumably fails this criterion. See example on Warren Smith's site. And I would generally consider this to be a fair system.
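For illustration, here is a small party-list sketch of a lower quota failure under Sainte-Laguë. The vote totals are hypothetical numbers chosen for this post, not the example on Warren Smith's site, and the code only demonstrates the lower quota failure rather than testing PJR itself.

```python
from fractions import Fraction

def sainte_lague(votes, seats):
    """Highest-averages allocation with divisors 1, 3, 5, ..."""
    quotients = [(Fraction(v, 2 * k + 1), party)
                 for party, v in votes.items()
                 for k in range(seats)]
    # Award seats to the largest quotients overall.
    quotients.sort(key=lambda q: q[0], reverse=True)
    allocation = {party: 0 for party in votes}
    for _, party in quotients[:seats]:
        allocation[party] += 1
    return allocation

# Hypothetical election: one large party and four equal small ones, 10 seats.
VOTES = {"Big": 700, "S1": 75, "S2": 75, "S3": 75, "S4": 75}
SEATS = 10

allocation = sainte_lague(VOTES, SEATS)
total_votes = sum(VOTES.values())
lower_quota_big = VOTES["Big"] * SEATS // total_votes   # floor of the exact quota

print(allocation)
print("Big party lower quota:", lower_quota_big)
```

In this made-up case the large party is entitled to exactly 7 quotas but receives fewer seats, because each small party's first quotient beats the large party's later ones.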

Toby Pereira

@marylander said in Rule X extended to score ballots:

On the other hand, RRV and STV choose winner sets where all voters are strictly worse off than under the SSS winner set, so if we make the assumption* that the sum of the scores can be used to determine which overall committee the voter would approve, then this could be interpreted as quite a bad example for RRV and STV.

* I'm not calling it an unreasonable assumption, but it is an assumption and so I'm stating it. Perhaps we could test it with surveys, although in my opinion the meaning of scores depends on the voting system to some extent, so it might not be easy.

Even aside from scores, and looking at full approvals, there are scenarios where a "Pareto dominated" result is arguably better. See the archive here.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

Voters (generally) have a better sense of a preference ranking than they do actual utility distributions over candidates

Agreed, but they do have some sense and that helps. This is the same as adding noise to the system, and it will largely average out.

Voters' utilities (whether they acknowledge it or not) tend to decay somewhat geometrically over their preferences

I do not agree. This is what people say about money, but not about candidates.

Even given the above two assumptions, voters nonetheless tend to report utilities in a more linear way over their preferences

I think that even if it is flawed it would be good to be able to make this as an honest recommendation for how to vote.

@toby-pereira said in Rule X extended to score ballots:

Also, on PJR, it's worth pointing out that Sainte-Laguë/Webster can in some circumstances fail the lower quota rule, so presumably fails this criterion

True! And moreover, anything satisfying priceability (which is a very intuitive criterion to me, and which implies PJR) must be an extension of D'Hondt.

The example you have given for perfect representation is an example of how it is incompatible with Pareto efficiency. I definitely agree this is a mark against it.

I think we are really on the same page: we both agree that PJR is desirable and weak; I just view failures of PJR as more damning than you do. FWIW, I also prefer D'Hondt to Sainte-Laguë.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

anything satisfying priceability (which is a very intuitive criterion to me, and which implies PJR)

Can you (or somebody) make a priceability electowiki page?

@keith-edmonds

I wanted to see how these committees fared in terms of
a) the Maximin Support objective, equivalent to the max-Phragmen objective
b) stable priceability, which implies core

I transformed scores to approvals by having each voter choose a uniform random approval threshold in (0,1). It took me a little bit to formulate the linear programs, so there may be bugs, but my calculations give, averaged over 1000 trials,

AAABC: maximin support = 1 (quota), stable priceable probability = 0.4
ABBBB: maximin support = 0.75 (quota), stable priceable probability = 0
AAABB: maximin support = 0.84 (quota), stable priceable probability = 0
AAACC: maximin support = 0.96 (quota), stable priceable probability = 0.25
AABBC: maximin support = 1 (quota), stable priceable probability = 0.46

Important note: these are the values for the committees, not for the selection rules that originally found them. The example as given is very much an edge case with tight numbers, so the randomness in approval thresholds helps smooth that out a little. I expect the committees these rules return would change a lot with a little noise added.

BTW, I think this might illustrate my heuristic objection to defining stability (and preferences over sets) as just pure linear utility. With the literal linear utility interpretation, ABBBB is stable and blocks both AABBC and AAACC. However, if we interpret the score 3/5 as a 60% probability of approval, suddenly ABBBB is never stable, and AABBC & AAACC both have non-trivial stability probability. Furthermore, if either (or both) 1) that score of 3 is lowered slightly to like 2.8 or 2) some small fraction of voters choose to bullet vote, all of a sudden ABBBB does not look particularly good either from a stability standpoint or utility.
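The score-to-approval step described above can be sketched as follows. This is only the random-threshold conversion, not the linear programs for maximin support or stable priceability. It assumes scores are normalised by the maximum of 5, that each voter draws one threshold for their whole ballot, and that a candidate is approved when the normalised score is strictly above the threshold. The function name is made up.

```python
import random

MAX_SCORE = 5
RED_BALLOT = {"A": 5, "B": 3, "C": 0}

def sample_approvals(scores, rng):
    """Convert one score ballot to an approval set using a single random threshold."""
    threshold = rng.random()   # uniform draw from [0, 1)
    return {cand for cand, score in scores.items() if score / MAX_SCORE > threshold}

rng = random.Random(0)         # fixed seed so the run is repeatable
trials = 10_000
samples = [sample_approvals(RED_BALLOT, rng) for _ in range(trials)]

b_rate = sum("B" in s for s in samples) / trials
print("Share of Red voters approving B:", b_rate)
```

Under this conversion a Red voter always approves A, never approves C, and approves B roughly 60% of the time, which is the "score 3/5 as a 60% probability of approval" reading.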

Toby Pereira

@marylander said in Rule X extended to score ballots:

On the other hand, RRV and STV choose winner sets where all voters are strictly worse off than under the SSS winner set,

It's weird that RRV has done that since its mechanism is just to maximise the "satisfaction" score for each voter. I presume then that this is to do with electing sequentially rather than something fundamental to RRV itself. And I would also presume that electing sequentially can throw out weird anomalies for any voting method, and I don't see any particular reason why any method should be more susceptible than any other method to this.

As an aside, regardless of what one thinks of Thiele methods in general, I do not consider RRV to be a good implementation of them.