Rule X extended to score ballots

@keith-edmonds Purely qualitatively, I think given this profile, Red needs to win more than Blue since it is equally cohesive and 50% larger. Again qualitatively, I think a cohesive group representing 40% of the population should get at least one of their top winners.

This leaves AAABC (from MES) and AAACC (from STV) as what I would identify as the 'best' performance on this particular ballot profile. However, this one is a bit of an edge case and the numbers are very close. I always feel uneasy about drawing broad conclusions from single examples, since any method can be made to look bad.

The results will also look extra weird because the ratios of supports do not play well with 5 winners for this specific scenario. With 4 winners I think AABC will look very reasonable, and with 6 winners AAABCC or AABBBC is right.

In particular, if the Red voters are at all strategic, it looks like they can squeeze out more winners in RRV and SSS by burying B. I am interested in using the definition of "balanced stable priceability" here https://www.cs.toronto.edu/~nisarg/papers/priceability.pdf to measure stability; I am not sure how, or whether, it relates to the definition based on blocking sets.

Marylander

@keith-edmonds said in Rule X extended to score ballots:

Interestingly RRV and STV both do very well even though they were eliminated by being unstable. SSS does well too but MES and Allocated Score do not.

On the other hand, RRV and STV choose winner sets where all voters are strictly worse off than under the SSS winner set, so if we make the assumption* that the sum of the scores can be used to determine which overall committee the voter would approve, then this could be interpreted as quite a bad example for RRV and STV.

* I'm not calling it an unreasonable assumption, but it is an assumption and so I'm stating it. Perhaps we could test it with surveys, although in my opinion the meaning of scores depends on the voting system to some extent, so it might not be easy.

Edit:

@andy-dienes said in Rule X extended to score ballots:

@keith-edmonds Purely qualitatively, I think given this profile, Red needs to win more than Blue since it is equally cohesive and 50% larger. Again qualitatively, I think a cohesive group representing 40% of the population should get at least one of their top winners.

I started writing this post before you replied @Andy-Dienes (so I posted before I could comment on what you said), and I just want to observe that this objection, I think, relates to my discussion about whether we can assume that "greater sum score = better winner set" at the level of the individual.

@marylander said in Rule X extended to score ballots:

the meaning of scores depends on the voting system to some extent

1million% agree.

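Imagine voting rule F, which is just pure sum-of-scores:
winner(F) = the committee maximizing the sum of score(voter, candidate) over all voters and elected candidates

Now imagine voting rule G, which is sum of squared scores:
winner(G) = the committee maximizing the sum of score(voter, candidate)^2 over all voters and elected candidates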

Then almost all voters will adjust their ballots to the rule being used so that the outcome is the same, i.e. their F ballots will be the squares of their G ballots. If we then measure linear utility from the submitted ballots under both rules, we will get different results, even though the strategic dynamics are pretty much the same and the winner is the same.
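A minimal sketch of the F-versus-G point, under stated assumptions: three hypothetical voters with scores normalized to [0, 1], a brute-force search over all 2-member committees, and voters under G submitting the square roots of their F scores. Both rules then pick the same committee, while the plain (linear) sums of the submitted scores differ.

```python
from itertools import combinations
from math import sqrt


def best_committee(ballots, k, transform):
    """Return the size-k committee maximizing the sum of transform(score)."""
    candidates = sorted(ballots[0])
    return max(
        combinations(candidates, k),
        key=lambda committee: sum(
            transform(ballot[c]) for ballot in ballots for c in committee
        ),
    )


def identity(s):
    return s  # rule F: plain sum of scores


def square(s):
    return s * s  # rule G: sum of squared scores


# Hypothetical normalized ballots these voters would cast under rule F.
f_ballots = [
    {"A": 1.0, "B": 0.6, "C": 0.0},
    {"A": 0.2, "B": 1.0, "C": 0.8},
    {"A": 0.0, "B": 0.4, "C": 1.0},
]
# Under rule G they submit square roots, so G's squaring restores their F ballots.
g_ballots = [{c: sqrt(s) for c, s in b.items()} for b in f_ballots]

print(best_committee(f_ballots, 2, identity))  # ('B', 'C')
print(best_committee(g_ballots, 2, square))    # ('B', 'C')

# Linear utility measured on the submitted ballots differs between the rules.
linear_f = sum(b[c] for b in f_ballots for c in ("B", "C"))
linear_g = sum(b[c] for b in g_ballots for c in ("B", "C"))
print(round(linear_f, 2), round(linear_g, 2))  # roughly 3.8 vs 4.3
```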

Also, just to wax philosophical & speculative for a moment (and then I promise I will return to quantitative thoughts) I think we all agree that allowing for compromise & centrist candidates is a good thing. However, it seems that if a voting rule has too much of a centrist bias then voters will exaggerate their preferences to compensate. When voters exaggerate their preferences, it seems like they might start to believe those preferences, and then ironically a more centrist voting rule creates a more polarized electorate. This would be taken to the extreme in something like Block Score, which is the maximally-centrist method but would very plausibly lead to divisiveness in an electorate.

Of course, the above is just a hypothesis; I won't claim to be able to understand or predict long-term societal dynamics. I will say, I think all else held equal, a proportional representation method should attempt to simply replicate the distribution of voters. Any bias towards centrism must be minimal enough to not create an unstable outcome. In the case of a method on that profile choosing AAABB or ABBBB, I would certainly feel 'cheated' if I were a Blue voter, and in subsequent elections I would probably be less willing to compromise.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

In the case of a method on that profile choosing AAABB or ABBBB, I would certainly feel 'cheated' if I were a Blue voter, and in subsequent elections I would probably be less willing to compromise.

This contradicts your prior statement about how a voter will adjust their scoring to the system.
Suppose the voter is using a mental model to map utility, u, to score, s.

s = S(u)

What we want is for this function to obey Cauchy's functional equation, so that:

S(u1 + u2) = S(u1) + S(u2)

This mental model is derived from how the system treats scores. In SSS and MES the scores are treated linearly, so such a model arises naturally. In RRV and Allocated Score this is not the case, so voters will have to adjust their S(u) function to compensate, even though it is not clear how.

Having this additive property is nice since it means that if you like a candidate half as much as another, then you should score them half as much. Simplicity is important.

Consider the Blue group comparing ['A1', 'B1', 'B2', 'B3', 'B4'] to ['A1', 'A2', 'A3', 'C1', 'C2']

They expressed the scores B = 3 and C = 5.

B + B + B + B = 12
C + C = 10

B + B + B + B > C + C
S(uB) + S(uB) + S(uB) + S(uB) > S(uC) + S(uC)
S(uB + uB + uB + uB) > S(uC + uC)    (by additivity)
uB + uB + uB + uB > uC + uC    (since S is increasing)

This shows they are happier with 4 Bs than 2 Cs. If they are not happier, then they did not use that mental model to map utility. In this sense, SSS and MES can punish strategic voting.
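To see why the last steps need S to be additive (and increasing), here is a small hypothetical check: recover utilities from the Blue ballot (B = 3, C = 5, out of 5) under a linear mapping S(u) = 5u and under a made-up concave mapping S(u) = 5*sqrt(u). The same ballot then implies opposite preferences between 4 Bs and 2 Cs, so the conclusion rests on the additive mental model.

```python
def implied_utility(score, inverse_mapping, max_score=5):
    """Recover u from s = S(u), given the inverse of the voter's mapping S."""
    return inverse_mapping(score / max_score)


def linear_inverse(x):
    # S(u) = 5u is additive, so Cauchy's equation holds.
    return x


def concave_inverse(x):
    # Hypothetical S(u) = 5 * sqrt(u), which is NOT additive.
    return x ** 2


def prefers_four_b(inverse_mapping):
    """Compare total utility of 4 B winners against 2 C winners."""
    u_b = implied_utility(3, inverse_mapping)
    u_c = implied_utility(5, inverse_mapping)
    return 4 * u_b > 2 * u_c


print(prefers_four_b(linear_inverse))   # True: 4 * 0.6 = 2.4 > 2 * 1.0
print(prefers_four_b(concave_inverse))  # False: 4 * 0.36 = 1.44 < 2 * 1.0
```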

@keith-edmonds said in Rule X extended to score ballots:

This contradicts you prior statement about how a voter will adjust their scoring to the system.

Hm, fair enough.

I think my mental model of voter preferences is something like:

1. Voters (generally) have a better sense of a preference ranking than they do of actual utility distributions over candidates.
2. Voters' utilities (whether they acknowledge it or not) tend to decay somewhat geometrically over their preferences.
3. Even given the above two assumptions, voters nonetheless tend to report utilities in a more linear way over their preferences.

This third point led me to tacitly apply some superlinear transformation to the utilities when thinking qualitatively about who should win, but I take your point that if voters wanted this superlinear interpretation, they would have voted that way. It would still feel weird to me for C not to get even one winner, but I will try to quantify that in a more robust way.

Toby Pereira

@andy-dienes said in Rule X extended to score ballots:

@toby-pereira I think this example actually shows my point. In this case, (with very high probability if the approval sets are truly uncorrelated) any two of the winners will satisfy EJR (and thus also PJR/JR), so it is not restrictive at all.

My point wasn't that it was restrictive, but that it seemed a bit weak, making requirements about just one voter (but you acknowledged that later in your post).

Unless I'm misunderstanding the setup, I'm not sure why you are saying the two 51% will necessarily be elected (although, in this case it does seem like the 'right' choice).

Well, the two 51% candidates should be elected under any reasonable method given the lack of any correlation. But in any case, the point is that it shows that approximately 1/8 of the electorate will be unrepresented despite being part of a Hare quota.

Edit: I should probably mention there is another intuitive criterion, perfect representation. This is when the voters can be exactly divided into quotas such that each quota gets a unanimous winner. Obviously, this is also not always possible, but more importantly it is incompatible with EJR. This is one reason it may be reasonable to consider EJR 'clumsy.' However, it is compatible with PJR. It seems to me that PJR is weak enough that any noncompliance is likely indicative of deeper problems. In particular, optimization of the max-Phragmen metric implies PJR. I believe your 'squared load' metric is equivalent to the var-Phragmen objective function, which implies JR.

There are cases where it is arguably undesirable to have perfect representation. I added an example to the wiki page. But to copy and paste:

Consider the following election with two winners, where A, B, C and D are candidates, and the number of voters approving each candidate are as follows:

100 voters: A, B, C

100 voters: A, B, D

1 voter: C

1 voter: D

A method passing the perfect representation criterion must elect candidates C and D despite near universal support for candidates A and B. This could be seen as an argument against perfect representation as a useful criterion.
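The example can be checked mechanically. The sketch below is my own formulation, not taken from the wiki page: it treats perfect representation as a flow problem in which each voter is assigned to one winner they approve and each winner must receive exactly one Hare quota (n/k) of voters. It uses a plain Edmonds-Karp max flow over groups of identical ballots and assumes n is divisible by k.

```python
from collections import defaultdict, deque
from itertools import combinations


def max_flow(cap, source, sink):
    """Edmonds-Karp max flow; cap[u][v] holds residual capacities (mutated)."""
    flow = 0
    while True:
        parent = {source: None}
        queue = deque([source])
        while queue and sink not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if sink not in parent:
            return flow
        # Walk back along the augmenting path and push its bottleneck.
        path, v = [], sink
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= bottleneck
            cap[v][u] = cap[v].get(u, 0) + bottleneck
        flow += bottleneck


def has_perfect_representation(groups, committee):
    """groups: list of (count, approved_set). True if the voters can be split
    into equal quotas, each quota unanimously approving a distinct winner."""
    n = sum(count for count, _ in groups)
    k = len(committee)
    if n % k:
        return False  # quotas must divide the electorate exactly
    quota = n // k
    cap = defaultdict(dict)
    for i, (count, approved) in enumerate(groups):
        cap["source"][("group", i)] = count
        for c in committee:
            if c in approved:
                cap[("group", i)][("winner", c)] = count
    for c in committee:
        cap[("winner", c)]["sink"] = quota
    return max_flow(cap, "source", "sink") == n


groups = [(100, {"A", "B", "C"}), (100, {"A", "B", "D"}), (1, {"C"}), (1, {"D"})]
print([w for w in combinations("ABCD", 2) if has_perfect_representation(groups, w)])
# [('C', 'D')]
```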

Also, on PJR, it's worth pointing out that Sainte-Laguë/Webster can in some circumstances fail the lower quota rule, so presumably fails this criterion. See example on Warren Smith's site. And I would generally consider this to be a fair system.

Toby Pereira

@marylander said in Rule X extended to score ballots:

On the other hand, RRV and STV choose winner sets where all voters are strictly worse off than under the SSS winner set, so if we make the assumption* that the sum of the scores can be used to determine which overall committee the voter would approve, then this could be interpreted as quite a bad example for RRV and STV.

* I'm not calling it an unreasonable assumption, but it is an assumption and so I'm stating it. Perhaps we could test it with surveys, although in my opinion the meaning of scores depends on the voting system to some extent, so it might not be easy.

Even aside from scores, and looking at full approvals, there are scenarios where a "Pareto dominated" result is arguably better. See the archive here.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

Voters (generally) have a better sense of a preference ranking than they do actual utility distributions over candidates

Agreed, but they do have some sense, and that helps. This is like adding noise to the system, and it will largely average out.

Voters' utilities (whether they acknowledge it or not) tend to decay somewhat geometrically over their preferences

I do not agree. This is what people say about money, but not about candidates.

Even given the above two assumptions, voters nonetheless tend to report utilities in more linear way over their preferences

I think that even if it is flawed it would be good to be able to make this as an honest recommendation for how to vote.

@toby-pereira said in Rule X extended to score ballots:

Also, on PJR, it's worth pointing out that Sainte-Laguë/Webster can in some circumstances fail the lower quota rule, so presumably fails this criterion

True! And moreover, anything satisfying priceability (a very intuitive criterion to me, which implies PJR) must be an extension of D'Hondt.

The example you have given for perfect representation is an example of how it is incompatible with Pareto efficiency. I definitely agree this is a mark against it.

I think we are really on the same page: we both agree that PJR is desirable and weak; I just view failures of PJR as more damning than you do. FWIW, I also prefer D'Hondt to Sainte-Laguë.
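To make the D'Hondt versus Sainte-Laguë comparison concrete, here is a minimal highest-averages sketch. The vote counts are hypothetical; with Hare quotas of 3.1, 1.25 and 0.65 seats, D'Hondt gives the largest party a fourth seat, while Sainte-Laguë gives the smallest party a seat instead. Ties are broken by dictionary order, which a real implementation would need to handle more carefully.

```python
def divisor_apportionment(votes, seats, divisor):
    """Highest-averages apportionment: each seat goes to the party with the
    largest votes / divisor(seats already won)."""
    alloc = {party: 0 for party in votes}
    for _ in range(seats):
        winner = max(votes, key=lambda p: votes[p] / divisor(alloc[p]))
        alloc[winner] += 1
    return alloc


def dhondt(s):
    return s + 1  # divisors 1, 2, 3, ...


def sainte_lague(s):
    return 2 * s + 1  # divisors 1, 3, 5, ...


votes = {"X": 62, "Y": 25, "Z": 13}  # hypothetical vote counts
print(divisor_apportionment(votes, 5, dhondt))        # {'X': 4, 'Y': 1, 'Z': 0}
print(divisor_apportionment(votes, 5, sainte_lague))  # {'X': 3, 'Y': 1, 'Z': 1}
```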

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

anything satisfying priceability (which is a very intuitive criterion to me, implies PJR)

Can you (or somebody) make a priceability electowiki page?

@keith-edmonds

I wanted to see how these committees fared in terms of
a) the Maximin Support objective, equivalent to the max-Phragmen objective
b) stable priceability, which implies core

I transformed scores to approvals by having each voter choose a uniform random approval threshold in (0,1). It took me a little bit to formulate the linear programs, so there may be bugs, but my calculations give, averaged over 1000 trials,

AAABC: maximin support = 1 (quota), stable priceable probability = 0.4
ABBBB: maximin support = 0.75 (quota), stable priceable probability = 0
AAABB: maximin support = 0.84 (quota), stable priceable probability = 0
AAACC: maximin support = 0.96 (quota), stable priceable probability = 0.25
AABBC: maximin support = 1 (quota), stable priceable probability = 0.46

Important note: these are the values for the committees, not for the selection rules that originally found them. The example as given is really an edge case with tight numbers, so the randomness in approval thresholds helps smooth that out a little. I expect the committees these rules return would change a lot with a little noise added.

BTW, I think this might illustrate my heuristic objection to defining stability (and preferences over sets) as just pure linear utility. With the literal linear utility interpretation, ABBBB is stable and blocks both AABBC and AAACC. However, if we interpret the score 3/5 as a 60% probability of approval, suddenly ABBBB is never stable, and AABBC and AAACC both have a non-trivial probability of stability. Furthermore, if either (or both) of the following happen: 1) that score of 3 is lowered slightly, say to 2.8, or 2) some small fraction of voters choose to bullet vote, then all of a sudden ABBBB does not look particularly good, either from a stability standpoint or in terms of utility.
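For reference, a minimal sketch of the score-to-approval step described above, assuming each voter draws one threshold uniformly from [0, 1) and approves every candidate whose score divided by the maximum score exceeds it. With a maximum of 5, a score of 3 is then approved about 60% of the time. The ballot and the seed are made up for illustration.

```python
import random


def random_threshold_approvals(ballot, max_score, rng):
    """Approve every candidate whose normalized score exceeds a single
    threshold drawn uniformly for this voter."""
    threshold = rng.random()
    return {c for c, s in ballot.items() if s / max_score > threshold}


rng = random.Random(0)
ballot = {"A": 5, "B": 3, "C": 0}  # hypothetical ballot, max score 5
trials = 100_000
counts = {c: 0 for c in ballot}
for _ in range(trials):
    for c in random_threshold_approvals(ballot, 5, rng):
        counts[c] += 1

# A is always approved, C never, B roughly 60% of the time.
print({c: counts[c] / trials for c in ballot})
```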

Toby Pereira

@marylander said in Rule X extended to score ballots:

On the other hand, RRV and STV choose winner sets where all voters are strictly worse off than under the SSS winner set,

It's weird that RRV has done that since its mechanism is just to maximise the "satisfaction" score for each voter. I presume then that this is to do with electing sequentially rather than something fundamental to RRV itself. And I would also presume that electing sequentially can throw out weird anomalies for any voting method, and I don't see any particular reason why any method should be more susceptible than any other method to this.

As an aside, regardless of what one thinks of Thiele methods in general, I do not consider RRV to be a good implementation of it.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

I wanted to see how these committees fared in terms of
a) the Maximin Support objective, equivalent to the max-Phragmen objective
b) stable priceability, which implies core

Do these need to be defined in terms of Approval? Can you give the formula you used for clarity?

@andy-dienes said in Rule X extended to score ballots:

The example as given is really a very edge case with tight numbers

As intended. It is only in such cases that they differ in results. @BTernaryTau Can you make a ternary plot for MES? It would be interesting to see the differences between SSS, MES, Allocated Score, RRV and STV.

@andy-dienes said in Rule X extended to score ballots:

I expect the committees these rules return would change a lot with a little noise added.

This was how I started the simulations last time. I simulated the supporters as Gaussian blobs placed in the 2D plane. The default example in vote_sim.py is somewhat similar to this example.
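The actual code in vote_sim.py is not reproduced here. The following is only a hypothetical sketch of the general idea: voter groups drawn from 2D Gaussian blobs, with each score falling off linearly with distance to the candidate and clipped to 0..5. The blob positions, candidate positions, `radius`, and the linear scoring rule are all assumptions for illustration.

```python
import math
import random


def simulate_ballots(blobs, candidates, seed=0, radius=2.0, max_score=5):
    """blobs: list of ((x, y), sigma, size). Each voter is drawn from a 2D
    Gaussian; scores fall off linearly with distance (hypothetical rule)."""
    rng = random.Random(seed)
    ballots = []
    for (cx, cy), sigma, size in blobs:
        for _ in range(size):
            x, y = rng.gauss(cx, sigma), rng.gauss(cy, sigma)
            ballot = {}
            for name, (px, py) in candidates.items():
                d = math.hypot(x - px, y - py)
                # Full score at distance 0, zero at or beyond `radius`.
                ballot[name] = max(0, round(max_score * (1 - d / radius)))
            ballots.append(ballot)
    return ballots


blobs = [((0.0, 0.0), 0.3, 60), ((1.5, 0.0), 0.3, 40)]  # hypothetical groups
candidates = {"A1": (0.0, 0.1), "A2": (-0.1, 0.0), "B1": (1.5, 0.1), "C1": (0.75, 0.0)}
ballots = simulate_ballots(blobs, candidates)
print(len(ballots), ballots[0])
```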

@toby-pereira said in Rule X extended to score ballots:

It's weird that RRV has done that since its mechanism is just to maximise the "satisfaction" score for each voter.

Simulations have shown that RRV gets higher total utility more often. I suspect this is just a weird example for RRV.

@toby-pereira said in Rule X extended to score ballots:

I presume then that this is to do with electing sequentially rather than something fundamental to RRV itself. And I would also presume that electing sequentially can throw out weird anomalies for any voting method, and I don't see any particular reason why any method should be more susceptible than any other method to this.

I would not say that. Systems like SSS and MES are designed to be sequential. That may be the issue with RRV, but SSS and MES do not have the same excuse. SDV is the sequential implementation of Thiele for score, or at least I designed it to be as close to SPAV as I could.

Perhaps an optimal (non-sequential) variant of MES could be made where rho is the same for all winners. This should minimize free riding, as with SSQ, but do it more cleanly.
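For readers following the rho discussion, here is a minimal sketch of the standard sequential Method of Equal Shares (Rule X) with score ballots, in the usual cardinal-utility formulation: each candidate costs 1, each voter starts with a budget of k/n, and a candidate is rho-affordable if the sum over its supporters of min(budget_i, rho * score_i) reaches 1. At each step the candidate with the smallest rho is elected. The uniform-rho variant proposed above is not implemented, there is no completion step for unfilled seats, and ties go to the earlier candidate in the list.

```python
from fractions import Fraction


def min_rho(ballots, budget, c):
    """Smallest rho such that sum_i min(budget_i, rho * u_i(c)) covers a price
    of 1, or None if c's supporters cannot afford it."""
    supporters = [(budget[i], b[c]) for i, b in enumerate(ballots) if b.get(c, 0) > 0]
    if sum(money for money, _ in supporters) < 1:
        return None
    # Voters with the smallest budget/utility ratio run out of money first.
    supporters.sort(key=lambda mu: mu[0] / mu[1])
    remaining, total_u = Fraction(1), sum(u for _, u in supporters)
    for money, u in supporters:
        rho = remaining / total_u
        if rho * u <= money:
            return rho  # everyone left can pay rho * u
        remaining -= money  # this voter pays everything they have left
        total_u -= u
    return None


def equal_shares(ballots, candidates, k):
    n = len(ballots)
    budget = [Fraction(k, n)] * n
    committee = []
    while len(committee) < k:
        options = [(min_rho(ballots, budget, c), c) for c in candidates if c not in committee]
        options = [(rho, c) for rho, c in options if rho is not None]
        if not options:
            break  # no completion step in this sketch
        rho, winner = min(options, key=lambda rc: rc[0])
        for i, b in enumerate(ballots):
            u = b.get(winner, 0)
            if u > 0:
                budget[i] -= min(budget[i], rho * u)
        committee.append(winner)
    return committee


approval_ballots = [{"A": 1, "B": 1}] * 4 + [{"C": 1}] * 2
print(equal_shares(approval_ballots, ["A", "B", "C"], 3))  # ['A', 'B', 'C']

score_ballots = [{"A": 5, "B": 2}] * 2 + [{"B": 5}]
print(equal_shares(score_ballots, ["A", "B"], 1))  # ['B']
```

In the second profile A has the higher score sum (10 versus 9), but its two supporters hold only 2/3 of the price between them, so under these assumptions MES elects B.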

@toby-pereira said in Rule X extended to score ballots:

As an aside, regardless of what one thinks of Thiele methods in general, I do not consider RRV to be a good implementation of it.

You prefer Sequential Proportional Score Voting, correct?

@keith-edmonds

As far as I can tell, there is no particularly obvious way to extend the definitions to score while maintaining poly-time computability (at least, not that I can theoretically motivate). That is why I chose to transform to approval with random thresholds.

I am computing maximin support as described in Theorem 2 of this paper https://arxiv.org/pdf/1609.05370.pdf
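A small sketch of maximin support for a fixed committee with approval ballots. Rather than the linear program referred to above, it uses a max-flow/min-cut (Hall-type) characterization: the maximin support equals the minimum, over nonempty subsets S of the committee, of (number of voters approving at least one member of S) / |S|. It enumerates all subsets, so it is only practical for small committees such as the 5-seat examples here. Reporting the result in units of the Hare quota n/k is my assumption about what "(quota)" means in the earlier table.

```python
from fractions import Fraction
from itertools import combinations


def maximin_support(approvals, committee):
    """Maximin support of a fixed committee, in units of the Hare quota n/k.
    Uses min over nonempty S of |voters approving someone in S| / |S|."""
    n, k = len(approvals), len(committee)
    best = None
    for size in range(1, k + 1):
        for subset in combinations(committee, size):
            covered = sum(1 for a in approvals if any(c in a for c in subset))
            ratio = Fraction(covered, size)
            if best is None or ratio < best:
                best = ratio
    return best / Fraction(n, k)


approvals = [{"A1", "A2"}] * 4 + [{"B1"}] * 2  # hypothetical approval profile
print(maximin_support(approvals, ("A1", "A2", "B1")))  # 1
print(maximin_support(approvals, ("A1", "B1")))        # 2/3
```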

And stable priceability as defined in section 3 of this paper https://www.cs.toronto.edu/~nisarg/papers/priceability.pdf

also

Perhaps an optimal (non-sequential) variant of MES could be made where rho is the same for all winners. This should minimize free riding, as with SSQ, but do it more cleanly.

I would be surprised if this is always possible, but it's an interesting idea. I am assuming you mean to still choose the winners sequentially, given some uniform rho?

edit: at the very least, it will be possible when every candidate has at least one bullet voter: you can set rho very high, e.g. 1/(minimum over all scores awarded). Then this is the greedy Chamberlin-Courant rule, where any score > 0 is interpreted as an approval. It might still have the issue that not every winner gets a full quota.

Keith Edmonds

@andy-dienes said in Rule X extended to score ballots:

As far as I can tell, there is no particularly obvious way to extend the definitions to score while maintaining poly-time computability

Perhaps the Kotze-Pereira transformation
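A minimal sketch of the Kotze-Pereira transformation, assuming scores are normalized by the maximum score: each ballot is split into nested approval ballots, one per distinct nonzero score level, weighted by the gap to the next level down. These weights equal the probabilities of each approval set under the uniform random threshold transformation described earlier in the thread, so KP can be read as the deterministic, expected-value version of that approach.

```python
from fractions import Fraction


def kotze_pereira(ballot, max_score):
    """Split one score ballot into weighted approval ballots (KP transformation)."""
    levels = sorted({s for s in ballot.values() if s > 0}, reverse=True)
    parts = []
    for i, level in enumerate(levels):
        next_level = levels[i + 1] if i + 1 < len(levels) else 0
        # Approve everyone scored at or above this level.
        approved = frozenset(c for c, s in ballot.items() if s >= level)
        parts.append((approved, Fraction(level - next_level, max_score)))
    return parts


# {A} with weight 2/5, {A, B} with weight 2/5, {A, B, C} with weight 1/5
print(kotze_pereira({"A": 5, "B": 3, "C": 1, "D": 0}, 5))
```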

Toby Pereira

@keith-edmonds said in Rule X extended to score ballots:

I would not say that. Systems like SSS and MES are designed to be sequential. That may be the issue with RRV, but SSS and MES do not have the same excuse. SDV is the sequential implementation of Thiele for score, or at least I designed it to be as close to SPAV as I could.

When you gave the results for your example in the post a few above you said RRV, but do you mean that you actually used SDV? Edit - In any case I don't think RRV is the method worth calculating results for.

@toby-pereira said in Rule X extended to score ballots:

As an aside, regardless of what one thinks of Thiele methods in general, I do not consider RRV to be a good implementation of it.

You prefer Sequential Proportional Score Voting, correct?

Yeah, that's the one that's SPAV + KP, right? I think that's still my preferred Thiele-based option. But either that or SDV are likely superior to RRV.

Keith Edmonds

@toby-pereira said in Rule X extended to score ballots:

When you gave the results for your example in the post a few above you said RRV, but do you mean that you actually used SDV? Edit - In any case I don't think RRV is the method worth calculating results for.

It was RRV. The only good thing about RRV is that it is simple to calculate. It was included to be illustrative. I think STV is garbage, but I included that too as a reference point. I think SDV is the best Thiele system, but I do not like Thiele systems, so I never bothered to code it.

@toby-pereira said in Rule X extended to score ballots:

Yeah, that's the one that's SPAV + KP, right? I think that's still my preferred Thiele-based option.

Yes that is what it is called on this page. https://electowiki.org/wiki/Kotze-Pereira_transformation

There is no proper page for it. If you want to advocate for it, you should make one.

Marylander

@keith-edmonds Also if we don't include RRV, people who learn about some of this stuff through scorevoting.net might read the report and ask, "where's RRV?"

If we didn't include STV, we'd have the same problem, just with different entry points.

@marylander said in Rule X extended to score ballots:

through scorevoting.net

That url makes me shudder ...

Just to add: I have now also computed the var-Phragmen objective for each of those committees. It is a bit slower since it is a nonlinear objective, so I only ran 400 trials rather than 1000.

The results are:

AAABC: 0.00018
ABBBB: 0.247
AAABB: 0.186
AAACC: 0.00042
AABBC: 0.00011

For context, the var-Phragmen objective tries to minimize the variance of the 'voter load.' It is equivalent to Sainte-Laguë when voters vote along party lists. There is a stripped-down version of this metric (much easier to compute, but less informative), sometimes referred to as 'Ebert cost.'
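For comparison, a minimal sketch of the Ebert cost for a fixed committee with approval ballots: each winner's unit of load is split equally among the voters who approve it, and the cost is the sum of squared voter loads (lower is better). Because the total load always equals the committee size, minimizing the sum of squares is the same as minimizing the variance of the loads; var-Phragmen instead optimizes how each winner's load is split, which needs a quadratic program and is not shown. The approval profile is hypothetical.

```python
from fractions import Fraction


def ebert_cost(approvals, committee):
    """Sum of squared voter loads when each winner's unit load is split
    equally among the voters approving that winner."""
    loads = [Fraction(0)] * len(approvals)
    for c in committee:
        supporters = [i for i, a in enumerate(approvals) if c in a]
        if not supporters:
            raise ValueError(f"nobody approves {c}")
        for i in supporters:
            loads[i] += Fraction(1, len(supporters))
    return sum(load * load for load in loads)


approvals = [{"A1", "A2", "A3"}] * 4 + [{"B1", "B2"}] * 2  # hypothetical profile
print(ebert_cost(approvals, ("A1", "A2", "B1")))  # 3/2
print(ebert_cost(approvals, ("A1", "A2", "A3")))  # 9/4
```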